Mark Cartwright, Bryan Pardo


A natural way of communicating an audio concept is to imitate it with one's voice. This creates an approximation of the imagined sound (e.g. a particular owl's hoot), much like how a visual sketch approximates a visual concept (e.g. a drawing of the owl). If a machine could understand vocal imitations, users could communicate with software in this natural way, enabling new interactions (e.g. programming a music synthesizer by imitating the desired sound with one's voice). VocalSketch is a project in which we collected thousands of crowd-sourced vocal imitations of a large and diverse set of sounds, along with data on the crowd's ability to correctly label these vocal imitations. This dataset will help the research community understand which audio concepts can be effectively communicated with this approach.

Related Papers

[pdf] Cartwright, M., and Pardo, B. VocalSketch: Vocally Imitating Audio Concepts. In Proceedings of the ACM Conference on Human Factors in Computing Systems (CHI), 2015. *Honorable Mention*


VocalSketch dataset