SONYC-UST-V2: An Urban Sound Tagging Dataset with Spatiotemporal Context

Workshop on Detection and Classification of Acoustic Scenes and Events (DCASE)

Publication date: November 2, 2020

Mark Cartwright, Aurora Cramer, Ana Elisa Méndez Méndez, Yu Wang, Ho-Hsiang Wu, Vincent Lostanlen, Magdalena Fuentes, Graham Dove, Charlie Mydlarz, Justin Salamon, Oded Nov, Juan Pablo Bello

We present SONYC-UST-V2, a dataset for urban sound tagging with spatiotemporal information. The dataset is intended for the development and evaluation of machine listening systems for real-world urban noise monitoring. While other datasets of urban recordings are available, SONYC-UST-V2 provides the opportunity to investigate how spatiotemporal metadata can aid in the prediction of urban sound tags. It consists of 18510 audio recordings from the “Sounds of New York City” (SONYC) acoustic sensor network, each accompanied by the timestamp of audio acquisition (at the hour scale) and the location of the sensor (at the urban block level). The dataset contains annotations by volunteers from the Zooniverse citizen science platform, as well as a two-stage verification by our team. In this article, we describe our data collection procedure and propose evaluation metrics for multilabel classification of urban sound tags. We also report the results of a simple baseline model that exploits temporal information.
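
To make the idea of a multilabel evaluation metric concrete, the sketch below computes a micro-averaged F1 score, one common choice for tagging tasks where each recording may carry several tags at once. It is an illustration only: the abstract does not state which metrics the article proposes, and the function name `micro_f1`, the three tag names, and the toy labels are all hypothetical rather than taken from the dataset.

```python
from typing import Sequence


def micro_f1(y_true: Sequence[Sequence[int]], y_pred: Sequence[Sequence[int]]) -> float:
    """Micro-averaged F1: pool every (recording, tag) decision, then score once."""
    tp = fp = fn = 0
    for true_row, pred_row in zip(y_true, y_pred):
        for t, p in zip(true_row, pred_row):
            if t and p:
                tp += 1  # tag present and predicted
            elif p and not t:
                fp += 1  # tag predicted but absent
            elif t and not p:
                fn += 1  # tag present but missed
    denom = 2 * tp + fp + fn
    # Return 0.0 when there are no positives at all (assumed convention).
    return 2 * tp / denom if denom else 0.0


# Hypothetical example: one row per recording, columns are [engine, siren, dog].
y_true = [[1, 0, 0], [0, 1, 1], [1, 1, 0]]
y_pred = [[1, 0, 1], [0, 1, 0], [1, 0, 0]]
score = micro_f1(y_true, y_pred)
print(f"micro-F1 = {score:.3f}")
```

Micro-averaging pools all tag decisions before scoring, so frequent tags dominate the result. A macro-averaged variant would instead score each tag separately and average the per-tag scores, which gives rare tags equal weight.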

Research Areas: AI & Machine Learning, Audio