In the last newsletter, we discussed Google's new Speech Commands Dataset, which contains 65k utterances of 30 short words. AudioSet is Speech Commands' big brother: it consists of an expanding ontology of 632 audio event classes and a collection of 2,084,320 human-labeled 10-second sound clips drawn from YouTube videos. It covers a wide range of human and animal sounds, musical instruments and genres, and common everyday environmental sounds.


