How to Build an Audio Machine Learning Dataset

We use machine learning for lots of things at Phonic. Speech recognition, sentiment analysis and emotion classification are all problems that are best solved by supervised ML systems. Generally speaking, these systems require training on large datasets - the bigger the better. While there are many freely available datasets, the most interesting and unique problems require novel data.

At Phonic, we use our own survey platform to build custom datasets. This is how we do it, and how you can too.

1. Create a Survey With Voice Questions

For this example we'll be generating a wake word dataset. Wake words are special words or phrases that many speech recognition systems listen for before they start processing commands. "Alexa", "OK Google" and "Hey Siri" are all examples of wake words.

Here we are going to add five audio questions, asking participants to speak our wake word multiple times.

Phonic allows responses to be recorded in either MP3 or WAV. MP3 uses lossy compression, while WAV typically stores uncompressed, lossless audio, so WAV is usually preferred in machine learning applications.
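Once you have a WAV recording on disk, it can be worth checking that its format matches what your training pipeline expects. Here is a minimal sketch using Python's standard `wave` module. The function names and the 16 kHz mono defaults are assumptions for illustration, not Phonic settings, and the `wave` module only reads PCM-encoded WAV files.

```python
import wave


def describe_wav(path):
    """Return basic format information read from a WAV file's header."""
    with wave.open(str(path), "rb") as wav_file:
        frames = wav_file.getnframes()
        rate = wav_file.getframerate()
        return {
            "channels": wav_file.getnchannels(),
            "sample_width_bytes": wav_file.getsampwidth(),
            "sample_rate_hz": rate,
            "duration_seconds": frames / rate,
        }


def matches_expected_format(info, sample_rate_hz=16000, channels=1):
    """Check a recording against the format a training pipeline expects.

    The 16 kHz mono defaults are an assumption for illustration only.
    """
    return info["sample_rate_hz"] == sample_rate_hz and info["channels"] == channels
```

Calling `describe_wav` on each recording and filtering with `matches_expected_format` is one way to catch files that need resampling before training.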

We can click "Preview" to inspect our survey draft and ensure everything looks normal.

2. Deploy The Survey Live And Collect Responses

This is the fun part - actually collecting responses. You can share the survey link with friends, family and colleagues to get as large a variety of responses as possible. At Phonic we often use Amazon Mechanical Turk to build datasets with thousands of incredibly diverse voices. These responses can all be individually played from the Phonic dashboard.

3. Download Responses For Training

To use this data in a training pipeline, we have to export it from the Phonic platform. This is easily done by clicking the "Download Audio" button in the question view. The WAV files can be downloaded in bulk as a single .zip file.
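After downloading, the archive needs to be unpacked and indexed before a training script can use it. The following is a rough sketch using only the Python standard library. The function names, the CSV manifest layout and the single-label assumption (every clip contains the wake word) are ours for illustration; the internal layout of the exported .zip is not described here, so the code simply picks up every `.wav` entry it finds.

```python
import csv
import zipfile
from pathlib import Path


def extract_wavs(zip_path, output_dir):
    """Extract every .wav entry from the archive and return the extracted paths."""
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    extracted = []
    with zipfile.ZipFile(zip_path) as archive:
        for name in sorted(archive.namelist()):
            # Skip anything that is not a WAV file (case-insensitive match).
            if name.lower().endswith(".wav"):
                extracted.append(Path(archive.extract(name, output_dir)))
    return extracted


def write_manifest(wav_paths, manifest_path, label):
    """Write a two-column CSV (path, label) that a training script can read."""
    with open(manifest_path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["path", "label"])
        for path in wav_paths:
            writer.writerow([str(path), label])
```

For example, `write_manifest(extract_wavs("responses.zip", "data/wake_word"), "manifest.csv", "wake_word")` would produce one manifest row per recording, where `responses.zip` is a hypothetical name for the downloaded archive.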

If you have any questions about using Phonic to build custom audio datasets, don't hesitate to contact us.