How to Build An Audio Machine Learning Dataset
We use machine learning for many tasks at Phonic. Speech recognition, sentiment analysis, and emotion classification are all problems best solved by supervised ML systems. Generally speaking, these systems require training on large datasets - the bigger, the better. While many datasets are freely available, the most interesting and unique problems require novel data.
At Phonic, we use our own survey platform to build custom datasets. This is how we do it, and how you can too.
1. Create a Survey With Voice Questions
For this example we'll be generating a wake word dataset. Wake words are special words or phrases used to activate many speech recognition systems. "Alexa", "OK Google" and "Hey Siri" are all examples of wake words.
Here we are going to add five audio questions, asking participants to speak our wake word multiple times.
Phonic allows responses to be recorded in either MP3 or WAV. WAV is a lossless, higher-fidelity encoding and is typically preferred for machine learning applications.
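Before training, it can be worth sanity-checking the properties of your WAV files (sample rate, channel count, bit depth), since mismatched audio formats are a common source of pipeline bugs. Here is a minimal sketch using Python's standard-library `wave` module; the 16 kHz mono, 16-bit parameters are illustrative assumptions, not necessarily what Phonic exports.

```python
import io
import wave

# Write one second of silence as a stand-in for a real recorded response.
# (These parameters are assumptions for the example, not Phonic's settings.)
SAMPLE_RATE = 16000  # 16 kHz is a common rate for speech models
buffer = io.BytesIO()
with wave.open(buffer, "wb") as wav:
    wav.setnchannels(1)           # mono
    wav.setsampwidth(2)           # 16-bit PCM
    wav.setframerate(SAMPLE_RATE)
    wav.writeframes(b"\x00\x00" * SAMPLE_RATE)

# Read the parameters back, as you might when validating downloaded files.
buffer.seek(0)
with wave.open(buffer, "rb") as wav:
    rate, channels, frames = (
        wav.getframerate(),
        wav.getnchannels(),
        wav.getnframes(),
    )
print(rate, channels, frames)
```

In a real pipeline you would open each downloaded file by path instead of an in-memory buffer and reject (or resample) anything that doesn't match the format your model expects.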
We can click "Preview" to inspect our survey draft and ensure everything looks normal.
2. Deploy The Survey Live And Collect Responses
This is the fun part - actually collecting responses. You can share the survey link with friends, family and colleagues to get as large a variety of responses as possible. At Phonic we often use Amazon Mechanical Turk to build datasets with thousands of incredibly diverse voices. These responses can all be individually played from the Phonic dashboard.
3. Download Responses For Training
To use this data in a training pipeline, we first need to export it from the Phonic platform. This is easily done by clicking the "Download Audio" button in the question view. The WAV files can be downloaded in bulk as a single .zip file.
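Once you have the .zip export, loading it into a training pipeline is straightforward with Python's standard library. The sketch below reads every WAV in the archive into (filename, raw PCM bytes) pairs; the filename `response_001.wav` is a hypothetical example, not Phonic's actual naming scheme.

```python
import io
import wave
import zipfile

def load_wav_dataset(zip_file):
    """Read every .wav in an exported archive into (filename, PCM bytes) pairs.

    zip_file may be a path or a file-like object. Member names here are
    hypothetical - use whatever the "Download Audio" export actually contains.
    """
    examples = []
    with zipfile.ZipFile(zip_file) as archive:
        for name in sorted(archive.namelist()):
            if name.endswith(".wav"):
                with wave.open(io.BytesIO(archive.read(name)), "rb") as wav:
                    examples.append((name, wav.readframes(wav.getnframes())))
    return examples

# Demo with a stand-in archive containing one short silent recording.
wav_bytes = io.BytesIO()
with wave.open(wav_bytes, "wb") as wav:
    wav.setnchannels(1)            # mono
    wav.setsampwidth(2)            # 16-bit PCM
    wav.setframerate(16000)
    wav.writeframes(b"\x00\x00" * 1600)  # 0.1 s of silence

archive_bytes = io.BytesIO()
with zipfile.ZipFile(archive_bytes, "w") as archive:
    archive.writestr("response_001.wav", wav_bytes.getvalue())

examples = load_wav_dataset(archive_bytes)
print(len(examples), examples[0][0])
```

From here, the raw PCM bytes can be decoded into arrays and fed into whatever feature extraction (e.g. spectrograms) your wake word model uses.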
If you have any questions about using Phonic to build custom audio datasets, don't hesitate to contact us.