How to get more data by using the data you already have

I read an article about Andrew Ng called "Inside The Mind That Built Google Brain: On Life, Creativity, And Failure". He previously worked at Google and now works at Baidu, where he works on speech recognition. Making speech recognition work takes a lot of data. One clever way he found to get more data out of the data he already had:
"Then one of the things we did was, if we have an audio clip of you saying something, we would take that audio clip of you and add background noise to it, like a clip recorded in a cafe. So we synthesize an audio clip of what you would sound like if you were speaking in a cafe. By synthesizing your voice against lots of backgrounds, we just multiply the amount of data that we have. We use tactics like that to create more data to feed to our machines, to feed to our rocket engines."
The rocket engine he refers to is the speech recognition algorithm, and data is the fuel that powers it.
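To make the idea concrete, here is a minimal sketch of what "adding background noise to a clip" could look like in Python. The article does not say how Baidu actually mixes the audio, so everything here is an assumption for illustration: the function names (`power`, `mix_with_noise`, `augment`) are made up, clips are plain lists of float samples, and the noise level is set through a target signal-to-noise ratio in decibels. A sine wave and random numbers stand in for a real voice recording and real cafe noise.

```python
import math
import random


def power(samples):
    """Average power (mean of squared samples) of a clip."""
    return sum(x * x for x in samples) / len(samples)


def mix_with_noise(speech, noise, snr_db, rng):
    """Overlay a random stretch of `noise` on `speech` at roughly `snr_db` dB.

    Assumption: both clips are lists of floats at the same sample rate.
    """
    n = len(speech)
    # Loop the background clip if it is shorter than the speech clip.
    reps = math.ceil(n / len(noise))
    looped = noise * reps if reps > 1 else noise
    # Pick a random window of background the same length as the speech.
    start = rng.randrange(len(looped) - n + 1)
    segment = looped[start:start + n]
    noise_power = power(segment)
    if noise_power == 0.0:
        return list(speech)  # silent background: nothing to add
    # Scale the noise so that speech power / noise power == 10 ** (snr_db / 10).
    scale = math.sqrt(power(speech) / (noise_power * 10 ** (snr_db / 10)))
    return [s + scale * b for s, b in zip(speech, segment)]


def augment(speech, backgrounds, snrs_db, seed=0):
    """Return one new clip per (background, SNR) combination."""
    rng = random.Random(seed)
    return [
        mix_with_noise(speech, bg, snr, rng)
        for bg in backgrounds
        for snr in snrs_db
    ]


if __name__ == "__main__":
    rng = random.Random(1)
    # Stand-ins for real recordings: a tone as "speech", random noise as "cafe" and "street".
    speech = [0.5 * math.sin(2 * math.pi * 220 * i / 16000) for i in range(16000)]
    cafe = [rng.gauss(0, 0.1) for _ in range(32000)]
    street = [rng.gauss(0, 0.3) for _ in range(8000)]
    clips = augment(speech, [cafe, street], [20, 10, 0])
    print(len(clips))  # 2 backgrounds x 3 noise levels = 6 new clips from 1 original
```

One original clip combined with two backgrounds at three noise levels gives six training clips, which is the "multiply the amount of data" effect from the quote. With real recordings you would load the samples from audio files instead of generating them.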
