Save the Whales!

Here's the deal: the right whale is endangered and close to extinction. The most common threat to right whales is ships: the crews can't see the whales, so the ships simply run them over (see images of the damage here: M24 Digital). To address this problem, the Right Whale Listening Network (RWLN) was established, and it has installed listening devices outside Boston. These devices have recorded whale sounds, and the RWLN has now asked Kaggle to help determine whether a given sound was made by a right whale or not. If the whales can be detected, the RWLN can alert nearby ships so they can slow down and hopefully avoid the whale.

The goal of the Kaggle competition is to train a model on 30,000 sound clips and then determine whether each of 50,000 other clips comes from a whale or not. The sound files are in AIFF format, so they are easy to read with the following Python code:

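import aifc
import numpy

# The file path was missing from the original post; `filename` is a placeholder
s = aifc.open(filename, 'r')
nframes = s.getnframes()  # The total number of audio frames in the file
strsig = s.readframes(nframes)  # Raw bytes holding, for each frame, the uncompressed samples of all channels
s.close()
y = numpy.frombuffer(strsig, dtype='>i2')  # AIFF samples are big-endian; 16-bit samples assumed, as with numpy.short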

We now have an array of amplitudes for each sound, such as [74  200  -22 ..., -204  107  -11]; each array is 4,000 samples long. If we plot them, we get the following:
A right whale with noise
Noise only

We can't make much out of the plots above, so we need to make a spectrogram:
Background noise only
Right whale with background noise
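
For readers who want to try this, here is a minimal sketch of how a spectrogram like the ones above could be computed with plain NumPy. It is not the code used for the pictures in this post. The function name `spectrogram`, the 2,000 Hz sample rate, the window length and the hop size are all assumptions, and a synthetic rising tone stands in for a real clip.

```python
import numpy as np

def spectrogram(y, fs=2000, nfft=256, hop=64):
    """Return (freqs, times, magnitudes) of a Hann-windowed short-time FFT."""
    y = np.asarray(y, dtype=float)
    window = np.hanning(nfft)
    # Start index of every frame that fits completely inside the signal
    starts = np.arange(0, len(y) - nfft + 1, hop)
    frames = np.stack([y[s:s + nfft] * window for s in starts])
    # Rows are frequencies, columns are time frames
    mags = np.abs(np.fft.rfft(frames, axis=1)).T
    freqs = np.fft.rfftfreq(nfft, d=1.0 / fs)
    times = (starts + nfft / 2) / fs  # centre of each frame, in seconds
    return freqs, times, mags

# Synthetic stand-in for a clip (assumed 2,000 Hz): a tone rising from 100 to 200 Hz
fs = 2000
t = np.arange(4000) / fs
sweep = np.sin(2 * np.pi * (100 * t + 25 * t ** 2))
freqs, times, S = spectrogram(sweep, fs=fs)
print(S.shape)  # (number of frequency rows, number of time frames)
```

Each column of `S` is the magnitude spectrum of one short slice of the sound, so a rising tone such as an up-call shows up as a ridge climbing from left to right.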

In the last picture we can clearly see the "up-call", a common sound made by right whales, so that is what we need to detect. To begin, we can plot the average spectrogram of all the sounds, split by whether a whale made the sound or not:
Average spectrogram if a right whale made the sound
Average of background noise only
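
The average spectrogram is just an element-wise mean over all clips with the same label. A small sketch of that idea, assuming the spectrograms are stacked in one array and the labels are 1 for whale and 0 for noise (the function name and the tiny made-up arrays are only for illustration):

```python
import numpy as np

def average_spectrograms(specs, labels):
    """specs: array (n_clips, n_freqs, n_times); labels: 1 = whale, 0 = noise."""
    specs = np.asarray(specs, dtype=float)
    labels = np.asarray(labels)
    whale_mean = specs[labels == 1].mean(axis=0)  # element-wise mean of whale clips
    noise_mean = specs[labels == 0].mean(axis=0)  # element-wise mean of noise clips
    return whale_mean, noise_mean

# Tiny made-up example: three 2x2 "spectrograms", the first two labelled whale
specs = np.array([[[4, 0], [0, 0]],
                  [[6, 0], [0, 2]],
                  [[1, 1], [1, 1]]])
labels = np.array([1, 1, 0])
whale_mean, noise_mean = average_spectrograms(specs, labels)
print(whale_mean)
```

The top-left cell of `whale_mean` is (4+6)/2, the same calculation repeated for every cell of the spectrogram.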

So, to solve this problem, I made a series of experiments:
  1. Row average. Take the average of each row (one row per frequency) of the spectrogram, and train a Random Forest on those averages. I tested several parameters and the result was about 0.92, where 1 is 100 percent correct.
  2. Recommendation algorithm. Compute the cosine distance between the average whale spectrogram and a sample sound, as in a common recommendation algorithm. The result was about 0.87.
  3. Modification algorithms. Use transforms that change the amplitudes at each frequency. The first one I tested was the Kekre transform, which gave a result of 0.89, and the second one was MFCC (mel-frequency cepstral coefficients), which was too slow to finish.
  4. Zoom in. Zoom in on the spectrogram (we can see from the average plot that the action happens in the center), and then train a Random Forest on those magnitudes. A report by the RWLN says that the average sound is between 90 and 150 Hz and lasts 0.7 seconds, so we zoom in to around that area. The result here was 0.94943. You could probably train a similar Random Forest without zooming in, but in that case you end up with an array of size 30,000*4,000 (120 million values), and that produces nasty memory errors on my poor laptop.
    • Smoothing by taking the average. The first time I tried zooming in, I took the average of every 4 points to save some time, and the result was better than when I didn't take the average. So I realized that smoothing out the spectrogram produced a better result. I tested some different smoothing settings, and the best one was taking the average of 9 points in the zoomed spectrogram. The result was 0.95769, and it gave me a final position of 65 out of 249 participants in the competition.
  5. Image recognition. You can treat the spectrogram as an image and compute its eigenfaces. The result of that experiment was 0.91.
  6. Combinations. Sometimes you can obtain a better result if you combine several models into one, but in this case the combinations didn't beat the best individual model.
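
To make experiments 1 and 2 a bit more concrete, here is a rough NumPy sketch of the two building blocks: the per-frequency row average and a cosine comparison against a template such as the average whale spectrogram. The function names and the small made-up arrays are assumptions for illustration, the Random Forest step is left out, and this is not necessarily how the original code did it.

```python
import numpy as np

def row_average(spec):
    """Experiment 1 feature: mean magnitude of each frequency row."""
    return np.asarray(spec, dtype=float).mean(axis=1)

def cosine_similarity(a, b):
    """Experiment 2 score: cosine of the angle between two flattened spectrograms."""
    a = np.ravel(a).astype(float)
    b = np.ravel(b).astype(float)
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0

# Made-up 3x2 spectrogram and a template with the same pattern at twice the volume
spec = np.array([[1.0, 3.0], [0.0, 0.0], [2.0, 2.0]])
template = np.array([[2.0, 6.0], [0.0, 0.0], [4.0, 4.0]])
features = row_average(spec)               # one value per frequency row
score = cosine_similarity(spec, template)  # close to 1 when the patterns match
print(features, score)
```

Cosine similarity ignores overall loudness, which is why the doubled template still scores about 1. The cosine distance is 1 minus this score.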
Zoomed in average of a right whale call
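
The zooming and smoothing in experiment 4 can be sketched as follows. The 90 to 150 Hz band comes from the RWLN report mentioned above; the time window, the made-up grid and the function names are assumptions. Averaging consecutive time points is only one possible reading of "the average of each 4 points".

```python
import numpy as np

def zoom(spec, freqs, times, f_lo, f_hi, t_lo, t_hi):
    """Keep only the rows/columns inside the given frequency and time window."""
    rows = (freqs >= f_lo) & (freqs <= f_hi)
    cols = (times >= t_lo) & (times <= t_hi)
    return spec[np.ix_(rows, cols)]

def block_average(spec, k):
    """Average every k consecutive time points (a trailing remainder is dropped)."""
    n_freqs, n_times = spec.shape
    usable = (n_times // k) * k
    return spec[:, :usable].reshape(n_freqs, usable // k, k).mean(axis=2)

# Made-up grid: 0..240 Hz in 10 Hz steps, 0..1.9 s in 0.1 s steps
freqs = np.arange(0, 250, 10.0)
times = np.arange(20) / 10.0
spec = np.arange(freqs.size * times.size, dtype=float).reshape(freqs.size, times.size)

# Hypothetical time window around the centre of the clip
zoomed = zoom(spec, freqs, times, f_lo=90, f_hi=150, t_lo=0.5, t_hi=1.3)
smoothed = block_average(zoomed, k=4)
features = smoothed.ravel()  # one flat feature vector per clip for the classifier
print(zoomed.shape, smoothed.shape)
```

Both steps shrink the feature vector per clip, which is also what keeps the training array small enough to fit in memory.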

Update: The code is available on GitHub:


  1. How do you calculate the average spectrogram? Can you provide some example code?

    1. It was a while ago, but if I recall correctly I just did what you always do when you calculate an average, such as (4+6)/2. I've uploaded the code to GitHub, and I believe what you are looking for is in ""

  2. Hey,

    Just curious: did you participate in the competition? And what ranking were you able to obtain with these complicated techniques?

