April 17, 2013

3 books you should read to become a better writer

I'm currently writing a book about the entrepreneur Elon Musk, so I thought it would be a good idea to improve my writing skills by reading some books on the subject.

The first book I read was On Writing Well by William Zinsser. It's considered a classic on the subject, and it teaches you how to write nonfiction. The book begins with general thoughts to encourage you in your writing endeavors, continues with writing rules, and ends with how to write about specific topics such as science, business, and sports.
Key findings from the book are that you should replace the word "one" with "you" (one is a boring guy), and that you should not overstate the facts. Don't write that someone shut a door with the sound of an atomic bomb.

The next book I read was On Writing by Stephen King. It's not only a book about how to write fiction; it's also a short memoir, ranging from Stephen King's childhood to how writing helped him recover from a traffic accident.
One of the key findings from the book is that you should avoid large amounts of alcohol and drugs. You don't have to be sober, but you shouldn't be drunk when you write. Stephen King wrote books while drinking and using drugs, and he concluded:
"Substance-abusing writers are just substance abusers. Any claims that the drugs and alcohol are necessary to dull a finer sensibility are just the usual self-serving bullshit."

The last book I read was The Elements of Style by William Strunk and E. B. White. Stephen King recommends it in On Writing, and it covers the rules you should follow when writing.

To sum up, here are some key findings shared by two or three of the books:
  • Less is more. The general rule is to embrace simplicity. The secret of good writing is to strip every sentence to its cleanest components: remove every word that serves no function, every long word that could be a short word, and every adverb that carries the same meaning that's already in the verb. Stephen King's rule is that the second draft of a book should be ten percent shorter than the first draft.
  • You don't have to follow the rules. The basic rule is to write what you think is good writing, in an environment you like to work in. Some people write by day, others by night. Some people need silence, others turn on the radio; Stephen King listened to Metallica while writing alone in his home. Don't try to please everyone, because someone will always complain. As Stephen King said, "You can't please all of the readers all of the time, but you really ought to try to please at least some of the readers some of the time."
  • Practice. Good writers know that very few sentences come out right the first time, or even the third or fifth time. Good writing doesn't come naturally, though most people seem to think it does. Stephen King wrote that if you want to be a writer, you must do two things above all others: read a lot and write a lot. He read about 70 to 80 books a year, including audiobooks.

My biography of Elon Musk is now finished, and you can find the final result, The Engineer - Follow Elon Musk on a journey from South Africa to Mars, at the link below. It got good reviews, so these books probably helped a bit.

More articles in the same series: Best technical and creative writing resources

April 7, 2013

Save the Whales!

Here's the deal: the right whale is endangered and close to extinction. The most common threat to right whales is ships: the crews can't see the whales, so the ships simply run them over (see images of the damage here: M24 Digital). To help, the Right Whale Listening Network (RWLN) was set up, and it has installed listening devices off the coast of Boston. These devices record the sounds of the whales, and the RWLN has now asked Kaggle for help deciding whether each sound was made by a right whale or not. If the whales can be detected, the RWLN can alert nearby ships so they slow down and hopefully avoid the whale.

The goal of the Kaggle competition is to train a model on 30,000 labeled recordings and then determine whether each of 50,000 new recordings contains a right whale or not. The sound files are in AIFF format, so they're easy to read with the following Python code:

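import aifc  # deprecated in Python 3.11 and removed from the standard library in 3.13
import numpy as np
s = aifc.open(total_path, 'r')  # total_path: path to one .aiff clip
nframes = s.getnframes()  # the total number of audio frames in the file
strsig = s.readframes(nframes)  # bytes holding the uncompressed samples of all channels, frame by frame
s.close()
y = np.frombuffer(strsig, dtype='>i2').astype(np.int16)  # AIFF stores big-endian 16-bit samples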

We now have an array with the amplitudes of each sound, such as [74 200 -22 ... -204 107 -11], and each array is 4,000 samples long. If we plot two of them, we get the following:
A right whale with noise
Noise only

The raw waveforms don't tell us much, so we need to compute a spectrogram:
Background noise only
Right whale with background noise
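The post doesn't show how the spectrogram was computed, so here is a minimal NumPy-only sketch of a magnitude spectrogram (a short-time Fourier transform). The window length (nfft=256), the overlap (192 samples) and the 2 kHz sample rate are my assumptions, not settings from the competition code; matplotlib's specgram or scipy.signal.spectrogram do the same job.

```python
import numpy as np

def spectrogram(y, nfft=256, noverlap=192, fs=2000):
    """Magnitude spectrogram via a short-time Fourier transform.

    Returns (freqs, times, S): S has one row per frequency bin and one
    column per time window, i.e. shape (nfft // 2 + 1, n_windows).
    fs=2000 is an assumed sample rate, not taken from the post.
    """
    y = np.asarray(y, dtype=float)
    step = nfft - noverlap
    n_windows = 1 + (len(y) - nfft) // step
    window = np.hanning(nfft)  # taper each frame to reduce spectral leakage
    # Cut the signal into overlapping, windowed frames (one frame per column).
    frames = np.stack([y[i * step:i * step + nfft] * window
                       for i in range(n_windows)], axis=1)
    S = np.abs(np.fft.rfft(frames, axis=0))  # magnitude of each frame's FFT
    freqs = np.fft.rfftfreq(nfft, d=1.0 / fs)  # bin frequencies in Hz
    times = (np.arange(n_windows) * step + nfft / 2) / fs  # window centres in seconds
    return freqs, times, S

# Usage with a synthetic 100 Hz tone, 4,000 samples long like one clip:
t = np.arange(4000) / 2000
freqs, times, S = spectrogram(np.sin(2 * np.pi * 100 * t))
print(S.shape)  # (129, 59)
# Strongest row: the bin nearest 100 Hz (bins are 2000/256 = 7.8125 Hz apart).
print(freqs[S.mean(axis=1).argmax()])
```

Plotting S (often on a log scale) with the time axis horizontal gives pictures like the two above.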

In the last picture, we can clearly see the "up-call", a common sound made by right whales, and that is what we need to detect. To begin, we can plot the average spectrogram of all the sounds, separately for the sounds made by a right whale and those that are only noise:
Average spectrogram if a right whale made the sound
Average of background noise only
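Making the two average plots comes down to stacking the spectrograms of all training clips and averaging them per label. A minimal sketch, assuming every spectrogram has the same shape and the labels are 1 for a right whale and 0 for noise only; the function name is hypothetical and the toy numbers are made up:

```python
import numpy as np

def class_average_spectrograms(spectrograms, labels):
    """Average spectrogram of whale clips (label 1) and noise-only clips (label 0).

    spectrograms: array of shape (n_clips, n_freqs, n_times)
    labels: array of shape (n_clips,) holding 1 (right whale) or 0 (noise)
    """
    spectrograms = np.asarray(spectrograms, dtype=float)
    labels = np.asarray(labels)
    whale_avg = spectrograms[labels == 1].mean(axis=0)
    noise_avg = spectrograms[labels == 0].mean(axis=0)
    return whale_avg, noise_avg

# Toy example: four 2x3 "spectrograms" filled with constant values.
specs = np.array([np.full((2, 3), v) for v in (1.0, 3.0, 10.0, 20.0)])
labels = np.array([1, 1, 0, 0])
whale_avg, noise_avg = class_average_spectrograms(specs, labels)
print(whale_avg)  # a 2x3 array where every entry is 2.0
print(noise_avg)  # a 2x3 array where every entry is 15.0
```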

So, to solve this problem, I ran a series of experiments:
  1. Row average. Take the average of each row (one row per frequency) in the spectrogram, and train a Random Forest on those averages. I tested several parameters, and the score was about 0.92, where 1 is a perfect score.
  2. Recommendation algorithm. Compute the cosine distance between a sound's spectrogram and the average whale spectrogram, like a common recommendation algorithm. The score was about 0.87.
  3. Transformation algorithms. Use several algorithms that transform the amplitudes at each frequency. The first one I tested was the Kakre Transform, which scored 0.89; the second one was MFCC (mel-frequency cepstral coefficients), which was too slow to finish.
  4. Zoom in. The average plot shows that the action happens in the center of the spectrogram, so zoom in on that region and train a Random Forest on those magnitudes. A report by the RWLN says that the average call is between 90 and 150 Hz and lasts 0.7 seconds, so we zoom in on around that area. The score here was 0.94943. You could probably train a similar Random Forest without zooming in, but then you would end up with an array of size 30,000 × 4,000, which produces nasty memory errors on my poor laptop.
    • Smoothing by taking the average. The first time I tried zooming in, I averaged every 4 points to save some time, and the score was better than without the averaging. So I realized that smoothing the spectrogram produced a better result. I tested several smoothing settings, and the best one was averaging every 9 points in the zoomed spectrogram. That scored 0.95769 and gave me a final position of 65th out of 249 participants in the competition.
  5. Image recognition. Treat the spectrogram as an image and compute its eigenfaces. The score of that experiment was 0.91.
  6. Combinations. Sometimes you get a better result by combining several models into one, but in this case none of the combinations beat the best individual model.
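Experiments 1, 2 and 5 all come down to turning each spectrogram into features or a score. The sketch below is my own NumPy-only reconstruction, not the code from the repository linked at the end: the function names, the k=20 default, and computing the eigenfaces as principal components (via an SVD of the flattened spectrograms) are assumptions. For experiment 2, the template would be the average whale spectrogram.

```python
import numpy as np

# Experiment 1: one feature per frequency, the mean of that spectrogram row over time.
def row_average_features(S):
    return np.asarray(S, dtype=float).mean(axis=1)

# Experiment 2: cosine distance to a template, e.g. the average whale spectrogram.
def cosine_distance(a, b):
    a = np.ravel(a).astype(float)
    b = np.ravel(b).astype(float)
    return 1.0 - a.dot(b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Experiment 5: "eigenfaces", i.e. the principal components of flattened spectrograms.
def fit_eigen_basis(train_specs, k=20):
    X = np.asarray(train_specs, dtype=float).reshape(len(train_specs), -1)
    mean = X.mean(axis=0)
    # Rows of vt are the principal directions, strongest first.
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:k]

def eigen_features(specs, mean, components):
    X = np.asarray(specs, dtype=float).reshape(len(specs), -1)
    return (X - mean) @ components.T  # projection onto the top-k components

# Toy usage with made-up numbers:
S = np.array([[1.0, 3.0], [4.0, 8.0]])
print(row_average_features(S))  # [2. 6.]
print(cosine_distance([1, 2], [2, 4]))  # close to 0: same direction
rng = np.random.default_rng(0)
train = rng.random((10, 4, 5))  # 10 fake 4x5 spectrograms
mean, comps = fit_eigen_basis(train, k=3)
print(eigen_features(train, mean, comps).shape)  # (10, 3)
```

The feature matrices from experiments 1 and 5 would then go into a Random Forest, for example scikit-learn's RandomForestClassifier; that step is left out so the sketch only needs NumPy.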
Zoomed-in average of a right whale call
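The zoom-and-smooth step from experiment 4 can be sketched as follows. The 90 to 150 Hz band and the 0.7 second window come from the RWLN figures quoted in experiment 4, and centring the window on the middle of the clip follows the average plot. Reading "the average of 9 points" as averaging non-overlapping runs of 9 consecutive values in the flattened crop is my assumption (it could just as well have been 3 × 3 blocks), and the function name is hypothetical.

```python
import numpy as np

def zoom_and_smooth(S, freqs, times, f_lo=90.0, f_hi=150.0,
                    t_center=None, duration=0.7, block=9):
    """Crop a spectrogram to the up-call region and block-average it.

    S has shape (len(freqs), len(times)). Keeps the rows with f_lo <= freq <= f_hi
    and the columns within duration / 2 seconds of t_center (the clip centre by
    default), flattens the crop, then averages each run of `block` consecutive
    values. A trailing remainder shorter than `block` is dropped.
    """
    S = np.asarray(S, dtype=float)
    freqs = np.asarray(freqs)
    times = np.asarray(times)
    if t_center is None:
        t_center = (times[0] + times[-1]) / 2.0
    rows = (freqs >= f_lo) & (freqs <= f_hi)
    cols = np.abs(times - t_center) <= duration / 2.0
    crop = S[np.ix_(rows, cols)].ravel()
    n = (crop.size // block) * block
    return crop[:n].reshape(-1, block).mean(axis=1)

# Toy example: a 5x5 "spectrogram" with made-up frequency and time axes.
freqs = np.array([50.0, 100.0, 120.0, 140.0, 200.0])
times = np.linspace(0.0, 2.0, 5)
S = np.arange(25, dtype=float).reshape(5, 5)
print(zoom_and_smooth(S, freqs, times, duration=1.2))  # one value: 12.0
print(zoom_and_smooth(S, freqs, times, duration=1.2, block=3))  # averages 7, 12 and 17
```

Each clip then becomes one short feature vector, which keeps the training matrix far smaller than the full 30,000 by 4,000 array mentioned in experiment 4.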

Update: The code is available on GitHub: github.com/Trejdify/Kaggle-competitions/tree/master/SaveTheWhales