How to detect the presence of sound/audio
Problem
You want to detect the presence of sound/audio on Linux using a USB webcam/microphone or the built-in microphone. (In my case, I have a Logitech QuickCam Pro 9000.) One sample application is to start recording as soon as there is sound. Another application is to start playing a lullaby when a baby cries. (Or detect the workings of a sump pump.)
Keywords
Linux, sound, audio, detect, threshold, webcam, USB, microphone, USB microphone, Logitech, arecord, plughw, sox, S16_LE.
Solution
If you're better at Python than I am, take a look at a solution (that didn't work for me) that uses PyAudio.
My solution below is, no doubt, geared heavily toward the Logitech QuickCam Pro 9000, which produces an S16_LE-formatted audio stream.
Inspired by a hint I found on the linux-uvc-devel mailing list, run cat /proc/asound/cards and take note of the number before the device you want to use as the microphone. In my case, it looks like:
$ cat /proc/asound/cards
 0 [Intel          ]: HDA-Intel - HDA Intel
                      HDA Intel at 0xfdff4000 irq 22
 1 [Q9000          ]: USB-Audio - QuickCam Pro 9000
                      Logitech, Inc. QuickCam Pro 9000 at usb-0000:00:1a.7-3.3, high speed
The number, therefore, is 1. Use this number to record, say, 5 seconds of audio using arecord:
$ /usr/bin/arecord -D plughw:1,0 -d 5 -f S16_LE > sample.wav
The "1" in "plughw:1,0" is the card number as reported by /proc/asound/cards. The -d argument sets the duration of the recording in seconds (5 in this case), and -f specifies the sample format (S16_LE in my case).
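As a rough sketch of how the card lookup could be automated, the Ruby below scans text in the format of /proc/asound/cards for a card whose line contains a given name and builds the matching arecord arguments. The helper names find_card_number and arecord_command and the SAMPLE_CARDS constant are my own inventions for illustration. The sketch also passes the output file name to arecord as an argument instead of redirecting standard output.

```ruby
# Sample text in the format of /proc/asound/cards, copied from the listing above.
SAMPLE_CARDS = <<~CARDS
   0 [Intel          ]: HDA-Intel - HDA Intel
                        HDA Intel at 0xfdff4000 irq 22
   1 [Q9000          ]: USB-Audio - QuickCam Pro 9000
                        Logitech, Inc. QuickCam Pro 9000 at usb-0000:00:1a.7-3.3, high speed
CARDS

# Hypothetical helper: return the number of the first card whose header line
# contains `name`, or nil if no card matches.
def find_card_number(cards_text, name)
  cards_text.each_line do |line|
    # Header lines start with the card number followed by "[short name]".
    match = line.match(/^\s*(\d+)\s+\[/)
    return match[1].to_i if match && line.include?(name)
  end
  nil
end

# Hypothetical helper: build the arecord argument list for a given card,
# always recording from device 0 in S16_LE format as in the command above.
def arecord_command(card, seconds, outfile)
  ["/usr/bin/arecord", "-D", "plughw:#{card},0",
   "-d", seconds.to_s, "-f", "S16_LE", outfile]
end

card = find_card_number(SAMPLE_CARDS, "QuickCam Pro 9000")
puts arecord_command(card, 5, "sample.wav").join(" ")
```

On a real system you would pass File.read("/proc/asound/cards") instead of the sample text and hand the resulting array to system(*...), after checking that the lookup did not return nil.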
The generated output, sample.wav, should be a 5-second recording. Now run this file through sox to give you statistics:
$ /usr/bin/sox -t .wav sample.wav -n stat
Samples read:             40000
Length (seconds):      5.000000
Scaled by:         2147483647.0
Maximum amplitude:     0.037506
Minimum amplitude:    -0.056946
Midline amplitude:    -0.009720
Mean norm:             0.007250
Mean amplitude:       -0.002037
RMS amplitude:         0.009226
Maximum delta:         0.034729
Minimum delta:         0.000000
Mean delta:            0.004215
RMS delta:             0.005596
Rough frequency:            772
Volume adjustment:       17.561
The "Maximum amplitude" line tells you the loudest point in the sample, on a scale where 1.0 is full volume. Compare that number against a threshold to decide whether there was any sound, and act on it.
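To make the threshold check concrete, here is a minimal Ruby sketch that pulls one statistic out of text in the format sox prints and compares it to a limit. The helper names sox_stat and sound_detected?, the SAMPLE_STATS constant, and the threshold values are assumptions of mine for illustration only.

```ruby
# A few lines in the format printed by `sox ... -n stat`, copied from above.
SAMPLE_STATS = <<~STATS
  Samples read:             40000
  Length (seconds):      5.000000
  Maximum amplitude:     0.037506
  Minimum amplitude:    -0.056946
  RMS amplitude:         0.009226
STATS

# Hypothetical helper: return the value of the statistic named `label`
# as a Float, or nil if the label does not appear in the text.
def sox_stat(stat_text, label)
  stat_text.each_line do |line|
    key, value = line.split(":", 2)
    return value.strip.to_f if value && key.strip == label
  end
  nil
end

# Hypothetical helper: true when "Maximum amplitude" exceeds the threshold.
def sound_detected?(stat_text, threshold)
  max = sox_stat(stat_text, "Maximum amplitude")
  !max.nil? && max > threshold
end

puts sox_stat(SAMPLE_STATS, "Maximum amplitude")
puts sound_detected?(SAMPLE_STATS, 0.15)
```

Note that sox writes the stat report to standard error, so to feed real output into a helper like this you would capture it with 2>&1, for example with backticks around /usr/bin/sox -t .wav sample.wav -n stat 2>&1.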
We can put all this together in this Ruby script that takes 5-second samples and triggers on a maximum amplitude greater than 0.15. Note that this script will likely not do anything for you unless you replace the MICROPHONE line with a string that matches the name of the microphone you wish to use, as reported by /proc/asound/cards. Also, to make it do anything interesting, you'll have to replace the system call with a command of your own.