
How to detect the presence of sound/audio

Problem

You want to detect the presence of sound/audio on Linux using a USB webcam/microphone or the built-in microphone. (In my case, I have a Logitech QuickCam Pro 9000.) One sample application is to start recording as soon as there is sound. Another application is to start playing a lullaby when a baby cries. (Or detect the workings of a sump pump.)

Keywords

Linux, sound, audio, detect, threshold, webcam, USB, microphone, USB microphone, Logitech, arecord, plughw, sox, S16_LE.

Solution

If you're better at Python than I am, take a look at a solution that uses PyAudio (it didn't work for me).

My solution below is, no doubt, geared heavily towards the Logitech QuickCam Pro 9000, which produces an S16_LE-formatted (signed 16-bit, little-endian) audio stream.

Inspired by a hint I found on the linux-uvc-devel mailing list, run cat /proc/asound/cards and take note of the number in front of the device you want to use as the microphone. In my case, it looks like:

$ cat /proc/asound/cards
 0 [Intel          ]: HDA-Intel - HDA Intel
                      HDA Intel at 0xfdff4000 irq 22
 1 [Q9000          ]: USB-Audio - QuickCam Pro 9000
                      Logitech, Inc. QuickCam Pro 9000 at usb-0000:00:1a.7-3.3, high speed

The number, therefore, is 1. Use this number to record, say, 5 seconds of audio using arecord:

$ /usr/bin/arecord -D plughw:1,0 -d 5 -f S16_LE > sample.wav

The "1" in "plughw:1,0" is the card number as reported by /proc/asound/cards; the "0" is the device on that card. The -d argument sets the duration of the recording (5 seconds in this case), and -f specifies the sample format (S16_LE in my case).
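
The lookup and the recording command can also be put together programmatically. The following is a minimal Ruby sketch, not a definitive implementation: the helper names find_card_index and arecord_command are made up for illustration, and the sketch assumes the card lines in /proc/asound/cards look like the output above. It passes the output file to arecord as an argument instead of redirecting stdout.

```ruby
# Return the ALSA card number whose header line in /proc/asound/cards
# mentions `name`, or nil if no card matches.
# (find_card_index is a hypothetical helper name.)
def find_card_index(cards_text, name)
  cards_text.each_line do |line|
    # Header lines look like:
    #   " 1 [Q9000          ]: USB-Audio - QuickCam Pro 9000"
    # The indented description lines below them have no leading number.
    match = line.match(/^\s*(\d+)\s+\[/)
    return match[1].to_i if match && line.include?(name)
  end
  nil
end

# Build the arecord argument list for a recording of `seconds` seconds.
# (arecord_command is a hypothetical helper name.)
def arecord_command(card, seconds, path)
  ["/usr/bin/arecord", "-D", "plughw:#{card},0",
   "-d", seconds.to_s, "-f", "S16_LE", path]
end
```

With the output shown above, find_card_index(File.read("/proc/asound/cards"), "QuickCam") would give 1, and system(*arecord_command(1, 5, "sample.wav")) would make the same recording as the command line above.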

The generated output, sample.wav, should be a 5-second recording. Now run this file through sox to get statistics:


$ /usr/bin/sox -t .wav sample.wav -n stat
Samples read:             40000
Length (seconds):      5.000000
Scaled by:         2147483647.0
Maximum amplitude:     0.037506
Minimum amplitude:    -0.056946
Midline amplitude:    -0.009720
Mean    norm:          0.007250
Mean    amplitude:    -0.002037
RMS     amplitude:     0.009226
Maximum delta:         0.034729
Minimum delta:         0.000000
Mean    delta:         0.004215
RMS     delta:         0.005596
Rough   frequency:          772
Volume adjustment:       17.561

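The "Maximum amplitude" line tells you the highest sample value in the recording, on a scale where 1.0 is full volume. (The "Minimum amplitude" line gives the lowest, i.e. most negative, value.) Based on that number you can decide whether there was any sound and do something.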
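
To use these numbers from a script, the report has to be captured and parsed. Below is a minimal Ruby sketch under a few assumptions: the helper names sox_stat, stat_value and peak_amplitude are made up for illustration, and the report is assumed to have the layout shown above. Note that sox writes the stat report to stderr, not stdout. As a variation on looking at "Maximum amplitude" alone, peak_amplitude takes the larger magnitude of the maximum and minimum lines.

```ruby
require "open3"

# Run sox's "stat" effect on a WAV file and return the report text.
# sox prints this report on stderr, so that is the stream returned here.
def sox_stat(path)
  _stdout, stderr, _status =
    Open3.capture3("/usr/bin/sox", "-t", ".wav", path, "-n", "stat")
  stderr
end

# Extract one numeric field (e.g. "Maximum amplitude") from a stat report.
# Returns a Float, or nil if the field is not present.
def stat_value(report, label)
  text = report[/^#{Regexp.escape(label)}:\s*(-?\d+(?:\.\d+)?)/, 1]
  text && text.to_f
end

# Largest magnitude of the maximum and minimum amplitude lines,
# or nil if either line is missing.
def peak_amplitude(report)
  values = ["Maximum amplitude", "Minimum amplitude"].map do |label|
    stat_value(report, label)
  end
  return nil if values.any?(&:nil?)
  values.map(&:abs).max
end
```

For the report above, stat_value(report, "Maximum amplitude") gives 0.037506, while peak_amplitude(report) gives 0.056946 because the negative excursion is the larger one.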

We can put all this together in this Ruby script that takes 5-second samples and triggers on a maximum amplitude greater than 0.15. Note that this script will likely not do anything for you unless you replace the MICROPHONE line with a string that matches the name of the microphone you wish to use, as reported by /proc/asound/cards. Also, to make it do anything interesting, you'll have to replace the system call with a command of your own.
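
A minimal sketch of such a sample-and-trigger loop might look like the following. It is an illustration under stated assumptions rather than a definitive implementation: the names record_sample, maximum_amplitude and monitor are made up, the loop takes the card number directly instead of matching a microphone name, and the recorder and meter parameters exist only so the loop can be tried without a microphone attached.

```ruby
require "open3"
require "tmpdir"

THRESHOLD = 0.15     # trigger level from the text above
SAMPLE_SECONDS = 5   # length of each sample

# Record one sample from the given ALSA card; true if arecord succeeded.
def record_sample(card, path, seconds = SAMPLE_SECONDS)
  system("/usr/bin/arecord", "-D", "plughw:#{card},0",
         "-d", seconds.to_s, "-f", "S16_LE", path)
end

# Read "Maximum amplitude" from sox's stat report (printed on stderr).
def maximum_amplitude(path)
  _out, err, _status =
    Open3.capture3("/usr/bin/sox", "-t", ".wav", path, "-n", "stat")
  value = err[/^Maximum amplitude:\s*(-?\d+(?:\.\d+)?)/, 1]
  value && value.to_f
end

# Record samples in a loop and yield the amplitude whenever it exceeds
# the threshold. Runs forever unless `iterations` is given.
def monitor(card, threshold: THRESHOLD, iterations: nil,
            recorder: method(:record_sample),
            meter: method(:maximum_amplitude))
  Dir.mktmpdir do |dir|
    path = File.join(dir, "sample.wav")
    count = 0
    until iterations && count >= iterations
      count += 1
      next unless recorder.call(card, path)   # skip failed recordings
      amplitude = meter.call(path)
      yield amplitude if amplitude && amplitude > threshold
    end
  end
end
```

Calling monitor(1) { |amplitude| puts "sound detected: #{amplitude}" } would watch card 1 indefinitely; the puts is a placeholder for whatever action you want to trigger.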

URL: https://thomer.com/howtos/detect_sound.html
Copyright © 1994-2022 by Thomer M. Gil
Updated: 2014/09/12