For a few months now I have been rattling on about sonification. So this is an explanation of how it works, with drawings.
For this example I am going to sonify a pub table, because it was where I was sat when I apparently first explained it properly to someone.
Here is a table. A coin is rolling across it. I want to sonify this. I don’t want to listen to the sound it makes rolling along the table, because that doesn’t tell me much about it. I want to make a new sound that tells me all about the coin’s motion: a sound that contains more information than I could get from just listening to the coin or just watching it. So I measure the coin’s motion:
I’ve just chosen speed, number of turns and end position here, but I could also have used lots of other variables such as type of coin (material), weight, value and so on.
On the table there is also an ashtray, with a half-smoked fag in it. Because this is also on the table and I want to sonify the table, I do the same sort of measurements on the cigarette.
And then I notice an odd-looking insect, so that gets its data recorded too.
So now I have three sets of three numbers. I could go on. I could also record the leaf blowing over the table and the elbows resting on it and the pint of beer gently bubbling on it. But I’m going to stop for now because I want to explain how the numbers become something you can hear.
To “hear” the data we can map physical properties (The Data) to audible properties (The Sound) in pretty much any way we choose. For a physicist, an obvious way to do this might be to map speed to pitch. I think this is obvious for a physicist because both of these things are measured “per second” (pitch or frequency is measured in Hertz, which means vibrations per second). But we don’t have to do the obvious, we can map any physical property to any audible property.
In this example I’m going to map speed to the pitch of the note, length/position to the duration of the note and number of turns/legs/puffs to the loudness of the note.
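To make the idea of "any mapping we choose" concrete, here is a minimal Python sketch, assuming each object on the table has been reduced to three numbers. The names `Measurement`, `Note` and `sonify` are hypothetical, made up for this illustration, and the placeholder mapping functions and measurement values are invented too. The point is only that each physical property is passed through its own, swappable, function.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Measurement:
    """One object on the table, reduced to three numbers (hypothetical structure)."""
    speed: float   # e.g. cm/s
    length: float  # length or end position, e.g. mm
    count: int     # turns, legs or puffs


@dataclass
class Note:
    pitch_hz: float
    duration_s: float
    loudness_db: float


def sonify(m: Measurement,
           speed_to_pitch: Callable[[float], float],
           length_to_duration: Callable[[float], float],
           count_to_loudness: Callable[[int], float]) -> Note:
    """Apply one mapping per physical property; any functions can be plugged in."""
    return Note(
        pitch_hz=speed_to_pitch(m.speed),
        duration_s=length_to_duration(m.length),
        loudness_db=count_to_loudness(m.count),
    )


# Hypothetical placeholder mappings and measurement, purely for illustration.
note = sonify(
    Measurement(speed=2.0, length=50.0, count=6),
    speed_to_pitch=lambda v: 100 + 90 * v,
    length_to_duration=lambda mm: mm / 100,
    count_to_loudness=lambda n: 10.0 + n,
)
print(note)  # Note(pitch_hz=280.0, duration_s=0.5, loudness_db=16.0)
```

Swapping which property drives which sound (say, speed to loudness instead) is just a matter of passing different functions.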
Now I have to choose starting positions and ranges. When I do this I have to consider that:
I want the sound to be audible, which limits the range of pitch to something like 20 Hz to 20,000 Hz for humans, but I’ll play safe and keep it between 100 Hz and 1000 Hz for now. Very high-pitched sounds aren’t very pleasant, after all. I’m going to limit the duration range to between 0.1 and 10 seconds, because it seems reasonable that we would be able to hear 10 different notes per second. (In fact, humans can distinguish about 50 notes per second. Here is a nice article on hearing if you are interested.)
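One simple way to squeeze a data range into the chosen pitch and duration ranges is a straight linear rescaling. The sketch below assumes linear scaling with clamping, which is only one possible choice (a logarithmic scale is another). `linear_map` is a hypothetical helper name, and the 0–10 cm/s input range is an invented example.

```python
def linear_map(x, in_lo, in_hi, out_lo, out_hi):
    """Rescale x from [in_lo, in_hi] onto [out_lo, out_hi], clamping values outside."""
    if in_hi == in_lo:
        raise ValueError("input range must be non-empty")
    t = (x - in_lo) / (in_hi - in_lo)  # 0 at in_lo, 1 at in_hi
    t = min(max(t, 0.0), 1.0)          # keep the output inside the chosen audible range
    return out_lo + t * (out_hi - out_lo)


PITCH_RANGE_HZ = (100.0, 1000.0)   # the "play safe" pitch range from the text
DURATION_RANGE_S = (0.1, 10.0)     # the duration range from the text

# Hypothetical speeds in cm/s, assuming an input range of 0-10 cm/s.
for speed in (2.0, 3.0, 25.0):
    print(speed, linear_map(speed, 0.0, 10.0, *PITCH_RANGE_HZ))
```

With this assumed range, 2 cm/s lands at about 280 Hz and 3 cm/s at about 370 Hz, while 25 cm/s is clamped to the 1000 Hz ceiling.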
I’m going to limit the loudness range to between 10 dB and 80 dB, but I notice that the number of puffs and turns are small numbers and the number of legs is large. There are a number of ways I could deal with this. I could just say that N=3 corresponds to 10 dB and then when N increases by 10, loudness increases by 2 dB. For my 253-legged insect, this would give a 60 dB insect. But it would also mean just a 0.2 dB difference between an insect with 253 legs and one with 254 legs. What if that extra leg is really interesting? I know my ear is not going to be able to detect a change in volume of 0.2 dB. Which brings me to the other important requirement for mapping:
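The loudness rule above is easy to check with a few lines of arithmetic. This is a minimal sketch of exactly that rule (3 legs at 10 dB, plus 2 dB per 10 extra legs); `legs_to_db` and its parameter names are hypothetical.

```python
def legs_to_db(n, n0=3, db0=10.0, db_per_10=2.0):
    """Linear loudness rule: n0 legs -> db0, plus db_per_10 dB for every 10 extra legs."""
    return db0 + db_per_10 * (n - n0) / 10


print(legs_to_db(3))    # 10.0
print(legs_to_db(253))  # 60.0
print(legs_to_db(353))  # 80.0, the top of the 10-80 dB range

# One extra leg only moves the loudness by a fifth of a decibel.
step = legs_to_db(254) - legs_to_db(253)
print(round(step, 3))   # 0.2
```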
I want to be able to easily hear small changes in the data; I want an insect running at a speed of 2 cm per second to sound significantly different from an insect running at 3 cm per second. The cigarette is burning at 2 cm/min ≈ 0.033 cm/s and the coin is going at 3 m/s = 300 cm/s. Because the cigarette’s speed only shows up in the thousandths, I really want to be able to distinguish speeds that differ by 0.001 cm/s.
So I want to be able to distinguish sounds to within 0.001 over a range of 300. Possible? Apparently the maximum number of frequencies that the human ear can distinguish is a whopping 330,000. By looking at data over a range of 0–300 with a precision of 0.001, I’m asking my ear to distinguish 300,000 different frequencies. That is nearly the whole 330,000, and I have already limited myself to 100–1000 Hz, only a slice of what the ear can hear. I can’t do it. So I should rethink my mapping in this case, knowing now that if I am looking at data with a large range, I am either going to have to reduce the range or sacrifice some precision.
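The check in the last paragraph is just the data range divided by the precision, compared against the number of pitches the ear can tell apart. Here is a minimal sketch, using the 330,000 figure quoted above as the budget; the helper names and the two example ranges are hypothetical. The second function shows the trade-off the other way round: for a fixed budget, the finest precision you can hope for is range divided by budget.

```python
PITCH_BUDGET = 330_000  # distinguishable frequencies, the figure quoted in the text
# Note: inside a narrower band such as 100-1000 Hz the real budget will be smaller.


def steps_needed(lo, hi, precision):
    """How many steps of size `precision` fit into the data range [lo, hi]."""
    return round((hi - lo) / precision)


def finest_precision(lo, hi, budget=PITCH_BUDGET):
    """Best precision achievable if the whole budget is spread evenly over the range."""
    return (hi - lo) / budget


# Hypothetical speed ranges in cm/s, both at 0.001 cm/s precision.
for hi in (10, 1000):
    n = steps_needed(0, hi, 0.001)
    print(hi, n, n <= PITCH_BUDGET, finest_precision(0, hi))
```

A 0–10 cm/s range needs 10,000 steps and fits comfortably; a 0–1000 cm/s range needs 1,000,000 and does not, so you either shrink the range or accept a coarser precision (about 0.003 cm/s in that case).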
We’re not so good at noticing fluctuations in volume. We can hear over a range of about 100 dB before our ears start hurting, and can detect changes of about 1 dB. This gives us just 100 loudness levels to map to (compared to the 330,000 frequencies), which makes me think that volume should be used for a “rougher” physical property, or for a physical property that doesn’t have a wide range.
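The same range-over-precision test shows why loudness suits small counts better than large ones. This sketch uses the 100 dB span and 1 dB step quoted above; `fits_loudness` is a hypothetical helper, and the 0–10 puff range is an invented example.

```python
LOUDNESS_SPAN_DB = 100  # usable range quoted in the text
LOUDNESS_JND_DB = 1     # smallest noticeable change quoted in the text
LOUDNESS_LEVELS = LOUDNESS_SPAN_DB // LOUDNESS_JND_DB  # 100 levels


def fits_loudness(lo, hi, precision):
    """True if the number of steps in the data range fits within the loudness levels."""
    return round((hi - lo) / precision) <= LOUDNESS_LEVELS


print(fits_loudness(3, 253, 1))  # legs, one leg at a time: 250 steps -> False
print(fits_loudness(0, 10, 1))   # hypothetical puff count: 10 steps -> True
```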
Duration is a bit easier to handle, as I can extend the duration of a note indefinitely if I want to. For this example I might choose 10 mm to be mapped to 0.1 seconds and then for every extra 10 mm I add on 0.1 seconds of duration.
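That duration rule is a fixed 0.1 s per 10 mm, i.e. 0.01 seconds per millimetre. A minimal sketch follows; `length_to_duration` and its parameter names are hypothetical.

```python
def length_to_duration(length_mm, mm_per_step=10.0, seconds_per_step=0.1):
    """10 mm -> 0.1 s, and every extra 10 mm adds another 0.1 s."""
    return length_mm / mm_per_step * seconds_per_step


for mm in (10, 20, 55, 1000):
    print(mm, round(length_to_duration(mm), 3))
```

If the 0.1 to 10 second range chosen earlier is kept, this rule covers lengths from 10 mm up to 1000 mm; anything longer would need the range to be stretched or the step size changed.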