This is a book about music cognition: an attempt to understand how people make sense of the music they hear. Temperley’s main thesis throughout the book is that a profitable way to study music perception is to model the listener as performing a Bayesian analysis to infer the structure (e.g., time signature and key) behind the musical surface (the audio signal being heard).
Bayes’ Theorem seems to be pretty hot these days. It’s a statement about probabilities, the main point of which is that the prior probability of the possible states is useful information when trying to figure out which state actually exists. A standard example is that if a certain disease is incredibly rare, then even if you test positive for it with an only slightly fallible test, it’s still quite unlikely that you really have it.
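To make that concrete, here’s a quick back-of-the-envelope version of the disease example in Python; the prevalence and test-accuracy numbers are made up purely for illustration:

```python
# Bayes' theorem applied to the rare-disease example:
# P(disease | positive) = P(positive | disease) * P(disease) / P(positive)

prior = 1 / 10_000           # hypothetical prevalence: 1 in 10,000
sensitivity = 0.99           # P(positive | disease) -- a "slightly fallible" test
false_positive_rate = 0.01   # P(positive | no disease)

p_positive = sensitivity * prior + false_positive_rate * (1 - prior)
posterior = sensitivity * prior / p_positive
print(f"P(disease | positive) = {posterior:.4f}")  # ~0.0098 -- still quite unlikely
```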
The whole idea that musical perception is largely about creating models in your head about what’s going on, with associated expectations about the future that can either be met or thwarted, and that composers are constantly playing with those expectations, is associated in my mind with the late Leonard Meyer, who is definitely worth reading. I recommend in particular Emotion and Meaning in Music.
Temperley uses Bayesian analysis to simulate how listeners create a presumptive model for a given musical signal; for example, if a note arrives a little later than expected, the listener has to make an on-the-fly guess whether that is due to a syncopation (the tempo remains constant and the note is offset from its expected location relative to it) or rubato (the tempo has slowed down slightly and the note is exactly where you’d expect in the slowed-down tempo). If you have a model for the relative probability of the two cases, you have good grounds for guessing which is actually going on. (Note that the model can change according to what kind of music you’re listening to: the answer to the above question is much more likely to be rubato if it’s a classical performance and syncopation if it’s a rock performance).
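Here’s a toy sketch of that decision in Python. None of these numbers come from Temperley’s actual models; the priors and likelihoods are invented just to show how the same evidence can point to different conclusions depending on what you expect to be hearing:

```python
# Toy comparison of two explanations for a note arriving 80 ms "late".
# All numbers are invented for illustration; they are not Temperley's parameters.

def posterior(priors, likelihoods):
    """Bayes' rule over a set of competing hypotheses."""
    unnormalized = {h: priors[h] * likelihoods[h] for h in priors}
    total = sum(unnormalized.values())
    return {h: p / total for h, p in unnormalized.items()}

# Hypothetical likelihoods of observing this particular deviation:
# rubato explains small tempo drifts well, syncopation explains offsets
# that land near a metrically meaningful position.
likelihoods = {"rubato": 0.0022, "syncopation": 0.0021}

# The prior depends on what kind of music you think you're hearing.
classical_prior = {"rubato": 0.8, "syncopation": 0.2}
rock_prior = {"rubato": 0.2, "syncopation": 0.8}

print(posterior(classical_prior, likelihoods))  # rubato favored (~0.81)
print(posterior(rock_prior, likelihoods))       # syncopation favored (~0.79)
```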
Temperley creates these sorts of models to generate proposed probability distributions in varied domains, from time signatures to key determination to modulation detection. There are some good standard corpora of analyzed musical examples, so he’s able to evaluate his models against existing ones fairly directly, and they perform well, though not groundbreakingly so. One nice thing is that his models tend to be fairly simple compared to many others in the field, which tend to be rather special-cased and apparently a bit fragile.
I was very pleased, by the way, to find that a scale-determination system that I had come up with independently in the late 90s for use in creating automatic accompaniment for MIDI renditions of karaoke songs was basically the same as one of the standard systems that the academic world came up with. (The short version is that for a given timespan you total up the duration of each pitch class, take the dot product of that 12-dimensional vector with a signature vector for each hypothetical scale, and pick the maximum. Not rocket science, but it was nice to come up with a simple quantitative algorithm that performed well in practice.)
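For the curious, here’s a minimal sketch of that kind of scale finder. The profile weights below are a crude placeholder of my own (scale tones weighted, tonic and dominant boosted); the standard academic systems use empirically derived key profiles (e.g., the Krumhansl–Kessler profiles) rather than anything this simplistic:

```python
# Minimal sketch of a dot-product scale finder over the 12 pitch classes.
# The profile is a crude placeholder; real systems use empirically derived profiles.

MAJOR_PROFILE = [2, 0, 1, 0, 1, 1, 0, 1.5, 0, 1, 0, 1]  # indexed by semitones above the tonic

def key_score(durations, tonic):
    """Dot product of the duration-weighted pitch-class vector with the
    major profile rotated to start on `tonic` (0 = C, 1 = C#, ...)."""
    return sum(durations[(tonic + i) % 12] * MAJOR_PROFILE[i] for i in range(12))

def best_key(durations):
    """durations: total duration of each of the 12 pitch classes (C..B) in the timespan."""
    return max(range(12), key=lambda tonic: key_score(durations, tonic))

# Example: a passage dominated by C, E, and G (durations in beats)
durations = [4.0, 0, 1.0, 0, 3.0, 1.5, 0, 3.5, 0, 1.0, 0, 0.5]
print(best_key(durations))  # -> 0, i.e. C major under this toy profile
```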
Even though there’s no silver bullet here, the ideas in the book are quite interesting. I particularly like that the models end up being fairly straightforward and don’t require a ton of specialized tweaks, which is a promising feature. The math involved is real, but if all you care about is the music, you can probably gloss over most of it if you trust the author. Worth reading if you’re interested in how people perceive music, either for its own sake or because you want to exploit it as a composer or performer.