Unresolved Speech?

On the latest episode of the Reconcilable Differences podcast, John Siracusa and Merlin Mann open by discussing the "sing-song" or "uptalk" openings of some YouTube videos. John cites the LockPickingLawyer introduction as an example; to him, the end of the phrase sounds like it's hanging or unresolved.

I was so intrigued by this that I stopped listening to the episode and started investigating. I went to the most recent video on the LockPickingLawyer channel and grabbed the first couple of seconds of audio.

You can certainly hear what John is talking about. I tried figuring out the closest pitch to each word, but my ears are not very well trained, so I turned to technology instead. I found a pretty cool Python package called crepe that uses a trained neural network model to estimate the fundamental frequency of a given sound sample over time (as well as its confidence in each estimate). After a quick installation of the package and its dependencies, I was able to predict the pitch from the clip.

The LockPickingLawyer introduction waveform (above) and predicted pitches (below). I’ve also marked the start of the different words and the pitches of some nearby notes. I’m only showing the higher-confidence pitch predictions, which is why there are some gaps in the lower plot.
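For anyone who wants to try the pitch-estimation step, here is a minimal sketch of how it could look. It is not my exact script: the helper names `estimate_pitch` and `keep_confident` are made up for illustration, the clip is assumed to be saved as a WAV file, and the 0.8 confidence cutoff is an arbitrary assumption rather than the value used for the plot. The `crepe.predict` call follows the usage shown in the crepe README.

```python
def estimate_pitch(wav_path):
    """Run crepe on a WAV file and return (times, freqs_hz, confidences).

    Assumes crepe and scipy are installed. They are imported lazily so the
    filtering helper below can be used on its own.
    """
    import crepe
    from scipy.io import wavfile

    sample_rate, audio = wavfile.read(wav_path)
    # crepe.predict returns time stamps, frequency estimates (Hz),
    # per-frame confidences, and the raw activation matrix.
    times, freqs, confidences, _activation = crepe.predict(
        audio, sample_rate, viterbi=True
    )
    return times, freqs, confidences


def keep_confident(times, freqs, confidences, threshold=0.8):
    """Keep only the (time, frequency) pairs whose confidence meets the cutoff.

    The default threshold is an assumption for illustration only.
    """
    return [
        (t, f)
        for t, f, c in zip(times, freqs, confidences)
        if c >= threshold
    ]
```

Dropping the low-confidence frames is what produces gaps like the ones in the lower plot: frames without a clear pitch (silence, or unvoiced consonants) simply are not drawn.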

You can see how the pitches change with the different words. I also tried to see which pitches on a piano (in equal temperament tuning) the different words were closest to. To me, it seems like it starts with "This" as an F, followed by "is" as a Bb, then down to Eb for "the" and up to F (an octave below the initial "This") and staying there for "lock-pick-ing" before ending on a Db for "lawyer." Below is my attempt to render that in musical notation.

My attempt at approximating the LockPickingLawyer speech cadence with musical notation.
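Matching a predicted frequency to the nearest piano note in equal temperament is a small calculation. The sketch below assumes the common A4 = 440 Hz reference and spells every black key as a flat; the function name `nearest_note` is my own, and the frequencies in the usage lines are standard reference pitches, not measurements from the clip.

```python
import math

# Pitch-class names, spelled with flats to match the note names used in this post.
NOTE_NAMES = ["C", "Db", "D", "Eb", "E", "F", "Gb", "G", "Ab", "A", "Bb", "B"]


def nearest_note(freq_hz, a4_hz=440.0):
    """Return (note_name, cents_off) for the equal-tempered note nearest freq_hz.

    cents_off is positive when the frequency is sharp of the named note
    and negative when it is flat.
    """
    # MIDI note numbers place A4 at 69, with 12 equal semitones per octave.
    midi_float = 69 + 12 * math.log2(freq_hz / a4_hz)
    midi = round(midi_float)
    cents_off = 100 * (midi_float - midi)
    name = NOTE_NAMES[midi % 12]
    octave = midi // 12 - 1  # MIDI note 60 is C4 in this convention
    return f"{name}{octave}", cents_off


print(nearest_note(440.0))   # the reference pitch itself
print(nearest_note(233.08))  # close to the B-flat below middle C
```

The cents value is a useful reminder that speech rarely lands exactly on a piano key, so each "closest note" is only an approximation.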

So, can we figure out why it sounds like it is ending in an unresolved way? While I was able to compensate for my lack of ear training with programming before, here's where my lack of musical knowledge feels like it's holding me back. If I had to guess, most of the pitches of this phrase sound like they could fit well into the key of Bb. You start on F ("the 5", or the dominant), then go down a fifth to the tonic Bb (1), then down another fifth to Eb (4). It then steps up to F (an octave below the starting point), but rather than resolving up to someplace like the tonic Bb, it goes up to Db. Db is a minor third above the tonic, which might suggest to the ear one of the minor modes of Bb rather than Bb major. To my ear it sounds pretty unresolved as well, but I don't have a good answer as to why. Maybe it is because it goes from the dominant (F) to somewhere besides the tonic (Bb) (as I think Merlin suggested on the show), but that's just a guess on my part.
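The scale-degree reasoning above can be checked with a little arithmetic on pitch classes. This is only a sketch: it takes my guessed note sequence and the key of Bb as given, ignores octaves entirely, and the names `scale_degree` and `analysis` are made up for illustration.

```python
# Semitone position of each pitch class within the octave (C = 0).
PITCH_CLASSES = {
    "C": 0, "Db": 1, "D": 2, "Eb": 3, "E": 4, "F": 5,
    "Gb": 6, "G": 7, "Ab": 8, "A": 9, "Bb": 10, "B": 11,
}

# Degree labels by number of semitones above the tonic.
DEGREE_NAMES = ["1", "b2", "2", "b3", "3", "4", "b5", "5", "b6", "6", "b7", "7"]

# Semitone offsets that belong to a major scale.
MAJOR_SCALE_STEPS = {0, 2, 4, 5, 7, 9, 11}


def scale_degree(note, tonic):
    """Return (degree_label, in_major_scale) for note relative to tonic."""
    semitones = (PITCH_CLASSES[note] - PITCH_CLASSES[tonic]) % 12
    return DEGREE_NAMES[semitones], semitones in MAJOR_SCALE_STEPS


# The note sequence guessed earlier in this post, analyzed in the key of Bb.
phrase = ["F", "Bb", "Eb", "F", "Db"]
analysis = [(note, *scale_degree(note, "Bb")) for note in phrase]

for note, degree, in_major in analysis:
    print(f"{note}: degree {degree}, in Bb major: {in_major}")
```

Under those assumptions, the first four notes come out as degrees 5, 1, 4, and 5, all inside Bb major, while the final Db is a flattened third that falls outside it.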

Of course, this tonal sequence is from human speech, not a composed melody. Our musical expectations are built up from a lifetime of listening; how do these expectations translate from particular musical genres to hearing the speech of others? I don't really know, but I am quite interested in learning more about harmonic intervals in general and their relationships to human speech in particular.