A few months ago, Madeon's *The Prince* audio visualizer inspired me to create something fun with audio. A spectrogram, or some similar kind of visualization, maybe?
A lot of the tutorials I found took the approach of "here's a full project I made, and here's what my code looked like at every step." They were helpful for understanding the potential of the Web Audio API, but I didn't want to recreate their projects; I found myself searching for that one nugget of information: how did they extract the time-frequency data?
There's a huge wealth of cool applications out there, but I wish the guides would separate the core Web Audio API concepts from the other feature and technology choices like WebGL/canvas :| So I made the guide I wish I had!
All that said, I do think the MDN docs on the Web Audio API do a decent job of explaining the basics in their first "Concepts and usage" section.
Here's our plan:
Setup bits
- Create things in the DOM to hold the result of the future visualization. (I've chosen to use a bunch of `div` elements nested in one large `div`. L38-L45.)
```html
<audio controls src="/files/spectrogram-input.mp3"></audio>
<div id="spectrogram">
  <div id="bin-0"></div>
  <div id="bin-1"></div>
  <!-- programmatically generated... -->
</div>
```
- On page load, get the input source. (I chose an `<audio>` element, although this could also be the microphone or the audio stream from a `<video>`. L37-L54.) Both steps are sketched right after this list.
Web Audio API bits
- Create an `AudioContext`. L8-L9.
- Create an `AnalyserNode` effect. This is like a no-op effect, i.e. the output signal and the input signal are the same; it exists only so we can read data out of it. L10-L12.
- Connect the sources to the effects, and the effects to the destination. L13-L17. (A sketch of all three steps follows this list.)
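Here's a sketch of that wiring, reusing `audioElement` from the setup sketch (the names are mine, not necessarily the original's):

```js
// Create the context that all Web Audio nodes live in.
const audioContext = new AudioContext();

// The analyser node: audio passes through unchanged, but we can read
// time- and frequency-domain data out of it.
const analyser = audioContext.createAnalyser();
analyser.fftSize = 128; // frequencyBinCount = fftSize / 2 = 64 bins

// Wire it up: source -> analyser -> speakers.
const source = audioContext.createMediaElementSource(audioElement);
source.connect(analyser);
analyser.connect(audioContext.destination);
```

One wrinkle: browsers suspend an `AudioContext` created without a user gesture, so you may need to call `audioContext.resume()` when the user first hits play.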
Processing bits
- On an interval, pull data out of the analyser node as an array of 8-bit numbers with `getByteFrequencyData`. L21, L31.
- Once you have the array, visualize it. (I've chosen to output a bunch of block-drawing characters like █ according to the values in the array. L22-L25. See the sketch after this list.)
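A sketch of that loop follows. The intensity-to-character mapping is my choice; any value-to-glyph mapping works:

```js
// Reusable buffer for the analyser's frequency snapshot.
const data = new Uint8Array(analyser.frequencyBinCount);

setInterval(() => {
  // Fill `data` with current magnitudes: one 8-bit value (0-255) per bin.
  analyser.getByteFrequencyData(data);

  // Append one character per bin, so each row accumulates a history of
  // that frequency band's loudness over time.
  const shades = " ░▒▓█"; // quiet -> loud
  data.forEach((value, i) => {
    const shade = shades[Math.floor((value / 256) * shades.length)];
    document.querySelector(`#bin-${i}`).textContent += shade;
  });
}, 100); // 100 ms is an arbitrary refresh rate
```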
The result
Audio input: (embedded `<audio>` player)
Future work
There are a few issues with this:
- `setInterval` keeps running even when the sound is paused. That's wasting resources while the page is idle.
- The frequency axis (vertical here) is linear. But to match our perception, the scale should be logarithmic, because we perceive frequencies separated by the same ratio as equally far apart, rather than frequencies separated by the same difference. (This is the idea behind equal temperament.) We could fix this with more bins and subsampling them logarithmically. Both fixes are sketched below.
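Neither fix is in the original code, but here's roughly what they could look like, reusing names from the earlier sketches. I'm assuming a larger `fftSize` (say 2048) so there are enough linear bins to subsample, and a hypothetical `render()` that is the loop body from the previous sketch pulled out into a function:

```js
// Start and stop the polling loop with the audio element's own state.
let timer = null;
audioElement.addEventListener("play", () => {
  timer = setInterval(() => {
    analyser.getByteFrequencyData(data);
    render(logResample(data, BIN_COUNT)); // render() is hypothetical: the loop body above
  }, 100);
});
audioElement.addEventListener("pause", () => clearInterval(timer));
audioElement.addEventListener("ended", () => clearInterval(timer));

// Subsample linear FFT bins at logarithmically spaced positions, so equal
// frequency ratios (not differences) get equal vertical space.
function logResample(linear, outCount) {
  const out = new Uint8Array(outCount);
  for (let i = 0; i < outCount; i++) {
    // Exponentially spaced index into the linear array (bin 0 is skipped;
    // the lowest output bins may repeat if fftSize is small).
    out[i] = linear[Math.floor(Math.pow(linear.length, i / outCount))];
  }
  return out;
}
```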
But these are immaterial to my goal here :)