Week 10: Audio-reactive visuals
4h
Topic: What is sound? What are its qualities: amplitude (loudness) and frequency (pitch). Basics of computer audition: what it means for the computer to hear and how we measure the qualities. FFT as a mathematical device to deconstruct the signal and measure bass/mids/highs.
Learning materials
Sound is a wave signal (think of a sine wave) with measurable qualities: the most basic ones are amplitude (associated with loudness) and frequency (associated with pitch).
Low frequency means low pitch. Look at https://alexanderchen.github.io/harmonics/. Notice that frequencies are usually laid out from the lowest on the left to the highest on the right.
Real-world sound is a mixture of waves of different frequencies and amplitudes. To get meaningful information out of a recording, we need to decompose the signal into the waves that make it up. We can do this with a mathematical tool called the Fourier transform, in a process called spectrum analysis. See 3blue1brown's introduction to the Fourier transform (see 0:30 -> 2:30): https://www.youtube.com/watch?t=50&v=spUNpyF58BY&feature=youtu.be. The Fast Fourier transform, or FFT for short, is an efficient algorithm for computing this decomposition, and it lets computers do it in real time.
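To make the decomposition concrete, here is a minimal sketch of a naive discrete Fourier transform in plain JavaScript. It is not how p5.js or the browser computes the spectrum (they use an FFT, which gives the same result much faster). The function name dftAmplitudes and the test signal (a 4 Hz wave mixed with a quieter 10 Hz wave) are assumptions made up for illustration.

```javascript
// Naive discrete Fourier transform: returns the amplitude of each frequency bin.
// O(n^2), for illustration only; an FFT computes the same values far faster.
function dftAmplitudes(samples) {
  const n = samples.length;
  const amplitudes = [];
  // For a real-valued signal only bins 0..n/2 carry independent information.
  for (let k = 0; k <= n / 2; k++) {
    let re = 0;
    let im = 0;
    for (let t = 0; t < n; t++) {
      const angle = (2 * Math.PI * k * t) / n;
      re += samples[t] * Math.cos(angle);
      im -= samples[t] * Math.sin(angle);
    }
    // Scale so that a sine of amplitude A shows up as A in its bin.
    const scale = k === 0 || k === n / 2 ? 1 : 2;
    amplitudes.push((scale * Math.hypot(re, im)) / n);
  }
  return amplitudes;
}

// Hypothetical test signal: 1 second sampled 64 times,
// a 4 Hz wave (amplitude 1.0) plus a 10 Hz wave (amplitude 0.5).
const sampleRate = 64;
const signal = [];
for (let t = 0; t < sampleRate; t++) {
  const seconds = t / sampleRate;
  signal.push(
    1.0 * Math.sin(2 * Math.PI * 4 * seconds) +
      0.5 * Math.sin(2 * Math.PI * 10 * seconds)
  );
}

// With a 1-second window, bin k corresponds to k Hz.
const spectrum = dftAmplitudes(signal);
```

Reading spectrum[4] and spectrum[10] recovers the two amplitudes that were mixed in (about 1.0 and 0.5), while the other bins stay close to zero.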
We can then look at the spectrum produced by the FFT and try to extract higher-level information about the sound: for example, when the bass drops. This allows us to create artwork that reacts to sound, such as transcribing music into visuals. Because the spectrum contains a great many individual frequencies, we simplify by summing neighbouring frequencies into bands. For example, the lowest band could represent bass and contain all the waves with frequencies between 20 and 60 Hz. Each band then has a single value, the sum of the amplitudes of its waves, which represents the band's loudness. See https://www.youtube.com/watch?t=209&v=4Av788P9stk&feature=youtu.be for further explanation.
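Summing a spectrum into bands can be sketched as follows. This is an illustration under assumptions, not the course sketch: it assumes the spectrum is an array whose bins run evenly from 0 Hz up to half the sample rate (the layout the browser's analyser uses), and the names BANDS and bandLoudness are hypothetical. Only the 20-60 Hz bass range comes from the text; the mids and highs ranges are made up.

```javascript
// Hypothetical band edges in Hz. Only the bass range is taken from the notes.
const BANDS = [
  { name: "bass", low: 20, high: 60 },
  { name: "mids", low: 250, high: 2000 },
  { name: "highs", low: 4000, high: 12000 },
];

// spectrum: array of amplitudes, one per frequency bin.
// Assumed layout: bin i sits at i * (sampleRate / 2) / spectrum.length Hz.
function bandLoudness(spectrum, sampleRate, bands) {
  const hzPerBin = sampleRate / 2 / spectrum.length;
  return bands.map((band) => {
    let sum = 0;
    for (let i = 0; i < spectrum.length; i++) {
      const freq = i * hzPerBin;
      // Add up the amplitude of every bin that falls inside the band.
      if (freq >= band.low && freq < band.high) sum += spectrum[i];
    }
    return { name: band.name, loudness: sum };
  });
}

// Fake spectrum with 1024 bins at 44100 Hz: about 21.5 Hz per bin.
const fakeSpectrum = new Array(1024).fill(0);
fakeSpectrum[1] = 100; // ~21.5 Hz, inside the bass band
fakeSpectrum[2] = 50; // ~43.1 Hz, inside the bass band
fakeSpectrum[50] = 30; // ~1077 Hz, inside the mids band

const levels = bandLoudness(fakeSpectrum, 44100, BANDS);
```

Note that with about 21.5 Hz per bin, a narrow band such as 20-60 Hz covers only two bins, so in practice you may want wider bands or an average instead of a sum.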
The sketch below creates an array in which each element represents a single band. We then use the FFT to write the loudness of each band into the array. You must click on the canvas first: browsers block audio until the user interacts with the page.
Again, because sound is continuous and we measure it only 30 times per second (fft.analyze() is called once per frame in the draw function), the values can jump around a lot from frame to frame. It helps to average the measurements over time to get smoother values, which you need to program yourself. I suggest averaging the last 10-20 values.
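One possible way to program the averaging is a small moving-average helper that remembers the most recent measurements. The class name MovingAverage and its method add are hypothetical, and the window size of 3 below is only to keep the example short; for the visuals you would use 10-20 as suggested above.

```javascript
// Keeps the last `size` measurements and returns their mean.
class MovingAverage {
  constructor(size) {
    this.size = size;
    this.values = [];
  }

  // Store a new measurement and return the average of the stored ones.
  add(value) {
    this.values.push(value);
    // Drop the oldest measurement once the window is full.
    if (this.values.length > this.size) this.values.shift();
    let sum = 0;
    for (const v of this.values) sum += v;
    return sum / this.values.length;
  }
}

// Example with a window of 3 measurements.
const smoother = new MovingAverage(3);
const smoothed = [10, 20, 60, 10].map((v) => smoother.add(v));
```

In a sketch you would create one smoother per band in setup and call add once per frame in draw with that band's latest loudness, then drive the visuals with the returned value instead of the raw one.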
Where to look for inspiration
Max Cooper
A very popular contemporary ambient music performer and VJ. He uses generative techniques and makes visuals together with generative artists and programmers.
You can search for the album "Yearning for the Infinite" and read about it here: https://www.yearningfortheinfinite.net/. The concept of the album is emergent behavior.
https://www.youtube.com/watch?v=j8SNmGHhfks
https://www.youtube.com/watch?v=_7wKjTf_RlI
Ryoji Ikeda
A very famous video artist of the older generation. Ikeda creates immersive audiovisual installations and is also notable for his experiments with sound illusions.
See Transfinite: https://www.youtube.com/watch?v=omDK2Cm2mwo
David Mrugala / Thedotisblack youtube channel
Transcribing nature sound onto paper: https://www.youtube.com/watch?v=RB1ayP9Q4Vg
Also exhibited at the school: https://www.youtube.com/watch?v=5sY76MRS_XQ
Sound language, print: https://www.youtube.com/watch?v=5sY76MRS_XQ
Others
An excellent example of the visuals VJs use for EDM, mostly techno: nine different geometric shapes in space are manipulated by the music: https://vimeo.com/68161863.
Audio artist Joelle as an example of advanced techniques: https://vimeo.com/116097721.
You can use the Spotify API to get some AI-processed information about the music: https://developer.spotify.com/console/get-audio-analysis-track/ (example here: https://spotify-audio-analysis.glitch.me/analysis.html).
A simple example of audio-reactive artwork on the web: https://therewasaguy.github.io/p5-music-viz/demos/08_echonestPitchSegment/.