Supplementary material

"Insights on harmonic tones from a generative music experiment"

MML 2024

A. BassNet interface (Nov. 2021 session)

Figure 1. BassNet interface, version used during the Hyper Music session. The colormap gives access to the latent space. Parameters "Articulation", "Portamento", "Harmonics", "Odd/Even Harmonics", "LPF Cutoff", "LPF Q", and "LPF Order" relate exclusively to the sonification process. Parameters "Onset Threshold", "Harmonic Variation", and "Inharmonicity Tolerance" affect the model's behaviour.

B. Hyper-Music, "Melatonin", workflow

This section illustrates the processes involved in the making of "Melatonin". For more details, see Deruty and Grachten (2022).

Workflow, part 1
media/audio/C01_Melatolin_Manual_chords_Rhodes_wet.mp3
media/audio/C02 Melatolin-basses+drum loop-vocal.mp3
media/audio/sec_1/07 p1 solo 1 O ext.mp3
media/audio/sec_1/08 p1 solo 1 T ext.mp3
media/audio/sec_1/09 p1 solo 2 T ext.mp3
media/audio/sec_1/10 High patterns 1 T ext.mp3
media/audio/sec_1/11 High patterns 2 T ext.mp3
media/audio/sec_1/01 Bass A3 O ext.mp3
media/audio/sec_1/02 Bass A2 T ext.mp3
media/audio/sec_1/03 Bass A1 T ext.mp3
media/audio/sec_1/04 Bass B1 T ext.mp3
media/audio/sec_1/05 Bass B2 O ext.mp3
media/audio/sec_1/06 Bass B2 T ext.mp3
Figure 2. Workflow for Melatonin, part 1. Click on numbered elements in the figure to hear the corresponding audio.
Workflow, part 2
media/audio/C03 Melatonin-double bass loop.mp3
media/audio/sec_2/12 Bass C1 O ext.mp3
media/audio/sec_2/13 Bass C1 T ext.mp3
media/audio/sec_2/14 Bass C2 O ext.mp3
media/audio/sec_2/15 Bass C2 T ext.mp3
media/audio/sec_2/16 Bass D1 O ext.mp3
media/audio/sec_2/17 Bass D1 T ext.mp3
media/audio/sec_2/18 Bass D2 O ext.mp3
media/audio/sec_2/19 Bass D2 T ext.mp3
media/audio/sec_2/20 Bass p2 solo 1 O ext.mp3
media/audio/sec_2/21 Bass p2 solo 1 T ext.mp3
media/audio/sec_2/22 Bass p2 solo 2 T ext.mp3
Figure 3. Workflow for Melatonin, part 2. Click on numbered elements in the figure to hear the corresponding audio.

C. BassNet dry outputs, analysis and transcription

The videos below contain transcriptions and analyses of the dry, monophonic BassNet outputs shown in Figures 2 and 3. The sample numbers remain the same (1, 7, 12, 14, 16, 18, and 20). Samples 1 and 7 belong to part 1 and therefore originate from the same conditioning (Figure 2, sample C1). Samples 12 to 20 belong to part 2 and likewise originate from a single conditioning (Figure 3, sample C3). The outputs, each of them a sequence of unique complex tones, are sorted by increasing transcription complexity.

In the videos below, diagrams (a) show transcriptions of the outputs, made by ear. The transcriptions are prone to error. The transcribed pitches may vary in pitch strength; see Zwicker and Fastl (1990). Pitches originating from different partials are highlighted using different colors. The vertical blue lines denote the frames that are focused on in (c) and (d).

Diagrams (b) show short-time Fourier transforms of the outputs, weighted using ISO 226:2003 at 50 phon (ISO, 2003). The horizontal yellow lines denote the pitch as transcribed in (a). As in (a), the vertical blue lines denote the frames that are focused on in (c) and (d).
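
As an illustration of what weighting a spectrum by an equal-loudness contour can look like, here is a minimal sketch. It is not the code used to produce diagrams (b). It assumes that the weight at a given frequency is the contour level at 1 kHz minus the contour level at that frequency, interpolated on a log-frequency axis. The function names are hypothetical, and the contour values are placeholders invented for the example: they are NOT taken from ISO 226:2003 and must be replaced by the tabulated 50-phon values.

```python
import math

def contour_weight_db(freq_hz, contour):
    """Weight in dB at freq_hz, derived from an equal-loudness contour.

    contour: list of (frequency in Hz, level in dB SPL) pairs, sorted by
    frequency, containing a point at exactly 1000.0 Hz.
    Assumption: weight = level at 1 kHz minus level at freq_hz, with the
    level interpolated linearly on a log-frequency axis.
    """
    ref = dict(contour)[1000.0]
    if freq_hz <= contour[0][0]:
        level = contour[0][1]          # clamp below the first point
    elif freq_hz >= contour[-1][0]:
        level = contour[-1][1]         # clamp above the last point
    else:
        for (f_lo, l_lo), (f_hi, l_hi) in zip(contour, contour[1:]):
            if f_lo <= freq_hz <= f_hi:
                t = math.log(freq_hz / f_lo) / math.log(f_hi / f_lo)
                level = l_lo + t * (l_hi - l_lo)
                break
    return ref - level

def weight_frame(freqs_hz, levels_db, contour):
    """Apply the contour weighting to one spectral frame given in dB."""
    return [lvl + contour_weight_db(f, contour)
            for f, lvl in zip(freqs_hz, levels_db)]

# Placeholder contour, invented for this example (not ISO 226:2003 data).
TOY_CONTOUR = [(100.0, 70.0), (1000.0, 50.0), (10000.0, 60.0)]

# Hypothetical frame: a 100 Hz fundamental, its harmonic 3, and a 1 kHz partial.
freqs = [100.0, 300.0, 1000.0]
raw_db = [-10.0, -12.0, -20.0]
for f, lvl in zip(freqs, weight_frame(freqs, raw_db, TOY_CONTOUR)):
    print(f"{f:7.1f} Hz: {lvl:6.1f} dB after weighting")
```

With these placeholder values, the 300 Hz partial ends up above the 100 Hz fundamental after weighting even though it is weaker before weighting. This only shows the mechanism; it says nothing about the actual levels in the BassNet samples.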

Diagrams (c) and (d) show power spectra for the frames signalled by the blue lines in (b). The x-axis grid is set on the sound's fundamental. The written pitches correspond to the annotated pitch for this frame.
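
The relation between a harmonic number and a written pitch can be sketched as follows. This is an illustration only, not the procedure behind the by-ear transcriptions: it assumes twelve-tone equal temperament with A4 = 440 Hz, and the 55 Hz fundamental and the function names are hypothetical.

```python
import math

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def nearest_note(freq_hz, a4_hz=440.0):
    """Return (note name, deviation in cents) of the equal-tempered
    note closest to freq_hz, assuming A4 = a4_hz."""
    midi_float = 69 + 12 * math.log2(freq_hz / a4_hz)
    midi = round(midi_float)
    cents = 100 * (midi_float - midi)
    name = NOTE_NAMES[midi % 12] + str(midi // 12 - 1)
    return name, cents

def harmonic_note(f0_hz, n):
    """Nearest note for harmonic n of a tone with fundamental f0_hz."""
    return nearest_note(n * f0_hz)

# Hypothetical fundamental of 55 Hz (A1), not measured from the samples.
for n in (1, 3, 5):
    name, cents = harmonic_note(55.0, n)
    print(f"harmonic {n}: {n * 55.0:.1f} Hz ~ {name} ({cents:+.1f} cents)")
```

For this fundamental, harmonic 3 lands close to E3 and harmonic 5 about 14 cents below C#4, which is why pitches heard from upper harmonics can be written as ordinary notes even though they are slightly off the tempered grid.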

BassNet dry output, sample 14
Video 1. Sample 14, transcription and analysis. (a) The lower transcribed pitches originate from the fundamental. The higher pitches originate from harmonic 3. (b)(c) The spectrum only involves the fundamental and harmonic 3. Both result in transcribed pitches. The absence of harmonic 2 is not a result of learning, but of a sonification setting present on the interface. At 50 phon, harmonic 3 is louder than the fundamental.
BassNet dry output, sample 07
Video 2. Sample 07, transcription and analysis. (a) The lower transcribed pitches originate from the fundamental. The higher pitches mainly originate from harmonic 3. One note originates from harmonic 4. (b)(c) The spectrum includes three partials, the upper partial being the loudest at 50 phon.
BassNet dry output, sample 16
Video 3. Sample 16, transcription and analysis. (a) The lower transcribed pitches originate from the fundamental. The higher pitches originate from harmonics 3 and 5. (b) Although partials around 400 Hz are the loudest at 50 phon, they are not the ones transcribed. (c)(d) Only odd harmonics were used (sonification settings).
BassNet dry output, sample 20
Video 4. Sample 20, transcription and analysis. (a) The lower transcribed pitches originate from the fundamental. (b)(c)(d) The fundamental is either missing or weak. The higher pitches originate from harmonics 3 and 5. (c) Only odd harmonics were used, and the fundamental was attenuated (sonification settings). (d) The higher annotated pitch is A3 (harmonic 5), even though this harmonic is surrounded by louder partials.
BassNet dry output, sample 01
Video 5. Sample 01, transcription and analysis. (a) Up to three simultaneous pitches may be transcribed. This multiplicity of perceivable pitches may originate from the combination of (1) loud harmonics and (2) the use of odd harmonics with a missing fundamental (Yost, 2009). Pitches in the bottom stave correspond to the second harmonic, which is missing or very weak - a situation exemplified by the spectra in (b)(c). Pitches in the middle stave mainly originate from harmonic 3. Note the very audible harmonic 4. Pitches in the top stave originate from a variety of higher harmonics. (c)(d) The vertical solid lines illustrate how harmonic 3 can be interpreted as a fundamental, leading to pitch ambiguity.
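
One possible arithmetic reading of this ambiguity, offered as a sketch rather than as the analysis behind the figure: in a series restricted to odd harmonics, the partials that are integer multiples of harmonic 3 form an odd-harmonic series of their own. The function name and the choice of the first eight odd harmonics are hypothetical.

```python
def reinterpret(harmonics, new_root):
    """Map each harmonic number to its rank relative to new_root,
    or to None when it is not an integer multiple of new_root."""
    return {h: (h // new_root if h % new_root == 0 else None)
            for h in harmonics}

odd_harmonics = range(1, 16, 2)   # 1, 3, 5, ..., 15
ranks = reinterpret(odd_harmonics, 3)
for h, rank in ranks.items():
    label = f"harmonic {rank} of 3*f0" if rank is not None else "no match"
    print(f"harmonic {h:2d} of f0: {label}")
```

Harmonics 3, 9, and 15 of the original tone line up with harmonics 1, 3, and 5 of a tone whose fundamental is three times higher, so part of the spectrum is consistent with two different fundamentals.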
BassNet dry output, sample 12
Video 6. Sample 12, transcription and analysis. (a) The lower transcribed pitches originate from the fundamental, either present or missing. The higher transcribed pitches originate from harmonics 4, 5 and 8. Notice the very obvious harmonic 8. (b) The transcribed pitches originate from two loud harmonics. The formants are less well-defined than in most of the other BassNet output spectra. (d) The highest transcribed pitch is not the loudest harmonic.
BassNet dry output, sample 18
Video 7. Sample 18, transcription and analysis. (a) A variety of harmonics are involved. Some transcribed notes in the lower stave are not harmonics: some correspond to harmonic 3 transposed one octave down, and others could not be linked to any harmonic. (c) The spectrum only involves odd harmonics, and the fundamental is weak. Furthermore, it is inharmonic, which complicated the analysis, made the transcription very uncertain, and justifies the term "partial" in place of "harmonic" in (a). (d) Harmonic 2 is missing, and yet clearly audible and transcribed.
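
Several of the spectra above contain odd harmonics only, as a result of sonification settings. The sketch below shows how such a spectrum can be produced and checked. It is not BassNet's sonification code: it is a plain additive synthesis under assumed values (50 Hz fundamental, 8 kHz sample rate, hand-picked gains), and the function names are hypothetical.

```python
import math

def synthesize(f0_hz, gains, sr=8000, duration_s=0.5):
    """Sum of sinusoids. gains maps harmonic number -> linear amplitude."""
    n = int(sr * duration_s)
    return [sum(g * math.sin(2 * math.pi * h * f0_hz * i / sr)
                for h, g in gains.items())
            for i in range(n)]

def level_db(signal, freq_hz, sr=8000):
    """Amplitude in dB (re. 1.0) of the component at freq_hz, measured
    with a single-frequency DFT. Levels are floored at -240 dB."""
    n = len(signal)
    re = sum(x * math.cos(2 * math.pi * freq_hz * i / sr)
             for i, x in enumerate(signal))
    im = sum(x * math.sin(2 * math.pi * freq_hz * i / sr)
             for i, x in enumerate(signal))
    amp = 2 * math.hypot(re, im) / n
    return 20 * math.log10(max(amp, 1e-12))

# Odd harmonics only, with harmonic 3 louder than the fundamental.
f0 = 50.0
signal = synthesize(f0, {1: 0.5, 3: 1.0, 5: 0.25})
for h in range(1, 6):
    print(f"harmonic {h}: {level_db(signal, h * f0):7.1f} dB")
```

The even harmonics come out at the numerical floor, while harmonics 1, 3, and 5 are recovered at the levels set by the gains. The signal length is chosen so that every harmonic completes a whole number of cycles, which keeps the single-frequency measurement free of leakage.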

D. Control of harmonics in Contemporary Popular Music

808 Woofer Warfare Modes
Video 8. The 808 Woofer Warfare patch from the Seismic Shock Omnisphere library, a Roland 808-style bass generator, offers presets (‘modes’) in which particular upper harmonics are selectively amplified.

E. "Melatonin", final result

Hyper Music, "Melatonin" — Final Result
Video 9. "Melatonin", final result. Waveform and semiotic segments according to Bimbot et al. (2012). Blue segments constitute part 1, and green segments constitute part 2. Among all the pitches, can you guess which are fundamentals, and which are harmonics?

References

Bimbot, Frédéric, Deruty, Emmanuel, Sargent, Gabriel, and Vincent, Emmanuel (2012). Semiotic structure labeling of music pieces: Concepts, methods and annotation conventions. In Proceedings of the 13th International Society for Music Information Retrieval Conference, pages 235–240. ISMIR. https://inria.hal.science/hal-00758648

Deruty, Emmanuel and Grachten, Maarten (2022). “Melatonin”: A case study on AI-induced musical style. In Proceedings of the 3rd Conference on AI Music Creativity, 13-15 Sep. 2022, online. AIMC. https://doi.org/10.5281/zenodo.7088302

ISO (2003). Normal equal-loudness-level contours, ISO 226:2003. Standard, International Organization for Standardization, Geneva, Switzerland. https://www.iso.org/standard/34222.html

Yost, William A. (2009). Pitch perception. Attention, Perception, & Psychophysics, 71(8):1701–1715. https://doi.org/10.3758/APP.71.8.1701

Zwicker, Eberhard and Fastl, Hugo (1990). Pitch and pitch strength. In Psychoacoustics: Facts and Models (Springer-Verlag, New York), pp. 111-149. https://doi.org/10.1007/978-3-662-09562-1_5