I’m interested in music from the formal languages point of view. One of my first experiments was to try to create «random music» in the true sense of the word random. That is, generate absolutely random notes and listen to what happens. It was extremely easy to program, but the results were not pleasant at all.
Western world music has evolved in such a way that our ears rarely appreciate music unless it has some internal structure. Eastern world music is a bit more open-minded. But still, if you sit down with a 12-sided die and just play a note depending on what you roll, it will obviously sound chaotic, but also ugly and annoying.
I still think there’s value in generating random music, though. It can show whether or not you understand how unadulterated binary data, note selection, tuning, and digital encoding work. At least, that was the purpose of this exercise I came up with a couple of decades ago, when I was teaching these subjects.
The following experiment assumes a GNU/Linux system. The code is written in Perl; the lines are not in the order they appear in the program, and there are a few blanks you’ll need to fill in. Have fun.
Randomness as a device
The Linux kernel provides a couple of device drivers producing pure unadulterated random data based on environmental entropy.
$ ls -l /dev/*random
crw-rw-rw- 1 root root 1, 8 Feb 27 12:22 /dev/random
crw-rw-rw- 1 root root 1, 9 Feb 27 12:22 /dev/urandom
That is, if you read from those devices you get an arbitrarily long stream of random bytes, one at a time (they are character devices). A firehose of bits that came to be thanks to all the goings-on with the network, keyboard, mouse, and the universe. I will use /dev/urandom because it is a pseudorandom number generator seeded with entropy gathered from the environment. This means there will always be data there and reads will never block.
So, we start by
open(my $r,"<:raw","/dev/urandom");
while (read($r,$b,1)) {
to open the random generator in «raw» mode: give me all the bytes, don’t do any Perl-magic interpretation. That stream of bytes will be read one at a time using read(FILEHANDLE,BUFFER,LENGTH) so that $b will hold the next byte read. I’m not going for efficiency, but practicality.
This is a raw byte that I can interpret any way I want. I choose to interpret it as a signed character so it will range from -128 to 127 (check your CI3641 notes). Perl is extremely powerful in terms of translating from or to raw data, and for this particular problem
my $h = unpack('c',$b);
is the incantation for $h to become a signed character ('c' is the signed template; 'C' would give an unsigned one). That’s all the operating systems, programming languages, and machine organization knowledge you will need.
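Here is a minimal sketch of that first stage on its own, assuming the signed interpretation described above (unpack's 'c' template). The sub name random_signed_bytes and its optional $path parameter are mine, for illustration only; they are not part of the original script.

```perl
use strict;
use warnings;

# Read $count bytes from a random source and return them as signed
# integers in the range -128..127. The sub name and the $path
# parameter are illustrative assumptions, not the original script.
sub random_signed_bytes {
    my ($count, $path) = @_;
    $path //= '/dev/urandom';
    open(my $r, '<:raw', $path) or die "cannot open $path: $!";
    my @values;
    while (@values < $count) {
        my $got = read($r, my $byte, 1);
        die "read failed: $!" unless defined $got;
        last if $got == 0;                 # end of file (not expected on /dev/urandom)
        push @values, unpack('c', $byte);  # 'c' = signed char
    }
    close($r);
    return @values;
}

print join(' ', random_signed_bytes(8)), "\n";
```

Run it a few times: you should see eight different numbers on every run, some of them negative.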
The vibe and the Fourier
Sounds are vibrations that propagate as acoustic waves. We want to create a sound wave out of these random numbers, and then use the sound card to… make noise. Sound waves can be thought of as increasing (positive) and decreasing (negative) pressure values, which hopefully clarifies why I chose the numbers to be signed.
A combination of musical tones can be built out of this random stream as long as we select the particular frequencies western ears are used to. That is, we need to «tune» (as in tuning a musical instrument) these numbers so they match the traditional music notes (or pitches). Since I am making single notes without a particular timbre or modulation, the math is straightforward.
If you’ve ever enjoyed a live orchestra concert, you’ve certainly noticed that after every musician has sat down and settled, a violin player ceremoniously comes out. They then play a single particular note, and the rest of the orchestra fine-tune their instruments to that tone. They become in tune to the A440 («concert A» or «La de concierto») and they are ready to go. This is a very intense moment, especially if you know the violin player and can’t help but shed a tear, even after fifteen years of listening to him do his thing…
This A440 has (spoiler alert!) a frequency of 440 Hz. A vibration at double that frequency (880 Hz) is the same note at a higher pitch: a full octave up. The western world has agreed to subdivide a full octave in twelve steps according to the equal temperament tuning system, using a logarithmic scale based on the twelfth root of 2. Therefore, the frequency for the n-th semitone of an interval starting at A440 can be computed by
440 × 2^(n/12)
counting from n = 0, because that’s how civilized people count.
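The formula is easy to check numerically. This is a small sketch, not a line from the original script; semitone_freq is a name I made up, and the base pitch defaults to A440.

```perl
use strict;
use warnings;

# Frequency in Hz of the n-th semitone above a base pitch in
# twelve-tone equal temperament. semitone_freq() is an illustrative
# name; the base defaults to A440.
sub semitone_freq {
    my ($n, $base) = @_;
    $base //= 440;
    return $base * 2 ** ($n / 12);
}

# One full octave, from n = 0 (A440) up to n = 12 (880 Hz).
printf "%2d  %7.2f Hz\n", $_, semitone_freq($_) for 0 .. 12;
```

The last line of the output should be the octave: n = 12 lands exactly on double the base frequency.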
Building a sinusoid wave out of these random numbers means the sin() function is involved, and it typically works on radians. The range of sin() is [-1,1] so scaling will also be needed, to increase the sinusoid’s amplitude (pump up the volume!). Let’s start with something like
my $n = int( $v * sin( $a4 * 2 ** ( $h / 12 ) ));
where
$v is a constant factor for amplitude (volume), arbitrarily set at 100 elsewhere in the script.
$a4 is a constant factor, 1382, set elsewhere in the script. This is 440π, so the value is turned into radians for sin() to work properly.
The whole operation is truncated with int() because the hardware wants integers. That’s for later.
Now, the above line as it is would still generate annoying noise because $h is a random number that would jump up and down, without the patterns that our western ears are used to.
Most of the «agreeable» western music is organized using either major or minor scales. That is, you don’t play random notes: you select the semitone progression from the base note, and «stick to it». A cursory review of those two references (or you being a musician) should make clear why I have lines like these in the script
my @ma = qw(0 2 4 5 7 9 11 12);
my @mi = qw(0 2 3 5 7 8 10 12);
my @s = @ma;
my $p = 8;
so that the default scale (@s) uses the major (@ma) scale progression, having exactly eight steps ($p). When I feel sad or pensive, I switch to the minor scale. These additions lead to improving our computation to
my $n = int( $v * sin( $a4 * 2 ** ( $s[ $h % $p ] / 12 ) ));
using modular arithmetic so our random number $h falls into the [0..7] range, and then picking the particular position from the default scale. This selects the proper exponent to raise 2 to, so that only tempered tones are generated: a Perl integer corresponding to a pure sinusoid tuned to the corresponding pitch within the scale.
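To see the scale selection in isolation, here is a sketch that maps a few integers to semitone offsets. pick_semitone is a hypothetical helper, not something from the original script. It relies on Perl's % returning a non-negative result when the right operand is positive, so negative (signed) inputs are safe.

```perl
use strict;
use warnings;

my @ma = qw(0 2 4 5 7 9 11 12);   # major scale, in semitones from the base
my @mi = qw(0 2 3 5 7 8 10 12);   # minor scale

# Map any integer (for instance a signed random byte) to a semitone
# offset within the given scale. pick_semitone() is a hypothetical helper.
sub pick_semitone {
    my ($h, $scale) = @_;
    my $p = scalar @$scale;
    return $scale->[ $h % $p ];   # % is non-negative for a positive $p
}

for my $h (-128, -3, 0, 5, 127) {
    printf "%4d -> major %2d, minor %2d\n",
        $h, pick_semitone($h, \@ma), pick_semitone($h, \@mi);
}
```

Whatever the random byte is, the exponent can only ever be one of the eight values in the chosen scale.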
The next step is simply to turn that Perl integer into a single machine byte, which the sound hardware will read as unsigned. If you’re wondering why I need unsigned bytes now, it has to do with the hardware I have; you’ll see.
To convert the Perl integer into one machine byte, we can write
my $w = pack('c',$n);
print "$w";
to spit out bytes. It is very possible that a badly chosen $v produces an $n too large to fit in a byte («too loud»), in which case pack will wrap the value around instead of storing it faithfully. I’m not trying to deal with this, I’m just playing.
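If you are curious what a consumer of unsigned bytes will actually see, this sketch round-trips a few sample values through pack('c') and reads them back with the unsigned 'C' template. as_unsigned_byte is an illustrative name of mine, and the sample values are arbitrary.

```perl
use strict;
use warnings;

# The single byte that pack('c', ...) emits, read back as the unsigned
# number a U8 consumer would see. as_unsigned_byte() is an
# illustrative name, not part of the original script.
sub as_unsigned_byte {
    my ($n) = @_;
    return unpack('C', pack('c', $n));
}

# Decimal input, unsigned reading, and the octal form od -b would print.
printf "%4d -> %3d (octal %03o)\n", $_, as_unsigned_byte($_)
    for (0, 17, 99, -16, -99);
```

Non-negative values come out unchanged, while negative ones land in the upper half of the byte: -16 reads back as 240 (octal 360) and -99 as 157 (octal 235).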
But we are missing another aspect of music (and parametric simulations such as these): time. The above computation uses a random number to produce a particular note within the interval, but we want them to play for a certain amount of time. Given that a pure note is a sinusoid, consider this perlseudocode.
for (my $t = 0.0; $t < 1; $t += 0.0001 ) {
my $n = sin( $t * 3.14 );
}
This would compute a pure sinusoid. We need to interleave the construction of the sinusoid with advancing its phase at each time step, at the rate set by the chosen note, to get the tone we want. That means changing the code to look like this
for (my $t = 0.0; $t < 1; $t += 0.0001) {
...my $n = int( $v * sin( $a4 * 2 ** ( $s[ $h % $p ] / 12 ) * $t ));
... }
You should be able to figure out how to combine all the pieces and put them into a single Perl script. I will not try to code the part that interacts with the sound card, because it doesn’t add any value and delays the fun we’re after.
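Without spoiling the whole exercise, here is a sketch of just the per-note computation, using the constants the text gives ($v = 100, $a4 = 1382, the major scale, a time step of 0.0001). The sub note_samples, its $steps parameter, and the integer loop counter are my assumptions; the random source and the output stage are deliberately left out.

```perl
use strict;
use warnings;

my $v  = 100;                      # amplitude, as in the text
my $a4 = 1382;                     # roughly 440π, as in the text
my @s  = qw(0 2 4 5 7 9 11 12);    # major scale
my $p  = 8;

# Samples for one note selected by $h. note_samples() and $steps are
# assumptions; an integer counter replaces the floating-point loop
# variable so the step count is exact.
sub note_samples {
    my ($h, $steps) = @_;
    $steps //= 10_000;             # 1 / 0.0001 time steps per note
    my $w = $a4 * 2 ** ( $s[ $h % $p ] / 12 );
    return map { int( $v * sin( $w * $_ * 0.0001 ) ) } 0 .. $steps - 1;
}

my @n = note_samples(2, 8);        # $h = 2 picks semitone 4 of the major scale
print "@n\n";
```

This prints 0 17 34 49 64 76 86 93. Written in octal, those are 000 021 042 061 100 114 126 135, which is a useful sanity check against the dump in the next section.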
Drop them beats
After putting all the code in m8.pl, we can produce all the random numbers we need running something like
$ perl m8.pl | od -b | head -5
0000000 000 021 042 061 100 114 126 135 142 143 142 136 126 114 100 062
0000020 042 022 000 360 337 317 301 265 252 243 236 235 236 242 251 263
0000040 277 315 335 356 377 017 040 060 077 113 125 135 142 143 142 136
0000060 127 115 101 063 044 023 002 361 340 321 302 265 253 243 236 235
0000100 236 242 251 262 276 314 334 354 376 016 037 057 075 112 124 134
Those look like random bytes printed in octal, but in reality they have been carefully computed to match the many sinusoids corresponding to the randomly chosen pitches within the same octave, as a sequence of unsigned 8-bit values.
ALSA is a combination of Linux kernel drivers and CLI utilities to operate sound hardware. The hardware in this particular machine supports playing raw unsigned 8-bit values, so making noise is as simple as running
$ perl m8.pl | aplay -c 2 -f U8 -r 12000
The stream of bytes produced by m8.pl will be fed to aplay. Each byte will be considered an Unsigned 8-bit sample (U8), played at a sampling rate of 12 kHz. With two channels, aplay takes the bytes in interleaved pairs, so consecutive bytes alternate between the left and right channels (a «fake stereo»). And this will work until you interrupt it with Ctrl-C.
You can improve the pipeline by adding something in the middle that only reads a certain number of bytes and copies them over, such as dd.
You can obviously capture the output of the script into a file, and then feed it to aplay. Everything is a file for the Unix-enlightened. If you change the sampling rate, you can listen to the same random melody at different pitches.
$ perl m8.pl > sample.raw
(... wait as long as you want and hit Ctrl-C ...)
$ ls -l sample.raw
-rw-r--r-- 1 emhn emhn 3293184 Feb 27 17:26 sample.raw
$ aplay -c 2 -f U8 -r 12000 sample.raw
Playing raw data 'sample.raw' : Unsigned 8 bit, Rate 12000 Hz, Stereo
$ aplay -c 2 -f U8 -r 8000 sample.raw
Playing raw data 'sample.raw' : Unsigned 8 bit, Rate 8000 Hz, Stereo
I’ve converted the RAW samples to MP3 files, so you can hear what I heard coming out of my headphones.
The conversion can be done without having to touch the mouse or click a button, by simply using sox and lame. Did I mention you can add sound effects using sox, such as echo and reverb, as well as do mixing? Go read some manual pages to Get Good.
Random yet reasonably pleasing 8-bit music.
Generating better sounds
The above script generates 8-bit music because, well, we’re using one byte at a time. This gives us 256 different possible values to represent the amplitude of the sinusoid modelling the frequency. Those are very few possible values for such a function, which accounts for the raspy sound. We say there’s poor quantization because our sample size (8 bits) lacks enough detail to accurately represent the curve.
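A rough way to put numbers on «poor quantization» is to measure how far a truncated sinusoid strays from the ideal one at different full-scale amplitudes. This is only a sketch under my own assumptions: max_quant_error, the choice of 1000 points per cycle, and the full-scale values 127 and 2^31 - 1 are illustrative, not taken from the original scripts.

```perl
use strict;
use warnings;

# Largest gap between one cycle of an ideal sinusoid and its version
# truncated to integers at a given full-scale amplitude.
# max_quant_error() and its parameters are illustrative assumptions.
sub max_quant_error {
    my ($amp, $steps) = @_;
    my $pi  = 4 * atan2(1, 1);
    my $max = 0;
    for my $i (0 .. $steps - 1) {
        my $x   = sin(2 * $pi * $i / $steps);
        my $err = abs($x - int($amp * $x) / $amp);   # truncation error, relative to full scale
        $max = $err if $err > $max;
    }
    return $max;
}

printf "8-bit  (amplitude 127):        %.3g\n", max_quant_error(127, 1000);
printf "32-bit (amplitude 2147483647): %.3g\n", max_quant_error(2**31 - 1, 1000);
```

The error is bounded by one step of the integer grid, so it shrinks in proportion to the amplitude available: the wider the sample, the closer the staircase hugs the curve.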
We can increase the sample size to anything our hardware supports. I know for a fact my sound hardware supports signed 32-bit little-endian samples. So, if we grab four bytes at a time, and turn them into a 32-bit signed little-endian machine integer, each sample is going to be extremely high quality.
This requires two changes to the script that are so obnoxious to refactor that I decided to write another script. We need to group by fours and, every time we have four samples, pack them into the desired binary form. You should be able to figure out where these changes fit
my @w;
my $i = 0;
...push @w,$n;
# every fourth sample, pack the batch into binary form and write it out
unless (++$i % 4) {
my $w = pack('N*',@w);
print "$w";
@w = ();
}
and now
$ perl m32.pl | aplay -c 2 -f S32_LE -r 12000
Playing raw data 'stdin' : Signed 32 bit Little Endian, Rate 12000 Hz, Stereo
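For comparison, and explicitly not as a reconstruction of m32.pl, here is a sketch of a different way to produce S32_LE data: scale each sample to the full signed 32-bit range and pack it with the 'l<' template (signed 32-bit, little-endian, available since Perl 5.10). pack_s32le, $full, and the toy phase step of 0.1 are my own illustrative choices.

```perl
use strict;
use warnings;

# Pack a list of integers as signed 32-bit little-endian samples.
# pack_s32le() is an illustrative name; 'l<' needs Perl 5.10 or later.
sub pack_s32le {
    return join '', map { pack('l<', $_) } @_;
}

my $full    = 2**31 - 1;                                  # largest signed 32-bit value
my @samples = map { int($full * sin($_ * 0.1)) } 0 .. 3;  # toy sinusoid, arbitrary phase step
my $raw     = pack_s32le(@samples);
printf "%d samples -> %d bytes\n", scalar @samples, length $raw;
```

Each sample takes four bytes, least significant byte first, so four samples become sixteen bytes on the wire.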
Now, raw samples, waveform files, and MP3 files are «final versions», in the same way an executable is to your source code. Wouldn’t it be nice to capture these raw samples and turn them into higher-level data we can manipulate as algebraic things, instead of having to rely on the physics and math of it? I certainly think so, and will have something to say about it Real Soon Now®.