O4S - Proper audio stream compression
Introduction and summary
As expected, throwing advanced techniques at my music data is not giving me overwhelmingly aggressive results. Using my Go Figure data, compression improves from the 21% in this article to 30% at maximum this time, or 38% with smart bundling of the files. The stream format is identical to what is implemented in the O4S - Stream compression light article. I reuse the existing game/Z80 code, where the stream code can jump to other reused locations in the stream and automatically return after X reg-pairs have been sent to the OPL4 chip. The difference is that this time I spend more time figuring out which parts are repeated and can be reused.
This effort consists of two parts:
- Making of an audio-file collection, with no gaps between clips or segments.
- Compression across all data, not only per tune.
Algorithm - and why was it chosen?
At an earlier point in time, Grauw did great research into compression methods and found that CFG (context-free grammar) compression could be a great fit for retro computers: there is no need for RAM during decompression, and depending on the data it can compress quite significantly. Decompression is often extremely efficient. I made my own PoC of CFG earlier with very good compression results, so I guess I fell in love with this tech! However, the jumps in my implementation were recursive, which resulted in quite costly jumps-in-jumps. The music data and the playback algorithm at that point were very different, but I looked forward to seeing how it would perform this time.
Apart from the performance of my player, Grauw's replayer is superior in most other aspects and is likely the go-to replayer for devs who do not want to delve into the details of Yamaha audio chip commands. The replayer is open to everyone; it handles multiple audio chips, bundles your tracks, and offers CFG compression. It will also let you juggle wave headers.
Fundamentally, the compression method uses Suffix trees and a Longest Repeated Substring (LRS) algorithm.
How it works
I made a tool in Python which takes JSON files as input and outputs binary files that fit ROM segments.
Data structures and method, in short:
- I use Generalized Suffix Trees (GST) implemented with Ukkonen's algorithm (for performance reasons)
- A suffix tree lets you easily know if there are repeated patterns in your data stream. A generalized suffix tree lets you throw multiple data streams into the tree and you can see repeated patterns across data-sources.
- A jump in my stream costs 4 bytes, so any repeated pattern worth reusing must be at least 3-regpairs = 6 bytes.
- I use Longest Repeated Substring (LRS) substitution, because it gave the best compression compared to "shortest first" (SRS), "highest net-saving first", and even a per-entry exhaustive variant (which gave the same result as LRS).
- Nested jumps could impose a high runtime cost per frame, so I never implemented support for them in the replayer. This presumably reduces the compression results considerably.
- I only compress the regpair-data, not the control codes (this data is only 1.25% of the stream)
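To illustrate the core idea of the pattern search, here is a deliberately naive sketch of finding the longest pattern repeated across multiple byte streams. This is not the author's tool: the article's tool uses a Generalized Suffix Tree built with Ukkonen's algorithm precisely because this brute-force approach does not scale; all names here are illustrative.

```python
# Naive sketch: find the longest substring that occurs at least twice,
# counting occurrences across all streams combined. A generalized
# suffix tree gives the same answer far more efficiently.

MIN_PATTERN = 6  # a jump costs 4 bytes, so reuse pays off from 3 reg-pairs = 6 bytes

def longest_repeated_pattern(streams):
    best = b""
    # A repeated pattern can be at most half the total data.
    max_len = sum(len(s) for s in streams) // 2
    for length in range(MIN_PATTERN, max_len + 1):
        seen = set()
        found = None
        for s in streams:
            for i in range(len(s) - length + 1):
                sub = bytes(s[i:i + length])
                if sub in seen:
                    found = sub
                seen.add(sub)
        if found is None:
            break  # no repeat at this length => none at any longer length
        best = found
    return best

streams = [b"ABCABCXYZ", b"QQABCABCQQ"]
print(longest_repeated_pattern(streams))  # b'ABCABC'
```

The early `break` is valid because every repeated string of length n implies repeated prefixes of every shorter length, so the first length with no repeats ends the search.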
Splitting data streams
The reader should ideally read this article first to understand how the data is organized. Below I use a drawing from that article. The purple outline is an example of how a frame's WAV channels' data may get transformed: in this case, the algorithm has found a repeated part in the middle of this chunk (see the red line) and needs to single it out. From being one chunk with one data-length indicator in front, it is split into 3 parts. Each part needs a data-length indicator in front (size: 2 bytes), which means that this particular case grows the local data by 4 bytes. When creating the final stream, we parse the data from start to end, split data chunks, and replace recurring patterns with links/jumps as we go.
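The split bookkeeping can be sketched like this (illustrative only, not the actual tool's code): a chunk serialized with one 2-byte length indicator becomes three chunks with three indicators, growing the local data by 4 bytes.

```python
# Sketch of the split cost described above. Each chunk is serialized as
# a 2-byte data-length indicator followed by its payload; splitting one
# chunk into three therefore adds 2 extra indicators = 4 bytes locally.

HEADER = 2  # bytes per data-length indicator

def serialized_size(chunks):
    return sum(HEADER + len(c) for c in chunks)

def split_around(chunk, start, end):
    """Split chunk so that chunk[start:end] (a repeated pattern)
    becomes its own part."""
    parts = [chunk[:start], chunk[start:end], chunk[end:]]
    return [p for p in parts if p]  # drop empty parts at the edges

chunk = bytes(range(30))
parts = split_around(chunk, 10, 20)  # pattern found in the middle
print(serialized_size([chunk]))   # 32
print(serialized_size(parts))     # 36 -> local data grown by 4 bytes
```

If the pattern sits at the start or end of a chunk, only one extra part is created and the overhead drops to 2 bytes, which matches the "2 or 4 bytes per split" figure later in the article.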
Constraints - know your data
The compression ratio is not massively impressive, and the reasons are several:
- We do not allow nested repeats/jumps.
- We compress only singled out parts of the data - the regpairs - chunks of 6-250 bytes each.
- Wolf's composing is advanced. The 'intro' in particular.
- The data has undergone surgery in the process from VGM => O4S (my format), and patterns may have been broken.
- The soft vibrato in MS2 does not help - it adds "variable" commands in the stream.
- It turns out there are no sufficiently long repeated patterns across the FM1, FM2, or WAVE channels in my current music material.
- Splitting patterns adds 2 or 4 bytes per split.
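Putting the costs above together, a back-of-the-envelope net saving for substituting one repeated pattern can be sketched as follows. This is my own simplified accounting under the costs stated in this article (4-byte jumps, 2 or 4 bytes per split), not the tool's exact logic:

```python
# Rough net saving of replacing a repeated pattern with jumps.
# The first occurrence stays in place; every later occurrence is
# replaced by a 4-byte jump; splits of surrounding chunks add
# 2 or 4 bytes of data-length-indicator overhead each.

JUMP = 4  # bytes per jump, per the stream format

def net_saving(pattern_len, occurrences, split_overhead):
    replaced = (occurrences - 1) * pattern_len  # bytes removed
    jumps = (occurrences - 1) * JUMP            # bytes added back
    return replaced - jumps - split_overhead

# Minimum worthwhile pattern: 3 reg-pairs = 6 bytes, two occurrences,
# no splits -> saves 2 bytes.
print(net_saving(6, 2, 0))  # 2
# The same pattern forcing one 4-byte split is a net loss:
print(net_saving(6, 2, 4))  # -2
```

This shows why the 6-byte minimum from earlier in the article is the break-even point, and why split overhead can tip marginal patterns into being not worth substituting.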
Analyzing the results
In the table below we can compare to the original and v1.5. We see that the frame cost has increased by more than 1000 cycles (per frame) on some of the clips, but they are still below 6000 cycles, so it is likely not a concern. If it is, my tool supports preserving frames where the cost is too great; however, this typically affects the compression ratio somewhat. In the table below the final compression reduction is 29%, 1% lower than the maximum, as the files in this case are generated with a ("surgical") frame-offloading feature used to lower the peak cycle costs.
From the image above, we can see that in Go Figure a hefty 52 * 16 kB segments are used for the music data. The new bundle makes it possible to use only 32.35 * 16 kB, which is ~518 kB, a 38% reduction all in all.
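For the curious, the arithmetic behind those figures works out as follows (using only the segment counts stated above):

```python
# Segment-count arithmetic for the Go Figure bundle (16 kB segments).
old_segments = 52
new_segments = 32.35

size_kb = new_segments * 16              # resulting bundle size
reduction = 1 - new_segments / old_segments

print(round(size_kb, 1))                 # 517.6 -> ~518 kB
print(round(reduction * 100))            # 38 (% reduction)
```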
In the image below we can see the stats on substitutions, local-stream-jumps and out-of-current-stream-jumps.
| "Substs" means unique substitution patterns |
The format is designed so that a jump can only go backwards, so it makes sense that the first clip(s) have no jumps out of their own streams.
With reductions of 0.5 MB, it's pretty much a given that I'll use this compression and bundling in upcoming products.