O4S - The encoder

The purpose of the encoder

Reduce file size.
Reduce the cost of a single frame.
Handle segment boundary checks.
Pre-calculate correct values for REG_WRITES_BLOCK jump addresses and place other jump addresses.

The best way to reduce the data size would be to implement compression, but this time we needed the best possible performance, so it was not prioritized.

Result examples

Performance

Some numbers from select tracks in Go Figure:

Words matter peak frame reg-writes, before: 141, after: 91
Go Figure intro music peak frame reg-writes, before: 137, after: 90

Size

We cannot directly compare O4S file size to VGM, as the VGM is more verbose. For example, every reg-value-pair is represented as a port-reg-value-triple. But we know how much we removed.

Words matter final size: 59738 bytes, removed: 17948 bytes ⇒ shrunk by 23%
Go Figure intro final size: 86314 bytes, removed: 28194 bytes ⇒ shrunk by 25%

The state of the encoder

All in all, this started as a huge PoC, and still is.

Warning - I don't know what I'm doing 😅

It must be stated: In the pursuit of smaller file size and fewer reg-writes, there might very well be errors in my approach and assumptions. I do not really know music technology or music chips, so this is a best effort. Let me know if I do something wrong.

Background

It started with me wanting to understand more of this chip, so I started to make a tool that deciphered the data stream and printed what each byte meant. I could then use this to analyse data. Output is like this:

How to read the output?

Example: What does it mean when 0x69 is sent to port 0x7E and 0x40 is sent to port 0x7F on line 6?

[W: 5] 2 69 40 [WAV] - Ch 1 | KEY = OFF*, DAMP = 1*, LFO RST = 0 , CH = 0 , PanPot = 0

"W: 5" means Wait 5 VGM samples. One VGM sample is 1/44100 of a second. There was a delay between the writes when this was recorded.
"2 69 40" is the command. The value 0x40 was written to register 0x69 on the WAVE ports ("2" is VGM-format-only data), see register 0x69 from the manual:

KEY = OFF* is the KEY ON/OFF value deduced from the bits of 0x40. The * means that this bit-based value has changed since the last value was sent to this register.
DAMP = 1* is similar to the KEY = OFF* above, in that the DAMP bit has a new value here.
LFO RST, CH and PanPot are also stated, but their values did not change with this register write.
Redundancy: If a line contains a ~ it means that the parser sees that the register write is identical to the register's current value (no update), and that it has also deemed that the write does not need to be re-issued (it will be removed).
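To make the decoding concrete, here is a minimal Python sketch of how the bits of a write to one of the WAVE KEY ON registers could be split into the fields shown in the output line. This is not the tool's actual code. The bit positions (KEY ON in bit 7, DAMP in bit 6, LFO RST in bit 5, CH in bit 4, PanPot in bits 3-0) and the register range 0x68-0x7F are my reading of the register map and should be checked against the manual; the names are made up for the example.

```python
from dataclasses import dataclass

VGM_SAMPLE_RATE = 44100  # VGM wait units per second


@dataclass(frozen=True)
class KeyOnFields:
    channel: int
    key_on: bool
    damp: int
    lfo_reset: int
    ch: int
    panpot: int


def decode_key_on_register(reg: int, value: int) -> KeyOnFields:
    """Split a write to WAVE register 0x68..0x7F into bit fields (assumed layout)."""
    if not 0x68 <= reg <= 0x7F:
        raise ValueError("not a KEY ON register")
    return KeyOnFields(
        channel=reg - 0x68,           # one register per wave channel
        key_on=bool(value & 0x80),    # bit 7
        damp=(value >> 6) & 1,        # bit 6
        lfo_reset=(value >> 5) & 1,   # bit 5
        ch=(value >> 4) & 1,          # bit 4
        panpot=value & 0x0F,          # bits 3-0
    )


def wait_to_microseconds(samples: int) -> float:
    """Convert a VGM wait (in samples) to microseconds."""
    return samples * 1_000_000 / VGM_SAMPLE_RATE


fields = decode_key_on_register(0x69, 0x40)
print(fields)
print(round(wait_to_microseconds(5), 1), "µs")
```

With the example values, this gives channel 1, KEY OFF, DAMP = 1 and zero for the rest, which agrees with the output line above. Detecting the * marker would just be a comparison of each field against the previously decoded value for the same register.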

The most important command is KEY ON/OFF, which starts and stops sound generation on a channel. Whether the sound stops 100% or not depends on the envelope, the release rate, volume, level direct and damp values. But that quickly becomes complex, so the key here is that key is key😊

Looking at this output, we will also see that data is normally not written every single frame. The exception is when the composer is using MS²'s soft vibrato or similar. This is an effect which is not natively supported by the chip, so the software issues updates to the stream every frame.

Limitations

My encoder only supports what we needed for our music, so it is not complete. We only support 2-OP channels, not 4-OP at this point. We only support the OPL4 timer. We do not support FM drum mode, etc.

Internal model

In general, the tool wants to analyze and control each frame easily, so it creates so-called PlayFrame objects, and puts them in an array.

The class obviously has lots of properties and methods.
The illustration above shows a few only for simplicity.

This PlayFrame object also holds the reg-val-pairs for each channel for this particular frame. I have put these in arrays per "port" and per channel, and I like to view it like below. The gray verticals are supposed to illustrate arrays of reg-val-pairs of various lengths.

When I produce the stream from the above PlayFrame, I reorder the output so that it is organized like this (read from left to right). WAV, FM1, FM2:
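As a rough sketch of this model in Python (not the tool's actual code): a PlayFrame holds one list of reg-val-pairs per port and per channel, and the stream is produced by walking the ports in the order WAV, FM1, FM2. The field and method names, and the register values in the usage lines, are made up for illustration.

```python
from dataclasses import dataclass, field

RegVal = tuple[int, int]  # (register, value)
PORT_ORDER = ("WAV", "FM1", "FM2")


@dataclass
class PlayFrame:
    frame_no: int
    # port name -> channel number -> reg-val-pairs in their original order
    writes: dict[str, dict[int, list[RegVal]]] = field(
        default_factory=lambda: {port: {} for port in PORT_ORDER}
    )

    def add(self, port: str, channel: int, reg: int, value: int) -> None:
        self.writes[port].setdefault(channel, []).append((reg, value))

    def to_stream(self) -> list[tuple[str, int, int]]:
        """Flatten to (port, reg, value), grouped by port, then by channel."""
        out = []
        for port in PORT_ORDER:
            for channel in sorted(self.writes[port]):
                for reg, value in self.writes[port][channel]:
                    out.append((port, reg, value))
        return out


frame = PlayFrame(0)
frame.add("FM1", 0, 0xA0, 0x57)   # illustrative values only
frame.add("WAV", 1, 0x69, 0x40)
frame.add("FM2", 2, 0xB2, 0x20)
frame.add("FM1", 0, 0xB0, 0x31)
stream = frame.to_stream()
print(stream)
```

The writes go in interleaved, but come out as all WAV writes first, then FM1, then FM2, with the order inside each channel left untouched.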

Assumptions

Looking at the VGM stream, it seems that some reg-writes are redundant. There might be a good reason for re-sending these to the chip, but I do not really know.
I see that occasionally there are several KEY-ON reg-writes (this is what triggers the production of audible output) to the same register in the same frame. They come so close together that I assume there is no audible difference if the first one is removed.
Apart from the duration between frames, any wait recorded between commands in the VGM recording is ignored. I assume these waits are there only because of general CPU usage during music generation.
I assume that grouping all reg-writes for one port is OK, even though the original can sometimes have writes in this order: FM1, FM2, FM1, WAVE. I also assume that changing the order of channels is OK.
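The assumption about waits can be illustrated with a small Python sketch: the waits are only used to decide which frame a write lands in, and the timing inside a frame is thrown away. This is a guess at the general idea, not the tool's actual code. The 735 samples per frame assumes a 60 Hz replayer (44100 / 60); the function name and the event format are made up.

```python
SAMPLES_PER_FRAME_60HZ = 735  # 44100 / 60, assuming a 60 Hz replayer


def bucket_into_frames(events, samples_per_frame=SAMPLES_PER_FRAME_60HZ):
    """events: iterable of (wait_samples, write), where the wait precedes the write.

    Returns {frame_no: [write, ...]}. Waits inside a frame are discarded;
    only the accumulated time decides the frame number.
    """
    frames = {}
    elapsed = 0
    for wait, write in events:
        elapsed += wait
        frames.setdefault(elapsed // samples_per_frame, []).append(write)
    return frames


events = [(0, "a"), (5, "b"), (735, "c"), (3, "d"), (1470, "e")]
frames = bucket_into_frames(events)
print(frames)
```

Frames without any writes simply do not show up in the result, which fits the observation that data is normally not written every single frame.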

What it does

Redundancy

From frame to frame it tracks the current value in the registers, and 90% of the reg-val-pairs that are identical to the existing value are removed (the remaining 10% need to stay, as some registers trigger special effects).
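A minimal Python sketch of this kind of filter, under my own assumptions (it is not the tool's actual code): keep a shadow copy of the registers, drop a write if it repeats the current value, and never drop writes to registers on a keep-list. The keep-list entry here is a placeholder; which registers really need to be re-issued has to come from the chip documentation.

```python
# Hypothetical keep-list: (port, register) pairs where a repeated write still matters.
ALWAYS_KEEP = {("WAV", 0x08)}


def remove_redundant(frames, always_keep=ALWAYS_KEEP):
    """frames: list of frames, each a list of (port, reg, value).

    Returns a filtered copy where writes that repeat the current register
    value are dropped, unless the register is on the keep-list.
    """
    current = {}  # (port, reg) -> last value written
    result = []
    for frame in frames:
        kept = []
        for port, reg, value in frame:
            key = (port, reg)
            if key not in always_keep and current.get(key) == value:
                continue  # identical to the current value: redundant
            current[key] = value
            kept.append((port, reg, value))
        result.append(kept)
    return result


frames = [
    [("WAV", 0x50, 0x10), ("FM1", 0xA0, 0x57)],
    [("WAV", 0x50, 0x10), ("FM1", 0xA0, 0x58), ("WAV", 0x08, 0x01)],
    [("WAV", 0x08, 0x01)],
]
filtered = remove_redundant(frames)
print(filtered)
```

In the second frame the repeated write to 0x50 disappears, while the repeated write to the keep-listed register in the third frame survives.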

Grouping reg-writes to the same ports

Every port change costs both data and performance, so we minimize the number of changes.
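A small Python sketch of the effect, using the FM1, FM2, FM1, WAVE order mentioned under Assumptions. The register values and function names are made up, and this is only meant to show the counting, not how the tool does it.

```python
PORT_RANK = {"WAV": 0, "FM1": 1, "FM2": 2}


def group_by_port(writes):
    """Group writes by port. sorted() is stable, so the order within a port is kept."""
    return sorted(writes, key=lambda w: PORT_RANK[w[0]])


def count_port_switches(writes):
    """Count how many times two consecutive writes go to different ports."""
    return sum(1 for a, b in zip(writes, writes[1:]) if a[0] != b[0])


original = [
    ("FM1", 0xA0, 0x57),
    ("FM2", 0xA0, 0x21),
    ("FM1", 0xB0, 0x31),
    ("WAV", 0x69, 0x40),
]
grouped = group_by_port(original)
print(count_port_switches(original), "->", count_port_switches(grouped))
```

Here the number of port switches goes from 3 to 2, and the two FM1 writes keep their relative order.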

Scrutinize immediate note-linking tricks

Composers in MS² may use note-linking tricks (note-linking is like "portamento" with immediate speed). Data-wise, this trick starts off with a set of data which is immediately replaced. The first data was never needed and is removed.
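One simplified way to picture this in Python: within one channel's writes for one frame, a write is dropped if the same register is written again later in that frame. This is a stand-in for the idea, not the tool's actual rule set, which may be more specific. The protected set is my own addition, so that registers where every write matters (KEY ON/OFF, for example) can be excluded.

```python
def drop_overwritten(writes, protected=frozenset()):
    """writes: list of (reg, value) for one channel in one frame.

    Keeps only the last write to each register, except for registers in
    `protected`, where every write is kept.
    """
    last_index = {}
    for i, (reg, _value) in enumerate(writes):
        last_index[reg] = i
    return [
        (reg, value)
        for i, (reg, value) in enumerate(writes)
        if reg in protected or last_index[reg] == i
    ]


writes = [(0x20, 0x10), (0x38, 0x01), (0x20, 0x22), (0x69, 0x00), (0x69, 0x80)]
cleaned = drop_overwritten(writes, protected={0x69})
print(cleaned)
```

The first write to 0x20 is replaced within the same frame and is dropped, while both writes to the protected register stay.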

Changing WAVE instrument / Wave header loading

Using a new sample set mid-tune kicks off "wave-header loading". That is, the chip lets this channel point to other sample data, and the corresponding wave header data needs to be copied into the envelope registers. The chip takes care of this, but it takes time, and the particular channel should not be used until the chip is ready. It takes around 300 µs or ~1100 Z80 MSX cycles. The replayer must ensure it waits before using the channel, and can query the chip for this at runtime.
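The cycle figure can be sanity-checked with a few lines of Python. The clock value is the nominal MSX Z80 clock of about 3.58 MHz, and the 60 Hz frame rate is an assumption on my side.

```python
Z80_CLOCK_HZ = 3_579_545      # nominal MSX Z80 clock
HEADER_LOAD_SECONDS = 300e-6  # "around 300 µs" from the text
FRAME_RATE_HZ = 60            # assumption: 60 Hz replayer

load_cycles = Z80_CLOCK_HZ * HEADER_LOAD_SECONDS
cycles_per_frame = Z80_CLOCK_HZ / FRAME_RATE_HZ

print(round(load_cycles))                               # cycles spent waiting
print(round(100 * load_cycles / cycles_per_frame, 1))   # percent of one frame
```

This lands at roughly 1074 cycles, in the same ballpark as the ~1100 above, and just under 2% of a 60 Hz frame for every wave-header load that cannot be hidden.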

If the wave-header loading cannot be offloaded to the frame prior (see smooth out frames below), we try to minimize the impact of the loading within the current frame. Consider this PlayFrame, where we have identified a change of instrument in wave channel 14. We split the channel's data at the point right before VOLUME is raised or where we have KEY ON (red part).

When we produce the stream from this, we do it this way:

Doing it this way hides the 1100 cycle cost in cases where the cost of the rest is greater than 1100 anyway.
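Here is how I would sketch that ordering in Python, under the assumption that the stream is laid out as: first part of the loading channel, then everything else in the frame, then a LOAD_WAIT, then the second part of the loading channel. The write names, the LOAD_WAIT tuple and the function names are placeholders, and skipping the wait when there is no second part is my own simplification.

```python
LOAD_WAIT = ("LOAD_WAIT", None)  # placeholder for "wait until the chip reports ready"


def split_before_audible(writes, is_audible_trigger):
    """Split one channel's writes right before the first VOLUME raise or KEY ON."""
    for i, write in enumerate(writes):
        if is_audible_trigger(write):
            return writes[:i], writes[i:]
    return writes, []


def emit_frame(loading_channel_writes, other_writes, is_audible_trigger):
    head, tail = split_before_audible(loading_channel_writes, is_audible_trigger)
    # The head starts the header load; the other writes run while the chip is busy.
    stream = list(head) + list(other_writes)
    if tail:
        stream += [LOAD_WAIT] + list(tail)
    return stream


def audible(write):
    return write[0] in {"VOLUME", "KEY_ON"}


channel_14 = [("WAVE_NUMBER", 0x23), ("PITCH", 0x40), ("VOLUME", 0x10), ("KEY_ON", 0x80)]
others = [("FM1_WRITE", 0x57), ("FM2_WRITE", 0x21)]
stream = emit_frame(channel_14, others, audible)
print(stream)
```

If the writes between the two parts already take longer than the load, the wait returns immediately and costs next to nothing.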

Obeying speed

I'm catering for the issue "Key-off and key-on should not be instant", as described here: https://github.com/openMSX/openMSX/issues/1543 but at the time of writing I can't remember if this is even an issue in the MS² tunes.

Smooth out frames

The concept is to look at frames that are heavy, and see if it is possible to offload some of the reg-writes to the frame prior. This trick can be very effective.

Ignoring frame 0, the tool targets frames that have a cost above a certain level or that contain wave-header loading. If a channel is already in KEY OFF, it is a candidate for having all its commands "up to" any KEY ON offloaded to the previous frame. If there is no PlayFrame object for the prior frame, we make one.

This one absolutely comes with a risk, as we cannot easily be sure whether the particular channel is still producing audible output. It may have envelope values where audio lingers for a long time. However, the music so far has survived human verification😅

If we find an extreme frame, the composer can manually help by giving (non-audible) hints directly in the music data, saying that a channel is allowed to trigger the offload feature even though it is not in KEY OFF. Such a feature is superior to just putting the channel in KEY OFF one line earlier in the composer tool, as one line earlier in reality is ~8 frames earlier in most Go Figure tunes. I built support for this in the tool, but it turned out that we never needed to use it in Go Figure.

Offloading wave-header loading

This means moving the blue part in the image above to the frame prior. In this particular case we skip the LOAD_WAIT command altogether.

O4S - Introduction | O4S - The format | O4S - The replayer | O4S - The encoder

Bengalack on MSX