PART I | PART II | PART III

Table of Contents

Part II: How it works

Confines and key numbers

Key numbers, facts and decisions

Generation	MSX
Video Mode	PAL, NTSC
FPS	50/60 on scrolling + main sprites
Audio	FM-PAC
RAM	64 kB
VRAM	128 kB
ROM	MegaROM
Mapper	ASCII16

	Free version	Premium *
Total ROM size	1216 kB (76 segments)	1264 kB (79 segments)
Music size	896 kB (74% of total)	944 kB (75% of total)
Number of scores	21	22
Music in minutes	25	26
Number of sfx	33	33

* Premium version comes with additional music, save-games on the cartridge and a 70-page manual.

Facts in no particular order: Code is written using Visual Code, using C and Z80 assembly for the game, and python for the tools. C is using SDCC. Started off with SDCC v3.9 (which had some bugs), then v4.0, then v4.1 and in the end v4.2, which is a really good version. Development is done using the openMSX emulator, graphics are made in Photoshop, animations are made in Piskel, levels are made in Tiled, music is made in MoonBlaster, music ingame is replayed indirectly via VGM-files and sound effects are made in ayFXEdit.

Game flow

During development I always kept the following drawing up to date. Things were changing all over the place, all the time, so I typically needed a few key drawings updated to hold my sanity at bay. Here is the final screen and flow overview.

Ingame mode or not

At any given time, the game runs in one of two modes. The term “Ingame” will be referred to several times in this document. In short, it’s those black boxes above, where cycle count is of the essence.

“APP_MODE_INGAME”: The skeleton-code for this mode is very simple and written in C, but almost all parts of the body are replaced by assembly code. The main function here is sceneRun() and its execution time never exceeds a frame. This mode has a special ISR (Interrupt Service Routine) at 0x0038 with line interrupts, timed VDP access and is without audio. See more about this in Making "smooth execution" possible.

“APP_MODE_OUT_OF_GAME”: There are 20 screens available and they are the non-black boxes in the drawing above. This part of the code is not as time-critical as ingame, so this part is mainly written in C. I have no strict control over the cycle usage in all this code, so this mode has a simplified ISR that does audio only, as audio needs to be called every 50th or 60th frame.

Final memory model

Just to give the basic idea of where the various parts reside. Yes, that game-box is a funny one, it starts in RAM in page 0 and glides over in ROM in page 2. Who does crazy things like this, and why? 🙂

What came to be. After many iterations.

Startup, sfx, wrldload and so on, are called modules and/or segments and are switched in and out of the pages at runtime. Find details in part III: Memory, pages and module dependencies.

VRAM arrangement

VRAM arrangement (pop-up full size).

From the early days of development, the game would run from disk and loading took a long time. From a user experience, loading times in the midst of gameplay breaks the immersion. It was therefore decided to make graphic assets reusable and load as much of this data as possible into VRAM at start-up. After going megarom such a need wasn’t such a hard requirement, but the whole engine was built around this concept. So I kept this architecture.

How levels are constructed

Level setup in the tool

This game uses Tiled to create its level content. Tiled is a fantastic, open and flexible tool which keeps on being improved all the time. Make sure you donate.

The image below shows my typical setup. A horizontal level only with a grid size of 8x8. The main features used by Lilly’s Saga is:

tilemap
tilesets
layers
various object-types
custom properties on both objects and tiles.

Properties can be values like string and integer, but also objects which are very handy when you want to make one object reference another object. This can be seen as arrows between objects in the image below.

Layers are very handy. They can be turned on and off or even be shown with a custom transparency set. I have some typical layers set up:

Layer	Description
Comments (convenience)	Keeping track of details and maybe ramp build-ups (see more about ramps over at Chaotic Stupid)
Cloud-placement guidance-layer (convenience)	The clouds are fundamentally built up to be placed at specific intervals in the horizontal direction. I use a helper layer to indicate which positions these are.
256x176 guidance-layer (convenience)	This is the size of one full screen and often it is handy to easily see the boundaries of the different screens. One screen is 32 characters by 22 lines.
64x32 guidance-layer (convenience)	This layer is normally not used, but indicates a grid which is used by the "Area-system".
16x16 guidance-layer (convenience)	Gives you a good indication of 2x2 tiles, a block, which is fundamental to the placement of objects in this game. In later versions of Tiled, a better grid-visualization has become available, making this layer redundant.
Backup-layers (convenience)	When testing out different types of gameplay it's very handy to move objects temporarily into other layers which your export-tool ignores. Just right-click the object and choose "move to <layer>". Typically I have this layer turned on with a high transparency setting.
Pathways (required)	This layer holds rectangles which divide the full level into individual areas or rooms.
Coins (required)	Coins are placed in its own layer in the level-editor although the coins end up merged in with the background tiles in the game engine.
Sprites (required)	All enemies are placed in its own layer. I've chosen to show sprites and sprite-boundaries as rectangles.
Objects (required)	The various objects are placed in its own layer. I've chosen to show most objects and object-boundaries as rectangles, but some that have a static size have proper visuals. This is PlayerStartObject, clouds and crates.
Background (required)	This is the tilemap and is placed at the bottom.

Tiles

The game tries to look like the blocky games from the eighties. They would have blocks of 16x16 pixels. Lilly uses tiles of 8x8 and just mimics that old look.

Tilesets

The game has five different themes, which use tiles from three different tilesets. Each tileset is a set of 256 different tiles.

Tileset 1: This tileset is used in the outdoor themes: Mountain Road and Outside Castle.

Tileset 2: This tileset is used in the indoor themes: Caves and Inside Castle.

Tileset 3: This tileset is used in the outdoor theme Shoreline. This tileset is the only one with unused (old) tiles.

Tile codes

Each tile id has an additional property-byte made up from bit-values (see also .BCM-files). The byte will hold any sensible combination of the eight following bit-properties:

Tile #64 cannot be moved into/over from any of the sides.

Most properties should be quite self explanatory. If the COLLISION_YOFFSET was set, the sprites walking on top of it are offset a few pixels (more details at Baking information into tile-ids removes storage and calculation needs).

Special tiles

Some tiles are special in the way that they change appearance. That is, the pattern table, and sometimes the color table, is updated at runtime. These tiles are:

the ones marked with an X
the candle-flame - tileset 2 (indoors) only
waterline tiles - tileset 2 and 3

Level construction

Below is a sample level. Lines with arrows are objects that reference other objects.

We can easily see that it fills 3 screens when enabling the “screen”-layer:

We also have a 16x16-grid or block-layer that can be visualized:

The block-layer is effectively its own map.

Looking closer at the objects, we see that they are placed perfectly in this grid:

The 16x16 grid is the same as the AIM – the Action-ID-map. In our current example, the various objects will get an ID, and the map will look like this:

AIM.

During gameplay, the game will be able to look up a specific object using the ObjectLUT, like this:

The LUT is populated when you load the level. The object data is created and populated at this time as well. The latter data is stored in a custom heap-like memory space using primitive malloc-like functions.

The AIM may be seen as a waste of space, after all it takes one fourth of a tilemap. As normal, space has been sacrificed on the altar of speed. While running the game, we need a fast, cpu-usage-predictable way of accessing an object. Using lists and collision-checks for this would be out of the question. With this scheme, getting any potential object behind the player, would be as easy as this pseudocode:

Object* p = pTilemap + (player_x<<4) + (player_y<<4) * level_tile_length

Every frame there are many other reasons for figuring out player_tile_x and player_tile_y, so these values are already stored away, making the above line be even cheaper:

Object* p = pTilemap + (player_tile_x<<1) + (player_tile_y<<1) * level_tile_length

Output files from level construction

Tiled stores its level files in json-files. Pål’s custom make_level-tool takes these as input and produces the five output files types below.

Extension	Description
.LTM Level Tile Map	This is the tilemap and is the biggest part of the level. Most of the space in page 2 will end up occupied by this data.
.AIM Action Id Mapping	This is a map of the same area as the level tile map, with a 16x16 px (2x2 tiles) grid which holds information of the objects' locations.
.BCM Block Collision Map	Every tile in each tilemap has its own properties. See Tile codes to learn more about these.
.OAP Object Action Property	The different objects have different properties. This file holds the initial values for all the objects.
.PWS PathWay Set	This file holds a set of rectangles. Some rectangles as they are directly defined by the creator in the level as well as tool-generated rectangles. Such a rectangle informs the engine about the boundary of the current level and is dynamically changed as the player moves around.

Pathway rects

The need for good, effective usage of the level map became apparent. In the early development phase, the level was loaded from disk which took time. Ideally all rooms in one level would be present in RAM so warping could happen swiftly. This required each level map to be split into separate areas. I also wanted to enable level designs to be more flexible than just going from left to right or from bottom and up. The need for a defined pathway in a complex shape emerged. An example of what was wanted can be seen below.


Sketch of what kind of level paths we want in a 3-room/area level.

I came up with yet another homebrew solution. The outset is a level of 4x4 screens, like this:

Each screen is shaded to easily see their boundaries.

I ended up with defining paths by plain rectangles which could stand alone or intersect with other rectangles to make a complex shape.

10 rectangles define two rooms and one complex shaped area.

Note: In the image above the pathway rects are shrunk by four tiles at each side only to better show how the rectangles overlap. In reality, these rectangles will always precisely cover one or more screens.

Given these rectangles, I needed to make some logic that the game engine could use to set correct boundaries for the viewport.The idea is that I (automatically) place some small and hidden objects (GeneratedPathwayObject) at different places in the level, and as these are part of the ticklist, their tick-function will be run when they are inside the current viewport.Their sole purpose is to tell the engine about the outer boundary of the level - the area in which the viewport can move freely. In the image below these are red dots in the middle of a screen.

In areas with overlapping rectangles, I pre-create new rectangles which are the combination of two overlapping rectangles. This is automatically done by a tool. In the image below, two of these combined rectangles have been visualized, in green and yellow, and are prefixed with an “X”.

Example level with its 11 pathway objects visualized.

For the sake of simplification, I’ve only named five pathway objects (A, B, C, D, E) with their corresponding plain rectangles (1, 2, 3) and combined rectangles (x1 and x2) in this example. The logic is as follows: In any screen with pathway objects: if it is covered by one rectangle only, the object will point to this rectangle, if it is covered by two rectangles, the object needs to point to the combination of the two rectangles. In the example above, the pathway objects will point to rectangles like this:

A ➔ 1
B ➔ X1
C ➔ X1
D ➔ 3
E ➔ X2

Note: The combination of rectangle 1 and 2 produces X1. The combination of rectangle 2 and 3 produces an identical rectangle, so the engine reuses the same rectangle.

As for how the engine knows about the viewport boundaries at the start of a level and in those bonus rooms: Both the PlayerStartObject and WarpObjects have a pointer to their corresponding pathway rect which the engine uses when starting.

Getting this to work visually is all up to the code for the viewport. Viewport is covered in Detached camera/viewport.

The ingame part of this is rather easy. A fully generic algorithm to calculate all this was more complex. This was done in the make_level-tool.

It might be that there are other solutions to this which are better and much less complex, but the solution I came up with worked automatically on any level without any special coding. However, it came with a few design restrictions, but they weren’t too brutal.

Fundamentals for ingame visuals

Scrolling

Due to performance reasons, the tiledata for this is stored in “raw” fashion. This means that when we move the player/viewport around, there is no preparation of data before sending it to the VDP, we just view what is stored in RAM directly. This is a bit demanding on memory, but is optimal in terms of cpu cycle spend. Some games could benefit from reading directly from ROM, but not this game. This game changes the tilemap during gameplay, for example by using visual effects, and thus RAM is needed.

Showing a part of the level can be visualized like this:

In the sample image above, the viewport is placed at tile_x = 20 (0x14). To display this on screen you would loop through 22 lines. Memory, pages and module dependencies we see that this is in page 2, ie. at 0x8000. So, initially you start to read in memory at 0x8000 + tile_x = 0x8014. You would first send one line to VRAM in the current target-page (the current “write_buffer”). For each new line you will need to advance the read location by (level_tile_length-screen_width) = (96-32) = 64.

In fact we loop 23 lines and not 22. This is because we also support vertical scrolling. This scrolling is handled by VDP register R#23 ”vertical scroll” and R#23 will always have a value in the range [0-7] (pixels). Because of this, the number of (tile) lines we need to send from RAM to the VDP is (screen_tile_height+1) = 23.

To avoid tearing and other visual artifacts we use a technique called double-buffering. From VRAM arrangement we see that screen 4 page 0 has an area dubbed “Name table / ingame (buf 1)” and screen 4 page 1 has “Name table / ingame (buf 2)”. One of them is always the active name table, meaning set in VDP as the one to show (“active_buffer”). When we send data to the name table, we always send to the one that is not active (“write_buffer”). At the beginning of every frame we swap active_buffer and write_buffer.

Double buffering.

When we are using double buffering, it means that everything you see on screen actually happened in the previous frame. A delay of 0-100% of a frame, but you will not notice.

This game features a horizontal pixel scroll, as opposed to a blocky tile scroll used in most games on MSX1/MSX2 and as we know from Konami games. The pixel scroll is achieved by using the VDP register R#18, “screen location adjustment (ADJUST)”. The horizontal value in this register will move in a range of [0,7] (pixels). For every 8 pixels moved, the game scrolls one tile (like in Konami games), and resets x-value of R#18. Doing it like this, and running with Lilly at two pixels per frame, you will get the following result:

How it looks behind the scenes.

Unlike on MSX2+ with the V9958 VDP, there is no hardware support for hiding these glaring, and utterly ugly (my opinion), jerky edges on the MSX2 and V9938.

In Lilly’s Saga we have chosen to hide these edges using sprites. Like this:

In the image above, sprites on the sides are made a few pixels too small and with a outline for visual clues only.

The sprites on the sides will, in every frame, be offset in x-direction according to the same x-value as in the adjust-register.

Sprite split

The chosen solution, to use sprites to hide jerky edges, comes with a problem. From the SAT’s 32 available sprites, 22 are spent as blocks on the borders. This leaves only 10 for the gameplay. Given that most sprites are “dual-sprites”, we end up with a maximum of 5 different characters/sprites, which is clearly too little.

We therefore use the VDP register R#19 “Scanning line number when the interrupt occurs” twice during a frame, and reposition eight of the sprites. That is, we wait until the raster beam has drawn the selected sprites, and then we give the sprites a new Y-position (ahead of the raster beam), which in turn makes the VDP draw the sprites once again. Using only eight sprites for this, we are left with 24 sprites for the gameplay, which is sufficient for this game.

In the image above, sprites on the sides are made a few pixels too small and with a outline for visual clues only.

Page split

We also use a third split ingame, and this is the page split. Similar cases are often dubbed screen split as you would split the screen mode, but in our case, we only change the current addresses for the name, pattern and color tables.

As shown in VRAM arrangement we can see that screen 4, page 2 holds the font pattern and colors as well as the name table for “points” i.e. the scoreboard at the bottom of the page.

Getting the timing of this split correct across all the different VDPs or MSX engine / MSX system is not straightforward, but I I think I managed. To achieve this I’ve had to sacrifice showing the last pixel line of the tilemap. I guess no one ever noticed.

Also at this point, the adjust register (R#18) is reset and the vertical scroll register (R#23) is repositioned.

If I hadn’t put the font in its own page, I would have had to have the font-characters as part of the other tilesets, drastically reducing the amount of available tiles in the game. This wouldn’t have worked well. I found 256 tiles quite limited already.

Sprites and objects

Anything that can be interacted with on the screen is either a sprite or an object. Some are part of a ticklist and some are not.

The ticklist is a list of objects that will have their own “run”-function that will be called once every frame. Examples of objects that are ticklist objects could be MovingPlatformVerticalObject which typically moves one pixel every frame and AnimatedObject (like Fire) which needs to swap out tiles every x frame).

Sprites have their own ticklist, the so-called SpriteTicklist.

The rectangle given in the level-editor marks the boundary for every object, and not necessarily the size. For example, the MovingPlatformVerticalObject may have a rectangle-height bigger than one screen height, but the bar in the platform is only 8 pixels.

All enemy sprites, except the Boss, are 16x16 pixels (i.e. just as 2x2 tiles) and any greater spritespawn-rectangle than this defines the enemy’s roaming area.

The object types

The following objects can be used directly in the level-editor:

In the image above, we see that RezzControl has an attribute called connect. This is a potential reference to another object. Several objects have one or more attributes like this. In Tiled these are shown as lines with arrows between the objects.

Object type matrix

Below is a matrix which describes certain aspects of the object types. In general objects are passed on from the level-editor. Tick means that the object is part of the ticklist, LUT means that is is listed in the LUT, AIM means that it uses AIM, Gen means that this is a generated object from the make_level-script, Script means that the object is used in the make_level-script only and Outside means that the object is kept outside the ingame-engine - it ends its life in the world load-module.

Object type	Tick	LUT	AIM	Gen	Script	Outside
MovingPlatformVertical	✓	✓	✓
MovingPlatformHorizontal	✓	✓	✓
Flag	✓	✓	✓
Animated	✓	✓	✓
PopIn	✓	✓	✓
RezzControl	✓	✓	✓
RevealObject	✓	✓	✓
HiddenHitBlock		✓	✓
MultiReveal		✓	✓
Trigger		✓	✓
ObjectTrigger		✓	✓
Exit		✓	✓
Slowdown		✓	✓
Warp / Trampoline		✓	✓
Crate		✓	✓
LethalObject		✓	✓
PlayerStart
SpriteSpawn (Enemy)						✓
GeneratedPathwayObject	✓			✓
GeneratedTicklistObject				✓
PathwayRectangle				(✓)
CloudObject					✓
Coin (not really an object)			✓

All objects have a static location with the exception of Crate, which can be repositioned in AIM, and Coin which is removed from AIM once picked up by the player.

Object breakdown

Object	About
MovingPlatformVertical	This object (as can be seen in Level construction) is always 6 tiles wide, but has various heights. The height only depicts the bounds. The types supported in the game are called: Pendular One-way-loop Hit'n'drop Hopefully their behaviors are easily understood from their names. This platform is used in different themes and comes in three different looks.
MovingPlatformHorizontal	This object is always 5 tiles wide and there is only one type of this object and it is pendular. Only the Shoreline theme uses this gameplay element. The general idea is the same as for the MovingPlatformVertical, so the width defines the bounds. The height is in practice only 8 pixels, although the object bounds are 16 pixels vertically due to the AIM architecture.
Flag	Most of the time, this object does not do anything. But if hit, then there is a counter that is increased, and if it reaches the max-hits, it will start a sparkle animation which ends with a given set of tiles. Points will be awarded.
LethalObject	Just an area that is lethal, meaning that the player dies as soon as it enters the bounding rectangle.
Animated	It changes the tiles within its boundary in the tilemap every x frame. The object may or may not be lethal (like a LethalObject), but as this object type ended up being used for fire only, it is lethal.
RezzControl	This control is connected to a sprite. It will reposition (/reuse) the connected sprite according to some timing values set. Used on the Rolling Stone and Cannonball sprites.
HiddenHitBlock	If hit, it replaces its face permanently and runs a bump animation or just blinks (depending on a property set) and increases a counter. If max count is reached, it will set its final face and go into a stale mode. If connected to a RevealObject, it will "kick off" this object.
RevealObject	This object is connected by other objects which potentially can visually kick off a revealed object. It will then start with a sparkle animation (hence ticklist), and end up with a given representation. It also holds information about which object it reveals, such as CHEST_POWERUP_2X, CHEST_POWERUP_WINGS, CHEST_LETTER and so on.
MultiReveal	This makes one connect-reference into three. Made out of necessity where I needed to have three sparkles kicked off at once. Ended up being used only one place in the game (removal of entrance to the castle).
Trigger	Connected to a sprite. On level startup, the sprite/enemy is put in a hidden, sleeping mode. Once the player enters the boundary of this object (dark blue below), the sprite/enemy is brought into its original location and will be woken up.
ObjectTrigger	Just like Trigger except that an object is triggered instead of a sprite. ObjectTrigger may for example show or hide a PopIn. It may be set up like this example, where orange is ObjectTrigger and white is PopIn.
PopIn	This object may be visually on or off. This is triggered by the ObjectTrigger -object. Main attributes: Mode: Enable \| Disable \| Swap DefaultVisible: Enable \| Disable RestoreOnKill: True \| False
Exit	Enter the boundary of this, and the game sets the g_uInputMode to INPUT_MODE_AUTO_PILOT, as well as sets some dedicated end-of-levels values for the autopilot system. Autopilot just controls the character for a short while doing a pre-made sequence of movements.
Slowdown	As it looks like in Tiled. When the player is inside this rectangle the movement speed is reduced.
Warp / Trampoline	Enter this, and g_PlayerState becomes STATE_WARP (and player control is lost for a little while). Depending on the warp, but in case of a suction-into-a-trunk-variant, a singleton dual sprite for the trunk is placed on top of the trunk, and the player is animated "into" it before the reverse happens after a change of scenery. The same object can also do the same effect without showing the occluding dual sprites. The object also supports the player being "shot out", like a trampoline.
Crate	The crate is a very special object: It is the only object in the game that repositions itself in AIM. When stopping it will store a copy of the tiles it occupies in memory, and replace level data with crate-tiles. When it starts moving, it will first replace its current crate., tiles with the copy of the original background tiles (which it has stored) in the level/background data and pop up a reusable dual sprite showing a crate. There will never be more than one crate-sprite showing at any given time. Crate fall speed is set quite high to ensure this. All sprites that are part of the SpriteTicklist will get a tick every other frame. This did not look smooth when the player is updated every frame. Therefore, the crate dual sprite is tightly tied to the player movement and is updated every frame. Coding wise the result is ugly, so don't tell anyone🙂
PlayerStart	Normally there is only one of these objects on a level, but if there is a boss room as well, there will be two. Main purpose is to hold the location of the level kick-off.
SpriteSpawn (Enemy)	These objects will create the different enemies (a Sprite-object) during the loading of a level. Enemies will be placed at the correct place and put to sleep.
GeneratedPathwayObject	These are special objects made by the level_tool, and are based on Pathway rects set by the level designer. They are never interacted with by the player, but informs the engine about the (changing) bounding rectangles for the viewport/camera.
GeneratedTicklistObject	Instead of having one big global ticklist, the make_level-script splits up the bulk and generates reduced size lists for various places in the level.
PathwayRect	Most of these rectangles are made by the designer during level creation. In addition the make_level-script will combine some of the rectangles and add to the list of rectangles used by the engine.
CloudObject	A cloud is a set of "cloudy" patterns that moves across two lines with 16 tiles. To make sure that these patterns do not rotate out of the tiles, the correct tiles need to be placed out in correct order and at specific x-values in the level (only every fourth is allowed). The tool uses the objects to place the correct tiles into the tilemap, and the cloud object ends its life when this is done in the make_level-script.
Coin (not really an object)	From the outset coins are only tiles placed in a coin-layer. In the make_level-script, these coin-tiles are found and copied into the level data. A volatile, special coin-id (255) is put into the AIM. This id is a special value and cannot be used as an index in the LUT.

Triggering or hitting objects

Normally, when the player moves around, the code does not check whether the player collides with some of these objects, with the exception of the following which are checked for every frame:

Animated (fire)
Lethal (spikes)
Slowdown (pond)
Trigger
ObjectTrigger
Exit

For other objects to be found/identified/triggered, a tile collision needs to happen first (see Environment collision and tile codes). For example, the chest tiles have collision-codes in them, so a collision will be triggered, and then the object will be found and in turn the corresponding chest-open functions will be called. Performance wise if it is nice to avoid checking for everything every frame.

Sprite setup

The sprite looks

The main sprite sheet looks like this:

This sheet effectively produces 106 different 16x16 sprites.

The game also has an alternate set of sprites, which in boss rooms swap out 46 normal sprite patterns at indexes 6-51 (see also Sprite pattern table(s)):

The game uses a mix of single layer and dual layer sprites. The notion of a dual layer is to indicate that two sprites are placed on top of each other to enable the possibility of using a technique called the OR-color, effectively producing a third color on a single line using 2 sprites only.

How two colors becomes three.

The SAT

The sprites are meticulously set up in the Sprite Attribute Table (SAT) with these pattern names:

The SAT can only display 32 sprites at any given time.

The SAT is placed in RAM and read and written to multiple times during a frame. The whole SAT is sent to the VDP at the beginning of every frame and when the raster is in VBLANK.

Despite having 128 kB VRAM at our disposal, we use only one SAT – and as we can see in VRAM arrangement, this is the green block in page 0. That is 128 bytes for the main SAT and 512 bytes for the corresponding color information. The color information for each sprite is kept in RAM, and sent to VRAM only when an update is needed.

Sprite pattern table(s)

From VRAM arrangement we can also see that the game uses all (eight) pages for sprite patterns. One sprite pattern table holds 64 16x16 sprites. By using all eight of them, we get a theoretical amount of 8*64 = 512. Now, we can only have one page active at a time, but still, we can utilize this and this is how we are able to get 106 sprites available in this game.

In non-boss-rooms, the sprite patterns tables are set up like in the following table.

Here you can see that we can easily enable one animation frame on the player by just setting the page. Given that the SAT had sprite pattern 0 and 1 set for the player dual sprite entries (#18 and #19), then, moving the player rightwards would imply setting page 0, 1, 2, 3 and then repeat. If we need to change from walking to flying, we also need to modify the SAT by setting pattern 18 to value 4 and pattern 19 to value 5. Any other enemy does its animations by changing the pattern-value in the SAT only.

Due to performance reasons, the colors for the player sprites have been made identical across all frames, and hence no need to update color information in VRAM between frames.

All player sprites have the same color definition.

Colors in animation frames for the various enemies are set up in the same static manner.

When in boss-mode, we just replace all the dual layer enemies in VRAM and it becomes like this:

Sprite/Enemy variants

Main sprites

The “main” sprites are shown in The sprite looks and is dubbed single or dual sprites. Common for all of them, is that they have a 16x16 pixels dimension. The sprite-object has a function pointer called pTickFnc. If a sprite is awake, this function will be called once every other frame. There are many tick-functions for sprites in the code, and a few examples are:

sprTckEnemyBulletLeft
sprTckEnemyBulletRight
sprTckEnemyBulletDown (bullet dropped from Cloudy)
sprTckEnemyUniDirOneStep (shared behavior by several enemies)
sprTckEnemyCloudy
sprTckEnemyGhost

Sprite flags

The sprites may also come with some special flags either from the level-editor or from the spawning process. Some of these flags are:

SPR_FLAGS_KILL_ON_BUMP
SPR_FLAGS_ALTERNATE_ENEMY
SPR_FLAGS_YOFFSET_AWARE

SPR_FLAGS_KILL_ON_BUMP will kill you even if you try to stomp it, like the Rolling Stone or the Evul Blob. Some sprites support the SPR_FLAGS_ALTERNATE_ENEMY-flag. This flag is used when the normal visual representation is ok, but an alternate behavior is wanted. For example, this is used on the Rolling Stone that moves vertically. The pTickFnc of these objects becomes:

sprTckEnemyRollingStoneAlt

Some sprites are placed on hanging bridges, and to make them look good, they need to spend some extra cycles to align to the lowered ground. We inform the code up front about this by using the SPR_FLAGS_YOFFSET_AWARE-flag.

Boss mode sprites

The sprite-handling in the game is built around the aforementioned 16x16-sprites, but there is another set of sprites too - the boss-mode sprites, as shown in Sprite setup. The boss himself is 32x32 pixels or merely 4 sprites placed in a 2x2 grid.

➔

This code for this was somewhat an afterthought and isn’t overwhelmingly elegant and pretty much abuses the 16x16-system and the rental system (see below). The upper left sprite works as the “master sprite”. It directly controls the three others, the followers, as well as itself. Controlling means location, and sprite patterns (the sprite colors for sprite 0 equals those for sprite 1, similar for sprite 2 and 3, hence colors do not need to be updated during boss fight). The boss sprites enters the game with these flags:

SPR_FLAGS_BOSS
SPR_FLAGS_BOSS_FOLLOWER (only three sprites has this flag)

And to elaborate, the boss can only be inflicted by stomping the hunchback, so during the boss fight, the two upper sprites, 0 and 1 will alternate on having the SPR_FLAGS_KILL_ON_BUMP-flag set, all depending on the direction of the boss.

The bats are normal 16x16 sprites with their own tick-function.

Sprites rental system and reused sprites

A level may have, say, 40+ dual-sprite enemies and some of them would be respawning all the time. Many other sprites would also be temporarily shown during a level, tree-trunks, crates, exploding bricks and floating points. A certain flexibility for sprites to come and go on the screen was needed in the engine.

The VDP has a maximum capacity of 32 sprites on the screen at any given time, and the idea of sprites “renting” a slot for a limited time came up. Some slots' content would be rather static, while others would change it dynamically.

As shown in The SAT we see that all entries except the eight border masks have so-called rental-ids. In practice, this means that there are 24 visual sprite “slots” available to gameplay (note that indexes start on 1, as 0 is “illegal value” in this system).

The rental system (special: the player data is not placed in the heap).

From the table in The SAT we can also see that colors are written to VRAM once (i.e. upfront) for all sprites but the enemies. Enemies need to update color in VRAM according to which enemy that is shown at a given moment. Because of this we keep track of a sprite’s colors via its “SpriteSetID”. No need to update VRAM with colors in a slot, if the previous enemy was identical, and VRAM already had the correct colors set.

The rental table is the source for content in the SAT. Like, once every frame a routine traverses the rental-table and translates their 16-bit positions into screen space (8-bit values).

Beware: As 216 is a special value for the Y-position, I had to make a special check to avoid it. If you place a sprite at y-pos 216, all sprites with lower sprite-id/layer will be hidden. A good example where a magic number is harmful.

Player movement and viewport

Player movement

Player states

At any given time, the player will be in one of 8 states. The states are as follows:

State	Explanation
STATE_NORMAL*	Currently standing or walking on something.
STATE_JUMP_UPWARDS*	Currently moving upwards. jump-state is ended as soon as the end of the jump curve is reached. State enters STATE_FALLING after this.
STATE_FALLING*	Currently moving downwards.
STATE_CLIMBING*	In a ladder.
STATE_WARP	In or out of a warp. User cannot control the heroine in this state
STATE_ENTER_KILLED	Actually killed, but a transition into the killed state is nice to separate some prep-work in one state, and continue with the STATE_KILLED "work" in the next frame. The user cannot control the heroine in this state.
STATE_KILLED	Being killed animation and similar stuff running. The user cannot control the heroine in this state.
STATE_YEAH_POSE	The user cannot control the heroine in this state. Mostly a state to tell the sceneCalcPlayerFrame function to show the correct sprite face.

* in these states the player can always (attempt to) move left or right, but the vertical movement control varies.

Movement curves

Horizontally there are no movement curves and no inertia in this game. This is on purpose and gives a somewhat unnatural movement, but gives maximum control over the player. Vertically on the other hand, a certain gravity is added.

The jump curve is as follows (numbers are pixel additions in y-position per frame.

STATIC_PLAYER_JUMP_PATH[] = { 6, 5, 4, 3, 2, 2, 2, 1, 1, 1, 0, 1, 0, 0 };

The game has a feature that lets the player jump higher the longer he/she holds the jump-button down. In the code this is handled so that the pointer to the current location (which is at the first element) in STATIC_PLAYER_JUMP_PATH isn’t progressing while the button is held down. The duration is measured in frames, and the maximum frames for this is set to 5. I.e. if the player holds down the button for max jump, the jump-pointer will “spin” five times on the first element, and in practice become like this:

STATIC_PLAYER_JUMP_PATH[] = { 6, 6, 6, 6, 6, 5, 4, 3, 2, 2, 2, 1, 1, 1, 0, 1, 0, 0 };

For falling, things are similar:

STATIC_PLAYER_FALL_PATH[] = { 1, 0, 1, 1, 1, 2, 2, 3, 3, 4 };

In the case that the player didn’t land before the end of the fall-curve was reached, it will “spin” on the last value until landed. We see that the maximum fall speed is 4.

Environment collision

You need proper collision to create magic movement.

As the most basic environment interaction, we have player collision. Collision with tiles and with objects (AIM). There are many things that needs to be checked for, like:

Obstacle in front of player (left or right)?
Standing on something or falling?
When standing on something, how much is it offset?
Bumping head into something?
Standing in something lethal (fire or spikes) or something that slows down (water)?
On, or close to, a ladder?

A variation of the above points needs to be figured out and is depending on whether the player is standing still, moving horizontally and/or falling or jumping or climbing or being on a moving platform.

Initially the collision checking was placed in all kinds of various places in the code. As soon as new features were added, these checks became quite costly (wrt. performance) as many of them were carried out during the same frame. The execution time for collision checking during a frame varied a lot. The idea of putting extraction of the relevant tile codes and potential objects into a common, per-frame function with a more predictable execution time, came to be. It made sense to quickly retrieve that neighbor tile data when you already had done all the heavy calculations finding data for one tile. An analysis of all needs and combinations was carried out, before a little system was designed.

Visualizing which tiles that needs to be checked for collision.

The player is 16x16 pixels. If the player is perfectly aligned with the 8x8 grid in the back, the player would occupy the brightest/saturated tiles in the drawing above. Most of the time, the player is offset 1-7 pixels in both x-direction and y-direction inside the upper-left tile which is marked in red in the image above. That means that the player is occupying the medium bright tiles as well. In theory touching 9 tiles and 4 objects. In practice, and depending on the implementation you might end up checking a few more tiles/objects as you add jump speed to the position when jumping, walk speed when walking and fall speed when falling.

Below is a mapping of what kind of data that needed to be stored once up-front in each frame. The data was stored in some global variables, easily accessible to the rest of the code down the line (i.e. same frame). Note: “Below”-designation in the matrix below may seem misleading – it just means the tiles below the ideal 16x16 rectangle.

This matrix shows which values we need to store for later and easy use in the current frame.

The above scheme looks easy enough, but in practice it is a bit more complicated. For horizontal movement you would care for values in either the left or the right column – not both during the same frame. Initially tile codes, object-ids and tile-ids are reset, and if moving left, the x-pos will get current_walk_speed subtracted before the left column data is retrieved. In case of a rightward movement, x-position will get current_walk_speed added instead, before the data is retrieved.

It is similar in vertical direction. In case of g_PlayerState being STATE_JUMP_UPWARDS the current_jump_speed is subtracted from the y-position before retrieving the data. In case of STATE_FALLING current_fall_speed is added to the y-position.

In the actual implementation of this, there are various optimizations done. For example, if a state and/or direction dictates that some values will not be used in the current frame, they are not retrieved and stored. Say, in the case of falling and moving rightwards, upper-right and lower-right cells in the matrix will never be used.

It is up to the surrounding code to make good use of this data later in the frame.

Input modes

The game was also in need of a few variants of input modes, meaning handling the input for the player movement. The follow modes are in play:

Input mode	Explanation
INPUT_MODE_NORMAL	Normal control/gameplay.
INPUT_MODE_AUTO_PILOT	A few times, we need to take control of the heroine. The user's input has no effect at this point. For example when coming to the end of the level, walking out of the screen. Or when entering the boss room. In this case there is an autopilot system that kicks in with a state machine which does various things depending on different steps in defined sequences.
INPUT_MODE_RECORDING	This state is just as INPUT_MODE_NORMAL but the user input is stored in RAM for use later in INPUT_MODE_PLAYBACK. This mode is not available to the players in the final game, but is used as part of the development to record scenes which are going to be demoed when the game is waiting for players.
INPUT_MODE_PLAYBACK	Used for demoing levels. This state is just as INPUT_MODE_NORMAL but the user input from an earlier INPUT_MODE_RECORDING is fed into the engine. If the user actually touches his/her input devices, the demo is aborted.

Detached camera/viewport

How it used to be

Early versions of the game used to have a strict relationship between the position of the player and the viewport. As in many classical games, the player would always be centered in the middle of the screen, except when the player was close to the boundaries of the level. Example:

Rigid viewport.

I wanted to add a smoother, more modern approach to the viewport movement and, in a way, detach it from the player.

Horizontal direction smoothness

he majority of the game’s levels are horizontally oriented, so the best experience is implemented here. I wanted the player to see more of the upcoming parts of the level, and less of the things behind. About 65% of the scene in favor of player facing direction, like illustrated here:

In addition, I wanted the viewport to ease-out and ease-in as shown below.

Like a rubberband.

The mechanism behind this is something I called the “accelerometer” in the code. A bit misleading maybe, as it doesn’t really measure the acceleration, but it controls it. It’s an array of speed increments as illustrated below:

The AccelPointer is an indexed, unsigned char, value. From the outset, the AccelPointer will be placed at the midpoint (green). At this position, 0 pixels will be added to the viewport-movement during a frame, i.e. standing still. If your player moves in a direction, the game will try to uphold the ~65% visibility. When needed, it will move the AccelPointer one step in the wanted direction, either left or right. As long as the direction/movement continues (usually by holding a key down, but it could also be a moving platform that moves your player) the Accelpointer will increase its position one step per frame all until it reaches the min- or max-position. It will stay here as long as the player moves (no obstacle or level boundary is met). As soon as the player stops, the viewport will (soon) reach ~65% visibility and when that happens, the Accelpointer will seek to reach the green midpoint by moving towards it one step per frame.

The game has a small shortcoming here. The speed of the player is 2 pixels per frame, and the max speed in the accelerometer is 2 as well. Ideally the max speed of the viewport could be higher than the player speed to be able to catch up when the player runs with the viewport lagging behind. However, optimizations in the parallax-scroll restricts the game from scrolling faster than 2 pixels per frame, so this was not feasible.

The drawing above is a bit simplified, showing only 8 steps in one direction. In the actual game we use 23 steps, but the concept is the same. Also, the real implementation needs speedier movement when the player is in full action and running around, compared to when standing still. Tests showed that fast paced viewport movement when the player was relaxing (standing still), looked stressful. To accommodate this, the game has two “accelerometer”-arrays. One fast, and one slow. The slow one is pointed to when the player is not moving. The AccelPointer is shared between the two arrays.

Vertical direction simplicity

The vertical part is handled differently. In this case we can move freely in 20% of the middle area. When moving out of that, the viewport moves rigidly and according to the player.

40% player facing visibility in vertical direction.

Input-handling

The game vision was that this game should be fully usable by a joystick. Just like a classic Konami game or a game on a Nintendo console. I also needed the keyboard to work as many prefer playing with that, so some keyboard handling was needed. The game used two joystick buttons, called A and B. A is always SPACE, but I needed to decide something for B. As in console games, a button could mean different things depending on where you are in the game. I needed a key for Pause (F1) and Back (ESC). Joystick B-button was mapped to F1 and ESC.

Due to performance-reasons mentioned in Dedicated ISR, no H.TIMI or BIOS usage, key handling is not using BIOS. This means that you are on your own, and cannot get any help from the system to handle repeat-keys, key combinations or different keyboard layouts (see https://www.msx.org/wiki/Keyboard_Matrices - notice the Russian keyboard being really different from all the others). The two first weren’t needed in this game, so that part is easy, but the latter one can be more difficult and had to be taken into account. To easily comply with the latter I had to make sure that I use keys that are located at the same place in every known keyboard matrix out there. Arrows, space, F1 and ESC complies with this.

Once you do your design like this, you can read input data incredibly predictably (cpu cost wise) and efficiently, just a few cycles needed when going directly on the PPI ports (0xA9/0xAA and 0xA0/0xA1/0xA2).

Animations

The animation processes vary a bit, and the different types would be:

Update part of the tile pattern in vram. (“part pat”)
Update the full tile pattern in vram. (“full pat”)
Update the full tile pattern in vram, plus changing the palette entry. (“full pat / pal”)
Update both parts of tile pattern and tile colors in vram. (“part pat, col”)
Update by replacing select tiles in the tilemap in RAM. (“tiles”)

Water waves (type 1 - part pat)

There are two types of wave animation, the sea-animation and the pond-animation. Both are type 1 in the above list. Each animation uses two tiles, and out of eight bytes per tile/pattern, only three bytes are modified in each tile when updated. The update patterns are just pre-calculated and stored in ROM.

Three sets of “duo-tiles” are animated in shoreline levels.

Candle (type 2 - full pat)

The candle animation is only to be found in tileset 2. Only one tile is used. This one is implemented as type 2 in the above list (but type 1 would have been a bit faster). The process is that we make a plain copy of the tile in RAMat startup, as well as a mirrored copy. Then the game alternates between these two.

Coins (type 3 - full pat / pal)

Coins consist of four tiles and they are the only ones with type 3 above. Color index 0 is reserved for the coin animation and animates constantly between a few shades of yellow.

Bump-tiles (type 4 - part pat, col)

Only one 16x16-block is allowed to have a visual bump on the screen at any given time. It is therefore possible to replace six original tiles with six dedicated “bump” tiles in the tilemap for a short time while a type-4 animation is running on these tiles. The tiles used for this are the six X-like tiles in row 3 in all the tilemaps. Any four tiles can be copied into these bump tiles, making it very flexible, and no premade patterns are needed.

Moving platforms - vertical (type 5 - tiles)

Visually, only two rows of tiles are drawn at a given tile position inside the bounding rectangle of the object.

The tiles are picked from eight premade pairs. These are the ones used in the Shoreline theme, but there are two other variants used in the game as well.

Moving platforms - horizontal (type 5 - tiles)

This animation uses 6 tiles which are “printed“ inside the boundary.

... where premade tiles like these are used.

Fire (type 5 - tiles)

The object changes the tiles within its boundary in the tilemap every x frame. Each time it will look at the tiles inside the boundary and alternate between adding or subtracting the value 4 to the current tile id.

... defined like this in the editor.

The tile map must have the correct tiles laid out in groups of four, like this.

Sparkle (type 5 - tiles)

The tiles used for making sparkles. They are in turn put into a 2x2 grid in two slightly different sets of animations. The cannon firing and the reveal-effect, below.

Audio support

PSG for sound effects using all three channels

I used an open source PSG ayfx-replayer for a while and was, technically, quite happy with the ayfx-format. It was quite easy to understand, closely mapped to the PSG-internals and worked similarly to the VGM-format which I had become familiar with. There was an easily available editor to create and play effects for windows (ayFXEdit) and there was a huge library available as well to get you started.

After some time it became clear to me that this replayer typically used one channel only and cut off an ongoing sfx if you chose to kick off a new one. This was likely done because that would enable users of the replayer to mix in PSG-effects with PSG-music (leaving at least two other channels for the music at any given time). This didn’t apply to this game, as the music was decided to be MSX Music only.

I had potentially many sound effects running at the same time, as the player would typically jump (➔ sfx) or land (➔ sfx) when there is a chest being revealed behind some sparkle (➔ sfx), and maybe the player was almost running out of time (➔ sfx)... and on top of this, maybe there is bullet firing (➔ sfx). And so on. –There are many situations where we can easily get 3 or 4 effects running at the same time.

So, yet again, I needed to make my own solution, “ayfxplayer.s”.

I wanted to use a free channel for every new sfx, until all were in use, that is, all three. Furthermore, if there was a fourth sfx to be played, I added a priority system to the various sound effects. If all channels were in use, the algorithm would look at the channel with the lowest priority, and match up with the priority of the new sfx, and replace it (cut off). For simplicity, and maybe reusability, I wanted the replayer to be based on the ayfx-format.

Sfx-overview with Priority from the design doc.

Sfx clip length (“Len”) and “ingame priority” (“Pri”) are mostly directly linked (marked red if not). Short clips were unlikely to be cut off (and wouldn’t matter too much if it happened), but cut-off of a longer clip was very audibly apparent, and thus got higher priority.

AyFXEdit is not storing envelope data, so use of the envelope-register is “optimized away”. As the clips are so short, I didn’t bother to cater for the PAL/NTSC-situation. The only thing we need to be aware of with this solution, is that the channels share noise generator.

MSX Music (FM) replayer for the tunes

I had researched the possible audio-chips up front, and landed on FM - MSX-Music aka OPLL or FM-PAC. Mostly because of the widespread and that MSX2+ had chosen this as the standard for MSX going forward.

The replayer functionality.

I didn’t want to spend more than 10% of the cycles per frame on music and ideally even less than that. After researching the replayers of various formats, I found that “streaming” the raw music data directly to the chip is the fastest you can go, so I pursued that idea. First I needed that raw data, and then I needed a tool to get that data into my own internal format. I ended up calling this format FMR, as in “FM Raw”.

This is where the VGM-format and VGM-recording software enters the scene. The composer could use the composing tool of choice, MoonBlaster in this case, and while playing the music in this tool we made sure to record this playback using openMSX, which would dump the VGM-file to disk (see vgm_rec).

Music workflow.

The VGM-file has a bit more information that we needed, so a lot of this is stripped out and a more suitable format is used ingame. Around 2/3s of the file VGM-file were removed in the conversion process. In the end we ended up with FMRs from 3 kB to 168 kB, i.e. 1 to 11 segments.

By default, using VGM on an OPLL chip, you will have these issues:

There is no “master volume” enabling volume-control or making fading easy.

You cannot easily stop a tune, and resume at the same point.
Usually the first frame is very cpu-heavy as all instruments are being set up.
You need to obey special waits between each byte of data you send to the chip.
The tempo is given by the tempo in which the tune was recorded.
Starting at a “given” location in a tune is hard - finding the correct byte is hard, and many sounds are dependent on data that was sent to the chip somewhere earlier in the data stream.

Also, in general, the OPLL is made so that:

You cannot query the chip to check for what values the different registers/channels contain.

The replayer ended up with these features:

It handles files sizes bigger than 16 kB (multi segment switching).
Segments are fully utilized, no empty data at the end of the segment because of aligning data.
It is nicely optimized and obeys the wait-cycles described by the chip vendor, 12 and 84 cycles (see the manual).
It supports fade-down.
It supports looping.
It supports “jump-in”, meaning starting mid-tune, at pre-selected places (we use this in the game, to kick-off mid-tune instead of at the start when you return from bonus-levels, chest-openings and so on).
It plays back the music at “identical” speed in 50 Hz and 60 Hz.

The converter ended up with these features:

Reduces the data size.
Making the stream “frame accurate”.
Keeps track of the volume of each channel.
Supports workaround for an infrequent bug in the YM2413-chip related to drums.
Cuts off silence at the end of the tune (due to recording).
Shows analysis/visualization of the data-stream.
Produced stubs and new entry-points for starting “mid-tune” as well as loop-point.
New-starts and loop-points were manually added in the stream by exploiting PSG-instruments, which were not used anyway.

Wolf’s tunes are often quite advanced with lots of writes to the registers every frame. All channels are also in use in most tunes. Despite this, the average cpu-spend lingers around 1000 cycles, with frequent frames around 3-4000 and the worst case around 7000.

Find more details about this under vgm2fmr.py.

Non-ingame features worth mentioning

Proportional font

Proportional font used in the intro/story.

sing “standard” monospaced characters with a width of 8 pixels gives you only 32 characters on a line. This makes sense in tile-based screens, like screen 2 or 4, but not necessarily in screen 5. I quickly designed a proportional font, and with this I would typically get 50% more characters on a line. This is more efficient with regards to space and is more pleasing to the eye. It comes with a certain performance penalty, but it didn’t affect me in this game. The penalty is occurring because my characters, including space, would sometimes have odd pixel widths, and this means that you cannot use the “high speed” VDP copy commands which works on bytes, so you need to use the slower, logical commands that work on pixels (and in screen 5 each byte holds 2 pixels).

The way it is done in this game, is that the characters in the font sheet are placed out in an 8x8 pixel grid. Just as would be normal for tilemaps in for example screen 2 and screen 4. This makes it easy to calculate the upper left corner of the source location for a character. In addition I just read out the width from a simple table of bytes stored in ROM. The height is set to a static 8 pixels. The text-drawing algorithm adds space by a pixel between every character. This solution enforces a restriction on the width of the characters or symbols, and that is that the width of a given symbol cannot exceed 8 unless the character next to it will not be used.

The screen 5 font file, stored in a non-visible page.

From the font file we see that there are four fonts stored.Only the first one has numbers. The system distinguishes them by having a different “origin” or offset for each.

The (A) and (B) symbols have widths of 9 and 10 and the “v^ SCROLL - (A) BACK” has a width of 76.

Slot selection

With 5 populated slots.

Slot selection mimics the behavior we always see in console games. This is not often seen on the MSX. The hefty performance-requirements ingame due to full-screen scrolling was not present in this particular screen and we could therefore improve the visuals somewhat, by using screen 5. Finally we got rid of the requirement of only 2 colors per 8 pixels. First, the backdrop was taken from the game in screen 4, and converted to screen 5, and then we started adding elements with higher density of colors. A slot gets its graphics from some special theme thumbnails as well as some symbols from the font file.

The background image for the slot selection.

The theme-thumbs stored are in a non-visible page.

At some point I also got the idea of making a press-down 3d-effect which made sense to explore in this screen. This is done by offsetting/moving an image or rectangle by (+1,+1). You can also see that some fonts and symbols in the font file in the previous chapter have shadows as well as no-shadows for pressed down/up states.

3d-effect in action.

The VDP commands are really slow, so to get this to look decent, I had to prepare the different buttons and slots in a non-showing page (at runtime at selection change), as can be seen in the VRAM-dump below. The results are then copied over to the page currently being shown in one copy-command at a time.

How the backbuffer (page 3) looks at runtime. The visual noise is sprite-information from screen 4’s page 7 and 8.

To be able to write text in both black and white, using the same font file, I made sure to set up the palette properly and use the logical operations for copy-commands found on the v9938. The TEOR-operation does an XOR with target color for all pixels != 0, ie. color 0 in source (font file) is “transparent”. This gives a different result when using TEOR on the white background. Yes, the level flag, coin and heart are also using TEOR. The theme bitmap is not. TEOR comes with a performance penalty, so I don’t use it more than strictly necessary.

Slideshow

To present the story I wanted something simple. Something clean, subtle and not over the top, technically. I did not want the presentation to be more technically advanced than the game itself.

Showing two "slides" from the intro / main story

Slideshow was needed at:

The intro / main story.
After each boss when a stone has been found.
End of game, game completed.

The key concepts were:

Scriptable.
Typewriter-effect.
Support images.
Fade in/out between slides.
Crossfade from one palette to another.
Ability to abort.
Should have the same duration when run in PAL as in NTSC due to music sync.
Support for various/custom waits.
Simple - presentation (meaning just a few different slide “templates”).
Simple - technically (no stress on optimization or code structure).

C source code snippet. For … lazy reasons the scripting was part of the source file.

As screen 5 images are 32 kB at full size, the images are split into 2 parts (for example ROMLOC_STORY_3_A and ROMLOC_STORY_3_B above). They are unpacked into memory and uploaded to VRAM in turns.

Newline (\n) is supported and there is also support for special “commands” which trigger at the time it is supposed to be typewritten to the screen:

| : non-printed command followed by a single-char parameter with special meaning

Parameter:
~ : kick off crossfade

^ : abort (do not fade out - program execution continues without change on screen)
<1-9>: wait half-seconds, i.e. to get 3 seconds wait, you must state 6
0 : wait 2 PAL frames (needed for fine tuning)
* : wait 28 PAL frames (needed for fine tuning)

A possible shortcoming of the solution is that the duration of each slide depended on:

the amount of text (each typewriter print spends x frames)
the specified delays added in the script

This meant that changing the wording on a “slide” suddenly changed the duration of the slideshow, unless you adjusted waits/delays accordingly. As we needed to synchronize the slides with the music, all this was a tad cumbersome, but it worked out well in the end.

Open chest screens

How it looks when you open a chest with the 2x token.

Despite having a megarom available, I’ve been trying to be conscious of memory usage in several places in the code. The chest open has 8 almost identical screens. The screen is composed of a big base image and one smaller showing the item.

The base image.

It’s been hard to find documentation on image formats on the MSX. There seems to be some raw, uncompressed screen 4 (.sc4) and screen 5 (.sc5) formats in use here and there, but I failed to find proper documentation on them. They also seemed to always be fullscreen, so I baked up my own simple sc5-format which has a width and height in the header.

Example 1: The 2x token.

Example 2: The wings power up.

Example 3: The pouch pick-up.

The base image uses the first 9 colors of the palette (they are static/shared and part of all images), while the remaining 7 can be used freely by the other, small image(s). At runtime, the palette used for each chest open screen is taken from the small content image.

Boss

Boss and his bat up close.

Contrary to many games where the boss is the ultimate encounter, I never wanted this part to be the highlight of the game. My focus was on the “other” part of the game. The other 95% where the gameplay was different and where the player spent most of his/her time. Still, I needed a boss to round off the world and to adhere to the story. We needed a non-scrolling screen / room, inspired by games like Castlevania and similar games. It also made sense to make the boss bigger than the player. 4 times bigger was the maximum of what we could do, given that we needed animation sprites as well.

A boss would need its own behavior, which meant custom code, as well as custom graphics and animations. With limited gfx-resources we settled with the idea that you meet Abaddon four times, where he escapes the three first times. He tricks you by offering a false white flag 🙂

Down and beaten, but not dead. Recycling FTW.

As this is a non-violent game, stomping is still the way to fight. To inflict damage the player needed to stomp Abaddon’s weak point, the hunchback, i.e. hitting (“falling” from above) one of the four boss sprites only. If you come near any of the other three sprites, you lose a life. See Boss mode sprites about how this is set up. As a result of this setup, the sprite 0 (the main sprite in the upper left corner) and sprite 1 alternates getting the SPR_FLAGS_KILL_ON_BUMP-flag set according to the direction of the boss.

Abaddon’s symbol was a bat, so his weapon had to be bats. These cannot be stomped or killed (exception is during the first 3 seconds after a player's resurrection when she is blinking). The sprite rental system allows for 6 dual sprites or enemies on the screen at once, and as 4 are taken by the boss, only 2 are left for bats.

Performance wise, adding in a lot of custom code for the boss was not a problem. As no scrolling was being performed, there were lots of cycles available every frame. I could also write it all in plain C, which is way more effective for this kind of programming.

Boss movement

I wanted the boss to walk towards a point and against the player, to do a small jump and to spawn bats by charging. I also wanted him to stand and think a bit before doing his next move.This ended in a state machine as described below. The main state is BossAttackState and sub-state is BossState.

BossAttackState possibilities:

BOSS_ATTACK_STATE_FROZEN - Relax before battle starts.
BOSS_ATTACK_STATE_THINKING - Waiting, may turn at the end of wait, may decide to enter charging or attack.
BOSS_ATTACK_STATE_ATTACKING - Moving towards player (has a target pos to reach).

BossState possibilities:

BOSS_STATE_NORMAL - Nothing happens.
BOSS_STATE_MOVING - Moving and will enter jumping when target pos is reached.
BOSS_STATE_JUMPING - Will soon enter falling.
BOSS_STATE_FALLING - Will enter normal state and start thinking at the end of falling.
BOSS_STATE_CHARGING - If thinking ended in charging, bat spawns.
BOSS_STATE_POST_CHARGE - A small delay after spawning bat, enter moving after this (and turn if needed).

To set up different difficulty in the levels I decided on these parameters:

HitAmount: Amount of stomps before defeated.
ThinkMaxCount: Number of frames thinking (standing still).
WeaponChance: After thinking, a random number, val, in range [0,127] is generated. Weapon triggers if val is within [0,WeaponChance].
MaxConcurrentBossWeapons: How many possibly present on the screen at once.

In the final game, the difficulty definitions ended like this:

World	Hit Amount	Think Max Count	Weapon Chance	Max Concurrent Boss Weapons
1	3	60	0 (N/A)	0
2	5	60	80	1
3	5	60	80	2
4	7	20	128	2

Bat movement

I went for a super simple path and simple logic for the bats: face the same direction as the boss when spawned with the same speed upwards, which means that bats always move in 45 degrees. Bounce when hitting the wall. After a maximum of 6 bounces, the bat flies out of the screen, and can be re-spawned.

Visual effects

When entering the boss room I added a full screen shake effect. This is easily made possible using the VDP screen adjust registers, the same being used for the smooth horizontal scrolling. A dedicated “slam door” sound effect is played at the same time. The screen shake is also used on the final blow on the boss, and a reduced variant is used when the boss spawns a bat.

Entering the boss room in 50% speed showing screen shake.

When Abaddon charges and spawns a bat, I added full screen blinks, which are supposed to illustrate lightning occurring. This is achieved by changing the black palette entry for a few frames. I kick off a lightning sound effect at the same time.

Lightning effect when spawning bats at 25% speed.

When you have finally beaten Abaddon, he dissolves, as can be seen below. This effect is used in one place only in the whole game. The effect is quite costly, but there isn’t much else going on at the time, so that works fine within each frame after all. The algorithm uses an array of pre-calculated masks in ROM which I made using google sheets or something. The algorithm runs through all 144 boss 8x8 sprite patterns and applies a mask onto the original pattern and uploads to VRAM. After 64 runs, all pixels are masked out. Boss sprites are originally 16x16 but can also be treated as 8x8.

Boss finally beaten and facing dissolvement. Rendering at real time speed.

Full screen text scroll (options)

I was in need of a simple text scroll for game info, credits and such. Going smooth was a requirement of course, so the decision landed on screen 5 and use of the screen hard scroll register, VDP R#23. In a screen in the options section, not much other than music playback is going on, so I decided to write functions in C to handle this scroll. For everything except the low level proportional font “blitting”, that is.

The basic idea is that the VDP-register(#23) can get any value (offset) from 0 to 255, which means that the contents of every VRAM address in the current page can risk being shown as pixels/colors at a given point. Thus, the sprite data in this page could not be active.

The algorithm uses a slightly modified variant of the normal text drawing routine described in Proportional font. The difference is that the height of the source bitmap is 1 vs 8 in the original routine. And when calling it, an y-offset is added to the source’s y-pos. The result is that it only “prints” a thin slice of each character, and this makes it fast enough for a 60 FPS scroll.

With an offset of 3, only this part of the character ‘R’ is copied from the tileset.

The y-offset will always be in the range [0,7].

With this functionality, we are able to scroll one pixel and also “print” one slice of each character in the text line at the same time. When scrolling upwards, this slice-printing happens right “under the border” at the bottom of the screen, and when scrolling downwards this printing happens right “under the border” on top of the screen. This printing happens after first clearing that thin area/line with a fast and black rectfill.

Making "smooth execution" possible

Never exceed frame time

I wanted the game to run smoothly. Particularly ingame. That means that I need full control over what is done in one frame. The callstack is like this:

   mainLoop()
       sceneLoop() // stays in this function while mode is "ingame"
           sceneRun() // called once per frame

The scenerun function is the function that should not exceed a frame. That is, the cycle-spend should not exceed these numbers:

PAL (“50 FPS”): 71364 cycles (3579545/50.159)
NTSC (“60 FPS”): 59736 cycles (3579545/59.923)

This is given by Z80 running at ~3579545 Hz (or so) and PAL running at 50.159 Hz and NTSC running at 59.923 Hz.

Obviously there is a huge difference between PAL and NTSC, and “unfortunately” NTSC will be your target if you want it to run everywhere.

One of the best tools to measure your cycle-counts is to use the openMSX emulator with the profiler add-on script, profile.tcl.

I also use a frame-counter (counted in the ISR) to control when to enter sceneRun, instead of using the halt operation. This ensures that you do not get any extra delay/frame-drop in the case you should be so unlucky to exceed a frame once in a while.

A positive side effect from this is that, in ingame-mode I do not have to put the music replay-routine on the ISR-hook. The ISR-hook is always run at VBLANK, first in every frame, and is running at a crucial time, namely when the (imaginary) raster-beam is in the border area. The time in this area is much better used for visual things that need to be done before the drawing of the screen occurs. In fact, many screen modes (not screen 4 though) have even faster VDP-access in this area (see VATT), which should make you favorize VDP work here. In addition, the music routine execution time is highly invariant which is another reason to place execution of this routine much later in the frame. And then, my music routine can run with interrupts enabled as well, which is yet another reason to not put it on the VBLANK-interrupt.

Visualizing where the time is spent

By using the good old “color-the-border” we can see where our time is spent, and that we are within our bounds.

Showing cycle spend with openMSX 18 in NTSC mode on a Panasonic FS-A1GT with speed set to 2%.

BLACK	Waiting - except the first black part after the first interrupt (=scenerun start).
RED	Interrupt (marked) and parallax.
CRIMSON	Mixed visuals: Time, timers, VRAM-time, VRAM-coin, VRAM-score + set ticklist.
LIGHT YELLOW	Bump bricks.
YELLOW	Handle objects (scene tick, visibility).
ORANGE	Place player.
BLUE	Handle sprites - visibility, sprite tick, put sprites in RAM (sprites 1).
DARK BLUE	Putting sprites to sleep (sprites 2).
GREEN	Sprite collision & awake sprites (sprites 3).
LIGHT GREEN	Animations: coin, candle.
BROWN	Set player frame and sprite pattern page.
DARK BROWN	Pickup coins.
PETROL	Handle input.
GRAY	Top: Viewport handling. Below line interrupts: Tiles➔VRAM (2 parts).
WHITE	Audio (first sfx, then music).

The red lines that come from the interrupt always happen on the exact same y-position.

From the colors we can see that functions vary in their duration and that they do not necessarily run every frame.

This animation is from a debug build, but it is quite representative. The C-compiler can be run with optimisation settings which will make a few of the code paths slightly faster. The final ROM is compiled using this. In addition, the code to change the color of the border comes with a small cost, which of course is not included in the final build. And in debug there are a lot of extra shortcuts available, so the petrol-colored “Handle input” will be much smaller/faster in the released version.

When doing visuals like this, make sure you test different machines, as there are different performance numbers amongst the different MSX-models on the market. This is mostly due to different, so-called “wait-states” in the different MSX-Engines or MSX-System chipsets. There are 0, 1 or 2 waitstates added (see https://www.msx.org/wiki/Toshiba_T9769#Differences_between_revisions).

Comparison using openMSX of how the different wait-states affect the execution duration, singling out the tile-”blitting”. The first gray block is 612 OUTs and OUTIs.

Dedicated ISR, no H.TIMI or BIOS usage

Last time I hooked up the profiler on the default, “empty” interrupt, it showed ~5000 cycles on a Philips NMS 8255 and ~2000 on a Panasonic FS-A1GT. That is a lot of cycles. Cycles I’d rather use myself. I planned to handle keyboard routines myself and I did not use disk I/O or any other things from the system. In such a case, I gathered that it is better to just hook up my own ISR at 0x0038 using IM1.

Aside from a call or two at startup I also stayed away from using the BIOS altogether. The reasons for this was:

Having your running code in page0 makes it complicated to get efficient BIOS usage (which also resides in page0).
Because of the custom ISR and my need to fully control when DI and EI is set.
My need to have full control over all the cycles spent.

When going rogue like this you also miss system support for extensions that may use non-standard ports, “external” or alternate ports. For example, the NEOS MSX2-upgrade uses ports 0x88-0x8B instead of the normal ones at 0x98-0x9B. Lilly’s Saga decided to not support external ports. To my knowledge the install base of external VDPs is very small.

Craft your levels to perfectly match your engine

Several things have been carefully considered when making the levels, things like:

Number of enemies in one location: Maximum dual-sprites in rental is 6. From testing I know that 6 at the same time is pushing the cycles. Locations with maximum sprites can have no or very few tick-objects.
No enemy-enemy collision is implemented, so to avoid exposing this, I avoid multiple enemies on the same spot. Exceptions are ghosts and cannonballs thrown from Cloudy.
Avoid more than 2 enemies per line to avoid flicker (not about performance though).

High level and structural optimizations

This chapter includes higher level ways of making the code run more effective. Lower level optimizations can be found in Low level optimizations.

Spreading work over multiple frames

Splitting the work into sensible chunks and running them at specified times or frequency, can be done in different ways. Ideally each function has a predictable running time as well.

Frame load balancer

This idea takes into account that some things, like visuals must happen every frame, while others, like parallax runs only every second frame or updating the visual timer on screen runs every fourth frame or the palette needs only a visit every eighth frame. As a fundamental mechanism I have a counter called jiffy which counts up, first thing in every frame. This number can easily be used for spreading out the work.

To be able to set up a sensible spread of work across frames you will need to use the profiler on the various functions to see their cost, as well as making a call on how often they should run. A visual representation of how this works in Lilly’s Saga, is shown below.

As an example, the cost of running frame number 9 will only involve the marked functions.

If you are really tight on cycles and want to cram in as much as possible, this work may end up comprehensive and advanced. I had to make some spreadsheets to play with different scenarios and to get it optimal.

In effect, this is what you see in the animation in previous chapter where the colors are jumping up and down, and are only present in some frames. For example, the light yellow color happens only every fourth frame (after line interrupt 2) and is running the bump-brick code. As the bump-brick code uses double-buffering, the visual effect of this code is shown in the succeeding frame.

In practice, it won’t end up looking too nice in the code. In my sceneRun-function I have lots of tests that look quite ugly. Flow can be hard to follow in code, but it’s working and part of a plan. Cost wise, in the grand scheme of things, a test like this is really cheap.

Large number of tests pollute the code in sceneRun (the framecounter variables are made from jiffy).

Hangover-frame

The hangover-frame kicks in exactly when you bump your head into something.

The hangover frame happens when you hit something with your head, and a series of things needs to be carried out. As the name (almost🙂) implies, work is carried over from one frame to another. The work can be blinking tiles, kicking off animated tiles (which in turn should kill an enemy) and/or kick off some sound effects or other stuff. Trying to put all this into the current frame which is already pushed on cycles proved difficult. However, the placePlayer-function which does the player movement and collision detection (which is a very costly function as well) can be ignored for one frame at the exact point where the player hits something (!).

In this case, the player is changing from moving upwards to moving downwards, so we just allow the player to hang in the air for one frame, while doing the other stuff mentioned earlier. No one will notice that there is no player movement in that frame. This is attempted illustrated in the outstanding piece of art found below.

The above is run every frame as part of sceneRun. Skipping the heavy PlacePlayer if a “HangOverFrame” is pending from the previous frame/sceneRun. Lilly will be “hanging in the air” for one frame.

The only risk is that people could sense that the effect is coming too late. Maybe some high-sensitivity people notice that the sfx is coming in a bit too late? Well, I have accepted that risk.

Player kills enemy - delayed work using states

When an enemy is killed there is also a lot of work involved, like:

The state of the enemy changes.
The enemy is starting moving on a path out of the screen.
We are kicking off a sound.
We are adding points.
We are probably also putting the player onto a bouncing path.

This work can easily be spread over multiple frame using states on the particular enemy:

(SPRITE_MODE_NORMAL)
SPRITE_MODE_ENTER_KILLED
SPRITE_MODE_KILLED

When I detect an enemy kill, I only change the state to SPRITE_MODE_ENTER_KILLED, change the pointer to the run-function and then return. All the rest of the stuff is handled in succeeding frames.

No dynamic object creation during gameplay

In the initial design for this game and the rental system, the sprites were dynamically created and deleted on the fly. For example, the exploding bricks and the floating points could be typical examples of short-lived objects. However, over time, I found that I could not afford creating and deleting objects on the fly. I wrote a super-efficient memory manager, “pre-populated” template objects in memory, wrote code in asm and managed to get it fast. But it was not fast enough. And it was unpredictable. As the creation or deletion could occur at any time in a frame, it could accidentally happen at “the wrong time”, causing exceeding the cycle budget.

I ended up with this:

Many objects are pre-created, singular re-usable sprites (exploding bricks, floating points, halo, tree-trunk, and crate).
All enemies are spawned/created at level load and then put to sleep.

Creating all sprites up front naturally demands enough heap memory and does not scale for bigger games or levels. –For this game though, it was all part of the plan.

Ticklist and reduction of active objects

A level will typically have around 50+ objects, and maybe half of these are ticklist-objects, which means that an object has its own “run”-function. Running 25 objects every frame will be very heavy for the poor Z80. We want to only do the run on the objects that are visible.

Doing visibility testing on all the objects on the level all the time is also very costly. So we also want to reduce the number of objects that we iterate on at any given time.

The location of any object that is part of the ticklist (i.e. ticklist-object) is always static. We use this fact to premake lists of objects for the different parts of the level. The game has a concept of a screen (Ref. Level construction). In the level-creation-script I generate a list of ticklists for every screen in the level.

Click to enlarge. Huge cpu savings by using the technique.

In the illustration above, we have 7 screens. The player is currently tied to screen 6 because the viewport’s midpoint is in this screen. The contents of the ticklist for any screen is the inside plus 50% in each direction horizontally and vertically (only showing a horizontal level here for simplicity, but this works identically for levels with a vertical span)). For screen 6 above, the ticklist’s contents are:

4 moving platforms (gray)
2 reveal objects (pink)

This means that we only need to iterate over 6 objects instead of 25 at runtime. The algorithm in sceneTick() is like this:

   foreach object in current_ticklist:
       if object.boundary intersects viewport:
           object.run()

For the example above, we end up running the run-function in 4 objects, 2 moving platforms and 2 reveal objects.

Across all the levels in the game, the maximum number of objects in the ticklists for the levels varies between 3 and 9.

Sleeping sprites via sprite-ticklist and the Area system

Sprites use a different management system for ticks than objects, because of these reasons:

Enemy-sprites does not always have a defined boundary to move within
Enemy-sprites should move/tick outside the current viewport (to a certain extent) otherwise the game looks crap

In the game the different levels have between 11 (first level) and 32 (the two last levels) enemies.

The main idea is to have two lists instead of one. Any enemy in the level is created up front and is referenced in either of these two lists:

SpriteTickList
SleepList

Ideally the amount of sprites in SpriteTickList is as few as possible as this means that there are few sprites to iterate over when running the SceneSpriteTick()-function.

To move sprites between the two lists I needed an effective system, so I created a system called the Area system. It is a grid with cells of 64x32 pixel size.

The reason behind yet another coordinate system on top of pixels (1x1), tiles (8x8), objects (16x16) is computation speed. Thus I needed a coordinate system that could give the location of a sprite with 8-bit values instead of 16-bit. To see if a sprite needs to go into sleep (putToSleep()) or out of sleep (awakeSprites()), I compare their area values with the area value of the player. 8-bit comparisons are significantly faster than 16-bit.

Click to enlarge.

The algorithm in sceneSpriteTick() is like this:

foreach sprite in spriteticklist:
sprite.run()

Prior to its execution the functions named sceneCalculateViewport() and sceneCheckSpriteVisibility() are called. They set the viewport, and check if sprites in the SpriteTickList are inside this viewport. If outside, they are removed from the rental system or if entering inside, they are added.

The run-function for any sprite that has an animation in addition to movement, will check whether it is currently visible and performs animation/visuals only if it is visible (i.e. is part of the rental-system).

The related calls are then:

   sceneCalculateViewport()
   sceneCheckSpriteVisibility()
   sceneSpriteTick()
   putToSleep()
   awakeSprites()

The putTosSleep-function became slightly too costly, so I made it alternate between working on the first half, and the second half (the so-called “putToSleepIterative()” function). In other words, it iterates over 50% of the list every frame.

The awakeSprites() needs to iterate over almost all sprites in the level, so this one is obviously expensive. To deal with this, I made it iterate over a maximum of 4 sprites every frame, then return and resume where it left in the next frame.

There are other sprites in the game or level that are part of this system as well, but they are almost always inside the viewport and not making too much difference. These are:

Exploding bricks (4)
Floating points (100/200/100/2000)
Halo / Circle around player
Blocking tree trunk
Crate

Calculating and storing another set of coordinates (i.e. the Area system, see Part III) comes with overhead, but my calculations at development time showed me that there were some cycles to be won by doing it this way.

Objects + Sprites tick every other frame (25/30 FPS)

Every object and sprite has their own run-function. Running multiples of these every frame is costly, so the function itself should be efficient. Still, I ended up with many situations with too high a load. I chose to skip every other frame for both objects and sprites. This certainly has a negative visual impact, especially when the speed is high (like enemy bumped off screen), but I can live with it.

Instead of alternating objects and sprites, I run both every frame, but each handles 50% of their respective list at each run.

There are some exceptions to this scheme:

Player runs and animates every frame.
Crate runs every frame.
Boss and bats run every frame.

The crate had to be updated in the same frequency as the player, as pushing the crate looked very buggy when they differed.

Avoid enemy roaming if you can

If you want the enemies to roam free, like in Mario, you need to have collision detection towards the environment. That is, towards background tiles and possibly towards other enemies as well. The latter is not implemented and not exposed visually in this game. The former was implemented on all enemies originally, but it came with a cost. Because of this, I removed this ability from most enemies and adjusted level designs accordingly. The alternative is to have your enemy move inside a defined rectangle only (setting SPR_FLAGS_BEHAVIOUR_BOUNDARY). Simple movement back and forth. With no collision checking you must know that the ground will never be removed under the enemy, because if it did, the enemy would be flying in the air.

If hammer helmet was given in this level, crushed bricks would leave the blob walking in the air (blob’s boundary in blue).

The only enemy left with (a simplified) collision checking is the rolling stone.

All in all, this comes with a great impact on your level design and gameplay.

Turn off code paths not needed (the boss scene)

Obviously, the boss-scene has no scrolling, so it makes sense to take advantage of this, and disable the scrolling-functionality altogether. The logic to run the boss-behavior is much more advanced than all the other enemies, so this was very much needed anyways.

Baking information into tile-ids removes storage and calculation needs

In the effect where the player can walk on a curved surface, tiles were in need of a place to store information about their vertical displacement (y-offset). The value would be in the range [1,7] (0 is no offset). I needed something that stored info efficiently and had little performance impact.

Rope bridge - walking on curved surface shown in half speed.

I already had a look-up system for properties of the different tiles, so I hinged it onto this, by using one of the bits for a y-offset property.

When the player is placed out on screen, there is a collision detection towards the background tiles. If the tile below the player has the COLLISION_TOP bit set, the player is placed on top of this tile. If COLLISION_YOFFSET bit is set as well, the player is placed on top of this tile plus an offset tied to the specific tile

By baking this offset-value into the tile-id itself, there is no extra storage needed and the execution code has the value calculated already.

Tile-id 100 (0x64) immediately gives 4 pixels as offset via the id itself.

The game uses the following simple c-code to displace the player, and SDCC converts this to perfectly efficient assembly-code.

   if( ( tileCodeBelow[1]&BLOCK_YOFFSET )!=0 )
   playerYOffset = tileTileIDBelow[1]&0x07;
   else
   playerYOffset = 0;
   nGlobalPlayerPosY = (nGlobalPlayerTileY*8) + playerYOffset;

Scrutinize handling of lists

This part is leaning towards low level optimisations, but at least the few cycles gained comes from the structure of things, so I included it.

In the beginning of the project I was trying out different kinds of optimisations to be used when traversing long lists of either objects or sprites (which meant elements would hold a 16-bit memory address). Inner loops are always important to optimize, as the instruction costs multiply fast.

“Not only should it be fast to traverse, but also adding/removing elements needed to be fast as well. Normally it is hard to get both.”

When traversing, the fastest possible structure is a plain array and a given number of elements. When adding and removing I was not sure what structure would be the fastest, so I tried out several variants like linked and double-linked lists, as well as different array variants. In the end I ended up with something I called a StackList, as the best overall solution. It consists of both a stack and a list. The list part is a plain array which is used when traversing, and the stack part is used when adding and removing elements, as it keeps track of which positions in the array that are available. As a performance measure I use 8 bit values in the stack, so the maximum size of the list/stack becomes 127. This is more than enough for my purposes. The drawback with this solution is that the list will sometimes contain NULL values (right after a removal), and hence the traversing code must do a quick check for NULL. This can be done quite efficiently and way faster than any of my other suggestions (apart from a plain array with no checking of course).

List 1: The list when initialized and empty.

List 2: 5 elements have been added in sequence.

List 3: Building on list 2 and first removing element 3, and then removing element 1.

This structure risks getting “holes” in it, but the stack ensures that any hole is filled before extending the list, which in general keeps it pretty condensed.

This Stacklist is used in the game in these places:

Sprite ticklist
Sprite sleeplist

Originally it was also used for objects in their ticklist. However, due to Ticklist and reduction of active objects these lists are now plain arrays with no NULL-checking when traversing.

Low level alert: During traversal the reading of a 16-bit value into registers, the checking for NULL and loop handling costs around 50 cycles per element (“the loop overhead cost”). Cost of adding an element is 206 cycles (which would be only 180 if the list address would be aligned to 0x0100). Removing an element costs 243 cycles. This was the fastest I could do.

Taking (risky) shortcuts in the name of performance

One may always take shortcuts in the code to optimize. There might be extra checks that should be done to make the code more robust. Almost always, robustness is the way to go. However in tight loops or on a tight cycle budget you might think otherwise. In Lilly’s Saga I’ve done this a couple of places, despite not recommended.

For example, if crates are pushed outside of the screen it will overwrite memory in other places as there is no checking for this. Instead, it became a design decision that level design should not allow this.

Furthermore, a less risky approach, a large portion of level build-up and data consistency is not checked during runtime as would be common in modern or good practice programming, but is moved into the level creation tool instead. See more in Error checking (part III).

- PROCEED TO PART III -

Under The Hood - Part II - Lilly's Saga - The stones of Evergreen