Under The Hood - Part II - Lilly's Saga - The stones of Evergreen
Part II: How it works
Confines and key numbers
Key numbers, facts and decisions
Generation | MSX |
Video Mode | PAL, NTSC |
FPS | 50/60 on scrolling + main sprites |
Audio | FM-PAC |
RAM | 64 kB |
VRAM | 128 kB |
ROM | MegaROM |
Mapper | ASCII16 |
 | Free version | Premium * |
---|---|---|
Total ROM size | 1216 kB (76 segments) | 1264 kB (79 segments) |
Music size | 896 kB (74% of total) | 944 kB (75% of total) |
Number of scores | 21 | 22 |
Music in minutes | 25 | 26 |
Number of sfx | 33 | 33 |
Game flow
Ingame mode or not
Final memory model
What came to be. After many iterations. |
VRAM arrangement
VRAM arrangement. |
How levels are constructed
Level setup in the tool
- tilemap
- tilesets
- layers
- various object-types
- custom properties on both objects and tiles.
Layer | Description |
---|---|
Comments (convenience) | Keeping track of details and maybe ramp build-ups (see more about ramps over at Chaotic Stupid) |
Cloud-placement guidance-layer (convenience) | The clouds are fundamentally built up to be placed at specific intervals in the horizontal direction. I use a helper layer to indicate which positions these are. |
256x176 guidance-layer (convenience) | This is the size of one full screen and often it is handy to easily see the boundaries of the different screens. One screen is 32 characters by 22 lines. |
64x32 guidance-layer (convenience) | This layer is normally not used, but indicates a grid which is used by the "Area-system". |
16x16 guidance-layer (convenience) | Gives you a good indication of 2x2 tiles, a block, which is fundamental to the placement of objects in this game. In later versions of Tiled, a better grid-visualization has become available, making this layer redundant. |
Backup-layers (convenience) | When testing out different types of gameplay it's very handy to move objects temporarily into other layers which your export-tool ignores. Just right-click the object and choose "move to <layer>". Typically I have this layer turned on with a high transparency setting. |
Pathways (required) | This layer holds rectangles which divide the full level into individual areas or rooms. |
Coins (required) | Coins are placed in their own layer in the level-editor, although the coins end up merged in with the background tiles in the game engine. |
Sprites (required) | All enemies are placed in their own layer. I've chosen to show sprites and sprite-boundaries as rectangles. |
Objects (required) | The various objects are placed in their own layer. I've chosen to show most objects and object-boundaries as rectangles, but some that have a static size have proper visuals. These are PlayerStartObject, clouds and crates. |
Background (required) | This is the tilemap and is placed at the bottom. |
Tiles
Tilesets
Tileset 1: This tileset is used in the outdoor themes: Mountain Road and Outside Castle. |
Tileset 2: This tileset is used in the indoor themes: Caves and Inside Castle. |
Tileset 3: This tileset is used in the outdoor theme Shoreline. This tileset is the only one with unused (old) tiles. |
Tile codes
Each tile id has an additional property-byte made up from bit-values (see also .BCM-files). The byte will hold any sensible combination of the eight following bit-properties:
Tile #64 cannot be moved into/over from any of the sides. |
Special tiles
- the ones marked with an X
- the candle-flame - tileset 2 (indoors) only
- waterline tiles - tileset 2 and 3
Level construction
Below is a sample level. Lines with arrows are objects that reference other objects.
We can easily see that it fills 3 screens when enabling the “screen”-layer:
We also have a 16x16-grid or block-layer that can be visualized:
The block-layer is effectively its own map. |
Looking closer at the objects, we see that they are placed perfectly in this grid:
The 16x16 grid is the same as the AIM – the Action-ID-map. In our current example, the various objects will get an ID, and the map will look like this:
AIM. |
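As a rough sketch of how a pixel position could be mapped onto the AIM: since each AIM cell covers 16x16 pixels, the cell coordinates are simply the pixel coordinates divided by 16. The function names, the row-major layout and the `aim_width` parameter are assumptions made for illustration and are not taken from the game's source.

```c
#include <stdint.h>

/* One AIM cell covers a 16x16 pixel block (2x2 tiles). */
#define AIM_CELL_SHIFT 4

/* Hypothetical helper: index of the AIM cell containing pixel (x, y),
   assuming a row-major map that is aim_width cells wide. */
static uint16_t aim_index(uint16_t x, uint16_t y, uint16_t aim_width)
{
    return (uint16_t)((y >> AIM_CELL_SHIFT) * aim_width + (x >> AIM_CELL_SHIFT));
}

/* Hypothetical helper: the object ID stored at pixel (x, y). */
static uint8_t aim_lookup(const uint8_t *aim, uint16_t aim_width,
                          uint16_t x, uint16_t y)
{
    return aim[aim_index(x, y, aim_width)];
}
```

For the three-screen sample level (768x176 pixels) such a map would be 48 cells wide and 11 cells high.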
Output files from level construction
Extension | Description |
---|---|
.LTM Level Tile Map | This is the tilemap and is the biggest part of the level. Most of the space in page 2 will end up occupied by this data. |
.AIM Action Id Mapping | This is a map of the same area as the level tile map, with a 16x16 px (2x2 tiles) grid which holds information of the objects' locations. |
.BCM Block Collision Map | Every tile in each tilemap has its own properties. See Tile codes to learn more about these. |
.OAP Object Action Property | The different objects have different properties. This file holds the initial values for all the objects. |
.PWS PathWay Set | This file holds a set of rectangles. Some rectangles as they are directly defined by the creator in the level as well as tool-generated rectangles. Such a rectangle informs the engine about the boundary of the current level and is dynamically changed as the player moves around. |
Pathway rects
Sketch of what kind of level paths we want in a 3-room/area level. |
Each screen is shaded to easily see their boundaries. |
10 rectangles define two rooms and one complex shaped area. |
Example level with its 11 pathway objects visualized. |
- A ➔ 1
- B ➔ X1
- C ➔ X1
- D ➔ 3
- E ➔ X2
Fundamentals for ingame visuals
Scrolling
Double buffering. |
How it looks behind the scenes. |
In the image above, sprites on the sides are made a few pixels too small and with an outline for visual clues only. |
The sprites on the sides will, in every frame, be offset in x-direction according to the same x-value as in the adjust-register.
Sprite split
In the image above, sprites on the sides are made a few pixels too small and with an outline for visual clues only. |
Page split
Getting the timing of this split correct across all the different VDPs or MSX engine / MSX system is not straightforward, but I think I managed. To achieve this I’ve had to sacrifice showing the last pixel line of the tilemap. I guess no one ever noticed.
Also at this point, the adjust register (R#18) is reset and the vertical scroll register (R#23) is repositioned.
If I hadn’t put the font in its own page, I would have had to have the font-characters as part of the other tilesets, drastically reducing the amount of available tiles in the game. This wouldn’t have worked well. I found 256 tiles quite limited already.
Sprites and objects
Anything that can be interacted with on the screen is either a sprite or an object. Some are part of a ticklist and some are not.
The ticklist is a list of objects that each have their own “run”-function, which is called once every frame. Examples of ticklist objects are MovingPlatformVerticalObject, which typically moves one pixel every frame, and AnimatedObject (like Fire), which needs to swap out tiles every x frames.
Sprites have their own ticklist, the so-called SpriteTicklist.
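A minimal sketch of what such a ticklist could look like in C, assuming each object carries a function pointer that is called once per frame. The struct layout and the names `TickObject`, `platform_run` and `ticklist_run` are hypothetical and only illustrate the idea.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct TickObject TickObject;
typedef void (*TickFn)(TickObject *self);

struct TickObject {
    TickFn  run;  /* the object's per-frame "run" function */
    int16_t y;    /* example state: vertical position */
    int8_t  dy;   /* example state: pixels moved per frame */
};

/* Example run function: move one step per frame, like a vertical platform. */
static void platform_run(TickObject *self)
{
    self->y = (int16_t)(self->y + self->dy);
}

/* Calls every object's run function once; meant to be invoked once per frame. */
static void ticklist_run(TickObject *const *list, size_t count)
{
    for (size_t i = 0; i < count; ++i) {
        list[i]->run(list[i]);
    }
}
```

Objects that are not on the list cost nothing per frame, which is the point of keeping the list short.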
The rectangle given in the level-editor marks the boundary for every object, and not necessarily the size. For example, the MovingPlatformVerticalObject may have a rectangle-height bigger than one screen height, but the bar in the platform is only 8 pixels.
All enemy sprites, except the Boss, are 16x16 pixels (i.e. just as 2x2 tiles) and any greater spritespawn-rectangle than this defines the enemy’s roaming area.
The object types
The following objects can be used directly in the level-editor:
In the image above, we see that RezzControl has an attribute called connect. This is a potential reference to another object. Several objects have one or more attributes like this. In Tiled these are shown as lines with arrows between the objects.
Object type matrix
Below is a matrix which describes certain aspects of the object types. In general, objects are passed on from the level-editor. Tick means that the object is part of the ticklist, LUT means that it is listed in the LUT, AIM means that it uses AIM, Gen means that this is a generated object from the make_level-script, Script means that the object is used in the make_level-script only, and Outside means that the object is kept outside the ingame-engine: it ends its life in the world load-module.
Object type | Tick | LUT | AIM | Gen | Script | Outside |
---|---|---|---|---|---|---|
MovingPlatformVertical | ✓ | ✓ | ✓ | |||
MovingPlatformHorizontal | ✓ | ✓ | ✓ | |||
Flag | ✓ | ✓ | ✓ | |||
Animated | ✓ | ✓ | ✓ | |||
PopIn | ✓ | ✓ | ✓ | |||
RezzControl | ✓ | ✓ | ✓ | |||
RevealObject | ✓ | ✓ | ✓ | |||
HiddenHitBlock | ✓ | ✓ | ||||
MultiReveal | ✓ | ✓ | ||||
Trigger | ✓ | ✓ | ||||
ObjectTrigger | ✓ | ✓ | ||||
Exit | ✓ | ✓ | ||||
Slowdown | ✓ | ✓ | ||||
Warp / Trampoline | ✓ | ✓ | ||||
Crate | ✓ | ✓ | ||||
LethalObject | ✓ | ✓ | ||||
PlayerStart | ||||||
SpriteSpawn (Enemy) | ✓ | |||||
GeneratedPathwayObject | ✓ | ✓ | ||||
GeneratedTicklistObject | ✓ | |||||
PathwayRectangle | (✓) | |||||
CloudObject | ✓ | |||||
Coin (not really an object) | ✓ |
All objects have a static location with the exception of Crate, which can be repositioned in AIM, and Coin which is removed from AIM once picked up by the player.
Object breakdown
Object | About |
---|---|
MovingPlatformVertical | This object (as can be seen in Level construction) is always 6 tiles wide, but has various heights. The height only depicts the bounds. The types supported in the game are called:
|
MovingPlatformHorizontal | This object is always 5 tiles wide and there is only one type of this object and it is pendular. Only the Shoreline theme uses this gameplay element. The general idea is the same as for the MovingPlatformVertical, so the width defines the bounds. The height is in practice only 8 pixels, although the object bounds are 16 pixels vertically due to the AIM architecture. |
Flag | |
LethalObject | Just an area that is lethal, meaning that the player dies as soon as it enters the bounding rectangle. |
Animated | It changes the tiles within its boundary in the tilemap every x frame. The object may or may not be lethal (like a LethalObject), but as this object type ended up being used for fire only, it is lethal. |
RezzControl | This control is connected to a sprite. It will reposition (/reuse) the connected sprite according to some timing values set. Used on the Rolling Stone and Cannonball sprites. |
HiddenHitBlock | |
RevealObject | This object is connected by other objects which potentially can visually kick off a revealed object. It will then start with a sparkle animation (hence ticklist), and end up with a given representation. It also holds information about which object it reveals, such as CHEST_POWERUP_2X, CHEST_POWERUP_WINGS, CHEST_LETTER and so on. |
MultiReveal | This makes one connect-reference into three. Made out of necessity where I needed to have three sparkles kicked off at once. Ended up being used in only one place in the game (removal of entrance to the castle). |
Trigger | Connected to a sprite. On level startup, the sprite/enemy is put in a hidden, sleeping mode. Once the player enters the boundary of this object (dark blue below), the sprite/enemy is brought into its original location and will be woken up. |
ObjectTrigger | Just like Trigger except that an object is triggered instead of a sprite. ObjectTrigger may for example show or hide a PopIn. It may be set up like this example, where orange is ObjectTrigger and white is PopIn. |
PopIn | This object may be visually on or off. This is triggered by the ObjectTrigger -object. Main attributes: Mode: Enable | Disable | Swap DefaultVisible: Enable | Disable RestoreOnKill: True | False |
Exit | Enter the boundary of this, and the game sets g_uInputMode to INPUT_MODE_AUTO_PILOT, and also sets some dedicated end-of-level values for the autopilot system. Autopilot just controls the character for a short while, doing a pre-made sequence of movements. |
Slowdown | As it looks like in Tiled. When the player is inside this rectangle the movement speed is reduced. |
Warp / Trampoline | Enter this, and g_PlayerState becomes STATE_WARP (and player control is lost for a little while). What happens next depends on the warp. In the suction-into-a-trunk variant, a singleton dual sprite for the trunk is placed on top of the trunk, and the player is animated "into" it before the reverse happens after a change of scenery. The same object can also produce the same effect without showing the occluding dual sprites. The object also supports the player being "shot out", like a trampoline. |
Crate | The crate is a very special object:
|
PlayerStart | Normally there is only one of these objects on a level, but if there is a boss room as well, there will be two. Main purpose is to hold the location of the level kick-off. |
SpriteSpawn (Enemy) | These objects will create the different enemies (a Sprite-object) during the loading of a level. Enemies will be placed at the correct place and put to sleep. |
GeneratedPathwayObject | These are special objects made by the level_tool, and are based on Pathway rects set by the level designer. They are never interacted with by the player, but inform the engine about the (changing) bounding rectangles for the viewport/camera. |
GeneratedTicklistObject | Instead of having one big global ticklist, the make_level-script splits up the bulk and generates reduced size lists for various places in the level. |
PathwayRect | Most of these rectangles are made by the designer during level creation. In addition the make_level-script will combine some of the rectangles and add to the list of rectangles used by the engine. |
CloudObject | A cloud is a set of "cloudy" patterns that moves across two lines with 16 tiles. To make sure that these patterns do not rotate out of the tiles, the correct tiles need to be placed out in correct order and at specific x-values in the level (only every fourth is allowed). The tool uses the objects to place the correct tiles into the tilemap, and the cloud object ends its life when this is done in the make_level-script. |
Coin (not really an object) | From the outset coins are only tiles placed in a coin-layer. In the make_level-script, these coin-tiles are found and copied into the level data. A volatile, special coin-id (255) is put into the AIM. This id is a special value and cannot be used as an index in the LUT. |
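The special coin id can be illustrated with a small sketch. The value 255 comes from the text above; treating 0 as "empty cell" and the helper names are assumptions made for this example only.

```c
#include <stdbool.h>
#include <stdint.h>

#define AIM_EMPTY   0u    /* assumption: 0 means "no object in this cell" */
#define AIM_COIN_ID 255u  /* special coin id; never used as a LUT index */

/* Hypothetical: if the AIM cell holds a coin, clear it and report a pick-up. */
static bool aim_try_collect_coin(uint8_t *cell)
{
    if (*cell != AIM_COIN_ID) {
        return false;
    }
    *cell = AIM_EMPTY;  /* the coin is removed from the AIM once picked up */
    return true;
}

/* Hypothetical: true if the id may be used as an index into the LUT. */
static bool aim_id_is_lut_index(uint8_t id)
{
    return id != AIM_EMPTY && id != AIM_COIN_ID;
}
```

Checking for the coin id before any LUT access keeps the lookup table free of a dummy entry for coins.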
Triggering or hitting objects
Normally, when the player moves around, the code does not check whether the player collides with some of these objects, with the exception of the following which are checked for every frame:
- Animated (fire)
- Lethal (spikes)
- Slowdown (pond)
- Trigger
- ObjectTrigger
- Exit
For other objects to be found/identified/triggered, a tile collision needs to happen first (see Environment collision and tile codes). For example, the chest tiles have collision-codes in them, so a collision will be triggered, and then the object will be found and in turn the corresponding chest-open functions will be called. Performance-wise, it is nice to avoid checking for everything every frame.
Sprite setup
The sprite looks
The main sprite sheet looks like this:
This sheet effectively produces 106 different 16x16 sprites. |
How two colors become three. |
The SAT
The sprites are meticulously set up in the Sprite Attribute Table (SAT) with these pattern names:
The SAT can only display 32 sprites at any given time. |
The SAT is placed in RAM and read and written to multiple times during a frame. The whole SAT is sent to the VDP at the beginning of every frame and when the raster is in VBLANK.
Despite having 128 kB VRAM at our disposal, we use only one SAT – and as we can see in VRAM arrangement, this is the green block in page 0. That is 128 bytes for the main SAT and 512 bytes for the corresponding color information. The color information for each sprite is kept in RAM, and sent to VRAM only when an update is needed.
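The 128 and 512 byte figures follow directly from the table layout: 32 entries of 4 bytes each for the attributes, and 32 entries of 16 color bytes (one per sprite line). A sketch of RAM shadow copies with those sizes is shown here; the type and variable names are hypothetical, and the field order reflects my understanding of the VDP's 4-byte attribute entry (Y, X, pattern, unused).

```c
#include <stdint.h>

#define SAT_SPRITES  32
#define SPRITE_LINES 16  /* one color byte per sprite line */

/* One 4-byte attribute entry. */
typedef struct {
    uint8_t y;
    uint8_t x;
    uint8_t pattern;
    uint8_t reserved;
} SatEntry;

/* The color information belonging to one sprite. */
typedef struct {
    uint8_t line[SPRITE_LINES];
} SatColors;

static SatEntry  g_sat[SAT_SPRITES];        /* RAM copy, sent every frame */
static SatColors g_satColors[SAT_SPRITES];  /* RAM copy, sent only when changed */

_Static_assert(sizeof(g_sat) == 128, "main SAT must be 128 bytes");
_Static_assert(sizeof(g_satColors) == 512, "color table must be 512 bytes");
```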
Sprite pattern table(s)
From VRAM arrangement we can also see that the game uses all (eight) pages for sprite patterns. One sprite pattern table holds 64 16x16 sprites. By using all eight of them, we get a theoretical amount of 8*64 = 512. Now, we can only have one page active at a time, but still, we can utilize this and this is how we are able to get 106 sprites available in this game.
In non-boss-rooms, the sprite patterns tables are set up like in the following table.
Here you can see that we can easily enable one animation frame on the player by just setting the page. Given that the SAT had sprite pattern 0 and 1 set for the player dual sprite entries (#18 and #19), then, moving the player rightwards would imply setting page 0, 1, 2, 3 and then repeat. If we need to change from walking to flying, we also need to modify the SAT by setting pattern 18 to value 4 and pattern 19 to value 5. Any other enemy does its animations by changing the pattern-value in the SAT only.
For performance reasons, the colors for the player sprites have been made identical across all frames, so there is no need to update color information in VRAM between frames.
All player sprites have the same color definition. |
When in boss-mode, we just replace all the dual layer enemies in VRAM and it becomes like this:
Sprite/Enemy variants
Main sprites
The “main” sprites are shown in The sprite looks and are dubbed single or dual sprites. Common to all of them is that they have a dimension of 16x16 pixels. The sprite-object has a function pointer called pTickFnc. If a sprite is awake, this function will be called once every other frame. There are many tick-functions for sprites in the code, and a few examples are:
- sprTckEnemyBulletLeft
- sprTckEnemyBulletRight
- sprTckEnemyBulletDown (bullet dropped from Cloudy)
- sprTckEnemyUniDirOneStep (shared behavior by several enemies)
- sprTckEnemyCloudy
- sprTckEnemyGhost
Sprite flags
The sprites may also come with some special flags either from the level-editor or from the spawning process. Some of these flags are:
- SPR_FLAGS_KILL_ON_BUMP
- SPR_FLAGS_ALTERNATE_ENEMY
- SPR_FLAGS_YOFFSET_AWARE
- sprTckEnemyRollingStoneAlt
Some sprites are placed on hanging bridges, and to make them look good, they need to spend some extra cycles to align to the lowered ground. We inform the code up front about this by using the SPR_FLAGS_YOFFSET_AWARE-flag.
Boss mode sprites
The code for this was somewhat of an afterthought. It isn’t overwhelmingly elegant and pretty much abuses the 16x16-system and the rental system (see below). The upper left sprite works as the “master sprite”. It directly controls the three others, the followers, as well as itself. Controlling means location and sprite patterns (the sprite colors for sprite 0 equal those for sprite 1, and similarly for sprites 2 and 3, hence colors do not need to be updated during the boss fight). The boss sprites enter the game with these flags:
- SPR_FLAGS_BOSS
- SPR_FLAGS_BOSS_FOLLOWER (only three sprites have this flag)
And to elaborate, the boss can only be hurt by stomping on its hunchback, so during the boss fight, the two upper sprites, 0 and 1, will alternate on having the SPR_FLAGS_KILL_ON_BUMP-flag set, depending on the direction of the boss.
The bats are normal 16x16 sprites with their own tick-function.
Sprites rental system and reused sprites
A level may have, say, 40+ dual-sprite enemies, and some of them would be respawning all the time. Many other sprites would also be temporarily shown during a level, such as tree-trunks, crates, exploding bricks and floating points. A certain flexibility for sprites to come and go on the screen was needed in the engine.
The VDP has a maximum capacity of 32 sprites on the screen at any given time, and the idea of sprites “renting” a slot for a limited time came up. Some slots' content would be rather static, while others would change it dynamically.
As shown in The SAT we see that all entries except the eight border masks have so-called rental-ids. In practice, this means that there are 24 visual sprite “slots” available to gameplay (note that indexes start on 1, as 0 is “illegal value” in this system).
The rental system (special: the player data is not placed in the heap). |
From the table in The SAT we can also see that colors are written to VRAM once (i.e. upfront) for all sprites but the enemies. Enemies need to update color in VRAM according to which enemy that is shown at a given moment. Because of this we keep track of a sprite’s colors via its “SpriteSetID”. No need to update VRAM with colors in a slot, if the previous enemy was identical, and VRAM already had the correct colors set.
The rental table is the source for content in the SAT. Once every frame, a routine traverses the rental table and translates each entry's 16-bit position into screen space (8-bit values).
Beware: As 216 is a special value for the Y-position, I had to make a special check to avoid it. If you place a sprite at y-pos 216, that sprite and all sprites with lower priority (higher sprite numbers) will be hidden. A good example where a magic number is harmful.
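The translation into screen space and the guard against Y = 216 could be sketched as below. How the game actually avoids the value is not described here, so nudging the sprite by one pixel is purely an assumption, as are the function names.

```c
#include <stdbool.h>
#include <stdint.h>

#define SAT_Y_SPECIAL 216u  /* special Y value that must never reach the SAT */

/* Translate a 16-bit world coordinate into an 8-bit screen coordinate.
   Returns false when the result does not fit in the 0..255 range. */
static bool world_to_screen(uint16_t world, uint16_t viewport, uint8_t *out)
{
    int32_t rel = (int32_t)world - (int32_t)viewport;
    if (rel < 0 || rel > 255) {
        return false;
    }
    *out = (uint8_t)rel;
    return true;
}

/* Assumption: moving the sprite one pixel down is an acceptable way
   to dodge the special value. */
static uint8_t sat_safe_y(uint8_t y)
{
    return (y == SAT_Y_SPECIAL) ? (uint8_t)(y + 1u) : y;
}
```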
Player movement and viewport
Player movement
Player states
At any given time, the player will be in one of 8 states. The states are as follows:
State | Explanation |
---|---|
STATE_NORMAL* | Currently standing or walking on something. |
STATE_JUMP_UPWARDS* | Currently moving upwards. The jump state ends as soon as the end of the jump curve is reached, after which the state becomes STATE_FALLING. |
STATE_FALLING* | Currently moving downwards. |
STATE_CLIMBING* | On a ladder. |
STATE_WARP | In or out of a warp. The user cannot control the heroine in this state. |
STATE_ENTER_KILLED | Actually killed, but a transition into the killed state is nice to separate some prep-work in one state, and continue with the STATE_KILLED "work" in the next frame. The user cannot control the heroine in this state. |
STATE_KILLED | Being killed animation and similar stuff running. The user cannot control the heroine in this state. |
STATE_YEAH_POSE | The user cannot control the heroine in this state. Mostly a state to tell the sceneCalcPlayerFrame function to show the correct sprite face. |
* in these states the player can always (attempt to) move left or right, but the vertical movement control varies.
Movement curves
Horizontally there are no movement curves and no inertia in this game. This is on purpose and gives a somewhat unnatural movement, but gives maximum control over the player. Vertically on the other hand, a certain gravity is added.
The jump curve is as follows (numbers are pixel additions in y-position per frame):
STATIC_PLAYER_JUMP_PATH[] = { 6, 5, 4, 3, 2, 2, 2, 1, 1, 1, 0, 1, 0, 0 };
The game has a feature that lets the player jump higher the longer he/she holds the jump-button down. In the code this is handled so that the pointer to the current location (which is at the first element) in STATIC_PLAYER_JUMP_PATH isn’t progressing while the button is held down. The duration is measured in frames, and the maximum frames for this is set to 5. I.e. if the player holds down the button for max jump, the jump-pointer will “spin” five times on the first element, and in practice become like this:
STATIC_PLAYER_JUMP_PATH[] = { 6, 6, 6, 6, 6, 5, 4, 3, 2, 2, 2, 1, 1, 1, 0, 1, 0, 0 };
For falling, things are similar:
STATIC_PLAYER_FALL_PATH[] = { 1, 0, 1, 1, 1, 2, 2, 3, 3, 4 };
In the case that the player didn’t land before the end of the fall-curve was reached, it will “spin” on the last value until landed. We see that the maximum fall speed is 4.
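To make the two curves concrete, here is a small sketch that walks them the way the text describes: the jump pointer stays on the first element while the button is held (up to five frames in total), and the fall pointer stays on the last element until landing. The element type `uint8_t`, the helper names and the reading that a quick tap spends exactly one frame on the first element are assumptions.

```c
#include <stdint.h>

static const uint8_t STATIC_PLAYER_JUMP_PATH[] = { 6, 5, 4, 3, 2, 2, 2, 1, 1, 1, 0, 1, 0, 0 };
static const uint8_t STATIC_PLAYER_FALL_PATH[] = { 1, 0, 1, 1, 1, 2, 2, 3, 3, 4 };

#define JUMP_LEN ((int)(sizeof STATIC_PLAYER_JUMP_PATH / sizeof STATIC_PLAYER_JUMP_PATH[0]))
#define FALL_LEN ((int)(sizeof STATIC_PLAYER_FALL_PATH / sizeof STATIC_PLAYER_FALL_PATH[0]))
#define MAX_HOLD_FRAMES 5

/* Total jump height in pixels when the button is held for hold_frames frames. */
static int jump_height(int hold_frames)
{
    int index = 0, on_first = 0, height = 0;
    while (index < JUMP_LEN) {
        height += STATIC_PLAYER_JUMP_PATH[index];
        if (index == 0) {
            ++on_first;
            /* "spin" on the first element while the button is still held */
            if (on_first < hold_frames && on_first < MAX_HOLD_FRAMES) {
                continue;
            }
        }
        ++index;
    }
    return height;
}

/* Fall speed for a given frame (counted from 0); spins on the last value. */
static int fall_speed(int frame)
{
    int last = FALL_LEN - 1;
    return STATIC_PLAYER_FALL_PATH[frame < last ? frame : last];
}

/* Pixels fallen after the given number of frames. */
static int fall_distance(int frames)
{
    int total = 0;
    for (int i = 0; i < frames; ++i) {
        total += fall_speed(i);
    }
    return total;
}
```

Under these assumptions a tap gives a 28 pixel jump and a full hold gives 52 pixels, which matches summing the two arrays shown above.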
Environment collision
You need proper collision to create magic movement. |
- Obstacle in front of player (left or right)?
- Standing on something or falling?
- When standing on something, how much is it offset?
- Bumping head into something?
- Standing in something lethal (fire or spikes) or something that slows down (water)?
- On, or close to, a ladder?
A variation of the above points needs to be figured out, and which ones depends on whether the player is standing still, moving horizontally, falling, jumping, climbing or standing on a moving platform.
Initially the collision checking was placed in all kinds of various places in the code. As soon as new features were added, these checks became quite costly (wrt. performance) as many of them were carried out during the same frame. The execution time for collision checking during a frame varied a lot. The idea of putting extraction of the relevant tile codes and potential objects into a common, per-frame function with a more predictable execution time, came to be. It made sense to quickly retrieve that neighbor tile data when you already had done all the heavy calculations finding data for one tile. An analysis of all needs and combinations was carried out, before a little system was designed.
Visualizing which tiles need to be checked for collision. |
The player is 16x16 pixels. If the player is perfectly aligned with the 8x8 grid in the back, the player would occupy the brightest/saturated tiles in the drawing above. Most of the time, the player is offset 1-7 pixels in both x-direction and y-direction inside the upper-left tile which is marked in red in the image above. That means that the player is occupying the medium bright tiles as well. In theory touching 9 tiles and 4 objects. In practice, and depending on the implementation you might end up checking a few more tiles/objects as you add jump speed to the position when jumping, walk speed when walking and fall speed when falling.
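The "9 tiles and 4 objects" figure can be verified with a little arithmetic: count how many 8x8 tiles and 16x16 blocks a 16x16 rectangle overlaps at a given position. The helper names are hypothetical.

```c
#include <stdint.h>

/* Number of cells of size (1 << shift) overlapped along one axis
   by a 16-pixel span starting at pos. */
static int cells_spanned(uint16_t pos, unsigned shift)
{
    uint16_t first = (uint16_t)(pos >> shift);
    uint16_t last  = (uint16_t)((pos + 15u) >> shift);
    return (int)(last - first) + 1;
}

/* 8x8 tiles overlapped by a 16x16 player at (x, y). */
static int tiles_touched(uint16_t x, uint16_t y)
{
    return cells_spanned(x, 3) * cells_spanned(y, 3);
}

/* 16x16 blocks (AIM cells) overlapped by a 16x16 player at (x, y). */
static int blocks_touched(uint16_t x, uint16_t y)
{
    return cells_spanned(x, 4) * cells_spanned(y, 4);
}
```

A perfectly aligned player touches 4 tiles and 1 block; any offset in both directions gives 9 tiles and 4 blocks.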
Below is a mapping of the kind of data that needed to be stored once up-front in each frame. The data was stored in some global variables, easily accessible to the rest of the code down the line (i.e. same frame). Note: the “Below”-designation in the matrix below may seem misleading. It just means the tiles below the ideal 16x16 rectangle.
This matrix shows which values we need to store for later and easy use in the current frame. |
The above scheme looks easy enough, but in practice it is a bit more complicated. For horizontal movement you would care for values in either the left or the right column – not both during the same frame. Initially tile codes, object-ids and tile-ids are reset, and if moving left, the x-pos will get current_walk_speed subtracted before the left column data is retrieved. In case of a rightward movement, x-position will get current_walk_speed added instead, before the data is retrieved.
It is similar in vertical direction. In case of g_PlayerState being STATE_JUMP_UPWARDS the current_jump_speed is subtracted from the y-position before retrieving the data. In case of STATE_FALLING current_fall_speed is added to the y-position.
In the actual implementation of this, there are various optimizations done. For example, if a state and/or direction dictates that some values will not be used in the current frame, they are not retrieved and stored. Say, in the case of falling and moving rightwards, upper-right and lower-right cells in the matrix will never be used.
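The look-ahead described above, where the speed is applied to the position before the tile data is fetched, could be sketched like this. The enums are reduced to what the example needs, and the function names and parameters are hypothetical.

```c
#include <stdint.h>

typedef enum { MOVE_NONE, MOVE_LEFT, MOVE_RIGHT } HorizontalMove;

/* Only the states relevant to this example. */
typedef enum { STATE_NORMAL, STATE_JUMP_UPWARDS, STATE_FALLING } PlayerState;

/* X-position to use when retrieving the left or right column data. */
static uint16_t probe_x(uint16_t x, HorizontalMove move, uint8_t walk_speed)
{
    if (move == MOVE_LEFT) {
        return (uint16_t)(x - walk_speed);
    }
    if (move == MOVE_RIGHT) {
        return (uint16_t)(x + walk_speed);
    }
    return x;
}

/* Y-position to use when retrieving the upper or lower row data. */
static uint16_t probe_y(uint16_t y, PlayerState state,
                        uint8_t jump_speed, uint8_t fall_speed)
{
    if (state == STATE_JUMP_UPWARDS) {
        return (uint16_t)(y - jump_speed);
    }
    if (state == STATE_FALLING) {
        return (uint16_t)(y + fall_speed);
    }
    return y;
}
```

Probing the position the player is about to occupy, rather than the current one, is what lets the collision data be collected once per frame and reused by the rest of the code.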
It is up to the surrounding code to make good use of this data later in the frame.
Input modes
The game was also in need of a few variants of input modes, meaning ways of handling the input for the player movement. The following modes are in play:
Input mode | Explanation |
---|---|
INPUT_MODE_NORMAL | Normal control/gameplay. |
INPUT_MODE_AUTO_PILOT | A few times, we need to take control of the heroine. The user's input has no effect at this point. For example when coming to the end of the level, walking out of the screen. Or when entering the boss room. In this case there is an autopilot system that kicks in with a state machine which does various things depending on different steps in defined sequences. |
INPUT_MODE_RECORDING | This state is just as INPUT_MODE_NORMAL but the user input is stored in RAM for use later in INPUT_MODE_PLAYBACK. This mode is not available to the players in the final game, but is used as part of the development to record scenes which are going to be demoed when the game is waiting for players. |
INPUT_MODE_PLAYBACK | Used for demoing levels. This state is just as INPUT_MODE_NORMAL but the user input from an earlier INPUT_MODE_RECORDING is fed into the engine. If the user actually touches his/her input devices, the demo is aborted. |
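A simplified sketch of how recording and playback might fit together is shown below. Storing one byte of input per frame, the buffer size, the struct and the decision to fall back to INPUT_MODE_NORMAL when a demo ends are all assumptions; the autopilot state machine is not modeled.

```c
#include <stddef.h>
#include <stdint.h>

typedef enum {
    INPUT_MODE_NORMAL,
    INPUT_MODE_AUTO_PILOT,
    INPUT_MODE_RECORDING,
    INPUT_MODE_PLAYBACK
} InputMode;

#define DEMO_MAX_FRAMES 512  /* hypothetical buffer size */

typedef struct {
    InputMode mode;
    uint8_t   frames[DEMO_MAX_FRAMES];  /* one input byte per frame */
    size_t    length;                   /* frames recorded so far */
    size_t    cursor;                   /* next frame to play back */
} InputSystem;

/* Returns the input bits the engine should act on this frame. */
static uint8_t input_poll(InputSystem *in, uint8_t device_bits)
{
    switch (in->mode) {
    case INPUT_MODE_RECORDING:
        if (in->length < DEMO_MAX_FRAMES) {
            in->frames[in->length++] = device_bits;
        }
        return device_bits;
    case INPUT_MODE_PLAYBACK:
        /* any real input aborts the demo; so does running out of data */
        if (device_bits != 0 || in->cursor >= in->length) {
            in->mode = INPUT_MODE_NORMAL;
            return 0;
        }
        return in->frames[in->cursor++];
    case INPUT_MODE_AUTO_PILOT:
        return 0;  /* user input is ignored in this mode */
    default:
        return device_bits;
    }
}
```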
Detached camera/viewport
How it used to be
Early versions of the game used to have a strict relationship between the position of the player and the viewport. As in many classical games, the player would always be centered in the middle of the screen, except when the player was close to the boundaries of the level. Example:
Rigid viewport. |
I wanted to add a smoother, more modern approach to the viewport movement and, in a way, detach it from the player.
Horizontal direction smoothness
The majority of the game’s levels are horizontally oriented, so the best experience is implemented here. I wanted the player to see more of the upcoming parts of the level, and less of the things behind: about 65% of the scene in favor of the player's facing direction, as illustrated here:
In addition, I wanted the viewport to ease-out and ease-in as shown below.
Like a rubberband. |
The mechanism behind this is something I called the “accelerometer” in the code. A bit misleading maybe, as it doesn’t really measure the acceleration, but it controls it. It’s an array of speed increments as illustrated below:
The AccelPointer is an index, stored as an unsigned char. From the outset, the AccelPointer will be placed at the midpoint (green). At this position, 0 pixels will be added to the viewport-movement during a frame, i.e. standing still. If your player moves in a direction, the game will try to uphold the ~65% visibility. When needed, it will move the AccelPointer one step in the wanted direction, either left or right. As long as the direction/movement continues (usually by holding a key down, but it could also be a moving platform that moves your player), the AccelPointer will increase its position one step per frame until it reaches the min- or max-position. It will stay here as long as the player moves (no obstacle or level boundary is met). As soon as the player stops, the viewport will (soon) reach ~65% visibility, and when that happens, the AccelPointer will seek to reach the green midpoint by moving towards it one step per frame.
The game has a small shortcoming here. The speed of the player is 2 pixels per frame, and the max speed in the accelerometer is 2 as well. Ideally the max speed of the viewport could be higher than the player speed, to be able to catch up when the player runs with the viewport lagging behind. However, optimizations in the parallax-scroll restrict the game from scrolling faster than 2 pixels per frame, so this was not feasible.
The drawing above is a bit simplified, showing only 8 steps in one direction. In the actual game we use 23 steps, but the concept is the same. Also, the real implementation needs speedier movement when the player is in full action and running around, compared to when standing still. Tests showed that fast paced viewport movement when the player was relaxing (standing still), looked stressful. To accommodate this, the game has two “accelerometer”-arrays. One fast, and one slow. The slow one is pointed to when the player is not moving. The AccelPointer is shared between the two arrays.
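The mechanism can be sketched with the simplified 8-step version. The table values below are invented for illustration (only the maximum speed of 2 pixels per frame comes from the text), and the function signature is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define ACCEL_STEPS 8                      /* simplified; the game uses 23 */
#define ACCEL_MID   ACCEL_STEPS            /* the midpoint: viewport stands still */
#define ACCEL_LEN   (2 * ACCEL_STEPS + 1)

/* Hypothetical speed tables, in pixels per frame. */
static const int8_t ACCEL_FAST[ACCEL_LEN] =
    { -2, -2, -2, -2, -1, -1, -1, -1, 0, 1, 1, 1, 1, 2, 2, 2, 2 };
static const int8_t ACCEL_SLOW[ACCEL_LEN] =
    { -1, -1, -1, -1, -1, -1, -1, -1, 0, 1, 1, 1, 1, 1, 1, 1, 1 };

/* Advances the shared pointer one step and returns this frame's viewport dx.
   direction: -1 = viewport should move left, +1 = right, 0 = seek the midpoint. */
static int8_t accel_step(uint8_t *pointer, int direction, bool player_moving)
{
    if (direction < 0) {
        if (*pointer > 0) --*pointer;
    } else if (direction > 0) {
        if (*pointer < ACCEL_LEN - 1) ++*pointer;
    } else if (*pointer > ACCEL_MID) {
        --*pointer;
    } else if (*pointer < ACCEL_MID) {
        ++*pointer;
    }
    /* the same pointer indexes both tables */
    return player_moving ? ACCEL_FAST[*pointer] : ACCEL_SLOW[*pointer];
}
```

Because the pointer only ever moves one step per frame, the viewport speed can never jump abruptly, which is what gives the rubberband feel.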
Vertical direction simplicity
The vertical part is handled differently. In this case we can move freely in 20% of the middle area. When moving out of that, the viewport moves rigidly and according to the player.
40% player facing visibility in vertical direction. |
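A sketch of such a vertical dead zone, assuming the free area is the middle 20% of the 176 pixel high view (from 40% to 60% of the height) and that the player's y-position is measured at a single reference point. The constants and the function name are hypothetical, and clamping against the level boundaries is left out.

```c
#include <stdint.h>

#define VIEW_H      176
#define ZONE_TOP    (VIEW_H * 40 / 100)  /* 70 */
#define ZONE_BOTTOM (VIEW_H * 60 / 100)  /* 105 */

/* Returns the new viewport y for a given player y (world coordinates). */
static int16_t viewport_follow_y(int16_t view_y, int16_t player_y)
{
    int16_t rel = (int16_t)(player_y - view_y);
    if (rel < ZONE_TOP) {
        return (int16_t)(player_y - ZONE_TOP);     /* follow rigidly upwards */
    }
    if (rel > ZONE_BOTTOM) {
        return (int16_t)(player_y - ZONE_BOTTOM);  /* follow rigidly downwards */
    }
    return view_y;                                 /* inside the free zone */
}
```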
Input-handling
The game vision was that this game should be fully usable by a joystick. Just like a classic Konami game or a game on a Nintendo console. I also needed the keyboard to work as many prefer playing with that, so some keyboard handling was needed. The game used two joystick buttons, called A and B. A is always SPACE, but I needed to decide something for B. As in console games, a button could mean different things depending on where you are in the game. I needed a key for Pause (F1) and Back (ESC). Joystick B-button was mapped to F1 and ESC.
Due to performance-reasons mentioned in Dedicated ISR, no H.TIMI or BIOS usage, key handling is not using BIOS. This means that you are on your own, and cannot get any help from the system to handle repeat-keys, key combinations or different keyboard layouts (see https://www.msx.org/wiki/Keyboard_Matrices - notice the Russian keyboard being really different from all the others). The first two weren’t needed in this game, so that part is easy, but the latter one can be more difficult and had to be taken into account. To easily comply with the latter I had to make sure that I use keys that are located at the same place in every known keyboard matrix out there. Arrows, space, F1 and ESC comply with this.
Once you do your design like this, you can read input data incredibly predictably (cpu cost wise) and efficiently, just a few cycles needed when going directly to the PPI ports (0xA9/0xAA) for the keyboard and the PSG ports (0xA0/0xA1/0xA2) for the joystick.
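As an illustration of how little work the decoding is, here is a sketch that turns three raw keyboard rows into game input flags. The actual port access (selecting a row via 0xAA and reading it from 0xA9) is left out so the example runs anywhere. The bit positions are given as I understand the common matrix layout and should be checked against the wiki page linked above; the `IN_*` flags and `decodeKeys` are hypothetical names.

```c
#include <stdint.h>

enum {
    IN_RIGHT = 0x01, IN_LEFT = 0x02, IN_UP = 0x04, IN_DOWN = 0x08,
    IN_A = 0x10, IN_PAUSE = 0x20, IN_BACK = 0x40
};

/* row6, row7, row8: raw bytes of keyboard matrix rows 6, 7 and 8.
   A bit that is 0 means the key is pressed. */
uint8_t decodeKeys(uint8_t row6, uint8_t row7, uint8_t row8)
{
    uint8_t in = 0;
    if (!(row8 & 0x80)) in |= IN_RIGHT;
    if (!(row8 & 0x40)) in |= IN_DOWN;
    if (!(row8 & 0x20)) in |= IN_UP;
    if (!(row8 & 0x10)) in |= IN_LEFT;
    if (!(row8 & 0x01)) in |= IN_A;      /* SPACE */
    if (!(row6 & 0x20)) in |= IN_PAUSE;  /* F1 */
    if (!(row7 & 0x04)) in |= IN_BACK;   /* ESC */
    return in;
}
```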
Animations
The animation processes vary a bit, and the different types would be:
- Update part of the tile pattern in vram. (“part pat”)
- Update the full tile pattern in vram. (“full pat”)
- Update the full tile pattern in vram, plus changing the palette entry. (“full pat / pal”)
- Update both parts of tile pattern and tile colors in vram. (“part pat, col”)
- Update by replacing select tiles in the tilemap in RAM. (“tiles”)
Water waves (type 1 - part pat)
There are two types of wave animation, the sea-animation and the pond-animation. Both are type 1 in the above list. Each animation uses two tiles, and out of eight bytes per tile/pattern, only three bytes are modified in each tile when updated. The update patterns are just pre-calculated and stored in ROM.
Candle (type 2 - full pat)
The candle animation is only to be found in tileset 2. Only one tile is used. This one is implemented as type 2 in the above list (but type 1 would have been a bit faster). The process is that we make a plain copy of the tile in RAM at startup, as well as a mirrored copy. Then the game alternates between these two.
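Making the mirrored copy boils down to reversing the bit order of each of the eight pattern bytes. A small sketch of that step, assuming a horizontal mirror (the names `mirrorByte` and `mirrorTile` are made up for the example):

```c
#include <stdint.h>

/* Reverse the bit order of one pattern byte (one row of 8 pixels). */
static uint8_t mirrorByte(uint8_t b)
{
    b = (uint8_t)(((b & 0xF0) >> 4) | ((b & 0x0F) << 4)); /* swap nibbles */
    b = (uint8_t)(((b & 0xCC) >> 2) | ((b & 0x33) << 2)); /* swap bit pairs */
    b = (uint8_t)(((b & 0xAA) >> 1) | ((b & 0x55) << 1)); /* swap neighbours */
    return b;
}

/* Build the horizontally mirrored copy of an 8x8 tile pattern. */
void mirrorTile(const uint8_t src[8], uint8_t dst[8])
{
    for (uint8_t row = 0; row < 8; row++)
        dst[row] = mirrorByte(src[row]);
}
```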
Coins (type 3 - full pat / pal)
Coins consist of four tiles and they are the only ones with type 3 above. Color index 0 is reserved for the coin animation and animates constantly between a few shades of yellow.
Bump-tiles (type 4 - part pat, col)
Only one 16x16-block is allowed to have a visual bump on the screen at any given time. It is therefore possible to replace six original tiles with six dedicated “bump” tiles in the tilemap for a short time while a type-4 animation is running on these tiles. The tiles used for this are the six X-like tiles in row 3 in all the tilemaps. Any four tiles can be copied into these bump tiles, making it very flexible, and no premade patterns are needed.
Moving platforms - vertical (type 5 - tiles)
Visually, only two rows of tiles are drawn at a given tile position inside the bounding rectangle of the object.
Moving platforms - horizontal (type 5 - tiles)
Fire (type 5 - tiles)
Sparkle (type 5 - tiles)
The tiles used for making sparkles. They are in turn put into a 2x2 grid in two slightly different sets of animations. The cannon firing and the reveal-effect, below.
Audio support
PSG for sound effects using all three channels
I used an open source PSG ayfx-replayer for a while and was, technically, quite happy with the ayfx-format. It was quite easy to understand, closely mapped to the PSG-internals and worked similarly to the VGM-format which I had become familiar with. There was an easily available editor to create and play effects for Windows (ayFXEdit) and there was a huge library available as well to get you started.
After some time it became clear to me that this replayer typically used one channel only and cut off an ongoing sfx if you chose to kick off a new one. This was likely done because that would enable users of the replayer to mix in PSG-effects with PSG-music (leaving at least two other channels for the music at any given time). This didn’t apply to this game, as the music was decided to be MSX Music only.
I had potentially many sound effects running at the same time, as the player would typically jump (➔ sfx) or land (➔ sfx) when there is a chest being revealed behind some sparkle (➔ sfx), and maybe the player was almost running out of time (➔ sfx)... and on top of this, maybe there is bullet firing (➔ sfx). And so on. There are many situations where we can easily get 3 or 4 effects running at the same time.
So, yet again, I needed to make my own solution, “ayfxplayer.s”.
I wanted to use a free channel for every new sfx, until all were in use, that is, all three. Furthermore, if there was a fourth sfx to be played, I added a priority system to the various sound effects. If all channels were in use, the algorithm would look at the channel with the lowest priority, and match up with the priority of the new sfx, and replace it (cut off). For simplicity, and maybe reusability, I wanted the replayer to be based on the ayfx-format.
Sfx-overview with Priority from the design doc. |
Sfx clip length (“Len”) and “ingame priority” (“Pri”) are mostly directly linked (marked red if not). Short clips were unlikely to be cut off (and wouldn’t matter too much if it happened), but cut-off of a longer clip was very audibly apparent, and thus got higher priority.
AyFXEdit does not store envelope data, so use of the envelope-register is “optimized away”. As the clips are so short, I didn’t bother to cater for the PAL/NTSC-situation. The only thing we need to be aware of with this solution is that the three channels share a single noise generator.
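The channel selection described above can be sketched like this. It is not the code from “ayfxplayer.s”; in particular, the rule that a new clip wins when its priority is equal to or higher than the lowest running one is an assumption, as are the names `SfxChannel` and `sfxPickChannel`.

```c
#include <stdint.h>

#define SFX_CHANNELS 3

typedef struct {
    uint8_t active;    /* non-zero while a clip is playing on this channel */
    uint8_t priority;  /* higher value = more important clip */
} SfxChannel;

/* Returns the channel to use for a new clip, or -1 when the clip is dropped.
   A free channel is always preferred. Otherwise the lowest-priority channel
   is cut off, but only if the new clip is at least as important. */
int8_t sfxPickChannel(const SfxChannel ch[SFX_CHANNELS], uint8_t newPriority)
{
    uint8_t lowest = 0;
    for (uint8_t i = 0; i < SFX_CHANNELS; i++) {
        if (!ch[i].active) return (int8_t)i;
        if (ch[i].priority < ch[lowest].priority) lowest = i;
    }
    return (newPriority >= ch[lowest].priority) ? (int8_t)lowest : -1;
}
```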
MSX Music (FM) replayer for the tunes
I had researched the possible audio-chips up front, and landed on FM - MSX-Music aka OPLL or FM-PAC. Mostly because it is widespread and because MSX2+ had chosen it as the standard for MSX going forward.
The replayer functionality. |
I didn’t want to spend more than 10% of the cycles per frame on music and ideally even less than that. After researching the replayers of various formats, I found that “streaming” the raw music data directly to the chip is the fastest you can go, so I pursued that idea. First I needed that raw data, and then I needed a tool to get that data into my own internal format. I ended up calling this format FMR, as in “FM Raw”.
This is where the VGM-format and VGM-recording software enters the scene. The composer could use the composing tool of choice, MoonBlaster in this case, and while playing the music in this tool we made sure to record this playback using openMSX, which would dump the VGM-file to disk (see vgm_rec).
Music workflow. |
The VGM-file has a bit more information than we needed, so a lot of this is stripped out and a more suitable format is used ingame. Around two thirds of the VGM-file was removed in the conversion process. In the end we ended up with FMRs from 3 kB to 168 kB, i.e. 1 to 11 segments.
By default, using VGM on an OPLL chip, you will have these issues:
- There is no “master volume” enabling volume-control or making fading easy.
- You cannot easily stop a tune, and resume at the same point.
- Usually the first frame is very cpu-heavy as all instruments are being set up.
- You need to obey special waits between each byte of data you send to the chip.
- The tempo is given by the tempo in which the tune was recorded.
- Starting at a “given” location in a tune is hard - finding the correct byte is hard, and many sounds are dependent on data that was sent to the chip somewhere earlier in the data stream.
Also, in general, the OPLL is made so that:
- You cannot query the chip to check for what values the different registers/channels contain.
The replayer ended up with these features:
- It handles files sizes bigger than 16 kB (multi segment switching).
- Segments are fully utilized, no empty data at the end of the segment because of aligning data.
- It is nicely optimized and obeys the wait-cycles described by the chip vendor, 12 and 84 cycles (see the manual).
- It supports fade-down.
- It supports looping.
- It supports “jump-in”, meaning starting mid-tune, at pre-selected places (we use this in the game, to kick-off mid-tune instead of at the start when you return from bonus-levels, chest-openings and so on).
- It plays back the music at “identical” speed in 50 Hz and 60 Hz.
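Since the chip has no master volume and cannot be read back, a fade needs a shadow copy of what the tune last wrote. The sketch below shows one way this could work for the melodic channels, based on the OPLL register layout where registers 0x30-0x38 hold the instrument in the upper nibble and the attenuation (0 = loudest, 15 = quietest) in the lower nibble. Whether the game does it this way is not stated; `volShadow`, `opllVolumeWrite` and `opllVolumeRefresh` are hypothetical.

```c
#include <stdint.h>

#define OPLL_CHANNELS 9

/* Shadow copy of registers 0x30-0x38 as the tune wrote them. */
static uint8_t volShadow[OPLL_CHANNELS];

/* Remember what the tune wanted and return the byte to send to the chip
   at the current fade level (0 = no fade, 15 = silent). */
uint8_t opllVolumeWrite(uint8_t channel, uint8_t value, uint8_t fade)
{
    uint8_t att = (uint8_t)((value & 0x0F) + fade);
    volShadow[channel] = value;
    if (att > 15) att = 15;           /* clamp at maximum attenuation */
    return (uint8_t)((value & 0xF0) | att);
}

/* Re-send a channel when only the fade level changed. */
uint8_t opllVolumeRefresh(uint8_t channel, uint8_t fade)
{
    return opllVolumeWrite(channel, volShadow[channel], fade);
}
```

In rhythm mode the upper nibble of registers 0x37 and 0x38 holds a drum volume rather than an instrument, which this sketch does not handle.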
The converter ended up with these features:
- Reduces the data size.
- Makes the stream “frame accurate”.
- Keeps track of the volume of each channel.
- Supports workaround for an infrequent bug in the YM2413-chip related to drums.
- Cuts off silence at the end of the tune (due to recording).
- Shows analysis/visualization of the data-stream.
- Produces stubs and new entry-points for starting “mid-tune” as well as loop-point.
- New-starts and loop-points were manually added in the stream by exploiting PSG-instruments, which were not used anyway.
Wolf’s tunes are often quite advanced with lots of writes to the registers every frame. All channels are also in use in most tunes. Despite this, the average cpu-spend lingers around 1000 cycles, with frequent frames around 3-4000 and the worst case around 7000.
Find more details about this under vgm2fmr.py.
Non-ingame features worth mentioning
Proportional font
Proportional font used in the intro/story. |
Using “standard” monospaced characters with a width of 8 pixels gives you only 32 characters on a line. This makes sense in tile-based screens, like screen 2 or 4, but not necessarily in screen 5. I quickly designed a proportional font, and with this I would typically get 50% more characters on a line. This is more efficient with regards to space and is more pleasing to the eye. It comes with a certain performance penalty, but it didn’t affect me in this game. The penalty occurs because my characters, including space, would sometimes have odd pixel widths, and this means that you cannot use the “high speed” VDP copy commands which work on bytes, so you need to use the slower, logical commands that work on pixels (and in screen 5 each byte holds 2 pixels).
The way it is done in this game, is that the characters in the font sheet are placed out in an 8x8 pixel grid. Just as would be normal for tilemaps in for example screen 2 and screen 4. This makes it easy to calculate the upper left corner of the source location for a character. In addition I just read out the width from a simple table of bytes stored in ROM. The height is set to a static 8 pixels. The text-drawing algorithm adds one pixel of space between every character. This solution enforces a restriction on the width of the characters or symbols, and that is that the width of a given symbol cannot exceed 8 unless the character next to it will not be used.
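The source lookup and the advance per character can be sketched as below. The sheet is assumed to be 256 pixels wide (32 cells per row), and the character index, the `originY` parameter for choosing a font and the function names are illustrative, not taken from the game.

```c
#include <stdint.h>

#define CELL          8    /* every glyph sits in an 8x8 cell */
#define SHEET_COLUMNS 32   /* assumed: 256 pixel wide sheet / 8 pixel cells */

typedef struct { uint8_t x, y; } SrcPos;

/* Upper left corner of a glyph in the font sheet.
   originY selects one of the fonts stored in the same sheet. */
SrcPos glyphSource(uint8_t index, uint8_t originY)
{
    SrcPos p;
    p.x = (uint8_t)((index % SHEET_COLUMNS) * CELL);
    p.y = (uint8_t)(originY + (index / SHEET_COLUMNS) * CELL);
    return p;
}

/* Pixel width of a string of glyph indices: each glyph's width from the
   table plus one pixel of spacing between neighbouring glyphs. */
uint16_t textWidth(const uint8_t *indices, uint8_t count, const uint8_t *widths)
{
    uint16_t w = 0;
    for (uint8_t i = 0; i < count; i++)
        w = (uint16_t)(w + widths[indices[i]] + 1);
    return count ? (uint16_t)(w - 1) : 0;
}
```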
The screen 5 font file, stored in a non-visible page. |
From the font file we see that there are four fonts stored. Only the first one has numbers. The system distinguishes them by having a different “origin” or offset for each.
The (A) and (B) symbols have widths of 9 and 10 and the “v^ SCROLL - (A) BACK” has a width of 76.
Slot selection
With 5 populated slots. |
Slot selection mimics the behavior we always see in console games. This is not often seen on the MSX. The hefty performance-requirements ingame due to full-screen scrolling were not present in this particular screen and we could therefore improve the visuals somewhat, by using screen 5. Finally we got rid of the requirement of only 2 colors per 8 pixels. First, the backdrop was taken from the game in screen 4, and converted to screen 5, and then we started adding elements with higher density of colors. A slot gets its graphics from some special theme thumbnails as well as some symbols from the font file.
The background image for the slot selection. |
The theme-thumbs are stored in a non-visible page. |
At some point I also got the idea of making a press-down 3d-effect which made sense to explore in this screen. This is done by offsetting/moving an image or rectangle by (+1,+1). You can also see that some fonts and symbols in the font file in the previous chapter have shadows as well as no-shadows for pressed down/up states.
3d-effect in action. |
The VDP commands are really slow, so to get this to look decent, I had to prepare the different buttons and slots in a non-showing page (at runtime at selection change), as can be seen in the VRAM-dump below. The results are then copied over to the page currently being shown in one copy-command at a time.
How the backbuffer (page 3) looks at runtime. The visual noise is sprite-information from screen 4’s page 7 and 8. |
To be able to write text in both black and white, using the same font file, I made sure to set up the palette properly and use the logical operations for copy-commands found on the v9938. The TEOR-operation does an XOR with the target color for all source pixels != 0, i.e. color 0 in the source (font file) is “transparent”. This gives a different result when using TEOR on the white background. Yes, the level flag, coin and heart are also using TEOR. The theme bitmap is not. TEOR comes with a performance penalty, so I don’t use it more than strictly necessary.
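What TEOR does per pixel can be written out in a few lines. The example assumes, purely for illustration, that palette entry 0 is black, entry 15 is white and the font is drawn in color 15; the actual palette setup in the game is not shown here.

```c
#include <stdint.h>

/* One 4-bit screen 5 pixel: source color 0 leaves the destination untouched,
   any other source color is XORed onto the destination color. */
uint8_t teorPixel(uint8_t dst, uint8_t src)
{
    return src ? (uint8_t)((dst ^ src) & 0x0F) : dst;
}
```

With these assumptions a font pixel of color 15 becomes 15 (white) on a color 0 background and 0 (black) on a color 15 background, while the empty parts of the glyph cell leave the background as it was.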
Slideshow
To present the story I wanted something simple. Something clean, subtle and not over the top, technically. I did not want the presentation to be more technically advanced than the game itself.
Showing two "slides" from the intro / main story |
Slideshow was needed at:
- The intro / main story.
- After each boss when a stone has been found.
- End of game, game completed.
The key concepts were:
- Scriptable.
- Typewriter-effect.
- Support images.
- Fade in/out between slides.
- Crossfade from one palette to another.
- Ability to abort.
- Should have the same duration when run in PAL as in NTSC due to music sync.
- Support for various/custom waits.
- Simple - presentation (meaning just a few different slide “templates”).
- Simple - technically (no stress on optimization or code structure).
C source code snippet. For … lazy reasons the scripting was part of the source file. |
As screen 5 images are 32 kB at full size, the images are split into 2 parts (for example ROMLOC_STORY_3_A and ROMLOC_STORY_3_B above). They are unpacked into memory and uploaded to VRAM in turns.
Newline (\n) is supported and there is also support for special “commands” which trigger at the time it is supposed to be typewritten to the screen:
~ : kick off crossfade
<1-9>: wait half-seconds, i.e. to get 3 seconds wait, you must state 6
0 : wait 2 PAL frames (needed for fine tuning)
* : wait 28 PAL frames (needed for fine tuning)
A possible shortcoming of the solution is that the duration of each slide depended on:
- the amount of text (each typewriter print spends x frames)
- the specified delays added in the script
This meant that changing the wording on a “slide” suddenly changed the duration of the slideshow, unless you adjusted waits/delays accordingly. As we needed to synchronize the slides with the music, all this was a tad cumbersome, but it worked out well in the end.
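To see how text length and waits add up, the duration of a slide can be estimated with a small helper like the one below. It follows the command list above, counted in PAL frames (a half-second is 25 frames). That the crossfade command and newlines cost no time, the per-character cost `framesPerChar` and the function name are assumptions.

```c
#include <stdint.h>

#define HALF_SECOND_PAL 25   /* PAL frames in half a second */

/* Estimated duration of one slide script in PAL frames. */
uint16_t slideDurationFrames(const char *script, uint8_t framesPerChar)
{
    uint16_t frames = 0;
    for (; *script; script++) {
        char c = *script;
        if (c == '~' || c == '\n') {
            /* crossfade kick-off and newline: assumed to cost no time */
        } else if (c == '0') {
            frames = (uint16_t)(frames + 2);
        } else if (c == '*') {
            frames = (uint16_t)(frames + 28);
        } else if (c >= '1' && c <= '9') {
            frames = (uint16_t)(frames + (c - '0') * HALF_SECOND_PAL);
        } else {
            frames = (uint16_t)(frames + framesPerChar); /* typewritten character */
        }
    }
    return frames;
}
```

Running such an estimate over every slide after a wording change would show right away how much the waits need to be adjusted to stay in sync with the music.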
Open chest screens
How it looks when you open a chest with the 2x token. |
Despite having a megarom available, I’ve been trying to be conscious of memory usage in several places in the code. The chest open has 8 almost identical screens. The screen is composed of a big base image and one smaller showing the item.
The base image. |
It’s been hard to find documentation on image formats on the MSX. There seems to be some raw, uncompressed screen 4 (.sc4) and screen 5 (.sc5) formats in use here and there, but I failed to find proper documentation on them. They also seemed to always be fullscreen, so I baked up my own simple sc5-format which has a width and height in the header.
Example 1: The 2x token. |
Example 2: The wings power up. |
Example 3: The pouch pick-up. |
The base image uses the first 9 colors of the palette (they are static/shared and part of all images), while the remaining 7 can be used freely by the other, small image(s). At runtime, the palette used for each chest open screen is taken from the small content image.
Boss
Boss and his bat up close. |
Contrary to many games where the boss is the ultimate encounter, I never wanted this part to be the highlight of the game. My focus was on the “other” part of the game. The other 95% where the gameplay was different and where the player spent most of his/her time. Still, I needed a boss to round off the world and to adhere to the story. We needed a non-scrolling screen / room, inspired by games like Castlevania and similar games. It also made sense to make the boss bigger than the player. 4 times bigger was the maximum of what we could do, given that we needed animation sprites as well.
A boss would need its own behavior, which meant custom code, as well as custom graphics and animations. With limited gfx-resources we settled with the idea that you meet Abaddon four times, where he escapes the three first times. He tricks you by offering a false white flag 🙂
Down and beaten, but not dead. Recycling FTW. |
As this is a non-violent game, stomping is still the way to fight. To inflict damage the player needed to stomp Abaddon’s weak point, the hunchback, i.e. hitting (“falling” from above) one of the four boss sprites only. If you come near any of the other three sprites, you lose a life. See Boss mode sprites about how this is set up. As a result of this setup, the sprite 0 (the main sprite in the upper left corner) and sprite 1 alternates getting the SPR_FLAGS_KILL_ON_BUMP-flag set according to the direction of the boss.
Abaddon’s symbol was a bat, so his weapon had to be bats. These cannot be stomped or killed (exception is during the first 3 seconds after a player's resurrection when she is blinking). The sprite rental system allows for 6 dual sprites or enemies on the screen at once, and as 4 are taken by the boss, only 2 are left for bats.
Performance wise, adding in a lot of custom code for the boss was not a problem. As no scrolling was being performed, there were lots of cycles available every frame. I could also write it all in plain C, which is way more effective for this kind of programming.
Boss movement
I wanted the boss to walk towards a point and against the player, to do a small jump and to spawn bats by charging. I also wanted him to stand and think a bit before doing his next move. This ended in a state machine as described below. The main state is BossAttackState and sub-state is BossState.
BossAttackState possibilities:
- BOSS_ATTACK_STATE_FROZEN - Relax before battle starts.
- BOSS_ATTACK_STATE_THINKING - Waiting, may turn at the end of wait, may decide to enter charging or attack.
- BOSS_ATTACK_STATE_ATTACKING - Moving towards player (has a target pos to reach).
BossState possibilities:
- BOSS_STATE_NORMAL - Nothing happens.
- BOSS_STATE_MOVING - Moving and will enter jumping when target pos is reached.
- BOSS_STATE_JUMPING - Will soon enter falling.
- BOSS_STATE_FALLING - Will enter normal state and start thinking at the end of falling.
- BOSS_STATE_CHARGING - If thinking ended in charging, bat spawns.
- BOSS_STATE_POST_CHARGE - A small delay after spawning bat, enter moving after this (and turn if needed).
Boss parameters per world:
- HitAmount: Amount of stomps before defeated.
- ThinkMaxCount: Number of frames thinking (standing still).
- WeaponChance: After thinking, a random number, val, in range [0,127] is generated. Weapon triggers if val is within [0,WeaponChance].
- MaxConcurrentBossWeapons: How many possibly present on the screen at once.
World | Hit Amount | Think Max Count | Weapon Chance | Max Concurrent Boss Weapons |
---|---|---|---|---|
1 | 3 | 60 | 0 (N/A) | 0 |
2 | 5 | 60 | 80 | 1 |
3 | 5 | 60 | 80 | 2 |
4 | 7 | 20 | 128 | 2 |
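The table maps naturally onto a small array of structs in ROM. The sketch below also shows the weapon roll as described for WeaponChance. Gating the roll on MaxConcurrentBossWeapons is my reading of the table (world 1 never spawns bats); the struct layout and the name `bossWantsWeapon` are assumptions.

```c
#include <stdint.h>

typedef struct {
    uint8_t hitAmount;                /* stomps before defeated */
    uint8_t thinkMaxCount;            /* frames spent thinking */
    uint8_t weaponChance;             /* weapon triggers if val <= weaponChance */
    uint8_t maxConcurrentBossWeapons; /* bats allowed on screen at once */
} BossParams;

/* Values from the table above, one entry per world. */
static const BossParams bossParams[4] = {
    { 3, 60,   0, 0 },
    { 5, 60,  80, 1 },
    { 5, 60,  80, 2 },
    { 7, 20, 128, 2 },
};

/* val is the random number in [0,127] drawn after thinking,
   activeWeapons the number of bats currently on screen. */
uint8_t bossWantsWeapon(const BossParams *p, uint8_t val, uint8_t activeWeapons)
{
    if (activeWeapons >= p->maxConcurrentBossWeapons) return 0;
    return val <= p->weaponChance;
}
```

With a WeaponChance of 128, every possible value of val triggers the weapon, so in world 4 the boss always charges when a bat slot is free.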
Bat movement
I went for a super simple path and simple logic for the bats: face the same direction as the boss when spawned with the same speed upwards, which means that bats always move in 45 degrees. Bounce when hitting the wall. After a maximum of 6 bounces, the bat flies out of the screen, and can be re-spawned.
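A bat tick following this logic might look like the sketch below. The room limits, the off-screen margins and the names are hypothetical, and counting a corner hit as two bounces is a simplification of the example.

```c
#include <stdint.h>

#define ROOM_LEFT     0
#define ROOM_RIGHT  240   /* hypothetical wall positions */
#define ROOM_TOP      0
#define ROOM_BOTTOM 160
#define MAX_BOUNCES   6

typedef struct {
    int16_t x, y;
    int8_t dx, dy;        /* same magnitude, so the path is always 45 degrees */
    uint8_t bounces;
} Bat;

/* Advance one frame. Returns 1 while the bat is in play and 0 once it has
   left the screen and its slot can be re-used for a new spawn. */
uint8_t batTick(Bat *b)
{
    b->x = (int16_t)(b->x + b->dx);
    b->y = (int16_t)(b->y + b->dy);
    if (b->bounces < MAX_BOUNCES) {
        if (b->x <= ROOM_LEFT || b->x >= ROOM_RIGHT) {
            b->dx = (int8_t)-b->dx;
            b->bounces++;
        }
        if (b->y <= ROOM_TOP || b->y >= ROOM_BOTTOM) {
            b->dy = (int8_t)-b->dy;
            b->bounces++;
        }
        return 1;
    }
    /* Out of bounces: keep flying straight until fully off screen. */
    return b->x > -16 && b->x < 256 && b->y > -16 && b->y < 192;
}
```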
Visual effects
When entering the boss room I added a full screen shake effect. This is easily made possible using the VDP screen adjust registers, the same being used for the smooth horizontal scrolling. A dedicated “slam door” sound effect is played at the same time. The screen shake is also used on the final blow on the boss, and a reduced variant is used when the boss spawns a bat.
Entering the boss room in 50% speed showing screen shake. |
When Abaddon charges and spawns a bat, I added full screen blinks, which are supposed to illustrate lightning occurring. This is achieved by changing the black palette entry for a few frames. I kick off a lightning sound effect at the same time.
Lightning effect when spawning bats at 25% speed. |
When you have finally beaten Abaddon, he dissolves, as can be seen below. This effect is used in one place only in the whole game. The effect is quite costly, but there isn’t much else going on at the time, so that works fine within each frame after all. The algorithm uses an array of pre-calculated masks in ROM which I made using Google Sheets or something. The algorithm runs through all 144 boss 8x8 sprite patterns and applies a mask onto the original pattern and uploads to VRAM. After 64 runs, all pixels are masked out. Boss sprites are originally 16x16 but can also be treated as 8x8.
Boss finally beaten and facing dissolvement. Rendering at real time speed. |
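The principle of the dissolve can be sketched as follows. Where the game reads pre-calculated masks from ROM, this example computes the pixel to clear on the fly with a simple formula, so the pixel order shown here is hypothetical. It only demonstrates that 64 runs, one pixel each, empty an 8x8 pattern.

```c
#include <stdint.h>

#define DISSOLVE_RUNS 64   /* 8x8 pixels, one pixel removed per run */

/* Hypothetical pixel order: 37 is coprime to 64, so the 64 runs visit
   every pixel of the 8x8 pattern exactly once. */
static uint8_t dissolvePixel(uint8_t run)
{
    return (uint8_t)((run * 37u + 11u) & 63u);
}

/* Clear the pixel selected for this run in one 8x8 pattern
   (8 bytes, one byte per row, bit 7 = leftmost pixel). */
void dissolveApply(uint8_t pattern[8], uint8_t run)
{
    uint8_t pixel = dissolvePixel(run);
    pattern[pixel >> 3] &= (uint8_t)~(0x80u >> (pixel & 7));
}
```

In the real effect the same step would be applied to each of the 144 patterns per run before uploading them to VRAM.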
Full screen text scroll (options)
I was in need of a simple text scroll for game info, credits and such. Going smooth was a requirement of course, so the decision landed on screen 5 and use of the screen hard scroll register, VDP R#23. In a screen in the options section, not much other than music playback is going on, so I decided to write functions in C to handle this scroll. For everything except the low level proportional font “blitting”, that is.
The basic idea is that the VDP-register(#23) can get any value (offset) from 0 to 255, which means that the contents of every VRAM address in the current page can risk being shown as pixels/colors at a given point. Thus, the sprite data in this page could not be active.
The algorithm uses a slightly modified variant of the normal text drawing routine described in Proportional font. The difference is that the height of the source bitmap is 1 vs 8 in the original routine. And when calling it, an y-offset is added to the source’s y-pos. The result is that it only “prints” a thin slice of each character, and this makes it fast enough for a 60 FPS scroll.
With an offset of 3, only this part of the character ‘R’ is copied from the tileset. |
The y-offset will always be in the range [0,7].
With this functionality, we are able to scroll one pixel and also “print” one slice of each character in the text line at the same time. When scrolling upwards, this slice-printing happens right “under the border” at the bottom of the screen, and when scrolling downwards this printing happens right “under the border” on top of the screen. This printing happens after first clearing that thin area/line with a fast and black rectfill.
Making "smooth execution" possible
Never exceed frame time
I wanted the game to run smoothly. Particularly ingame. That means that I need full control over what is done in one frame. The callstack is like this:
sceneLoop() // stays in this function while mode is "ingame"
sceneRun() // called once per frame
The sceneRun function is the function that should not exceed a frame. That is, the cycle-spend should not exceed these numbers:
- PAL (“50 FPS”): 71364 cycles (3579545/50.159)
- NTSC (“60 FPS”): 59736 cycles (3579545/59.923)
This is given by Z80 running at ~3579545 Hz (or so) and PAL running at 50.159 Hz and NTSC running at 59.923 Hz.
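The two budgets are simply the clock frequency divided by the frame rate, rounded to the nearest cycle. A tiny helper makes it easy to redo the calculation for other frame rates (`cyclesPerFrame` is an illustrative name):

```c
#include <stdint.h>

#define Z80_HZ 3579545.0   /* Z80 clock frequency on the MSX */

/* Cycles available per frame at the given frame rate, rounded to nearest. */
uint32_t cyclesPerFrame(double frameRateHz)
{
    return (uint32_t)(Z80_HZ / frameRateHz + 0.5);
}
```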
Visualizing where the time is spent
By using the good old “color-the-border” we can see where our time is spent, and that we are within our bounds.
Showing cycle spend with openMSX 18 in NTSC mode on a Panasonic FS-A1GT with speed set to 2%. |
BLACK | Waiting - except the first black part after the first interrupt (=sceneRun start). |
RED | Interrupt (marked) and parallax. |
CRIMSON | Mixed visuals: Time, timers, VRAM-time, VRAM-coin, VRAM-score + set ticklist. |
LIGHT YELLOW | Bump bricks. |
YELLOW | Handle objects (scene tick, visibility). |
ORANGE | Place player. |
BLUE | Handle sprites - visibility, sprite tick, put sprites in RAM (sprites 1). |
DARK BLUE | Putting sprites to sleep (sprites 2). |
GREEN | Sprite collision & awake sprites (sprites 3). |
LIGHT GREEN | Animations: coin, candle. |
BROWN | Set player frame and sprite pattern page. |
DARK BROWN | Pickup coins. |
PETROL | Handle input. |
GRAY | Top: Viewport handling. Below line interrupts: Tiles➔VRAM (2 parts). |
WHITE | Audio (first sfx, then music). |
The red lines that come from the interrupt always happen on the exact same y-position.
From the colors we can see that functions vary in their duration and that they do not necessarily run every frame.
This animation is from a debug build, but it is quite representative. The C-compiler can be run with optimisation settings which will make a few of the code paths slightly faster. The final ROM is compiled using this. In addition, the code to change the color of the border comes with a small cost, which of course is not included in the final build. And in debug there are a lot of extra shortcuts available, so the petrol-colored “Handle input” will be much smaller/faster in the released version.
When doing visuals like this, make sure you test different machines, as there are different performance numbers amongst the different MSX-models on the market. This is mostly due to different, so-called “wait-states” in the different MSX-Engines or MSX-System chipsets. There are 0, 1 or 2 waitstates added (see https://www.msx.org/wiki/Toshiba_T9769#Differences_between_revisions).
Comparison using openMSX of how the different wait-states affect the execution duration, singling out the tile-”blitting”. The first gray block is 612 OUTs and OUTIs. |
Dedicated ISR, no H.TIMI or BIOS usage
Last time I hooked up the profiler on the default, “empty” interrupt, it showed ~5000 cycles on a Philips NMS 8255 and ~2000 on a Panasonic FS-A1GT. That is a lot of cycles. Cycles I’d rather use myself. I planned to handle keyboard routines myself and I did not use disk I/O or any other things from the system. In such a case, I gathered that it is better to just hook up my own ISR at 0x0038 using IM1.
Aside from a call or two at startup I also stayed away from using the BIOS altogether. The reasons for this were:
- Having your running code in page0 makes it complicated to get efficient BIOS usage (which also resides in page0).
- Because of the custom ISR and my need to fully control when DI and EI are set.
- My need to have full control over all the cycles spent.
When going rogue like this you also miss system support for extensions that may use non-standard ports, “external” or alternate ports. For example, the NEOS MSX2-upgrade uses ports 0x88-0x8B instead of the normal ones at 0x98-0x9B. Lilly’s Saga decided to not support external ports. To my knowledge the install base of external VDPs is very small.
Craft your levels to perfectly match your engine
Several things have been carefully considered when making the levels, things like:
- Number of enemies in one location: Maximum dual-sprites in rental is 6. From testing I know that 6 at the same time is pushing the cycles. Locations with maximum sprites can have no or very few tick-objects.
- No enemy-enemy collision is implemented, so to avoid exposing this, I avoid multiple enemies on the same spot. Exceptions are ghosts and cannonballs thrown from Cloudy.
- Avoid more than 2 enemies per line to avoid flicker (not about performance though).
High level and structural optimizations
Spreading work over multiple frames
Splitting the work into sensible chunks and running them at specified times or frequency, can be done in different ways. Ideally each function has a predictable running time as well.
Frame load balancer
This idea takes into account that some things, like visuals, must happen every frame, while others can run less often: parallax runs only every second frame, updating the visual timer on screen runs every fourth frame, and the palette needs only a visit every eighth frame. As a fundamental mechanism I have a counter called jiffy which counts up, first thing in every frame. This number can easily be used for spreading out the work.
To be able to set up a sensible spread of work across frames you will need to use the profiler on the various functions to see their cost, as well as making a call on how often they should run. A visual representation of how this works in Lilly’s Saga, is shown below.
As an example, the cost of running frame number 9 will only involve the marked functions.
If you are really tight on cycles and want to cram in as much as possible, this work may end up comprehensive and advanced. I had to make some spreadsheets to play with different scenarios and to get it optimal.
In effect, this is what you see in the animation in the previous chapter where the colors are jumping up and down, and are only present in some frames. For example, the light yellow color happens only every fourth frame (after line interrupt 2) and is running the bump-brick code. As the bump-brick code uses double-buffering, the visual effect of this code is shown in the succeeding frame.
In practice, it won’t end up looking too nice in the code. In my sceneRun-function I have lots of tests that look quite ugly. Flow can be hard to follow in code, but it’s working and part of a plan. Cost wise, in the grand scheme of things, a test like this is really cheap.
Large number of tests pollute the code in sceneRun (the framecounter variables are made from jiffy). |
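In its simplest form, such jiffy-based tests can be sketched like this. The frequencies are taken from the text above, while the phase offsets (which frame in the cycle each task lands on), the `TASK_*` flags and `tasksForFrame` are made up for the example.

```c
#include <stdint.h>

enum {
    TASK_VISUALS  = 0x01,  /* every frame */
    TASK_PARALLAX = 0x02,  /* every 2nd frame */
    TASK_TIMER    = 0x04,  /* every 4th frame */
    TASK_PALETTE  = 0x08   /* every 8th frame */
};

/* Decide which work runs in the frame with this jiffy value. The phases are
   chosen so that no two of the periodic tasks ever share a frame. */
uint8_t tasksForFrame(uint8_t jiffy)
{
    uint8_t tasks = TASK_VISUALS;
    if ((jiffy & 1) == 0) tasks |= TASK_PARALLAX;
    if ((jiffy & 3) == 1) tasks |= TASK_TIMER;
    if ((jiffy & 7) == 3) tasks |= TASK_PALETTE;
    return tasks;
}
```

Using powers of two for the intervals keeps every test down to an AND and a compare, which is why the cost of the tests themselves is negligible.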
Hangover-frame
The hangover-frame kicks in exactly when you bump your head into something. |
The hangover frame happens when you hit something with your head, and a series of things needs to be carried out. As the name (almost🙂) implies, work is carried over from one frame to another. The work can be blinking tiles, kicking off animated tiles (which in turn should kill an enemy) and/or kick off some sound effects or other stuff. Trying to put all this into the current frame which is already pushed on cycles proved difficult. However, the placePlayer-function which does the player movement and collision detection (which is a very costly function as well) can be ignored for one frame at the exact point where the player hits something (!).
In this case, the player is changing from moving upwards to moving downwards, so we just allow the player to hang in the air for one frame, while doing the other stuff mentioned earlier. No one will notice that there is no player movement in that frame. I have attempted to illustrate this in the outstanding piece of art found below.
The above is run every frame as part of sceneRun. Skipping the heavy PlacePlayer if a “HangOverFrame” is pending from the previous frame/sceneRun. Lilly will be “hanging in the air” for one frame. |
The only risk is that people could sense that the effect is coming too late. Maybe some high-sensitivity people notice that the sfx is coming in a bit too late? Well, I have accepted that risk.
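Reduced to its core, the hangover logic is a single flag checked once per frame. In the sketch below the real work is replaced by counters so the flow can be followed; all names except the idea of a pending hangover frame are hypothetical.

```c
#include <stdint.h>

static uint8_t  hangOverFrame;     /* set when a head bump is detected */
static uint16_t placePlayerCalls;  /* counters stand in for the real work */
static uint16_t deferredCalls;

static void placePlayer(void)
{
    placePlayerCalls++;
    /* the real collision code would set hangOverFrame = 1 on a head bump */
}

static void runDeferredBumpWork(void)
{
    deferredCalls++;  /* blink tiles, start tile animation, kick off sfx ... */
}

/* Called once per frame: either the heavy player movement or the work
   carried over from the bump, never both in the same frame. */
void playerStep(void)
{
    if (hangOverFrame) {
        hangOverFrame = 0;
        runDeferredBumpWork();
    } else {
        placePlayer();
    }
}
```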
Player kills enemy - delayed work using states
When an enemy is killed there is also a lot of work involved, like:
- The state of the enemy changes.
- The enemy starts moving on a path out of the screen.
- We are kicking off a sound.
- We are adding points.
- We are probably also putting the player onto a bouncing path.
This work can easily be spread over multiple frames using states on the particular enemy:
- (SPRITE_MODE_NORMAL)
- SPRITE_MODE_ENTER_KILLED
- SPRITE_MODE_KILLED
When I detect an enemy kill, I only change the state to SPRITE_MODE_ENTER_KILLED, change the pointer to the run-function and then return. All the rest of the stuff is handled in succeeding frames.
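A sketch of this pattern with a run-function pointer per sprite follows. The mode names are the ones listed above; the struct layout, the function names and the `workDone` marker are invented for the example.

```c
#include <stdint.h>

typedef enum {
    SPRITE_MODE_NORMAL,
    SPRITE_MODE_ENTER_KILLED,
    SPRITE_MODE_KILLED
} SpriteMode;

typedef struct Sprite Sprite;
struct Sprite {
    SpriteMode mode;
    void (*run)(Sprite *);  /* called once per frame for this sprite */
    uint8_t workDone;       /* stands in for sfx, points, player bounce */
};

static void runKilled(Sprite *s)
{
    (void)s;                /* would move the enemy along its exit path */
}

/* Runs in the frame after the kill was detected and does the heavy part. */
static void runEnterKilled(Sprite *s)
{
    s->workDone = 1;
    s->mode = SPRITE_MODE_KILLED;
    s->run = runKilled;
}

/* Called at the moment of the stomp: only flips state and run pointer. */
void onEnemyStomped(Sprite *s)
{
    s->mode = SPRITE_MODE_ENTER_KILLED;
    s->run = runEnterKilled;
}
```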
No dynamic object creation during gameplay
In the initial design for this game and the rental system, the sprites were dynamically created and deleted on the fly. For example, the exploding bricks and the floating points could be typical examples of short-lived objects. However, over time, I found that I could not afford creating and deleting objects on the fly. I wrote a super-efficient memory manager, “pre-populated” template objects in memory, wrote code in asm and managed to get it fast. But it was not fast enough. And it was unpredictable. As the creation or deletion could occur at any time in a frame, it could accidentally happen at “the wrong time”, causing the frame to exceed the cycle budget.
I ended up with this:
- Many objects are pre-created, singular re-usable sprites (exploding bricks, floating points, halo, tree-trunk, and crate).
- All enemies are spawned/created at level load and then put to sleep.
Creating all sprites up front naturally demands enough heap memory and does not scale to bigger games or levels. For this game though, it was all part of the plan.
Ticklist and reduction of active objects
A level will typically have around 50+ objects, and maybe half of these are ticklist-objects, which means that an object has its own “run”-function. Running 25 objects every frame will be very heavy for the poor Z80. We want to only do the run on the objects that are visible.
Doing visibility testing on all the objects on the level all the time is also very costly. So we also want to reduce the number of objects that we iterate on at any given time.
The location of any object that is part of the ticklist (i.e. ticklist-object) is always static. We use this fact to premake lists of objects for the different parts of the level. The game has a concept of a screen (Ref. Level construction). In the level-creation-script I generate a list of ticklists for every screen in the level.
Huge CPU savings by using the technique. |
In the illustration above, we have 7 screens. The player is currently tied to screen 6 because the viewport’s midpoint is in this screen. The ticklist for any screen contains the objects inside that screen plus those within 50% of a screen in each direction, horizontally and vertically (only a horizontal level is shown here for simplicity, but this works identically for levels with a vertical span). For screen 6 above, the ticklist’s contents are:
- 4 moving platforms (gray)
- 2 reveal objects (pink)
This means that we only need to iterate over 6 objects instead of 25 at runtime. The algorithm in sceneTick() is like this:
for each object in the ticklist of the current screen:
    if object.boundary intersects viewport:
        object.run()
For the example above, we end up calling the run-function on 4 objects: 2 moving platforms and 2 reveal objects.
Across all the levels in the game, the maximum number of objects in the ticklists for the levels varies between 3 and 9.
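A rough C sketch of the per-screen lookup, assuming the ticklists are exported as one small array of object pointers per screen. The type names, field names and the countRun stand-in are hypothetical; only the "pick the current screen's list, test against the viewport, then run" flow is taken from the text.

```c
#include <stdint.h>

/* Axis-aligned box in pixels; x1/y1 are exclusive. */
typedef struct { int16_t x0, y0, x1, y1; } Rect;

typedef struct Object {
    Rect    boundary;
    void  (*run)(struct Object *);
    uint8_t runCount;              /* only here to make the sketch observable */
} Object;

/* One pre-made list per screen, generated offline by the level script. */
typedef struct {
    Object **items;
    uint8_t  count;
} TickList;

/* Stand-in run-function for the example. */
static void countRun(Object *o)
{
    o->runCount++;
}

static uint8_t intersects(const Rect *a, const Rect *b)
{
    return (uint8_t)(a->x0 < b->x1 && b->x0 < a->x1 &&
                     a->y0 < b->y1 && b->y0 < a->y1);
}

/* Iterate only over the objects pre-assigned to the current screen. */
void sceneTickSketch(const TickList *screenLists, uint8_t currentScreen,
                     const Rect *viewport)
{
    const TickList *list = &screenLists[currentScreen];
    uint8_t i;

    for (i = 0; i < list->count; i++) {
        Object *o = list->items[i];
        if (intersects(&o->boundary, viewport))
            o->run(o);
    }
}
```

Because the lists are built offline, the runtime cost depends on the length of one screen's list rather than on the number of objects in the level.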
Sleeping sprites via sprite-ticklist and the Area system
Sprites use a different management system for ticks than objects, because of these reasons:
- Enemy-sprites do not always have a defined boundary to move within
- Enemy-sprites should move/tick outside the current viewport (to a certain extent) otherwise the game looks crap
In the game the different levels have between 11 (first level) and 32 (the two last levels) enemies.
The main idea is to have two lists instead of one. Any enemy in the level is created up front and is referenced in either of these two lists:
- SpriteTickList
- SleepList
Ideally the number of sprites in SpriteTickList is as low as possible, as this means that there are few sprites to iterate over when running the SceneSpriteTick()-function.
To move sprites between the two lists I needed an effective system, so I created a system called the Area system. It is a grid with cells of 64x32 pixel size.
The reason behind yet another coordinate system on top of pixels (1x1), tiles (8x8), objects (16x16) is computation speed. Thus I needed a coordinate system that could give the location of a sprite with 8-bit values instead of 16-bit. To see if a sprite needs to go into sleep (putToSleep()) or out of sleep (awakeSprites()), I compare their area values with the area value of the player. 8-bit comparisons are significantly faster than 16-bit.
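To make the 8-bit idea concrete, here is a small sketch that assumes area values are derived by shifting the 16-bit pixel position (64 = 2^6, 32 = 2^5). The function names and the "radius" parameters are hypothetical; the text does not state how far from the player a sprite is kept awake.

```c
#include <stdint.h>

#define AREA_SHIFT_X 6   /* 64-pixel wide cells */
#define AREA_SHIFT_Y 5   /* 32-pixel high cells */

/* Convert 16-bit pixel coordinates to 8-bit area coordinates. */
uint8_t areaFromX(uint16_t px) { return (uint8_t)(px >> AREA_SHIFT_X); }
uint8_t areaFromY(uint16_t py) { return (uint8_t)(py >> AREA_SHIFT_Y); }

static uint8_t absDiff8(uint8_t a, uint8_t b)
{
    return (a > b) ? (uint8_t)(a - b) : (uint8_t)(b - a);
}

/* 8-bit-only test: is the sprite's area within the given number of
   cells from the player's area? (Radii are assumed tuning values.) */
uint8_t areaIsNear(uint8_t spriteAx, uint8_t spriteAy,
                   uint8_t playerAx, uint8_t playerAy,
                   uint8_t radiusX, uint8_t radiusY)
{
    return (uint8_t)(absDiff8(spriteAx, playerAx) <= radiusX &&
                     absDiff8(spriteAy, playerAy) <= radiusY);
}
```

For example, a sprite at pixel (1000, 100) lands in area (15, 3) and a player at (640, 96) in area (10, 3), so the two are 5 cells apart horizontally.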
The algorithm in sceneSpriteTick() is like this:
for each sprite in SpriteTickList:
    sprite.run()
Prior to its execution the functions named sceneCalculateViewport() and sceneCheckSpriteVisibility() are called. They set the viewport, and check if sprites in the SpriteTickList are inside this viewport. Sprites that have left the viewport are removed from the rental system, and sprites that have entered it are added.
The run-function for any sprite that has an animation in addition to movement will check whether it is currently visible, and perform animation/visuals only if it is visible (i.e. is part of the rental-system).
The related calls are then:
sceneCheckSpriteVisibility()
sceneSpriteTick()
putToSleep()
awakeSprites()
The putToSleep-function became slightly too costly, so I made it alternate between working on the first half and the second half of the list (the so-called “putToSleepIterative()” function). In other words, it iterates over 50% of the list every frame.
The awakeSprites() function needs to iterate over almost all sprites in the level, so this one is obviously expensive. To deal with this, I made it iterate over a maximum of 4 sprites every frame, then return and resume where it left off in the next frame.
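The "at most 4 per frame, then resume" pattern can be sketched with a persistent cursor. This is an illustration under assumptions: the SleepEntry layout, the one-dimensional area test and the radius parameter are invented for brevity and do not reflect the real sleeplist.

```c
#include <stdint.h>

#define MAX_WAKE_CHECKS_PER_FRAME 4

/* Hypothetical, simplified sleeplist entry (x-area only). */
typedef struct {
    uint8_t areaX;
    uint8_t asleep;
} SleepEntry;

/* Persists between frames so the scan resumes where it stopped. */
static uint8_t wakeCursor = 0;

/* Checks at most MAX_WAKE_CHECKS_PER_FRAME entries and returns how many
   were checked. Entries close enough to the player are woken up. */
uint8_t awakeSpritesSketch(SleepEntry *list, uint8_t count,
                           uint8_t playerAreaX, uint8_t radius)
{
    uint8_t checked = 0;

    while (checked < MAX_WAKE_CHECKS_PER_FRAME && checked < count) {
        SleepEntry *e;
        uint8_t d;

        if (wakeCursor >= count)
            wakeCursor = 0;              /* wrap around to the start */

        e = &list[wakeCursor++];
        d = (e->areaX > playerAreaX) ? (uint8_t)(e->areaX - playerAreaX)
                                     : (uint8_t)(playerAreaX - e->areaX);
        if (e->asleep && d <= radius)
            e->asleep = 0;               /* would move it to SpriteTickList */
        checked++;
    }
    return checked;
}
```

The trade-off is latency: with 32 enemies and 4 checks per frame, a sleeping sprite may wait up to 8 frames before it is examined.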
There are other sprites in the game or level that are part of this system as well, but they are almost always inside the viewport and not making too much difference. These are:
- Exploding bricks (4)
- Floating points (100/200/100/2000)
- Halo / Circle around player
- Blocking tree trunk
- Crate
Calculating and storing another set of coordinates (i.e. the Area system, see Part III) comes with overhead, but my calculations at development time showed me that there were some cycles to be won by doing it this way.
Objects + Sprites tick every other frame (25/30 FPS)
Every object and sprite has its own run-function. Running multiples of these every frame is costly, so the function itself should be efficient. Still, I ended up with many situations where the load was too high. I chose to skip every other frame for both objects and sprites. This certainly has a negative visual impact, especially when the speed is high (like an enemy bumped off screen), but I can live with it.
Instead of alternating objects and sprites, I run both every frame, but each handles 50% of their respective list at each run.
There are some exceptions to this scheme:
- Player runs and animates every frame.
- Crate runs every frame.
- Boss and bats run every frame.
The crate had to be updated at the same frequency as the player, as pushing the crate looked very buggy when they differed.
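One way to picture the "50% of the list per frame" scheme is a parity bit that selects which half to run. This is only a sketch: the Entity type, the parity variable and the split at count / 2 are assumptions, and the exceptions listed above (player, crate, boss, bats) would be ticked outside such a function.

```c
#include <stdint.h>

/* Hypothetical minimal entity; ticks counts how often it has run. */
typedef struct {
    uint8_t ticks;
} Entity;

/* Flips between 0 and 1 once per frame. */
static uint8_t frameParity = 0;

/* Run the first half of the list on even frames and the second half on
   odd frames, so every entity effectively ticks at 25/30 FPS. */
void tickHalf(Entity *list, uint8_t count)
{
    uint8_t half  = (uint8_t)(count / 2);
    uint8_t start = frameParity ? half  : 0;
    uint8_t end   = frameParity ? count : half;
    uint8_t i;

    for (i = start; i < end; i++)
        list[i].ticks++;                 /* stands in for list[i].run() */
}

/* Call once at the end of every frame. */
void endOfFrame(void)
{
    frameParity ^= 1;
}
```

Splitting each list in half, rather than alternating whole lists, keeps the per-frame load roughly even.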
Avoid enemy roaming if you can
If hammer helmet was given in this level, crushed bricks would leave the blob walking in the air (blob’s boundary in blue). |
The only enemy left with (a simplified) collision checking is the rolling stone.
All in all, this comes with a great impact on your level design and gameplay.
Turn off code paths not needed (the boss scene)
Obviously, the boss-scene has no scrolling, so it makes sense to take advantage of this, and disable the scrolling-functionality altogether. The logic to run the boss-behavior is much more advanced than all the other enemies, so this was very much needed anyways.
Baking information into tile-ids removes storage and calculation needs
In the effect where the player can walk on a curved surface, tiles were in need of a place to store information about their vertical displacement (y-offset). The value would be in the range [1,7] (0 is no offset). I needed something that stored info efficiently and had little performance impact.
Rope bridge - walking on curved surface shown in half speed. |
I already had a look-up system for properties of the different tiles, so I hinged it onto this, by using one of the bits for a y-offset property.
When the player is placed out on screen, there is a collision detection towards the background tiles. If the tile below the player has the COLLISION_TOP bit set, the player is placed on top of this tile. If the COLLISION_YOFFSET bit is set as well, the player is placed on top of this tile plus an offset tied to the specific tile.
By baking this offset-value into the tile-id itself, there is no extra storage needed and the execution code has the value calculated already.
Tile-id 100 (0x64) immediately gives 4 pixels as offset via the id itself. |
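The game uses the following simple C code to displace the player, and SDCC converts this to perfectly efficient assembly code.
if (tilePropertiesBelow & COLLISION_YOFFSET)   /* condition line missing in the source; tilePropertiesBelow is an assumed name */
    playerYOffset = tileTileIDBelow[1] & 0x07; /* low 3 bits of the tile-id are the y-offset (0..7) */
else
    playerYOffset = 0;
nGlobalPlayerPosY = (nGlobalPlayerTileY * 8) + playerYOffset;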
Scrutinize handling of lists
This part is leaning towards low level optimisations, but at least the few cycles gained come from the structure of things, so I included it.
In the beginning of the project I was trying out different kinds of optimisations to be used when traversing long lists of either objects or sprites (which meant elements would hold a 16-bit memory address). Inner loops are always important to optimize, as the instruction costs multiply fast.
“Not only should it be fast to traverse, but also adding/removing elements needed to be fast as well. Normally it is hard to get both.”
When traversing, the fastest possible structure is a plain array and a given number of elements. When adding and removing I was not sure what structure would be the fastest, so I tried out several variants like linked and double-linked lists, as well as different array variants. In the end I settled on something I called a StackList as the best overall solution. It consists of both a stack and a list. The list part is a plain array which is used when traversing, and the stack part is used when adding and removing elements, as it keeps track of which positions in the array are available. As a performance measure I use 8 bit values in the stack, so the maximum size of the list/stack becomes 127. This is more than enough for my purposes.
The drawback with this solution is that the list will sometimes contain NULL values (right after a removal), and hence the traversing code must do a quick check for NULL. This can be done quite efficiently and is way faster than any of the other variants I tried (apart from a plain array with no checking, of course).
List 1: The list when initialized and empty. |
List 2: 5 elements have been added in sequence. |
List 3: Building on list 2 and first removing element 3, and then removing element 1. |
This structure risks getting “holes” in it, but the stack ensures that any hole is filled before extending the list, which in general keeps it pretty condensed.
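A compact sketch of how such a stack-plus-array structure could look in C. It is a reconstruction from the description only: the field names, the function names and the choice to remove by slot index are assumptions, and the real implementation is hand-tuned and may differ in layout.

```c
#include <stddef.h>
#include <stdint.h>

#define STACKLIST_MAX 127

typedef struct {
    void   *items[STACKLIST_MAX];      /* the list part: traversed as a plain array */
    uint8_t freeStack[STACKLIST_MAX];  /* the stack part: indices of holes */
    uint8_t freeTop;                   /* number of holes currently on the stack */
    uint8_t length;                    /* used part of items[], holes included */
} StackList;

void stackListInit(StackList *l)
{
    l->freeTop = 0;
    l->length  = 0;
}

/* Fill a hole if one exists, otherwise extend the list.
   Returns the slot used, or -1 if the list is full. */
int16_t stackListAdd(StackList *l, void *item)
{
    uint8_t slot;

    if (l->freeTop > 0)
        slot = l->freeStack[--l->freeTop];
    else if (l->length < STACKLIST_MAX)
        slot = l->length++;
    else
        return -1;

    l->items[slot] = item;
    return (int16_t)slot;
}

/* Leave a NULL hole and remember its index. No check for double
   removal is done in this sketch. */
void stackListRemove(StackList *l, uint8_t slot)
{
    l->items[slot] = NULL;
    l->freeStack[l->freeTop++] = slot;
}

/* Traversal must skip holes with a NULL check. */
uint8_t stackListCount(const StackList *l)
{
    uint8_t i, n = 0;

    for (i = 0; i < l->length; i++)
        if (l->items[i] != NULL)
            n++;
    return n;
}
```

Adding five elements, removing slots 3 and 1, and adding again reuses slot 1 first (last hole pushed, first reused) before the list grows.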
This Stacklist is used in the game in these places:
- Sprite ticklist
- Sprite sleeplist
Originally it was also used for objects in their ticklist. However, due to the approach described in “Ticklist and reduction of active objects”, these lists are now plain arrays with no NULL-checking when traversing.
Low level alert: During traversal, reading a 16-bit value into registers, checking for NULL and handling the loop costs around 50 cycles per element (“the loop overhead cost”). The cost of adding an element is 206 cycles (which would be only 180 if the list address were aligned to 0x0100). Removing an element costs 243 cycles. This was the fastest I could do.
Taking (risky) shortcuts in the name of performance
One may always take shortcuts in the code to optimize. There might be extra checks that should be done to make the code more robust. Almost always, robustness is the way to go. However, in tight loops or on a tight cycle budget you might think otherwise. In Lilly’s Saga I’ve done this in a couple of places, even though it is not recommended.
For example, if a crate is pushed outside of the screen, it will overwrite memory in other places, as there is no check for this. Instead, it became a design decision that the level design should not allow this.
A less risky shortcut is that a large portion of the level build-up and data consistency checking is not done at runtime, as would be common in modern or good-practice programming, but is moved into the level creation tool instead. See more in Error checking (part III).