Horizontal scrolling & key concepts

The top 16 lines in this video are not visible to the user (they are used only in the vertical/diagonal movement part of this algorithm).

How the basics work

Main concept

The core cycle:
  • The algorithm works in steps of 8.
  • Horizontal SET ADJUST runs from -8 to 7. This gives a span of 16 pixels, which is traversed in 8 steps at 2 px/frame.
Each frame (frame 0-7), before it is drawn on the screen, we do this:
  • on the current visible page:
    • 1. Move the screen 2 px to the left (SET ADJUST)
    • 2. "Blackstrip": Fill a one-byte-wide (2 px) vertical line with black at the left side. (CPU/OUT)
    • 3. "One-strip": Fill a one-byte-wide (2 px) vertical line with the pixels from the tiles in the level data. (CPU/OUTI)
    • Steps 2 and 3 must have their pixels written at the destination before the raster beam catches up.
  • on the background page:
    • VDP-copy one of 8 rectangles from current visible page to the background page.
Elaboration of the special 8th cycle (frame 0):
  • We VDP-copy a rectangle containing all the pixels on the right hand side of the screen. This VDP-copy will copy the same area that is currently being written to by the CPU(!)*
  • We swap buffers/pages. The current background page becomes visible, and the current visible becomes the background page.
  • We clear (make black) a rectangle of width 16 pixels at the right hand side (VDP FILLRECT)
  • We "reset" set adjust to -8, fully towards the right.
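The cycle above can be sketched as a small Python model. It is only an illustration under assumptions: the names (`scroll_states`, `State`, the page numbers 0 and 1) are hypothetical, the SET ADJUST value is assumed to step linearly from -8 in 2 px increments, and neither the register encoding nor the VDP commands are modeled.

```python
from collections import namedtuple

FRAMES_PER_CYCLE = 8   # frames 0..7, from the text
PX_PER_FRAME = 2       # scroll speed, from the text
ADJUST_RESET = -8      # value SET ADJUST is reset to on frame 0
PAGE_SHIFT_PX = FRAMES_PER_CYCLE * PX_PER_FRAME  # 16 px source/target offset

State = namedtuple("State", "frame adjust visible_page scrolled_px")


def scroll_states(num_frames, visible=0, background=1):
    """Return the assumed per-frame state for num_frames consecutive frames."""
    states = []
    for tick in range(num_frames):
        frame = tick % FRAMES_PER_CYCLE
        if frame == 0 and tick > 0:
            # Special frame 0: the finished background page becomes visible.
            visible, background = background, visible
        adjust = ADJUST_RESET + PX_PER_FRAME * frame
        # Total scroll = whole 16 px page shifts so far + offset inside the cycle.
        scrolled = PAGE_SHIFT_PX * (tick // FRAMES_PER_CYCLE) + (adjust - ADJUST_RESET)
        states.append(State(frame, adjust, visible, scrolled))
    return states


for state in scroll_states(10):
    print(state)
```

In this model the 16 px jump of the page swap exactly cancels the SET ADJUST reset, so the scroll position keeps advancing by 2 px per frame across the swap.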
The 8 target rectangles are like below (sources are just 16 px offset in x direction):


  • For frame 1-7 it is important to kick off as big a rectangle as possible to keep the VDP busy the whole frame. For frame 1-6 the copy kicks off immediately at VBLANK. For frame 7 we start by painting a small part of the strip inside rectangle 7, then we kick off the VDP copy before we continue with the rest of the strip.
  • Frame 0 has a smaller rectangle because we do two VDP commands. First we kick off the fillrect mentioned above. This HMMV fillrect is 100% faster than the HMMM. Then we do a good chunk of other timed CPU work, timed to finish pretty precisely when the fillrect job is finished. At this point we kick off VDP copy 0.
  • The rectangle sizes have been worked out by trial and error.

How to write vertical lines "less slowly" using CPU

As long as the bytes you want to write are not horizontally in sequence, you need to set the VRAM pointer address before each write, that is, before each OUT to the VDPIO port. This is a common way to set the VRAM pointer/address:



Add a call to this as well, and you have a cost of 153 cycles.

Not counting setting up the target AHL, and assuming a cheap write of color 0 by:

    xor a               ; A = 0 (color 0, black)
    out (VDPIO),a       ; write the byte to VRAM

gives 17 cycles. This would give a total of 176 lines * (153+17) = 29920/30800 cycles. That is a simplified calculation, and it eats more than 50% of your frame time.
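As a rough check of that claim, the arithmetic can be sketched in Python. The clock speed (3.579545 MHz) and the 60 Hz refresh rate are assumptions not stated in the text, and the names are hypothetical. Only the first of the two cycle figures is used.

```python
CPU_HZ = 3_579_545        # assumed MSX Z80 clock
REFRESH_HZ = 60           # assumed refresh rate; 50 Hz machines have more time per frame
LINES = 176               # lines per vertical strip, from the text
SET_ADDRESS_CYCLES = 153  # address setup including the call, from the text
WRITE_BLACK_CYCLES = 17   # xor a + out (VDPIO),a, from the text


def naive_strip_cost(lines=LINES):
    """Cycles for one vertical strip when the address is set up per byte."""
    return lines * (SET_ADDRESS_CYCLES + WRITE_BLACK_CYCLES)


def frame_share(cycles, cpu_hz=CPU_HZ, refresh_hz=REFRESH_HZ):
    """Fraction of one frame's CPU time consumed by the given cycle count."""
    return cycles / (cpu_hz / refresh_hz)


cost = naive_strip_cost()
print(cost, f"{frame_share(cost):.1%}")
```

Under these assumptions a single naive strip takes just over half of a 60 Hz frame, and the scroll needs two strips per frame.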

Obviously we need to precalculate any value that goes to the VDP port.

We can also skip setting reg#14 repeatedly. It only needs to be set once for all addresses above line 128, and once for all addresses at and below line 128.
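To see why reg#14 only changes at line 128, here is a hedged Python sketch of how such address bytes could be precalculated. It assumes the usual SCREEN 5 layout of 128 bytes per line and the standard V9938 address split (A16-A14 in reg#14, then the low byte, then A13-A8 with the write bit set). The function names, the `page_base` parameter and the table layout are hypothetical, not the author's actual tables.

```python
BYTES_PER_LINE = 128  # SCREEN 5: 256 px per line, 2 px per byte
WRITE_FLAG = 0x40     # bit 6 of the second address byte selects "write"


def split_address(addr):
    """Split a 17-bit VRAM address into (reg#14 value, low byte, high byte)."""
    r14 = (addr >> 14) & 0x07
    low = addr & 0xFF
    high = ((addr >> 8) & 0x3F) | WRITE_FLAG
    return r14, low, high


def column_table(x_byte, lines=176, first_line=0, page_base=0):
    """Precalculate the address bytes for one byte column, top to bottom."""
    return [
        split_address(page_base + y * BYTES_PER_LINE + x_byte)
        for y in range(first_line, first_line + lines)
    ]


table = column_table(x_byte=0)
for y in (0, 127, 128, 175):
    print(y, table[y])
```

Line 128 starts at byte 128 * 128 = 0x4000, which is where address bit A14 flips, so the reg#14 value is constant on either side of that line.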

Thus I have large tables of precalculated VRAM addresses. For writing one byte (2 pixels) at a given place, I end up with these unrolled commands in a macro:

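    outi                ; VRAM address low byte from the table at HL (C = VDPPORT1)
    outi                ; VRAM address high byte, write bit already set in the table
    nop                 ; nop is needed to obey speed limit
    out (VDPIO),a       ; write the black byte (A = 0)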

Total for this is 53/56 cycles for a byte. Ignoring some setup, and the fact that at line 128 we must set reg#14, one black line costs us a minimum of 53*176 = 9328/9856 cycles. Still costly as h***.
Similarly, the macro we unroll for a strip (data from tiles) gets a bit more expensive:

    outi                ; C = VDPPORT1
    outi
    exx
    outi                ; C = VDPIO
    exx

Total for this is 64/67 cycles for a byte. Ignoring overhead as above, the vertical line on the right-hand side costs us a minimum of 64*176 = 11264/11792 cycles.
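For comparison, the per-column figures quoted in this section can be put side by side with a few lines of Python. This is only a restatement of the numbers in the text (using the first of each pair of cycle counts). The names are hypothetical and setup overhead is ignored, as in the text.

```python
LINES = 176  # bytes per vertical strip, from the text

# Cycles per byte for each approach, taken from the text.
CYCLES_PER_BYTE = {
    "naive": 153 + 17,  # address routine call + cheap black write
    "blackstrip": 53,   # unrolled macro with precalculated addresses
    "one-strip": 64,    # unrolled macro with tile data via exx
}


def column_cost(cycles_per_byte, lines=LINES):
    """Minimum cycles for one vertical strip, ignoring setup overhead."""
    return cycles_per_byte * lines


costs = {name: column_cost(c) for name, c in CYCLES_PER_BYTE.items()}
per_frame = costs["blackstrip"] + costs["one-strip"]

for name, cost in costs.items():
    print(f"{name:10s} {cost:6d} cycles")
print(f"both strips per frame: {per_frame} cycles")
```

With these figures, both strips together cost less than a single naive strip.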

How it looks

Here from a PoC with this horizontal scroll tech implemented:



* This is the danger zone. We have issued a copy rectangle command which starts in the upper left corner and goes left to right, top to bottom. While this is running, we also have the CPU filling the edge of this rectangle with a "strip" of 2 pixels from top to bottom. With the V9938 and V9958 VDPs on the market, this parallelism works fine, and the VDP command does not catch up with the CPU. This is where several emulators break, because they either do the copy in an instant, or they do it too fast. At the time of writing, even OCM does this too fast. This results in a rectangle copied onto the background page with lots of unfilled black pixels.
