Horizontal scrolling & key concepts
The top 16 lines in this video are not visible to the user (they are used only in the vertical/diagonal movement part of this algorithm).
How the basics work
Main concept
The core cycle:
- The algorithm works in steps of 8.
- Horizontal set adjust goes from -8 to 7. This gives a span of 16 pixels, which is passed in 8 steps at 2 px/frame.
- on the current visible page:
- 1. Move the screen 2 px to the left (SET ADJUST)
- 2. "Blackstrip": Fill a "one byte = 2 px" vertical line with black at the left side. (CPU/OUT)
- 3. "One-strip": Fill a "one byte = 2 px" vertical line with the pixels from the tiles in the level data (CPU/OUTI)
- Points 2 and 3 need to have their pixels written at the destination before the raster beam catches up.
- on the background page:
- VDP-copy one of 8 rectangles from current visible page to the background page.
- We VDP-copy a rectangle containing all the pixels on the right hand side of the screen. This VDP-copy will copy the same area that is currently being written to by the CPU(!)*
- We swap buffers/pages. The current background page becomes visible, and the current visible becomes the background page.
- We clear (make black) a rectangle of width 16 pixels at the right hand side (VDP FILLRECT)
- We "reset" set adjust to -8, fully towards the right.
- For frames 1-7 it is important to kick off as big a rectangle as possible to keep the VDP busy the whole frame. For frames 1-6 the copy kicks off immediately at VBLANK. For frame 7 we start by painting a small part of the strip inside rectangle 7, then we kick off the VDP copy before we continue with the rest of the strip.
- Frame 0 has a smaller rectangle, because we do two VDP commands. First we kick off the fillrect as mentioned above. This HMMV fillrect is 100% faster than the HMMM. Then we do a good chunk of other timed CPU work. We time the work to finish pretty precisely when the fillrect job is finished. At this point we kick off VDP copy 0.
- The rectangle sizes have been worked out by trial and error.
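The cycle described in the list above can be sketched as a small model. This is only an illustration, not the actual implementation: the function names are hypothetical, it assumes the cycle starts at set adjust -8 on frame 0 and advances 2 px per frame, and it assumes the horizontal field of R#18 is a 4-bit two's-complement nibble.

```python
def set_adjust_schedule(start=-8, step=2, frames=8):
    """Horizontal set-adjust value for each frame of one 16-pixel cycle."""
    return [start + step * frame for frame in range(frames)]


def r18_horizontal_nibble(adjust):
    """Encode a signed adjust (-8..7) as a 4-bit nibble.

    Assumption: the horizontal field of R#18 uses two's complement.
    """
    if not -8 <= adjust <= 7:
        raise ValueError("set adjust must be in -8..7")
    return adjust & 0x0F


def vdp_work(frame):
    """VDP commands kicked off in a given frame, following the list above."""
    if frame == 0:
        # Frame 0 runs two commands: the fill first, then a smaller copy.
        return ["fill 16 px wide rectangle (HMMV)", "copy rectangle 0 (HMMM)"]
    return [f"copy rectangle {frame} (HMMM)"]


for frame, adjust in enumerate(set_adjust_schedule()):
    print(frame, adjust, hex(r18_horizontal_nibble(adjust)), vdp_work(frame))
```

The model only covers the set-adjust stepping and the order of VDP commands. The strip painting on the CPU side and the exact rectangle sizes are left out, since those were tuned by hand.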
How to write vertical lines "less slowly" using CPU
As long as the bytes you want to write are not horizontally in sequence, you need to set the address of the VRAM pointer before each write, that is, before each OUT to the VDPIO port. The common routine for setting the VRAM pointer/address, plus a call to it, costs 153 cycles.
Not counting setting up the target AHL, and assuming a cheap write of color 0 by:
xor a;
out (VDPIO),a
gives 17 cycles. This would give a total of 176 lines * (153+17) = 29920/30800 cycles. That is a simplified calculation, and it already eats more than 50% of your frame time.
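To put that number in context, here is a quick check of the arithmetic. The per-line figures (153 and 17) come from the text; the CPU clock of 3,579,545 Hz and the 60 Hz frame rate are assumptions used only to get a frame budget.

```python
CPU_HZ = 3_579_545   # assumption: standard MSX Z80 clock
FRAME_HZ = 60        # assumption: 60 Hz display


def naive_strip_cost(lines=176, set_address=153, write=17):
    """Cycles for one vertical strip when every write sets the address first."""
    return lines * (set_address + write)


cycles_per_frame = CPU_HZ // FRAME_HZ
cost = naive_strip_cost()
share = cost / cycles_per_frame

print(f"strip: {cost} cycles, frame: {cycles_per_frame} cycles, share: {share:.1%}")
```

On a 50 Hz machine the frame budget is larger, so the share would be lower, but a single strip would still take a large part of the frame.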
Obviously we need to precalculate any value that goes to the VDP port.
We can also ignore setting reg#14 repeatedly. It needs only to be set once for all values above line 128, and once for all at and below 128.
Thus I have large tables of precalculated VRAM addresses. For writing one byte (2 pixels) at a given place, I end up with these unrolled commands in a macro:
outi              ; VRAM address low byte, from the precalculated table
outi              ; VRAM address high byte, from the precalculated table
nop               ; nop is needed to obey speed limit
out (VDPIO),a     ; write the byte (2 pixels)
Total for this is 53/56 cycles for a byte. Ignoring that there is some setup, and at line 128 we must set reg#14, one black line costs us a minimum of 53*176 = 9328/9856 cycles. Still costly as h***.
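The precalculated address tables could be generated offline along these lines. This is a sketch under assumptions, not the tables actually used: the helper names are hypothetical, and it assumes the usual screen 5 layout (128 bytes per line, 32 KB per page) and the standard write setup where the low address byte is followed by the high byte with the write flag (0x40) set.

```python
BYTES_PER_LINE = 128   # screen 5: 256 pixels per line, 2 pixels per byte
PAGE_SIZE = 0x8000     # screen 5: 32 KB per page


def vram_write_setup(x_byte, y, page=0):
    """Return (r14, low, high) for pointing the VRAM write address at a byte."""
    address = page * PAGE_SIZE + y * BYTES_PER_LINE + x_byte
    r14 = address >> 14                     # address bits 16..14, goes to reg#14
    low = address & 0xFF                    # address bits 7..0
    high = ((address >> 8) & 0x3F) | 0x40   # address bits 13..8, write flag set
    return r14, low, high


def column_table(x_byte, first_line, line_count, page=0):
    """Flat list of low/high byte pairs, one pair per line, in OUTI order."""
    table = []
    for y in range(first_line, first_line + line_count):
        _, low, high = vram_write_setup(x_byte, y, page)
        table += [low, high]
    return table


# reg#14 changes exactly once within a page, between line 127 and line 128.
print(vram_write_setup(0, 127), vram_write_setup(0, 128))
```

Because the reg#14 value is the same for every line in each half of the page, the table only needs the two bytes per line that the pair of `outi` instructions sends.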
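For the one-strip the pixel byte also comes from memory, so the data write is an outi through the shadow registers:
outi              ; C = VDPPORT1: VRAM address low byte
outi              ; VRAM address high byte
exx               ; switch to the shadow registers
outi              ; C = VDPIO: write the pixel byte (2 pixels)
exx               ; switch back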
Total for this is 64/67 cycles for a byte. Ignoring overhead as above, the vertical line on the right hand side costs us a minimum of 11264/11792 cycles.
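The totals can be checked and compared with the naive approach in a few lines. The per-byte figures (53 and 64) and the naive per-line figure (153+17) are taken from the text; the helper name is hypothetical.

```python
LINES = 176


def strip_cost(cycles_per_byte, lines=LINES):
    """Minimum cycles for one vertical strip, ignoring setup overhead."""
    return cycles_per_byte * lines


black_strip = strip_cost(53)     # left side, constant colour via OUT
one_strip = strip_cost(64)       # right side, pixel data via OUTI
naive_strip = LINES * (153 + 17)

print(black_strip, one_strip, black_strip + one_strip, naive_strip)
```

Both strips together come in below the cost of a single strip written the naive way.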
How it looks
Here from a PoC with this horizontal scroll tech implemented: [video]
* This is the danger zone. We have issued a copy rectangle command which starts in the upper left corner and goes left to right, top to bottom. While this is running, the CPU is also filling the edge of this rectangle with a "strip" of 2 pixels from top to bottom. With the V9938 and V9958 VDPs on the market, this parallelism works fine, and the VDP command does not catch up with the CPU. This is where several emulators break, because they either do the copy in an instant, or they do it too fast. At the time of writing, even OCM does this too fast. The result is a rectangle copied onto the background page with lots of unfilled black pixels.