C64 BASIC: Performance Tuning

There’s a bunch of heuristics and rules for getting BASIC code to run slightly faster. Some of them even actually help. The core principles, though, are that command interpretation is work and so you want to do as little of it as possible. This has some occasionally surprising side effects. That means that to be sure what’s going on you’ll need to do some measurement.

Our Sample Task

For our initial experiments, we will put the checkerboard graphic at a hundred random places on the screen. Now, while Sinclair BASIC would let us place the cursor with a command like

PRINT AT INT (RND*22),INT (RND*32);"{H}";

we don’t have PRINT AT in Commodore BASIC. The KERNAL, however, does have a routine called PLOT at $FFF0 which will position the cursor for us. Thus, here is our first cut at the routine, in petcat format:

  10 print "{clr}{gry3}";
  20 s=ti
  30 for i=1 to 100
  40 y=int(rnd(1)*24):x=int(rnd(1)*40)
  50 poke 781,y:poke782,x:poke783,0:sys 65520:print"{CBM-+}";
  60 next i
  70 e=ti
  80 print "{home}{lblu}total time:";(e-s)/60

UPDATE, 10 Sep 2017: This listing has been updated to fix the POKE to 783 (clearing the status flags). The original post zeroed 780 instead, which cleared the accumulator. The PLOT command requires the carry flag to be clear in order to move the cursor. (Otherwise it reads the cursor position instead of setting it.) This edit does not alter the timing information.

We time our program by consulting the TI variable. This is initialized to 0 at power-on and increments every time the clock-tick IRQ is processed. That’s 60 times per second under normal operation, and 0 times per second when we’ve disabled interrupts. We don’t have to worry about interrupts this time, so TI works fine and will give us frame-level accuracy.

baseline

Running this program shows an average runtime of 4.47 seconds. That will be our baseline.

Lowering Instruction Count

Our inner loop is executing seven full BASIC commands per loop, all to perform cursor positioning. This isn’t even the usual way people did cursor positioning, though; usually they would exploit the way you could put cursor-move characters into strings. String operations permit a lot of work to be done and can be surprisingly fast. If we add a new line and then replace line 50:

  15 y$="{home}{24 down}"
  50 print left$(y$,y+1);tab(x);"{CBM-+}";

This speeds up the average runtime to 3.87 seconds, which is nearly 15% faster. That’s not bad at all, and clever use of LEFT$ and RIGHT$ can be used to get reasonable horizontal scrolling effects in pure BASIC.

This is a point where we then need to think through what our goal really is, though. We’ve been moving the cursor around a bunch, but really all we need to be doing is setting the screen and memory. We can delete line 15 and make line 50 be a direct poke to screen memory:

  50 poke 1024+40*y+x,102

This drops the runtime to 3.65 seconds, and we can do even better by just generating the screen memory offset in one go:

  40 o=rnd(1)*1000
  50 poke 1024+o,102

This takes us down to 2.26 seconds! However, we’ve cut a corner here. We’re assuming that the color memory already has the value we want. If that’s not true, we’ll need a second poke:

  40 o=1024+rnd(1)*1000
  50 poke o,102:poke o+54272,14

This brings the cost up to 3.27 seconds, which is a significant jump but also more of an apples-to-apples comparison. Our real savings here is in not having to create the Y and X variables and then mix them together; we are doing much less math. If we bring back our original line 40 and compute our POKE from that:

  50 o=1024+40*y+x:poke o,102:poke o+54272,14

This ends up slightly slower than our initial baseline code, at 4.82 seconds!

Making Instructions Faster

C64 BASIC is an interpreted language, and it doesn’t cache its intermediate values. That means that every time we ran into a number inside our inner loop, it had to reparse the number as a floating point constant. We can make it cache that by hand by storing it in a variable. Variables are stored in a linked list and finding it is a series of two-byte compares (against the variable name’s prefix) and list traversals. That’s quite a bit faster. If we take our baseline code above and store all our constants in variables, with these two edits:

  15 a=781:b=782:c=780:d=65520
  50 poke a,y:poke b,x:poke c,0:sys d:print "{CBM-+}";

This drops our baseline to a respectable 3.32 seconds, nearly as fast as directly editing the screen and color memory.

Variables are also created as needed, and as we saw last time when we discussed the BASIC memory map, all the “normal” variables live before all of the arrays. This means that if you’ve allocated a bunch of arrays and then refer to a normal variable for the first time, the BASIC runtime has to copy your entire array space seven bytes to the right every time you do this. That’s probably only going to be an issue during initialization, but you’ll do yourself a favor by at least setting default values for all your normal variables before your first DIM command. Variable accesses end up taking a variable amount of time even so, and as a result it’s also probably best, if again a bit marginal, to initialize your most-referenced variables first.

GOTO and GOSUB also take a variable amount of time based on your program structure. The lines of your program are stored in a singly-linked list, and jumps forward in the code will scan forward based on where you are right now. Jumps backwards, however, search from the beginning of the program. Some designers suggest that the most time-critical routines should thus be put at lower line numbers. That’s less flagrant here, but it’s still noticeable: if we replace our FOR/NEXT loop with a poor man’s DOWHILE:

  30 i=1
  60 i=i+1:if i<=100 then 40

This slows the program down from 3.32 seconds to 4.05. (FOR loops, like RETURN statements, need to possibly jump into the middle of program lines. As a result, they store exact addresses and can jump more quickly.

Loops Are Slow

Let’s fill the screen with a single character and color. We can do that with a simple loop:

  10 s=ti:c=54272
  20 for i=1024 to 2023
  30 poke i,102:poke i+c,5
  40 next
  50 e=ti:print "{home}time: ";(e-s)/60

This ends up taking a bit over ten seconds. We can also do this with a few more loops:

  10 s=ti:print "{home}{grn}";
  20 for i=1 to 24
  30 print "{40 CBM-+}";
  40 next
  50 print "{39 CBM-+}{home}{lblu}";
  60 poke 2023,102:poke 56295,5
  70 e=ti:print "time:";(e-s)/60)

This is a much larger program, but it also runs over twenty times faster, clocking in at 0.43 seconds. Strings are a lot faster than loops in BASIC.

Loops are, of course, much faster in machine code. But don’t forget that in order to get your machine code into place, you’re probably using a BASIC loader of some kind. That means that you don’t gain anything by letting the machine code load sprite or custom character data into place; you’re paying the cost of loading it from BASIC just putting it somewhere the machine code can see.

Other Ways To Branch

C64 BASIC doesn’t really have a boolean type. When you perform a comparison operation, it evaluates to the value 0 if it is false and -1 if it is true. You can do math with this, and if you do, that spares you a great deal of jumping around. This line, for instance, reads joystick port 2 and computes delta-x and delta-y values from the result:

J=PEEK(56320):X=((J AND 4)=0)-((J AND 8)=0):Y=((J AND 1)=0)-((J AND 2)=0)

And this expression converts a value between 0 and 15 to a hex digit:

CHR$(I+48-7*(I>9))

You could have the resulting values be line numbers instead of character codes, and this turns GOTO into a fully-functional conditional switch statement. Just remember that this is a trick you’re doing, though. Automatic program renumbering programs can’t reverse-engineer the targets of GOTO expressions that are the results of mathematical expressions. It can’t make sense of

GOTO 100-100*(X=10)

but it can make sense of

ON -(X=10)+1 GOTO 100,200

One particularly nice additional effect here is that if while if an IF statement fails its text, execution proceeds to the next line, an ON statement with an out-of-range constant will proceed to the next statement. That’s handy for when you want to do something like an IF/THEN/ELSE and don’t want to add a lot of extra lines.

IF X=1 THEN ON (Y=1)+1 GOTO 1000:PRINT "X AND Y ARE 1"

This proceeds to the next line if X isn’t 1, goes to line 1000 if X is 1 but Y isn’t, and prints out a message before proceeding to the next line if both are 1.

You probably don’t want to get too clever with that stuff. If you’re making complicated decisions in BASIC it’s usually best to stick to IF‘s ordinary branches and ON‘s jump tables. But they’re there to use, and sometimes they’re even reasonable.

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s