Source code clues hidden in the game binary

Snippets of original source code and what they can tell us

Assembling large machine code programs on memory-starved 8-bit home computers can be a tricky process. Assembly language source code is always considerably larger than the machine code that it produces, so if you're trying to build a machine code binary that fills your computer to the brim, assembling the whole thing in-place is not an option (at least, not on the original hardware).

The most popular approach is to split the source code up into smaller batches, and then assemble each batch to produce a set of smaller binaries that you can concatenate into the final game binary. On the BBC Micro, this is fairly easy to do with the assembler that comes built into BBC BASIC, with each part assembling its code, saving it to a file, and then loading the next BASIC program to assemble the next part.

A side-effect of this approach is that you end up with fragments of the previous part's program and its assembled code in memory, unless you clear down the computer's memory between program loads (something you are extremely unlikely to do, as this process relies on variables retaining their values between parts). If the source code defines a variable's block of memory by simply incrementing the program counter in P%, rather than using an explicit sequence of EQU directives to zero the block, then the block will contain whatever was already in memory, and this leftover content will then be saved into the finished game binary.
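To make the mechanism concrete, here is a toy model in Python. It is only a sketch of the idea, not a reconstruction of the real build process: the memory size, the addresses and the `assemble_part` helper are all made up for illustration. Reserving a block by advancing the program counter leaves the old contents in place, so they end up in the saved binary.

```python
# Toy model of multi-part assembly on a machine whose memory is never cleared.
# All addresses and names here are hypothetical, chosen for illustration only.
MEMORY_SIZE = 0x100


def assemble_part(memory, start, chunks):
    """Write chunks at a moving program counter (like P% in BBC BASIC).

    Each chunk is ("bytes", data) for emitted code or data, or ("skip", n)
    to reserve n bytes by advancing the counter without writing anything.
    Returns the final program counter.
    """
    pc = start
    for kind, value in chunks:
        if kind == "bytes":
            memory[pc:pc + len(value)] = value
            pc += len(value)
        elif kind == "skip":
            pc += value  # reserve space, leaving the old contents alone
        else:
            raise ValueError(f"unknown chunk kind: {kind}")
    return pc


memory = bytearray(MEMORY_SIZE)

# Leftovers from the previous part: some source text still sitting at 0x20.
leftover = b"LDA#128:STA MEANY,X"
memory[0x20:0x20 + len(leftover)] = leftover

# The next part assembles some code, then reserves 0x20..0x3F by skipping.
end = assemble_part(memory, 0x10, [
    ("bytes", bytes([0xA9, 0x80, 0x60])),  # LDA #128 : RTS
    ("skip", 0x0D),                        # move the counter up to 0x20
    ("skip", 0x20),                        # reserve the variable block
    ("bytes", bytes([0x60])),              # RTS
])

# Saving the assembled range also saves whatever was in the skipped block.
saved = bytes(memory[0x10:end])
print(leftover in saved)
```

Replacing the second `("skip", 0x20)` with `("bytes", bytes(0x20))` is the equivalent of zeroing the block with EQU directives, and the leftover text disappears from the saved output.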

As a result, it is pretty common to find bits of original source code buried in game binaries, particularly with large games. The Sentinel is no exception, so let's take a look at the secrets that are buried in the released game.

The first snippet of source code
--------------------------------

One of the easiest ways of tracking down clues in a game binary is to load the binary into a hex editor. Hex editors show the contents of the file both as hexadecimal bytes and as ASCII characters, so if there's a block of original source code hidden in there, it should be fairly obvious. There is only one game binary file in The Sentinel, and if you load this into a hex editor and jump down to offsets &3C30 (15408) and &3F00 (16128), you should be able to see two snippets of assembly code in there (you can grab the file from the accompanying repository if you want to try this).
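If you don't have a hex editor to hand, a few lines of Python can do a similar job by listing runs of printable ASCII along with their offsets. This is a minimal sketch; the `find_ascii_runs` name and the minimum run length of 8 are my own choices, and the sample bytes are hand-built rather than taken from the game binary.

```python
def find_ascii_runs(data, min_length=8):
    """Return (offset, text) pairs for runs of printable ASCII in data.

    Only runs of at least min_length characters are reported, which filters
    out most of the accidental matches you get in machine code.
    """
    runs = []
    start = None
    for i, byte in enumerate(data):
        if 0x20 <= byte <= 0x7E:
            if start is None:
                start = i  # a new run begins here
        else:
            if start is not None and i - start >= min_length:
                runs.append((start, data[start:i].decode("ascii")))
            start = None
    # Handle a run that extends to the very end of the data
    if start is not None and len(data) - start >= min_length:
        runs.append((start, data[start:].decode("ascii")))
    return runs


# Hand-built sample: some non-text bytes, a line of source, a terminator.
sample = bytes([0xA9, 0x80, 0x0D, 0x12, 0xCA, 0x1C]) + b"LDX#6:JSR CFLSH" + bytes([0x0D])
for offset, text in find_ascii_runs(sample):
    print(f"&{offset:04X}: {text}")
```

To try this on a real file, read it with `open(path, "rb").read()` and pass the result to `find_ascii_runs`.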

In the case of hidden code from the BBC BASIC assembler sources, the embedded assembly language is generally quite readable, though the surrounding BASIC is tokenised and line numbers are stored as integers rather than ASCII text, so the source code appears as assembly language, embedded in random noise. Luckily it's easy enough to copy the source snippets into a modern text editor and work out the line numbers (the first two bytes at the start of each line contain the line number, high byte then low byte, followed by the line length and then the line itself).
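The line number decoding can also be automated. The sketch below assumes the layout I understand 6502 BBC BASIC to use for each stored line: a carriage return byte (&0D), then the line number high byte and low byte, then a length byte that counts the whole line including those four header bytes. Treat that as an assumption to check against your own data. The `decode_basic_lines` name is my own, and the fragment is hand-built from the text of the first snippet rather than copied from the binary.

```python
def decode_basic_lines(data, offset=0):
    """Walk tokenised BBC BASIC lines in data, starting at offset.

    Assumed layout per line: 0x0D, line number high byte, low byte,
    a length byte covering all four header bytes plus the body, then
    the body itself. Stops at the first position that does not look
    like a valid line header.
    """
    lines = []
    while offset + 4 <= len(data) and data[offset] == 0x0D:
        high = data[offset + 1]
        low = data[offset + 2]
        length = data[offset + 3]
        if high == 0xFF or length < 4 or offset + length > len(data):
            break  # end-of-program marker, or a truncated fragment
        number = (high << 8) | low
        body = data[offset + 4:offset + length]
        # Keep printable ASCII and show tokens or other bytes as "?"
        text = "".join(chr(b) if 0x20 <= b <= 0x7E else "?" for b in body)
        lines.append((number, text))
        offset += length
    return lines


# Hand-built fragment using text from the first snippet (not real game bytes)
fragment = (
    bytes([0x0D, 0x12, 0xCA, 19]) + b"LDX#6:JSR CFLSH"
    + bytes([0x0D, 0x12, 0xD4, 4])
    + bytes([0x0D, 0x12, 0xDE, 13]) + b".ets6 rts"
)
for number, text in decode_basic_lines(fragment):
    print(number, text)
```

This doesn't expand BASIC keyword tokens, which is fine for the assembly language parts, as they are mostly stored as plain text.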

The Sentinel binary contains two big chunks of original BBC BASIC assembler source code. Both of them match the code in the game binary.

The first block is after the ConfigureMachine and ClearMemory routines, which run at address &3F00 and are only used during the loading process. The snippet of source code pads this code out to the nearest page boundary at &4100, so presumably these routines were saved out in a block of &200 bytes.
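The arithmetic behind that guess is easy to check. In this small sketch the `next_page_boundary` helper is my own, and the address &40A3 is a made-up example of where the routines might end; the text above only gives the start address and the boundary.

```python
PAGE_SIZE = 0x100  # a 6502 memory page is 256 bytes


def next_page_boundary(address):
    """Round address up to a page boundary (unchanged if already on one)."""
    return (address + PAGE_SIZE - 1) & ~(PAGE_SIZE - 1)


start = 0x3F00      # where the loading routines run
block_end = 0x4100  # the page boundary that the padding reaches
block_size = block_end - start
print(f"&{block_size:X}")

# A hypothetical end-of-code address, rounded up to its page boundary
print(f"&{next_page_boundary(0x40A3):X}")
```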

The buried source code looks like this, once the line numbers have been decoded:

            ...ets6
  4810      LDX#6:JSR CFLSH
  4820 
  4830.ets6 rts
  4840 
  4850 
  4860 
  4870 
  4880 
  4890 
  4900 
  4910.MINI LDA#128:STA MEANY,X:STA MEMORY,X
  4920      LDA#0:STA MEANYSCAN,X
  4930      LDA#64:STA MTRYCNT,X:rts
  4940
  4950.MEAN LDA#40:STA COVER
  4960      LDX ETEM:STX XT...

It's worth me pointing out (in an excited voice!) that this is part of Geoff Crammond's original source code for The Sentinel. He literally wrote this - it's in his own, personal style, with his own indented layout, spaces between the mnemonics and variable names (but no spaces between mnemonics and numbers), and his own label names, with routine names in four-letter capitals, and in-routine labels in lower case with three letters and a number. This is the exact same style as in Aviator, which also contains snippets of buried code (see the Aviator deep dive on source code clues for details). I guess he liked this coding layout and didn't feel the need to change it over the intervening years.

It's really interesting to compare this snippet of Sentinel source code with the comparatively unreadable Elite source code, which doesn't bother with things like spaces or indents or consistent labelling (take a look at my Elite source code project to see for yourself). The difference is really illuminating; the Sentinel source code is a lot neater and easier to follow, no doubt about it.

It's also pretty easy to work out where this code is from. In this case, the code at the start of the snippet is from the end of GetPlayerDrain, while the code at the bottom contains the full ResetMeanieScan routine and the start of ScanForMeanieTree. This code implements part of the enemy tactics.

The label names are pretty short, and it's fun to compare them to the labels I invented while disassembling the game (the original source code has never been released, so I had to make up my own for this project). Here's a list:

| My label | Original source label |
| --- | --- |
| pdra5 | ets6 |
| FlushBuffer | CFLSH |
| ResetMeanieScan | MINI |
| enemyMeanieTree | MEANY |
| enemyFailTarget | MEMORY |
| enemyFailCounter | MEANYSCAN |
| enemyMeanieScan | MTRYCNT |
| ScanForMeanieTree | MEAN |
| enemyViewingArc | COVER |
| enemyObject | ETEM |
| viewingObject | Starts with XT |

Interestingly, the original source talks about the "meany", but all the game documentation calls it a "meanie", so the spelling got changed between implementation and release. Presumably "MINI" is short for "meany initialise", and calling the viewing arc "COVER" makes sense too. "ETEM" is perhaps a little less obvious for the enemy object number, though.

The second snippet of source code
---------------------------------

The second block of source code in the game binary is rather larger, and can be found in the stripData, tilesAtAltitude, maxAltitude, xTileMaxAltitude and zTileMaxAltitude variables, which between them take up the block of memory from &5A00 to &5BFF once the game has finished loading. These tables are used for storage and don't contain any lookup data, so it doesn't matter what they contain when the game starts, as any content will be overwritten as the code runs. It's likely that these blocks were skipped in the BBC BASIC assembler by incrementing P%, which jumps over the allocated memory while leaving its contents alone.

The buried source code looks like this, once the line numbers have been decoded:

            ...DX ETEM
  5180
  5190      TYA:JSR EMIRTEST:BCC mea2
  5200 
  5210      TYA:STA MEANY,X 
  5220 
  5230      LDA#4:STA OBTYPE,Y
  5240      LDA#104:STA OBHALFSIZEMIN
  5250      CLC:rts
  5251 
  5252.mea2 INC MTRYCNT,X:JMP EEXIT
  5253 
  5260 
  5270.tak5 LDA#128:STA THEEND:JMP EEXIT
  5280 
  5290.TAKE LDX PERSON
  5300      CPX PLAYERINDEX:BNE tak1
  5310      LDA ENERGY:BEQ tak5
  5320      SEC:SBC#1:STA ENERGY
  5330      JSR EDIS
  5340      LDA#5:JSR VIPO
  5350      SEC:JMP tak3
  5360 
  5370 
  5380.tak1 TXA:JSR EMIRPT
  5390 
  5400      LDA OBTYPE,X:BNE tak4
  5410
  5420 \...

Again, it's easy enough to work out where this code is from. In this case, the code at the start of the snippet is from the end of ScanForMeanieTree (which was the last routine in the first snippet above), while the code at the bottom contains the start of DrainObjectEnergy. This code is therefore still part of the enemy tactics.

The label names map to my disassembly like this:

| My label | Original source label |
| --- | --- |
| enemyObject | ETEM |
| CheckObjVisibility | EMIRTEST |
| enemyMeanieTree | MEANY |
| objectTypes | OBTYPE |
| minObjWidth | OBHALFSIZEMIN |
| mean5 | mea2 |
| enemyMeanieScan | MTRYCNT |
| FinishEnemyTactics | EEXIT |
| dobj1 | tak5 |
| sentinelHasWon | THEEND |
| DrainObjectEnergy | TAKE |
| targetObject | PERSON |
| playerObject | PLAYERINDEX |
| dobj2 | tak1 |
| playerEnergy | ENERGY |
| UpdateIconsScanner | EDIS |
| MakeSound | VIPO |
| dobj7 | tak3 |
| AbortWhenVisible | EMIRPT |
| dobj3 | tak4 |

There are some obvious similarities here - objectTypes sounds a lot like OBTYPE, and playerObject and PLAYERINDEX are clearly related - but I'm not sure I'd have worked out from the names alone that the EDIS routine updates the energy icons and scanner row, or that EMIRPT aborts the updating of a visible object if drawing it might corrupt an ongoing pan.

But none of that matters, because these are the original label names that Geoff Crammond himself chose, and that in itself is amazing. This is probably the most literal aspect of software archaeology - digging about in the code to see what gems we can find - and I find it endlessly fascinating to discover artefacts like these from the game's creation. What a privilege...