So What Software

Chapter 3: Disassembly, decoding OMF2 & ExpressLoad

by on Apr.26, 2009, under Programming

Although you probably want to create programs anew, and I’m with you there, knowing how to disassemble code is really important today with such an old system.

Much of the wisdom and the “knowing of things” is lost to time; it’s amazing how much gets discarded. The only real record is in the actual code itself, but you have to decode and decipher it to regain that lost wisdom. Even with the most complete documentation library, many things are still not clear, and having a good old working example goes miles toward filling in those little “what the hey”s.

Disassembly is the practice of decoding a binary file image and producing a source file. You could do this by hand, reading byte by byte, looking up the meaning of each byte as interpreted at that particular location in the code and translating it to the proper assembly syntax. Of course that would take “deep time”, as the cosmologists would say.

A quicker way is to use a disassembler.

These come in all flavors, from the mundane to the sublime. I think I used to have a beta of the ORCA disassembler but that’s long gone; all I have left is the DISASM.data listing from it. I looked around the Internet and the only thing I could find was a Windows-based disassembler for the 65816 called D816.

Now, D816 doesn’t know anything about Apple firmware, software and the like. It’s a generic 65816 disassembler that knows only the microprocessor’s opcodes and addressing modes, but we’re getting ahead of ourselves a bit here.

Let’s back up a notch and deal with the file some more.

The file you want to disassemble will be a ProDOS 8 binary file or relocatable OMF type. Determining which is which will be vital to successful disassembly.

A ProDOS 8 binary is filetype BIN($06), but relocatables can be any of several filetypes:

GS/OS or ProDOS 16 application S16($B3)
Shell application EXE($B5)
Permanent initialization PIF($B6)
Temporary initialization TIF($B7)
New desk accessory NDA($B8)
Classic desk accessory CDA($B9)
Tool set files TOL($BA)
Apple IIgs device drivers DVR($BB)
Generic loadfile LDF($BC)
GS/OS file system translator FST($BD)

If you want to disassemble a ProDOS 8 BIN($06) file then you can go straight to the disassembler, but if you want to do a relocatable filetype (most likely) you need to do some pre-decoding, or de-OMFing, first.
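The decision above can be sketched in a few lines of Python. This is only an illustration: the function name `needs_de_omf` is made up here, and the table holds nothing more than the filetype codes listed in this chapter.

```python
# Filetype codes copied from the list in this chapter.
RELOCATABLE_TYPES = {
    0xB3: "S16", 0xB5: "EXE", 0xB6: "PIF", 0xB7: "TIF", 0xB8: "NDA",
    0xB9: "CDA", 0xBA: "TOL", 0xBB: "DVR", 0xBC: "LDF", 0xBD: "FST",
}
BIN_TYPE = 0x06  # ProDOS 8 binary


def needs_de_omf(filetype: int) -> bool:
    """Return True when the filetype is one of the relocatable OMF types."""
    if filetype == BIN_TYPE:
        return False  # plain binary: straight to the disassembler
    if filetype in RELOCATABLE_TYPES:
        return True   # relocatable: de-OMF it first
    # Hypothetical behavior: anything not in the list is rejected.
    raise ValueError(f"filetype ${filetype:02X} is not in the list above")


print(needs_de_omf(0xB3))  # True
print(needs_de_omf(0x06))  # False
```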

Oh yeah, OMF stands for Object Module Format, which is basically how relocatable code is still done today; only the specifics have changed, the principle remains the same. All computer code must run at a specific location, its ORG. In short, code needs to be relocatable when a robot such as the Toolbox’s Memory Manager is in control. It relentlessly works to dole out and keep track of blocks of memory in the IIgs, usually at the next most convenient location. It also compacts memory (moves things around to create larger available spaces).

Because it chooses where a program will be initially loaded, the code needs to be installed at and re-written for that location.

Because it moves blocks of code around, the code in them has to be re-written to operate correctly at their new location.

That’s why a relocation scheme is necessary; in the case of the Apple IIgs that’s OMF2.
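To make the idea of “re-writing code for its location” concrete, here is a deliberately simplified Python sketch. It is not the OMF2 relocation dictionary format (real relocation records carry more information than a bare offset); the function `relocate` and the list of patch offsets are assumptions made for illustration only.

```python
def relocate(image: bytes, patch_offsets, load_address: int) -> bytes:
    """Add load_address to each 2-byte little-endian address at the given offsets.

    Simplified illustration only: real OMF relocation records describe more
    than a bare offset, so do not treat this as a loader.
    """
    out = bytearray(image)
    for off in patch_offsets:
        value = int.from_bytes(out[off:off + 2], "little")
        patched = (value + load_address) & 0xFFFF
        out[off:off + 2] = patched.to_bytes(2, "little")
    return bytes(out)


# JMP $0010 assembled at ORG $0: opcode 4C, operand stored low byte first.
code = bytes([0x4C, 0x10, 0x00])
moved = relocate(code, [1], 0x2000)
print(moved.hex(" "))  # 4c 10 20
```

The same three bytes mean “jump to $0010” at ORG $0 and “jump to $2010” once the block sits at $2000, which is why the operand has to be patched whenever the code is placed or moved.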

Many of these filetypes can be stripped of OMF segmentation by using the MAKEBIN utility in APW, as long as you change the filetype to EXE($B5) first. If they are multi-segment or ExpressLoad files, however, this trick will not work and MAKEBIN will let you know. One thing about MAKEBIN is that it will ORG the code at $2000 instead of the normal disassembler ORG of $0. Remember this difference for later on.

In tough cases what you need to do is a little hex editing of the file to remove and expand the necessary parts of the code. Many applications use just a few segment types while a few can use the entire list. You need to learn how to read OMF. It’s not that bad, but you will need a reference that describes the OMF segments and headers. I know this documentation exists in the GS/OS Reference Manual and the APW Reference Manual. It may also appear elsewhere.

Figure#3.1

Figure#3.1 shows two HexEdit windows: the left one contains a straight OMF2 file and the right one contains an ExpressLoad OMF file. The most noticeable difference is that the ExpressLoad header is bigger… it even has the word ExpressLoad in its header. De-OMFing is done the same way for either; only the header is different, while the segment codes are exactly the same.
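If you would rather not eyeball the hex dump, the “it even has the word ExpressLoad in it” observation can be turned into a quick check. This is a rough sketch: the function name and the 256-byte search window are my own assumptions, not anything defined by the file format.

```python
def looks_like_expressload(data: bytes, search_window: int = 256) -> bool:
    """Return True if the text 'ExpressLoad' shows up near the start of the file.

    search_window is an arbitrary guess at how far in the header text can sit.
    """
    return b"ExpressLoad" in data[:search_window]


# Usage with a file on disk (the path is hypothetical):
# with open("SOME.FILE", "rb") as f:
#     print(looks_like_expressload(f.read()))
```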

I have highlighted the headers in blue for clarity. This is Photoshop magic; although HexEdit can do this, it looks better when dolled up in Photoshop for web publication.

Figure#3.2

We’ll do the easier of the two, the OMF2 file on the left, CB.PRELAUNCH. After loading this file into HexEdit I open up a TextEdit window and make some zeroes… several lines of 32 zeroes each, so that each line corresponds to 16 bytes ($10). These will be used for type $F1 (DS) segments. See Figure#3.2.
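If typing zeroes by hand gets tedious, a few lines of Python can produce the same scratch text. This is just a convenience sketch; the helper name `zero_lines` is made up, and it follows the layout described above of 32 zero digits (16 bytes) per line.

```python
import math


def zero_lines(byte_count: int, bytes_per_line: int = 16) -> str:
    """Return enough lines of hex zeroes to cover byte_count bytes."""
    line = "00" * bytes_per_line  # 32 zero digits = 16 bytes
    return "\n".join([line] * math.ceil(byte_count / bytes_per_line))


text = zero_lines(0x40)        # enough zeroes for 64 bytes
print(len(text.splitlines()))  # 4
```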

Figure#3.3

Now that I am set up for decoding OMF, let’s take a look at exactly what we are up against. Figure#3.3 shows CB.PRELAUNCH with all the important regions color coded. Blue is the header and trailing zeroes, green is LCONST segment headers, orange is DS segment headers, purple is the relocation dictionary and yellow is static code bytes (what we are after here). I picked this file because it is simple and because it fits on one screen; most programs are a bit bigger than this one.

The key skill involved with this, in my opinion, is “pattern recognition”, much as dramatized in the movie “A Beautiful Mind”. Just remember to never interact with those imaginary people!

The first 64($40) bytes are the header. It has a lot of 00s and 20s (spaces); all that information means something, but for our purposes only one piece is of interest. Eight bytes in is the pattern “7C 02 00 00”. This is the finished length of the file contained; make a note of it. If you are new to the IIgs, multi-byte numbers are stored in memory in reverse order, low byte first. What is really being represented is a 4-byte value (long) “00 00 02 7C”, or just plain “27C” ignoring the leading zeroes. That’s a number in hexadecimal, and after a quick tap on the calculator we see the finished file will be 636 decimal bytes long. You could of course do this conversion in your head if you were painfully smart. Personally, I need my calculator set to programmer mode.
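For anyone without a programmer’s calculator handy, the byte-reversal and hex conversion above can be checked in Python. The helper name `read_long_le` is mine; the offset of 8 and the byte pattern come straight from the text, and the fake header is just zero padding around that one field.

```python
def read_long_le(data: bytes, offset: int) -> int:
    """Read a 4-byte value stored low byte first (little-endian)."""
    return int.from_bytes(data[offset:offset + 4], "little")


# A stand-in header: 8 filler bytes, then the pattern from the text.
header = bytes(8) + bytes.fromhex("7C 02 00 00") + bytes(52)

length = read_long_le(header, 8)
print(hex(length), length)  # 0x27c 636
```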

Figure#3.4

After the header there is another pattern, “F2 A2 01 00 00”… a five-byte pattern. This is an LCONST($F2) segment header. Once again make note of the last four bytes: they give the number of bytes to follow ($1A2) that contain static program code. Here we go… Select the first 69($45) bytes, which is the header plus this segment header, and delete them!
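The five-byte pattern splits into a one-byte code and a four-byte count, stored low byte first. A small sketch of that split follows; `parse_record_header` is a made-up name, and the 64-byte header size is the one observed for this particular file.

```python
def parse_record_header(data: bytes, offset: int = 0):
    """Split a 5-byte pattern into (code, count); the count is low byte first."""
    code = data[offset]
    count = int.from_bytes(data[offset + 1:offset + 5], "little")
    return code, count


code, count = parse_record_header(bytes.fromhex("F2 A2 01 00 00"))
print(f"${code:02X} ${count:X}")  # $F2 $1A2

HEADER_SIZE = 0x40      # 64 bytes, as seen in this file
print(HEADER_SIZE + 5)  # 69 bytes to delete
```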

Now, index in $1A2 bytes and you will find another pattern, “F1 9A 00 00 00”, immediately followed by another pattern, “F2 40 00 00 00”. This is a DS segment header followed by another LCONST segment header. The DS header says to insert $9A(154) zero bytes at that point in the code. Go to the TextEdit window with the zeroes and highlight $9A bytes’ worth of them (two zero digits per byte), hit copy, then highlight both of these headers in HexEdit (10 bytes) and hit paste. See Figure#3.4 and Figure#3.5.

Figure#3.5
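The cut-and-paste routine so far can be written down as a short loop. Treat this as a sketch under assumptions, not a general de-OMF tool: it assumes a 64-byte header as in this file, handles only the $F2 (LCONST) and $F1 (DS) patterns described here, and simply stops at the first byte that is neither, treating everything from there on as relocation dictionary to be dropped. The name `expand_body` is made up.

```python
LCONST = 0xF2  # followed by a 4-byte count, then that many code bytes
DS = 0xF1      # followed by a 4-byte count of zero bytes to insert


def expand_body(data: bytes, header_size: int = 0x40) -> bytes:
    """Expand the LCONST and DS patterns after the header into a flat image."""
    image = bytearray()
    pos = header_size  # assumption: 64-byte header, as in this file
    while pos < len(data) and data[pos] in (LCONST, DS):
        code = data[pos]
        count = int.from_bytes(data[pos + 1:pos + 5], "little")
        pos += 5
        if code == LCONST:
            image += data[pos:pos + count]  # keep the static code bytes
            pos += count
        else:
            image += bytes(count)           # DS: insert 'count' zero bytes
    return bytes(image)


# Tiny synthetic file: header, 3 code bytes, 4 zeroes, 2 code bytes, then junk.
sample = (bytes(64)
          + bytes([0xF2, 3, 0, 0, 0, 0xAA, 0xBB, 0xCC])
          + bytes([0xF1, 4, 0, 0, 0])
          + bytes([0xF2, 2, 0, 0, 0, 0xDD, 0xEE])
          + bytes([0xE2, 0x02, 0x00, 0x01, 0x00]))
print(expand_body(sample).hex(" "))  # aa bb cc 00 00 00 00 dd ee
```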

Now we’re nearly there. See… I told you it would not be real hard. The last thing to do, because we are all out of construction segments, is to highlight and remove the relocation dictionary and the trailing zeroes (Figure#3.6). On most code that’s all there is to it, except for there being more DS headers followed by LCONST headers in the code. Do them one by one in sequence and you will not have any problems.
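A quick sanity check before saving: the pieces kept or inserted should add up to the finished length noted from the header. Using only the numbers read off the hex dump in this chapter:

```python
first_lconst = 0x1A2   # code bytes after the first LCONST header
ds_zeroes = 0x9A       # zero bytes inserted for the DS header
second_lconst = 0x40   # code bytes after the second LCONST header
header_length = 0x27C  # finished length noted from the file header

total = first_lconst + ds_zeroes + second_lconst
print(hex(total), total == header_length)  # 0x27c True
```

If the edited file in HexEdit is not exactly this long, a header or part of the relocation dictionary was probably left behind or too much was cut.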

Figure#3.6

INTERSEG headers can appear when the program is broken down into multiple load segments, where each segment has its own relocation dictionary. In that case you will come across INTERSEG headers and special relocation headers, which have to be dealt with differently. Read up about these segments and it will all become clear. Now you have a properly constructed binary image of the file ORGed at $0… perfect! Save this away to disk as CBPRE.BIN and get ready to stick it in the disassembler.

The next chapter will be a continuation of this chapter, but it will actually deal with authentic disassembly, the meat and potatoes of the process. Until then, try this out a bit and get a feel for relocatable architecture.

As a side note here, if you have the ORCA disassembler then you really don’t need to do any of this stuff, because ORCA knows how to read OMF along with a bunch of other stuff and does all this cutting and pasting for you automatically. I just thought it would be good background on what’s really happening when you load a relocatable file, and I am telling you this at the end of this section rather than at the start, where you might be tempted to skip over this vital information.

Next Post: Chapter 3 (cont.): Actual disassembly
