May 11, 2025

Sun 3/60 - My tricks for working with the disassembly

I start with a "bin" file -- a binary image of the bootrom. I have a program "wrap.c" that turns this into an ELF file. Then I use objdump from the m68k toolchain (m68k-linux-gnu-objdump) to do the disassembly. Doing this from an ELF file rather than just from the raw binary means the disassembly shows the true ROM addresses, which is reason enough to do things this way.

odx

Even before wrapping the bin file and disassembling, I usually run my own hex dumping utility "odx". It has this name because I once used "od -x" to do this. You will still find "od" (octal dump) on linux systems, but they changed its behavior and options. I never figured out how to make the new version do what the old one did (that I liked so much), so I wrote my own replacement in ruby. I also added an ascii dump alongside and to the right of the hex, which makes it much nicer than the original "od".

I usually do "odx rom.bin > rom.odx" then use "vi" to examine the generated output, which looks like this:

0000ec10 2e0a 0020 2000 2020 2563 0a00 2020 0020   .       %c
0000ec20 2025 630a 000a 4272 6f6f 6b74 7265 6520    %c   Brooktree
0000ec30 4441 4320 2843 6f6c 6f72 204d 6170 2920   DAC (Color Map)
0000ec40 5465 7374 733a 2020 2827 7127 2066 6f72   Tests:  ('q' for
Note that odx always shows the bytes in file order; it does not byte swap the 16 bit values shown above. So if the file holds big endian words or longs, all is well. And for strings, as above, we get the characters in proper order. But if the file holds little endian words or longs we will see them in little endian order and will probably want to use some other tool to swap bytes and display any large sections of these.

I should make one enhancement to odx. It currently always generates the dump addresses beginning at address 0. It is necessary (and sometimes error prone) to add the true base address to these when you want to relate them to the disassembly which does have proper addresses.
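
The enhancement described above might look something like the following. This is only a sketch in Python (odx itself is written in ruby), and the function name "hexdump_lines", the "base" parameter, and the choice to show non-printable bytes as spaces are all my assumptions rather than anything taken from odx. The base of 0x0fef0000 used in the usage note is inferred from the 3/60 addresses quoted later on this page.

```python
def hexdump_lines(data: bytes, base: int = 0):
    """Yield odx-style lines: address, eight 16-bit groups, ASCII column.

    `base` is added to every file offset so the addresses line up
    with the disassembly (hypothetical parameter, not part of odx).
    """
    for off in range(0, len(data), 16):
        chunk = data[off:off + 16]
        # Pair the bytes up exactly as they sit in the file (no byte swapping).
        words = [chunk[i:i + 2].hex() for i in range(0, len(chunk), 2)]
        # 8 groups of 4 hex digits plus 7 separating spaces = 39 columns.
        hex_part = " ".join(words).ljust(39)
        # Assumption: anything outside printable ASCII is shown as a space.
        text = "".join(chr(b) if 0x20 <= b < 0x7F else " " for b in chunk)
        yield f"{base + off:08x} {hex_part}  {text}"
```

With something like this, "for line in hexdump_lines(open('rom.bin', 'rb').read(), base=0x0fef0000): print(line)" would print addresses that can be compared directly against the disassembly.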

dump32

This is a program, written in C, that dumps the entire bin file as a sequence of 32 bit values. Since the m68k is big endian, and I do the analysis on an i386 machine (which is little endian), I do appropriate byte swapping. I chop sections out of this when I encounter blocks of constants (typically jump tables) in the code.
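
For illustration, here is roughly what the same idea looks like in Python, where the struct module takes care of the byte order regardless of the host machine. This is a sketch and not the real dump32: the function name, the "base" parameter, and the output format are assumptions.

```python
import struct


def dump32(data: bytes, base: int = 0):
    """Yield one line per big endian 32 bit value in `data`."""
    # Ignore any trailing bytes that do not make up a full 32 bit value.
    usable = len(data) - len(data) % 4
    # ">I" means big endian unsigned 32 bit, so no manual swapping is needed.
    for i, (value,) in enumerate(struct.iter_unpack(">I", data[:usable])):
        yield f"{base + 4 * i:08x}: {value:08x}"
```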

romtags

This is a script, written in Python, that generates a tags file. I use vim, which knows how to utilize a tags file. The idea here is that I can now place the cursor on an address (such as a subroutine call to some address) and vim will take me to the location where that subroutine is defined! This is a great time saver. It also works for references to constants and such.
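
A minimal sketch of how such a tags generator could work is shown below. It is not the actual romtags script. It assumes the tag name is the address written the way objdump writes operands (for example 0xfef9ada), that the disassembly lives in a file called rom.dis, and that each tag locates its line with a search pattern. Vim wants each tags line to be "name, tab, file, tab, address" and normally expects the file to be sorted.

```python
import re

# Leading spaces, then a hex address followed by a colon, as objdump prints it.
ADDR_RE = re.compile(r"^( *)([0-9a-f]+):")


def make_tags(dis_lines, filename="rom.dis"):
    """Return sorted tags-file lines, one per disassembly address."""
    tags = []
    for line in dis_lines:
        m = ADDR_RE.match(line)
        if m is None:
            continue  # blank lines, comments, inserted strings
        lead, addr = m.groups()
        # Tag searches run with 'magic' off, so the leading spaces are
        # copied literally instead of using a wildcard.
        tags.append(f"0x{addr}\t{filename}\t/^{lead}{addr}:/")
    return sorted(tags)
```

Writing the returned lines to a file named "tags" next to rom.dis should be enough for Ctrl-] on an address to jump to its definition.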

Vim and redisv

I made two additions to my .vimrc file. The first one helps me out when I encounter places where the disassembler got "out of alignment" and spit out a bunch of nonsense.
Here is what I added to my .vimrc:
:let mapleader = "-"
" special hooks for sun3/80 disassembly
" hook for redoing disassembly
nnoremap <leader>d :.!./redisv<cr>
nnoremap <leader>s :.!./vim_string<cr>
We will talk about "vim_string" later. What this lets me do is type the two character sequence -d (where d is for "disassemble") and it will run the external helper program "redisv" (written in python), which uses objdump to redo the disassembly. This sounds ridiculously inefficient, but it only happens when you push a button and takes less than a second, so who cares.
Here is how it works. Suppose I see this:
 fef567c:   4e75            rts

 fef567e:   0000 4e56       orib #86,%d0
 fef5682:   fffc            .short 0xfffc
 fef5684:   2e87            movel %d7,%sp@
I put the cursor on the second line (with orib), type -d, and I get this:
 fef567c:   4e75            rts

 XXX
fef567e:    0000 4e56       orib #86,%d0
 fef5680:   4e56 fffc       linkw %fp,#-4
XXX
 fef5682:   fffc            .short 0xfffc
 fef5684:   2e87            movel %d7,%sp@
The original line is preserved for my study, and a single line of disassembly at that address plus 2 is inserted after it. XXX markers sandwich the pair, one before and one after. I then edit this to look like so:
 fef567c:   4e75            rts

 fef5680:   4e56 fffc       linkw %fp,#-4
 fef5684:   2e87            movel %d7,%sp@
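
The core of a helper like this can be sketched as follows. This is not the real redisv; the function names, the ELF file name "rom.elf", and the 16 byte window are assumptions. The idea is to pull the address off the line vim hands over, add 2, ask objdump to disassemble a short range starting there (using its --start-address and --stop-address options), and keep only the first instruction that comes back.

```python
import re

# A disassembly line starts with optional spaces, a hex address, and a colon.
LINE_RE = re.compile(r"^\s*([0-9a-f]+):")


def redo_argv(line, elf="rom.elf", skip=2, window=16):
    """Build an objdump command that disassembles just past `line`."""
    m = LINE_RE.match(line)
    if m is None:
        raise ValueError("no address found on this line")
    start = int(m.group(1), 16) + skip
    return [
        "m68k-linux-gnu-objdump", "-d",
        f"--start-address={start:#x}",
        f"--stop-address={start + window:#x}",
        elf,
    ]


def first_insn(objdump_text):
    """Return the first disassembled instruction line, or None."""
    for out_line in objdump_text.splitlines():
        if LINE_RE.match(out_line):
            return out_line
    return None
```

A complete filter would read the line from stdin, run the command with subprocess.run(argv, capture_output=True, text=True), and print XXX, the original line, the result of first_insn, and XXX again.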

Vim and vim_string

This is triggered by -s rather than -d within vim. It has both a python script and a C program to help it do its thing.
As an example, suppose you see these lines:
 fef7f5c:    4879 0fef e738  pea 0xfefe738
 fef7f62:   6100 1b76       bsrw 0xfef9ada
You are pretty sure the address in the "pea" is referencing a string. You know, either because you used the ctags thing to go there and look at it and recognize ascii hex values, or you know that the subroutine call following is printf or puts or some such.

So you put the cursor on the first line, type -s and you get:

fef7f5c:    4879 0fef e738  pea 0xfefe738
; 0xfefe738 AMD Ethernet
 fef7f62:   6100 1b76       bsrw 0xfef9ada
Sometimes it comes back with a "Sorry" response which you just have to delete. This only gives you the first line of multiline strings and could benefit from further improvement, but it is a big step in the right direction.
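
The lookup itself can be sketched in a few lines of Python. This is an illustration under assumptions, not the real vim_string: the function name, taking the last 0x operand on the line as the string address, stopping at the first NUL or newline, and the exact "; Sorry" text are all guesses based on the behavior described above.

```python
import re

HEX_OPERAND_RE = re.compile(r"0x([0-9a-f]+)")


def annotate(line, rom, base):
    """Return [line, comment], where comment shows the string at the operand."""
    sorry = [line, "; Sorry"]
    operands = HEX_OPERAND_RE.findall(line)
    if not operands:
        return sorry
    addr = int(operands[-1], 16)
    off = addr - base  # convert the ROM address to a file offset
    if not 0 <= off < len(rom):
        return sorry
    end = off
    # Only the first line of the string: stop at NUL or newline.
    while end < len(rom) and rom[end] not in (0x00, 0x0A):
        end += 1
    raw = rom[off:end]
    # Give up unless everything found is printable ASCII.
    if not raw or any(not 0x20 <= b < 0x7F for b in raw):
        return sorry
    return [line, f"; 0x{addr:x} {raw.decode('ascii')}"]
```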

Dumping large blocks of strings with xstrings

The xstrings program is the helper program (written in C) used by vim_string and the "-s" vim trick above. It can also be used to dump blocks of strings.

When I use "odx" to dump the ROM binary, it is easy to identify large blocks of strings. I can determine their start and end addresses from the odx output, then use xstrings to dump the blocks, and then hand edit that into the disassembly file, replacing the garbled disassembly.

Here is an example. The 3/60 dump shows strings from e738 to f480. Translating this to actual rom addresses, this is 0fef_e738 to 0fef_f480. Given these addresses, edit a line in vim_string.c like so:

batch ( 0x0fefe738, 0x0feff480 );
Recompile, then invoke xstrings without arguments:
./xstrings
0fefe738: AMD Ethernet
0fefe745: w - Wr/Rd CSR1 Reg Test\n ++
0fefe75d: l - Local Loopback Test\n ++
0fefe775: x - External Loopback Test
.....
.....
I do this: ./xstrings > zzz and then copy zzz into rom.dis

Replacing garbled disassembly with strings does a great deal towards reducing the bulk (and useless noise) in the disassembly. There is some risk of including things like 32 bit constants in big blocks of disassembled data like this. So far this has not proven to be a problem. The linker seems to have put all the strings at the end of the ROM image (or most of them anyway). If this does happen, it can be sorted out later -- I always keep a copy of the naive disassembly available for cases like this.

Watch for big groups of "...." as xstrings prints a "." when it encounters a byte with no printable ascii to represent it.
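
To make the batch dump concrete, here is a Python sketch of the same idea (xstrings itself is written in C). The function name and parameters are assumptions. The rules it follows are my reading of the sample output above: strings are separated by NUL bytes, a newline is printed as \n and ends the output line, a trailing ++ marks a string that continues on the next output line, and bytes with no printable ascii are shown as a ".".

```python
def dump_strings(rom: bytes, base: int, start: int, end: int):
    """Return one output line per string (or string piece) in [start, end)."""
    lines = []
    pos, stop = start - base, end - base
    while pos < stop:
        if rom[pos] == 0:  # skip NUL terminators and padding
            pos += 1
            continue
        begin, text, cont = pos, [], False
        while pos < stop and rom[pos] != 0:
            b = rom[pos]
            pos += 1
            if b == 0x0A:
                text.append("\\n")
                # Break after a newline; flag it if the string goes on.
                cont = pos < stop and rom[pos] != 0
                break
            text.append(chr(b) if 0x20 <= b < 0x7F else ".")
        mark = " ++" if cont else ""
        lines.append(f"{base + begin:08x}: {''.join(text)}{mark}")
    return lines
```

Fed the first few strings from the 3/60 ROM, this reproduces the addresses shown above (0fefe738, 0fefe745, 0fefe75d, 0fefe775), which suggests the reading of the format is at least close.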

hexfix

Before I ran hexfix, I removed a leading space from all the line addresses, so " fef"... becomes "fef...". This means I also have to modify and rerun romtags. Mostly this makes it sane to use diff on the file before and after hexfix.

Hexfix is a python program that cleans up certain issues with the objdump disassembly.

To run hexfix, I use the command "./hexfix rom.dis > zzz". I then verify that the new file has the same number of lines as the original and use diff to examine the changes and make sure things aren't going crazy.

The sorts of changes it makes are as follows. Sadly, objdump thinks it is sensible to display as decimal all kinds of things I would like to see as hex. Sometimes I can see what I want from the opcodes themselves, but it is just easier and less error prone to let hexfix do the work. Here is what diff shows before and after:
< fef0130: 0287 ffff 0000 andil #-65536,%d7
< fef0136: 0c87 7777 0000 cmpil #2004287488,%d7
---
> fef0130: 0287 ffff 0000 andil #0xffff0000,%d7
> fef0136: 0c87 7777 0000 cmpil #0x77770000,%d7
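
A rewrite of this kind can be sketched with one regular expression. This is not the real hexfix, and the rule it applies is an assumption: only a decimal immediate that directly follows a mnemonic ending in "l" (a long sized instruction) is converted, and it is shown as an unsigned 32 bit hex value. Because it works one line at a time, the line count never changes.

```python
import re

# A mnemonic ending in "l", whitespace, "#", then a decimal number.
# The lookahead leaves operands that are already hex (#0x...) alone.
IMM_RE = re.compile(r"(\b[a-z]+l\s+#)(-?\d+)(?![0-9a-fx])")


def hexfix_line(line):
    """Rewrite a decimal long immediate on one disassembly line as hex."""
    def to_hex(m):
        # Mask to 32 bits so negative values show their bit pattern.
        value = int(m.group(2)) & 0xFFFFFFFF
        return f"{m.group(1)}{value:#x}"
    return IMM_RE.sub(to_hex, line)
```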

Ideas for the future

Many of these things could be automated even further. One approach to the problem of the disassembler getting out of alignment is to work up a call list and disassemble in known sections. I did this once for the ESP8266 rom and it worked out well. At some point this could turn into a project of working up a full blown fancy disassembler -- but I don't want this project to spawn another project. Maybe someday.
Feedback? Questions? Drop me a line!

Tom's Computer Info / tom@mmto.org