March 1, 2020

Assembly language basics

I'll note up front that I speak about many things in broad general terms. People can quibble or split hairs over almost every statement I make below, but rather than anticipate and address all those minor issues, I just go ahead and make things simple for the student.

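We can talk about 4 sorts of computer languages: machine language, assembly language, low level compiled languages, and higher level compiled (and/or interpreted) languages.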

The last two of these emphasize a distinction that I find useful. The C language is the only low level compiled language I use and talk about. Someone who understands C can "see through" the language and have a very good idea about what the assembly language code generated will look like for each statement. Higher level compiled languages (and/or interpreted languages) may implement higher level abstractions (like objects, lists, associative arrays) that don't have a simple mapping to assembly code.

But we are getting ahead of ourselves. A computer chip executes machine language. These are bit patterns stored in memory that tell the chip what to do. It is rare to need to even think about machine language (but it is sometimes important). Instead, most people deal with assembly language, which uses symbols and word-like representations of machine language. The important thing is that each line of assembly language maps into a single machine language instruction.
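
To make the "one line, one instruction" idea concrete, here is a minimal sketch in C of what an assembler does for a single line. The 16-bit instruction format, the opcode values, and the names encode and opcode_of are all made up for illustration. They do not match any real processor.

```c
#include <stdint.h>

/* Hypothetical 16-bit instruction format, invented for this sketch:
   bits 15-12 opcode, bits 11-8 destination register,
   bits 7-4 and 3-0 the two source registers. */
enum { OP_ADD = 0x1, OP_SUB = 0x2 };

/* Assemble one instruction, e.g. "add r3, r1, r2", into its bit pattern. */
uint16_t encode(unsigned op, unsigned rd, unsigned ra, unsigned rb)
{
    return (uint16_t)(((op & 0xFu) << 12) | ((rd & 0xFu) << 8) |
                      ((ra & 0xFu) << 4) | (rb & 0xFu));
}

/* Pull the opcode field back out of a machine word, the first thing
   a disassembler would do. */
unsigned opcode_of(uint16_t word)
{
    return (unsigned)(word >> 12);
}
```

With this made-up format, encode(OP_ADD, 3, 1, 2) is what "add r3, r1, r2" would assemble to: the bit pattern 0x1312.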

Reverse engineering versus writing assembly language

Writing assembly language can be quite painful. This is one place where a debugger can truly be useful. The usual problem (or a common problem anyway) is not understanding some nuance of how an instruction works. So you have a blind spot, and until you can single step along and look at registers, you will never confront the truth. Given any option, I will avoid writing assembly language code.

Reverse engineering (looking at disassembled code) is a different game. Here when things don't make sense, you can remind yourself, "but wait, this code works!". The mental process is quite different and often more enjoyable.

Understanding the machine

The main reason to learn assembly language is that you actually learn how the machine works. It used to be that you learned assembly language because you could write code that would run faster than any compiled code, but by and large this is no longer true. So the primary motivation for learning assembly language is to become "one with the machine". Rather than writing code in a compiled language and treating the machine somewhat as a black box, assembly language lets you get hands on with how computers really work.

Registers and instructions

To understand assembly language, you have to have a picture in your head of the machine at the assembly language level. By and large this consists of understanding registers and instructions. Virtually every chip you will encounter has a small group of super fast storage cells called registers. These may have names like A, B, C, or they may have names like r0 through r15. Some chips have more and some chips have fewer. Some registers have special purposes or restrictions. Others are general registers, and you have more than one because more is better. I am avoiding talking about specific processors at this point.

There is almost always a special register called the PC. This is the program counter and it points to the instruction currently being executed (or the next instruction to be executed). A processor reset forces some value (often 0) into this register, and when the processor starts running, it will read an instruction from memory from this address and execute it. If nothing surprising happens, it will bump the PC by some amount that causes it to point to (and fetch) the next instruction to be executed, and away you go.
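
Here is a rough sketch in C of that fetch and bump cycle. The machine is a toy I invented for the purpose: it has only NOP and HALT instructions, and every instruction is one memory cell wide, so bumping the PC just means adding 1. The names cpu_reset, cpu_step and cpu_run are my own and are not taken from any real chip or library.

```c
#include <stddef.h>

/* Toy machine, invented for this sketch. Real chips bump the PC by the
   size of the instruction in bytes rather than by 1. */
enum { OP_NOP = 0, OP_HALT = 1 };

struct cpu {
    unsigned pc;       /* program counter */
    unsigned executed; /* how many instructions have run so far */
};

/* A reset forces a known value (0 here) into the PC. */
void cpu_reset(struct cpu *c)
{
    c->pc = 0;
    c->executed = 0;
}

/* Fetch from the address in the PC, bump the PC, then execute.
   Returns 0 once a HALT has been executed (or the PC has run off the
   end of memory), 1 otherwise. */
int cpu_step(struct cpu *c, const unsigned char *mem, size_t len)
{
    if (c->pc >= len)
        return 0;
    unsigned char insn = mem[c->pc]; /* fetch */
    c->pc += 1;                      /* point at the next instruction */
    c->executed += 1;
    return insn != OP_HALT;          /* execute: only HALT does anything */
}

/* Run from reset until the machine halts; returns the final PC. */
unsigned cpu_run(const unsigned char *mem, size_t len)
{
    struct cpu c;
    cpu_reset(&c);
    while (cpu_step(&c, mem, len))
        ;
    return c.pc;
}
```

Given a memory holding NOP, NOP, HALT, NOP, cpu_run stops with the PC at 3. The PC was already bumped past the HALT when it was fetched, which is why people say the PC points to "the next instruction to be executed".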

Many processors these days have a load/store design. To work on data you have to load it from memory into a register. When you have a result you want to save, you have to store the value from a register into memory. Once you have your data in registers, you can do things like add two registers and put the result in another register, sometimes even replacing one of the things you just added together.

You only have a handful of registers, so you can only keep the things you are currently working on in registers. A large part of writing good code is keeping things in registers and avoiding needless loads and stores.
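
The load/store idea can be sketched in C like this. Again this is a toy machine of my own invention (4 registers, 16 words of memory), and the function names load, store, add and add_in_memory are just for illustration. The point is that arithmetic only ever touches registers, so adding two values that live in memory takes four steps.

```c
#include <stdint.h>

/* Toy load/store machine, invented for this sketch. */
struct machine {
    int32_t reg[4];  /* the handful of fast registers */
    int32_t mem[16]; /* main memory */
};

/* Copy a value from memory into a register. */
void load(struct machine *m, int rd, int addr)
{
    m->reg[rd] = m->mem[addr];
}

/* Copy a value from a register back out to memory. */
void store(struct machine *m, int rs, int addr)
{
    m->mem[addr] = m->reg[rs];
}

/* Add two registers and put the result in a third (or the same) register. */
void add(struct machine *m, int rd, int ra, int rb)
{
    m->reg[rd] = m->reg[ra] + m->reg[rb];
}

/* mem[2] = mem[0] + mem[1], the way a load/store machine has to do it. */
void add_in_memory(struct machine *m)
{
    load(m, 0, 0);   /* load  r0, [0]                                   */
    load(m, 1, 1);   /* load  r1, [1]                                   */
    add(m, 0, 0, 1); /* add   r0, r0, r1  (result replaces an operand)  */
    store(m, 0, 2);  /* store r0, [2]                                   */
}
```

If the sum were needed again right away, good code would keep using r0 rather than loading it back from memory.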

Jumping and branching

These are the special things that cause some disruption in the normal business of the PC moving smoothly along through memory fetching one instruction after another. The simplest disruption is a jump, sometimes called an "unconditional jump". Here the instruction says to load some fixed value into the PC which causes the next instruction to be fetched from that address, with no questions asked.

Sometimes the question is whether to jump or just keep going. This is where some kind of branch instruction comes in. The processor has some way to test the contents of a register and decide whether to jump or keep on. A simple example would be to jump if a register is non-zero and keep going if it is zero.
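
A small sketch in C may help show the difference between a jump and a branch. The instruction set here (DEC, JMP, BNZ, HALT), the single register, and the countdown program are all invented for this example. BNZ is my name for "branch if the register is non-zero".

```c
/* Toy machine, invented for this sketch. Each instruction is an opcode
   plus one operand. */
enum op { DEC, JMP, BNZ, HALT };

struct insn {
    enum op op;
    unsigned target; /* jump or branch address; unused by DEC and HALT */
};

/* Run a program on a machine with a single register. Returns how many
   instructions were executed, counting the final HALT. Assumes a
   well-formed program that eventually reaches HALT. */
unsigned run(const struct insn *prog, int reg)
{
    unsigned pc = 0, count = 0;
    for (;;) {
        struct insn i = prog[pc];
        pc += 1;    /* normal case: move on to the next instruction */
        count += 1;
        switch (i.op) {
        case DEC:  reg -= 1; break;
        case JMP:  pc = i.target; break;               /* no questions asked */
        case BNZ:  if (reg != 0) pc = i.target; break; /* only if non-zero */
        case HALT: return count;
        }
    }
}

/* Jump over a HALT, then count the register down to zero in a loop. */
const struct insn countdown[] = {
    { JMP,  2 }, /* 0: jump to 2                      */
    { HALT, 0 }, /* 1: skipped by the jump            */
    { DEC,  0 }, /* 2: reg = reg - 1                  */
    { BNZ,  2 }, /* 3: back to 2 while reg is non-zero */
    { HALT, 0 }, /* 4: done                           */
};
```

Starting with 3 in the register, run(countdown, 3) executes 8 instructions: the jump, three trips through DEC and BNZ, and the final HALT. The branch is taken twice and falls through once.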

And there are always weird and special instructions that have to be considered one at a time. But enough of this general discussion. We need to look at some concrete examples with some real processor.


Feedback? Questions? Drop me a line!

Tom's Computer Info / tom@mmto.org