March 1, 2020
Assembly language basics
I'll note up front that I speak about many things in broad general terms.
People can quibble or split hairs over almost every statement I make below,
but rather than anticipate and address all those minor issues, I just go ahead
and make things simple for the student.
We can talk about 4 sorts of computer languages.
- Machine language
- Assembly language
- Low level compiled languages
- Higher level compiled languages
The last two of these emphasize a distinction that I find useful.
The C language is the only low level compiled language I use and talk about.
Someone who understands C can "see through" the language and have a very good
idea about what the assembly language code generated will look like for each statement.
Higher level compiled languages (and/or interpreted languages) may implement higher
level abstractions (like objects, lists, associative arrays) that don't have a
simple mapping to assembly code.
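To make the "see through" idea concrete, here is a small C function with comments sketching the sort of assembly a compiler might produce for each statement. The register names and mnemonics in the comments are made up for illustration and do not belong to any particular processor. The sketch also assumes the arguments arrive in r0 and r1 and that an int is 4 bytes. A real compiler may well generate something different.

```c
/* Sum n integers starting at p.
 * The comments show invented, generic assembly (not a real instruction set). */
int sum(const int *p, int n)
{
    int total = 0;      /*        mov   r2, #0       ; total lives in a register */
    while (n != 0) {    /* loop:  bz    r1, done     ; leave the loop when n is 0 */
        total += *p;    /*        load  r3, [r0]     ; fetch *p from memory      */
                        /*        add   r2, r2, r3   ; total = total + *p        */
        p++;            /*        add   r0, r0, #4   ; assumes a 4-byte int      */
        n--;            /*        sub   r1, r1, #1                               */
    }                   /*        jmp   loop                                     */
    return total;       /* done:  mov   r0, r2       ; result goes back in r0    */
}
```

Calling sum() on an array holding 1, 2, 3, 4 with n set to 4 returns 10. Each C statement turns into one or two simple instructions, which is what makes C easy to see through.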
But we are getting ahead of ourselves. A computer chip executes machine language.
These are bit patterns stored in memory that tell the chip what to do.
It is rare to need to even think about machine language (but it is sometimes important).
Instead, most people deal with assembly language, which uses symbols and word-like
representations of machine language. The important thing is that each line of
assembly language maps to a single machine language instruction.
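As a rough illustration of that one-to-one mapping, the sketch below packs one assembly-style instruction into one 16-bit machine word. The instruction format (4 bits each for opcode, destination register, and two source registers) and the opcode values are invented for this example; real processors each define their own encodings.

```c
#include <stdint.h>

/* Hypothetical 16-bit instruction format (an assumption for illustration):
 *   bits 15..12  opcode
 *   bits 11..8   destination register
 *   bits  7..4   first source register
 *   bits  3..0   second source register
 */
enum { OP_ADD = 0x1, OP_SUB = 0x2 };   /* made-up opcode numbers */

/* Turn one "line of assembly" into the bit pattern the chip would execute. */
uint16_t encode(unsigned op, unsigned rd, unsigned ra, unsigned rb)
{
    return (uint16_t)(((op & 0xFu) << 12) |
                      ((rd & 0xFu) << 8)  |
                      ((ra & 0xFu) << 4)  |
                       (rb & 0xFu));
}
```

With this format, the line "add r1, r2, r3" becomes encode(OP_ADD, 1, 2, 3), which is the bit pattern 0x1123. An assembler is mostly a program that does this translation for you, line by line.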
Reverse engineering versus writing assembly language
Writing assembly language can be quite painful. This is one place where a debugger
can truly be useful. The usual problem (or a common problem anyway) is not understanding
some nuance of how an instruction works. So you have a blind spot, and until you can
single step along and look at registers, you will never confront the truth.
Given any option, I will avoid writing assembly language code.
Reverse engineering (looking at disassembled code) is a different game.
Here when things don't make sense, you can remind yourself, "but wait, this code works!".
The mental process is quite different and often more enjoyable.
Understanding the machine
The main reason to learn assembly language is that you actually learn how the machine
works. People used to learn it because hand-written assembly could run faster than
compiled code, but with modern optimizing compilers that is rarely true anymore.
So the primary motivation for learning assembly language is to become "one with the machine".
Rather than writing code in a compiled language and working somewhat with a black box,
assembly language lets you get hands on with how computers really work.
Registers and instructions
To understand assembly language, you have to have a picture in your head of the machine at
the assembly language level. By and large this consists of understanding registers and
instructions. Virtually every chip you will encounter has a small group of super fast
storage cells called registers. These may have names like A, B, C, or they may have names
like r0 through r15. Some chips have more and some chips have fewer. Some registers have
special purposes or restrictions; others are general purpose registers, and you get several
of them because more is better. I am avoiding talking about specific processors at this point.
There is almost always a special register called the PC. This is the program counter and
it points to the instruction currently being executed (or the next instruction to be
executed). A processor reset forces some value (often 0) into this register, and when
the processor starts running, it will read the instruction stored at this address in memory
and execute it. If nothing surprising happens, it will bump the PC by some amount that
causes it to point to (and fetch) the next instruction to be executed, and away you go.
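The fetch loop described above can be sketched in a few lines of C. This is a toy model, not any real chip: it assumes every instruction is one byte, that reset puts 0 in the PC, and that there are only two made-up instructions, NOP and HALT.

```c
#include <stddef.h>

enum { NOP = 0, HALT = 1 };   /* hypothetical one-byte instructions */

/* Start at the reset address and fetch instructions one after another.
 * Returns the PC value at which HALT was fetched, or -1 if the PC
 * runs off the end of memory without finding one. */
int run_from_reset(const unsigned char *mem, size_t size)
{
    size_t pc = 0;                      /* reset forces 0 into the PC */

    while (pc < size) {
        unsigned char insn = mem[pc];   /* fetch from the address in the PC */
        if (insn == HALT)
            return (int)pc;             /* stop the toy machine here */
        pc += 1;                        /* nothing surprising: bump the PC */
    }
    return -1;
}
```

Given a memory holding NOP, NOP, NOP, HALT, the function returns 3. On a real processor the "bump" is the size of the instruction just fetched, which may be 2 or 4 bytes, or vary from one instruction to the next.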
Many processors these days have a load/store design. To work on data you have to load
it from memory into a register. When you have a result you want to save, you have to
store the value from a register into memory. Once you have your data in registers,
you can do things like add two registers and put the result in another register,
sometimes even replacing one of the things you just added together.
You only have a handful of registers, so you can only keep the things you are currently
working on in registers. A large part of writing good code is keeping things in registers
and avoiding needless loads and stores.
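Here is a minimal sketch of the load/store idea, again as a toy model in C. The machine (4 registers, 16 words of memory) and the helper names load, store, and add are invented for illustration. The point is only that arithmetic happens between registers, while memory is touched by nothing except loads and stores.

```c
/* Toy load/store machine; the sizes are arbitrary assumptions. */
typedef struct {
    int reg[4];    /* the small, fast register file */
    int mem[16];   /* main memory */
} Machine;

/* The only two operations that touch memory. */
static void load(Machine *m, int rd, int addr)  { m->reg[rd] = m->mem[addr]; }
static void store(Machine *m, int rs, int addr) { m->mem[addr] = m->reg[rs]; }

/* Arithmetic works on registers only. */
static void add(Machine *m, int rd, int ra, int rb)
{
    m->reg[rd] = m->reg[ra] + m->reg[rb];
}

/* Compute mem[2] = mem[0] + mem[1] the load/store way. */
void add_in_memory(Machine *m)
{
    load(m, 0, 0);     /* r0 = mem[0] */
    load(m, 1, 1);     /* r1 = mem[1] */
    add(m, 0, 0, 1);   /* r0 = r0 + r1, replacing one of the operands */
    store(m, 0, 2);    /* mem[2] = r0 */
}
```

If mem[0] holds 5 and mem[1] holds 7, then after add_in_memory() both r0 and mem[2] hold 12. Adding two numbers in memory costs four instructions here, which is why keeping values in registers across several operations pays off.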
Jumping and branching
These are the special things that cause some disruption in the normal business of the PC
moving smoothly along through memory fetching one instruction after another.
The simplest disruption is a jump, sometimes called an "unconditional jump".
Here the instruction says to load some fixed value into the PC which causes the
next instruction to be fetched from that address, with no questions asked.
Sometimes the question is whether to jump or just keep going. This is where some kind
of branch instruction comes in. The processor has some way to test the contents of
a register and decide whether to jump or keep on. A simple example would be to jump
if a register is non-zero and keep going if it is zero.
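Both kinds of disruption can be shown with a small interpreter for a made-up machine. Everything here is an assumption for illustration: the five opcodes, the Insn layout, and the sample program count_up. The sketch also assumes the program is well formed and eventually reaches OP_HALT.

```c
enum { OP_DEC, OP_INC, OP_BNZ, OP_JMP, OP_HALT };   /* made-up opcodes */

typedef struct {
    int op;       /* which operation */
    int reg;      /* register number (unused by OP_JMP and OP_HALT) */
    int target;   /* address to jump to (OP_BNZ and OP_JMP only) */
} Insn;

/* Run prog from address 0 until OP_HALT.
 * Returns the number of instructions executed. */
int run(const Insn *prog, int *reg)
{
    int pc = 0, steps = 0;

    for (;;) {
        Insn i = prog[pc];
        steps++;
        switch (i.op) {
        case OP_DEC: reg[i.reg]--; pc++; break;
        case OP_INC: reg[i.reg]++; pc++; break;
        case OP_BNZ:                        /* branch if register non-zero */
            pc = (reg[i.reg] != 0) ? i.target : pc + 1;
            break;
        case OP_JMP:                        /* unconditional: just load the PC */
            pc = i.target;
            break;
        default:                            /* OP_HALT or anything unrecognized */
            return steps;
        }
    }
}

/* Sample program: add r0 to r1, counting r0 down to zero. */
const Insn count_up[] = {
    { OP_BNZ,  0, 2 },   /* 0: if r0 != 0, go to the loop body at 2 */
    { OP_HALT, 0, 0 },   /* 1: r0 is zero, so stop */
    { OP_INC,  1, 0 },   /* 2: r1 = r1 + 1 */
    { OP_DEC,  0, 0 },   /* 3: r0 = r0 - 1 */
    { OP_JMP,  0, 0 },   /* 4: back to the test at 0, no questions asked */
};
```

Starting with r0 = 3 and r1 = 0, run(count_up, reg) finishes with r0 = 0 and r1 = 3 after executing 14 instructions. The OP_BNZ at address 0 is the only place a decision is made; every other instruction either falls through to the next address or jumps without asking.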
And there are always weird and special instructions that have to be considered on a
one-at-a-time basis. But enough of this general discussion; we need to look at some
concrete examples on a real processor.
Feedback? Questions?
Drop me a line!
Tom's Computer Info / tom@mmto.org