Writing a 6502 Disassembler in Rust
In this post, I talk about writing a disassembler for the MOS 6502 microprocessor. It is the very first step toward building a more sophisticated emulator, such as one for the NES or even the Apple II.
I recommend reading about implementing a CHIP-8 interpreter first. If you are not familiar with it, I can wait: go and read it, then come back.
Little-endian memory layout
One of the most important things to know before building anything for the MOS 6502 is that it is a little-endian 8-bit microprocessor. So what does that mean?
The NES Hacker has an excellent explanation of how the 6502 stores data in memory.
The table below tells us the difference between Little-Endian and Big-Endian:
Byte Order | Hex Value | Byte Value |
---|---|---|
Little-Endian | $1000 | 00 10 |
Big-Endian | $1000 | 10 00 |
It is the assembler's job to take care of the order of memory addresses and values to make sure they are in Little-Endian. However, you are building a disassembler, so you need to be aware of this layout as well.
For example, this is the assembly instruction to load a value from memory address $0200:
LDA $0200
The compiled machine code should be:
AD 00 02
The high and low bytes of the memory address have been swapped because the 6502 uses Little-Endian byte order.
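Going the other way, a disassembler has to put the two operand bytes back together. Here is a minimal sketch of that step. The helper names `read_address` and `read_address_manual` are made up for this example; `u16::from_le_bytes` comes from the Rust standard library.

```rust
/// Rebuilds a 16-bit address from two operand bytes stored low byte first.
/// `read_address` is a hypothetical helper name, not part of the original code.
fn read_address(low: u8, high: u8) -> u16 {
    u16::from_le_bytes([low, high])
}

/// The same conversion done by hand with a shift and a bitwise OR.
fn read_address_manual(low: u8, high: u8) -> u16 {
    ((high as u16) << 8) | (low as u16)
}
```

For the bytes AD 00 02, calling `read_address(0x00, 0x02)` gives `0x0200`, which matches the LDA $0200 example.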
MOS 6502 Opcode Structure
Every instruction starts with an Opcode, written in assembly as a three-character mnemonic, followed by an optional Operand (the data).
For example:
JMP $1000
The code above is an instruction to move the program counter to address $1000.
When compiled to machine code, an Opcode always takes 1 byte. For example, the code above compiles to machine code as:
4C 00 10
The number of Operand bytes (zero, one, or two) depends on the Opcode, so when reading or disassembling, we need to advance the program counter by an amount that depends on the Opcode.
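One way to picture this is a lookup from Opcode to instruction size. The sketch below covers only a handful of Opcodes, and `instruction_len` is a hypothetical helper name rather than anything from a library; a real disassembler would need an entry for every Opcode in the instruction set.

```rust
/// Returns the total size in bytes (Opcode plus Operand bytes) for a few
/// Opcodes. `instruction_len` is a hypothetical helper; anything not listed
/// here returns None.
fn instruction_len(opcode: u8) -> Option<usize> {
    match opcode {
        0xEA => Some(1),               // NOP: no operand
        0xA9 => Some(2),               // LDA #immediate: one operand byte
        0xAD | 0x8D | 0x4C => Some(3), // LDA, STA, JMP absolute: two operand bytes
        _ => None,                     // not covered by this sketch
    }
}
```

With this, the JMP $1000 example (4C 00 10) is three bytes long, so the program counter would move forward by 3.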
During the implementation, you will need to refer to the 6502 Instruction Set constantly. That document arranges the Opcodes in a table by their high and low nibbles.
To read the nibbles, we use the bitwise AND operator (plus a right shift for the high nibble):
let opcode = 0xA9;
let low = opcode & 0x000F; // you get "9"
let high = (opcode & 0x00F0) >> 4; // you get "A"
Reading binary file in Rust
Let's start building a 6502 disassembler in Rust. The first step is reading the ROM file.
Since it is a binary file, we can read the whole file as a vector of bytes (Vec<u8>):
use std::io::prelude::*;
use std::fs::File;
fn main() {
    let mut f = File::open("code.bin").unwrap();
    let mut program = Vec::new();
    let length = f.read_to_end(&mut program).unwrap();
    ...
}
Remember that we use unwrap() here only because we are prototyping; errors must be handled properly in production.
We open the file with the std::fs::File::open() function, then read the whole file with the read_to_end() method, which File gets from the std::io::Read trait (brought into scope by std::io::prelude). On success, it returns the number of bytes read, which is the length of the input program.
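If you would rather not call unwrap(), one possible approach is to return the error to the caller with the `?` operator. This is only a sketch: `read_rom` is a hypothetical function name, and how you report the error is up to you.

```rust
use std::fs::File;
use std::io::{self, Read};

/// Reads a whole ROM file into memory and returns the error instead of
/// panicking. `read_rom` is a hypothetical helper name.
fn read_rom(path: &str) -> io::Result<Vec<u8>> {
    let mut file = File::open(path)?;
    let mut program = Vec::new();
    // read_to_end returns the number of bytes read; the Vec already knows
    // its own length, so the count is not kept here.
    file.read_to_end(&mut program)?;
    Ok(program)
}
```

The caller can then match on the result of `read_rom("code.bin")` and print a friendly message when the file is missing, and use `program.len()` wherever the length is needed.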
Parsing Opcode from byte vector
Now we have the byte vector program. We need to define a program counter, which tells us where we are in the code.
let mut pc = 0;
Next, we scan the program and parse each Opcode:
while pc < length {
    let low = program[pc] & 0x000F;
    let high = (program[pc] & 0x00F0) >> 4;
    ...
}
In each iteration, use a simple if statement to parse the Opcode as well as its Operands. For example, A9 denotes LDA # (load the accumulator with an immediate value), which has one operand in the next byte. So we parse it with this code:
if high == 0xA && low == 0x9 {
    let param = program[pc + 1];
    println!("LDA #${:02X}", param);
    pc += 2;
}
The LDA # instruction takes exactly two bytes (one for the Opcode and one for the Operand), so at the end of the if statement, we increase the program counter by 2.
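Note that indexing with program[pc + 1] panics if the file ends right after the Opcode. A sketch of a more defensive version, using the standard slice method `get`, could look like this. The function name `decode_lda_immediate` is hypothetical.

```rust
/// Formats the `LDA #` instruction found at `pc`, or returns None when the
/// byte at `pc` is not A9 or the operand byte is missing (for example, in a
/// truncated file). `decode_lda_immediate` is a hypothetical helper name.
fn decode_lda_immediate(program: &[u8], pc: usize) -> Option<String> {
    if *program.get(pc)? != 0xA9 {
        return None;
    }
    // `get` returns None instead of panicking when the index is out of range.
    let param = *program.get(pc + 1)?;
    Some(format!("LDA #${:02X}", param))
}
```

When the function returns Some, the caller still advances the program counter by 2, exactly as in the if statement above.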
Use match to write code faster
Repeatedly writing if statements is not fun. Let's make it better by using match on a tuple of (high, low):
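match (high, low) {
    // A9: LDA # (immediate), one operand byte
    (0xA, 0x9) => {
        let param = program[pc + 1];
        println!("\t\t LDA #${:02X}", param);
        pc += 2;
    },
    // 8D: STA (absolute), two operand bytes stored low byte first
    (0x8, 0xD) => {
        let params = (program[pc + 1], program[pc + 2]);
        println!("\t\t STA ${:02X}{:02X}", params.1, params.0);
        pc += 3;
    },
    // Unhandled opcode: skip one byte so the loop cannot spin forever
    (_, _) => {
        pc += 1;
    }
}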
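Putting the pieces together, here is one possible shape for the whole loop as a function that returns the disassembled lines instead of printing them. It is a sketch under a few assumptions: the name `disassemble` is made up, only the two Opcodes from this post are handled, and unknown or truncated bytes are emitted as `.byte` lines, a convention chosen here purely for illustration.

```rust
/// Disassembles the two Opcodes covered in this post and returns one line of
/// text per instruction. `disassemble` is a hypothetical function name.
fn disassemble(program: &[u8]) -> Vec<String> {
    let mut lines = Vec::new();
    let mut pc = 0;
    while pc < program.len() {
        let opcode = program[pc];
        let high = opcode >> 4;
        let low = opcode & 0x0F;
        // Whatever bytes remain after the Opcode (possibly none).
        let rest = &program[pc + 1..];
        match (high, low, rest) {
            // A9: LDA # needs one operand byte.
            (0xA, 0x9, [value, ..]) => {
                lines.push(format!("LDA #${:02X}", value));
                pc += 2;
            }
            // 8D: STA absolute needs two operand bytes, low byte first.
            (0x8, 0xD, [lo, hi, ..]) => {
                lines.push(format!("STA ${:02X}{:02X}", hi, lo));
                pc += 3;
            }
            // Unknown Opcode, or not enough bytes left for the operands.
            _ => {
                lines.push(format!(".byte ${:02X}", opcode));
                pc += 1;
            }
        }
    }
    lines
}
```

Matching on the remaining slice as well as the nibbles means a truncated file falls through to the last arm instead of panicking on an out-of-range index.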
What's next?
By now, you know the format of 6502 programs and how to read them, so you can follow the instruction set to finish your disassembler.
The next step is to reuse the disassembler's logic to implement the actual behavior of the 6502 microprocessor's Fetch-Decode-Execute cycle. We will talk about it in another article.
I hope you enjoy the post. Any feedback would be greatly appreciated!
Edit:
Thanks to the comments of @latrasis and @vopi181 on Reddit, I fixed the typo in the title and removed the macro part because using match is better.