
NES Programming Guide v 0.4

This document explains the basics of NES programming. It not covers sound
because I lost my sound driver and have not tried anything.

If something seems difficult (especially at the beginning), try to
continue reading, it can be explained later.
It is assumed that you know what are bits, bytes, memory, and such things.
If you don't, better look on a book (that is explained in lots of places),
or wait for a very basic doc (I'll probably write one later).
If, after finished reading, anything is not clear, feel free to mail me.
Please report grammar and ortographic errors too.

Note: Hexadecimal numbers will be denoted as FFh (not $FF) and binary
numbers as 11111111b (not %11111111). Why? I like that way, that's all.
# is "constant". For example, 2007h is a memory address, and #2007h is
only a numerical value. Unknown numbers are x.



Contents:

6502 instruction set	(the only ready by now)
Graphics
Other (memory map, registers, etc)
Getting started




6502 Instruction Set


The NES uses a 6502 as CPU. It has 5 8-bit registers and 1 16-bit register
(do not confuse with the so-called registers in memory for joypads and
graphics).

The 16-bit register is the Program Counter (hereafter PC). It has the
memory address of the next instruction to be executed. In other words, it
points to the next instruction. Since it is only
16 bits long (2 bytes), its maximum addressing range is 64kb, so the longest
program the 6502 can run must be below this size.

The Stack register (hereafter S) holds the address of (aka points to) the
top of the stack. This is a memory zone from 100h to 1FFh where data can be
quickly stored and retrieved. Every time data is "pushed" in the stack,
it is stored in the address to where S is pointing, and S is decreased by 1.
When data is "popped" or "pulled", the data is read and S increased. So,
the first data in enter the stack is the last to come out.

The Processor Status or Flag Register (P) is used to indicate the status of
the last instruction executed. Every bit of this register is a flag that
marks a certain condition. When the bit is 1, we say that flag is set; when
it is 0, the flag is clear. Instructions affect only certain flags. To know
which in detail, consult a 6502 instruction set summary, or a book.
Numbered from bit position 7 to 0 (order does not matter), they are:

	Negative(N): It is set when the bit 7 of the result of the last
	instruction is set (is 1). In integer operations, negative numbers
	have this bit set and positive clear, hence the name.

	Overflow(V): It is set when a carry occurs from bit 6 to 7. In other
	words, when 01xxxxxxb becomes 1xxxxxxxb. For example, when add 1 to
	01111111b (127), the result is (in integer numbers)10000000b (-127),
	so this flag is useful for detect this error. Nevertheless, in games
	the negative numbers are seldom used, so you don't need to worry too
	much about this.

	Break(B): Set when the BRK instruction has been executed.

	Decimal(D): Set when the microprocessor is in Decimal Mode. This is a
	mode where the bytes are cutted in 2 parts, one for a decimal digit,
	making arithmethic with decimal points more exact, but the maximum
	value for a byte then becomes 100 instead of 255, so it is never used
	in games.

	Interrupt(I): This flag allows interrupts when is clear. An interrupt
	is a incoming signal from a peripheral that usually indicates to the
	processor to read data from the peripheral. Set this flag to ignore
	that signal. This is not valid for Non Maskable Interupts (NMI), they
	can't be blocked.

	Zero(Z): It is set when the result of last operation is zero.

	Carry(C): It is set when a carry occurs. In other words, when an add
	operation surpasses 255. For example (now these are natural numbers,
	as usual): add 1 to 255 (11111111b); the result is 0. This flag can
	then be used to do additions in 2 bytes, making the range of values
	wider.
	Another use of this flag is in rotating and shifting operations. The
	bit that is driven out of the byte is stored here.

The accumulator(A) is used for arithmethical and logical operations.

The index register(X) is used for indexed addressing. It means that its value
	will be added to the other address given. This is useful for blocks
	of data: you do a loop which increases (or decreases) X to have
	access to all data, instead of using an instruction for every byte.


Index 2(Y): Used for some limited indexed addressing.

This three registers (A, X and Y) are the "work registers" that can be used
to store data. The others are changed automatically (in most cases).


Now the instructions:

ADC Add with carry. Adds A plus another number, plus the Carry flag, and
	stores the result in A. The other number (source) can be a constant
	(ADC #45), an address (ADC 100h) or an indexed address (ADC 100h,X).
	Since C is added too, you must clear it before with CLC (except when
	you do 2 byte additions).

AND Logical AND. Does bitwise AND with the accumulator and a source (memory
	or inmediate). Logical AND means that result will be true only if the
	two values are true. In other words, the resultant bit will be 1 only
	if both bits are 1; otherwise it will be 0. For example:
		lda #10101010b
		and #00001111b
	stores	     00001010b in A. As you can see, a "mask" can be used to
	force zeros and let remain what you want.This is useful	when using
	a single byte for several purposes.

ASL Arithmethic Shift Left. Shifts bits to left in A or memory. All bits will
	be moved one position to the left. The rightmost position will be
	filled with a zero, and the leftmost will be stored on the Carry Flag.
	Example:
		lda #10011001b
		asl A
	gives:       00110010b, and C will be set. One of its uses is quick
	multiplication by 2 (as shifting decimal numbers to left multiplies
	by 10).

BCC Branch if Carry Clear. Checks the Carry Flag and, if it is set, branches
	to a certain point; otherwise continues down. Branches are used to
	make decisions: if certain flag is set (or clear), the program will
	continue in another point; if not, the program continues as if
	nothing happened. The point where the program (more exactly, the
	Program Counter) will jump is a number with a range of -127 to +127
	bytes of the actual position. Usually you will use a label to mark
	the place where it will jump. An example is some lines below.

BCS Branch if Carry Set. The same, but when C is set.

BEQ Branch if Equal. Branches when the Zero Flag is set. In other words, when
	the operand is zero. It's called "Equal" because, when comparing, Z
	is set if both terms are equal. Example:
		lda bullet_column
		cmp monster_column
		beq monster_dies
	(here code for monster moves)
	monster_dies
		(die sequence: it disappears and the score increases)
	Here, monster_column and bullet_column are variables, stored
	somewhere in memory. But instead of using the address number, we can
	tell first the assembler we will use labels for them. monster_dies is
	a label	for a position where the program will continue.

	In fact, the most common use of BEQ is checking if something is zero.
	For example:
		lda hero_lives
		beq gameover
		(here code for continue playing)
	gameover
		(continue? insert coins)
	It's useful with AND to check status of individual bits.
	This is an example where we suppose the status of button B is in
	bit 1, in a variable called "joypad1":
		lda joypad1
		and #00000010b	;this blows out everything, except B button
		beq B_not_pressed
		(code for fire ball)
	B_not_pressed (more code)

BIT Bit test. Is like AND, but without storing the result in A.

BMI Branch if Minus. Branches when N is set. This means, when the bit 7	is
	set. Is called "minus" because comparisons (which are actually
	substractions) give a negative number when the second term is greater
	than the first.	Negative numbers have the bit 7 set.
	It is useful too to check quickly bit 7.

BNE Branch if Not Equal. Branches when Z is clear (the operand is not 0).
	The contrary of BEQ.

BPL Branch if Plus. Branches when the Negative Flag is clear, that is, when
	the bit 7 of the operand is clear. One of its most common uses is
	to wait for Vblank:
	wait_for_vblank
		lda 2002h	;bit 7 at 2002h is automatically set to 1
				;when Vblank occurs
		bpl wait_for_vblank
		(code for graphics here)

BRK Break. This makes an interruption. Somewhere says that this instruction
	returns one byte after the correct one. Anyway it is rarely used.

BVC Branch if Overflow Clear. Not very useful.

BVS Branch if Overflow Set. Same.

CLC Clear Carry flag. C becomes 0. You must use it before ADC in most cases.
	Used too before rotations to prevent errors.

CLD Clear Decimal mode. The processor is supposed to begin in Binary mode,
	and you will never use Decimal mode, so you can forget this one.
	Nevertheless I have seen is common tu use it at the beginning.

CLI Clear Interrupt disable. Allows interrupts (NMI not included).
	Seldom used.

CLV Clear Overflow flag. Seldom used.

CMP Compare. Compares A with source, for later use of BEQ, BNE, BMI or BPL.
	The comparison is a substraction that does not give result, but
	affects the flags: if both terms are equal, the result is 0; if the
	first is greater, the result is positive; if the second is greater,
	the result is negative.

CPX Compare X. Compares X with source.

CPY Compare Y. Same.

DEC Decrement source. A memory address is decreased 1.

DEX Decrement X. Commonly used in loops:
		ldx 5	;loop 5 times
	loop5	(code)
		dex
		bne loop5
	One useful loop is addressing a block of memory (where a table with
	data can be stored), or clearing it:
		ldx 50h		;the block is 50h bytes
		lda #0		;all bytes will be 0
	memclear sta block_address,X
		dex
		bne memclear

DEY Decrement Y. Same.

EOR Exclusive OR. Bitwise exclusive OR with A and source.
	This gives 1 only is the two bits are different; if not,
	the result is 0. Example:
		lda #11001100b
		eor #11110000b
	stores       00111100b in A.

INC Increment source. A memory address is increased 1.

INX Increment X. Especially useful for loops with graphics. Since the
	address at PPU is automatically incremented, we can't use DEX,
	or the data we put there will be inverted:
		lda #20h	;address of name table
		sta 2006h
		lda #0Ch
		sta 2006h
		ldx #0
	loopgfx lda mem_block,X
		sta 2007h
		inx
		cpx #20h	;we are coying 20h bytes
		bne loopgfx
	As you can see, the use of INX involves a CPX instruction, so
	loops use usually DEX.

INY Increment Y. Same.

JMP Jump. The program jumps to a label and continues from there.

JSR Jump to Subroutine. Is a jump too, but the PC is saved in the stack
	before, so when the code of the subroutine ends, with RTS you return
	where the subroutine was "called". This makes possible to call the
	same subroutine from many points in the program. But often the
	subroutine will need parameters for its actions. They can be passed
	in the registers, memory, or manipulating the stack.
	Look this example, which uses the registers (the easiest way):
		ldx #5		;X will be the enemies number (5 enemies)
	loop_enemies
		jsr enemy_actions
		dex
		bne loop_enemies
		(rest of program)

	enemy_actions
		(here code to approach hero, throw knifes, etc)
		rts

	With this you can control 5 enemies with the same code.

LDA Load Accumulator. Stores in A the value of a constant or a memory address.
	Can be used this way:
		lda #7	This stores in A the value 7
		lda 7	This stores in A the value in the memory address 7
		lda 7,X	This stores in A the value in the memory address plus
	X. This means that if X is 0, A will have the value in 7; if X is 5,
	A will have the value of the memory address 12 (0Bh).

LDX Load X. The same, but obviously indexed addresing can't be used.

LDY Load Y. Same.

LSR Logical Shift Right. Shifts to right A or a memory address. The leftmost
	bit is filled with 0 and the rightmost goes to C. Example:
		lda #01100110b
		lsr A
	stores       00110011b in A, and the Carry Flag will be clear.
	One of its uses is for quick divisions by 2.

NOP No Operation. This does nothing. Can be used for delaying loops
	(sometimes we need less speed), but in that cases is preferable
	to use a slower operation, as this is one of the fastests.

ORA Logical Inclusive OR. Bitwise OR with A and a source. The result will be
	false only if both bits are 0; otherwise it will be 1. Example:
		lda #11001100b
		ora #10101010b
	stores       11101110b in A. This can be used to force bits to 1
	(as AND is used to force 0). For example, suppose we have the
	variable hero_status to indicate if he is jumping with the bit 5,
	and if he is dead with the bit 2:
		(code to decide when he'll die)
		lda hero_status
		ora #00000100b		;set die bit
		sta hero_status
	Now the jump:
		(code to know if button A is pressed)
		lda hero_status
		ora #00100000b		;we force bit 5 to 1, without
		sta hero_status		;touching the other bits
		(code to jump)
	Now, suppose he has finished his jump:
		(code to know when finish jumping)
		lda hero_status
		and #11011111b		;force bit 5 to 0, without
		sta hero_status		;touching other bits
	As you can see, with ORA, 0 lefts bits unchanged and 1 forces 1; with
	AND, 1 lefts bits unchanged and 0 forces 0.

PHA Push accumulator. Stores A in the stack.

PHP Push the Processor Status Register in the stack. Stores P (aka flags) in
	the stack.

PLA Pull Accumulator. Stores in A the value in the top of the stack.

PLP Pull Processor Status Register.  Stores in the flags the value in the
	top of the stack.

ROL Rotate Left. Rotates the bits in A or source one position to the left.
	The rightmost bit is filled with the value of the Carry Flag, and
	the left most bit is sent to the Carry Flag. Example:
		clc		;C will be 0
		lda #10010111b
		rol A
	(A is now    00101110b, and C is set to 1)
		rol A
	(A is now    01011101b, and C is 0)
		rol A
	(A is now    10111010b and C is 0)
		rol A
	(A is now    01110100b and C is 1)
	This is useful to fill a byte when a flow of one bit cames from
	somehwhere. Especifically, from the joypads.

RTI Return from Interrupt. After the interrupt code is executed, this
	returns to where the program was left restoring the PC and the
	flags that were pushed on the stack when the interrupt began.

RTS Return from Subroutine. This returns from a subroutine to the next
	instruction to where it was called, pulling PC from the stack.

SBC Substract with Carry. Stores in A the result of substracting source
	from A. In a book there was written that the value of C is
	irrelevant here, but anyway is preferable clear it before.

SEC Set Carry flag. Sets C to 1.

SED Set Decimal mode. You won't need this.

SEI Set Interrupt disable. When the Interrup t disable flag is set, the
	processor ignores interrupt signals. NMI are not included, they
	will continue working. As far as I know, the only interrupt in
	the NES is one that executes every Vblank, but this is a Non
	Maskable Interrupt (NMI), which can't be affected by this flag.
	It is enabled or disabled with a register in 2000h.
	Nevertheless seems is common to use this at the beginning.

STA Store Accumulator. Stores A in memory. Indexed addressing can be used.

STX Store X. Stores X in memory.

STY Store Y. Same.

TAX Transfer A to X. Stores A in X. Store instructions don't allows transfer
	of data between registers, so the existence of the Transfer ones.

TAY Transfer A to Y.

TSX Transfer S to X. Stores in X the value of the Stack pointer, to know
	where it is.

TXA Transfer X to A.

TXS Transfer X to S. Stores in S the value of X. This must be used always
	at the beginning of a program to use the stack, because it doesn't
	begin to work automatically. If you forget this, you won't be able
	to use subroutines nor interrupts. This way:
		ldx #0FFh
		txs
	This instruction can be used too to manipulate the stack, but that
	is quite complex.

TYX Transfer Y to A.

Those are all the instructions available.









NES Graphics

To be written.













Credits



Author: Bokudono (hebhassan@hotmail.com).
	Waseiyakusha (http://members.xoom.com/has_san/indexe.html)

Thanks to:


YOSHi		for his document. This document (based on Marat Fayzullin's)
		is the base of all NES emulation development.

Tony Young	With his page and demo I began to understand YOSHi's document
		(http://members.aol.com/TYoung79/nesprog.html)

Jonathan Bowen	for his summary of the 6502 instruction set


