Overscore

Overscore is a from-scratch CPU emulator, instruction set, assembler, and (in the future) high-level (C, Zig, etc) language. It's an entirely separate computer ecosystem, starting from the ground up.

CPU

The CPU is a simple 32-bit CPU with 8 bit minimum addressable size and 8 instructions. The architecture size and minimum addressable size are extensively configurable and modifying the values in-code should not break any "hardware".

All address space/memory is in one rwx (read-write-execute) block, including registers and the instruction counter (address 0). This allows jumps to be implemented with arithmetic on the instruction counter at address 0.

Instruction Set

Name	Opcode	Description	Total size (bytes)	C-like equivalent
`set`	0001	Set immediate	9	`a = <constant>`
`mov`	0010	Move	9	`a = b`
`not`	0011	Not	9	`a = ~b`
`and`	0100	And	9	`a = a & b`
`add`	0101	Add	9	`a = a + b`
`irm`	0110	Indirect reading move	9	`a = *b`
`iwm`	0111	Indirect writing move	9	`*a = b`
`sys`	1000	System instruction	9	`a = sys(b)`*

*where sys is a standard IO function

Note: C-like equivalents address with variables for simplicity; a more accurate representation for e.g. mov might be mem[a] = mem[b].

The instruction set is intentionally relatively limited to maintain overall simplicity. It could be simpler, but this has been intentionally avoided as to prevent the CPU from becoming a Turing tarpit.

A null byte (opcode zero) exits. Otherwise, invalid opcodes always error. Addressing invalid memory always errors (errors are defined as emulator exits).

Instruction size (in bytes) is calculated as 1 + 4 * words. Again, this is extensively configurable; a more general equation is (UnitSize + WordSize * words) / 8.

All instructions currently accept two arguments and are thus all the same size.

Set immediate

Name	Total size (bytes)	Data	...
`set`	9	0001	`write` (1 word)	`read` (1 word)

Sets the value at write to the immediate (i.e., specified inline) read. This treats read as a value and not a register.

Move

Name	Total size (bytes)	Data	...
`mov`	9	0010	`write` (1 word)	`read` (1 word)

Sets the value of write to the value of read.

Not

Name	Total size (bytes)	Data	...
`not`	9	0011	`write` (1 word)	`read` (1 word)

Sets the value of write to the binary complement of the value of read.

And

Name	Total size (bytes)	Data	...
`and`	9	0100	`write` (1 word)	`read` (1 word)

Sets the value of write to the binary AND of the values of read and write.

Add

Name	Total size (bytes)	Data	...
`and`	9	0101	`write` (1 word)	`read` (1 word)

Sets the value of write to the wrapping unsigned sum of the values of read and write.

Indirect reading move

Name	Total size (bytes)	Data	...
`irm`	9	0110	`write` (1 word)	`read` (1 word)

Sets the value of write to the value of the value of read. This is equivalent to mov except instead of the value of read being interpreted as a word, the value of read is treated as an address and the value of this address is treated as a word. This is similar to having dereferenced the right argument.

Indirect writing move

Name	Total size (bytes)	Data	...
`iwm`	9	0111	`write` (1 word)	`read` (1 word)

Sets the value of the value of write to the value of read. This is equivalent to mov except instead of the value of write being set, the value of write is interpreted as an address and the value of this address is written to instead. This is similar to having dereferenced the left argument.

System instruction

Name	Total size (bytes)	Data	...
`sys`	9	1000	`write` (1 word)	`read` (1 word)

Performs some arbitrary syscall. This is essentially a hardware/operating system defined call, and is the only standard way of communicating with the environment. Typically it might not be optimal to bottleneck all communication through a single limited operation, but this is not the typical CPU.

Assembly

The assembler is a simple text-based way to write machine code, with a few higher-level features.

Lines are processed and written to the output binary in the order they were written in the file.

Comments are written with //. Semicolons are not supported as comments and will cause an assembler error.

Here's an example program. The next few sections will help with understanding it.

raw Main

label Main
    set 100 AA
    not 104 100
    and 100 104
    end

Line Syntax

Statements

Name	Syntax	Output size
Raw	`raw <WORD>`	4 bytes
Label	`label <NAME>`	0 bytes
Bytes	`bytes <BYTES>`	variable
End	`end`	1 byte

Instructions

Name	Syntax	Output size
`set`	`set <ADDR> <WORD>`	9 bytes
`mov`	`mov <ADDR> <ADDR>`	9 bytes
`not`	`not <ADDR> <ADDR>`	9 bytes
`and`	`and <ADDR> <ADDR>`	9 bytes
`add`	`add <ADDR> <ADDR>`	9 bytes
`irm`	`irm <ADDR> <ADDR>`	9 bytes
`iwm`	`iwm <ADDR> <ADDR>`	9 bytes
`sys`	`sys <ADDR> <ADDR>`	9 bytes

Instructions

Instructions are referred to using their shortened name. The destination address is placed on the left. For example:

    set 100 AA  // Set address 100 (in hexadecimal) to AA
    not 104 100 // Flip all of the bits of 100, writing to 104 
    and 100 104 // Binary AND on 100 and 104, writing to 100

This works the same for every instruction, so it should make sense from here.

Instructions always have a size equivalent to their size as indicated in the instruction set.

Literals

Number literals can be formatted in hexadecimal, binary, or decimal.

By default, they're parsed as hexadecimal.

When the documentations refers to an address or a word, it's referring to a literal.

    set 0 AA        // Set address 0 to AA (hexadecimal)
    set 0 d170      // Set address 0 to 170 (decimal)
    set 0 b10101010 // Set address 0 to 10101010 (binary)
    set 0 xAA       // Set address 0 to AA (hexadecimal)

Labels

The line label Main (putting the name of the label where Main is) allows referring to the address of the next instruction after the label by the name of the label. This is useful for jumping to or reading from certain areas, because there's no way to guarantee where in memory the code is.

Writing a label (e.g. Main) instead of an address will have the label replaced with the address when assembling.

Labels always have a size of zero, because they're an assembler construct.

Raw

You can embed 4 bytes (the size of a CPU word) directly in the binary with the raw line.

This is useful for embedding data that doesn't fit nicely into ASCII, or for setting the instruction pointer. Since binaries always start at address 0, embedding the address of a block (e.g. raw Main) will end up setting the instruction pointer to the given address when the binary is assembled.

This doesn't allow writing smaller amounts of data directly in the binary (e.g. 1 byte), at least for now.

raw always has a size of 4 bytes (the word size).

Bytes

Write bytes <BYTES> to insert the bytes provided directly in the binary. This is typically useful for embedding strings into the binary, because including non-ACSII characters directly

Printing strings can be implemented this way by writing, for example:

label Message
    bytes Hello, world!
    end

The label is included to make it possible to refer to where the string is placed.

Since there's no way to get the length of the embedded bytes, end was added to make the string null-terminated.

bytes has a size of however many bytes you provide to it.

End

You can place a null byte in the binary with the end line.

This is typically useful for ending the program or for padding.

end has a size of one byte.

Name		Name	Last commit message	Last commit date
Latest commit History 76 Commits
src		src
.gitignore		.gitignore
README.md		README.md
ROADMAP.md		ROADMAP.md
build.zig		build.zig
build.zig.zon		build.zig.zon

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Overscore

CPU

Instruction Set

Set immediate

Move

Not

And

Add

Indirect reading move

Indirect writing move

System instruction

Assembly

Line Syntax

Statements

Instructions

Instructions

Literals

Labels

Raw

Bytes

End

About

Releases

Packages

Languages

GoldenStack/overscore

Folders and files

Latest commit

History

Repository files navigation

Overscore

CPU

Instruction Set

Set immediate

Move

Not

And

Add

Indirect reading move

Indirect writing move

System instruction

Assembly

Line Syntax

Statements

Instructions

Instructions

Literals

Labels

Raw

Bytes

End

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages