Boktai bytecode reference
This content has been moved to the Boktai Hacking Wiki.
Archived version
Credits
- Prof9 for documenting most opcodes in SolDec
- Anonymous for reverse engineering the script index addresses in each game
Useful tools
- bokasm - bytecode assembler and disassembler
- SolDec - bytecode decompiler
- Boktai 1 (U) script info sheet
Script structure
- A script must start with a block instruction. The script terminates either when that block ends, or with an explicit return instruction.
- Blocks may only contain the following instructions as direct descendants: end, expression, control, and call.
- If the bytecode interpreter expects a value, almost all instructions can be used, except the following: end, control, and call. call/control instructions can be used by wrapping them inside a block.
- With the exception of pointers, all values the bytecode interpreter deals with are 32-bit words. The instructions u8 0x44, u16 0x44, and i32 0x44 are therefore all equivalent.
Instruction encoding
The bytecode uses a variable-length instruction encoding. If the first byte is <= 0x0f, then the 1st byte is the opcode itself. Otherwise, the top 4 bits of the 1st byte indicate the opcode, and the bottom 4 bits are part of the instruction's parameters.
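The rule above can be sketched in Python as follows. This is only an illustration and is not taken from the game, bokasm, or SolDec; the function name `split_opcode` is hypothetical. The operator range (0xa0-0xbf) and the compressed i32 range (0xc0-0xff) interpret their low bits differently, so the sketch applies only the general rule.

```python
def split_opcode(first_byte: int) -> tuple[int, int]:
    """Split the first byte of an instruction into (opcode, low_nibble).

    Bytes <= 0x0f are opcodes on their own (low_nibble is reported as 0).
    Otherwise the top 4 bits select the opcode and the bottom 4 bits
    belong to the instruction's parameters.
    """
    if not 0 <= first_byte <= 0xFF:
        raise ValueError("expected a single byte")
    if first_byte <= 0x0F:
        return first_byte, 0
    return first_byte & 0xF0, first_byte & 0x0F


if __name__ == "__main__":
    for byte in (0x07, 0x14, 0x74):
        opcode, low = split_opcode(byte)
        print(f"{byte:#04x} -> opcode {opcode:#04x}, low nibble {low:#x}")
```

For example, the byte 0x74 splits into opcode 0x70 (call) with a low nibble of 4.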
Opcode 0x00 (end)
Terminates control, call, and block instructions.
Parameters: None.
Example:
74d8da  call 0xdad8
c3        i32 0x2
00      end
Opcode 0x01 (i16)
Immediate signed 16-bit integer.
Parameters: Value in little-endian byte order (2 bytes).
Example:
018002 i16 0x280
Opcode 0x02 (u8)
Immediate unsigned 8-bit integer.
Parameters: Value (1 byte).
Example:
0246 u8 0x46
Opcode 0x03 (u8)
Unused alias of opcode 0x02.
Opcode 0x04 (u8)
Unused alias of opcode 0x02.
Opcode 0x06 (u16)
Immediate unsigned 16-bit integer.
Parameters: Value in little-endian byte order (2 bytes).
Example:
06d40c u16 0xcd4
Opcode 0x07 (immediate string)
String/byte array directly embedded in the script data.
Parameters:
- Size of array in bytes (1 byte; this means the maximum size of an immediate string is 255 bytes)
- N bytes of data
Example:
070967616d656f76657200 string "gameover\x00"
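As a rough illustration of the length-prefixed layout described above, the following Python sketch reads an immediate string from raw script bytes. The function name `decode_immediate_string` and its return shape are assumptions made for this example.

```python
def decode_immediate_string(data: bytes, pos: int = 0) -> tuple[bytes, int]:
    """Decode an opcode 0x07 instruction starting at data[pos].

    Returns (payload, next_pos), where next_pos is the offset of the
    byte following the instruction.
    """
    if data[pos] != 0x07:
        raise ValueError("not an immediate string instruction")
    size = data[pos + 1]            # 1-byte size, so at most 255 bytes
    start = pos + 2
    payload = data[start:start + size]
    if len(payload) != size:
        raise ValueError("truncated immediate string")
    return payload, start + size


if __name__ == "__main__":
    raw = bytes.fromhex("070967616d656f76657200")
    print(decode_immediate_string(raw))
```

Applied to the example bytes, the size byte is 0x09 and the payload is the eight characters of "gameover" followed by a terminating zero byte.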
Opcode 0x08 (u16)
Used alias of 0x06.
Opcode 0x09 (i32)
Immediate signed 32-bit integer.
Parameters: Value in little-endian byte order (4 bytes).
Example:
09c0d40100 i32 0x1d4c0
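The fixed-width immediates (opcodes 0x01, 0x02, 0x06, 0x09 and their documented aliases) all follow the same pattern: an opcode byte followed by a little-endian value. A minimal Python sketch of a decoder, with the hypothetical names `IMMEDIATE_FORMATS` and `decode_immediate`, might look like this:

```python
import struct

# Opcode -> struct format, following the opcode descriptions in this document.
IMMEDIATE_FORMATS = {
    0x01: "<h",                          # i16
    0x02: "<B", 0x03: "<B", 0x04: "<B",  # u8 and its aliases
    0x06: "<H", 0x08: "<H",              # u16 and its alias
    0x09: "<i", 0x0A: "<i", 0x0D: "<i",  # i32 and its aliases
}


def decode_immediate(data: bytes, pos: int = 0) -> tuple[int, int]:
    """Decode one fixed-width immediate; returns (value, next_pos)."""
    fmt = IMMEDIATE_FORMATS.get(data[pos])
    if fmt is None:
        raise ValueError(f"opcode {data[pos]:#04x} is not a fixed-width immediate")
    (value,) = struct.unpack_from(fmt, data, pos + 1)
    return value, pos + 1 + struct.calcsize(fmt)


if __name__ == "__main__":
    for sample in ("018002", "0246", "06d40c", "09c0d40100"):
        value, _ = decode_immediate(bytes.fromhex(sample))
        print(sample, hex(value))
```

Because all values are 32-bit words to the interpreter, the decoded Python integer is the only thing that matters; the original width is just an encoding detail.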
Opcode 0x0a (i32)
Unused alias of 0x09.
Opcode 0x0d (i32)
Unused alias of 0x09.
Opcode 0x0e (string reference)
References a string in the text data.
Parameters: ID of the string in little-endian byte order (2 bytes).
Example:
0e850a string-ref 0xa85
Opcode 0x10 (pointer)
Pointer to a value in GBA memory. The address is computed as address = base_address + offset.
Parameters:
- Data type of referenced value (bottom 4 bits of opcode):
  - 0x1: i16
  - 0x2: u8
  - 0x3: u8 (same as 0x2)
  - 0x4: bool (Warning: very slow in Boktai 1, because the bytecode interpreter uses two BIOS division calls to compute the address.)
  - 0x6: i16 (same as 0x1)
  - 0x8: u16 (Note: for indexed pointers, the game assumes that the element size is 4 bytes instead of 2 bytes. This pointer type is unused in Boktai 1 scripts.)
  - 0x9: i32
- Base address (1 byte):
  - Top 4 bits: see the base address table below.
  - Bottom 4 bits: if type == bool, the bit number to access.
- Offset in big-endian byte order (2 bytes)
Base address:
Base byte & 0xf0 | 0x00 | 0x10 | 0x80 |
---|---|---|---|
Contents | Save/respawn state | Other | Current state |
Boktai 1 | 0x0203e800 | 0x0203f000 | 0x0203d800 |
Boktai 2 | *(void**) 0x03004698 | *(void**) 0x03004690 | *(void**) 0x030046a0 |
Boktai 3 | *(void**) 0x0203db08 | *(void**) 0x0203e308 | *(void**) 0x0203c508 |
Boktai 2/3 (after a hard reset only) | 0x0203da00 | 0x0203e200 | 0x0203c400 |
(Boktai 1): For the current state, a backup is kept at 0x0203e000. On death, that backup is restored to 0x0203d800.
(Boktai 2/3): The base addresses are stored at the memory locations given in the table. They change on soft resets.
Example:
1110018c  ptr i16, 0x203f18c
1412010c  ptr bool, 0x203f10c, bit=0x2
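To make the address computation concrete, here is a small Python sketch that decodes a 4-byte pointer instruction using the Boktai 1 base addresses from the table above. The names `BOKTAI1_BASES`, `POINTER_TYPES`, and `decode_pointer` are made up for this example, and the Boktai 2/3 indirection through a stored base pointer is not modelled.

```python
# Fixed base addresses for Boktai 1, keyed by the top 4 bits of the base byte.
BOKTAI1_BASES = {0x00: 0x0203E800, 0x10: 0x0203F000, 0x80: 0x0203D800}

# Data types keyed by the bottom 4 bits of the opcode byte.
POINTER_TYPES = {
    0x1: "i16", 0x2: "u8", 0x3: "u8", 0x4: "bool",
    0x6: "i16", 0x8: "u16", 0x9: "i32",
}


def decode_pointer(data: bytes, pos: int = 0):
    """Decode an opcode 0x10 pointer; returns (type_name, address, bit).

    bit is None unless the pointer type is bool.
    """
    opcode, base_byte, hi, lo = data[pos:pos + 4]
    if opcode & 0xF0 != 0x10:
        raise ValueError("not a pointer instruction")
    type_name = POINTER_TYPES[opcode & 0x0F]
    base = BOKTAI1_BASES[base_byte & 0xF0]
    offset = (hi << 8) | lo                  # offset is big-endian
    bit = base_byte & 0x0F if type_name == "bool" else None
    return type_name, base + offset, bit


if __name__ == "__main__":
    for sample in ("1110018c", "1412010c"):
        name, address, bit = decode_pointer(bytes.fromhex(sample))
        print(sample, name, hex(address), bit)
```

Both sample instructions use base byte 0x1x, so they resolve relative to 0x0203f000.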
Opcode 0x20 (indexed pointer)
Similar to pointer, but with an additional dynamic offset. The address is computed as address = base_address + offset + index*sizeof(data_type).
Note: For type == bool: bitnum = base_bit + index; address = base_address + offset + bitnum/8; bit = bitnum%8.
Parameters:
- Data type of referenced value (see pointer opcode for details)
- Base address (see pointer opcode for details)
- Offset (see pointer opcode for details)
- TODO: purpose unknown; maybe the array size for bounds checking (instruction)
- Index (instruction)
Example:
22100129  indexed-ptr u8, 0x203f129
c5          i32 0x4       // array size?
32          expr          // index
42            param 0x2
a0          end-expr
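The indexed address formulas can be written out as a short Python sketch. The helper name `indexed_address` and the `ELEMENT_SIZES` table are assumptions for illustration; the u16 size of 4 follows the note in the pointer section, and bitnum/8 is assumed to be integer division.

```python
# Element sizes in bytes. u16 is 4, not 2, per the note on pointer type 0x8.
ELEMENT_SIZES = {"u8": 1, "i16": 2, "u16": 4, "i32": 4}


def indexed_address(type_name: str, base_address: int, offset: int,
                    index: int, base_bit: int = 0):
    """Return (address, bit) for an indexed pointer; bit is None for non-bool."""
    if type_name == "bool":
        bitnum = base_bit + index
        return base_address + offset + bitnum // 8, bitnum % 8
    return base_address + offset + index * ELEMENT_SIZES[type_name], None


if __name__ == "__main__":
    # Element 3 of the u8 array from the example above.
    print([hex(x) if x is not None else x
           for x in indexed_address("u8", 0x0203F000, 0x129, 3)])
    # Bit array starting at bit 2 of 0x0203f10c, element 9.
    print(indexed_address("bool", 0x0203F000, 0x10C, 9, base_bit=2))
```

In the bool case, index 9 on top of base bit 2 gives bit number 11, which lands on bit 3 of the following byte.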
Opcode 0x30 (expression)
Marks the start of an expression. Expressions are encoded in reverse Polish notation (e.g. "4 * 8 + 1" would be encoded as "4 8 * 1 +"). Most instructions inside of an expression "push" values onto an operand stack, while operators "pop" arguments and may "push" a result back.
Parameters: Length of expression in bytes (container length).
The following general rules should be followed; it is unknown what happens if you violate them:
- Expressions must be terminated with an end expression operator.
- At the end of an expression, the operand stack should contain either 0 values (e.g. if the expression is a statement like a = b+c;), or exactly 1 value (e.g. if the expression is the condition of an if control structure).
- call instructions should be wrapped inside of a block.
- It is unknown what happens if the operand stack under/overflows.
Examples
Simple "statement"-type expression:
38  expr        // v3 = (v2-v1) * -1
93    var 0x3
92    var 0x2
91    var 0x1
a5    sub
c0    i32 -0x1
a6    mul
b6    store
a0  end-expr
Unary operators like "not" take a dummy first operand (the i32 0x0):
37        expr  // !BIT(0x0203e954, 3)
c1          i32 0x0
14030154    ptr bool, 0x203e954, bit=0x3
a2          not
a0        end-expr
Calling a function within an expression requires wrapping the call with a block:
39      expr    // FUN_0x7644() == 0
85        block
734476      call 0x7644
00          end
00        end
c1        i32 0x0
ab        eq
a0      end-expr
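The stack behaviour of expressions, including the dummy operand consumed by unary operators, can be modelled with a toy evaluator. This Python sketch works on already-decoded tokens rather than raw bytes; the token names and the `evaluate` function are hypothetical, and variables, pointers, store, and 32-bit wraparound are deliberately left out.

```python
import operator

BINARY = {
    "add": operator.add,
    "sub": operator.sub,
    "mul": operator.mul,
    "eq": lambda a, b: int(a == b),
}
UNARY = {
    "neg": operator.neg,
    "not": lambda value: int(not value),
}


def evaluate(tokens):
    """Evaluate a reverse Polish token list and return the final operand stack."""
    stack = []
    for token in tokens:
        if isinstance(token, int):
            stack.append(token)                 # immediates push a value
        elif token in BINARY:
            rhs = stack.pop()
            lhs = stack.pop()
            stack.append(BINARY[token](lhs, rhs))
        elif token in UNARY:
            value = stack.pop()
            stack.pop()                         # dummy first operand, ignored
            stack.append(UNARY[token](value))
        elif token == "end":
            break
        else:
            raise ValueError(f"unknown token: {token!r}")
    return stack


if __name__ == "__main__":
    print(evaluate([4, 8, "mul", 1, "add", "end"]))   # 4 * 8 + 1
    print(evaluate([0, 5, "not", "end"]))             # !5, with dummy operand 0
```

A condition-style expression leaves exactly one value on the returned stack, matching the rule that an expression should end with 0 or 1 values.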
Opcode 0x40 (parameter)
Accesses a parameter of the current script.
- For parameter numbers < 0xf: Parameter number is the bottom 4 bits of the opcode.
- For parameter numbers >= 0xf: Bottom 4 bits of opcode are set to 0xf, parameter number is 0xf + following byte.
- Parameter 0 is possibly the return value?
- Maximum number of parameters is unknown. The highest parameter Boktai 1 uses is param 0x10.
Examples:
4d    param 0xd
4f01  param 0x10
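The two-form parameter encoding can be sketched in a few lines of Python. The function name `decode_param` is an assumption for this example.

```python
def decode_param(data: bytes, pos: int = 0) -> tuple[int, int]:
    """Decode an opcode 0x40 parameter access; returns (param_number, next_pos)."""
    first = data[pos]
    if first & 0xF0 != 0x40:
        raise ValueError("not a parameter instruction")
    low = first & 0x0F
    if low < 0x0F:
        return low, pos + 1                  # short form: number in the opcode
    return 0x0F + data[pos + 1], pos + 2     # long form: 0xf + following byte


if __name__ == "__main__":
    print(decode_param(bytes.fromhex("4d")))
    print(decode_param(bytes.fromhex("4f01")))
```

With a single extension byte, the long form can address parameter numbers from 0xf up to 0xf + 0xff.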
Opcode 0x50 (keyword)
Keywords are used inside of control structures to define their behavior. Keywords contain child instructions, but are not terminated with an explicit "end" instruction.
Parameters:
- Length of keyword (container length)
- Keyword type (1 byte; usually a printable ASCII character hinting at the meaning, e.g. 0x63 = "c" = "case")
Example:
5274 keyword 0x74
Opcode 0x60 (control)
Marks the start of a control structure. Control structures must be terminated with an end instruction.
Parameters:
- Length of control structure in bytes (container length)
- Control structure type in little-endian byte order (2 bytes)
- Byte count until the next keyword instruction or the end instruction (whichever comes first)
  - Value <= 0x7f: 1 byte
  - Value > 0x7f: 2 bytes; top bit of 1st byte is set to 1; value = ((first & 0x7f)<<8) | second
Examples:
6d12ff220e      control 0x22ff, next_keyword=0xe
6e6d05860d82c5  control 0xd86, next_keyword=0x2c5
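The variable-width "byte count until the next keyword" field can be decoded as in the following Python sketch. The function name `decode_next_keyword` is hypothetical; it expects `pos` to point at the first byte of the field, i.e. after the opcode, length, and control type bytes.

```python
def decode_next_keyword(data: bytes, pos: int = 0) -> tuple[int, int]:
    """Decode the next-keyword byte count; returns (value, next_pos)."""
    first = data[pos]
    if first <= 0x7F:
        return first, pos + 1                       # 1-byte form
    second = data[pos + 1]
    return ((first & 0x7F) << 8) | second, pos + 2  # 2-byte form, top bit set


if __name__ == "__main__":
    print(hex(decode_next_keyword(bytes.fromhex("0e"))[0]))
    print(hex(decode_next_keyword(bytes.fromhex("82c5"))[0]))
```

Unlike the other multi-byte fields in this format, the 2-byte form stores its high bits first.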
Opcode 0x70 (call)
Calls another script. Supports passing arguments, and the return value can be used in expressions. Calls must be terminated with an end instruction.
Parameters:
- Length of call in bytes (container length)
- Script ID in little-endian byte order (2 bytes)
- 0-N arguments (instructions)
Example:
7d120aa5  call 0xa50a     // FUN_0xa50a(7, *0x203f11c - 0x100, *0x203f11e)
c8          i32 0x7
39          expr
1110011c      ptr i16, 0x203f11c
010001        i16 0x100
a5            sub
a0          end-expr
1110011e    ptr i16, 0x203f11e
00        end
Opcode 0x80 (block)
Starts a block. Every script must be wrapped inside of a block. Blocks are also used to delimit the branches of if and switch control structures. Blocks must be terminated with an end instruction.
Parameters: Length of block in bytes (container length)
Example:
5963  case
c6      i32 0x5
86      block
34        expr
95          var 0x5
c1          i32 0x0
b6          store
a0        end-expr
00      end
Opcode 0x90 (variable)
Accesses a variable of the current script. Maximum number of variables is unknown. The highest variable number Boktai 1 uses is var 0xb.
Parameters: Variable number (bottom 4 bits of opcode).
Example:
97 var 0x7
Opcode 0xa0-0xbf (operator)
Operators perform computations and effects inside of an expression. All operators (except for "end expression") take two operands; unary operators take a dummy first operand which is popped from the stack but otherwise ignored. For examples, see the expression opcode. Opcode 0xb7 is defined but unused in Boktai 1 scripts. Opcodes 0xb8-0xbf are undefined and should not be used.
Opcode | Name | Stack transition |
---|---|---|
0xa0 | end expression | |
0xa1 | negate | ..., dummy, value → ..., result |
0xa2 | logical not | ..., dummy, value → ..., result |
0xa3 | bitwise not | ..., dummy, value → ..., result |
0xa4 | add | ..., value1, value2 → ..., result |
0xa5 | subtract | ..., minuend, subtrahend → ..., result |
0xa6 | multiply | ..., value1, value2 → ..., result |
0xa7 | divide | ..., dividend, divisor → ..., result |
0xa8 | modulo | ..., dividend, divisor → ..., result |
0xa9 | shift left | ..., value, shift → ..., result |
0xaa | logical shift right | ..., value, shift → ..., result |
0xab | equal | ..., value1, value2 → ..., result |
0xac | not equal | ..., value1, value2 → ..., result |
0xad | less than | ..., value1, value2 → ..., result |
0xae | less or equal | ..., value1, value2 → ..., result |
0xaf | greater than | ..., value1, value2 → ..., result |
0xb0 | greater or equal | ..., value1, value2 → ..., result |
0xb1 | bitwise or | ..., value1, value2 → ..., result |
0xb2 | bitwise and | ..., value1, value2 → ..., result |
0xb3 | bitwise xor | ..., value1, value2 → ..., result |
0xb4 | logical or | ..., value1, value2 → ..., result |
0xb5 | logical and | ..., value1, value2 → ..., result |
0xb6 | store | ..., destination, source → ... |
0xb7 | unused | ..., dummy, value → ..., value |
0xb8+ | undefined | ..., dummy, dummy → ..., zero |
Opcode 0xc0-0xff (i32)
Immediate signed 32-bit integer, "compressed" encoding for integers in the range [-1; 62].
Formula: value = (opcode & 0x3f) - 1
Parameters: None.
Example:
d7 i32 0x16
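The compressed encoding and its inverse can be sketched in Python as below. Both function names are made up for this example.

```python
def decode_compressed_i32(opcode: int) -> int:
    """Return the integer encoded by a single byte in the range 0xc0-0xff."""
    if not 0xC0 <= opcode <= 0xFF:
        raise ValueError("not a compressed i32 opcode")
    return (opcode & 0x3F) - 1


def encode_compressed_i32(value: int) -> int:
    """Return the single-byte encoding of value, which must be in [-1; 62]."""
    if not -1 <= value <= 62:
        raise ValueError("value does not fit the compressed encoding")
    return 0xC0 | (value + 1)


if __name__ == "__main__":
    print(hex(decode_compressed_i32(0xD7)))
    print(hex(encode_compressed_i32(2)))
```

An assembler would presumably prefer this form whenever a constant fits, since it needs one byte instead of two to five for the fixed-width immediates.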
Undefined opcodes
The following opcodes are undefined and should not be used: 0x05, 0x0b, 0x0c, 0x0f.
Container lengths
Instructions that start a "container"-like structure (expression, keyword, control, call, and block) include the length of the container in bytes as a parameter. This length is calculated over the parameters of the call/control instruction, all child instructions, and the terminating end or end expression instruction. It does not include the opcode byte of the current instruction or the length bytes themselves. As an example, the following call instruction has a length of 0x10 bytes:
7d10dd56  call 0x56dd     // 2 bytes of call parameters (script id 0x56dd)
06bccc      u16 0xccbc    // \
06e890      u16 0x90e8    // |
c1          i32 0x0       // | 13 bytes of child instructions
0842f1      u16 0xf142    // |
08eb74      u16 0x74eb    // /
00        end             // 1 byte of end instruction
The length is encoded like so:
- Length <= 0xc bytes: Length is stored in the bottom 4 bits of the opcode byte.
- Length <= 0xff bytes: Bottom 4 bits of opcode byte are set to 0xd, after the opcode byte is 1 byte containing the length.
- Length <= 0xffff bytes: Bottom 4 bits of opcode byte are set to 0xe, after the opcode byte are 2 bytes containing the length in little-endian byte order.
- Length <= 0xffffff bytes: Bottom 4 bits of opcode byte are set to 0xf, after the opcode byte are 3 bytes containing the length in little-endian byte order.
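The four length forms can be read with a short Python sketch like the one below. The function name `decode_container_length` is an assumption; it returns the length together with the offset of the first byte that the length covers.

```python
def decode_container_length(data: bytes, pos: int = 0) -> tuple[int, int]:
    """Decode the container length of the instruction at data[pos].

    Returns (length, body_pos), where body_pos is the offset just past
    the opcode byte and any extra length bytes.
    """
    nibble = data[pos] & 0x0F
    if nibble <= 0x0C:
        return nibble, pos + 1               # length stored in the opcode byte
    extra = nibble - 0x0C                    # 0xd -> 1, 0xe -> 2, 0xf -> 3 bytes
    raw = data[pos + 1:pos + 1 + extra]
    if len(raw) != extra:
        raise ValueError("truncated length field")
    return int.from_bytes(raw, "little"), pos + 1 + extra


if __name__ == "__main__":
    call = bytes.fromhex("7d10dd5606bccc06e890c10842f108eb7400")
    length, body_pos = decode_container_length(call)
    print(hex(length), len(call) - body_pos == length)
```

Run on the call example above, the decoded length of 0x10 matches the number of bytes remaining after the opcode and length bytes.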
Control structures
This section documents the control structures supported by the control and keyword instructions. Each control structure contains a description of its grammar in EBNF.
Control 0x0d86 (if/else if/else)
Conditional execution. Supports "else if" and "else" blocks, both optional.
Grammar:
control 0x0d86, value, block, { else if }, [ else ], end;
else if = keyword 0x69, value, block;
else = keyword 0x65, block;
Example:
6d28860d0c  if
34            expr
42              param 0x2
c1              i32 0x0
ab              eq
a0            end-expr
86            block
745d9f          call 0x9f5d
c3                i32 0x2
00              end
00            end
5d0d69        else-if
34              expr
42                param 0x2
c5                i32 0x4
ab                eq
a0              end-expr
86              block
745d9f            call 0x9f5d
c2                  i32 0x1
00                end
00              end
5865          else
86              block
745d9f            call 0x9f5d
c1                  i32 0x0
00                end
00              end
00          end
Control 0x121f (call indirect)
Calls a script using a dynamic (= not hardcoded) script ID.
Grammar:
control 0x121f, script_id, { args }, end;
Example:
691f1205  control 0x121f
1900021c    ptr i32, 0x203ea1c    // script id
c1          i32 0x0               // param 1
00        end
Control 0x22ff (TODO)
Unknown.
Grammar:
control 0x22ff, { value }, end;
Example:
6d0dff2209  control 0x22ff
0675d8        u16 0xd875
060000        u16 0x0
06f773        u16 0x73f7
00          end
Control 0x4a6f (switch/case/default)
Conditional execution. Supports a "default" case that runs if no explicit case matches. There is no explicit "break" statement like in other programming languages; after a case matches and its code has been executed, no further cases are evaluated.
Grammar:
control 0x4a6f, value, { case }, [ default ], end;
case = keyword 0x63, value, block;
default = keyword 0x64, block;
Example:
6d2a6f4a06  switch
35            expr
198002dc        ptr i32, 0x203dadc
a0            end-expr
5a63          case
c5              i32 0x4
87              block
653acd01          return
c7                  i32 0x6
00                end
00              end
5a63          case
c7              i32 0x6
87              block
653acd01          return
c7                  i32 0x6
00                end
00              end
5964          default
87              block
653acd01          return
c1                  i32 0x0
00                end
00              end
00          end
Control 0x9906 (call engine)
Similar to control 0xb745 but with a different calling convention. The 1st parameter is the ID of the engine function to call. The 2nd parameter will be passed to the engine function in r0. Both control 0x9906 and 0xb745 use the same dispatch table.
Control 0xb745 (call engine)
Generic "call engine" or "call native code" instruction. The 1st parameter is the ID of the engine function to call. The called function is then responsible for interpreting the remaining parameters and keywords.
Control 0xcd3a (return)
Returns from the current script, optionally with a return value. The return value can be almost anything, including an expression. Usage of a return instruction is optional; the script will implicitly return when the top-level block ends. If no return value is specified, 0 is implicitly returned.
Grammar:
control 0xcd3a, [ value ], end;
Example:
653acd01  return
96          var 0x6
00        end
Implementation pointers
Name | B1J | B1U | B2J |
---|---|---|---|
Program counter | 0x03004520 | 0x030045a0 | |
Execute script by ID | 0x081c4774 | 0x081c5e74 | |
Opcode 0x01 (i16) | 0x081c41d4 | ||
Opcode 0x07 (immediate string) | 0x081c4212 | ||
Opcode 0x10 (pointer) | 0x081c4cc8 | ||
Pointer data type | 0x081c4c10 | ||
Opcode 0x30 (expression) | 0x081c5488 | ||
Opcode 0xa0 (operator) | 0x081c5364 | ||
Control opcode dispatcher | 0x085809a4 | 0x081c5dc4 | |
Control table 1 | 0x085809a4 | ||
Control table 2 | 0x08ee9f24 | ||
Control 0x0bb3 (unused) | 0x081cd1e8 | ||
Control 0x0d86 (if) | 0x081c51b9 | ||
Control 0x121f (call indirect) | 0x081c52d0 | ||
Control 0x22ff (TODO) | 0x081cd2a0 | ||
Control 0x4a6f (switch) | 0x081c51d5 | ||
Control 0x64c0 (unused) | 0x081c5231 | ||
Control 0x9906 (call engine) | 0x081cd244 | ||
Control 0xb745 (call engine) | 0x081cd274 | ||
Control 0xb96e (unused) | 0x081c527d | ||
Control 0xc091 (unused) | 0x081cd304 | ||
Control 0xc8bb (TODO) | 0x081cd1ec | ||
Control 0xcd3a (return) | 0x081c5251 | ||
Control 0xd4cb (TODO) | 0x081cd308 | ||
Control 0xe43c (TODO) | 0x081cd480 |