raphi.xyz

Boktai bytecode reference

This content has been moved to the Boktai Hacking Wiki.

Archived version

Credits

Useful tools

Script structure

Instruction encoding

The bytecode uses a variable-length instruction encoding. If the first byte is <= 0x0f, that byte is the opcode itself. Otherwise, the top 4 bits of the first byte select the opcode, and the bottom 4 bits are part of the instruction's parameters.
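
As a minimal sketch of the rule above, the first byte can be split like this in Python. The function name is made up for illustration, and the helper follows the general rule literally; the operator (0xa0-0xbf) and compressed i32 (0xc0-0xff) ranges are described separately further down and are not special-cased here.

```python
from __future__ import annotations


def split_first_byte(first: int) -> tuple[int, int | None]:
    """Split the first byte of an instruction into (opcode, low_nibble)."""
    if not 0 <= first <= 0xFF:
        raise ValueError("expected a single byte")
    if first <= 0x0F:
        # Opcodes 0x00-0x0f occupy the whole byte; there are no parameter bits.
        return first, None
    # Otherwise the top 4 bits select the opcode and the bottom 4 bits
    # carry part of the instruction's parameters.
    return first & 0xF0, first & 0x0F


print(split_first_byte(0x02))  # (2, None)
print(split_first_byte(0x74))  # (112, 4), i.e. opcode 0x70 with nibble 0x4
```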

Opcode 0x00 (end)

Terminates control, call, and block instructions.
Parameters: None.
Example:

74d8da       call 0xdad8
c3               i32 0x2
00           end

Opcode 0x01 (i16)

Immediate signed 16-bit integer.
Parameters: Value in little-endian byte order (2 bytes).
Example:

018002       i16 0x280

Opcode 0x02 (u8)

Immediate unsigned 8-bit integer.
Parameters: Value (1 byte).
Example:

0246         u8 0x46

Opcode 0x03 (u8)

Unused alias of opcode 0x02.

Opcode 0x04 (u8)

Unused alias of opcode 0x02.

Opcode 0x06 (u16)

Immediate unsigned 16-bit integer.
Parameters: Value in little-endian byte order (2 bytes).
Example:

06d40c       u16 0xcd4

Opcode 0x07 (immediate string)

String/byte array directly embedded in the script data.
Parameters:

  1. Size of array in bytes (1 byte, so an immediate string can hold at most 255 bytes)
  2. N bytes of data
Example:
070967616d656f76657200    string "gameover\x00"
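
A small Python sketch of how the length-prefixed layout above could be read. The function name and the error handling are illustrative assumptions, not taken from the game's interpreter.

```python
from __future__ import annotations


def decode_immediate_string(data: bytes) -> tuple[bytes, int]:
    """Decode opcode 0x07 at data[0]; return (payload, bytes_consumed)."""
    if len(data) < 2 or data[0] != 0x07:
        raise ValueError("not an immediate string instruction")
    size = data[1]                  # 1-byte size, so at most 255
    payload = data[2:2 + size]      # N bytes of raw data
    if len(payload) != size:
        raise ValueError("truncated immediate string")
    return payload, 2 + size


payload, used = decode_immediate_string(bytes.fromhex("070967616d656f76657200"))
print(payload, used)  # b'gameover\x00' 11
```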

Opcode 0x08 (u16)

Alias of opcode 0x06. Unlike the other aliases, this one is actually used in scripts.

Opcode 0x09 (i32)

Immediate signed 32-bit integer.
Parameters: Value in little-endian byte order (4 bytes).
Example:

09c0d40100   i32 0x1d4c0
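
The fixed-width immediates described so far (0x01, 0x02, 0x06 and 0x09) differ only in size and signedness, so one table-driven decoder can cover them. This is a sketch; the table and function names are hypothetical, and the alias opcodes are left out on purpose.

```python
from __future__ import annotations

# opcode -> (mnemonic, size in bytes, signed)
IMMEDIATES = {
    0x01: ("i16", 2, True),
    0x02: ("u8", 1, False),
    0x06: ("u16", 2, False),
    0x09: ("i32", 4, True),
}


def decode_immediate(data: bytes) -> tuple[str, int, int]:
    """Return (mnemonic, value, bytes_consumed) for an immediate at data[0]."""
    name, size, signed = IMMEDIATES[data[0]]
    raw = data[1:1 + size]
    if len(raw) != size:
        raise ValueError("truncated immediate")
    # Multi-byte values are stored in little-endian byte order.
    value = int.from_bytes(raw, "little", signed=signed)
    return name, value, 1 + size


for example in ("018002", "0246", "06d40c", "09c0d40100"):
    name, value, _ = decode_immediate(bytes.fromhex(example))
    print(example, name, hex(value))
```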

Opcode 0x0a (i32)

Unused alias of 0x09.

Opcode 0x0d (i32)

Unused alias of 0x09.

Opcode 0x0e (string reference)

References a string in the text data.
Parameters: ID of the string in little-endian byte order (2 bytes).
Example:

0e850a       string-ref 0xa85

Opcode 0x10 (pointer)

Pointer to a value in GBA memory. The address is computed as address = base_address + offset.
Parameters:

  1. Data type of referenced value (bottom 4 bits of opcode):
    • 0x1: i16
    • 0x2: u8
    • 0x3: u8 (same as 0x2)
    • 0x4: bool (Warning: very slow in Boktai 1, because the bytecode interpreter uses two BIOS division calls to compute the address.)
    • 0x6: i16 (same as 0x1)
    • 0x8: u16 (Note: for indexed pointers, the game assumes that the element size is 4 bytes instead of 2 bytes. This pointer type is unused in Boktai 1 scripts.)
    • 0x9: i32
  2. Base address (1 byte):
    • Top 4 bits: See table below.
    • Bottom 4 bits: if type == bool, then the bottom bits indicate the bit number to access.
  3. Offset in big-endian byte order (2 bytes)

Base address:

Base byte & 0xf0                      0x00                  0x10                  0x80
Contents                              Save/respawn state    Other                 Current state
Boktai 1                              0x0203e800            0x0203f000            0x0203d800
Boktai 2                              *(void**) 0x03004698  *(void**) 0x03004690  *(void**) 0x030046a0
Boktai 3                              *(void**) 0x0203db08  *(void**) 0x0203e308  *(void**) 0x0203c508
Boktai 2/3 (after a hard reset only)  0x0203da00            0x0203e200            0x0203c400

(Boktai 1): For current state, there is a backup at 0x0203e000. On death, that backup is restored to 0x0203d800.
(Boktai 2/3): The base address is stored in the memory addresses specified in the table. They change on soft resets.

Example:

1110018c     ptr i16, 0x203f18c
1412010c     ptr bool, 0x203f10c, bit=0x2
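
The two examples above can be reproduced with a short Python sketch. It only handles the fixed Boktai 1 base addresses (Boktai 2/3 would need a memory read to resolve the base), and the table and function names are illustrative.

```python
from __future__ import annotations

# Bottom 4 bits of the opcode byte -> data type
POINTER_TYPES = {0x1: "i16", 0x2: "u8", 0x3: "u8", 0x4: "bool",
                 0x6: "i16", 0x8: "u16", 0x9: "i32"}

# Top 4 bits of the base address byte -> base address (Boktai 1 only)
BOKTAI1_BASES = {0x00: 0x0203E800, 0x10: 0x0203F000, 0x80: 0x0203D800}


def decode_pointer(data: bytes) -> tuple[str, int, int | None]:
    """Return (type, address, bit) for a 4-byte pointer instruction."""
    if len(data) < 4 or data[0] & 0xF0 != 0x10:
        raise ValueError("not a pointer instruction")
    type_name = POINTER_TYPES[data[0] & 0x0F]
    base = BOKTAI1_BASES[data[1] & 0xF0]
    offset = int.from_bytes(data[2:4], "big")    # offset is big-endian
    # The bit number is only meaningful for bool pointers.
    bit = data[1] & 0x0F if type_name == "bool" else None
    return type_name, base + offset, bit


print(decode_pointer(bytes.fromhex("1110018c")))  # ('i16', 33812876, None)
print(decode_pointer(bytes.fromhex("1412010c")))  # ('bool', 33812748, 2)
```

The printed addresses are decimal; 33812876 is 0x203f18c and 33812748 is 0x203f10c.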

Opcode 0x20 (indexed pointer)

Similar to pointer, but with an additional dynamic offset. The address is computed as address = base_address + offset + index*sizeof(data_type).
Note: For type == bool: bitnum = base_bit + index; address = base_address + offset + bitnum/8; bit = bitnum%8.
Parameters:

  1. Data type of referenced value (see pointer opcode for details)
  2. Base address (see pointer opcode for details)
  3. Offset (see pointer opcode for details)
  4. TODO: purpose? Maybe array size for bounds checking? (instruction)
  5. Index (instruction)
Example:
22100129     indexed-ptr u8, 0x203f129
c5           i32 0x4  // array size?
32           expr     // index
42               param 0x2
a0           end-expr
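
The address formulas for indexed pointers can be sketched in Python as follows. The element sizes for i16, u8 and i32 are assumed to be their natural sizes; the 4-byte size for u16 follows the note in the pointer section. A non-negative index is assumed, and all names are hypothetical.

```python
from __future__ import annotations

# Assumed element sizes in bytes. u16 uses 4 because the game treats
# indexed u16 elements as 4 bytes wide (see the pointer opcode).
ELEMENT_SIZE = {"i16": 2, "u8": 1, "i32": 4, "u16": 4}


def indexed_address(base_plus_offset: int, type_name: str, index: int,
                    base_bit: int = 0) -> tuple[int, int | None]:
    """Return (address, bit); bit is None for non-bool types."""
    if type_name == "bool":
        bitnum = base_bit + index
        return base_plus_offset + bitnum // 8, bitnum % 8
    return base_plus_offset + index * ELEMENT_SIZE[type_name], None


# u8 array at 0x203f129, element 3
print(indexed_address(0x0203F129, "u8", 3))
# bool array starting at bit 2 of 0x203f10c, element 7 -> next byte, bit 1
print(indexed_address(0x0203F10C, "bool", 7, base_bit=2))
```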

Opcode 0x30 (expression)

Marks the start of an expression. Expressions are encoded in reverse polish notation (e.g. "4 * 8 + 1" would be encoded as "4 8 * 1 +"). Most instructions inside of an expression "push" values onto an operand stack, while operators "pop" arguments and may "push" a result back.
Parameters: Length of expression in bytes (container length).
The following general rules should be followed; it is unknown what happens if you violate them:

Examples

Simple "statement"-type expression:

38           expr          // v3 = (v2-v1) * -1
93               var 0x3
92               var 0x2
91               var 0x1
a5               sub 
c0               i32 -0x1
a6               mul 
b6               store 
a0           end-expr

Unary operators like "not" take a dummy first operand (the i32 0x0):

37           expr          // !BIT(0x0203e954, 3)
c1               i32 0x0
14030154         ptr bool, 0x203e954, bit=0x3
a2               not 
a0           end-expr 

Calling a function within an expression requires wrapping the call with a block:

39           expr          // FUN_0x7644() == 0
85               block 
734476               call 0x7644
00                   end 
00               end 
c1               i32 0x0
ab               eq 
a0           end-expr
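
To illustrate the operand stack and the dummy operand of unary operators, here is a toy reverse polish evaluator in Python. It is a sketch only: it works on already-decoded tokens rather than bytecode, covers a handful of operators, leaves out "store", and uses Python integers without 32-bit wraparound.

```python
import operator

# Binary operators pop two operands and push one result.
BINARY = {
    "add": operator.add,
    "sub": operator.sub,
    "mul": operator.mul,
    "eq": lambda a, b: int(a == b),
}

# Unary operators also pop two operands; the first one is a dummy.
UNARY = {
    "neg": operator.neg,
    "not": lambda v: int(not v),
}


def evaluate(tokens):
    """Evaluate a list of ints and operator names in reverse polish order."""
    stack = []
    for tok in tokens:
        if isinstance(tok, str) and tok in BINARY:
            rhs = stack.pop()
            lhs = stack.pop()
            stack.append(BINARY[tok](lhs, rhs))
        elif isinstance(tok, str) and tok in UNARY:
            value = stack.pop()
            stack.pop()              # dummy operand, ignored
            stack.append(UNARY[tok](value))
        else:
            stack.append(tok)        # plain value: push
    return stack.pop()


print(evaluate([4, 8, "mul", 1, "add"]))  # 33, i.e. 4 * 8 + 1
print(evaluate([0, 5, "not"]))            # 0, the leading 0 is the dummy
```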

Opcode 0x40 (parameter)

Accesses a parameter of the current script.

Examples:
4d           param 0xd
4f01         param 0x10

Opcode 0x50 (keyword)

Keywords are used inside of control structures to define their behavior. Keywords contain child instructions, but are not terminated with an explicit "end" instruction.
Parameters:

  1. Length of keyword (container length)
  2. Keyword type (1 byte; usually a printable ASCII character hinting at the meaning, e.g. 0x63 = "c" = "case")
Example (see the control structure reference for details and more examples):
5274         keyword 0x74
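
Because keyword types are usually printable ASCII, a disassembler can show the character next to the raw value. The sketch below does that in Python; the name table only contains the four keyword types that the control structure reference on this page documents, and the function name is made up.

```python
# Keyword types documented in the control structure reference.
KEYWORD_NAMES = {
    0x63: "case",
    0x64: "default",
    0x65: "else",
    0x69: "else if",
}


def describe_keyword(type_byte: int) -> str:
    """Format a keyword type byte with its ASCII hint and known name."""
    char = chr(type_byte) if 0x20 <= type_byte <= 0x7E else "?"
    name = KEYWORD_NAMES.get(type_byte, "unknown")
    return f"keyword 0x{type_byte:02x} ('{char}', {name})"


print(describe_keyword(0x63))  # keyword 0x63 ('c', case)
print(describe_keyword(0x74))  # keyword 0x74 ('t', unknown)
```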

Opcode 0x60 (control)

Marks the start of a control structure. Control structures must be terminated with an end instruction.
Parameters:

  1. Length of control structure in bytes (container length)
  2. Control structure type in little-endian byte order (2 bytes)
  3. Byte count until the next keyword instruction or the end instruction (whichever comes first)
    • Value <= 0x7f: 1 byte
    • Value > 0x7f: 2 bytes; top bit of 1st byte is set to 1; value = ((first & 0x7f)<<8) | second
Examples (see the control structure reference for details and more examples):
6d12ff220e       control 0x22ff, next_keyword=0xe
6e6d05860d82c5   control 0xd86, next_keyword=0x2c5
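
The 1-or-2-byte encoding of the third parameter can be sketched in Python like this. The decoder follows the formula given above; the encoder is simply its inverse and, like both function names, is an assumption made for illustration.

```python
from __future__ import annotations


def decode_next_keyword(data: bytes) -> tuple[int, int]:
    """Return (value, bytes_consumed) for the byte count parameter."""
    first = data[0]
    if first <= 0x7F:
        return first, 1                           # short form: 1 byte
    # Long form: top bit set, remaining 15 bits hold the value.
    return ((first & 0x7F) << 8) | data[1], 2


def encode_next_keyword(value: int) -> bytes:
    """Inverse of decode_next_keyword (assumed, not taken from the game)."""
    if not 0 <= value <= 0x7FFF:
        raise ValueError("value does not fit into 15 bits")
    if value <= 0x7F:
        return bytes([value])
    return bytes([0x80 | (value >> 8), value & 0xFF])


print(decode_next_keyword(bytes.fromhex("0e")))    # (14, 1)
print(decode_next_keyword(bytes.fromhex("82c5")))  # (709, 2), i.e. 0x2c5
```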

Opcode 0x70 (call)

Calls another script. Supports passing arguments, and the return value can be used in expressions. Calls must be terminated with an end instruction.
Parameters:

  1. Length of call in bytes (container length)
  2. Script ID in little-endian byte order (2 bytes)
  3. 0-N arguments (instructions)
Example:
7d120aa5     call 0xa50a    // FUN_0xa50a(7, *0x203f11c - 0x100, *0x203f11e)
c8               i32 0x7
39               expr 
1110011c             ptr i16, 0x203f11c
010001               i16 0x100
a5                   sub 
a0               end-expr 
1110011e         ptr i16, 0x203f11e
00           end

Opcode 0x80 (block)

Starts a block. Every script must be wrapped inside of a block. Blocks are also used to delimit the branches of if and switch control structures. Blocks must be terminated with an end instruction.
Parameters: Length of block in bytes (container length)
Example:

5963         case
c6           i32 0x5
86           block
34               expr
95                   var 0x5
c1                   i32 0x0
b6                   store 
a0               end-expr 
00           end

Opcode 0x90 (variable)

Accesses a variable of the current script. Maximum number of variables is unknown. The highest variable number Boktai 1 uses is var 0xb.
Parameters: Variable number (bottom 4 bits of opcode).
Examples:

97           var 0x7

Opcode 0xa0-0xbf (operator)

Operators perform computations and effects inside of an expression. All operators (except for "end expression") take two operands; unary operators take a dummy first operand which is popped from the stack but otherwise ignored. For examples, see the expression opcode. Opcode 0xb7 is defined but unused in Boktai 1 scripts. Opcodes 0xb8-0xbf are undefined and should not be used.

Opcode  Name                 Stack transition
0xa0    end expression
0xa1    negate               ..., dummy, value → ..., result
0xa2    logical not          ..., dummy, value → ..., result
0xa3    bitwise not          ..., dummy, value → ..., result
0xa4    add                  ..., value1, value2 → ..., result
0xa5    subtract             ..., minuend, subtrahend → ..., result
0xa6    multiply             ..., value1, value2 → ..., result
0xa7    divide               ..., dividend, divisor → ..., result
0xa8    modulo               ..., dividend, divisor → ..., result
0xa9    shift left           ..., value, shift → ..., result
0xaa    logical shift right  ..., value, shift → ..., result
0xab    equal                ..., value1, value2 → ..., result
0xac    not equal            ..., value1, value2 → ..., result
0xad    less than            ..., value1, value2 → ..., result
0xae    less or equal        ..., value1, value2 → ..., result
0xaf    greater than         ..., value1, value2 → ..., result
0xb0    greater or equal     ..., value1, value2 → ..., result
0xb1    bitwise or           ..., value1, value2 → ..., result
0xb2    bitwise and          ..., value1, value2 → ..., result
0xb3    bitwise xor          ..., value1, value2 → ..., result
0xb4    logical or           ..., value1, value2 → ..., result
0xb5    logical and          ..., value1, value2 → ..., result
0xb6    store                ..., destination, source → ...
0xb7    unused               ..., dummy, value → ..., value
0xb8+   undefined            ..., dummy, dummy → ..., zero

Opcode 0xc0-0xff (i32)

Immediate signed 32-bit integer, "compressed" encoding for integers in the range [-1; 62].
Formula: value = (opcode & 0x3f) - 1
Parameters: None.
Example:

d7      i32 0x16
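
The formula above translates directly into a decoder, and inverting it gives an encoder. Both are shown below as a Python sketch with made-up function names.

```python
def decode_small_i32(opcode: int) -> int:
    """Decode a compressed i32 opcode (0xc0-0xff) into its value."""
    if not 0xC0 <= opcode <= 0xFF:
        raise ValueError("not a compressed i32 opcode")
    return (opcode & 0x3F) - 1


def encode_small_i32(value: int) -> int:
    """Encode a value in the range [-1; 62] as a single opcode byte."""
    if not -1 <= value <= 62:
        raise ValueError("value needs a full i16/i32 immediate")
    return 0xC0 | (value + 1)


print(hex(decode_small_i32(0xD7)))  # 0x16
print(decode_small_i32(0xC0))       # -1
print(hex(encode_small_i32(0x16)))  # 0xd7
```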

Undefined opcodes

The following opcodes are undefined and should not be used: 0x05, 0x0b, 0x0c, 0x0f.

Container lengths

Instructions that start a "container"-like structure (expression, keyword, control, call, and block) include the length of the container in bytes as a parameter. This length is calculated over the parameters of the call/control instruction, all child instructions, and the terminating end or end expression instruction. It does not include the opcode byte of the current instruction or the length bytes themselves. As an example, the following call instruction has a length of 0x10 bytes:

7d10dd56     call 0x56dd       //  2 bytes of call parameters (script id 0x56dd)
06bccc           u16 0xccbc    //  \
06e890           u16 0x90e8    //  |
c1               i32 0x0       //  |  13 bytes of child instructions
0842f1           u16 0xf142    //  |
08eb74           u16 0x74eb    //  /
00           end               //  1 byte of end instruction
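
The byte count in this example can be checked mechanically. The Python sketch below reads the length from the start of a container and compares it with the bytes that follow. It relies on a pattern inferred from the examples on this page rather than a documented rule: a low nibble below 0xd appears to be the length itself, and a low nibble of 0xd appears to be followed by a single length byte. Other forms are deliberately rejected, and the function name is hypothetical. It only applies to container opcodes (expression, keyword, control, call, block).

```python
from __future__ import annotations


def container_length(data: bytes) -> tuple[int, int]:
    """Return (length, header_size) for a container starting at data[0]."""
    nibble = data[0] & 0x0F
    if nibble < 0x0D:
        # Assumption: small lengths are stored in the opcode's low nibble.
        return nibble, 1
    if nibble == 0x0D:
        # Assumption: nibble 0xd means one length byte follows the opcode.
        return data[1], 2
    raise NotImplementedError("length form not covered by the examples")


call = bytes.fromhex("7d10dd56" "06bccc" "06e890" "c1" "0842f1" "08eb74" "00")
length, header = container_length(call)
print(length, len(call) - header)  # 16 16

expr = bytes.fromhex("38" "93" "92" "91" "a5" "c0" "a6" "b6" "a0")
length, header = container_length(expr)
print(length, len(expr) - header)  # 8 8
```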

The length is encoded like so:

Control structures

This section documents the control structures supported by the control and keyword instructions. Each control structure contains a description of its grammar in EBNF.

Control 0x0d86 (if/else if/else)

Conditional execution. Supports "else if" and "else" blocks, both optional.
Grammar:

if = control 0x0d86, value, block, { else if }, [ else ], end ;
else if = keyword 0x69, value, block ;
else = keyword 0x65, block ;
Example:
6d28860d0c   if 
34               expr 
42                   param 0x2
c1                   i32 0x0
ab                   eq 
a0               end-expr 
86               block 
745d9f               call 0x9f5d
c3                       i32 0x2
00                   end 
00               end 
5d0d69           else-if 
34               expr 
42                   param 0x2
c5                   i32 0x4
ab                   eq 
a0               end-expr 
86               block 
745d9f               call 0x9f5d
c2                       i32 0x1
00                   end 
00               end 
5865             else 
86               block 
745d9f               call 0x9f5d
c1                       i32 0x0
00                   end 
00               end 
00           end

Control 0x121f (call indirect)

Calls a script using a dynamic (i.e. not hardcoded) script ID.
Grammar:

call-indirect = control 0x121f, script_id, { args }, end ;
Example:
691f1205         control 0x121f
1900021c             ptr i32, 0x203ea1c  // script id
c1                   i32 0x0             // param 1
00               end

Control 0x22ff (TODO)

Unknown.
Grammar:

unknown = control 0x22ff, { value }, end ;
Example:
6d0dff2209       control 0x22ff
0675d8               u16 0xd875
060000               u16 0x0
06f773               u16 0x73f7
00               end

Control 0x4a6f (switch/case/default)

Conditional execution. Supports a "default" case that runs if no explicit case matches. There is no explicit "break" statement as in other programming languages; once a case matches and its code has been executed, no further cases are evaluated.
Grammar:

switch = control 0x4a6f, value, { case }, [ default ], end ;
case = keyword 0x63, value, block ;
default = keyword 0x64, block ;
Example:
6d2a6f4a06   switch 
35               expr 
198002dc             ptr i32, 0x203dadc
a0               end-expr 
5a63             case 
c5               i32 0x4
87               block 
653acd01             return 
c7                       i32 0x6
00                   end 
00               end 
5a63             case 
c7               i32 0x6
87               block 
653acd01             return 
c7                       i32 0x6
00                   end 
00               end 
5964             default 
87               block 
653acd01             return 
c1                       i32 0x0
00                   end 
00               end 
00           end
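
The switch semantics can be modelled in a few lines of Python. This is a sketch of the behaviour described in this section, not of the interpreter's code: it assumes cases are tested in order and compared by equality, and the function name is made up.

```python
def run_switch(value, cases, default=None):
    """Run the first matching case; fall back to default if none matches.

    cases is a list of (case_value, block) pairs, where block is a
    callable standing in for the case's block instruction.
    """
    for case_value, block in cases:
        if value == case_value:
            return block()     # no fall-through: stop after the first match
    if default is not None:
        return default()
    return None


# Mirrors the example above: cases 4 and 6 return 6, default returns 0.
cases = [(4, lambda: 6), (6, lambda: 6)]
print(run_switch(4, cases, default=lambda: 0))  # 6
print(run_switch(5, cases, default=lambda: 0))  # 0
```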

Control 0x9906 (call engine)

Similar to control 0xb745 but with a different calling convention. The 1st parameter is the ID of the engine function to call. The 2nd parameter will be passed to the engine function in r0. Both control 0x9906 and 0xb745 use the same dispatch table.

Control 0xb745 (call engine)

Generic "call engine" or "call native code" instruction. The 1st parameter is the ID of the engine function to call. The called function is then responsible for interpreting the remaining parameters and keywords.

Control 0xcd3a (return)

Returns from the current script, optionally with a return value. The return value can be almost anything, including an expression. Usage of a return instruction is optional; the script will implicitly return when the top-level block ends. If no return value is specified, 0 is implicitly returned.
Grammar:

return = control 0xcd3a, [ value ], end ;
Example:
653acd01     return
96               var 0x6
00           end

Implementation pointers

Name B1J B1U B2J
Program counter 0x03004520 0x030045a0
Execute script by ID 0x081c4774 0x081c5e74
Opcode 0x01 (i16) 0x081c41d4
Opcode 0x07 (immediate string) 0x081c4212
Opcode 0x10 (pointer) 0x081c4cc8
Pointer data type 0x081c4c10
Opcode 0x30 (expression) 0x081c5488
Opcode 0xa0 (operator) 0x081c5364
Control opcode dispatcher 0x085809a4 0x081c5dc4
Control table 1 0x085809a4
Control table 2 0x08ee9f24
Control 0x0bb3 (unused) 0x081cd1e8
Control 0x0d86 (if) 0x081c51b9
Control 0x121f (call indirect) 0x081c52d0
Control 0x22ff (TODO) 0x081cd2a0
Control 0x4a6f (switch) 0x081c51d5
Control 0x64c0 (unused) 0x081c5231
Control 0x9906 (call engine) 0x081cd244
Control 0xb745 (call engine) 0x081cd274
Control 0xb96e (unused) 0x081c527d
Control 0xc091 (unused) 0x081cd304
Control 0xc8bb (TODO) 0x081cd1ec
Control 0xcd3a (return) 0x081c5251
Control 0xd4cb (TODO) 0x081cd308
Control 0xe43c (TODO) 0x081cd480
