TaiyohMark
TaiyohMark applies the ideas from Shrek_2_TASHack to an RTA Any% speedrun of the Japanese version of Boktai 1. A playthrough of the game is recorded, and by syncing the input playback to the game’s frame loop, the recording can be replayed on multiple systems to check how much time each system requires to complete the playback. Syncing the input playback to the frame loop (as opposed to the console’s frame rate) is critical, because it allows the same recording to sync on systems with wildly different timing accuracy.
Hardware systems were chosen based on availability. Emulators were chosen based on their prevalence and their accuracy in other tests. Not every emulator was tested due to time constraints.
Related work
Like Shrek_2_TASHack, this method can be used to create a measure of emulation accuracy. Unlike Shrek_2_TASHack, TaiyohMark uses RTA inputs as opposed to TAS inputs, and of course the game is completely different as well.
Unlike CiroteMark, this method counts frames instead of CPU cycles, so the test results are not as precise. CiroteMark is also a somewhat synthetic benchmark and not representative of normal gameplay, while TaiyohMark tries to be as close as possible (but see the determinism section) to an RTA setting. And TaiyohMark is more pleasant on the viewer’s senses.
Determinism
To ensure deterministic playback on all systems, TaiyohMark uses a patched version of Boktai 1. This means that the test results from TaiyohMark cannot be exactly applied to unmodified Boktai 1, although care was taken to ensure that the differences are minimal. However, it is very likely that an “identical” playthrough of both TaiyohMark and unmodified Boktai 1 takes a different amount of time to complete.
Cartridge hardware
The patch makes the following cartridge hardware inaccessible:
- Solar Sensor: Connected to the cartridge GPIO port. The patch sets the I/O Port Control register (80000C8h) to 0, which causes all reads from the GPIO port to return 0. Instead, Prof9’s solar sensor patch is used to control the solar sensor with the keypad. This is a significant change to the game, but it is necessary because the solar sensor’s output can’t be controlled with the required sub-frame precision on any system.
- RTC: Also connected to the cartridge GPIO port, and also made inaccessible with the 80000C8h register. This change is justifiable because in real Boktai 1 speedruns, the RTC battery is likely drained anyway, and as such the RTC wouldn’t be powered.
- EEPROM: To ensure that playback does not depend on any save games that may or may not be present, the EEPROM read/write code is patched to do nothing in both cases. The Boktai 1 speedrun does not save or reset the console. If required, load/save timing can be checked separately.
Timekeeping
Even with the RTC disabled, the game still keeps track of real time by counting frames in the VBlank interrupt handler. This means that real time passes even during lag frames. Real time affects gameplay. For example, there is one early puzzle where the shape of a room depends on the “seconds” part of real time (the garden in Bloodrust Mansion).
To make this puzzle (and others) deterministic as well, the patch moves the timekeeping code from the VBlank interrupt handler into the input polling function.
This is a significant change because timekeeping does not run on lag frames anymore and therefore consumes fewer CPU cycles over the course of the playthrough. But it is a necessary one to make playback sync independent of any lag frame differences between systems.
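The effect of moving the clock can be illustrated with a toy model. The sketch below is not the actual patch: `run`, `lag_pattern`, and the two lag patterns are hypothetical names and values chosen only for illustration. It compares a clock that ticks on every VBlank with one that ticks once per input poll, on two imaginary systems that lag on different frames.

```python
def run(lag_pattern, polls_needed):
    """Simulate until the game has polled input `polls_needed` times.

    lag_pattern(frame) returns True if the game loop does not finish during
    that hardware frame (a lag frame), so input is not polled in it.
    Returns (hardware_frames, vblank_clock, poll_clock).
    """
    frame = vblank_clock = poll_clock = 0
    while poll_clock < polls_needed:
        vblank_clock += 1           # unpatched behaviour: ticks on every VBlank
        if not lag_pattern(frame):
            poll_clock += 1         # patched behaviour: ticks once per input poll
        frame += 1
    return frame, vblank_clock, poll_clock


# Two hypothetical systems that lag on different frames.
system_a = run(lambda f: f % 10 == 0, polls_needed=600)
system_b = run(lambda f: f % 7 == 0, polls_needed=600)

print("system A:", system_a)
print("system B:", system_b)
```

In this model, both systems reach the same poll-based clock value after the same number of input polls, while the VBlank-based clock differs between them. That mirrors why a time-dependent puzzle can only replay identically when the clock follows the input polls.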
Timing
Every frame on the GBA lasts for 280 896 CPU cycles. The GBA’s system clock is 16 777 216 Hz, resulting in a framerate of 16777216/280896 ≈ 59.7275 fps. The NDS’s system clock is 33 513 982 Hz; this is halved in GBA mode, resulting in a framerate of (33513982/2)/280896 ≈ 59.6555 fps.
The systems were timed as follows:
- GameBoy Player (comparison baseline): Timed by video capture (using GBI SR, a Retro-Bit Prism HDMI adapter, and an EVGA XR1 Lite capture card) and measuring the duration in Avidemux.
- Nintendo DS Lite: Timed by recording the screen with a camera (thanks, LanHikariDS), measuring the duration in Avidemux, and accounting for the framerate difference between NDS and GBA.
- Old 3DS: Timed by video capture using Loopy’s capture board, measuring the duration in Avidemux, and accounting for the framerate difference between NDS and GBA (as implied by this open_agb_firm discussion).
- NanoBoyAdvance: Timed by screen recording and measuring the duration in Avidemux (this emulator has no frame counter overlay).
- All other emulators were timed using their frame counter overlay and dividing the frame count by the GBA’s framerate.
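The two conversions used in the list above can be sketched as follows. The frame-count conversion follows directly from the clock figures in this section. The NDS/3DS correction is an assumption about how "accounting for the framerate difference" is done: the measured wall-clock time is converted to frames at the NDS GBA-mode rate and then back to seconds at the GBA rate. The function names are hypothetical.

```python
CYCLES_PER_FRAME = 280_896
GBA_CLOCK_HZ = 16_777_216
NDS_GBA_MODE_CLOCK_HZ = 33_513_982 / 2  # NDS system clock, halved in GBA mode


def frames_to_seconds(frame_count):
    """Duration of `frame_count` frames at the GBA's native framerate."""
    return frame_count * CYCLES_PER_FRAME / GBA_CLOCK_HZ


def nds_capture_to_gba_seconds(measured_seconds):
    """Rescale a wall-clock duration measured on NDS-clocked hardware.

    Assumption: the same number of frames is shown, only at a slower clock,
    so the duration scales with the ratio of the two clock rates.
    """
    return measured_seconds * NDS_GBA_MODE_CLOCK_HZ / GBA_CLOCK_HZ


print(f"{frames_to_seconds(215_019):.3f} s")
print(f"{nds_capture_to_gba_seconds(3600.0):.3f} s")
```

Under this assumption, one hour measured on an NDS corresponds to roughly 4.3 seconds less on GBA-clocked hardware, which is far larger than the differences reported for the NDS Lite and Old 3DS, so the correction cannot be skipped.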
All systems use a BIOS, and the BIOS intro is not skipped.
Timing starts when creating a new file after confirming the “IS THE ABOVE OK?” question, at the first frame where the NAME/CITY values disappear. Timing ends when defeating the final boss, at the first frame when its health bar becomes empty. There is no automated analysis or frame/poll count display integrated into the test ROMs yet.
Test results
For this test, I recorded 3 independent playthroughs of the Boktai 1 Any% speedrun, using human inputs (no TASing). Ideally, there’d be more trials, but due to the time-consuming nature of this test (every trial is approximately 1½ hours, and some systems do not support fast-forwarding the playback), I’ve restricted myself to 3.
Then I played back each recording on the systems below and noted the duration of the playback. The main result of this test is the difference per GameBoy Player (GBP) hour, calculated as (duration - GBP_duration) / GBP_duration * 1 hour. For example, if emulator A has a difference per GBP hour of -20 seconds, this means that the same hour of gameplay on GBP only requires 59 minutes and 40 seconds on emulator A.
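As a worked example of this formula, the following sketch recomputes one value from the raw data for trial 1 (GBP: 5417.233 s, mGBA 0.10.5: 5384.672 s). The function name `diff_per_gbp_hour` is only illustrative.

```python
def diff_per_gbp_hour(duration_s, gbp_duration_s):
    """Seconds gained (+) or lost (-) relative to one hour of GBP playback."""
    return (duration_s - gbp_duration_s) / gbp_duration_s * 3600.0


gbp_trial_1 = 5417.233   # GBP baseline, trial 1
mgba_trial_1 = 5384.672  # mGBA 0.10.5, trial 1

print(f"{diff_per_gbp_hour(mgba_trial_1, gbp_trial_1):+.3f} s")  # prints -21.638 s
```

Normalizing by the GBP duration makes trials of different lengths comparable, since each playthrough takes a slightly different amount of time.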
Of course, every hour of gameplay is different, so these results shouldn’t be generalized to other categories, other players, other games, or even an unmodified version of Boktai 1. For example, the differences here are much smaller than in Shrek_2_TASHack. What can be generalized is the order of magnitude: the gaps between VBA-RR, mGBA, and console are clearly identifiable in both tests.
The table below shows the difference per GBP hour in seconds for each system and trial:
System | Trial 1 | Trial 2 | Trial 3 | Mean |
---|---|---|---|---|
Mesen (git fabc9a62) | +0.002 | -0.018 | -0.017 | -0.011 |
GBAHawk 2.3.2 | +0.002 | -0.018 | -0.017 | -0.011 |
Old 3DS (OAF beta_2024-12-24) | +0.030 | +0.029 | +0.037 | +0.032 |
NDS Lite | -0.112 | -0.156 | -0.136 | -0.135 |
NanoBoyAdvance 1.8.2 | -3.810 | -3.080 | -2.980 | -3.290 |
mGBA 0.10.5 | -21.638 | -20.402 | -22.698 | -21.579 |
BizHawk 2.10 | -21.861 | -20.656 | -22.797 | -21.771 |
VBA-RR v23.6 svn480-LRC4 | +135.743 | +138.623 | +136.367 | +136.911 |
Mesen and GBAHawk perform identically in this test. There is one frame of difference compared to GBP in trials 2 and 3 for which I do not know the source (could be measurement error due to the video capture, or a real inaccuracy in emulation). Old 3DS is slightly slower (≈ 2 frames/h), but the difference is still tiny.
The NDS result is surprising since it’s gaining more than a tenth of a second per hour over GBP in every trial (after compensating for the framerate difference). This still doesn’t matter for RTA purposes, but there seems to be a measurable difference, due to unknown factors, between GBP and NDS-in-GBA-mode.
NanoBoyAdvance is somewhat too fast, mGBA (and BizHawk’s mGBA core) are moderately faster, and VBA-RR is excessively slow.
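To relate these numbers to frames (as in the "≈ 2 frames/h" figure for Old 3DS), the mean differences from the table can be multiplied by the GBA framerate. This is a small sketch using a few of the table's mean values; the dictionary and function names are only illustrative.

```python
GBA_FPS = 16_777_216 / 280_896  # ≈ 59.7275 fps

# Mean difference per GBP hour in seconds, copied from the table above.
mean_diff_s_per_hour = {
    "Mesen / GBAHawk": -0.011,
    "Old 3DS": +0.032,
    "NDS Lite": -0.135,
    "mGBA 0.10.5": -21.579,
    "VBA-RR": +136.911,
}


def seconds_to_frames(seconds):
    """Convert a duration in seconds to GBA frames."""
    return seconds * GBA_FPS


for system, seconds in mean_diff_s_per_hour.items():
    print(f"{system}: {seconds_to_frames(seconds):+.1f} frames per GBP hour")
```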
Raw data
Trial,System,Duration (s),Difference (s),diff/hour (s)
1,GBP (baseline),5417.233,,
1,Mesen (git fabc9a62),5417.237,0.004,0.002
1,NanoBoyAdvance 1.8.2,5411.500,-5.733,-3.810
1,mGBA 0.10.5,5384.672,-32.561,-21.638
1,BizHawk 2.10,5384.337,-32.896,-21.861
1,GBAHawk 2.3.2,5417.237,0.004,0.002
1,Old 3DS (OAF beta_2024-12-24),5417.278,0.045,0.030
1,NDS Lite,5417.064,-0.169,-0.112
1,VBA-RR v23.6 svn480-LRC4,5621.498,204.265,135.743
2,GBP (baseline),5473.050,,
2,Mesen (git fabc9a62),5473.023,-0.027,-0.018
2,NanoBoyAdvance 1.8.2,5468.367,-4.683,-3.080
2,mGBA 0.10.5,5442.033,-31.017,-20.402
2,BizHawk 2.10,5441.647,-31.403,-20.656
2,GBAHawk 2.3.2,5473.023,-0.027,-0.018
2,Old 3DS (OAF beta_2024-12-24),5473.094,0.044,0.029
2,NDS Lite,5472.814,-0.236,-0.156
2,VBA-RR v23.6 svn480-LRC4,5683.797,210.747,138.623
3,GBP (baseline),5477.000,,
3,Mesen (git fabc9a62),5476.975,-0.025,-0.017
3,NanoBoyAdvance 1.8.2,5472.467,-4.533,-2.980
3,mGBA 0.10.5,5442.468,-34.532,-22.698
3,BizHawk 2.10,5442.317,-34.683,-22.797
3,GBAHawk 2.3.2,5476.975,-0.025,-0.017
3,Old 3DS (OAF beta_2024-12-24),5477.056,0.056,0.037
3,NDS Lite,5476.794,-0.206,-0.136
3,VBA-RR v23.6 svn480-LRC4,5684.467,207.467,136.367
Potential further research
- Determining the root cause of the difference between GBP, NDS, and Mesen. Measurement error can be checked by counting VBlank interrupts in-game, but care must be taken in periods where the game disables interrupts.
- Determining when and why mGBA’s and NBA’s timing differs from GBP.
- Timing the dungeons individually instead of the entire game at once.
- The playback for trial 2 initially only synced on mGBA and BizHawk, but not on Mesen or Old 3DS. This is strange because the input playback should be fully deterministic. I worked around this by manually duplicating a frame in the main menu (at the time of the desync), but this shouldn’t have happened at all.