Reverse engineering 3D Movie Maker - Part 3
A while ago, I started reverse engineering Microsoft 3D Movie Maker to understand how it works and to develop my game reversing skills. This blog series is about my adventures in reversing 3D Movie Maker and some of the interesting things I learnt along the way.
Previously, on “Reverse engineering 3D Movie Maker”:
- I recovered the C++ class hierarchy by reversing the custom runtime type identification system and wrote a Ghidra script to automate the process.
- I reverse engineered the message handling system and found an Easter egg that had gone unnoticed for about 20 years.
Scripting engine
One of the first things I did when I started reversing 3DMM was run strings on the executable. Running strings is a good way to triage what an executable does, as strings are the one part of the binary that is human-readable. The 3DMM executable has suspiciously few strings: there are a bunch of strings related to the BRender 3D engine, some messages for errors that may occur while initializing the application, and the names of imported functions. Most of the strings relating to the application’s functionality are stored in the data files instead.
One set of strings in the binary seemed interesting to me: a list of what looked like keywords for a programming language. There was also the string “Script Message ('%f', 0x%x, %d):” - clearly a format string for printing a message from a script.
My first thought was to look for cross-references to these strings. The strings were referenced in a global data structure that looked like a map of opcode numbers to strings. Adjacent to this structure was another one which included the opcode, pointer to the opcode string, the number of parameters, and a flag for if the operation accepted a variable number of arguments.
Curiously, the opcode info structure wasn’t referenced by any other code, which was unfortunate as I had hoped it would lead me to the script interpreter. How was I going to find the script interpreter? I thought about what a script interpreter might look like. I figured there would be a big switch() statement somewhere that would implement all of the different opcodes. So, I sorted all of the functions by size and found a giant function in the SCEG class that included a very large switch statement.
The cases in the switch statement corresponded with the opcodes I had found in the opcode struct. For example, the “Add” instruction (0x100) would call a function that returned a value, call that function again, add the values, then call another function with the result. I recognised that these two functions were pushing and popping values from the script interpreter’s stack. Finding the stack push/pop functions was really important as it helped to identify how operands were being used in each instruction.
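The shape of that Add handler can be sketched in a few lines of Python. This is only an illustration of a stack-based dispatch, not the actual SCEG code: the MiniInterpreter class and its method names are hypothetical, and only the 0x100 opcode number for Add comes from the binary.

```python
OP_ADD = 0x100  # opcode number observed for "Add" in the binary


class MiniInterpreter:
    """Hypothetical stand-in for the script interpreter's operand stack."""

    def __init__(self):
        self.stack = []

    def push(self, value):
        self.stack.append(value)

    def pop(self):
        return self.stack.pop()

    def execute(self, opcode):
        # The real handler is one large switch statement; a chain of
        # comparisons plays the same role in this sketch.
        if opcode == OP_ADD:
            right = self.pop()
            left = self.pop()
            self.push(left + right)
        else:
            raise NotImplementedError(f"opcode {opcode:#x}")


interp = MiniInterpreter()
interp.push(2)
interp.push(3)
interp.execute(OP_ADD)
result = interp.pop()
print(result)
```

Once the two helper functions are recognised as pop and push, every other case in the switch can be read the same way: count the pops to get the operands, and look for a push to see whether the opcode produces a result.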
I used WinDbg to set breakpoints in these functions to dump out instructions as they were executing. By walking up the stack I found another function which was the interpreter’s main loop: it would fetch an instruction, check some flags on the instruction and then call one of two opcode dispatch functions. One of the opcode dispatch functions only handled eight opcodes; the other would handle 116 opcodes and, if it couldn’t handle the opcode, would call another function that handled 57 opcodes. That last function was the one I had already found, which appeared to handle opcodes related to math operations (eg. Add, Sub, Mul…), string operations (CopySubStr, NumToStr, NukeStr…) and control flow operations (If, GoEq, GoNe, etc.). I had a look at the larger opcode handler but it was pretty difficult to figure out what the opcodes were without much context. Many of the opcodes mapped directly to virtual functions on objects. I wasn’t even sure what those objects were, let alone what the parameters might be.
I continued to trace instructions with WinDbg, looking for parameters with known constants that I could match to other functionality I had already reverse engineered. From this I was able to find an opcode which would send a message to another object, and found some opcodes for playing sounds and changing the volume. I didn’t have a clear way forward for reversing the opcodes that mapped to virtual functions though. How was I going to reverse all of these opcodes?
There was also the question of why there were eight special opcodes that had their own function for handling them. I figured out that there were two different encodings for the instructions. Scripts were stored in GLSC and GLOP chunks in the application’s data files. Each instruction included the opcode, an operand count, and the operands as DWORD values. However, the eight special opcodes included six extra bytes of data in between the opcode and the operands. It wasn’t clear what this was for.
At this point, I honestly lost a lot of my motivation to continue reversing the scripting engine. I decided to put the scripting engine into the too hard basket and focus on reversing some other parts of the application such as the data file loader, object deserialization and message handlers for well-known graphical objects (eg. the Studio and the Theatre).
Googling the answers
Some time later, I was bored on the internet and started playing around with Google Patents - a global index of patent applications from around the world. I decided to search for “3D Movie Maker” to see if Microsoft had filed any interesting patents for the application. It turns out they had filed a few patents which proved to be very useful for reverse engineering 3DMM.
One patent that does not explicitly mention 3DMM, but was filed by one of the application’s developers, was for a “Method and apparatus for scriping (sic) animation”. I started reading the patent and realised it seemed strangely familiar. The patent describes an animation system in which “Graphical Objects” (GOBs) could receive event messages and execute a script to control the object’s animation.
The patent included keywords that I had found in the binary (eg. “IfGoto”, “SetReturn”) and contained snippets of documentation for new keywords such as the Cell command, which selects the current bitmap for an animation. It included sample code, both as low-level instructions and the high-level C-like language which would have been used by the 3DMM developers to write the scripts.
Another useful inclusion was a diagram of how the instructions are encoded. I was able to match the diagram of the sample bytecodes to the format I had observed in the GLSC and GLOP chunk data. It turns out the six extra bytes were used to store a variable name. The scripting engine included opcodes for getting and setting variables by name for the current script, the object that the script was attached to, another object by ID, and globals. The six bytes didn’t look like anything in particular because the variable name was stored as a 6-bit packed string of up to eight characters in length. The variable name doesn’t need to be unpacked to be used, so the six-byte blob is treated as an opaque key. The eight variable opcodes also support indexes for arrays, although this appears to be a hack on top of variable support. If you set an index, the index value gets patched into the six-byte variable key.
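To see why eight characters fit into six bytes: 8 × 6 bits is 48 bits. The sketch below packs and unpacks a name that way. The character table (ALPHABET), the bit order and the use of zero as padding are assumptions made for illustration; nothing above pins down the exact mapping 3DMM uses.

```python
# Hypothetical 6-bit alphabet. Code 0 is reserved as padding here, which
# leaves room for 63 symbols (codes 1..63).
ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz_"


def pack_name(name: str) -> bytes:
    """Pack up to eight characters into a six-byte key, 6 bits per character."""
    if len(name) > 8:
        raise ValueError("names are limited to eight characters")
    bits = 0
    for ch in name.ljust(8, "\0"):
        code = 0 if ch == "\0" else ALPHABET.index(ch) + 1
        bits = (bits << 6) | code
    return bits.to_bytes(6, "big")


def unpack_name(blob: bytes) -> str:
    """Recover the name from a six-byte key, skipping padding codes."""
    bits = int.from_bytes(blob, "big")
    codes = [(bits >> shift) & 0x3F for shift in range(42, -1, -6)]
    return "".join(ALPHABET[c - 1] for c in codes if c)


key = pack_name("count")
print(key.hex(), unpack_name(key))
```

Because the interpreter only ever compares keys, it never needs unpack_name; only a disassembler that wants readable variable names does.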
There are several other interesting 3DMM patents, including:
- The dissolve effect: when you move between scenes, you have a choice of transitions. The default is a cool dissolve effect.
- Selecting objects in 3D space: including patent drawings of the 3DMM user interface.
- A description of how Z-buffering works
- Serializing an object graph to a file: This patent basically describes the chunky file format used by 3DMM and Creative Writer 2.
I also found a really interesting patent for what appears to be an unreleased Microsoft Kids product where you write programs to control toy robots. In the game (or, as the patent puts it, the “present embodiment of the invention”), you control the “McZeebot”, a programmable toy car that includes an infrared sensor, a speaker, and collision detection buttons. The patent includes a design for the user interface which features the McZee character found in other Microsoft Kids products including 3DMM, Creative Writer and Fine Artist.
I’m definitely curious to know more about this product. The patent was filed in February 1995 which would likely mean it was in development around the same time as 3DMM. I wonder why it was cancelled?
Disassembling scripts
Finding the scripting engine patent gave me the motivation to continue reversing the script interpreter. I used the patent and my reversing of the binary to write my own disassembler for the custom bytecode format. As well as unpacking the instructions, my disassembler also recovers the original variable names from the scripts by unpacking the 6-bit encoded strings. Here’s an example of one of the disassembled scripts from SHARED.CHK:
@L_0002: Push $L_0027
Push 0x2
@L_0005: PushLocal[] _parm
Push 0x73, 0x65, 0x74, 0x61, 0x72, 0x63, 0x6f, 0x73
@L_000f: PushThis count
Push 0x8
@L_0012: Select
@L_0013: Eq
@L_0014: GoZ
@L_0015: PushThis count
@L_0017: Inc
@L_0018: PopThis count
Push $L_0024
@L_001b: PushThis count
Push 0x8
@L_001e: Eq
@L_001f: GoZ
Push 0x1e61
Push 0x1e61
@L_0022: Op0x1001
@L_0023: Pop
@L_0024: Push $L_002b
@L_0026: Go
@L_0027: Push 0x0
@L_0029: PopThis count
@L_002b: Push 0x1
@L_002d: Return
The opcodes used in this script are:
- Push: Push value(s) to the stack
- PushThis: Get a variable from the current object and push it to the stack
- PopThis: Pop a value from the stack and set a variable on the current object
- Select: Choose a value from a list using an index. Example: Select(3, 2, "a", "b", "c") -> "b"
- Eq: Pop two values from the stack, check if they are equal, push the boolean result to the stack
- Go: Pop a new instruction pointer from the stack and jump to it
- GoZ: Pop a value and a new instruction pointer from the stack, jump if the value is zero
- Inc: Pop value from the stack, increment it, push it to the stack
- Return: End the script
This script fetches a parameter from the _parm variable and checks it against an internal array of values using the Select opcode. If it matches, an object-local counter is incremented. When the counter reaches eight… something happens. At this stage I wasn’t sure what that Opcode 0x1001 was doing. I suspected since SHARED.CHK has a chunk with number 0x1e61 that this would cause another graphical object to be loaded.
Why is this script interesting? Let’s have a closer look at that array of values that is pushed to the stack for the Select opcode. If you take all of those values and convert them to characters, you get “socrates”. Socrates was the codename for the 3DMM project. This script is the trigger for the second Easter egg in 3DMM. When you open the Talent Book and press a key, this script will check if you have typed the word “socrates”. The 0x1001 opcode tells the engine to instantiate a new graphical object that displays the video of the Microsoft offices where 3DMM was developed.
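The conversion is easy to check. In the order the operands appear in the listing they spell the word backwards, presumably because of the order in which they end up on the stack (that explanation is my assumption), so the snippet reverses them:

```python
# Operands of the multi-value Push instruction, in listing order.
values = [0x73, 0x65, 0x74, 0x61, 0x72, 0x63, 0x6F, 0x73]

as_listed = "".join(chr(v) for v in values)
codename = as_listed[::-1]

print(as_listed)  # setarcos
print(codename)   # socrates
```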
Roughly translated to Python, the script looks something like this:
CODENAME = "socrates"

class TalentBook():
    count = 0

    def HandleKeyPress(self, key):
        if key == CODENAME[self.count]:
            self.count += 1
            if self.count == len(CODENAME):
                # Stand-in for opcode 0x1001, which instantiates the graphical object
                create_graphical_object(0x1e61, 0x1e61)
        else:
            self.count = 0
        return True
Finding the rest of the opcodes
Now that I had a way to disassemble the scripts, I was curious to know what those other opcodes were. My first thought was to check all of the versions of 3DMM and Creative Writer 2 (which uses the same engine) that I could find to see if any of them had any extra debugging strings that might help. It is not unheard of for games to accidentally ship with debug information, so I had hoped that maybe there was a release with some extra debug information - or even a full list of opcodes.
Creative Writer 2 turned out to be pretty interesting as it includes parts of the script debugger. The binary includes the SCDB and SCCB classes which appear to be the debugger and part of the script compiler. There are a number of error message strings that suggest that it can check the syntax of an already assembled script for errors, and code to disassemble a script. There are also some Windows resources in the binary that hint at what the user interface looked like for the script debugger. Unfortunately it looks like a lot of the code is missing, and the code that is there is unreachable, so I guess this was probably some kind of link optimization failure.
I continued to look at other releases of 3DMM, including trial versions, the Nickelodeon 3D Movie Maker (a licensed version of 3DMM with Nickelodeon characters), and different language releases. Even though all of the UI strings are in the chunky files, the executable still contains a few localizable strings so it has to be rebuilt for each language. Generally the code doesn’t change between language releases, but there is one exception: the Japanese release was recompiled with enhanced Unicode support. It turns out that the Japanese version inexplicably contains names for all of the opcodes! This was a great find, not just because I could now make sense of the disassembled scripts but also because many of those opcodes directly mapped to virtual functions on game engine classes! This gave me valuable symbols for many of the virtual functions on the GOB and GOK classes as well as the sound manager and global app class.
With the bytecode format understood, and the full list of opcodes, it is now possible to completely disassemble all of the scripts in 3DMM. The scripts are located in the files STUDIO.CHK, SHARED.CHK, BUILDING.CHK and BLDGHD.CHK. There are about 950 scripts in total. Many of these scripts are pretty simple (eg. a script might just send a message to another GOB using the EnqueueCid opcode) but some of them are fairly complex. I used my disassembler to calculate some statistics on opcode usage - it turns out that 80 of the opcodes implemented in the scripting engine are never used in 3DMM’s scripts. This is good news if you’re reimplementing the scripting engine as there are now fewer opcodes to implement. The unused opcodes are still useful though as they provide some additional context when reversing the script engine.
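Statistics like these fall out of a simple tally. A minimal sketch, assuming the disassembler yields one list of opcode names per script; the scripts and known_opcodes sample data below are made up for illustration and are not real 3DMM figures.

```python
from collections import Counter

# Made-up sample data: every opcode the engine implements, and the opcode
# names seen in each disassembled script.
known_opcodes = {"Push", "Go", "GoZ", "Eq", "Inc", "Return", "Add", "Sub"}
scripts = [
    ["Push", "Push", "Eq", "GoZ", "Return"],
    ["Push", "Inc", "Push", "Go", "Return"],
]

# Count every opcode occurrence across all scripts.
usage = Counter(op for script in scripts for op in script)

# Opcodes the engine implements but no script ever uses.
unused = sorted(known_opcodes - set(usage))

print(usage.most_common(3))
print(unused)
```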
Announcing: Pymaginopolis
I have started writing a new Python 3 package for reverse engineering 3DMM files called Pymaginopolis. The Pymaginopolis package can read and write chunky files from 3DMM and CW2, and disassemble the app’s scripts from the GLSC and GLOP chunks. I have tested the loader and disassembler with multiple versions of 3DMM and CW2.
Pymaginopolis includes command-line tools for disassembling scripts and editing chunky files. Once you have installed Pymaginopolis into your Python environment you can run the disassembler with python -m pymaginopolis.tools.disassembler path-to-chunky-file.chk.
The Pymaginopolis package isn’t quite finished yet: for example, I haven’t yet implemented the compression algorithms used by some of the chunks, but the script chunks are not compressed so this isn’t required for disassembly. I am also working on an assembler that can be used to write new scripts and patch them into the app. I will push an update with the assembler when I finish my next blog post.
Next time: Assembling my own scripts.