Simulating the Apple MessagePad

NewtonOS was originally written in C++, with large parts hand-optimized in ARM assembler. Not surprisingly, Apple does not give us access to the original C++ source code. So, to keep NewtonOS alive on new hardware, the OS has to be transcoded for new platforms.

Ideally, we would examine the ARM code through a smart disassembler and rewrite it in C++ based on what we see. This has actually been done for large parts of the OS, but it has two drawbacks: the code base is large, with around 1 million assembler instructions, and bugs will inevitably creep into the rewritten code.

Instead, I decided to transcode the ARM code into a verbatim C translation, which is then recompiled and optimized by the C compiler of the target platform. It is still possible to create wrappers that provide the original C++ API, or to hand-optimize the code, but today's optimizing compilers do an amazing job.

This process is known as "software retargeting".




What is Software Retargeting?

Software retargeting is a way to make software for outdated or defective hardware run on newer hardware.

Retargeting is a mix of reverse engineering, emulation, simulation, decompiling, and recompiling. A retargeting compiler translates a binary or assembler code file into a language that the target machine understands. Not all of the code needs to be understood, but system calls and hardware access must be translated manually. In the case of NewtonOS, we also need to hand-translate the lowest OS layer, which cannot be converted directly into C code.

The result is a source code tree that is mostly created by the translator, with many details being hand coded. This tree can then be compiled on the destination platform, creating a program that is functionally identical to the original (but not in timing).


How is that different from emulation?

An emulator provides an environment that is as close to the original hardware as possible. For complex environments, a complex emulator is needed, which requires a much faster target machine (a few hundred to a few thousand times faster) to reach the same speed as the original.

Einstein, the NewtonOS emulator, has a Just-In-Time compiler that emulates all ARM code to the last bit, even if the result of operations is not needed later in the code. It also emulates every access to memory through the MMU, including access faults. It does a fantastic job, but much work could be optimized away when retranslating.


How is retargeting different from simulation?

A simulator provides the same interfaces to the user, but the internal code must be written from scratch. Probably the best-known simulator is Wine, which provides much of the MS Windows API on Unix-style platforms.

The advantage is that all new code is free of the original author's rights. However, to achieve that, the original API and code must be understood and rewritten. This introduces bugs and incomplete implementations, making the resulting software expensive to produce and of limited compatibility (though possibly much better and faster than the original).


What are the pros and cons of retargeting then?

When retargeting, we let our computers do most of the work. About 90% of the code can be translated into C automatically; the rest must be hand-coded or hand-optimized. The trick is that we don't need to understand everything: the translator lets us know when it encounters parts that require our intervention. The resulting code is many times faster than emulated code, but 2 to 10 times slower than hand-written routines.

There is one disadvantage: the part of the code that was created automatically by the translator is still at least partially under the license of the original author. To run the retargeted code, proof of a license is required, preferably using the same means as the original software.


How does this work for NewtonOS?

Retargeting an OS to run in user mode is no small task. The hardware configuration of the original machine must be very well understood, and every use of hardware registers, timers, the MMU, coprocessors, etc. must be translated.

Three wonderful things will help us with retargeting the NewtonOS:

  • there is basically no native code outside the ROM. This is fantastic news, because when the OS is retargeted, all user programs that were written in NewtonScript (probably 99.9%) will run on the new platform without further need for translation.
  • there is a perfectly working emulator, Einstein, that already contains all the code needed to emulate the hardware and MMU. Einstein will play a huge role in this translation, keeping NewtonOS in a hybrid state that executes correctly at all times during the translation.
  • and for those parts that are still in the fog, Apple was kind enough to leave much of the ROM's symbolic information in the C developer files.


So what's the hold-up?

This retargeting work will be done in steps.

Currently (July 2014) I am transcoding chunks of the original code, looking for the areas that will need human intervention. This is done by infusing transcoded chunks into the Einstein emulator, replacing the ROM code piece by piece with native code. A few things cannot be solved easily and need hand-coding:

  • MMU exceptions
  • Task switching (cooperative multitasking)
  • Timer interrupts, external interrupts
  • Virtual functions and other jump tables

Once the emulator executes only native code, this code can be moved out of the virtual memory environment into user-mode memory. This is the point where we leave the emulated environment altogether and transform everything into native code, finally gaining the speed and efficiency that we need.

At this point, the code also becomes readable and debuggable. This is when we can modernize and add network capabilities and other features.


A hybrid, we are

As of July 27th 2014, the retargeting has reached an interesting hybrid state. Although I have not yet solved DataAbort handling, I can translate much of the code into JIT-style native code. Should the simulated code trigger a DataAbort, Einstein throws a C++ exception that drops us right back into the emulator, which continues as if the simulated code had been one huge emulated instruction.

This trick allows me to translate many functions now. There are two limitations, however:

  • simulated code cannot call emulated code. Whenever we translate a function, all functions that it may call must also be translated. For high-level functions, this can generate quite a dependency tree.
  • simulated code must stay close to the original. If a DataAbort is triggered, all registers are expected to have the same content as if we were in the emulator, so no hand-optimizing at this point (it would be much too early anyway).