RISC OS: Introduction to the ARM AIF object file format


In this post we’ll look at some details of the ARM Image Format (AIF) that are useful to new RISC OS software developers coding in Assembly and/or compiled languages.

Disclaimer

This article is by no means an exhaustive treatment of the topic it describes. It should be read as a simple introduction to the AIF format, with the minimal amount of information required to gain a general understanding of the format and some ability to debug RISC OS applications in AIF format.

At the end of the article you can find a reference list with documents containing more details. I did my best to summarise the most important info and to test and verify all of it to the best of my abilities and available time.

Intro

The ARM Image Format (AIF) is a simple file format for executable images, used primarily for software intended to run on ARM microprocessors. It was introduced by Acorn Computers Ltd during the early days of ARM for use on the Archimedes, the RiscPC and the other RISC OS computer ranges. It is still used on RISC OS computers today, although RISC OS now also supports the ELF object file format via the UnixLib suite. AIF supports debug info, so it can optionally help when debugging programs on RISC OS.

Although it’s possible to create an AIF manually (the format is extremely simple), an AIF file is generally generated by a linker, which we tell to take an AOF (ARM Object Format) file or a binary image as input and produce the AIF file as output.

AIF structure

An AIF file consists of a few parts:

  • A 128-byte header area
  • A Binary Image area (our executable code)
  • An image’s initialised static data area

An AIF can also be compressed and self-decompressing (this improves loading performance on slow devices). In this case its parts are:

  • The 128-byte Header
  • The compressed Image
  • Decompression data (this data section appears to be position independent)
  • Decompression code (this code appears to be position independent)

In more detail, the layout of an uncompressed AIF file is:

  • Header (more details later on)
  • A read-only area
  • A read-write area
  • Debugging data (this is optional and populated when asking compilers, assemblers and linkers to add such info to the output file)
  • Self-relocation code (position independent)
  • Relocation list (a list of words to relocate terminated by a -1)
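Since the areas listed above follow each other in the file, their positions can be worked out from the sizes stored in the header (described later in this article). The following is only a minimal sketch under that assumption: the type and function names are my own, not from any official header, and it assumes an uncompressed image in which the ReadOnly size already includes the 128-byte header when the AIF is Executable.

```c
#include <assert.h>
#include <stdint.h>

#define AIF_HEADER_SIZE 0x80u   /* 32 words of header */

/* Hypothetical helper type: byte offsets, from the start of the file,
 * of each area of an uncompressed AIF. */
typedef struct {
    uint32_t read_only;    /* first byte after the header (our code)     */
    uint32_t read_write;   /* first byte of the read-write area          */
    uint32_t debug;        /* first byte of the optional debugging data  */
    uint32_t after_debug;  /* self-relocation code / relocation list     */
} AifAreaOffsets;

static AifAreaOffsets aif_area_offsets(uint32_t ro_size, uint32_t rw_size,
                                       uint32_t debug_size, int executable)
{
    AifAreaOffsets o;

    /* For an Executable AIF the header is counted inside ro_size,
     * so ro_size can never be smaller than the header itself. */
    assert(!executable || ro_size >= AIF_HEADER_SIZE);

    o.read_only   = AIF_HEADER_SIZE;
    o.read_write  = executable ? ro_size : AIF_HEADER_SIZE + ro_size;
    o.debug       = o.read_write + rw_size;
    o.after_debug = o.debug + debug_size;
    return o;
}
```

With the sizes of the HelloWorld example discussed later (ReadOnly 0xa0, ReadWrite 0, Debug 0x298), `aif_area_offsets(0xa0, 0, 0x298, 1)` places the debugging data at offset 0xa0 and whatever follows it at 0x338.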

Characteristics of the AIF files

The ARM SDT Reference Guide reports that there are 3 types of AIF files:

Executable AIF file

  • This is the most common type of AIF (it’s used for most !RunImage files of RISC OS applications). It can be loaded at its load address and entered or executed from there.
  • When executed, it can relocate itself if required.
  • It can create its own zero-initialised area (using the ZeroInit code subroutine, explained later in this article).
  • The image header contains code that ensures the image is set up correctly before it is executed at its entry point.
  • The 4th word of an executable AIF header is always: BL entry-point-address
  • The base address of this AIF is where the AIF header is loaded, while the code address is at base_address + 0x80
  • On RISC OS the base address for our Executable AIF header is always 0x8000 unless it relocates itself.
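To make the "4th word is always a BL" point a bit more concrete, here is a small sketch of how one could decode the branch target of such a word. It relies on the standard ARM encoding of BL (condition in bits 31-28, a signed 24-bit word offset in bits 23-0, target relative to the instruction address plus 8). The function names are hypothetical and this is only an illustration, not code taken from any RISC OS tool.

```c
#include <assert.h>
#include <stdint.h>

/* True if the word is an unconditional ARM BL instruction:
 * bits 31-28 = 1110 (AL) and bits 27-24 = 1011 (branch with link). */
static int is_bl_always(uint32_t insn)
{
    return (insn & 0xFF000000u) == 0xEB000000u;
}

/* Address that a BL located at insn_addr will branch to. */
static uint32_t bl_target(uint32_t insn, uint32_t insn_addr)
{
    uint32_t offset;

    assert(is_bl_always(insn));

    offset = insn & 0x00FFFFFFu;     /* 24-bit offset, counted in words */
    if (offset & 0x00800000u)        /* negative offset: sign-extend    */
        offset |= 0xFF000000u;

    /* PC reads as "address of this instruction + 8" on ARM,
     * and the offset is multiplied by 4 to get bytes. */
    return (uint32_t)(insn_addr + 8u + (offset << 2));
}
```

For example, a header word 0xEB00001B stored at 0x800C (base 0x8000 + 0x0C) would branch to 0x8080, which is exactly base_address + 0x80.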

For Beginners: If you are wondering how it is possible that all applications load at the same address (0x8000) without overwriting each other, this is because RISC OS has been designed from the beginning around the concept of using an MMU (Memory Management Unit) and a Virtual Memory Address space (although this concept is quite rudimentary in RISC OS compared to modern Operating Systems). In other words, that 0x8000 is actually a virtual address, which the MMU maps to different physical addresses in memory using page allocation tables created by RISC OS itself at startup (it seems to use a mechanism similar to bank switching, if that helps to clarify the concept). This mechanism, however, is available only when using the WIMP (the RISC OS Desktop); it is not available when using the old-fashioned CLI. So, when using the CLI, only one Absolute file can be executed at any given time. If I have time I’ll add an article about the details of how this mechanism works.

Non-Executable AIF file

  • This type of AIF needs to be prepared for execution by an image loader.
  • Once the image loader has prepared this type of AIF by following the header, the header is discarded.
  • The base address of this type of AIF is the address where it should be loaded

Extended AIF file

  • This type of AIF is a special type of Non-Executable AIF.
  • This type of AIF contains a scatter-loaded image.
  • It has a header that points to a chain of descriptors within the file.

The AIF header

The AIF header, which may also be displayed in some debuggers on RISC OS, is organised as a sequence of 32-bit words. Having some knowledge of it helps to understand what’s going on when we start a debugging session.

The AIF header is generally composed of:

| Offset | Word | Brief description |
|--------|------|-------------------|
| 0x00 | BL DecompressCode or NOP | Jump to the decompression code section, or No Operation if the AIF is not compressed. |
| 0x04 | BL SelfRelocCode or NOP | Jump to the self-relocation subroutine, or No Operation if the image is not self-relocating. |
| 0x08 | BL ZeroInit or NOP | Jump to the ZeroInit code subroutine, or No Operation if the image has none. |
| 0x0C | BL ImageEntryPoint or EntryPoint offset | Jump to the EntryPoint for an Executable AIF, or the EntryPoint offset for a Non-Executable AIF. BL is used so that the header is addressable via R14 (the ARM32 Link Register) in a position-independent way. |
| 0x10 | Program Exit | Instruction to exit the program as a last resort; in RISC OS this is an OS_Exit SWI. |
| 0x14 | Image ReadOnly size | Size of the ReadOnly section; it includes the size of the Header only if the AIF is Executable. |
| 0x18 | Image ReadWrite size | Exact size of the ReadWrite section, a multiple of 4 bytes. |
| 0x1C | Image Debug size | Exact size of the Debug section, a multiple of 4 bytes. Includes high and low level debug size. Bits 0-3 hold the type, bits 4-31 hold the low-level debug size. |
| 0x20 | Image ZeroInit area size | Exact size of the ZeroInit section, a multiple of 4 bytes. |
| 0x24 | Image Debug type | Valid values are: 0 = no debugging data present, 1 = low-level debugging data present, 2 = source-level debugging data present, 3 = both 1 and 2 present. |
| 0x28 | Image base | Address where the code was linked. |
| 0x2C | Work Space | This was obsoleted in the ’90s. |
| 0x30 | Address mode | The least significant byte of this word contains 0, 26 or 32. 26 and 32 indicate that the binary image was linked for 26-bit or 32-bit mode; 0 indicates a binary with an old-style 26-bit header. |
| 0x34 | Data base | Address where the image data was linked. |
| 0x38 | Two reserved words | These are used by Extended AIF. |
| 0x40 | DBGInit or NOP | Debug Initialisation Instruction, or No Operation if DBGInit is unused. |
| 0x44 | ZeroInit code | 15 words of code, which brings the Header to 32 words in total. |

AIF Header details Table
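If you want to inspect a header from your own code, the table above translates quite directly into a list of byte offsets. Below is a minimal sketch of that idea. Please note the assumptions: the enum and function names are mine (they are not from DDTLib or any official header), and the words are read as little-endian, which is what I would expect for an AIF loaded on RISC OS.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Byte offsets of the header words, following the table above. */
enum {
    AIF_OFF_DECOMPRESS   = 0x00,
    AIF_OFF_SELF_RELOC   = 0x04,
    AIF_OFF_ZERO_INIT    = 0x08,
    AIF_OFF_ENTRY        = 0x0C,
    AIF_OFF_EXIT         = 0x10,
    AIF_OFF_RO_SIZE      = 0x14,
    AIF_OFF_RW_SIZE      = 0x18,
    AIF_OFF_DEBUG_SIZE   = 0x1C,
    AIF_OFF_ZI_SIZE      = 0x20,
    AIF_OFF_DEBUG_TYPE   = 0x24,
    AIF_OFF_IMAGE_BASE   = 0x28,
    AIF_OFF_WORKSPACE    = 0x2C,
    AIF_OFF_ADDRESS_MODE = 0x30,
    AIF_OFF_DATA_BASE    = 0x34,
    AIF_OFF_DEBUG_INIT   = 0x40,
    AIF_OFF_ZI_CODE      = 0x44,
    AIF_HEADER_BYTES     = 0x80
};

/* Read one 32-bit header word from a raw byte buffer (little-endian). */
static uint32_t aif_word(const uint8_t *header, size_t offset)
{
    assert(header != NULL);
    assert(offset % 4 == 0 && offset < (size_t)AIF_HEADER_BYTES);
    return (uint32_t)header[offset]
         | ((uint32_t)header[offset + 1] << 8)
         | ((uint32_t)header[offset + 2] << 16)
         | ((uint32_t)header[offset + 3] << 24);
}

/* Address mode: 0, 26 or 32, stored in the least significant byte. */
static unsigned aif_address_mode(const uint8_t *header)
{
    return (unsigned)(aif_word(header, AIF_OFF_ADDRESS_MODE) & 0xFFu);
}

/* Human-readable form of the Image Debug type word (offset 0x24). */
static const char *aif_debug_type_name(uint32_t debug_type)
{
    switch (debug_type) {
    case 0:  return "no debugging data";
    case 1:  return "low-level debugging data";
    case 2:  return "source-level debugging data";
    case 3:  return "low-level and source-level debugging data";
    default: return "unknown";
    }
}
```

A caller would load the first 128 bytes of the file into a buffer and then ask, for instance, for `aif_word(buffer, AIF_OFF_RO_SIZE)` or `aif_address_mode(buffer)`.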

The Binary Image

The binary image is fundamentally our code and, given that most of the compilers available for RISC OS only support static linking, it also contains all the libraries you may have used during the static linking phase.

The Image EntryPoint may also depend on the runtime library used with our code. For example, if we used a compiler that links against the SharedCLibrary, then the AIF header EntryPoint will be the initialisation of the SharedCLibrary which, when done, will call our code’s main function.

The Image EntryPoint for simple Assembly binary code (for example, produced with ObjAsm) will be the EntryPoint of our code.

Practical Example

The picture below displays the AIF header and the Binary Image of a typical HelloWorld program (the source is ARM ASM), as shown in the Acorn / ROOL DDT debugger. What you see at the beginning of the code is the symbolic disassembly of the run-time system initialisation code.

[Figure: DDT-HelloWASM-AIFHeader]

  • At location 0x8000 (locations are on the left, in the window’s first column) we find the first NOP (mind that on ARM the NOP instruction is disassembled as MOV r0,r0). This tells us that the executable AIF above is not compressed.
    Please note: Some very old assemblers may assemble NOP as a BL instruction with the NV condition (NV = Never), that is BLNV. The NV condition forces the instruction to never execute, and this was the very old recommendation for assembling the NOP instruction. Basically BLNV will never jump, hence it’s equivalent to NOP. Given that on old ARMs MOV without condition flags is fully decoded by the PLA (while the condition bits are processed after the pseudo microcode is decoded), one could argue that internally MOV r0,r0 is probably a slightly better way to encode NOP.
  • The second NOP at 0x8000 + 0x04 tells us that this AIF is NOT self-relocating.
  • The BL at 0x8000 + 0x08 tells us this AIF has a ZeroInit section. It is located at 0x8040 and starts with a NOP; when ZeroInit completes, at 0x8070, the Link Register (lr) is loaded back into the Program Counter (pc), which in ARM Assembly is the equivalent of a “return” instruction, so execution returns to 0x800c.
  • At 0x800c execution jumps to our Binary Image EntryPoint, which has the label main (don’t confuse this label with the C main function entry point: in this case the main label corresponds to the ASM directive ENTRY).
  • At 0x8080 our HelloWorld code starts and does its job.
  • At location 0x8030 we can see that the Address mode is set to 0. This is because, when I assembled and linked the ASM source, I did not specify any external library, so no APCS (ARM Procedure Call Standard) was set for this Executable AIF.
  • At location 0x8010 we can see the SWI (Software Interrupt) OS_Exit, which returns to the OS once the execution of our code completes.
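The reading of the first four header words done above by eye can also be sketched in code. This is just an illustration with hypothetical names: it treats MOV r0,r0 (0xE1A00000) and any BL with the NV condition (top byte 0xFB) as NOP, and assumes that any other word in the first three slots is a BL to the corresponding routine.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ARM_MOV_R0_R0 0xE1A00000u   /* how NOP is normally encoded */

/* True if the word is one of the two NOP encodings discussed above. */
static int is_aif_nop(uint32_t word)
{
    if (word == ARM_MOV_R0_R0)
        return 1;
    /* Old style: BL with the NV (never) condition, bits 31-24 = 0xFB. */
    return (word & 0xFF000000u) == 0xFB000000u;
}

typedef struct {
    int compressed;       /* word 0x00 is not a NOP             */
    int self_relocating;  /* word 0x04 is not a NOP             */
    int has_zero_init;    /* word 0x08 is not a NOP             */
    int executable;       /* word 0x0C is an unconditional BL   */
} AifFlags;

/* Classify an AIF from the first four words of its header. */
static AifFlags aif_flags(const uint32_t words[4])
{
    AifFlags f;

    assert(words != NULL);

    f.compressed      = !is_aif_nop(words[0]);
    f.self_relocating = !is_aif_nop(words[1]);
    f.has_zero_init   = !is_aif_nop(words[2]);
    f.executable      = (words[3] & 0xFF000000u) == 0xEB000000u;
    return f;
}
```

For the HelloWorld example the first four words would be two NOPs followed by two BL instructions, so the sketch reports: not compressed, not self-relocating, with ZeroInit, Executable.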

The DBGInit Instruction

The Debug Initialisation Instruction at 0x8040 is optional, and generally this field is left as NOP. However, according to the official RISC OS documentation, this field, if used, is expected to be a SWI instruction which alerts a debugger that a debuggable image is starting execution. This, however, doesn’t seem to be required by debuggers on RISC OS.

The ZeroInit code

This code is generally standard and is added by the Linker when we link an object file generated by either a compiler or an assembler.

It is basically a simplified version of self-move code, and it makes sure the AIF can be tailored easily to new environments.

Below is an example, with comments, extracted from the official ARM docs (for your reference):

        NOP                       ; or <Debug Init Instruction> 
        SUB    ip, lr, pc         ; base+12+[PSR]-(ZeroInit+12+[PSR])
                                  ; = base-ZeroInit
        ADD    ip, pc, ip         ; base-ZeroInit+ZeroInit+16 = base+16
        LDMIB  ip, {r0,r1,r2,r4}  ; various sizes
        SUB    ip, ip, #16        ; image base
        ADD    ip, ip, r0         ; + RO size
        ADD    ip, ip, r1         ; + RW size = base of 0-init area
        MOV    r0, #0
        MOV    r1, #0
        MOV    r2, #0
        MOV    r3, #0
        CMPS   r4, #0
    00  MOVLE  pc, lr             ; nothing left to do
        STMIA  ip!, {r0,r1,r2,r3} ; always zero a multiple of 16 bytes
        SUBS   r4, r4, #16
        B      %B00

Ok, below is a detailed analysis of the ZeroInit code above, as it would apply to the example AIF file in the DDT screenshot described previously. It should help the reader understand the ZeroInit code and see that, even if a linker or compiler produces slightly different ZeroInit code, its function stays exactly the same.

The 1st line is a NOP instruction. Remember, this is the DBGInit instruction and in this case it’s unused, so NOP.

On the 2nd line the SUB instruction is the first step in calculating the address of the size fields in the header, relative to wherever the image has been loaded. The math is relatively simple:

  • The value read from PC (Program Counter, R15 in AArch32) is subtracted from the value in LR (Link Register, R14 in AArch32) and the result is placed in IP (Intra Procedure call scratch Register, R12 in AArch32). On ARM, reading PC yields the address of the current instruction plus 8, so for the SUB at 0x8044 the value is 0x804c (this is the ZeroInit+12 in the comment). LR in this case should contain 0x800c, which is the location of the BL instruction to the Binary Image entry-point address.
  • At the end of this, in the example code, IP should contain 0xFFFFFFc0 (note this is a negative number!)

The 3rd line adds the value stored in IP to the value read from PC (the ADD is at 0x8048, so PC now reads as 0x8050, while IP still has the value calculated above) and puts the result in IP (the result should be 0x8010, which also explains why on the 2nd line we wanted a negative value in IP). The value in IP is now the address of the last header word before the size fields, the one holding the exit instruction, in our case SWI OS_Exit.

For beginners: Given that an AIF is relocatable, its addresses have to be calculated at run time, because the base address may not be the one we’d expect. In this example they are the standard virtual addresses because our AIF did not try to relocate.
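The arithmetic of the 2nd and 3rd lines can be replayed in a few lines of C, which also shows why the result follows the image wherever it is loaded. This is only a sketch under a few assumptions of mine: the function names are made up, the BL to ZeroInit sits at base + 0x08, the SUB and ADD sit at base + 0x44 and base + 0x48 as in the example, and the PSR bits that 26-bit ARMs keep in R15 and R14 are ignored (they cancel out in the subtraction anyway).

```c
#include <assert.h>
#include <stdint.h>

/* Value left in IP by "SUB ip, lr, pc" for an image loaded at base. */
static uint32_t zeroinit_sub_result(uint32_t base)
{
    uint32_t lr, pc;

    assert(base % 4 == 0);

    lr = base + 0x0Cu;        /* return address left by the BL at base+0x08 */
    pc = base + 0x44u + 8u;   /* the SUB is at base+0x44; PC reads as +8    */
    return (uint32_t)(lr - pc);
}

/* Value left in IP by the following "ADD ip, pc, ip". */
static uint32_t zeroinit_add_result(uint32_t base)
{
    uint32_t pc = base + 0x48u + 8u;   /* the ADD is at base+0x48 */

    return (uint32_t)(pc + zeroinit_sub_result(base));
}
```

With base 0x8000 the first function gives 0xFFFFFFC0 and the second 0x8010. With any other (word aligned) base the first value stays the same and the second is always base + 0x10, which is the whole point of the trick.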

On the 4th line we load multiple registers (R0, R1, R2 and R4) with the values contained in the memory locations starting at the one pointed to by IP + 1 WORD, so basically from location 0x8014. To do that we use the LDMIB instruction (a simple ASM trick: LDMIB means LoaD Multiple Increment BEFORE), which increments the address taken from IP before each load. This makes sense because, as we have seen before, when the 4th line starts executing IP holds the address of the last instruction in the AIF header, so we need to move to the next address after that one to reach the size fields needed for our zero initialisation.

So, after the above line we have:

  • R0 should contain the value 0x00a0 (which is the size of the ReadOnly area; look at the AIF header details table above, row 0x14, for info)
  • R1 should contain 0x0000 (this is the Image ReadWrite area size; look at the AIF details table above, row 0x18, for more info)
  • R2 should contain 0x0298 (this is the Debug area size; look at the AIF details table above, row 0x1c, for info)
  • R4 should contain 0x0000 (this is the Image ZeroInit size; have a look at the AIF details table above, row 0x20, for more info). In our case do not consider this value, given that the ZeroInit code we are describing is different from the one used by the executable in the DDT screenshot.

On the 5th line we decrement IP by 16 (in some cases you may see this value represented as #&10, which is the hexadecimal representation of decimal 16). IP should now contain the base address of the AIF Image (in our case 0x8000, which in RISC OS is the standard virtual base address for all application executables).

On the 6th line we add the value contained in R0 to the value contained in IP and store the result in IP. Now IP should contain the AIF image base virtual address + the size of the ReadOnly area, which in our case should be 0x80a0.

On the 7th line we add the value contained in R1 to the value contained in IP and put the result in IP. Now IP should contain the virtual base address of the AIF image + the size of the ReadOnly area + the size of the ReadWrite area, which in our case is still 0x80a0 because our ReadWrite area size was 0. This value is the base virtual address of our ZeroInit area.
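Lines 4 to 7 can be summarised with a small C sketch that mimics the register usage of the Assembly code. The names are hypothetical and the header is passed in as an array of 32 words (so word index 5 is offset 0x14, and so on); it’s meant as a reading aid rather than as a replacement for the real code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t zi_base;  /* address of the first byte to set to zero   */
    uint32_t zi_size;  /* ZeroInit size as stored at offset 0x20     */
} ZeroInitPlan;

static ZeroInitPlan zeroinit_plan(const uint32_t header[32], uint32_t base)
{
    ZeroInitPlan plan;
    uint32_t ip, r0, r1, r2, r4;
    size_t first;

    assert(header != NULL);

    ip = base + 0x10u;                 /* result of lines 2 and 3 */

    /* LDMIB ip, {r0,r1,r2,r4}: increment before, so the first word
     * loaded is the one at ip + 4, i.e. header offset 0x14. */
    first = (size_t)((ip - base) / 4u) + 1u;
    r0 = header[first];                /* ReadOnly size  (0x14) */
    r1 = header[first + 1];            /* ReadWrite size (0x18) */
    r2 = header[first + 2];            /* Debug size     (0x1C) */
    r4 = header[first + 3];            /* ZeroInit size  (0x20) */
    (void)r2;                          /* loaded but not needed here */

    ip -= 16u;                         /* SUB ip, ip, #16 : image base   */
    ip += r0;                          /* ADD ip, ip, r0  : + RO size    */
    ip += r1;                          /* ADD ip, ip, r1  : + RW size    */

    plan.zi_base = ip;
    plan.zi_size = r4;
    return plan;
}
```

With the values of our example (ReadOnly 0xa0, ReadWrite 0) and base 0x8000, the plan points at 0x80a0.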

From line 8 to line 11 we simply set the values of Registers R0, R1, R2 and R3 to zero (if you are wondering why R3 as well, that’s because we want each zeroInit step to cover 16 bytes when we do the STMIA at line 14).

At line 12 we compare the value in R4 with zero: if it is zero, CMPS sets flag Z in the CPSR register to 1, otherwise Z is set to 0. Please note: the S suffix at the end of CMP is irrelevant in modern ARM ASM, given that CMP always updates the CPSR flags (Current Program Status Register), so the example code is old.

At line 13 we execute a MOVLE of the LR Register to PC if R4 is less than or equal to 0. LE stands for Less than or Equal (if you’re not familiar with ARM AArch32 ASM, each instruction can carry condition bits which determine at runtime whether the instruction is going to be executed or not). This basically creates a conditional return instruction that should be read like: if we are done with the zeroInit, let’s return to the caller.

At line 14 we set to zero the memory locations starting from the one pointed to by IP, by storing the content of Registers R0, R1, R2 and R3 (16 bytes at a time), and we increment the value of IP so that at the next round we’ll zero the next 4 locations after the ones we initialised now.

At line 15 we subtract 16 from the value contained in R4. Given that we used SUBS (note the S), the result of the operation also updates the Flags in CPSR (in this case the S is needed by the MOVLE at line 13, which is where line 16 jumps back to).

At line 16 we jump back to line 13 and repeat the initialisation process until the whole zeroInit area is set to 0 🙂
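If you prefer reading C, the loop at lines 12 to 16 behaves more or less like the sketch below. It’s my own equivalent (the function name is made up), and it makes one property of the original very visible: memory is always cleared in blocks of 16 bytes, so a ZeroInit size that is not a multiple of 16 gets rounded up and the caller must make sure that memory is really available.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Zero the area in blocks of 16 bytes, as the STMIA-based loop does.
 * Returns how many bytes were actually cleared. */
static size_t zeroinit_loop(uint8_t *area, int32_t zi_size)
{
    size_t  done = 0;
    int32_t r4   = zi_size;            /* CMPS r4, #0 sets the first flags */

    assert(area != NULL);

    while (r4 > 0) {                   /* MOVLE pc, lr : return if r4 <= 0 */
        memset(area + done, 0, 16);    /* STMIA ip!, {r0,r1,r2,r3}         */
        done += 16;                    /* the "!" moves IP forward         */
        r4   -= 16;                    /* SUBS r4, r4, #16                 */
    }
    return done;
}
```

For example, a ZeroInit size of 20 bytes ends up clearing 32 bytes, while a size of 0 returns immediately without touching memory.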

AIF Header for C developers

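The following struct represents an AIF32 header in C. There is a fully implemented one in the ROOL DDE, in C:DDTLib.h.AIFHeader, if you want to include it in your own code.

#include <stdint.h>

typedef struct {
    uint32_t BL_decompress_code;  /* 0x00 */
    uint32_t BL_selfreloc_code;   /* 0x04 */
    uint32_t BL_zeroinit_code;    /* 0x08 */
    uint32_t BL_imageentrypoint;  /* 0x0C */
    uint32_t swi_OSExit;          /* 0x10 */
    uint32_t size_ro;             /* 0x14 */
    uint32_t size_rw;             /* 0x18 */
    uint32_t size_debug;          /* 0x1C */
    uint32_t size_zeroinit;       /* 0x20 */
    uint32_t debug_type;          /* 0x24 */
    uint32_t image_base;          /* 0x28 */
    uint32_t workspace;           /* 0x2C */
    uint32_t reserved[ 4];        /* 0x30: address mode, data base, two reserved words */
    uint32_t zeroinitcode[16];    /* 0x40: DBGInit word followed by the ZeroInit code */
} AIF32HeaderBlock;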
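When using a struct like this to overlay a header read from a file, it’s worth checking that the compiler laid it out the way we expect. The following sketch repeats the struct above (so that it compiles on its own) and adds a compile-time check on its size plus a tiny helper based on offsetof. The helper name is hypothetical and C11 is assumed for static_assert.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t BL_decompress_code;
    uint32_t BL_selfreloc_code;
    uint32_t BL_zeroinit_code;
    uint32_t BL_imageentrypoint;
    uint32_t swi_OSExit;
    uint32_t size_ro;
    uint32_t size_rw;
    uint32_t size_debug;
    uint32_t size_zeroinit;
    uint32_t debug_type;
    uint32_t image_base;
    uint32_t workspace;
    uint32_t reserved[4];
    uint32_t zeroinitcode[16];
} AIF32HeaderBlock;

/* The header must be exactly 32 words (128 bytes) long. */
static_assert(sizeof(AIF32HeaderBlock) == 128,
              "AIF header must be 128 bytes");

/* Offset of the code area relative to the start of the header:
 * it begins right after the last header word, i.e. at 0x80. */
static size_t aif_code_offset(void)
{
    return offsetof(AIF32HeaderBlock, zeroinitcode)
         + sizeof(((AIF32HeaderBlock *)0)->zeroinitcode);
}
```

Since every member is a 32-bit word there should be no padding on any sensible compiler, but the static_assert makes that assumption explicit.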

Conclusions

From this brief introduction to the AIF format we can clearly see that it’s a really simple format. For instance:

  • There are no standardised fields in the AIF format related to security and validation of an executable.
  • There are no fields dedicated to storing information about the compiler or assembler used to produce the AIF itself.
  • It also lacks fields for the ABI target and/or the operating system target or release.
  • Another missing piece is the version of the AIF format used for a specific file.
  • It also lacks dynamic linking information (although that could be added in the image section).

AIF was mostly designed for RISC OS; however, it has also been used on consoles like the 3DO, as well as on set-top boxes and other devices. So it should have received some extensions over the years but, AFAICT, it did not, and therefore it should probably be considered obsolete. Luckily RISC OS 5 supports ELF32, which is a much more mature object file format and also allows Dynamic Linking.

Ok, that’s it for now. Thanks for reading, and I hope you’ve found some useful information here.

If you are interested in programming on RISC OS:

More references:

3 thoughts on “RISC OS: Introduction to the ARM AIF object file format”

  1. Pingback: RISC OS: Using the Acorn / ROOL Desktop Debugging Tool DDT (part 1) | Paolo Fabio Zaino's Blog

  2. Amazing article! If you have any notes on position independent code (PIC) and how that is done on older ARM processors e.g. ARM2/ARM3 and how RISC OS handles this it would be a huge service to the community make that available. I’m just trying to figure a few things out with regards to PIC on the Archimedes – do you know of any good resources on this topic by any chance?


    • Hi and thanks for reading 🙂

      I’ll add more articles on this matter. I usually wait to see if there is interest from the readers, and publish new articles based on people’s feedback (just like yours!).

      “If you have any notes on position independent code (PIC) and how that is done on older ARM processors e.g. ARM2/ARM3 and how RISC OS handles this it would be a huge service to the community make that available.”

      Yes, I do have quite a few old notes on PIC. Let me translate them in English and put them on an article.

      “do you know of any good resources on this topic by any chance?”

      Not sure which sources to suggest, I have been “off the grid” for what concern RISC OS stuff for quite few years in the past, so no idea if there are any good resources around. I have my old books from Acorn and old ARM ARM (up to ARM Architecture v4 should cover quite a bit of info used in RISC OS). So, these days, probably you should have a look on things like eBay?

      You can also post a question on the RISC OS Open forum: https://riscosopen.org/forum
      some people there have worked at Acorn and ARM, so few definitely have good knowledge of the matter 🙂

      There might be some info on Rick Murray blog: https://heyrick.eu/assembler/index.html

      Also, as a side note, I’m working on a RISC OS SharedCLibrary replacement to use it for linking transient executables (they are PIC) written in C language. At this time it’s not possible to use C to write transient executables in RISC OS and, because they are quite important when using RISC OS in CLI mode or as an Embedded OS (without the desktop), I think we really need such library. You can have a look at the RISC OS Community on GitHub for the early sources (please look into the branch develop, the code is not ready yet for release and so it’s only available in develop): https://github.com/RISC-OS-Community/uCLib/tree/develop

      C Stack and recursion are already working, next I need to add command line parameters passing and also improve the stack size extension code together to add extended stack clean up code at the exit. Depending on how much free time I have from work and family things will proceed, sorry if I am being slow, but all RISC OS stuff is done only in my spare time.

      Feel free to help if you feel like it 🙂

      All the best, and please keep posting feedback if you want more material on RISC OS and coding on it, thx!
      – Paolo
