Mach Write



Apr 15, 2021 The Writings of Dark Lord Jeff Mach. Winston rubbed his presumably-broken toe and moodily contemplated the bottle of something-like-bourbon which sprawled in. Mach was initially hosted as additional code written directly into the existing 4.2BSD kernel, allowing the team to work on the system long before it was complete. Work started with the already functional Accent IPC/port system, and moved on to the other key portions of the OS, tasks and threads and virtual memory.

I’ve recently been working a lot with parsing Mach-O files, so I’m beginning to understand in a fair bit of detail how they are structured and how they work. I’ve been developing a library, called libhelper, which can parse Mach-O files. Libhelper-macho also powers Img4helper and HTool.

This is not a complete writeup or documentation covering everything about Mach-O’s, and I appreciate this has probably been covered to death. It’s not aimed at those who already have an advanced knowledge of how Mach or Darwin works; rather, it’s aimed at those who are in the position I was in a few weeks ago, with limited knowledge of how Mach-O’s are structured. Still, I felt this would be a useful resource, and a good way to kick off my blog.

There are multiple types of Mach-O, such as Executable or KEXT Bundles, so I can’t cover them all. My aim for this post is to discuss the basics - namely Header, Load Commands and Segment Commands. I may discuss other areas in the future but this is a start.

What are Mach-O files

Mach-O files, or Mach Object files, are an executable format used on Operating Systems based on the Mach Kernel. This includes Apple’s Darwin-based systems: iOS, macOS, watchOS and so on. There are multiple types of Mach-O file, such as executables, object code, shared and dynamic libraries, kernel extension (KEXT) bundles and even debug companion files.

Mach-O Format

Mach-O files are simply binary files; there isn’t anything particularly special about them in that regard. You can read some bytes into a C structure and boom, you’ve parsed a Mach-O (or at least part of it). Natively, they can only be run on Mach/Darwin/XNU-based systems, although there are some implementations for loading and executing Mach-O files on Linux. Simple applications can run this way, but the majority will not work because they rely on certain macOS libraries, such as /usr/lib/libSystem.B.dylib.

A Mach-O is made up of one Mach header, a number of load commands (the count is specified in the header) and the data. The data is organised into Segments, which are made up of 0 to 255 Sections, and there are special load commands to describe them. Mach-O files are organised as follows:

  1. Mach-O Header
  2. Load Commands
  3. Data

The purpose of this article is to discuss, at a higher level, each of these areas of a Mach-O file, how data is organised and how to load this data from a given Mach-O file into relevant C structures.

Header

Starting with the Mach Header: its purpose is to describe what the file contains, and how the Kernel and Dynamic Linker should handle it. As with many file formats, the first 4 bytes are its “Magic Number”, which is used to identify the format. In the case of Mach-O’s there are three Magic Numbers that one may come across: 0xfeedface for 32-bit, 0xfeedfacf for 64-bit and 0xcafebabe for Mach Universal Binaries / Object files.
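As a minimal sketch of that check, the function below reads the first four bytes of a buffer and classifies them. The enum, the function name and the MY_ prefixed constants are made up for this example; the values are the three magic numbers above plus their byte-swapped forms, which is what you see when the file's byte order differs from the host's.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* The three magic numbers from the text, plus their byte-swapped forms.
 * The MY_ prefix marks these as local copies, not the real loader.h names. */
#define MY_MH_MAGIC    0xfeedfaceu
#define MY_MH_CIGAM    0xcefaedfeu
#define MY_MH_MAGIC_64 0xfeedfacfu
#define MY_MH_CIGAM_64 0xcffaedfeu
#define MY_FAT_MAGIC   0xcafebabeu
#define MY_FAT_CIGAM   0xbebafecau

/* Hypothetical classification used only in this sketch. */
enum macho_kind { MACHO_UNKNOWN, MACHO_32, MACHO_64, MACHO_FAT };

static enum macho_kind classify_magic(const unsigned char *buf, size_t len)
{
    uint32_t magic;

    assert(buf != NULL);
    if (len < sizeof(magic))
        return MACHO_UNKNOWN;

    /* memcpy avoids unaligned reads through a cast pointer. */
    memcpy(&magic, buf, sizeof(magic));

    switch (magic) {
    case MY_MH_MAGIC:
    case MY_MH_CIGAM:
        return MACHO_32;
    case MY_MH_MAGIC_64:
    case MY_MH_CIGAM_64:
        return MACHO_64;
    case MY_FAT_MAGIC:
    case MY_FAT_CIGAM:
        return MACHO_FAT;
    default:
        return MACHO_UNKNOWN;
    }
}
```

Accepting both byte orders only identifies the file; a real parser would also need to byte-swap every field it reads afterwards when the swapped form matched.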

Other properties of a Mach-O Header include the cpu type and sub type, which define the architecture the Mach-O is built for (e.g. arm64, x86_64, arm64_32), the number of Load Commands and the size of that area, and flags to be passed to the Dynamic Linker. The layout of the header is shown below:
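For reference, here is a sketch of both header layouts as plain C structs. They mirror the definitions in XNU's loader.h, but the my_ prefix and the use of int32_t in place of cpu_type_t and cpu_subtype_t are choices made here so the snippet compiles without Apple's headers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local mirror of the 32-bit header; the my_ name is hypothetical. */
struct my_mach_header {
    uint32_t magic;      /* 0xfeedface */
    int32_t  cputype;    /* cpu_type_t in loader.h */
    int32_t  cpusubtype; /* cpu_subtype_t in loader.h */
    uint32_t filetype;   /* e.g. MH_EXECUTE */
    uint32_t ncmds;      /* number of load commands */
    uint32_t sizeofcmds; /* size of all load commands, in bytes */
    uint32_t flags;      /* flags for the dynamic linker */
};

/* Local mirror of the 64-bit header: same fields plus one reserved word. */
struct my_mach_header_64 {
    uint32_t magic;      /* 0xfeedfacf */
    int32_t  cputype;
    int32_t  cpusubtype;
    uint32_t filetype;
    uint32_t ncmds;
    uint32_t sizeofcmds;
    uint32_t flags;
    uint32_t reserved;
};

/* Compile-time checks that the mirrors have the expected on-disk sizes. */
static_assert(sizeof(struct my_mach_header) == 28, "32-bit header is 28 bytes");
static_assert(sizeof(struct my_mach_header_64) == 32, "64-bit header is 32 bytes");
```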

The Mach-O header takes up 32 bytes for 64-bit files, and 28 bytes for 32-bit files. You can populate the header structure by using memcpy() to copy the correct number of bytes into a mach_header (or mach_header_64) structure, and you’ll then be able to access the header elements as normal.
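A minimal sketch of that memcpy() approach for the 64-bit case follows. The struct is a local mirror of mach_header_64, and read_header_64 is a name invented for this example. It assumes the file's byte order matches the host's, so it does not handle a byte-swapped magic.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MY_MH_MAGIC_64 0xfeedfacfu

/* Local mirror of mach_header_64 (32 bytes); the name is hypothetical. */
struct my_mach_header_64 {
    uint32_t magic;
    int32_t  cputype;
    int32_t  cpusubtype;
    uint32_t filetype;
    uint32_t ncmds;
    uint32_t sizeofcmds;
    uint32_t flags;
    uint32_t reserved;
};

/* Copies a 64-bit header out of buf.
 * Returns 0 on success, -1 if buf is too short or is not a 64-bit Mach-O. */
static int read_header_64(const unsigned char *buf, size_t len,
                          struct my_mach_header_64 *out)
{
    uint32_t magic;

    assert(buf != NULL && out != NULL);
    if (len < sizeof(*out))
        return -1;

    /* Check the magic before trusting the rest of the header. */
    memcpy(&magic, buf, sizeof(magic));
    if (magic != MY_MH_MAGIC_64)
        return -1;

    memcpy(out, buf, sizeof(*out));
    return 0;
}
```

After a successful call, the load commands start at offset sizeof(struct my_mach_header_64) and span out->sizeofcmds bytes.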

Load Commands

Load Commands are placed directly after the Mach-O header in the file. They specify the logical structure of the file and the layout of the file in virtual memory.

All Load Commands have a common 8 byte structure which identifies the type of the command and its size. This common structure is defined as follows:
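As a sketch, the common prefix looks like this in C. The struct mirrors load_command from loader.h under a my_ name, and peek_load_command is a hypothetical helper that reads the prefix at a given offset with a bounds check.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local mirror of struct load_command; the name is hypothetical. */
struct my_load_command {
    uint32_t cmd;     /* type of the load command, e.g. LC_SEGMENT_64 */
    uint32_t cmdsize; /* total size of the command, including this prefix */
};

static_assert(sizeof(struct my_load_command) == 8, "common prefix is 8 bytes");

/* Reads the 8-byte prefix found at offset off.
 * Returns 0 on success, -1 if the prefix would run past the buffer. */
static int peek_load_command(const unsigned char *buf, size_t len, size_t off,
                             struct my_load_command *out)
{
    assert(buf != NULL && out != NULL);
    if (off > len || len - off < sizeof(*out))
        return -1;
    memcpy(out, buf + off, sizeof(*out));
    return 0;
}
```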

There are over a dozen Load Commands; some are common across all Mach-O’s and some are only found in certain cases. Load Commands are placed after the Mach-O header, with the first being Segment Commands. These are discussed further under Segment Commands.

But Segment Commands are not the only commands that are included in the majority of Mach-O files. The LC_DYLD_INFO command specifies rebase, bind, weak, lazy and export information for the Dynamic Linker, and LC_LOAD_DYLINKER gives the path of the Dynamic Linker the Kernel should use to execute the binary. Mach-O’s frequently require Dynamic Libraries, especially /usr/lib/libSystem.B.dylib. The LC_LOAD_DYLIB command defines the path where the Linker can find the Dylib, and there can be as many of these commands as there are required Dynamic Libraries.

The offsets and sizes for both the symbol table and the string table are defined with LC_SYMTAB, and offsets for local, external, undefined and other types of dynamic symbols are defined with LC_DYSYMTAB.

The last command that I will discuss here is LC_MAIN, which defines the offset of the entry point, that is, where the Kernel should start executing the binary from. This is only used for MH_EXECUTE filetypes.

Below is output from an experimental version of htool showing all of the Load Commands from itself. I’ve omitted some parts because the output is rather long.

Going back to struct load_command: from the perspective of parsing Mach-O’s, having a constant format for the first 8 bytes of each Load Command makes detecting and parsing them easier. The following is an example of how we can parse a command, using LC_MAIN as an example. The code is based on XNU’s loader.h rather than libhelper.
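A minimal sketch of such a parser is shown here. The structs mirror load_command and entry_point_command from loader.h under my_ names, and the LC_MAIN value is the one loader.h defines (0x28 combined with LC_REQ_DYLD). The function name find_lc_main and its parameters are assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MY_LC_REQ_DYLD 0x80000000u
#define MY_LC_MAIN     (0x28u | MY_LC_REQ_DYLD)

/* Local mirrors of the loader.h structs; the my_ names are hypothetical. */
struct my_load_command {
    uint32_t cmd;
    uint32_t cmdsize;
};

struct my_entry_point_command {
    uint32_t cmd;       /* LC_MAIN */
    uint32_t cmdsize;   /* 24 */
    uint64_t entryoff;  /* offset of the entry point */
    uint64_t stacksize; /* initial stack size, if non-zero */
};

/* Walks ncmds load commands starting at cmds_off and copies the first
 * LC_MAIN command into out. Returns 0 if found, -1 otherwise. */
static int find_lc_main(const unsigned char *buf, size_t len, size_t cmds_off,
                        uint32_t ncmds, struct my_entry_point_command *out)
{
    size_t off = cmds_off;
    uint32_t i;

    assert(buf != NULL && out != NULL);
    for (i = 0; i < ncmds; i++) {
        struct my_load_command lc;

        /* Read the common 8-byte prefix first. */
        if (off > len || len - off < sizeof(lc))
            return -1;
        memcpy(&lc, buf + off, sizeof(lc));

        /* Reject sizes that cannot be right or that leave the buffer. */
        if (lc.cmdsize < sizeof(lc) || lc.cmdsize > len - off)
            return -1;

        if (lc.cmd == MY_LC_MAIN) {
            if (lc.cmdsize < sizeof(*out))
                return -1;
            memcpy(out, buf + off, sizeof(*out));
            return 0;
        }

        /* cmdsize tells us where the next command begins. */
        off += lc.cmdsize;
    }
    return -1;
}
```

For a 64-bit file, cmds_off would be the header size (32 bytes) and ncmds would come from the header.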

If you are interested in learning more about the different types of Load Commands, you can either check out EXTERNAL_HEADERS/mach-o/loader.h in the XNU sources, or include/libhelper-macho/macho-command-types.h from Libhelper.

Segment Commands

Going back to Segment Commands, the first couple of Load Commands in a Mach-O are either LC_SEGMENT for 32-bit, or LC_SEGMENT_64 for 64-bit. These define an object file’s Segments.

If you are unfamiliar with how object files work, you have a number of these segments. The __TEXT segment contains the instructions that will be executed by the CPU, and the __DATA segment contains both static local variables and global variables. These are both standard, however you may find additional segments such as __PAGEZERO and __LINKEDIT, and in XNU Kernelcaches, you’ll get even more funky segment names like __PRELINK_INFO and __LAST.

Segments are further divided into sections, so for example you’ll find __cstring in the __TEXT segment, formatted as __TEXT.__cstring, as a common one.

The Segment Commands in a Mach-O define which regions of the binary data should be mapped into memory, and how. So looking at the segment_command_64 struct, there’s the segment’s name as segname, but then we have two sets of addresses and sizes.

The vmaddr and vmsize define the virtual memory address and size for this segment, while fileoff and filesize give the segment’s location and size within the file. maxprot and initprot define the virtual memory protection for the segment in memory, so this may prevent it from being both writable and executable at the same time. Finally there are the flags, which are just a way of giving the Kernel options for loading the segment into memory.

As I said, segments are divided into sections. These sections are placed directly after the segment command, are included in its cmdsize and are counted by nsects. Again, sections essentially divide segments into more meaningful chunks, for example __TEXT.__text or __TEXT.__const.

To load these, we must take the offset of the segment command in the file, add the size of the segment structure, and then loop through nsects times, incrementing the offset by the size of the section struct each time.

To start, the section structure is defined as follows. Again, there are both section_64 and section structures; the difference is that the 64-bit section_64 struct uses uint64_t for both addr and size, and has a third reserved field at the end of the structure, although it is not assigned any particular purpose:
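Both layouts are sketched below as local mirrors of section and section_64 from loader.h, using my_ names. The helper copy_name16 is an addition made for this example: sectname and segname are fixed 16-byte fields, and a name that uses all 16 bytes has no terminating NUL, so it is copied into a 17-byte buffer before being treated as a C string.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Local mirror of the 32-bit section struct; the name is hypothetical. */
struct my_section {
    char     sectname[16];
    char     segname[16];
    uint32_t addr;      /* memory address of the section */
    uint32_t size;      /* size in bytes */
    uint32_t offset;    /* file offset of the section */
    uint32_t align;     /* alignment, as a power of 2 */
    uint32_t reloff;    /* file offset of relocation entries */
    uint32_t nreloc;    /* number of relocation entries */
    uint32_t flags;
    uint32_t reserved1;
    uint32_t reserved2;
};

/* Local mirror of section_64: wider addr and size, plus reserved3. */
struct my_section_64 {
    char     sectname[16];
    char     segname[16];
    uint64_t addr;
    uint64_t size;
    uint32_t offset;
    uint32_t align;
    uint32_t reloff;
    uint32_t nreloc;
    uint32_t flags;
    uint32_t reserved1;
    uint32_t reserved2;
    uint32_t reserved3;
};

static_assert(sizeof(struct my_section) == 68, "section is 68 bytes");
static_assert(sizeof(struct my_section_64) == 80, "section_64 is 80 bytes");

/* Copies a fixed 16-byte name field into out and always NUL-terminates it. */
static void copy_name16(const char name[16], char out[17])
{
    assert(name != NULL && out != NULL);
    memcpy(out, name, 16);
    out[16] = '\0';
}
```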

As I just stated, we can load the correct data into that structure by adding sizeof(segment_command_64) to the offset of the command in the file, then adding sizeof(section_64) for each of the segment->nsects sections. Here is an example of what I mean (note this time I am using libhelper code to demonstrate):

The mach_segment_info_t struct is not implemented in XNU’s standard loader.h, so if you’re writing your own Mach-O parser, please ignore references to Libhelper structs.

Looking at this function in more detail, two arguments are passed to mach_segment_info_load: an unsigned char *data pointer to the Mach-O loaded in memory, and a uint32_t offset which points to the start of the segment command within that data pointer. This offset is relative to the start of the Mach-O, not the start of the load commands.

Ignoring the code that checks and sets up the mach_segment_command_t, it starts by calculating the offset of the first section. This is done by adding the offset passed to the function to the sizeof() the segment command structure.

The segment command has nsects, the number of sections placed after the command. So, we loop over segment->nsects sections and create a mach_section_64_t for each one. We can use memcpy() to copy the ssize bytes we need. We can set the start point for the copying by adding the offset to the data pointer. By doing this, we are incrementing the pointer by the offset, resulting in it pointing to, in this case, the start of the current section struct.

Calling h_slist_append() can be ignored. This is simply adding the section to a singly-linked list in a libhelper macho_t structure.

The last bit of interest here is to make sure sectoff is incremented by the size of the mach_section_64_t struct, so that it points to the next section structure.
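The steps above can be sketched without libhelper as follows. This is not the libhelper function itself: the structs are local mirrors of segment_command_64 and section_64 from loader.h, the name load_sections_64 is made up, and the sections are copied into a caller-supplied array instead of a list. Only the data, offset and sectoff names follow the description above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local mirrors of the loader.h structs; the my_ names are hypothetical. */
struct my_segment_command_64 {
    uint32_t cmd, cmdsize;
    char     segname[16];
    uint64_t vmaddr, vmsize, fileoff, filesize;
    int32_t  maxprot, initprot;
    uint32_t nsects, flags;
};

struct my_section_64 {
    char     sectname[16];
    char     segname[16];
    uint64_t addr, size;
    uint32_t offset, align, reloff, nreloc, flags;
    uint32_t reserved1, reserved2, reserved3;
};

/* Copies the sections that follow the segment command at `offset` into
 * out[0..max). Returns the number copied, or -1 if the data is truncated. */
static int load_sections_64(const unsigned char *data, size_t len,
                            uint32_t offset, struct my_section_64 *out,
                            uint32_t max)
{
    struct my_segment_command_64 seg;
    size_t sectoff;
    uint32_t i;

    assert(data != NULL && out != NULL);
    if (offset > len || len - offset < sizeof(seg))
        return -1;
    memcpy(&seg, data + offset, sizeof(seg));

    /* The first section sits directly after the segment command. */
    sectoff = (size_t)offset + sizeof(seg);

    for (i = 0; i < seg.nsects && i < max; i++) {
        if (sectoff > len || len - sectoff < sizeof(out[i]))
            return -1;
        memcpy(&out[i], data + sectoff, sizeof(out[i]));

        /* Advance to the next section struct. */
        sectoff += sizeof(out[i]);
    }
    return (int)i;
}
```

The bounds checks are an addition for safety on untrusted input; the walkthrough above does not say whether the original performs them.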

If you are interested, please take a look at libhelper. It has a Mach-O parser that I wrote, and you’ll find the example above.

Data

The actual data in a Mach-O, that is, the instructions and variables, is stored after the Load Commands region. How this region is used varies with the type of Mach-O.

For example, an executable - meaning a Mach-O with the filetype of MH_EXECUTE - would have the segment commands laying out the data region, and an LC_MAIN command specifying the offset of the entry point instruction the Kernel should jump to when loading. The Kernel will also start the Dynamic Linker specified in the LC_LOAD_DYLINKER command, and link any dylibs specified with LC_LOAD_DYLIB.

This entire region is mapped out by the segment commands. We can inspect this mapping with Mash, or Mach-O Shell, which is part of HTool. Loading the file, we can inspect a particular segment like so.

To print a segment, we use p seg __TEXT. This is the short version; if you prefer, print segment __TEXT would also work fine. The first line of the output displays the start and end addresses of the __TEXT segment, and its total size in bytes.

Underneath, slightly indented, are each of the sections contained within the segment. For example, we can see that the __TEXT.__stubs section is 390 bytes, and is located from 0x10000f4f0 to 0x10000f676.

Two things to note about these addresses: first, they are virtual memory addresses, and second, they are not offsets from the start of the Mach-O file. The __TEXT segment starts at 0x100000000 because before it is a __PAGEZERO segment ranging from 0x000000000 to 0x100000000.

Summary

This is only an introduction to Mach-O files. I’d like to continue writing about them and maybe even write a Mach-O loader for Linux.

I hope I covered this fairly well; any feedback would be greatly appreciated. I aim to write these blog posts more often and hopefully they’ll improve over time - both in quality and technical accuracy. For now, you can download Img4helper, which you can use to extract Apple Image4 files, from the Downloads page linked above. Libhelper sources are available here if you’d like to look at my Mach-O parser, and htool will be available soon.

You can contact me either via Twitter (@h3adsh0tzz), Email (me@h3adsh0tzz.com), my iOS Security Discord server (https://discord.gg/CfNnCs8) or on irc.cracksby.kim :-).

Ever have writer’s block? Try using this simple process to break down your writing into the right steps. It’s called madman, architect, carpenter, and judge.

The idea is simple, but instructive. There are 4 different writing personas tugging at you when you write. The madman is coming up with great ideas all the time which might not be related to anything. The architect is providing structure to the writing; moving paragraphs around and looking at the story-line. The carpenter is crafting the sentences, phrases, and word choice. The judge is deleting unnecessary parts.

Know where you are. Each action is valid and valuable, but it’s important to know where in the writing process you are at the moment. If you have 3 days to put together a client proposal, you want to make sure you meet your deadlines. Get through the madman brainstorming. Agree on the architecture of the proposal including scope, structure, format, and tone. After all, you don’t want to get 3/4 the way through the writing and have people bring in new (last minute, unrelated) ideas.

By getting those 2 stages out of the way, you give yourself more time to do the carpentry of writing: putting together succinct, impactful, and clear thinking on paper. Naturally, you also want to leave enough time for judging by other partners, subject-matter experts and others in the approval chain (e.g., legal, finance).

In an HBR podcast (here), Bryan Garner, author of HBR’s Guide to Better Business Writing, describes these writing steps. If you don’t want to listen to the whole thing, start the audio with 5 minutes 30 seconds remaining.

You can apply this to your PowerPoint:

Madman

  • What is your point of view? Are you just going to bore the client with the same recycled marketing brochure-ware? How will you “wow” the client?
  • How can you incorporate your previous work, graphics, and insights?

Architect

  • Bucket your thoughts into logical groupings; don’t jump around from topic to topic
  • Ask yourself, could someone understand each slide without a voiceover?
  • Use impactful titles. It’s the most valuable part of the page
  • Each page should make 1 point (no more than 2 points)

Carpenter

  • Put your most important point first
  • Use parallel structure (e.g., all bullets start with verbs)
  • Graphics and words should support each other
  • Use graphics when possible, not text
  • Use parallel structure between slides to keep the flow of the powerpoint
  • Do not use clip art or photos. Seriously, don’t.

Judge

  • Repetitive words should be eliminated
  • If it’s not obvious what the point of the graph is, call it out with a text box
  • Leave whitespace on the page; don’t feel compelled to clutter the page
  • If you can combine pages, do it.
