@lijunle
Last active June 16, 2026 06:52
OneDrive on macOS: why 'Always Keep on This Device' doubles disk usage (pinned=2 copies, cached=1) — File Provider / OneDrive.noindex investigation
#!/bin/bash
#
# onedrive-cache-report.sh — report OneDrive (macOS File Provider) local file states
#
# USAGE
# onedrive-cache-report.sh [PATH...] report counts/sizes of ONLINE-ONLY / CACHED / CACHED+PINNED
# onedrive-cache-report.sh -h|--help this help
#
# PATH defaults to the current directory (recursive). Files or folders are accepted;
# every PATH must be inside ~/Library/CloudStorage/OneDrive-*.
#
# STATES (verified on the macOS File Provider OneDrive client)
# ONLINE-ONLY : st_flags has SF_DATALESS (0x40000000) -> 0 local copies
# (the same "not materialized" bit Finder reads to draw the cloud badge)
# CACHED : not dataless, noindex copy blocks == 0 -> 1 local copy
# CACHED+PINNED: not dataless, noindex copy blocks > 0 -> 2 local copies
# ("Always Keep on This Device")
#
# WHY READ-ONLY (no download / evict / pin operations)
# This tool only reports. The write operations all have a better or only home in Finder:
# download (online-only -> cached) : click the ☁ icon, or right-click -> Download.
# free space (cached/pinned -> online-only) : right-click -> Free Up Space.
# (Programmatic NSFileProviderManager eviction is entitlement-gated: a plain CLI
# gets "the application cannot be used right now".)
# pin (Always Keep on This Device) : right-click. There is no public API for it, and
# OneDrive ignores the com.apple.fileprovider.pinned xattr anyway.
# Reporting is the one thing Finder can't do: it never shows per-folder totals, or how
# much disk the CACHED+PINNED double-copies are wasting. That's this script's job.
#
# SAFETY: read-only. Metadata only (find/stat/flags) — never reads file contents, so it
# never triggers a download.
set -u
# ----------------------------------------------------------------------------- locate trees
CS_ROOT="$(/bin/ls -d "$HOME/Library/CloudStorage/"OneDrive-* 2>/dev/null | head -1 || true)"
GC_ROOT="$HOME/Library/Group Containers/UBF8T346G9.OneDriveSyncClientSuite/OneDrive.noindex/OneDrive"
if [ -z "$CS_ROOT" ]; then
echo "ERROR: no ~/Library/CloudStorage/OneDrive-* folder found." >&2
exit 1
fi
# ----------------------------------------------------------------------------- helpers
hsize() {
awk -v b="$1" 'BEGIN{
split("B KB MB GB TB", u, " "); i=1;
while (b>=1024 && i<5){ b/=1024; i++ }
printf "%.1f %s", b, u[i]
}'
}
# terminal width (falls back to 80 when not a tty)
term_cols() {
local c; c="$(tput cols 2>/dev/null || true)"
case "$c" in ''|*[!0-9]*) echo 80 ;; *) echo "$c" ;; esac
}
# fit_tail TEXT WIDTH -> TEXT truncated from the LEFT (filename stays visible),
# prefixed with a leading … when shortened, kept within WIDTH *display columns*.
# Display width is computed by decoding UTF-8 bytes (LC_ALL=C): a 3- or 4-byte
# char (CJK, fullwidth, emoji) counts as 2 columns, otherwise 1. This never
# under-counts a real wide char, so the line never wraps even with Chinese names.
fit_tail() {
LC_ALL=C awk '
BEGIN{
for (j=0; j<256; j++) ord[sprintf("%c", j)] = j
s = ARGV[1]; w = ARGV[2] + 0; if (w < 6) w = 6
n = length(s); i = 1; m = 0
while (i <= n) {
b = ord[substr(s, i, 1)]
if (b < 128) { len = 1; wd = 1 }
else if (b < 224) { len = 2; wd = 1 }
else if (b < 240) { len = 3; wd = 2 }
else { len = 4; wd = 2 }
m++; st[m] = i; wsum[m] = wd; i += len
}
tot = 0; for (k = 1; k <= m; k++) tot += wsum[k]
if (tot <= w) { printf "%s", s; exit }
budget = w - 1; acc = 0; startk = m + 1
for (k = m; k >= 1; k--) { if (acc + wsum[k] > budget) break; acc += wsum[k]; startk = k }
printf "…%s", substr(s, st[startk])
}' "$1" "$2"
}
# progress LABEL TEXT -> draws "<LABEL><fitted TEXT>" on one line, no wrap, on stderr
progress() {
local label="$1" text="$2" cols
cols="${PROGRESS_COLS:-$(term_cols)}"
printf '\r\033[2K%s%s' "$label" "$(fit_tail "$text" "$(( cols - ${#label} - 1 ))")" >&2
}
usage() { awk 'NR==1{next} /^#/{sub(/^# ?/,""); print; next} {exit}' "$0"; }
# scan_records ROOT... -> emits "STATE<TAB>size<TAB>path" per file; progress on stderr.
# STATE is O (online-only), C (cached), or P (cached+pinned). Online-only is detected via
# the SF_DATALESS st_flags bit (what Finder uses); cached vs pinned needs the noindex
# block cross-check. Metadata only — never reads file contents, so it never downloads.
scan_records() {
{
for root in "$@"; do
if [ -f "$root" ]; then
printf '%s\0' "$root"
elif [ -d "$root" ]; then
find "$root" -type f ! -name '.DS_Store' ! -name '.localized' -print0 2>/dev/null
fi
done
} | {
i=0
while IFS= read -r -d '' f; do
i=$((i+1))
if [ $((i % 25)) -eq 0 ]; then
progress "scanned $i files | " "${f#"$CS_ROOT"/}"
fi
meta="$(stat -f '%f %b %z' "$f" 2>/dev/null)" || continue # skip files stat cannot read
set -- $meta
fl="${1:-0}"; sz="${3:-0}"
if (( fl & 0x40000000 )); then # SF_DATALESS -> not on disk
printf 'O\t%s\t%s\n' "$sz" "$f"
else
rel="${f#"$CS_ROOT"/}"
gcb="$(stat -f '%b' "$GC_ROOT/$rel" 2>/dev/null || echo 0)"
if [ "${gcb:-0}" -gt 0 ]; then
printf 'P\t%s\t%s\n' "$sz" "$f"
else
printf 'C\t%s\t%s\n' "$sz" "$f"
fi
fi
done
printf '\r\033[2K' >&2
}
}
# ----------------------------------------------------------------------------- arg parsing
TARGETS=()
while [ $# -gt 0 ]; do
case "$1" in
-h|--help) usage; exit 0 ;;
-*) echo "ERROR: unknown flag: $1" >&2; exit 2 ;;
*) TARGETS+=("$1") ;;
esac
shift
done
[ ${#TARGETS[@]} -eq 0 ] && TARGETS=("$PWD")
# validate every target is inside the OneDrive CloudStorage tree
for t in "${TARGETS[@]}"; do
abs="$(cd "$(dirname "$t")" 2>/dev/null && printf '%s/%s' "$(pwd)" "$(basename "$t")")" || abs="$t"
case "$abs/" in
"$CS_ROOT"/*|"$CS_ROOT/") : ;;
*)
echo "ERROR: target is not inside the OneDrive CloudStorage tree:" >&2
echo " $t" >&2
echo " root: $CS_ROOT" >&2
exit 1 ;;
esac
done
# ----------------------------------------------------------------------------- scan once
PROGRESS_COLS="$(term_cols)" # compute terminal width once; reused by progress()
echo "Target(s):"
for t in "${TARGETS[@]}"; do echo " $t"; done
echo "(recursive, read-only scan — metadata only, never downloads file contents)"
echo
RECORDS="$(mktemp)"
trap 'rm -f "$RECORDS"' EXIT
scan_records "${TARGETS[@]}" > "$RECORDS"
# ----------------------------------------------------------------------------- report
cmd_report() {
read -r ON_N ON_SZ CA_N CA_SZ PI_N PI_SZ < <(
awk -F'\t' '
$1=="O"{on_n++; on_sz+=$2}
$1=="C"{ca_n++; ca_sz+=$2}
$1=="P"{pi_n++; pi_sz+=$2}
END{ printf "%d %.0f %d %.0f %d %.0f\n",
on_n+0,on_sz+0, ca_n+0,ca_sz+0, pi_n+0,pi_sz+0 }
' "$RECORDS"
)
: "${ON_N:=0}" "${ON_SZ:=0}" "${CA_N:=0}" "${CA_SZ:=0}" "${PI_N:=0}" "${PI_SZ:=0}"
printf "%-15s %10s %12s\n" "" "count" "size"
printf "%-15s %10s %12s\n" "ONLINE-ONLY" "$ON_N" "$(hsize "$ON_SZ")"
printf "%-15s %10s %12s\n" "CACHED" "$CA_N" "$(hsize "$CA_SZ")"
printf "%-15s %10s %12s\n" "CACHED+PINNED" "$PI_N" "$(hsize "$PI_SZ")"
echo
echo "ONLINE-ONLY -> not on disk; click the ☁ in Finder to download (1 copy)."
echo "CACHED -> 1 copy; Finder 'Free Up Space' to evict (back to online-only)."
echo "CACHED+PINNED -> 2 copies; un-pin in Finder, then 'Free Up Space'."
}
# ----------------------------------------------------------------------------- run
cmd_report

OneDrive on macOS: how "Always Keep on This Device" silently doubles disk usage

A practical, evidence-based investigation into where the modern (File Provider–based) OneDrive macOS client physically stores data, why pinned files end up taking two copies of disk space while merely cached files take only one, how macOS itself tracks these states, and why there is no supported way to get the best of both (single copy + never evicted) on OneDrive.

Tested on macOS 26 (Tahoe-era), OneDrive client v26.x, APFS. Findings verified with stat block counts, BSD file flags, extended attributes, the FileProvider SDK headers, and fileproviderctl evaluate. A companion read-only reporting tool, onedrive-cache-report.sh, is included in this gist.


TL;DR

  • The OneDrive client stores data in two places (a user-facing mount and a private sync store). A file's footprint depends on its state: online-only = 0 copies, cached = 1 copy, pinned = 2 copies.
  • OneDrive's "Always Keep on This Device" writes a real second physical copy. This is OneDrive's own implementation of "always keep" — not the same as iCloud's "keep downloaded", which keeps only a single copy (see below). That second copy is the root of the "OneDrive uses twice the space" complaints.
  • For an individual file, macOS/Finder reads its online-vs-downloaded state from a kernel BSD flag (SF_DATALESS), not from a network check — which is why the file's cloud badge flips instantly. (A folder's badge is different: folders never carry that flag and instead aggregate their children — see below.)
  • Apple does have a real File Provider API for "keep downloaded" (this is what iCloud uses), but it is read-only to everyone except the owning provider: there is no public API, and no working xattr trick, to force-pin another provider's file. OneDrive did not implement an equivalent single-copy "keep downloaded", so today you can't pin a OneDrive file without the second copy. The likely real fix is a third-party OneDrive client that keeps one copy.
  • OneDrive doesn't use Apple's standard pin mechanism at all — it rolls its own (custom Finder actions + the second copy). So unlike iCloud, pinning costs a second copy, and there is no supported fix from Microsoft.

The two storage locations

| Location | Role |
|---|---|
| ~/Library/CloudStorage/OneDrive-<Account>/ | The user-facing mount (what you browse in Finder) |
| ~/Library/Group Containers/UBF8T346G9.OneDriveSyncClientSuite/OneDrive.noindex/OneDrive/ | The sync client's private store |

(UBF8T346G9 is Microsoft's Apple Developer Team ID — it is the same string on every Mac, not personal data.)

How much disk each file actually consumes depends on its state:

| Finder state | Icon | CloudStorage blocks | OneDrive.noindex blocks | Physical copies | Auto-evictable? |
|---|---|---|---|---|---|
| Online-only | ☁️ cloud | 0 | 0 | 0 | n/a |
| Cached (downloaded on access) | (no badge) | full | 0 | 1 | ✅ yes |
| Pinned ("Always Keep on This Device") | ✅ green check | full | full | 2 | ❌ no |

Pinned = two physical copies. Cached = one.
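The table reduces to a two-question decision rule: is the file dataless, and if not, does OneDrive.noindex hold real blocks for it? Below is a minimal bash sketch of that rule, assuming you already have the st_flags value (decimal or 0x-prefixed hex) and the noindex block count. classify_state is a hypothetical helper name; it mirrors the logic of the companion script rather than calling any OneDrive or File Provider API.

```shell
#!/usr/bin/env bash
# classify_state FLAGS NOINDEX_BLOCKS -> "STATE COPIES"
# Hypothetical helper encoding the state table:
#   SF_DATALESS set             -> ONLINE-ONLY, 0 local copies
#   not dataless, noindex == 0  -> CACHED, 1 copy
#   not dataless, noindex  > 0  -> CACHED+PINNED, 2 copies
classify_state() {
  local flags=$(( $1 )) noindex_blocks=$(( $2 ))
  if (( flags & 0x40000000 )); then      # SF_DATALESS: placeholder only
    echo "ONLINE-ONLY 0"
  elif (( noindex_blocks > 0 )); then    # second copy lives in OneDrive.noindex
    echo "CACHED+PINNED 2"
  else
    echo "CACHED 1"
  fi
}

classify_state 0x40000060 0   # ONLINE-ONLY 0
classify_state 0x40 0         # CACHED 1
classify_state 0x40 48        # CACHED+PINNED 2
```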


Why du lies here

du walks each path independently and counts the same logical tree twice, so it always makes it look like there are two full copies — even for cached files that physically exist only once. Don't trust du across these two folders.

Use one of these instead:

  • Per-file physical blocks: stat -f "%z bytes %b blocks inode=%i" <file> (a dataless / online-only file reports blocks=0).
  • Whole-volume truth: df -m /System/Volumes/Data before vs. after an action.
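stat's %b is st_blocks, which macOS counts in 512-byte units, so the physical footprint is simply blocks × 512. A small worked example using the block count from the download experiment below; blocks_to_bytes is a hypothetical helper, not part of any tool.

```shell
#!/usr/bin/env bash
# blocks_to_bytes BLOCKS -> bytes actually allocated on disk.
# stat's %b reports st_blocks, counted in 512-byte units.
blocks_to_bytes() {
  echo $(( $1 * 512 ))
}

bytes=$(blocks_to_bytes 1961904)                        # the materialized video.mp4
echo "$bytes bytes (~$(( (bytes + 524288) / 1048576 )) MiB)"   # rounded to nearest MiB
# -> 1004494848 bytes (~958 MiB): just above the 1004491141-byte logical size (%z),
#    because allocation rounds up to whole blocks. One full copy, not two.
```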

A file that has not been downloaded shows up like this:

-rwx------@ ... compressed,dataless 1004491141  somevideo.mp4
                ^^^^^^^^^^^^^^^^^^^
                BSD flags: not on disk yet (blocks=0)

The experiments

1. Cached file → only ONE physical copy

Pick an online-only file (blocks=0, flagged dataless), record free space, download it by reading it, then re-measure.
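Those steps (measure, read, re-measure) can be wrapped in a small helper. This is a sketch under two assumptions: the "Used" figure is the third field of the second line of `df -m` output (true for macOS and GNU df on a volume whose device name has no spaces), and reading the file end to end is enough to materialize it, as the `cat` in the transcript does. used_mb and measure_download are hypothetical names.

```shell
#!/usr/bin/env bash
# used_mb VOLUME -> "Used" column of `df -m` (MiB), taken from field 3 of line 2.
used_mb() {
  df -m "$1" | awk 'NR == 2 { print $3 }'
}

# measure_download FILE VOLUME -> change in used MiB after reading FILE once.
# Reading the whole file forces a dataless File Provider file to download.
measure_download() {
  local before after
  before="$(used_mb "$2")"
  cat "$1" > /dev/null
  after="$(used_mb "$2")"
  echo $(( after - before ))
}

# Example (macOS): measure_download video.mp4 /System/Volumes/Data
```

Other disk activity also moves `df`, so treat a single delta as approximate.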

# before
$ stat -f "size=%z blocks=%b" video.mp4
size=1004491141 blocks=0          # ~958 MB, not on disk

$ df -m /System/Volumes/Data       # Used = 613187 MB

# force download
$ cat video.mp4 > /dev/null

# after
$ stat -f "size=%z blocks=%b" video.mp4
size=1004491141 blocks=1961904     # now materialized (~958 MB)

$ df -m /System/Volumes/Data        # Used = 614065 MB

Delta = 614065 - 613187 = 878 MB, roughly 1× the file size (~958 MB); a second copy would have added about 1916 MB (2×). Crucially, the same file inside OneDrive.noindex still reports blocks=0: it's just a placeholder. So a cached file is stored once.
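The comparison above is a ratio of volume growth to file size. A sketch of that arithmetic, assuming `df -m` reports mebibytes (1,048,576-byte units), as it does on macOS; estimate_copies is a hypothetical name.

```shell
#!/usr/bin/env bash
# estimate_copies DELTA_MB SIZE_BYTES -> df growth divided by the file's logical size.
# A result near 1 means one physical copy was written; near 2 means two.
estimate_copies() {
  awk -v d="$1" -v s="$2" 'BEGIN { printf "%.2f\n", d * 1048576 / s }'
}

estimate_copies $(( 614065 - 613187 )) 1004491141   # measured delta: 0.92
estimate_copies 1916 1004491141                     # the 2x case for comparison: 2.00
```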

2. Pinned file → TWO physical copies

Compare block counts for files that were pinned via "Always Keep on This Device", in both locations:

file                          CloudStorage_blocks   noindex_blocks
projects/.../README.md                 8                   8
skills/.../SKILL.md                   48                  48
skills/.../image_gen.py               72                  72
...

Both locations report the same non-zero block count for every pinned file. That is a real, independent second copy on disk (different inodes, real blocks).
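The same side-by-side comparison can be scripted for any list of relative paths under the two roots from the storage table. A minimal sketch: compare_trees and blocks_of are hypothetical helpers, a missing path counts as 0 blocks, and blocks_of tries GNU `stat -c` before macOS `stat -f` only so the sketch also runs off-Mac for testing.

```shell
#!/usr/bin/env bash
# blocks_of PATH -> allocated 512-byte blocks, or 0 if the path is missing.
# GNU stat uses -c; macOS/BSD stat uses -f (and rejects -c, so we fall through).
blocks_of() {
  stat -c '%b' "$1" 2>/dev/null || stat -f '%b' "$1" 2>/dev/null || echo 0
}

# compare_trees CS_ROOT GC_ROOT REL... -> one row per relative path,
# flagging rows where both trees hold real blocks (two physical copies).
compare_trees() {
  local cs="$1" gc="$2" rel a b
  shift 2
  printf '%-40s %12s %12s\n' file CloudStorage noindex
  for rel in "$@"; do
    a="$(blocks_of "$cs/$rel")"
    b="$(blocks_of "$gc/$rel")"
    printf '%-40s %12s %12s' "$rel" "$a" "$b"
    if [ "$a" -gt 0 ] && [ "$b" -gt 0 ]; then
      echo "  <- 2 copies"
    else
      echo
    fi
  done
}

# Example (macOS):
# CS="$HOME/Library/CloudStorage/OneDrive-<Account>"
# GC="$HOME/Library/Group Containers/UBF8T346G9.OneDriveSyncClientSuite/OneDrive.noindex/OneDrive"
# compare_trees "$CS" "$GC" "path/to/README.md" "path/to/SKILL.md"
```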


How macOS knows a file's state (and why the badge flips instantly)

Finder does not poll the network or scan file contents to draw a file's ☁️ badge. For an individual file the state lives in a kernel-level BSD file flag, maintained by the File Provider system and visible via stat:

| State | st_flags (stat -f '%f', values shown in hex) | Decoded |
|---|---|---|
| Online-only | 0x40000060 | SF_DATALESS + UF_COMPRESSED + UF_TRACKED |
| Cached / pinned | 0x40 | UF_TRACKED only |

SF_DATALESS (0x40000000, "file is dataless object" in <sys/stat.h>) is the bit that means "placeholder, not materialized". It is read-only (the kernel/File Provider owns it) and flips the instant a file is downloaded or evicted, which is why the Finder badge updates immediately.
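Decoding the value is just a bitmask test. A minimal bash sketch, using SF_DATALESS = 0x40000000 and UF_TRACKED = 0x40 as given above, with UF_COMPRESSED = 0x20 following from 0x60 - 0x40. decode_flags is a hypothetical helper that only knows these three bits; it accepts a decimal number or a 0x-prefixed hex value.

```shell
#!/usr/bin/env bash
# decode_flags VALUE -> space-separated names of the st_flags bits discussed here.
# VALUE may be decimal or 0x-prefixed hex; bash arithmetic handles both.
decode_flags() {
  local v=$(( $1 )) out=()
  (( v & 0x40000000 )) && out+=(SF_DATALESS)    # placeholder, not materialized
  (( v & 0x20 ))       && out+=(UF_COMPRESSED)
  (( v & 0x40 ))       && out+=(UF_TRACKED)
  echo "${out[*]}"
}

decode_flags 0x40000060   # SF_DATALESS UF_COMPRESSED UF_TRACKED  (online-only)
decode_flags 0x40         # UF_TRACKED                            (cached or pinned)
```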

Note: st_flags cannot tell cached from pinned — both are just 0x40. The only way to detect the second copy is to cross-check block counts in the OneDrive.noindex store (which is exactly what the companion script does).


How macOS knows a folder's state (the folder cloud badge)

Folders are different from files. A folder's own st_flags never carries SF_DATALESS — it is always 0x00000000, even when every file inside it is online-only. So Finder's folder-level ☁ badge is not a folder flag; it is an aggregate of the descendants' download state.

The authoritative system signal is exposed by fileproviderctl evaluate <folder>:

| Folder contents | isDownloaded | isRecursivelyDownloaded | Finder ☁ on folder |
|---|---|---|---|
| All children cached/pinned | 1 | 1 | none |
| All children online-only | 1 | 0 | ☁ |
| Mixed (even one online-only child) | 1 | 0 | ☁ |

(isDownloaded = 1 for all three just means the folder's listing is present; it says nothing about contents. The discriminator is isRecursivelyDownloaded.)
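To read these fields from a script, a helper like the one below can pull a single key out of `fileproviderctl evaluate` output. It assumes the output contains whitespace-separated `key = value` pairs, as in the `isKeepDownloaded = 0` line from the xattr experiment; that layout is not documented, so check it against your macOS version. fp_field is a hypothetical name.

```shell
#!/usr/bin/env bash
# fp_field KEY -> value from a "KEY = value" pair in `fileproviderctl evaluate` output on stdin.
# Assumes whitespace-separated "key = value" pairs; prints nothing if KEY is absent.
fp_field() {
  awk -v k="$1" '{
    for (i = 1; i + 2 <= NF; i++)
      if ($i == k && $(i + 1) == "=") {
        v = $(i + 2)
        gsub(/[^A-Za-z0-9]/, "", v)   # drop trailing punctuation such as ";" or ","
        print v
        exit
      }
  }'
}

# Example (macOS): does the folder show the cloud badge?
# r="$(fileproviderctl evaluate "$folder" | fp_field isRecursivelyDownloaded)"
# if [ "$r" = 1 ]; then echo "no badge: every descendant is downloaded"; else echo "badge shown"; fi
```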

Verified by a controlled test: starting from an all-online folder (isRecursivelyDownloaded = 0), reading one small file materialized exactly that one file (the folder became 66 online / 1 cached) and the folder stayed isRecursivelyDownloaded = 0. So:

A folder with no cloud badge ⟺ isRecursivelyDownloaded = 1 ⟺ every descendant is already downloaded. A single online-only child keeps the badge lit.
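The folder rule is an "any online-only descendant?" check. A minimal sketch that models the observed isRecursivelyDownloaded behavior (it does not query the File Provider), assuming one record per line with the state letter first, using the companion script's convention O = online-only, C = cached, P = cached+pinned. Only the first field is read, so that script's STATE/size/path records also work. folder_recursively_downloaded is a hypothetical name.

```shell
#!/usr/bin/env bash
# folder_recursively_downloaded -> 1 if no descendant is online-only (no cloud badge),
# 0 if at least one is. Reads one record per line; only field 1 (O/C/P) is used.
folder_recursively_downloaded() {
  awk '$1 == "O" { online = 1 } END { print (online ? 0 : 1) }'
}

# The controlled test above: 66 online-only files plus 1 cached file.
{ for _ in $(seq 66); do echo O; done; echo C; } | folder_recursively_downloaded   # 0
printf 'C\nP\nC\n' | folder_recursively_downloaded                                # 1
```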

(Aside: downloading is per file, not per folder — reading one file does not pull its siblings. If you ever see a whole folder materialize at once, that's something else enumerating/previewing it, e.g. Finder/Spotlight/QuickLook.)

Why this still doesn't let a reporting tool skip the walk: this signal only separates online-only from downloaded. It cannot distinguish CACHED from CACHED+PINNED — both are "downloaded". Telling those apart (the whole point of a disk-usage report) still requires descending to each file and checking for the second copy in OneDrive.noindex. So a per-file walk is unavoidable for reporting.


The "pin" mechanism: Apple's standard vs. OneDrive's custom

This is the key new finding, and it explains everything above.

Apple does have a real, documented pin API — but it's read-only to you

In the modern File Provider framework, "keep downloaded" is expressed through NSFileProviderItem:

// FileProvider.framework/Headers/NSFileProviderItem.h
@property (nonatomic, readonly) NSFileProviderContentPolicy contentPolicy;
//                    ^^^^^^^^                              ^^^^^^^^^^^^^
// NSFileProviderContentPolicyDownloadEagerlyAndKeepDownloaded
//   = "Download eagerly… Prevent eviction on low disk pressure"

Two things matter:

  1. The property is readonly and is declared by the provider extension, not by you. The comment in the header literally calls it a "declarative API" — the provider declares the policy; the system and any third party can only read it.
  2. The older equivalent, NSFileProviderItem.isPinned (macOS 11–13), was also read-only and is now deprecated in favor of contentPolicy.

Consequence: there is no public API for one app to set the pin / keep-downloaded state of another provider's file. Apple's developer forums confirm the same conclusion repeatedly: a third-party app can only observe the user's pin action (via Finder/Files), never set it for someone else's items. This is intentional.

The com.apple.fileprovider.pinned xattr is an internal detail, not an API

Providers that delegate pinning to the system (iCloud Drive) get their pin state persisted by macOS as an extended attribute:

$ xattr -p com.apple.fileprovider.pinned <icloud-pinned-dir>
1
  • It's set at the directory level and propagates to children (the #P flag in xattr -l output), with no second physical copy.
  • It is undocumented, system-managed, and not a supported write target. Apple's guidance is explicit: interact via the FileProvider API, do not read/write this xattr yourself.

OneDrive bypasses the standard mechanism entirely

OneDrive does not use the system pin mechanism at all:

  • A OneDrive pinned file has no com.apple.fileprovider.pinned xattr.
  • fileproviderctl evaluate <path> on a OneDrive pinned file still reports isKeepDownloaded = 0 — i.e. as far as the standard mechanism is concerned, it isn't pinned at all.
  • Instead, OneDrive ships custom Finder actions (com.microsoft.OneDrive.FileProviderActions.MarkPinned / MarkUnpinned) and implements "keep on device" by physically writing the second copy into its OneDrive.noindex store.

That's why OneDrive's pin badge is a separate icon next to iCloud's, and why its pin costs a second copy while iCloud's does not — they are different implementations, and the double-copy is OneDrive's own design choice, not a limitation forced by Apple's API.

Can you force-pin a cached OneDrive file without the second copy?

Short answer: no. We tried the obvious hack — manually writing the xattr on a cached OneDrive file:

$ xattr -w com.apple.fileprovider.pinned 1 <onedrive-cached-file>   # write "succeeds"
$ fileproviderctl evaluate <onedrive-cached-file>
… isKeepDownloaded = 0     # ← ignored. OneDrive doesn't read this.

The write is accepted by the filesystem but completely ignored by OneDrive, because OneDrive's extension — not the system — is the source of truth for its files. Combined with the fact that programmatic eviction is entitlement-gated (NSFileProviderManager domain enumeration fails with "the application cannot be used right now" from any unsigned CLI), there is no supported, working way to get "single copy + never evicted" on OneDrive. You are limited to:

  • cached → 1 copy, but macOS may auto-evict it under disk pressure, or
  • pinned → never evicted, but 2 copies.

The only true fix would be a third-party OneDrive client that keeps a single local copy — a much larger undertaking.


What you can actually do

There is no official Microsoft fix as of mid-2026; this is inherent to how the client implements pinning. Practical options:

  1. Don't pin. Leave files online-only and let them become cached on first use. Cached files are single-copy and auto-evicted under pressure. This is the lowest-footprint option.
  2. Pin selectively. Only pin small, truly must-be-offline folders. Avoid pinning large media trees.
  3. Reclaim space from already-pinned data. Right-click a pinned folder → Free Up Space. This drops the second copy in OneDrive.noindex and returns the items to online-only. (Programmatic eviction from a CLI is entitlement-blocked, so Finder is the supported path.)
  4. Alternative clients (rclone, or a custom File Provider client) can keep a single local copy, but each has trade-offs (weaker Finder integration; rclone bisync is still prone to "must run --resync" errors in 2025–2026, so it's risky for unattended two-way sync).

Gotcha: deleting the folders to reclaim space

After unlinking/uninstalling, deleting these large folders may not immediately free space, because APFS local Time Machine snapshots still reference the deleted blocks. Thin them out:

sudo tmutil thinlocalsnapshots /System/Volumes/Data 999999999999 4

(Quote the trailing 4 in zsh: ... 999999999999 "4".) Then re-check df -h /System/Volumes/Data.


Companion tool: onedrive-cache-report.sh

This gist includes a small, dependency-free, read-only bash tool:

onedrive-cache-report.sh [PATH...]   # count/size of ONLINE-ONLY / CACHED / CACHED+PINNED
                                      # PATH defaults to the current dir (recursive)
  • It is purely metadata (find + stat + the SF_DATALESS BSD flag + a block-count cross-check against OneDrive.noindex). It never reads file contents, so it never triggers a download. It's the only reliable way to see, per folder, how much is online-only vs. cached vs. double-stored — something Finder never shows you.
  • It deliberately has no download / evict / pin operations. Those all live in Finder (click the ☁ to download; right-click → Free Up Space; right-click → Always Keep on This Device), and as shown above, eviction/pinning have no working or public CLI path anyway.

Quick reference: tell a file's real state

# online-only?  -> flags include "dataless", blocks = 0
stat -f "%N  blocks=%b  flags=%f" file
ls -lO file        # look for "dataless" in the flags column

# is a pinned file double-stored? compare blocks in both trees:
CS="$HOME/Library/CloudStorage/OneDrive-<Account>"
GC="$HOME/Library/Group Containers/UBF8T346G9.OneDriveSyncClientSuite/OneDrive.noindex/OneDrive"
stat -f "%b" "$CS/rel/path"   # CloudStorage copy
stat -f "%b" "$GC/rel/path"   # second copy if both > 0

# what does the system think about download/keep state?
fileproviderctl evaluate "$CS/rel/path"   # look at isDownloaded / isKeepDownloaded

If both block counts are > 0, you're paying for two copies.
