I can’t pinpoint a single cause, but the failure point is clear: this error happens in the commit phase, not in the data-transfer phase.
hf upload-large-folder (same as HfApi.upload_large_folder) does three separate things:
- Hash local files.
- Pre-upload file blobs (LFS or Xet backend).
- Commit: send an HTTP “create commit” request that tells the Hub “these N paths now exist and point to these uploaded blobs”.
Your logs say hashing + pre-upload succeed, then “Failed to commit: The read operation timed out”. That means the client waited for the Hub’s response to the commit request and timed out before it got one.
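For context, both entry points boil down to the same call. A minimal sketch of the Python form (the repo id and folder path below are placeholders, not taken from your setup):

```python
# Minimal sketch of the Python entry point; `hf upload-large-folder` drives the same logic.
# The repo id and local folder path are placeholders for your dataset.
from huggingface_hub import HfApi

api = HfApi()
api.upload_large_folder(
    repo_id="your-username/your-cfd-dataset",  # placeholder
    repo_type="dataset",
    folder_path="/path/to/local/dataset",      # placeholder
)
```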
The main underlying cause: commit requests can exceed the server timeout
Hugging Face explicitly documents that when pushing through HTTP, the Hub enforces a 60s timeout on the commit request, and that each file operation is validated server-side. They also note that in rare cases the server may still finish the commit even if the client times out, and recommend keeping commits to roughly 50–100 files per commit to reduce timeout risk. (Hugging Face)
So the failure mode is:
- Your blobs are already uploaded.
- The final “create commit” HTTP call is taking too long (Hub load, repo state, validation cost).
- The client reports a timeout and retries with smaller batches.
Important nuance: a client-side timeout does not always mean “nothing happened”. HF warns the server may complete the commit anyway. (Hugging Face)
A second likely factor in your specific case: you are over the “>1TB dataset” threshold
HF also documents that datasets bigger than 1TB require Team/Enterprise or an explicit storage grant, and they require you to email [email protected] when crossing that scale. (Hugging Face)
Your repo UI currently shows about 1.07 TB already stored. (Hugging Face)
So you are exactly at the point where storage policy and internal throttles can start to matter, even if you were able to upload most of it.
What to do first (fast sanity checks)
1) Check whether the “failed” commit actually landed anyway
Because HF says timeouts can still complete server-side, do this:
- Open the repo “Files and versions”.
- Look for the last batch of files you expected.
- If they appear, your retry loop may be re-attempting work that already succeeded.
This is specifically called out in HF docs: timeout can be raised client-side even if server completes, and you can verify by browsing the repo. (Hugging Face)
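If you prefer to check programmatically, here is a small sketch, assuming you know which paths were in the failing batch (the repo id and file names are placeholders):

```python
# Quick check: did the "timed out" commit actually land server-side?
# The repo id and the expected paths are placeholders for your dataset.
from huggingface_hub import HfApi

api = HfApi()
remote_files = set(api.list_repo_files("your-username/your-cfd-dataset", repo_type="dataset"))

expected = ["train/shard-00041.tar", "train/shard-00042.tar"]  # placeholder: the batch that "failed"
for path in expected:
    status = "present" if path in remote_files else "missing"
    print(f"{path}: {status}")
```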
2) Confirm you are not hitting the >1TB hosting gate
If you have not already: email [email protected] with dataset name, size, format, and intended community use. HF explicitly requires this for >1TB datasets. (Hugging Face)
If you do not, you can end up in a situation where uploads “mostly work” but commits become unreliable or blocked.
Practical fixes that usually work
Fix A: Reduce commit “work” by reducing file-count pressure (best long-term)
If your dataset is many files, the Hub has multiple scaling limits that can degrade reliability:
- ≤10k files per folder recommendation (use subdirectories). (Hugging Face)
- Repo UX degrades after thousands of commits. (Hugging Face)
- Commit requests can timeout when validation takes too long; HF suggests 50–100 files per commit. (Hugging Face)
For very large datasets, HF explicitly recommends using Parquet or WebDataset to share large data efficiently and keep the ecosystem tools working. (Hugging Face)
Concrete approach for CFD data:
- Pack samples into shards:
  - WebDataset: data-00000.tar, data-00001.tar, …
  - Or Parquet row groups / multiple Parquet files per split.
- Target shard sizes of roughly 1–10 GB (or bigger if you prefer fewer files), but avoid single huge objects.
- Keep directory fanout sane (e.g., train/000/, train/001/, …).
Even if your current upload finishes, this restructuring typically prevents future “commit timeout” loops and makes downloads more robust.
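If you go the repacking route, here is a rough sketch using only the standard-library tarfile module; the directory names and the shard-size threshold are illustrative, not taken from your setup:

```python
# Sketch: pack many small sample files into ~2 GB tar shards (WebDataset-style naming).
# Paths and the size threshold are placeholders.
import tarfile
from pathlib import Path

SRC = Path("raw_samples")   # placeholder: directory containing many small files
DST = Path("shards")        # placeholder: output directory for shards
SHARD_BYTES = 2 * 1024**3   # ~2 GB per shard

DST.mkdir(exist_ok=True)
shard_idx, shard_size, tar = 0, 0, None

for sample in sorted(SRC.rglob("*")):
    if not sample.is_file():
        continue
    # Start a new shard when the current one is full (or on the first file).
    if tar is None or shard_size >= SHARD_BYTES:
        if tar is not None:
            tar.close()
        tar = tarfile.open(DST / f"data-{shard_idx:05d}.tar", "w")
        shard_idx, shard_size = shard_idx + 1, 0
    tar.add(sample, arcname=sample.relative_to(SRC).as_posix())
    shard_size += sample.stat().st_size

if tar is not None:
    tar.close()
```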
Fix B: Use Xet-backed uploading properly (performance and robustness)
HF’s upload guide recommends hf_xet and notes you can enable high performance mode with HF_XET_HIGH_PERFORMANCE=1. (Hugging Face)
They also recommend putting the Xet cache on local disk (NVMe/SSD) when uploading from network/distributed filesystems, via HF_XET_CACHE, because the default cache is under HF_HOME which might live on slower network storage. (Hugging Face)
This does not directly remove the 60s commit timeout, but it reduces overall upload friction and can reduce “commit phase” lag indirectly by lowering contention and retry churn.
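A short sketch of that configuration, assuming you launch the upload from Python (the cache path is a placeholder; if you use the `hf` CLI, export the same variables in your shell instead):

```python
# Sketch: put the Xet cache on fast local disk and enable high performance mode
# before starting (or resuming) the upload. The cache path is a placeholder.
import os

os.environ["HF_XET_CACHE"] = "/local/nvme/hf-xet-cache"  # placeholder: local SSD/NVMe path
os.environ["HF_XET_HIGH_PERFORMANCE"] = "1"

# ...then call HfApi().upload_large_folder(...) as before; hf_xet reads these
# from the process environment when the upload starts.
```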
Fix C: Make sure resumability is actually working (do not delete the upload cache)
upload_large_folder is resumable because it caches task results locally in a .cache/huggingface directory inside the folder being uploaded. (Hugging Face)
If you:
- run from a different path,
- wipe that cache,
- or upload from different machines without shared cache,
then “resume” becomes much weaker and you can waste time re-hashing/re-preuploading.
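A quick sanity check before re-running, assuming the cache layout described above (the folder path is a placeholder):

```python
# Check that the resume metadata still exists inside the folder being uploaded.
from pathlib import Path

folder = Path("/path/to/local/dataset")  # placeholder: folder passed to upload_large_folder
cache = folder / ".cache" / "huggingface"

if cache.is_dir():
    print(f"Resume cache found at {cache} - re-running should skip completed work.")
else:
    print("No resume cache found - the upload will re-hash / re-pre-upload from scratch.")
```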
Fix D: If you are stuck in a commit-timeout loop, stop relying on one big repo state transition
When the commit step is the bottleneck, the most reliable pattern is:
- Commit fewer file operations per commit (even 1–5 at a time if needed).
- Or commit “by shard” after repacking.
Even HF’s own pain-point threads emphasize that large uploads can hit rate limits or commit-step failures and that chunked workflows are necessary at TB scale. (GitHub)
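If you end up driving commits yourself after repacking, a sketch of small per-commit batches using HfApi.create_commit (repo id, shard directory, and batch size are placeholders):

```python
# Sketch: commit a handful of already-packed shards per commit instead of one
# big repo state transition. All names below are placeholders.
from pathlib import Path
from huggingface_hub import HfApi, CommitOperationAdd

api = HfApi()
repo_id = "your-username/your-cfd-dataset"        # placeholder
shards = sorted(Path("shards").glob("data-*.tar"))  # placeholder shard directory
BATCH = 5  # small batches keep each commit fast to validate server-side

for i in range(0, len(shards), BATCH):
    batch = shards[i : i + BATCH]
    ops = [
        CommitOperationAdd(path_in_repo=f"train/{p.name}", path_or_fileobj=str(p))
        for p in batch
    ]
    api.create_commit(
        repo_id=repo_id,
        repo_type="dataset",
        operations=ops,
        commit_message=f"Add shards {batch[0].name}..{batch[-1].name}",
    )
```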
Why your batch is already “small” but still failing (what that implies)
You said it fails committing 20 files at once, then retries with fewer.
20 is not large. If even very small batches still fail, the likely explanations narrow to:
- Server-side slowness or transient infrastructure issues around commit endpoints (you will see similar reports as 503/504 during upload of very large datasets). (GitHub)
- Crossing an account/repo storage policy boundary (you are already at ~1.07 TB). (Hugging Face)
- Too many total repo entries / heavy repo state (many files, many commits), making each additional commit slower to validate and apply. HF warns UX degrades after thousands of commits. (Hugging Face)
Given your repo is already over 1TB, the second point (the storage policy boundary) is the first thing I would address.
Similar cases online (high-signal references) fall into three buckets: timeouts and errors during the commit step, repo scaling limits and “too many files” problems, and the official “how to upload large folders” mechanics.
What I would do for your repo, specifically
- Email [email protected] now with your repo link and size. You are past 1TB, and HF explicitly requires contact for support/storage at that scale. (Hugging Face)
- Verify whether the “timed out” commit actually landed by checking “Files and versions”. HF says it can. (Hugging Face)
- If commits truly are not landing: repack into shards (WebDataset tar shards or Parquet files) and upload those instead of many raw files. HF explicitly recommends Parquet/WebDataset for large dataset sharing. (Hugging Face)
- Ensure the upload runs with stable resumability:
  - same working directory,
  - do not wipe .cache/huggingface,
  - if on a cluster, point HF_XET_CACHE at a local SSD path. (Hugging Face)
Summary
- The error is from the final commit HTTP request timing out, not from uploading bytes. HF documents a 60s commit timeout and recommends limiting commit workload. (Hugging Face)
- Your repo is already ~1.07 TB, and HF requires contacting them for datasets >1TB. (Hugging Face)
- The most robust fix is usually sharding into fewer larger files (WebDataset/Parquet) plus making sure you have the proper >1TB storage grant. (Hugging Face)