With huggingface_hub up to date
According to the documentation, installing a recent huggingface_hub is supposed to install hf_xet alongside it, and this usually works fine. However, Xet-related breakage still occurs occasionally, and judging from the screenshot, this appears to be such a case.
Setting the environment variable HF_HUB_DISABLE_XET=1 is the simplest way to work around or isolate the problem.
For download speed, the ideal solution would be to fix the Xet path itself with pip install -U hf_xet rather than leaving it disabled.
If the issue persists even with HF_HUB_DISABLE_XET=1, you may be hitting an unknown bug.
The most likely cause is not your token and not your hf_hub_download(...) call itself, but the download path underneath it. Hugging Face now uses Xet for Hub file transfers by default: current docs say all Hub repositories are Xet-enabled, hf_xet is the default transfer path, and, as of huggingface_hub 0.32.0, installing the latest huggingface_hub also installs hf_xet. That makes “it worked a few days ago, then started hanging without code changes” plausible, because the backend path may have changed even though your Python call did not. (Hugging Face)
What is probably happening
hf_hub_download() does not simply save a file directly into the folder you are watching. Hugging Face documents that it downloads into the HF cache and returns a path that points into that cache. It also documents a separate Xet cache under HF_XET_CACHE. So if you are only watching one Kaggle disk indicator or one folder, you may be missing where activity is actually happening. At the same time, there are current reports of real Xet download stalls, so this is not just a visualization problem either. (Hugging Face)
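Since two separate caches are in play, it helps to know exactly which directories to watch. The sketch below resolves both locations from environment variables, assuming the documented defaults (HF_HOME falling back to ~/.cache/huggingface, the hub cache under $HF_HOME/hub, and the Xet cache under $HF_HOME/xet); it deliberately avoids importing huggingface_hub so it works even in a broken session.

```python
import os

def hf_cache_locations(env=os.environ):
    """Resolve the two cache directories huggingface_hub may write to.

    Assumes the documented fallbacks: HF_HOME -> ~/.cache/huggingface,
    the regular hub cache under $HF_HOME/hub, and the Xet chunk cache
    under $HF_HOME/xet (overridable via HF_HUB_CACHE / HF_XET_CACHE).
    """
    hf_home = env.get(
        "HF_HOME",
        os.path.join(os.path.expanduser("~"), ".cache", "huggingface"),
    )
    hub_cache = env.get("HF_HUB_CACHE", os.path.join(hf_home, "hub"))
    xet_cache = env.get("HF_XET_CACHE", os.path.join(hf_home, "xet"))
    return {"hub": hub_cache, "xet": xet_cache}

# Watch BOTH of these on Kaggle; disk activity may land in either one.
print(hf_cache_locations({"HF_HOME": "/kaggle/working/hf_home"}))
# → {'hub': '/kaggle/working/hf_home/hub', 'xet': '/kaggle/working/hf_home/xet'}
```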
The closest match to your environment is a current GitHub issue in huggingface/xet-core where downloads on Kaggle get stuck and the reporter explicitly suspects Xet rather than the dataset code. There are also other Xet issues where large files stick at 0% or near 99%. That makes your case look like a real backend or environment interaction problem, not a user error. (GitHub)
Why the progress bar can look frozen
There is a very recent huggingface_hub issue showing that Xet downloads barely report progress, so a large transfer can look dead for long stretches even if bytes are moving. The report says the bar may only jump a few times on a multi-GB file, and points to fixes in both xet-core and huggingface_hub. So a “stuck” bar is not always proof of a stalled transfer. (GitHub)
But your extra detail matters: you said Kaggle storage did not increase. That makes a pure progress-bar-only explanation weaker. In your case, the most likely reading is: either the transfer is really hanging, or the writes are happening in a cache location different from the one you are watching. (Hugging Face)
Why Kaggle is a good suspect
There is a Kaggle-side product feedback report about Hugging Face downloads failing because of Kaggle proxy URL rewriting. That report also notes that if the failure persists, it is likely a Kaggle-side problem. This is not definitive proof for your exact failure, but it is strong context that Kaggle networking or proxying has already broken Hugging Face downloads before. (Kaggle)
There is also a fresh huggingface_hub issue from Colab where large HF downloads hang while plain wget downloads work, which is useful context because it suggests this class of bug can appear in managed notebook environments specifically, not just on your machine. (GitHub)
My ranked diagnosis for your case
1. Most likely: Xet-backed transfer hanging in Kaggle
This best fits the timing, the environment, and the public issue reports. Hugging Face’s newer transfer path uses Xet by default, and there is already a Kaggle-specific Xet issue with a “gets stuck” symptom. (Hugging Face)
2. Also likely: a broader Xet large-file stall, exposed more easily on Kaggle
There are public reports of Xet downloads sticking at 0% and 99% on larger files. That matches your symptom even if Kaggle is only part of the trigger. (GitHub)
3. Possible contributor: poor progress reporting
This can make the stall look worse than it is, but by itself it does not fully explain “no visible disk movement.” (GitHub)
4. Possible contributor: Kaggle proxy or networking layer
There is direct evidence that Kaggle proxy behavior has interfered with HF downloads before. (Kaggle)
5. Less likely: your authentication or function arguments
If auth were the main issue, the normal failure would usually be a clearer 401, 403, missing file, or repository error rather than an indefinite hang. The function itself is the documented standard way to download a single file. (Hugging Face)
Best fixes to try, in order
1. Disable Xet first
This is the highest-value test.
Set these before importing huggingface_hub. Hugging Face explicitly says environment variables are read at import time, not afterward. It also documents HF_HUB_DISABLE_XET, HF_HUB_DOWNLOAD_TIMEOUT, and HF_HUB_ETAG_TIMEOUT, with both timeouts defaulting to 10 seconds. (Hugging Face)
import os
# Must be set BEFORE importing huggingface_hub
os.environ["HF_HUB_DISABLE_XET"] = "1"
os.environ["HF_HUB_DOWNLOAD_TIMEOUT"] = "120"
os.environ["HF_HUB_ETAG_TIMEOUT"] = "30"
# Put cache somewhere explicit on Kaggle
os.environ["HF_HOME"] = "/kaggle/working/hf_home"
# Useful for debugging
os.environ["HF_DEBUG"] = "1"
os.environ["HF_HUB_VERBOSITY"] = "debug"
Then:
from huggingface_hub import hf_hub_download
path = hf_hub_download(
    repo_id=self.repo_id,
    filename=file_path,
    repo_type="dataset",
    cache_dir="/kaggle/working/hf_cache",
    force_download=True,
)
print(path)
Why this is the best first test: if the problem disappears with HF_HUB_DISABLE_XET=1, your root cause is very likely the Xet path, not the repo, not the token, and not your Kaggle notebook code. That diagnosis is grounded in the Xet-related stuck-download issues and the fact that HF now defaults to Xet transfers. (Hugging Face)
2. Make the download visible in a real folder, not just the cache
Hugging Face docs say hf_hub_download() normally returns a pointer into the cache, and they also document a local_dir mode for downloading to a specific folder while maintaining metadata under .cache/huggingface. On Kaggle, the documented persisted output area is /kaggle/working. (Hugging Face)
So for debugging, try writing somewhere explicit:
from huggingface_hub import hf_hub_download
path = hf_hub_download(
    repo_id=self.repo_id,
    filename=file_path,
    repo_type="dataset",
    local_dir="/kaggle/working/hf_files",
    cache_dir="/kaggle/working/hf_cache",
    force_download=True,
)
print(path)
This does two things:
- it makes actual file writes easier to observe
- it removes confusion around where the cache lives (Hugging Face)
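To get direct evidence that bytes are (or are not) landing on disk, a simple poller over the target directories is enough. This sketch uses only the standard library; run watch() in a second cell or process while the download runs.

```python
import os
import time

def dir_size(path):
    """Total bytes under `path` (0 if the directory does not exist yet)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            try:
                total += os.path.getsize(os.path.join(root, name))
            except OSError:
                pass  # temp chunk files may vanish mid-walk; skip them
    return total

def watch(path, interval=10, rounds=6):
    """Print the byte count for `path` every `interval` seconds.

    A growing number means the transfer is alive even if the progress
    bar looks frozen; a flat 0 across all rounds suggests a real stall.
    """
    for _ in range(rounds):
        print(f"{path}: {dir_size(path):,} bytes")
        time.sleep(interval)
```

Pointing this at both /kaggle/working/hf_cache and the Xet cache distinguishes "slow but moving" from "truly stuck".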
3. Use a dry run to separate “metadata works” from “payload transfer hangs”
Hugging Face documents dry_run=True for hf_hub_download() and snapshot_download(). It returns file info without performing the full transfer. (Hugging Face)
from huggingface_hub import hf_hub_download
info = hf_hub_download(
    repo_id=self.repo_id,
    filename=file_path,
    repo_type="dataset",
    cache_dir="/kaggle/working/hf_cache",
    dry_run=True,
)
print(info)
Interpret it like this:
- If dry run works but the real download hangs, auth and repo resolution are probably fine, and the problem is in the actual transfer path.
- If dry run fails too, the problem may be earlier in the flow. (Hugging Face)
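Because the failure mode here is an indefinite hang rather than an exception, it can also help to wrap the real download in a watchdog so a stall surfaces as a clear TimeoutError instead of a dead cell. A rough sketch (note the worker thread cannot actually be cancelled in Python, so a timeout means "diagnose and restart the session", not "retry in place"):

```python
import threading

def run_with_watchdog(fn, timeout_s=300, *args, **kwargs):
    """Run fn(*args, **kwargs) in a daemon thread; raise if it exceeds timeout_s.

    Turns an indefinite hang into an explicit failure signal. Real
    exceptions from fn are re-raised so they are not mistaken for hangs.
    """
    result = {}

    def target():
        try:
            result["value"] = fn(*args, **kwargs)
        except Exception as exc:
            result["error"] = exc

    t = threading.Thread(target=target, daemon=True)
    t.start()
    t.join(timeout_s)
    if t.is_alive():
        raise TimeoutError(f"still running after {timeout_s}s - likely a transfer hang")
    if "error" in result:
        raise result["error"]
    return result["value"]
```

Usage would look like run_with_watchdog(hf_hub_download, 300, repo_id=self.repo_id, filename=file_path, repo_type="dataset"), where 300 seconds is an arbitrary budget you should size to your file.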
4. If you must keep Xet enabled, reduce its aggressiveness
Hugging Face documents HF_XET_NUM_CONCURRENT_RANGE_GETS, which controls how many byte ranges per file are fetched concurrently, with a default of 16. On a managed notebook or proxy-heavy environment, reducing concurrency can help. (Hugging Face)
import os
os.environ["HF_XET_NUM_CONCURRENT_RANGE_GETS"] = "4"
os.environ["HF_HUB_DOWNLOAD_TIMEOUT"] = "120"
os.environ["HF_HUB_ETAG_TIMEOUT"] = "30"
os.environ["HF_HOME"] = "/kaggle/working/hf_home"
Then retry in a fresh session. I would still test disable Xet first, because it is the cleaner diagnostic step. (Hugging Face)
5. Do not rely on old hf_transfer advice
Current Hugging Face docs say HF_HUB_ENABLE_HF_TRANSFER is deprecated because the Hub is now fully powered by Xet. So older advice built around hf_transfer is no longer the main path. (Hugging Face)
One important caveat
If the file you are downloading is larger than about 50 GB, disabling Xet may not be a viable fallback right now. There is an open huggingface_hub issue showing that with HF_HUB_DISABLE_XET=1, HTTP-based download of a ~59 GB file fails before starting and reports that the file is too large for the regular method. (GitHub)
So:
- for files under that threshold, disabling Xet is the best first test
- for files over that threshold, you may need Xet to work, which makes Kaggle/Xet compatibility more important (GitHub)
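That decision rule can be written down explicitly. This is a hypothetical helper, and the ~50 GB cutoff comes from the issue report above, not from any officially documented limit:

```python
# Hypothetical helper: pick which env settings to try first from the file size.
# The ~50 GB threshold is taken from an issue report, not from official docs.
FIFTY_GB = 50 * 1024**3

def pick_transfer_path(file_size_bytes, threshold=FIFTY_GB):
    """Return the environment settings worth testing first for this file size."""
    if file_size_bytes > threshold:
        # Plain HTTP reportedly refuses files this large, so keep Xet
        # enabled and reduce its concurrency instead.
        return {"HF_HUB_DISABLE_XET": "0", "HF_XET_NUM_CONCURRENT_RANGE_GETS": "4"}
    # Below the threshold, disabling Xet is the cleanest first diagnostic.
    return {"HF_HUB_DISABLE_XET": "1"}
```

The file size itself can come from the dry run described in step 3.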
What I would do in your exact situation
I would do this in a fresh Kaggle session:
- Set HF_HUB_DISABLE_XET=1 before importing anything from huggingface_hub. (Hugging Face)
- Set HF_HUB_DOWNLOAD_TIMEOUT=120 and HF_HUB_ETAG_TIMEOUT=30. Defaults are 10 seconds, which is not generous for fragile notebook networking. (Hugging Face)
- Set HF_HOME, cache_dir, and optionally local_dir to /kaggle/working/... so you can see where the bytes are supposed to go. Kaggle documents /kaggle/working as the saved output area. (Kaggle)
- Run with dry_run=True. If that succeeds, your repo, filename, and auth are probably fine. (Hugging Face)
- Retry the real download.
- If it only works with Xet disabled, you have your answer: Kaggle + Xet path.
- If it still hangs even with Xet disabled, then the next suspects are Kaggle networking, proxy rewriting, or a broader transient Hugging Face routing issue. (Kaggle)
Bottom line
Your case is most consistent with a recent backend-path regression, not a mistake in your code. The strongest explanation is:
- huggingface_hub now routes downloads through Xet by default
- there are current reports of Kaggle-specific Xet failures
- there are separate reports of 0% / 99% stalls on large Xet downloads
- there is a real progress-reporting bug that can make healthy transfers look frozen
- Kaggle has also had proxy-related HF download issues before (Hugging Face)
So the cleanest first move is:
disable Xet, raise timeouts, and write into /kaggle/working in a fresh session.