CORS support for HTTP Range Requests on dataset files

I’m building a web-based data visualization tool that uses Parquet files with row groups for efficient partial data loading. The tool needs to make HTTP Range Requests from the browser to fetch only specific portions of large Parquet files.

Current behavior:

  • Simple GET requests work fine through the CDN redirect

  • Range requests fail with CORS preflight errors when redirected to cas-bridge.xethub.hf.co

Error:

Access to fetch at ‘https://cas-bridge.xethub.hf.co/…’ (redirected from ‘http../file.parquet’)

from origin ‘…’ has been blocked by CORS policy:

Response to preflight request doesn’t pass access control check: It does not have HTTP ok status.

Requested change:

Add CORS headers to the XetHub CDN that support Range requests:

Access-Control-Allow-Origin: *

Access-Control-Allow-Methods: GET, HEAD, OPTIONS

Access-Control-Allow-Headers: Range, Content-Type

Access-Control-Expose-Headers: Content-Range, Accept-Ranges, Content-Length

This would enable tools like parquet-wasm, DuckDB-WASM, and other browser-based data tools to efficiently read partial data from HF-hosted datasets.


Hmm…


What is going on in your setup

You are doing the “right” thing for Parquet in the browser: byte-range reads so you only fetch the Parquet footer + the specific row groups you need, instead of downloading multi-GB files.

Your failure mode is also typical: Range + redirect + CORS preflight breaks when the final CDN hop does not implement CORS correctly for OPTIONS and 206 Partial Content.

On the Hugging Face Hub, many downloads now redirect from huggingface.co/.../resolve/... to Xet’s bridge/CDN hostnames such as cas-bridge.xethub.hf.co/... (this redirect behavior is documented in Hugging Face’s Xet migration write-up). (Hugging Face)

So your browser ends up doing cross-origin requests to cas-bridge.xethub.hf.co, not to huggingface.co, and CORS must work on the redirected host too.


Why your Range request triggers a preflight at all

A lot of people assume “Range always preflights.” That is not strictly true.

  • The Range request header can be CORS-safelisted (no preflight) only in a narrow case: a single byte range like bytes=500-999. (MDN WebDocument)
  • If your tooling sends multiple ranges (comma-separated), or adds other non-safelisted headers, you get a preflight. (MDN WebDocument)
  • Many browser data stacks also do an initial HEAD to check size and range support. DuckDB-Wasm users routinely hit failures when that HEAD is blocked by CORS. (GitHub)

Even if you personally only set Range, libraries in the chain (parquet readers, fetch wrappers, WASM httpfs layers, etc.) commonly add one or more of:

  • HEAD probe
  • Range in a non-safelisted form
  • extra headers (caching validators, custom metadata, etc.)

So designing for “no preflight” is fragile. You want the infrastructure to support preflight cleanly.
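To make the "narrow case" concrete, here is a minimal sketch of a checker for Range values. It is a simplified approximation of the Fetch rule as I understand it, not the browser's actual implementation, and the function name is hypothetical. One detail worth knowing for Parquet: suffix ranges such as bytes=-500 (a natural way to fetch a footer) do not appear to be safelisted, so they would also preflight.

```python
import re

# Simplified approximation of the Fetch "simple range header value" rule:
# exactly one range with an explicit start, e.g. "bytes=500-999" or "bytes=500-".
_SIMPLE_RANGE = re.compile(r"bytes=([0-9]+)-([0-9]*)")


def range_is_cors_safelisted(value: str) -> bool:
    """Return True if this Range value alone should not force a preflight."""
    match = _SIMPLE_RANGE.fullmatch(value)
    if match is None:
        # Multi-range, suffix range ("bytes=-500"), or a non-byte unit.
        return False
    start, end = match.group(1), match.group(2)
    # An explicit end must not come before the start.
    return end == "" or int(start) <= int(end)
```

Even when this returns True, any other non-safelisted header on the same request still triggers a preflight.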


Why the error text points to “preflight doesn’t have HTTP ok status”

Browsers require the preflight response to have an “ok” HTTP status, meaning any 2xx (typically 200 or 204). If not, the preflight fails and the actual request never runs. (GitHub)

Redirects make this worse because redirect handling for preflights is heavily constrained and often treated as a failure path; “redirect on preflight” has long been a known sharp edge in the Fetch/CORS model. (GitHub)

So if OPTIONS on cas-bridge.xethub.hf.co returns a non-2xx (or redirects, or is blocked at the edge), you get exactly the error you posted.


What “good” looks like for Range over CORS

You need two things to be correct:

1) OPTIONS succeeds for the redirected host

For requests that preflight, cas-bridge.xethub.hf.co must respond to OPTIONS with:

  • Status: 204 (or 200)
  • CORS allow headers including what the browser asked for
  • Ideally Access-Control-Max-Age to reduce repeat preflights

The key is: do not 3xx redirect the preflight and do not return 4xx/5xx.

This is not optional if you want robust compatibility. (GitHub)
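As a rough illustration of what the browser checks, the sketch below models the preflight decision for a non-credentialed GET/HEAD request. It is deliberately simplified (it ignores Access-Control-Allow-Methods, since GET and HEAD are safelisted methods, and it ignores credentials), and the function name is made up for this example.

```python
from typing import Iterable, Mapping


def preflight_passes(
    status: int,
    headers: Mapping[str, str],
    origin: str,
    requested_headers: Iterable[str],
) -> bool:
    """Approximate the browser's preflight check for a non-credentialed request."""
    lowered = {name.lower(): value for name, value in headers.items()}

    # 1. The status must be "ok" (2xx). A 3xx, 4xx or 5xx fails immediately.
    if not 200 <= status <= 299:
        return False

    # 2. The origin must be allowed, either by wildcard or by exact match.
    if lowered.get("access-control-allow-origin") not in ("*", origin):
        return False

    # 3. Every header the browser asked about must be allowed.
    allowed = {
        item.strip().lower()
        for item in lowered.get("access-control-allow-headers", "").split(",")
    }
    if "*" in allowed:
        return True
    return all(name.lower() in allowed for name in requested_headers)
```

With the headers proposed in the original post, a preflight asking for range passes, while one asking for range plus if-none-match does not.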

2) GET supports 206 Partial Content with CORS headers

When the client does GET with Range, the server usually answers:

  • 206 Partial Content
  • Accept-Ranges: bytes (advertises capability) (MDN WebDocument)
  • Content-Range: bytes start-end/total (tells the client what it got) (MDN WebDocument)
  • Content-Length for the returned chunk

And critically: the 206 response must include Access-Control-Allow-Origin and must expose the headers your JS needs.
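For reference, this is roughly how a client could interpret the Content-Range value of a 206 response. It is a minimal sketch: the names are hypothetical, and the "bytes */total" form used with 416 responses is intentionally not handled.

```python
import re
from typing import NamedTuple, Optional


class ContentRange(NamedTuple):
    start: int            # first byte offset, inclusive
    end: int              # last byte offset, inclusive
    total: Optional[int]  # full object size, or None when the server sends "*"


_CONTENT_RANGE = re.compile(r"bytes ([0-9]+)-([0-9]+)/([0-9]+|\*)")


def parse_content_range(value: str) -> ContentRange:
    """Parse a value such as 'bytes 500-999/1234' from a 206 response."""
    match = _CONTENT_RANGE.fullmatch(value.strip())
    if match is None:
        raise ValueError(f"unsupported Content-Range: {value!r}")
    start, end = int(match.group(1)), int(match.group(2))
    total = None if match.group(3) == "*" else int(match.group(3))
    if end < start or (total is not None and end >= total):
        raise ValueError(f"inconsistent Content-Range: {value!r}")
    return ContentRange(start, end, total)
```

A Parquet reader typically uses the total to locate the footer and the start/end pair to confirm it received the bytes it asked for.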


Why Access-Control-Expose-Headers matters (people miss this)

Even if the server returns Accept-Ranges and Content-Range, browser JS cannot read them unless they are “safelisted” or explicitly exposed.

MDN’s rule is: only a small set of response headers are exposed by default, everything else needs Access-Control-Expose-Headers. (MDN WebDocument)

This is why projects like pdf.js historically logged errors like “Refused to get unsafe header ‘Accept-Ranges’” when trying to do progressive range loading. (GitHub)

For Parquet readers, exposing at least:

  • Accept-Ranges
  • Content-Range
  • Content-Length

is the difference between “can stream selectively” and “falls back to full download.”
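The sketch below approximates which response headers browser JS can read on a cross-origin response. The safelist is the set of CORS-safelisted response headers as I understand the Fetch spec; the function name is hypothetical, and the "*" wildcard for Access-Control-Expose-Headers is not handled. Note that Content-Length appears to be on the safelist already, so exposing it is harmless but probably redundant; Content-Range and Accept-Ranges are the ones that really need it.

```python
from typing import Mapping, Set

# Response headers that are readable cross-origin without being exposed.
CORS_SAFELISTED_RESPONSE_HEADERS = {
    "cache-control",
    "content-language",
    "content-length",
    "content-type",
    "expires",
    "last-modified",
    "pragma",
}


def js_readable_headers(response_headers: Mapping[str, str]) -> Set[str]:
    """Return the lowercased names of headers that script could read."""
    lowered = {name.lower(): value for name, value in response_headers.items()}
    exposed = {
        item.strip().lower()
        for item in lowered.get("access-control-expose-headers", "").split(",")
        if item.strip()
    }
    readable = CORS_SAFELISTED_RESPONSE_HEADERS | exposed
    return {name for name in lowered if name in readable}
```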


Your requested headers are basically right, with two practical refinements

You proposed:

  • Access-Control-Allow-Origin: *
  • Access-Control-Allow-Methods: GET, HEAD, OPTIONS
  • Access-Control-Allow-Headers: Range, Content-Type
  • Access-Control-Expose-Headers: Content-Range, Accept-Ranges, Content-Length

That is a solid baseline for public, anonymous dataset files.

Two refinements that will reduce breakage:

Refinement A: don’t under-allow request headers

If any client sends If-None-Match, If-Modified-Since, or other common headers, preflight will ask for them and fail if not allowed. Many CDNs solve this by echoing back Access-Control-Request-Headers.

Practical pattern:

  • In OPTIONS, read Access-Control-Request-Headers and return it in Access-Control-Allow-Headers.

This avoids whack-a-mole. It is especially useful when you do not control all downstream libraries.
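A minimal sketch of that echo pattern, assuming a public, non-credentialed endpoint. The function name, the fallback header list, and the Vary header are my assumptions rather than anything HF has documented.

```python
from typing import Dict, Mapping, Tuple


def build_preflight_response(
    request_headers: Mapping[str, str],
) -> Tuple[int, Dict[str, str]]:
    """Build a 204 preflight response that echoes the requested headers."""
    lowered = {name.lower(): value for name, value in request_headers.items()}
    requested = lowered.get("access-control-request-headers", "").strip()
    headers = {
        "Access-Control-Allow-Origin": "*",
        "Access-Control-Allow-Methods": "GET, HEAD, OPTIONS",
        # Echo whatever the browser asked for; fall back to a fixed list.
        "Access-Control-Allow-Headers": requested or "Range, Content-Type",
        "Access-Control-Max-Age": "86400",
        # Assumption: keep caches from reusing a response built for other headers.
        "Vary": "Access-Control-Request-Headers",
    }
    return 204, headers
```

Echoing is only reasonable here because the content is public; for credentialed endpoints an explicit allowlist is the safer choice.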

Refinement B: be explicit about credentials vs wildcard

Access-Control-Allow-Origin: * is only valid for requests without credentials. If credentials are included, wildcard causes a browser error. (MDN WebDocument)

So you likely want two tiers:

  1. Public blobs (no auth cookies, no auth headers from browser):
     • Access-Control-Allow-Origin: *
     • no Access-Control-Allow-Credentials
  2. Gated/private blobs (if ever needed from browser):
     • echo the request Origin instead of *
     • add Vary: Origin for caches (MDN WebDocument)
     • only then consider Access-Control-Allow-Credentials: true

If Hugging Face only wants to support browser range reads for public assets, then the “public tier” alone is enough.
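The two tiers could be expressed along these lines. This is a sketch only: the function name is hypothetical, and the allowed_origins allowlist is my addition (echoing an arbitrary Origin together with credentials is generally considered unsafe).

```python
from typing import AbstractSet, Dict, Optional


def cors_origin_headers(
    origin: Optional[str],
    credentialed: bool,
    allowed_origins: AbstractSet[str] = frozenset(),
) -> Dict[str, str]:
    """Pick origin-related CORS headers for the public or the gated tier."""
    if not credentialed:
        # Public tier: wildcard, and no Access-Control-Allow-Credentials.
        return {"Access-Control-Allow-Origin": "*"}
    if origin is None or origin not in allowed_origins:
        # Gated tier, unknown origin: send no CORS headers, so the browser blocks it.
        return {}
    # Gated tier, trusted origin: echo it and tell caches the response varies.
    return {
        "Access-Control-Allow-Origin": origin,
        "Vary": "Origin",
        "Access-Control-Allow-Credentials": "true",
    }
```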


Why this specifically matters for Parquet-WASM and DuckDB-WASM

DuckDB-Wasm explicitly calls out that browser deployments must obey CORS and remote HTTPFS reads depend on the remote server allowing it. (DuckDB)

Also, real bug reports show DuckDB-Wasm tries HEAD, and if that is blocked by CORS, the engine never reaches the “range GET” stage. (GitHub)

So your request is not hypothetical. It maps to an established failure pattern in browser analytics stacks.


Similar cases online (same class of problem)

These are “same shape” incidents: byte-range/progressive loading + missing exposed headers or broken CORS:

  • pdf.js: “Refused to get unsafe header ‘Accept-Ranges’” when trying range-based PDF loading. Root cause is missing Access-Control-Expose-Headers. (GitHub)
  • DuckDB-Wasm: CORS failures on HEAD stop the pipeline before range reads happen. (GitHub)
  • OGC Cloud Optimized GeoTIFF (COG) ecosystem: COG relies on HTTP Range requests; OGC docs explicitly call out CORS considerations around advertising range support. Different domain, same mechanism. (OGC Public Document Repository)
  • Hugging Face Xet bridge operational threads: multiple HF community threads reference cas-bridge.xethub.hf.co as an infrastructure hop that can break downloads or requires allowlisting. (Hugging Face Forums)

Workarounds you can use today (if HF infra does not change quickly)

Workaround 1: run a tiny proxy that terminates CORS correctly

The proxy fetches from HF server-to-server, then serves the bytes to the browser with correct:

  • OPTIONS handling
  • 206 + Expose-Headers

Downside: you lose “direct” HF edge delivery unless you deploy the proxy at an edge (Cloudflare Workers, Fastly Compute, etc.).
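For local development, such a proxy can be very small. The sketch below uses only the Python standard library; UPSTREAM is a placeholder URL, the port and names are arbitrary, and error handling (upstream 4xx/5xx, timeouts) is omitted. Treat it as a starting point, not a production setup.

```python
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from typing import Optional

# Placeholder: replace with the real file URL you want to proxy.
UPSTREAM = "https://example.invalid/file.parquet"

CORS_HEADERS = {
    "Access-Control-Allow-Origin": "*",
    "Access-Control-Expose-Headers": "Content-Range, Accept-Ranges, Content-Length",
}
PASS_THROUGH = ("Content-Type", "Content-Length", "Content-Range", "Accept-Ranges", "ETag")


def upstream_request(method: str, range_value: Optional[str]) -> urllib.request.Request:
    """Build the server-to-server request, forwarding Range if present."""
    request = urllib.request.Request(UPSTREAM, method=method)
    if range_value:
        request.add_header("Range", range_value)
    return request


class RangeProxy(BaseHTTPRequestHandler):
    def _send_cors(self) -> None:
        for name, value in CORS_HEADERS.items():
            self.send_header(name, value)

    def do_OPTIONS(self) -> None:
        # Answer the preflight directly: 204, no redirect, echo requested headers.
        self.send_response(204)
        self._send_cors()
        self.send_header("Access-Control-Allow-Methods", "GET, HEAD, OPTIONS")
        self.send_header(
            "Access-Control-Allow-Headers",
            self.headers.get("Access-Control-Request-Headers", "Range"),
        )
        self.send_header("Access-Control-Max-Age", "86400")
        self.end_headers()

    def _forward(self, method: str) -> None:
        request = upstream_request(method, self.headers.get("Range"))
        # urlopen follows the Hub redirect server-side, where CORS does not apply.
        with urllib.request.urlopen(request) as upstream:
            self.send_response(upstream.status)  # 200, or 206 for a range
            for name in PASS_THROUGH:
                value = upstream.headers.get(name)
                if value is not None:
                    self.send_header(name, value)
            self._send_cors()
            self.end_headers()
            if method == "GET":
                while chunk := upstream.read(64 * 1024):
                    self.wfile.write(chunk)

    def do_GET(self) -> None:
        self._forward("GET")

    def do_HEAD(self) -> None:
        self._forward("HEAD")


def main() -> None:
    ThreadingHTTPServer(("127.0.0.1", 8080), RangeProxy).serve_forever()
```

Call main() to start it, then point the browser tool at http://127.0.0.1:8080/ instead of the Hub URL.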

Workaround 2: require user-provided files (local File API)

parquet-wasm can read from a user-selected local file (a File handle), which avoids CORS entirely. The obvious cost is UX: users have to download the file first.

Workaround 3: attempt the Xet-native APIs (advanced)

Hugging Face documents a Xet protocol where you first get X-Xet-Hash and then call a reconstruction API; it even recommends batched downloads and mentions using Range. (Hugging Face)
In practice, this still needs CORS on those endpoints if you do it directly from browser, so it is not a guaranteed escape hatch. But it is relevant context when discussing “HF already thinks in ranges.”


Copy-paste issue text (clean, reproducible, actionable)

Title

Enable CORS + HTTP Range support for browser partial reads on cas-bridge.xethub.hf.co (Parquet row-group access)

Summary

Browser-based data tools need Range requests to read Parquet efficiently (footer + selected row groups). Downloads from the Hub redirect to cas-bridge.xethub.hf.co (Xet bridge). The redirected host fails CORS preflight for Range/HEAD workflows, blocking partial reads. (Hugging Face)

Current behavior

  • Plain GET works via redirect.
  • Range workflows fail with: “Response to preflight request doesn’t pass access control check: It does not have HTTP ok status.”
  • This blocks parquet-wasm and DuckDB-Wasm style readers which rely on HEAD + Range or non-safelisted Range patterns. (GitHub)

Expected behavior

  1. OPTIONS to the final redirected host returns 200/204 (no redirect) with appropriate CORS headers. Preflight responses must be “ok” status. (GitHub)
  2. GET with Range returns 206 Partial Content and includes CORS headers, plus exposes Content-Range, Accept-Ranges, and Content-Length so browser JS can consume them. (MDN WebDocument)

Proposed CORS headers (public, anonymous files)

For responses from cas-bridge.xethub.hf.co (and any sibling Xet bridge hosts):

Preflight (OPTIONS)

  • Access-Control-Allow-Origin: *
  • Access-Control-Allow-Methods: GET, HEAD, OPTIONS
  • Access-Control-Allow-Headers: Range, Content-Type (or echo Access-Control-Request-Headers)
  • Access-Control-Max-Age: 86400 (optional, reduces preflight spam)

Actual (GET/HEAD, including 206)

  • Access-Control-Allow-Origin: *
  • Access-Control-Expose-Headers: Content-Range, Accept-Ranges, Content-Length
  • Ensure Accept-Ranges: bytes and Content-Range are present for range responses. (MDN WebDocument)

Notes on credentials (optional)

If any endpoint requires credentials, wildcard * cannot be used and the server must echo Origin and add Vary: Origin. (MDN WebDocument)

Impact

This unblocks efficient browser analytics and visualization on HF-hosted datasets using Parquet row groups, DuckDB-Wasm, parquet-wasm, and similar tooling. DuckDB-Wasm documentation explicitly notes that remote data access requires correct CORS on the hosting site. (DuckDB)


High-quality references worth linking in the issue thread

  • Hugging Face: redirect to cas-bridge.xethub.hf.co shown in Xet migration blog (Hugging Face)
  • Fetch/CORS: preflight must be “ok” status (200/204) (GitHub)
  • Fetch/CORS: redirect + preflight is a known sharp edge (GitHub)
  • MDN CORS guide: Range safelist caveat (MDN WebDocument)
  • MDN Range header: single-range is safelisted, multi-range may preflight (MDN WebDocument)
  • MDN Expose-Headers: non-safelisted headers must be exposed (MDN WebDocument)
  • DuckDB-Wasm: remote HTTPFS requires correct CORS (DuckDB)
  • DuckDB-Wasm issue: HEAD blocked by CORS breaks the pipeline (GitHub)
  • pdf.js historical issues about Accept-Ranges/Content-Range exposure (GitHub)

Summary

  • Your request is standard: browser Parquet needs byte ranges.
  • Redirect to cas-bridge.xethub.hf.co makes CORS enforcement happen on the Xet bridge host. (Hugging Face)
  • Fix requires: OPTIONS returns 200/204 with CORS headers, and 206 responses include CORS + exposed headers. (GitHub)
  • Similar failures exist across pdf.js and DuckDB-Wasm ecosystems. (GitHub)

Thank you @John6666 for all of the information! I’m currently using the local proxy workaround so that parquet-wasm can make range requests. I raised an issue using your suggested text: “Enable CORS + HTTP Range support for browser partial reads on cas-bridge.xethub.hf.co (Parquet row-group access)”, Issue #7931 in huggingface/datasets on GitHub.
