I have this dataset:
If I run this simple command
select * from read_parquet('https://huggingface.co/datasets/aborruso/openncup-focus-pnrr/blob/refs%2Fconvert%2Fparquet/aborruso--openncup-focus-pnrr/train/index.duckdb') limit 2;
I get this error:
Error: IO Error: HTTP GET error: Content-Length from server mismatches requested range, server may not support range requests.
What’s the right URL to access it, using the DuckDB CLI and the httpfs extension?
Hi! This doc page explains how to access the Parquet export of a dataset.
The dataset in question has a single Parquet file (for the
https://huggingface.co/api/datasets/aborruso/openncup-focus-pnrr/parquet/aborruso--openncup-focus-pnrr/train/0.parquet (a quick test in DuckDB CLI on my local machine works as expected)
Thank you very much. For reference, these are the steps I prefer:
- open the dataset page and click on API;
- copy the “List the Parquet files for this dataset” curl command;
- run it to get the URL(s) of your dataset's Parquet file(s).
For completeness, a third way to get them is to edit the dataset URL, adding:
- /api at the start of the path (right after huggingface.co), and
- /parquet at the end.
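The URL edit described above can be sketched in Python. This is only an illustration: the helper name parquet_api_url is hypothetical, and the dataset page URL in the example is inferred from the URLs quoted earlier in this thread, not copied from the original post.

```python
from urllib.parse import urlsplit, urlunsplit


def parquet_api_url(dataset_url: str) -> str:
    """Turn a dataset page URL into the URL that lists its Parquet files.

    Hypothetical helper: it only rewrites the string and makes no request.
    """
    parts = urlsplit(dataset_url)
    path = parts.path.rstrip("/")
    if not path.startswith("/datasets/"):
        raise ValueError(f"not a dataset page URL: {dataset_url}")
    # Add /api at the start of the path and /parquet at the end.
    new_path = "/api" + path + "/parquet"
    return urlunsplit((parts.scheme, parts.netloc, new_path, "", ""))


# Assumed dataset page URL, inferred from the URLs quoted in this thread.
print(parquet_api_url("https://huggingface.co/datasets/aborruso/openncup-focus-pnrr"))
# prints https://huggingface.co/api/datasets/aborruso/openncup-focus-pnrr/parquet
```

The resulting URL is the listing endpoint, not a Parquet file itself; the file URL(s) it returns are what you pass to read_parquet in DuckDB.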
Really useful, thank you very much @severo