DuckDB CLI: what's the URL to access a dataset?

aborruso · August 18, 2023, 6:17am

Hi,
I have this dataset:
https://huggingface.co/datasets/aborruso/openncup-focus-pnrr.

If I run this simple command:

select * from read_parquet('https://huggingface.co/datasets/aborruso/openncup-focus-pnrr/blob/refs%2Fconvert%2Fparquet/aborruso--openncup-focus-pnrr/train/index.duckdb') limit 2;

I get this error:

Error: IO Error: HTTP GET error: Content-Length from server mismatches requested range, server may not support range requests.

What's the right URL to access it, using the DuckDB CLI and the httpfs extension?

Thank you

mariosasko · August 18, 2023, 12:10pm

Hi! This doc page explains how to access the Parquet export of a dataset.

The dataset in question has a single Parquet file (for the train split): https://huggingface.co/api/datasets/aborruso/openncup-focus-pnrr/parquet/aborruso--openncup-focus-pnrr/train/0.parquet (a quick test in DuckDB CLI on my local machine works as expected)
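The URL in the original question points at a `blob` page for `index.duckdb`, not at a Parquet file, which would plausibly explain the range-request error. As a minimal sketch, the corrected statement can be assembled from the direct Parquet URL given above. The helper name `build_parquet_query` is hypothetical and is not part of DuckDB; it only formats the SQL text.

```python
def build_parquet_query(parquet_url: str, limit: int = 2) -> str:
    """Return a DuckDB statement that reads `limit` rows from a remote Parquet file.

    Hypothetical helper for illustration: it only builds the SQL string.
    """
    # Double any single quote so the URL stays a valid SQL string literal.
    escaped = parquet_url.replace("'", "''")
    return f"select * from read_parquet('{escaped}') limit {int(limit)};"


# Direct Parquet URL quoted in the reply above.
PARQUET_URL = (
    "https://huggingface.co/api/datasets/aborruso/openncup-focus-pnrr"
    "/parquet/aborruso--openncup-focus-pnrr/train/0.parquet"
)

query = build_parquet_query(PARQUET_URL)
print(query)
```

The printed statement can be pasted into the DuckDB CLI. Depending on the DuckDB version, `INSTALL httpfs; LOAD httpfs;` may need to be run first (an assumption, not confirmed in this thread).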

aborruso · August 18, 2023, 1:46pm

Thank you very much. I'm noting here the steps I prefer:

1. open the dataset page and click on API;
2. copy the “List the Parquet files for this dataset” curl command;
3. run it, and you have the URL(s) of your Parquet dataset file(s).
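The same steps can be roughly sketched in a script instead of curl. This is only an illustration: the exact JSON layout returned by the listing endpoint is not shown in this thread, so the hypothetical helper `collect_parquet_urls` makes no assumption about it and simply gathers every string ending in `.parquet`, wherever it appears in the response.

```python
import json
from urllib.request import urlopen


def collect_parquet_urls(payload):
    """Recursively collect every string ending in '.parquet' from decoded JSON."""
    found = []
    if isinstance(payload, str):
        if payload.endswith(".parquet"):
            found.append(payload)
    elif isinstance(payload, dict):
        for value in payload.values():
            found.extend(collect_parquet_urls(value))
    elif isinstance(payload, list):
        for item in payload:
            found.extend(collect_parquet_urls(item))
    return found


def list_parquet_files(listing_url):
    """Fetch a Parquet listing endpoint and return the file URLs found in it."""
    with urlopen(listing_url) as response:
        return collect_parquet_urls(json.load(response))


# Usage (needs network access); the endpoint is the one quoted in this thread:
# list_parquet_files(
#     "https://huggingface.co/api/datasets/aborruso/openncup-focus-pnrr/parquet"
# )
```

Each returned URL can then be passed to `read_parquet(...)` in the DuckDB CLI.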

severo · August 18, 2023, 2:09pm

For completeness, a third way to get them is to edit the dataset URL:

https://huggingface.co/datasets/aborruso/openncup-focus-pnrr

by adding /api at the start of the path and /parquet at the end:

https://huggingface.co/api/datasets/aborruso/openncup-focus-pnrr/parquet
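The URL edit described here is mechanical, so it can be sketched as a small function. The name `parquet_listing_url` is hypothetical; the sketch assumes the input is a dataset page URL whose path starts with `/datasets/`, as in the example above.

```python
from urllib.parse import urlsplit, urlunsplit


def parquet_listing_url(dataset_url: str) -> str:
    """Turn a dataset page URL into its Parquet listing URL.

    Adds '/api' in front of the path and '/parquet' at the end.
    """
    parts = urlsplit(dataset_url)
    path = parts.path.rstrip("/")  # tolerate a trailing slash
    if not path.startswith("/datasets/"):
        raise ValueError(f"not a dataset page URL: {dataset_url}")
    new_path = "/api" + path + "/parquet"
    # Drop any query string or fragment from the page URL.
    return urlunsplit((parts.scheme, parts.netloc, new_path, "", ""))


print(parquet_listing_url("https://huggingface.co/datasets/aborruso/openncup-focus-pnrr"))
```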

aborruso · August 18, 2023, 3:01pm

Really useful, thank you very much @severo
