Does the REST API work with private repo?

sl02 · January 5, 2023, 12:09pm

I was experimenting with the REST API on a private repo. Despite providing my user access token in the request header, I receive an error:

import os

import requests
from dotenv import load_dotenv

load_dotenv()  # read API_PER_TOKEN from a local .env file
per_token = os.getenv('API_PER_TOKEN')
headers = {"Authorization": f"Bearer {per_token}"}
API_URL = "https://datasets-server.huggingface.co/is-valid?dataset=sl02/np-datasets"

def query():
    response = requests.get(API_URL, headers=headers)
    return response.json()

data = query()

{'error': 'The dataset does not exist, or is not accessible without authentication (private or gated). Please retry with authentication.'}
However, when I make the repository public, it returns {'valid': True}. But when I call the first-rows endpoint, I get the following message:

import os

import requests
from dotenv import load_dotenv

load_dotenv()  # read API_PER_TOKEN from a local .env file
per_token = os.getenv('API_PER_TOKEN')
headers = {"Authorization": f"Bearer {per_token}"}
API_URL = "https://datasets-server.huggingface.co/first-rows?dataset=sl02/np-datasets&config=default&split=train"

def query():
    response = requests.get(API_URL, headers=headers)  # send the token here too
    return response.json()

data = query()

{'error': 'The response is not ready yet. Please retry later.'}

load_dataset() works with the private repo when I set the use_auth_token argument. Any clue what I am missing here?

lhoestq · January 5, 2023, 4:22pm

Maybe @severo knows more, but IIRC the REST API is not available yet for private repos.

severo · January 5, 2023, 4:28pm

Hi @sl02. The REST API uses the same rule as the dataset viewer (see The Dataset Preview has been disabled on this dataset - #6 by severo): it’s not available at all for private datasets for now.

re “The response is not ready yet. Please retry later”: the responses to the API endpoints are pre-computed asynchronously and can take some time to be processed, depending on the dataset itself and on the load of the servers.
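Since the responses are computed asynchronously, a client can simply retry the request with a growing delay until the "not ready" message goes away. Below is a minimal sketch using only the standard library. It assumes the "not ready" case can be detected by comparing the error text quoted in this thread; `fetch_json`, `poll_until_ready`, and the backoff defaults are hypothetical choices, not part of any official client.

```python
import json
import time
import urllib.error
import urllib.request
from typing import Callable, Optional

# Error text quoted earlier in this thread; used to detect "not computed yet"
NOT_READY = "The response is not ready yet. Please retry later."


def fetch_json(url: str, token: Optional[str] = None) -> dict:
    """GET `url` and decode the JSON body, including JSON error bodies."""
    request = urllib.request.Request(url)
    if token:
        request.add_header("Authorization", f"Bearer {token}")
    try:
        with urllib.request.urlopen(request, timeout=30) as response:
            return json.load(response)
    except urllib.error.HTTPError as err:
        # Error responses (e.g. "not ready") still carry a JSON message
        return json.load(err)


def poll_until_ready(
    url: str,
    token: Optional[str] = None,
    max_attempts: int = 6,
    delay: float = 5.0,
    fetch: Callable[[str, Optional[str]], dict] = fetch_json,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Re-request `url` with exponential backoff until it is no longer 'not ready'."""
    for attempt in range(max_attempts):
        data = fetch(url, token)
        if data.get("error") != NOT_READY:
            return data  # either the real payload or a different error
        if attempt < max_attempts - 1:
            sleep(delay * 2 ** attempt)  # 5s, 10s, 20s, ... with the defaults
    raise TimeoutError(f"{url} still not ready after {max_attempts} attempts")
```

For a public dataset you would call something like `poll_until_ready("https://datasets-server.huggingface.co/first-rows?dataset=sl02/np-datasets&config=default&split=train")`. Retrying does not help for a private repo while the API does not serve private datasets; you would just keep getting the authentication error back. The `fetch` and `sleep` parameters only exist so the retry logic can be exercised without network access.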

ymoslem · February 27, 2025, 5:18am

Hello! The dataset preview is now available for PRO accounts. Shouldn't the same be true for the API? I cannot do something as simple as retrieving the URLs. Thanks!

import requests

# API_TOKEN and dataset_name are defined elsewhere
headers = {"Authorization": f"Bearer {API_TOKEN}"}

response = requests.get(f"https://datasets-server.huggingface.co/parquet?dataset={dataset_name}", headers=headers)
json_data = response.json()

urls = [f['url'] for f in json_data['parquet_files'] if f['split'] == 'test']

Update

So now this works:

from datasets import load_dataset
import requests

# API_TOKEN and dataset_name are defined elsewhere
headers = {"Authorization": f"Bearer {API_TOKEN}"}
API_URL = f"https://huggingface.co/api/datasets/{dataset_name}/parquet"

def query():
    response = requests.get(API_URL, headers=headers)
    response.raise_for_status()
    # Keep only the "default" config: {"<split>": [parquet URLs, ...]}
    json_data = response.json()["default"]
    return json_data

urls = query()
print(urls)
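If you need splits other than `test` or configs other than `default`, a small helper makes the shape of the `/api/datasets/{dataset_name}/parquet` response explicit. This is only a sketch: the `{config: {split: [urls]}}` shape is inferred from how the snippet above indexes the response, and `split_urls` is a hypothetical helper name.

```python
from typing import Dict, List

# Assumed response shape, inferred from how the snippet above indexes it:
# {config_name: {split_name: [parquet_url, ...]}}
ParquetIndex = Dict[str, Dict[str, List[str]]]


def split_urls(index: ParquetIndex, split: str, config: str = "default") -> List[str]:
    """Return the Parquet URLs of one config/split, or fail with a helpful message."""
    if config not in index:
        raise KeyError(f"config {config!r} not found; available: {sorted(index)}")
    splits = index[config]
    if split not in splits:
        raise KeyError(
            f"split {split!r} not found in config {config!r}; available: {sorted(splits)}"
        )
    return splits[split]
```

You would pass it the whole decoded body, e.g. `test_urls = split_urls(requests.get(API_URL, headers=headers).json(), "test")`, instead of indexing `["default"]` by hand. A missing config or split then fails with the list of names that do exist.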

However, if we try to load the retrieved URLs with load_dataset(), it fails with a FileNotFoundError:

test_dataset = load_dataset("parquet",
                            data_files={"test": urls["test"]},
                            split="test",
                            token=API_TOKEN
                            )

The only workaround I have found so far is to download the retrieved URLs manually, something like:

# Manually download the files

import shutil
from tqdm.auto import tqdm

parquet_files = []

for n, url in tqdm(enumerate(urls["test"]), total=len(urls["test"])):
    response = requests.get(url, headers=headers, stream=True)
    response.raise_for_status()  # fail early instead of saving an error page

    with open(f"{n}.parquet", "wb") as f:
        shutil.copyfileobj(response.raw, f)
    parquet_files.append(f"{n}.parquet")


# Load dataset
test_dataset = load_dataset("parquet", data_files=parquet_files)

print(test_dataset)

lhoestq · March 5, 2025, 2:39pm

Hi! You can load the Parquet files from the repo directly:

load_dataset(dataset_name, revision="refs/convert/parquet")

And if you want to load specific files, you can pass data_files=[...] (it also accepts glob patterns).
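Following that suggestion, here is a sketch that loads a single split from the Parquet branch using a glob in `data_files` and a token for a private repo. It assumes the branch stores files as `<config>/<split>/*.parquet` (check the actual layout in the repo's file browser before relying on it); `parquet_branch_kwargs` and `load_parquet_split` are hypothetical helper names.

```python
from typing import Optional

# Branch holding the Hub's automatic Parquet conversion (from the reply above)
PARQUET_REVISION = "refs/convert/parquet"


def parquet_branch_kwargs(split: str, config: str = "default",
                          token: Optional[str] = None) -> dict:
    """Build load_dataset() keyword arguments for one split of the Parquet branch.

    ASSUMPTION: files are laid out as <config>/<split>/*.parquet on that branch.
    """
    kwargs = {
        "revision": PARQUET_REVISION,
        "data_files": {split: f"{config}/{split}/*.parquet"},  # glob pattern
        "split": split,
    }
    if token is not None:
        kwargs["token"] = token  # required for a private repo
    return kwargs


def load_parquet_split(dataset_name: str, split: str, config: str = "default",
                       token: Optional[str] = None):
    # Imported here so the helper above works even without `datasets` installed
    from datasets import load_dataset
    return load_dataset(dataset_name, **parquet_branch_kwargs(split, config, token))
```

Usage would look like `test_dataset = load_parquet_split(dataset_name, "test", token=API_TOKEN)`. Keeping the keyword construction in its own function makes it easy to print and compare against the file paths you actually see on the branch.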

ymoslem · March 10, 2025, 7:18am

Thanks! I still receive a FileNotFoundError. The issue, as in the original post, is that the repository is private. It is my repository, and I am logged in with an access token.

lhoestq · March 11, 2025, 3:20pm

Can you check that your token has the right permissions? I just tried on my side and couldn’t reproduce the FileNotFoundError on the Parquet branch of a private repo with a token.
