As you may have noticed, each dataset uploaded to the Hugging Face Hub is parsed automatically, and the relevant information is available at a URL like this:
https://datasets-server.huggingface.co/info?dataset=lhoestq/demo1
I’m interested in understanding how this parsing is performed, but I don’t know where to look for the source code.
It doesn’t seem to be in the datasets library. Maybe it’s in the huggingface_hub library? Or maybe it’s not open-source?
Any pointer would be appreciated
I don’t know much about datasets, but there are a few projects on GitHub that seem to be related.
Not all of the services inside the HF server are open source, though. I guess that’s just how it has to be for security reasons.
🤗 Evaluate: A library for easily evaluating machine learning models and datasets.
Backend that powers the dataset viewer on Hugging Face dataset pages through a public API.
Thanks for the reply.
It looks like it is somewhere in dataset-viewer indeed:
# Get dataset information
The dataset viewer provides an `/info` endpoint for exploring general information about a dataset, including fields such as the description, citation, homepage, license, and features.
The `/info` endpoint accepts two query parameters:
- `dataset`: the dataset name
- `config`: the subset name
```python
import requests

# A User Access Token is only required for gated or private datasets;
# for public datasets the Authorization header can be omitted.
API_TOKEN = "hf_xxx"  # placeholder: your token from the Hub settings
headers = {"Authorization": f"Bearer {API_TOKEN}"}
API_URL = "https://datasets-server.huggingface.co/info?dataset=ibm/duorc&config=SelfRC"

def query():
    response = requests.get(API_URL, headers=headers)
    return response.json()

data = query()
```
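For anyone else landing here: the metadata comes back under a top-level `dataset_info` key. A minimal sketch of reading it, using a hypothetical trimmed sample rather than real API output:

```python
# Hypothetical, trimmed example of the /info JSON shape (the real
# payload nests everything under "dataset_info"; the exact fields and
# values here are illustrative, not real output).
sample = {
    "dataset_info": {
        "description": "",
        "license": "",
        "features": {
            "plot": {"dtype": "string", "_type": "Value"},
            "title": {"dtype": "string", "_type": "Value"},
        },
        "splits": {"train": {"name": "train"}},
    }
}

info = sample["dataset_info"]
feature_names = sorted(info["features"])
print(feature_names)         # ['plot', 'title']
print(list(info["splits"]))  # ['train']
```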