For the dataset itself, and especially for the dataset viewer, the README.md file serves as a configuration file. If a model or dataset isn’t recognized automatically, editing the YAML section at the beginning of README.md may resolve the issue:
However, there’s always a possibility of a bug, so don’t worry if it doesn’t work…
What is going wrong
Hugging Face is currently treating your repo like one CSV dataset, but your repo actually contains two different kinds of tables:
instances.csv = node table
interactions.csv = edge table
Those two files do not have the same columns. Your dataset page already shows that exact problem: Hugging Face inferred only one subset (default) and one split (train), then failed with DatasetGenerationCastError because one group of files has columns Source, Target, Weight while another has host, version, registration_enabled, Id, Label. (Hugging Face)
So is it just a matter of time?
Probably not.
This does not look like “the upload is still processing.” It looks like a real schema error. The page is already showing a specific failure, not just a temporary loading state. (Hugging Face)
Why this happens
The Hugging Face dataset viewer is built around a tabular idea: one data point is one row, and features are columns. If it auto-detects many CSV files as belonging to one dataset split, it expects them to share one schema. Your graph data breaks that assumption because each graph is stored as two different tables with different columns. (Hugging Face)
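To see the mismatch for yourself, you can compare the header rows of the two file types. This is only a minimal sketch using the column names from the error message; the sample values and the helper name `header_of` are made up for illustration.

```python
import csv
import io


def header_of(csv_text):
    """Return the header row of a CSV document as a list of column names."""
    reader = csv.reader(io.StringIO(csv_text))
    return next(reader)


# Tiny in-memory stand-ins for the two file types (the values are invented).
instances_csv = (
    "host,version,registration_enabled,Id,Label\n"
    "a.example,1.0,True,1,a.example\n"
)
interactions_csv = "Source,Target,Weight\n1,2,3\n"

node_columns = header_of(instances_csv)
edge_columns = header_of(interactions_csv)

# One split is expected to have one schema; these two do not even overlap.
same_schema = node_columns == edge_columns
shared_columns = set(node_columns) & set(edge_columns)

print(same_schema)     # False
print(shared_columns)  # set()
```

With no shared columns at all, there is no single schema that both files could be cast to, which is consistent with the cast error on the page.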
Why the Croissant file is almost empty
On Hugging Face, generated Croissant metadata is built from the dataset-viewer / Parquet pipeline. The official Croissant example shows recordSet entries tied to Hugging Face–converted Parquet files for each config. So if the viewer cannot cleanly build the dataset first, the generated Croissant will often be thin or missing useful recordSet entries. (Hugging Face)
The real cause, in one sentence
Your dataset is not failing because it is a graph dataset.
It is failing because Hugging Face is currently reading it as one mixed CSV dataset with incompatible schemas. (Hugging Face)
The easiest fix
Tell Hugging Face explicitly that these are two separate dataset parts.
Use manual config in README.md, with one config for instances.csv and one for interactions.csv. The docs show that dataset configs use config_name, data_files, split, and path. (Hugging Face)
A simple starting point is:
---
configs:
- config_name: instances
  data_files:
  - split: train
    path: "*/*/*/instances.csv"
- config_name: interactions
  data_files:
  - split: train
    path: "*/*/*/interactions.csv"
---
That tells Hugging Face: “do not mix these files together.” This is exactly the kind of problem manual configuration is for. (Hugging Face)
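Before pushing, it can help to check locally which files each pattern would pick up. The sketch below is an approximation, not Hugging Face's own pattern resolver: it assumes `*` matches exactly one path segment, and the file list and directory names are hypothetical.

```python
from fnmatch import fnmatchcase


def matches_pattern(path, pattern):
    """Segment-wise glob match: '*' is never allowed to cross a '/'."""
    path_parts = path.split("/")
    pattern_parts = pattern.split("/")
    if len(path_parts) != len(pattern_parts):
        return False
    return all(fnmatchcase(p, q) for p, q in zip(path_parts, pattern_parts))


# Hypothetical repo layout; replace with the real relative paths in your repo.
repo_files = [
    "README.md",
    "softwareA/typeA/2024-01-01/instances.csv",
    "softwareA/typeA/2024-01-01/interactions.csv",
    "misc/instances.csv",  # only two levels deep, so the patterns skip it
]

# The same two patterns as in the README config.
configs = {
    "instances": "*/*/*/instances.csv",
    "interactions": "*/*/*/interactions.csv",
}

selected = {
    name: [f for f in repo_files if matches_pattern(f, pattern)]
    for name, pattern in configs.items()
}

for name, files in selected.items():
    print(name, files)
```

If a config ends up with an empty list, or a file you expected is missing, the pattern depth probably does not match your actual folder nesting.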
An even better fix
Add a few columns to both table types so every row says which graph it belongs to, for example:
- graph_id
- software
- graph_type
- snapshot_date
That way:
- all node rows can live in one clean schema
- all edge rows can live in one clean schema
- users can filter by graph
- Croissant has a much better chance of becoming meaningful
This is not required by the docs word-for-word, but it matches the viewer’s row/column design much better. (Hugging Face)
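Adding those columns could look roughly like this. It is a sketch under one big assumption: that the three folder levels in each path mean software / graph type / snapshot date. That mapping is a guess from the pattern depth, so adjust it to your real layout; the function name is hypothetical as well.

```python
import csv
import io


def add_graph_columns(csv_text, relative_path):
    """Prepend graph_id, software, graph_type, snapshot_date to every row.

    ASSUMPTION: relative_path looks like
    'software/graph_type/snapshot_date/file.csv'.
    """
    software, graph_type, snapshot_date, _filename = relative_path.split("/")
    extra = {
        "graph_id": f"{software}/{graph_type}/{snapshot_date}",
        "software": software,
        "graph_type": graph_type,
        "snapshot_date": snapshot_date,
    }

    reader = csv.DictReader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.DictWriter(
        out,
        fieldnames=list(extra) + list(reader.fieldnames or []),
        lineterminator="\n",
    )
    writer.writeheader()
    for row in reader:
        writer.writerow({**extra, **row})
    return out.getvalue()


edges = "Source,Target,Weight\n1,2,3\n"
print(add_graph_columns(edges, "softwareA/typeA/2024-01-01/interactions.csv"))
```

Run the same function over every instances.csv and every interactions.csv, and each table type keeps a single schema while every row still records which graph it came from.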
Best long-term design
The most Hugging Face-friendly structure is often:
one row = one graph snapshot
For example, one processed dataset where each row contains:
- graph metadata
- counts
- maybe paths to node/edge files
- or another structured representation
That works better with the viewer because the viewer is fundamentally row-based. (Hugging Face)
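As a rough illustration of "one row = one graph snapshot", each graph could be reduced to a single summary record. The field names (`num_nodes`, `num_edges`, `nodes_path`, `edges_path`) are hypothetical, not something the docs prescribe.

```python
import csv
import io


def summarize_snapshot(graph_id, nodes_csv, edges_csv, nodes_path, edges_path):
    """Build one summary row describing a whole graph snapshot."""
    node_rows = list(csv.DictReader(io.StringIO(nodes_csv)))
    edge_rows = list(csv.DictReader(io.StringIO(edges_csv)))
    return {
        "graph_id": graph_id,
        "num_nodes": len(node_rows),
        "num_edges": len(edge_rows),
        "nodes_path": nodes_path,  # where the full node table lives
        "edges_path": edges_path,  # where the full edge table lives
    }


# Invented sample data with the column names from the error message.
nodes = (
    "host,version,registration_enabled,Id,Label\n"
    "a.example,1.0,True,1,a.example\n"
    "b.example,1.1,False,2,b.example\n"
)
edges = "Source,Target,Weight\n1,2,3\n"

summary = summarize_snapshot(
    "softwareA/typeA/2024-01-01",
    nodes,
    edges,
    "softwareA/typeA/2024-01-01/instances.csv",
    "softwareA/typeA/2024-01-01/interactions.csv",
)
print(summary)
```

A table of such rows has one fixed schema no matter how different the individual graphs are, which is what the viewer wants.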
About the missing “Use this dataset” button
I would treat that as a symptom, not the main problem.
First fix the dataset structure so the viewer can understand it. Then check the page again. Right now the clearer signal is the cast error on the page itself. (Hugging Face)
What to do next
- Add manual configs to separate instances.csv and interactions.csv. (Hugging Face)
- Re-push the repo.
- Check Hugging Face’s dataset server endpoints: /is-valid, /splits, and /first-rows. The docs recommend these endpoints for checking validity, available configs/splits, and preview rows. (Hugging Face)
- Only after that, check /croissant again. (Hugging Face)
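The endpoint checks above can be scripted. This sketch only builds the URLs; `user/repo` is a placeholder for your dataset ID, and the helper name `viewer_url` is hypothetical. It assumes the public dataset server base URL and its `dataset`, `config`, and `split` query parameters.

```python
from urllib.parse import urlencode

# Public dataset viewer API base URL.
BASE_URL = "https://datasets-server.huggingface.co"


def viewer_url(endpoint, dataset, config=None, split=None):
    """Build a dataset-server URL for one endpoint and dataset."""
    params = {"dataset": dataset}
    if config is not None:
        params["config"] = config
    if split is not None:
        params["split"] = split
    return f"{BASE_URL}/{endpoint}?{urlencode(params)}"


dataset_id = "user/repo"  # placeholder: use your own "<user>/<dataset>" ID

print(viewer_url("is-valid", dataset_id))
print(viewer_url("splits", dataset_id))
print(viewer_url("first-rows", dataset_id, config="instances", split="train"))
print(viewer_url("first-rows", dataset_id, config="interactions", split="train"))
```

Open each URL in a browser, or fetch it with `urllib.request.urlopen`, and read the JSON. Once /splits lists both configs and /first-rows returns rows for each, the viewer is reading the repo the way you intended.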
Bottom line
You are close.
The issue is not that your dataset is “too unusual” for Hugging Face. The issue is that Hugging Face needs clearer instructions for how to separate your two table types. Once the node CSVs and edge CSVs are no longer merged into one inferred split, the viewer should be able to build the dataset, and the Croissant output will likely improve too. (Hugging Face)