Triskel Data: Cleaned & Structured AI Datasets ($25 USD Flat)
I’ve released a full suite of cleaned, structured, and tokenized datasets ready for direct AI training.
Browse full list: Shop - Triskel Data
Available Now, $25 USD Each (Full Access):
- Wikipedia – 26.1B tokens
- Reddit Submissions – 2.6B tokens
- Reddit Comments – 13.0B tokens
- PubMed – 5.8B tokens
- Project CodeNet – 6.1B tokens
- OpenAlex – 77.6B tokens
- Medical Journals – 354M tokens
- GeoNames (All Countries) – 835M tokens (only $10 USD)
Why the Low Price?
- All datasets are cleaned, deduplicated, and structured
- Formatted in .jsonl, tokenized, and ingestion-ready
- Fees only cover hosting + processing costs
- No scraped junk, no unstructured mess
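For anyone new to .jsonl: each line of the file is one standalone JSON object, so a file can be streamed record by record without loading it all into memory. Below is a minimal Python sketch of that idea. The helper name `iter_jsonl` and the `text` and `tokens` fields in the sample are assumptions for illustration only, not the actual schema of these datasets.

```python
import io
import json
from typing import Iterator, TextIO


def iter_jsonl(stream: TextIO) -> Iterator[dict]:
    """Yield one parsed record per non-empty line of a JSON Lines stream."""
    for line_number, line in enumerate(stream, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        try:
            yield json.loads(line)
        except json.JSONDecodeError as err:
            # Report which line was malformed instead of failing silently.
            raise ValueError(f"invalid JSON on line {line_number}: {err}") from err


if __name__ == "__main__":
    # Hypothetical sample; real records may use different field names.
    sample = io.StringIO(
        '{"text": "hello world", "tokens": [101, 202]}\n'
        '{"text": "second record", "tokens": [303]}\n'
    )
    for record in iter_jsonl(sample):
        print(record["text"], len(record["tokens"]))
```

To read a downloaded file instead of the in-memory sample, pass an open file handle, for example `open(path, encoding="utf-8")`, where `path` points at the .jsonl file.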
License Terms:
- Use for R&D, personal model training, academic fine-tuning
- No redistribution, resale, or commercial deployment
Stop wasting compute on garbage.
Train on signal.