Hi All,
I am new to this forum. I am looking for the Cohere datasets below, but their pages return a 404. Could someone point me to an alternative location for them?
https://huggingface.co/datasets/Cohere/wikipedia-22-12-simple-embeddings
Both the en and de versions are needed.
https://huggingface.co/datasets/Cohere/miracl-en-corpus-22-12
Thanks in advance
GRB
Can anyone help? Or could an admin respond?
I could only find third-party mirrors. If you absolutely need the original datasets, you may have to contact Cohere via the Community tab of any of their models or datasets…
Mirrors / alternative hosts (not the Hugging Face dataset pages)
Wikipedia embeddings mirrors
- Gitee mirror for the simple subset:
hf-datasets/wikipedia-22-12-simple-embeddings (Gitee)
- Gitee AI mirrors for Cohere’s language datasets (including DE):
Cohere/wikipedia-22-12-de-embeddings (Gitee AI)
- Another mirror endpoint showing EN: (Gitee AI)
MIRACL EN corpus mirror
- Gitee AI mirror:
Cohere/miracl-en-corpus-22-12 (Gitee AI)
- Elastic Rally track hosts small packaged subsets for benchmarking (useful if you only need a sample): (rally-tracks.elastic.co)
MIRACL documents (no Cohere embeddings)
- The upstream MIRACL corpus mirror (Apache-2.0):
hf-datasets/miracl-corpus (Gitee)
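
If it helps, here is a minimal sketch for re-checking whether the original Hugging Face pages have come back, assuming dataset pages live under https://huggingface.co/datasets/<repo id> as in the links from the question. The helper names (dataset_page_url, http_status, check_repos) are my own and not part of any Hugging Face library; only the Python standard library is used. I have not confirmed the URL patterns of the Gitee mirrors, so this only covers the Hugging Face pages.

```python
import urllib.error
import urllib.request

# Assumption: base URL taken from the dataset links in the original question.
HF_DATASETS_BASE = "https://huggingface.co/datasets/"


def dataset_page_url(repo_id):
    """Build the page URL for a repo id such as 'Cohere/miracl-en-corpus-22-12'."""
    return HF_DATASETS_BASE + repo_id.strip("/")


def http_status(url, timeout=10.0, opener=urllib.request.urlopen):
    """Return the HTTP status code of a HEAD request, or None if the host cannot be reached."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with opener(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses, including the 404 from the question, land here.
        return err.code
    except urllib.error.URLError:
        # DNS failures, refused connections, and similar connection errors.
        return None


def check_repos(repo_ids, opener=urllib.request.urlopen):
    """Map each repo id to the status code of its dataset page."""
    return {
        repo_id: http_status(dataset_page_url(repo_id), opener=opener)
        for repo_id in repo_ids
    }
```

Calling check_repos(["Cohere/wikipedia-22-12-simple-embeddings", "Cohere/miracl-en-corpus-22-12"]) should map both ids to 404 for as long as the pages are still missing. The opener parameter is only there so the function can be exercised without network access.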