[Open-to-the-community] One week team-effort to reach v2.0 of HF datasets library

jonatasgrosman · December 7, 2020, 2:35am

@thomwolf I wanna contribute too

taihim672 · December 7, 2020, 4:21am

@thomwolf I’d like to work on contributing datasets in Urdu.

thomwolf · December 7, 2020, 6:47am

Ok, added you all, check your spam folder if you feel like you didn’t receive an invitation email!

gdupont · December 7, 2020, 8:22am

The PR is already on its way: https://github.com/huggingface/datasets/pull/1129
I’m working on integrating the full_text of articles to extend the possibilities. Next steps would be to integrate the bib references and the doc embeddings.

Philipp · December 7, 2020, 8:48am

@thomwolf I have a 400 GB German High Quality Web Text Corpus available and would be happy to contribute it. Is this possible or is it too large?

ghosh-r · December 7, 2020, 10:31am

I am genuinely interested in being a part of this effort. Please add me to the Slack channel.

trtm · December 7, 2020, 2:40pm

Hey @thomwolf, I had already planned to contribute some argument mining datasets and would like to use this opportunity to finally do so!

mhedderich · December 7, 2020, 3:14pm

@thomwolf Great project! We worked in the past on classification for low-resource languages and I’d like to add datasets from that area.

rpatel12 · December 7, 2020, 3:37pm

I would be glad to add one entity type prediction dataset and one knowledge graph dataset.

chz816 · December 7, 2020, 4:06pm

That is very interesting! How can we participate?

thomwolf · December 7, 2020, 4:47pm

Invited you all to the slack channel!

If you think you didn’t receive the invitation, check your spam folder.

See you on the slack

nielsr · December 7, 2020, 4:50pm

I’m gonna add one (or more) Dutch datasets! Sign me up please

thomwolf · December 7, 2020, 4:53pm

You’re added Niels! Welcome

Karthik-Bhaskar · December 7, 2020, 7:38pm

Hi,

I would like to participate. Please add me.

Thanks.

imvladikon · December 7, 2020, 10:00pm

Great initiative! Hope it’s not too late, I’d like to join as well and contribute a dataset in Hebrew (I have a news dataset).

Jasmeet · December 8, 2020, 12:13am

Hi,

I am interested in contributing. I know other languages like Hindi and Punjabi.

Bmanikan · December 8, 2020, 6:16am

Hi, I could add some Malayalam and Urdu datasets. Please sign me up!

Prachi · December 8, 2020, 6:46am

Hey! I want to participate.

gautamgupta1811 · December 8, 2020, 8:14am

Hello! I am interested in contributing. Please add me

thomwolf · December 8, 2020, 9:14am

You are now all invited to the slack channel!

If you don’t see the invitation, check your spam folder.

Talk to you on the (super active) slack!
