[Open-to-the-community] One week team-effort to reach v2.0 of HF datasets library

hey @thomwolf, I'm also happy to participate!! :hugs:

1 Like

Seems interesting, and I'm looking forward to being a part of it. I would like to help with the Healthcare domain.

2 Likes

This is a great opportunity. I would love to contribute! :hugs:
In particular, I can help with some emotion/humour recognition and generation datasets.

1 Like

I would also like to participate!
Especially interested in adding Danish datasets.

1 Like

I would like to contribute ! :hugs:

1 Like

Would really appreciate it if you could add the following Question Rewriting datasets (ordered by dataset size):

  1. QReCC - https://github.com/apple/ml-qrecc
  2. CANARD - https://sites.google.com/view/qanta/projects/canard
  3. TREC CAsT 2019

The question rewriting task complements question answering. It will also be interesting to see how the new EncoderDecoder models perform on this task.

I would like to contribute

1 Like

are programming languages welcome or only human languages?

Yes also code datasets!

Okay, then I'm totally in :smiley:

1 Like

Hi
Eager to contribute. Please count me in!

1 Like

@thomwolf, @patrickvonplaten I want to add Bangla NLP datasets to the library. I'm pretty new here, and I understand the timing is bad too, but I'm more free in the upcoming months and would love to participate in adding the Bangla NLP datasets. Can you add me to the Slack channel?

1 Like

Well, the timing is actually perfect: if you have a spare 1-2 hours between now and next Wednesday, you can join, add them yourself, and in the process learn how to use the library and earn a special event gift.

I'm adding you :slight_smile:

1 Like

I want to participate.

1 Like

I would like to contribute

1 Like

Hi all, many people are still joining so we have extended the official end of the sprint to next Wednesday (Dec 9th) so that more people can participate in the event!

Also, the great @canwenxu has designed two gifts for the event:

  • a special T-shirt for everyone who has joined to add a dataset during the sprint

  • in addition, we will send a :hugs: mug of the event to the super active participants who have added three or more datasets, and we will invite these participants to join our main dataset Slack channel and stay on our Slack after the event as main contributors to the :hugs: datasets library :tada: (you people are really so awesome)!

So don't hesitate to ping me and join, there's still time!

Cheers,

Thom

7 Likes

Email: tasmiah.tahsin@northsouth.edu

1 Like

I want to join! :hugs:

1 Like

@thomwolf I would like to add the following datasets for Natural Language Inference tasks in Hindi (a low-resource language). Each dataset consists of textual entailment pairs in Hindi.

Dataset: BBC Hindi News NLI Dataset
Link: (https://github.com/midas-research/hindi-nli-data/tree/master/Textual_Entailment/BBC)

Dataset: Hindi Discourse Modes Dataset (HDA)
Link: (https://github.com/midas-research/hindi-nli-data/tree/master/Textual_Entailment/HDA)

We built these datasets by recasting two Hindi classification datasets of the same names. I would also like to add those classification datasets to the library if time permits.

1 Like

@thomwolf In addition to the Hindi NLI datasets mentioned above, I would like to add the following datasets as well. We also have the classification datasets these two were recast from, which we would like to add.

Dataset: BHAAV
Link: (https://github.com/midas-research/hindi-nli-data/tree/master/Textual_Entailment/BH)

Dataset: Product Review Hindi
Link: https://github.com/midas-research/hindi-nli-data/tree/master/Textual_Entailment/PR

1 Like