[Open-to-the-community] One week team-effort to reach v2.0 of HF datasets library

hey @thomwolf, I'm also happy to participate!! :hugs:

1 Like

Seems interesting, and I'm looking forward to being a part of it. I would like to help with the Healthcare domain.

2 Likes

This is a great opportunity. I would love to contribute! :hugs:
In particular, I can help with some emotion/humour recognition and generation datasets.

1 Like

I would also like to participate!
Especially interested in adding Danish datasets.

1 Like

I would like to contribute ! :hugs:

1 Like

Would really appreciate it if you could add the following Question Rewriting datasets (ordered by dataset size):

  1. QReCC - https://github.com/apple/ml-qrecc
  2. CANARD - https://sites.google.com/view/qanta/projects/canard
  3. TREC CAsT 2019

The question rewriting task complements question answering. It will also be interesting to see how the new EncoderDecoder models perform on this task.

I would like to contribute

1 Like

are programming languages welcome or only human languages?

Yes also code datasets!

Okay, then I'm totally in :smiley:

1 Like

Hi
Eager to contribute. Please count me in!

1 Like

@thomwolf, @patrickvonplaten I want to add Bangla NLP datasets to the library. I'm pretty new here, and I understand the timing is bad too, but I'm more free in the upcoming months and would love to participate in adding the Bangla NLP datasets. Can you add me to the Slack channel?

1 Like

Well, the timing is actually perfect: if you have a spare 1-2 hours between now and next Wednesday, you can join, add them yourself, and in the process learn how to use the library and earn a special event gift.

I'm adding you :slight_smile:

1 Like

I want to participate.

1 Like

I would like to contribute

1 Like

Hi all, many people are still joining so we have extended the official end of the sprint to next Wednesday (Dec 9th) so that more people can participate in the event!

Also, the great @canwenxu has designed two gifts for the event:

  • a special T-shirt for everyone who has joined to add a dataset during the sprint

  • in addition, we will send a :hugs: mug of the event to the super active participants who have added three or more datasets, and we will invite these participants to join our main dataset Slack channel and stay on our Slack after the event as main contributors to the :hugs: datasets library :tada: (you people are really so awesome)!

So don't hesitate to ping me and join, there's still time!

Cheers,

Thom

7 Likes

Email: tasmiah.tahsin@northsouth.edu

1 Like

I want to join! :hugs:

1 Like

@thomwolf I would like to add the following datasets for Natural Language Inference tasks in Hindi (a low-resource language). Each dataset consists of textual entailment pairs in Hindi.

Dataset: BBC Hindi News NLI Dataset
Link: (https://github.com/midas-research/hindi-nli-data/tree/master/Textual_Entailment/BBC)

Dataset: Hindi Discourse Modes Dataset (HDA)
Link: (https://github.com/midas-research/hindi-nli-data/tree/master/Textual_Entailment/HDA)

We built these datasets by recasting two Hindi classification datasets of the same names. I would also like to add those classification datasets to the library if time permits.

1 Like

@thomwolf In addition to the Hindi NLI datasets mentioned above, I would like to add the following datasets as well. We also have the classification datasets these two were recast from, which we would like to add.

Dataset: BHAAV
Link: (https://github.com/midas-research/hindi-nli-data/tree/master/Textual_Entailment/BH)

Dataset: Product Review Hindi
Link: https://github.com/midas-research/hindi-nli-data/tree/master/Textual_Entailment/PR

1 Like