Hey @thomwolf, I'm also happy to participate!
Seems interesting, and I'm looking forward to being a part of it. I would like to help with the healthcare domain.
This is a great opportunity. I would love to contribute!
In particular, I can help with some emotion/humour recognition and generation datasets.
I would also like to participate!
Especially interested in adding Danish datasets.
I would like to contribute !
Would really appreciate it if you could add the following question rewriting datasets (ordered by dataset size):
- QReCC - https://github.com/apple/ml-qrecc
- CANARD - https://sites.google.com/view/qanta/projects/canard
- TREC CAsT 2019
The question rewriting task complements question answering. It will also be interesting to see how the new EncoderDecoder models perform on this task.
I would like to contribute
are programming languages welcome or only human languages?
Yes also code datasets!
Okay, then I'm totally in
Hi
Eager to contribute. Please count me in!
@thomwolf, @patrickvonplaten I want to add Bangla NLP datasets to the library. I'm pretty new here, and I understand the timing is bad too, but I'm more free in the upcoming months and would love to participate in adding the Bangla NLP datasets. Can you add me to the Slack channel?
Well, the timing is actually perfect: if you have a spare 1-2 hours between now and next Wednesday, you can join, add them yourself, and in the process learn how to use the library and earn a special event gift.
I'm adding you.
I want to participate.
I would like to contribute
Hi all, many people are still joining so we have extended the official end of the sprint to next Wednesday (Dec 9th) so that more people can participate in the event!
Also, the great @canwenxu has designed two gifts for the event:
- a special tee-shirt for everyone who has joined to add a dataset during the sprint
- in addition, we will send a mug of the event to the super active participants who have added three datasets or more, and we will invite these participants to join our main dataset Slack channel and stay on our Slack after the event as main contributors of the `datasets` library (you people are really so awesome)!
So don't hesitate to ping me and join; there's still time!
Cheers,
Thom
email : tasmiah.tahsin@northsouth.edu
I want to join!
@thomwolf I would like to add the following datasets for natural language inference tasks in Hindi (a low-resource language). Each dataset consists of textual entailment pairs in Hindi.
Dataset: BBC Hindi News NLI Dataset
Link: https://github.com/midas-research/hindi-nli-data/tree/master/Textual_Entailment/BBC
Dataset: Hindi Discourse Modes Dataset (HDA)
Link: https://github.com/midas-research/hindi-nli-data/tree/master/Textual_Entailment/HDA
We built these datasets by recasting two Hindi classification datasets of the same names. I would also like to add those classification datasets to the library if time permits.
@thomwolf In addition to the Hindi NLI datasets mentioned above, I would like to add the following datasets as well. We have the classification datasets for these two, which we would also like to add.
Dataset: BHAAV
Link: https://github.com/midas-research/hindi-nli-data/tree/master/Textual_Entailment/BH
Dataset: Product Review Hindi
Link: https://github.com/midas-research/hindi-nli-data/tree/master/Textual_Entailment/PR