[Open-to-the-community] One week team-effort to reach v2.0 of HF datasets library

I’d love to participate!

1 Like

I’d like to join!

1 Like

Hey I’d love to participate!

1 Like

I’d love to join! I can contribute resources for the low-resourced Tagalog language.

Email: jan_christian_cruz@dlsu.edu.ph

1 Like

I think this initiative is awesome on so many levels. I also need more high-definition "stack-sets" for what I am trying to accomplish, but it might be ideal to have multiple libraries of open-cite datasets.

If the world wants to converge natural language here, I'll answer that call gladly.

Hello World :smiley: Converge on this convergence. All people. All languages. All speech. Converge on this: all our systems of expression, contemplation, and communication. I was told Apache Arrow tables can scale quite nicely? Let's converge on this and put that to the test, shall we?

Anyone feel like developing a Hugging Face JAX extension API to integrate this lovely Rosetta stone with JAX? How about Neo4j graph platform integration?

I have a dataset to submit from RepEval 2019:

CODAH: An Adversarially-Authored Question Answering Dataset for Common Sense.

I just discovered earlier today that datasets could also be tools for the flip side of the model: validation and testing.

At the overlap of linguistics, philosophy, and mathematics, let's get some logic-connector and discourse-marker datasets into the library as well. While on the subject of logic, let's experiment with 3VL or many-valued logic models here too? Bonus points for finding fundamental logic insights by comparing and contrasting discourse markers across languages, and for running models through the infinite Gaussian process I learned about in the 2019 distill.pub visual exploration of Gaussian processes.

One more thing… with the emergence of this wide-angle, broad-spectrum multilingual Rosetta stone library, take the opportunity to analyze all the markers, especially interpersonal markers, across many diverse languages. Compare and contrast that too, on dual-axis infinite Gaussian processes. Thanks in advance.

Create an inference engine for crowd-sourced platforms? That would naturally create a Hypercore-protocol blockchain aspect, via stream crypto mining, for open-cite data?

Sorry for thinking out loud.

DW
cubytes@gmail.com
cubytes Twitter

1 Like

I would like to contribute

1 Like

Hi @thomwolf, I would love to contribute and do not want to be left aside in this historic sprint :grinning:.

1 Like

I would like to participate

1 Like

Added you all! Ping me if you didn’t receive the invitation!

Interested too!

1 Like

Great initiative. I would love to contribute and add some Arabic datasets to the library.
Here are two Arabic datasets:

  1. ArCOV-19: The First Arabic COVID-19 Twitter Dataset with Propagation Networks (paper, dataset)
  2. ArCOV19-Rumors: Arabic COVID-19 Twitter Dataset for Misinformation Detection (paper, dataset)
1 Like

I am interested to join @thomwolf.
Thank you.

1 Like

I am interested in Arabic datasets.

2 Likes

Not sure it counts purely as healthcare, but I added the CORD-19 dataset made by AllenAI (see https://www.semanticscholar.org/cord19) in this PR: https://github.com/huggingface/datasets/pull/1129

It’s a work in progress (only metadata loaded for now) but I’m working on adding article full text and pre-computed document embeddings.

Hey, amazing initiative! I would love to join as well, and maybe contribute a dataset in Hebrew :slight_smile:

1 Like

I would love to join and participate :slight_smile: !

1 Like

I’d like to help out for this one if you’re open to it!

Count me in, please :slight_smile:

1 Like

Great work, I would love to contribute some African language datasets. Thanks

1 Like

Hi @thomwolf, I am interested in helping out. Please add me to the Slack channel and send me any other information I need in order to contribute.

1 Like