[Open-to-the-community] One-week team effort to reach v2.0 of HF datasets library

I'd love to participate!

I'd like to join!

Hey, I'd love to participate!

I'd love to join! I can contribute resources for the low-resourced Tagalog language.

Email: jan_christian_cruz@dlsu.edu.ph

I think this initiative is awesome on so many levels. I also need more high-definition "stack-sets" for what I am trying to accomplish, but it might be ideal to have multiple libraries of open-cite datasets.

If the world wants to converge on natural language here, I'll answer that call gladly.

Hello World :smiley: Converge on this convergence. All people. All language. All speech. Converge on this. All our systems of expression, contemplation, and communication. I was told Apache Arrow tables can scale quite nicely? Let's converge on this and put that to the test, shall we?

Does anyone feel like developing a Hugging Face -ax extension API to integrate this lovely Rosetta stone with JAX? How about Neo4j graph platform integration?

I have a dataset to submit from RepEval 2019:

CODAH: An Adversarially-Authored Question Answering Dataset for Common Sense.

I just discovered earlier today that datasets could be tools for the flip side of the model: validation & testing as well.

At the intersection of linguistics, philosophy, and mathematics, let's get some logic connector and discourse marker datasets into the library as well. While we're on the subject of logic, could we experiment with three-valued (3VL) or many-valued logic models here too? Bonus points for finding fundamental logic insights by comparing and contrasting discourse markers across many languages, and for running models through the infinite Gaussian processes I learned about in the 2019 distill.pub visual exploration of Gaussian processes.

One more thing... with the emergence of a wide-angle, broad-spectrum multilingual Rosetta stone library, take the opportunity to analyze all the markers, especially interpersonal markers, across many diverse languages. Please compare and contrast those too with dual-axis infinite Gaussian processes. Thanks in advance.

Could we create an inference engine for crowdsourced platforms and naturally add a Hypercore-protocol blockchain aspect, with stream crypto mining, for open-cite data?

Sorry for thinking out loud.

DW
cubytes@gmail.com
cubytes Twitter

I would like to contribute.

Hi @thomwolf, I would love to contribute and do not want to be left aside in this historic sprint :grinning:.

I would like to participate.

Added you all! Ping me if you didn't receive the invitation!

Interested too!

Great initiative. I would love to contribute and add some Arabic datasets to the library.
Here are two Arabic datasets:

  1. ArCOV-19: The First Arabic COVID-19 Twitter Dataset with Propagation Networks (paper, dataset)
  2. ArCOV19-Rumors: Arabic COVID-19 Twitter Dataset for Misinformation Detection (paper, dataset)

I am interested in joining, @thomwolf.
Thank you.

I am interested in Arabic datasets.

Not sure it counts purely as healthcare, but I added the CORD-19 dataset made by AllenAI (see https://www.semanticscholar.org/cord19) in this PR: https://github.com/huggingface/datasets/pull/1129

It's a work in progress (only metadata is loaded for now), but I'm working on adding the article full text and pre-computed document embeddings.

Hey, amazing initiative! I would love to join as well, and maybe contribute a dataset in Hebrew :slight_smile:

I would love to join and participate :slight_smile: !

I'd like to help out with this one if you're open to it!

Count me in, please :slight_smile:

Great work! I would love to contribute some African language datasets. Thanks!

Hi @thomwolf, I am interested in helping out. Please add me to the Slack channel and send any other information I need to contribute.