[Open-to-the-community] One week team-effort to reach v2.0 of HF datasets library

I’d love to participate!

1 Like

I’d like to join!

1 Like

Hey I’d love to participate!

1 Like

I’d love to join! I can contribute resources for the low-resourced Tagalog language.

Email: jan_christian_cruz@dlsu.edu.ph

1 Like

I think this initiative is awesome on so many levels. I also need more high-definition "stack-sets" for what I am trying to accomplish, but it might be ideal to have multiple libraries of open-cite datasets.

If the world wants to converge natural language here, I'll answer that call gladly.

Hello World :smiley: Converge on this convergence. All people. All languages. All speech. Converge on this: all our systems of expression, contemplation, and communication. I was told Apache Arrow tables can scale quite nicely? Let's converge on this and put that to the test, shall we?

Anyone feel like developing a Hugging Face JAX extension API to integrate this lovely Rosetta stone with JAX? How about Neo4j graph platform integration?

I have a dataset to submit from RepEval 2019:

CODAH: An Adversarially-Authored Question Answering Dataset for Common Sense.

I just discovered earlier today that datasets could also be tools for the flip side of the model: validation and testing.

At the overlap of linguistics, philosophy, and mathematics, let's get some logic-connector and discourse-marker datasets into the library as well. While on the subject of logic, let's experiment with 3VL or many-valued logic models here too? Bonus points for finding fundamental logic insights by comparing and contrasting discourse markers across languages, and for running models through the infinite Gaussian process I learned about in the 2019 distill.pub visual exploration of Gaussian processes.

One more thing… with the emergence of this wide-angle, broad-spectrum multilingual Rosetta stone library, take the opportunity to analyze all the markers, especially interpersonal markers, across many diverse languages. Compare and contrast that too, on dual-axis infinite Gaussian processes. Thanks in advance.

Create an inference engine for crowd-sourced platforms? That would naturally create a Hypercore-protocol blockchain aspect, via stream crypto mining, for open-cite data?

Sorry for thinking out loud.

DW
cubytes@gmail.com
cubytes Twitter

1 Like

I would like to contribute

1 Like

Hi @thomwolf, I would love to contribute and do not want to be left aside in this historic sprint :grinning:.

1 Like

I would like to participate

1 Like

Added you all! Ping me if you didn’t receive the invitation!

Interested too!

1 Like

Great initiative. I would love to contribute and add some Arabic datasets to the library.
Here are two Arabic datasets:

  1. ArCOV-19: The First Arabic COVID-19 Twitter Dataset with Propagation Networks (paper, dataset)
  2. ArCOV19-Rumors: Arabic COVID-19 Twitter Dataset for Misinformation Detection (paper, dataset)
1 Like

I am interested to join @thomwolf.
Thank you.

1 Like

I am interested in Arabic datasets.

2 Likes

Not sure it counts purely as healthcare, but I added the CORD-19 dataset made by AllenAI (see https://www.semanticscholar.org/cord19) in this PR: https://github.com/huggingface/datasets/pull/1129

It’s a work in progress (only metadata loaded for now) but I’m working on adding article full text and pre-computed document embeddings.

Hey, amazing initiative! I would love to join as well, and maybe contribute a dataset in Hebrew :slight_smile:

1 Like

I would love to join and participate :slight_smile: !

1 Like

I’d like to help out for this one if you’re open to it!

Count me in, please :slight_smile:

1 Like

Great work, I would love to contribute some African language datasets. Thanks

1 Like

Hi @thomwolf, I am interested in helping out. Please add me to the Slack channel and send me any other information I need in order to contribute.

1 Like