After following the first section of the course, you should be able to fine-tune a model on a text classification problem and upload it back to the Hub. Share your creations here and if you build a cool app using your model, please let us know!
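If you want a quick reminder of the upload step, here's a minimal sketch; the checkpoint and repo name are placeholders, and it assumes you've logged in with `huggingface-cli login` first:

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Stand-in checkpoint; in practice this would be your fine-tuned model.
checkpoint = "distilbert-base-uncased"
model = AutoModelForSequenceClassification.from_pretrained(checkpoint)
tokenizer = AutoTokenizer.from_pretrained(checkpoint)

# Pushes the weights/config and tokenizer files to a repo under your
# account; the repo name here is illustrative.
model.push_to_hub("my-text-classifier")
tokenizer.push_to_hub("my-text-classifier")
```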
It’s not exactly a project, but I’m super excited to share my first public Kaggle dataset.
With help from the good folks at HF, I was able to query the metadata available on the Model Hub and upload it as a Kaggle dataset.
It should be helpful to anyone looking to analyze the metadata of publicly available models and build EDA/text-processing notebooks on top of it. The dataset contains the README model card data as well.
Please have a look and provide feedback.
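For anyone curious how the metadata can be pulled, here's a minimal sketch with the `huggingface_hub` client; attribute names may vary slightly across library versions:

```python
from itertools import islice
from huggingface_hub import HfApi

api = HfApi()
# list_models yields ModelInfo objects; full=True requests richer
# metadata (tags, files, etc.). The attribute is `modelId` in older
# huggingface_hub versions and `id` in newer ones.
for model in islice(api.list_models(full=True), 5):
    print(model.modelId, model.tags)
```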
@dk-crazydiv this is very cool! Would you like to add it as a HF dataset as well?
Here is the process in case you’re interested: Sharing your dataset — datasets 1.8.0 documentation
Yes. I was thinking of the following:
- HF modelhub metadata as a HF dataset
- HF datasets metadata as a HF dataset
- HF datasets metadata as a Kaggle dataset
This should complete the inception loop. Will update with progress soon.
great, looking forward to those!
I’ve uploaded the modelhub data in HF datasets. Please provide feedback.
The documentation guide on creating and sharing a dataset was very good, informative, and helpful.
I faced a couple of issues (most of which I overcame) while porting the data from the Kaggle-style format to the datasets library:
- I couldn’t find any `datetime` object in `Features`, and I saw a couple of other datasets using plain strings for dates instead (see the sketch after this list).
- Since I chose to share it as “community provided”, I had pip-installed the datasets library, and some of the commands in the doc specific to `datasets-cli` (which expect the datasets repo to be cloned) didn’t work smoothly with relative paths, but did work with absolute paths.
- The “Explore dataset” button on the Hub isn’t working for my dataset. Is this because it’s a community-provided dataset?
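On the datetime point above, here's a minimal sketch of the workaround, storing timestamps as ISO-8601 strings; the field names are illustrative:

```python
from datasets import Dataset, Features, Value

# Features (as of datasets 1.x) has no dedicated datetime type, so
# timestamps are stored as ISO-8601 strings instead.
features = Features({
    "modelId": Value("string"),
    "lastModified": Value("string"),  # e.g. "2021-06-18T09:30:00.000Z"
    "downloads": Value("int64"),
})
ds = Dataset.from_dict(
    {
        "modelId": ["gpt2"],
        "lastModified": ["2021-06-18T09:30:00.000Z"],
        "downloads": [1_000_000],
    },
    features=features,
)
print(ds.features)
```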
I wrote a beginner-friendly introduction to Hugging Face’s `pipeline` API on Kaggle.
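For context, the `pipeline` API boils a whole pretrained workflow down to a couple of lines:

```python
from transformers import pipeline

# With no model specified, pipeline() downloads a default
# sentiment-analysis checkpoint from the Hub.
classifier = pipeline("sentiment-analysis")
print(classifier("I love the Hugging Face course!"))
# e.g. [{'label': 'POSITIVE', 'score': 0.999...}]
```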
Published my first notebook on the modelhub dataset. Found a couple of interesting insights; a README analysis attempt is pending.
Hi All, I’ve also written a post on publishing a dataset.
The live sessions of the course are super-duper amazing (the course itself is 10x, the live speakers make it 100x; only comparable so far to the fastai sessions in my experience). Thank you everyone for that. I’m finally feeling that by fall, as a result of such high-quality livestreams, Transformers will be transparent to me, as opposed to something I use only as a pretrained plug-and-play black box.
More learning-along-the-way projects to follow
this is great stuff @dk-crazydiv - thanks for sharing!
Wrote an introductory article to use the Hugging Face Datasets library for your next NLP project (published on TDS).
I made a small project: I fine-tuned a GPT model on my own Telegram chat.
To be precise, I used Grossmend’s model, ruDialoGPT3, a DialoGPT for the Russian language (trained on forums). The model page provided enough info for me to fine-tune it!
I exported my own Telegram chat (in Russian) and fine-tuned the model on it (for 3 epochs on 30 MB of dialogues). After that I uploaded the model and created a model page on the Hub as described in the videos.
Even though it is in Russian, I believe you can still check it out and take some inspiration:
It works great! It uses phrases that are highly specific to my style of writing, and that’s cool!
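If you want to play with a model like this, here's a minimal generation sketch; the repo id is my guess from the post, and the model card describes the exact dialogue input format the model expects, so treat this plain prompt as a smoke test:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Repo id assumed from the post; check the Hub page if it differs.
name = "Grossmend/rudialogpt3_medium_based_on_gpt2"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name)

# A bare prompt ("Hi! How are you?") as a quick smoke test.
inputs = tokenizer("Привет! Как дела?", return_tensors="pt")
outputs = model.generate(**inputs, max_length=64, do_sample=True,
                         top_p=0.95, pad_token_id=tokenizer.eos_token_id)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```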
I’m sorry I can’t edit the last message so I’m writing a new one:
I’ve made a Spaces demo using Gradio for my project:
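For anyone who hasn’t tried Spaces yet, a Gradio demo is only a few lines; the `reply` function below is a hypothetical stand-in for the actual model call:

```python
import gradio as gr

def reply(message: str) -> str:
    # Hypothetical stand-in: in the real demo this would tokenize the
    # message and call model.generate(...) on the fine-tuned chatbot.
    return f"echo: {message}"

demo = gr.Interface(fn=reply, inputs="text", outputs="text",
                    title="Telegram-chat GPT demo")
demo.launch()
```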
I finished the first section of the course, and as practice with the concepts learnt, I fine-tuned the model nlptown/bert-base-multilingual-uncased-sentiment (a multilingual model for sentiment analysis on product reviews) on the muchocine dataset, so now it is fine-tuned for sentiment analysis on movie reviews in Spanish. Here is the Colab for the task. Thanks for the great course!
Edit: I forgot to add the page of the model
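Congrats! For anyone who wants to try something similar, here's a rough sketch with the Trainer API; the muchocine column names (`review_body`, `star_rating`) are assumed from the dataset card, so double-check them:

```python
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

checkpoint = "nlptown/bert-base-multilingual-uncased-sentiment"  # 5 labels (1-5 stars)
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(checkpoint)

raw = load_dataset("muchocine")  # Spanish movie reviews

def preprocess(batch):
    enc = tokenizer(batch["review_body"], truncation=True)
    enc["labels"] = [stars - 1 for stars in batch["star_rating"]]  # 1-5 stars -> 0-4 ids
    return enc

tokenized = raw.map(preprocess, batched=True,
                    remove_columns=raw["train"].column_names)

args = TrainingArguments("bert-muchocine", num_train_epochs=2,
                         per_device_train_batch_size=8)
trainer = Trainer(model=model, args=args,
                  train_dataset=tokenized["train"], tokenizer=tokenizer)
trainer.train()
```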