After following the first section of the course, you should be able to fine-tune a model on a text classification problem and upload it back to the Hub. Share your creations here and if you build a cool app using your model, please let us know!
It’s not exactly a project, but I’m super excited to share my first public Kaggle dataset
With help from the good folks at HF, I was able to query the metadata available on the Model Hub and upload it as a Kaggle dataset.
It should be helpful to anyone looking to analyze the metadata of publicly available models or build EDA/text-processing notebooks on it. The dataset also contains the README model card data.
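The query step was roughly along these lines (a sketch, not my exact script) using the `huggingface_hub` client library:

```python
from huggingface_hub import HfApi

api = HfApi()

# Fetch metadata for a sample of public models; attribute names follow
# current releases of huggingface_hub and may differ in older versions.
for model in api.list_models(limit=10):
    print(model.id, model.tags)
```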
Please have a look and provide feedback.
@dk-crazydiv this is very cool! Would you like to add it as a HF dataset as well?
Here is the process in case you’re interested: Sharing your dataset — datasets 1.8.0 documentation
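For completeness, newer releases of datasets also offer a shorter route than the loading-script flow that guide describes; a minimal sketch, with hypothetical file and repo names:

```python
from datasets import Dataset
import pandas as pd

# Hypothetical CSV exported from the Kaggle dataset.
df = pd.read_csv("modelhub_metadata.csv")
ds = Dataset.from_pandas(df)

# Requires being logged in via `huggingface-cli login` first.
ds.push_to_hub("your-username/modelhub-metadata")  # hypothetical repo id
```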
Yes. I was thinking of the following:
- HF modelhub metadata as a HF dataset
- HF datasets metadata as a HF dataset
- HF datasets metadata as a Kaggle dataset
This should complete the inception loop. Will update with progress soon.
great, looking forward to those!
Hi Everyone,
I’ve uploaded the modelhub data to HF datasets. Please provide feedback.
The documentation guide for creating and sharing a dataset was very good, informative, and helpful.
I faced a couple of issues (most of which I overcame) while porting the data from the Kaggle-style format to the datasets library:
- I couldn’t find any `datetime` object in Features, and I saw a couple of other datasets using `string` for dates as well.
- Since I chose to share it as “community provided”, I had pip-installed the datasets library, and some of the `datasets-cli` commands in the doc that expect a cloned datasets repo didn’t work smoothly with relative paths, but did work with absolute paths.
- The `Explore dataset` button on the hub isn’t working on my dataset. Is this because it’s a community-provided dataset?
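For reference, the string workaround from the first point looks something like this; the field names here are illustrative, not the dataset’s actual schema:

```python
from datasets import Features, Value

# No datetime feature type was available, so timestamps are declared
# as strings (e.g. ISO-8601) and parsed downstream when needed.
features = Features({
    "modelId": Value("string"),
    "downloads": Value("int64"),
    "lastModified": Value("string"),  # timestamp stored as a string
})
```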
I wrote a beginner-friendly introduction to Hugging Face’s `pipeline` API on Kaggle:
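For anyone who hasn’t seen it yet, the API the notebook is built around really is this short; the first run downloads a default model:

```python
from transformers import pipeline

# A ready-made sentiment-analysis pipeline with a default checkpoint.
classifier = pipeline("sentiment-analysis")
print(classifier("This course is super-duper-amazing!"))
# e.g. [{'label': 'POSITIVE', 'score': 0.999...}]
```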
Published the first notebook on the modelhub dataset. Found a couple of interesting insights; a README analysis attempt is still pending.
Hi All, I’ve also written a post on publishing a dataset.
The live sessions of the course are super-duper amazing (the course itself is 10x; the live speakers make it 100x, comparable so far only to the fastai sessions in my experience). Thank you everyone for that. I finally feel that by fall, as a result of such high-quality livestreams, transformers will be transparent to me, rather than something I use only as a pretrained plug-and-play black box.
More learning-along-the-way projects to follow
this is great stuff @dk-crazydiv - thanks for sharing!
Wrote an introductory article on using the Hugging Face Datasets library for your next NLP project (published on TDS).
Hi, everyone!
I made a small project: fine-tuned a GPT model on my own telegram chat.
To be precise, I used Grossmend’s model, ruDialoGPT3, a DialoGPT for the Russian language (trained on forums). The model page provided enough info for me to fine-tune it!
I exported my own Telegram chat (in Russian) and fine-tuned the model on it (for 3 epochs on 30 MB of dialogues). After that I uploaded the model and created a model page on the Hub as described in the videos.
Even though it is in Russian, I believe you can still check it out and take some inspiration:
It works great! It uses phrases that are highly specific to my style of writing and that’s cool!
P.S. It turned out to be pretty easy, and I liked it so much that I decided to create a ready-to-use Colab notebook and a GitHub repository so anyone can fine-tune DialoGPT on exported Telegram chats!
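The preprocessing was roughly along these lines (a sketch, not my exact code), assuming the standard `result.json` layout that Telegram Desktop exports; file names are illustrative:

```python
import json

# Telegram Desktop exports chats as result.json with a "messages" list.
with open("result.json", encoding="utf-8") as f:
    export = json.load(f)

lines = []
for msg in export["messages"]:
    text = msg.get("text")
    # Skip media/service messages, whose "text" is empty or a list.
    if isinstance(text, str) and text:
        lines.append(text)

# One utterance per line, to be joined with an EOS separator later.
with open("dialogues.txt", "w", encoding="utf-8") as f:
    f.write("\n".join(lines))
```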
I’m sorry, I can’t edit the last message, so I’m writing a new one:
I’ve made a Spaces demo using Gradio for my project:
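The core of it is just a few lines of Gradio; here is a minimal sketch, with a public DialoGPT checkpoint standing in for my actual fine-tuned model:

```python
import gradio as gr
from transformers import pipeline

# Stand-in checkpoint; swap in your own fine-tuned model id.
chat = pipeline("text-generation", model="microsoft/DialoGPT-small")

def respond(message):
    # Generate a short continuation of the user's message.
    return chat(message, max_new_tokens=50)[0]["generated_text"]

gr.Interface(fn=respond, inputs="text", outputs="text").launch()
```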
Hey everyone,
I finished the first section of the course, and as practice of the concepts learned I fine-tuned the nlptown/bert-base-multilingual-uncased-sentiment model (a multilingual model for sentiment analysis on product reviews) on the muchocine dataset, so now it is fine-tuned for sentiment analysis on movie reviews in Spanish. Here is the Colab for the task. Thanks for the great course!
Edit: I forgot to add the page of the model
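For anyone wanting to reproduce it, the training loop was roughly the chapter-3 recipe; this is a sketch, and the muchocine column names ("review_body", "star_rating") are assumptions to check against the actual schema:

```python
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification,
                          AutoTokenizer, Trainer, TrainingArguments)

ckpt = "nlptown/bert-base-multilingual-uncased-sentiment"
tokenizer = AutoTokenizer.from_pretrained(ckpt)
model = AutoModelForSequenceClassification.from_pretrained(ckpt)

raw = load_dataset("muchocine", split="train")

def preprocess(batch):
    enc = tokenizer(batch["review_body"], truncation=True)
    # The checkpoint predicts 1-5 stars as labels 0-4.
    enc["labels"] = [r - 1 for r in batch["star_rating"]]
    return enc

tokenized = raw.map(preprocess, batched=True, remove_columns=raw.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments("muchocine-sentiment", num_train_epochs=1),
    train_dataset=tokenized,
    tokenizer=tokenizer,  # enables dynamic padding via the default collator
)
trainer.train()
```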
Hello everyone,
I’ve successfully completed Part 1 of the course and fine-tuned a model for the Movie Genre Prediction competition. You can check out my blog post where I detail my journey and findings related to this project: Myblog - Movie Genre Predictions with Hugging Face Transformers
Hey!
I’ve also completed part 1 and used a pre-trained (Brazilian) Portuguese transformer model to perform text classification on a (European Portuguese) dataset with 22 classes. The results were not bad, but I intend to improve them. I started the course while working on a project for a company, so I was learning and applying it straight away! Also, I would like to train a European Portuguese model, since I did not find any (just Brazilian Portuguese) – but that’s for another episode
Thanks a lot!!
I finished the first part of the course and fine-tuned a pre-trained XLM-RoBERTa model on the shmuhammad/AfriSenti-twitter-sentiment dataset, focusing on Yoruba tweets (my local language). It performs sentiment classification on Yoruba tweets and can also serve as a base model for anyone interested in Yoruba-language work. Here is the link
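If you want to try it, the Yoruba subset can be loaded via a dataset config; "yor" is my assumption for how the language configs are labeled, so check the dataset page:

```python
from datasets import load_dataset

# Load only the (assumed) Yoruba configuration of the dataset.
yoruba = load_dataset("shmuhammad/AfriSenti-twitter-sentiment", "yor")
print(yoruba["train"][0])
```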
I fine-tuned a model to classify reviews as positive or negative by training it on a Yelp reviews dataset. I wrote a blog post explaining how I did it.