Share your projects!

After following the first section of the course, you should be able to fine-tune a model on a text classification problem and upload it back to the Hub. Share your creations here and if you build a cool app using your model, please let us know!

7 Likes

It’s not exactly a project, but I’m super excited to share my first public Kaggle dataset

With help from the good folks at HF, I was able to query the metadata available on the Model Hub and upload it as a Kaggle dataset.

It should be helpful to anyone looking to analyze the metadata of publicly available models and build EDA/text-processing notebooks on it. The dataset contains the README model card data as well.
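
For anyone curious, here is a minimal sketch of the kind of query involved, using the huggingface_hub client; the fields and the limit below are illustrative choices, not the exact script behind the dataset.

```python
# Minimal sketch: pull Model Hub metadata with huggingface_hub and dump it to CSV
# for a Kaggle upload. Field names and the limit are illustrative assumptions.
import pandas as pd
from huggingface_hub import HfApi

api = HfApi()
models = api.list_models(full=True, limit=1000)  # iterator of ModelInfo objects

rows = [
    {
        "model_id": m.id,
        "downloads": getattr(m, "downloads", None),
        "pipeline_tag": getattr(m, "pipeline_tag", None),
        "tags": ",".join(m.tags or []),
    }
    for m in models
]

pd.DataFrame(rows).to_csv("modelhub_metadata.csv", index=False)
```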

Please have a look and provide feedback. :slight_smile:

7 Likes

@dk-crazydiv this is very cool! Would you like to add it as a HF dataset as well?

Here is the process in case you’re interested: Sharing your dataset — datasets 1.8.0 documentation

2 Likes

Yes. I was thinking of the following:

  • HF modelhub metadata as a HF dataset
  • HF datasets metadata as a HF dataset
  • HF datasets metadata as a Kaggle dataset

This should complete the inception loop. :smiley: Will update with progress soon.

2 Likes

great, looking forward to those!

Hi Everyone,

I’ve uploaded the Model Hub data to HF datasets. Please provide feedback. :slight_smile:

The documentation guide on creating and sharing a dataset was very good, informative, and helpful.

I faced a couple of issues (most of which I overcame) while porting the data from the Kaggle-style format to the datasets library:

  • I couldn’t find any datetime type in Features, and I saw a couple of other datasets using strings for dates as well (see the small sketch after this list).
  • Since I chose to share it as “community provided”, I had pip-installed the datasets library, and some of the datasets-cli commands in the docs that expected a cloned datasets repo didn’t work smoothly with relative paths, though they did work with absolute paths.
  • The Explore dataset button on the Hub isn’t working for my dataset. Is this because it’s a community-provided dataset?
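
On the first point, a rough sketch of what I mean by keeping datetimes as plain strings in the Features schema (the column names here are just examples):

```python
# Rough sketch: a datetime column stored as a plain string in the Features schema,
# since the datasets library has no dedicated datetime type. Columns are illustrative.
from datasets import Dataset, Features, Value

features = Features({
    "modelId": Value("string"),
    "lastModified": Value("string"),  # datetime kept as an ISO-8601 string
    "downloads": Value("int64"),
})

ds = Dataset.from_dict(
    {
        "modelId": ["bert-base-uncased"],
        "lastModified": ["2021-05-18T16:20:13.000Z"],
        "downloads": [123456],
    },
    features=features,
)
print(ds.features)
```
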
3 Likes

I wrote a beginner-friendly introduction to Hugging Face’s pipeline API on Kaggle.
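
For anyone who hasn’t tried it yet, the core idea fits in a few lines; the default sentiment-analysis checkpoint below is just an illustration:

```python
# Minimal pipeline example: the task string picks a default checkpoint if none is given.
from transformers import pipeline

classifier = pipeline("sentiment-analysis")
print(classifier("I love the Hugging Face course!"))
# e.g. [{'label': 'POSITIVE', 'score': 0.99...}]
```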

5 Likes

Published the first notebook on the Model Hub dataset. Found a couple of interesting insights. :slight_smile: A README analysis attempt is pending.

3 Likes

Hi All, I’ve also written a post on publishing a dataset.

The live sessions of the course are super-duper amazing (the course itself is 10x, the live speakers make it 100x; the only thing comparable so far, in my experience, is the fastai sessions). Thank you everyone for that. I finally feel that by fall, as a result of such high-quality livestreams, transformers will be transparent to me, as opposed to my using them only as a pretrained plug-and-play black box.

More learning-along-the-way projects to follow :slight_smile:

3 Likes

this is great stuff @dk-crazydiv - thanks for sharing!

Wrote an introductory article on using the Hugging Face Datasets library for your next NLP project (published on TDS).
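
As a taste of the workflow covered, loading a dataset takes just a couple of lines; the dataset name here is only an example:

```python
# Quick illustration of the Datasets workflow; "imdb" is just an example dataset.
from datasets import load_dataset

dataset = load_dataset("imdb", split="train")
print(dataset[0]["text"][:100], dataset[0]["label"])
```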

3 Likes

Hi, everyone!

I made a small project: I fine-tuned a GPT model on my own Telegram chat.
To be precise, I used Grossmend’s model, ruDialoGPT3, a DialoGPT for the Russian language (trained on forums). The model page provided enough info for me to fine-tune it!
I exported my own Telegram chat (in Russian) and fine-tuned the model on it (for 3 epochs on 30 MB of dialogues). After that, I uploaded and created a model on the :hugs: Hub as described in the videos.
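
Roughly, such a fine-tune can be set up like the sketch below; the checkpoint id, file name, hyperparameters, and repo name are illustrative assumptions, not my exact settings.

```python
# Rough sketch of fine-tuning a DialoGPT-style checkpoint on exported chat text.
# Checkpoint id, file name, hyperparameters, and repo names are assumptions.
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, TextDataset,
                          Trainer, TrainingArguments)

checkpoint = "Grossmend/rudialogpt3_medium_based_on_gpt2"  # assumed model id
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint)

# chat.txt: the exported Telegram dialogues flattened into plain text
train_dataset = TextDataset(tokenizer=tokenizer, file_path="chat.txt", block_size=128)
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)

args = TrainingArguments(output_dir="telegram-dialogpt",
                         num_train_epochs=3,
                         per_device_train_batch_size=2)
Trainer(model=model, args=args, data_collator=collator,
        train_dataset=train_dataset).train()

model.push_to_hub("telegram-dialogpt")       # upload to the Hub
tokenizer.push_to_hub("telegram-dialogpt")
```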

Even though it is in Russian, I believe you can still check it out and take some inspiration:

It works great! It uses phrases that are highly specific to my style of writing and that’s cool!

1 Like

P.S. It turned out to be pretty easy, and I liked it so much that I decided to create a ready-to-use Colab and a GitHub repository so anyone can fine-tune DialoGPT on exported Telegram chats!

3 Likes

I’m sorry, I can’t edit the last message, so I’m writing a new one:
I’ve made a :hugs: Spaces demo using Gradio for my project:
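
For reference, a Gradio demo for a model like this boils down to a few lines; the model id below is only a placeholder:

```python
# Minimal Gradio sketch for a chat-style demo; the model id is a placeholder.
import gradio as gr
from transformers import pipeline

generator = pipeline("text-generation", model="username/telegram-dialogpt")  # hypothetical id

def reply(prompt):
    return generator(prompt, max_new_tokens=50)[0]["generated_text"]

gr.Interface(fn=reply, inputs="text", outputs="text").launch()
```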

2 Likes

Hey everyone,

I finished the first section of the course, and as practice of the concepts learned I fine-tuned the model nlptown/bert-base-multilingual-uncased-sentiment (a multilingual model for sentiment analysis on product reviews) on the muchocine dataset, so now it is fine-tuned for sentiment analysis on movie reviews in Spanish. Here is the Colab for the task. Thanks for the great course!
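
In outline, the setup looks roughly like the sketch below; the muchocine column names and the 1–5 to 0–4 label shift are assumptions based on the dataset card, so double-check them before running.

```python
# Condensed sketch of a fine-tune like the one described above. The muchocine
# column names ("review_body", "star_rating") and the label shift are assumptions.
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

checkpoint = "nlptown/bert-base-multilingual-uncased-sentiment"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(checkpoint)  # 5-star head

dataset = load_dataset("muchocine")

def preprocess(batch):
    tokens = tokenizer(batch["review_body"], truncation=True)
    tokens["labels"] = [rating - 1 for rating in batch["star_rating"]]  # 1-5 -> 0-4
    return tokens

tokenized = dataset.map(preprocess, batched=True)

args = TrainingArguments("bert-muchocine-sentiment", num_train_epochs=3)
Trainer(model=model, args=args, train_dataset=tokenized["train"],
        tokenizer=tokenizer).train()
```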

Edit: I forgot to add the model page

2 Likes

Hello everyone,

I’ve successfully completed Part 1 of the course and fine-tuned a model for the Movie Genre Prediction competition. You can check out my blog post where I detail my journey and findings related to this project: Myblog - Movie Genre Predictions with Hugging Face Transformers

Hey! :slight_smile:

I’ve also completed part 1 and used a pre-trained (Brazilian) Portuguese transformer model to perform text classification on a (European Portuguese) dataset with 22 classes. The results were not bad, but I intend to improve them. I started the course while working on a project for a company, so I was learning and applying it straight away! Also, I would like to train a European Portuguese model, since I did not find any (just Brazilian Portuguese) – but that’s for another episode :smiley:
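
For the multi-class part, the key detail is just asking for the right number of output labels when loading the classification head; the checkpoint below (BERTimbau) is only an example of a Brazilian Portuguese model, not necessarily the one I used.

```python
# Sketch of loading a Portuguese checkpoint with a fresh 22-class head.
# "neuralmind/bert-base-portuguese-cased" (BERTimbau) is an example checkpoint.
from transformers import AutoModelForSequenceClassification, AutoTokenizer

checkpoint = "neuralmind/bert-base-portuguese-cased"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=22)
```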

Thanks a lot!!

I finished the first part of the course and fine-tuned a pre-trained xlm-roberta model on the shmuhammad/AfriSenti-twitter-sentiment dataset, focusing on Yoruba tweets (my local language). It performs sentiment classification on Yoruba tweets and can also serve as a base model for anyone interested in Yoruba language work. Here is the link
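
For anyone who wants to reproduce something similar, a compact sketch follows; the “yor” config name and the “tweet”/“label” columns are assumptions taken from the dataset card, so verify them first.

```python
# Compact sketch of fine-tuning XLM-R on the Yoruba split of AfriSenti.
# The "yor" config name and the "tweet"/"label" columns are assumptions.
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

checkpoint = "xlm-roberta-base"
dataset = load_dataset("shmuhammad/AfriSenti-twitter-sentiment", "yor")

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=3)

def tokenize(batch):
    return tokenizer(batch["tweet"], truncation=True)

tokenized = dataset.map(tokenize, batched=True)

args = TrainingArguments("xlmr-yoruba-sentiment", num_train_epochs=3)
Trainer(model=model, args=args, train_dataset=tokenized["train"],
        tokenizer=tokenizer).train()
```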