German NLP Repository

Sahajtomar · March 2, 2021, 8:48pm

I am an MSc student at the University of Siegen with a keen interest in training NLP models specifically for the German language. It would be great to meet people who want to contribute German models or datasets. So far I have trained and shared 3 German models: question answering models and an NER model for the legal domain.
Feel free to experiment and share your feedback.

More models to come

dikster99 · March 3, 2021, 5:41pm

Hi @Sahajtomar,

I am interested in German/English models that are useful for sentiment analysis and text classification in general (topic detection). I have labeled datasets for both tasks.

Can you recommend any available model in the repository?

I am facing a blocking issue on a multi-label text classification task (described in a different issue item). Would you happen to know how this could be implemented with the current version of Transformers? There seem to be some breaking changes around version 3, which make it difficult to find a working sample…

Thanx Dirk
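
On the multi-label question, here is a minimal sketch. It assumes a recent PyTorch release of Transformers in which sequence classification models accept `problem_type="multi_label_classification"`; whether the TensorFlow classes honour that setting is not confirmed here. The label names and helper functions are hypothetical. The idea is that each sample gets a float multi-hot target, and each label is decided by its own sigmoid rather than by a shared softmax.

```python
import math


def multi_hot(sample_labels, label2id):
    """Encode one sample's label names as a float multi-hot vector."""
    vec = [0.0] * len(label2id)
    for name in sample_labels:
        vec[label2id[name]] = 1.0
    return vec


def decode(logits, id2label, threshold=0.5):
    """Apply an independent sigmoid per label and keep labels above the threshold."""
    probs = [1.0 / (1.0 + math.exp(-z)) for z in logits]
    return [id2label[i] for i, p in enumerate(probs) if p >= threshold]


def load_multilabel_model(checkpoint, label2id):
    """Assumption: the installed Transformers release supports `problem_type`."""
    # Imported lazily so the helpers above run without Transformers installed.
    from transformers import AutoModelForSequenceClassification

    return AutoModelForSequenceClassification.from_pretrained(
        checkpoint,
        num_labels=len(label2id),
        problem_type="multi_label_classification",
    )


# Hypothetical label set, for illustration only.
label2id = {"politics": 0, "sports": 1, "economy": 2}
id2label = {i: name for name, i in label2id.items()}

print(multi_hot(["sports", "economy"], label2id))  # [0.0, 1.0, 1.0]
print(decode([-2.0, 3.0, 0.1], id2label))          # ['sports', 'economy']
```

The multi-hot targets would be passed as float `labels` during training, and `decode` would be applied to the model's logits at inference time.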

Sahajtomar · March 3, 2021, 6:33pm

Hey,
Sure… there are two options:
1) There are multilingual models such as Universal Sentence Encoder or Sentence-BERT / RoBERTa models, which you can use to get embeddings and then train a simple ML model on top of them.
2) There are also zero-shot classification models trained on XNLI datasets, which you can use directly. Kindly see this model

I am also training a model on the NLI task specifically for German; as soon as I am done I will upload it. Enjoy
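
The zero-shot option can be sketched roughly as follows. An NLI-based zero-shot classifier pairs the input text (the premise) with one hypothesis per candidate label and ranks the labels by how strongly each hypothesis is entailed. `MODEL_NAME` is a placeholder because the linked model is not named in this thread, and the German hypothesis template and example labels are assumptions for illustration.

```python
def build_nli_pairs(text, candidate_labels, hypothesis_template):
    """Pair the text (premise) with one hypothesis per candidate label."""
    return [(text, hypothesis_template.format(label)) for label in candidate_labels]


def classify(text, candidate_labels, model_name, hypothesis_template):
    """Run the Transformers zero-shot pipeline (needs the library and a model download)."""
    # Imported lazily so build_nli_pairs can be tried without Transformers installed.
    from transformers import pipeline

    classifier = pipeline("zero-shot-classification", model=model_name)
    return classifier(text, candidate_labels, hypothesis_template=hypothesis_template)


MODEL_NAME = "<zero-shot-model-id>"           # placeholder, not a real model id
TEMPLATE = "In diesem Text geht es um {}."    # assumed German hypothesis template
TEXT = "Der Aktienmarkt ist heute stark gefallen."
LABELS = ["Wirtschaft", "Sport", "Politik"]   # hypothetical topics

# Shows what the NLI model is asked to judge for each label.
for premise, hypothesis in build_nli_pairs(TEXT, LABELS, TEMPLATE):
    print(premise, "->", hypothesis)
```

Calling `classify(TEXT, LABELS, MODEL_NAME, TEMPLATE)` with a real model id would return the labels sorted by score; no training data is needed, which is why this route suits a quick first baseline.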

dikster99 · March 15, 2021, 9:20am

Hi,

I am actually looking for a TensorFlow model that supports German and English. Right now I am looking at bert-base-multilingual-uncased-sentiment, but it seems to have problems when I load it as TFBertForSequenceClassification or TFAutoModelForSequenceClassification (?). I get an exception:

cannot reshape array of size 3840 into shape (768,20)

when I execute:
tf_model = TFAutoModelForSequenceClassification.from_pretrained(
    mydrivePath,
    label2id=label2index,
    id2label=index2label,
    from_pt=True,
)

(where label2index and index2label are just dictionaries that map categories to ids and back).

Anyway, if you happen to know a good TensorFlow model that can be used for text classification in English and German, I would appreciate the hint.

Thanx Dirk
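
One plausible reading of the reshape error, sketched below: 3840 is exactly 768 × 5, so the saved classification head appears to have 5 output labels, while the passed dictionaries request 20. That would be consistent with a five-class sentiment head. The loader at the end is an assumption: `ignore_mismatched_sizes` is only available in some Transformers releases, and its behaviour together with `from_pt=True` has not been verified here.

```python
def labels_in_checkpoint(flat_weight_size, hidden_size):
    """Infer how many output labels a saved classifier weight was built for."""
    if flat_weight_size % hidden_size != 0:
        raise ValueError("weight size is not a multiple of the hidden size")
    return flat_weight_size // hidden_size


SAVED_WEIGHT_SIZE = 3840   # "array of size 3840" from the error message
HIDDEN_SIZE = 768          # first dimension of the requested shape (768, 20)
REQUESTED_LABELS = 20      # second dimension, i.e. the size of the label dictionaries

saved_labels = labels_in_checkpoint(SAVED_WEIGHT_SIZE, HIDDEN_SIZE)
print(saved_labels)                      # 5
print(saved_labels == REQUESTED_LABELS)  # False: the head sizes disagree


def load_with_new_head(path, label2id, id2label):
    """Assumption: this Transformers release accepts `ignore_mismatched_sizes`."""
    # Imported lazily so the arithmetic above runs without TensorFlow installed.
    from transformers import TFAutoModelForSequenceClassification

    return TFAutoModelForSequenceClassification.from_pretrained(
        path,
        num_labels=len(label2id),
        label2id=label2id,
        id2label=id2label,
        from_pt=True,
        ignore_mismatched_sizes=True,  # discard the saved head, initialise a new one
    )
```

If the head is re-initialised this way, it starts from random weights and needs fine-tuning on the labeled data before its predictions mean anything.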

patrickvonplaten · March 17, 2021, 6:19pm

Hey, I’m Patrick, a German Research Engineer at Hugging Face. I will be joining the “Wav2Vec2 Fine-tuning week” starting on Monday next week - see: [Open-to-the-community] XLSR-Wav2Vec2 Fine-Tuning Week for Low-Resource Languages - #14 by ayameRushia.

If you want to participate and have any questions regarding fine-tuning a German speech recognition model, feel free to ping me here.

Sahajtomar · March 17, 2021, 9:06pm

Hey @patrickvonplaten, I am facing issues with Colab disk space. The uncompressed data for German is 22 GB. What are other options for training on large datasets?

patrickvonplaten · March 18, 2021, 6:52am

Hey Sahajtomar,

One option would be to use Colab Pro, but we are currently trying to organize more GPU & RAM compute for you guys.
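
Independent of the compute options above, one general way to reduce disk pressure is to read a compressed archive as a stream instead of unpacking it first. The sketch below uses only the Python standard library; the archive layout and file names are hypothetical stand-ins, not the actual dataset format.

```python
import io
import tarfile


def iter_archive(fileobj):
    """Yield (name, bytes) for each regular file in a gzip-compressed tar stream."""
    # Mode "r|gz" reads the archive sequentially, so nothing is extracted to disk.
    with tarfile.open(fileobj=fileobj, mode="r|gz") as archive:
        for member in archive:
            if member.isfile():
                yield member.name, archive.extractfile(member).read()


def make_demo_archive():
    """Build a tiny in-memory archive standing in for the real dataset."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as archive:
        for name, payload in [("clip_1.txt", b"hallo"), ("clip_2.txt", b"welt")]:
            info = tarfile.TarInfo(name)
            info.size = len(payload)
            archive.addfile(info, io.BytesIO(payload))
    buffer.seek(0)
    return buffer


for name, data in iter_archive(make_demo_archive()):
    print(name, len(data))
```

With a real dataset, `fileobj` would be an open handle on the compressed file, and each yielded item would be preprocessed immediately so that only the compressed archive has to fit on disk.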

sasaadi · October 12, 2021, 11:17am

Hi @Sahajtomar,

Thanks for sharing your trained German models. I would like to know if it would be possible to add a license file with details to your German zero-shot model?

Thanks.

Sahajtomar · October 12, 2021, 11:30am

Hi,
Could you let me know what a license file is?

sasaadi · October 15, 2021, 11:54am

By license file I mean a file that describes the terms and conditions under which the published model can be used. For example, the BERT model is published under the following license: bert/LICENSE at master · google-research/bert · GitHub

Thanks.

albrecht · November 19, 2021, 7:21pm

Hey @Sahajtomar! Great job, amazing results, thanks for sharing your work with the community! I was wondering if we could use your model, and I would really appreciate it if you could mention the license under which the model can be used.
Thanks!

chandrantwins · November 21, 2023, 11:40am

Hi @Sahajtomar,

Do you know of any model that supports both English and German for table question answering?

My requirement is: if anybody asks

How many customers are there?
Wie viele Kunden sind da?

I have to check the customers table to identify how many records are in the database…

I am quite new to NLP and ML, too.

Thanks
Chandran
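
To make the requirement concrete, here is a deliberately simple keyword baseline, not a trained model: it maps a counting question in English or German to a `COUNT(*)` query over a known table. The keyword map, table schema, and function names are all hypothetical. Transformers also offers a `table-question-answering` pipeline; whether any available checkpoint handles German well is not established here.

```python
import sqlite3

# Hypothetical keyword map: English and German surface forms -> table name.
TABLE_KEYWORDS = {
    "customers": "customers",
    "customer": "customers",
    "kunden": "customers",
    "kunde": "customers",
}
COUNT_PHRASES = ("how many", "wie viele")


def question_to_sql(question):
    """Translate a counting question into SQL, or return None if not understood."""
    text = question.lower().strip(" ?")
    if not any(phrase in text for phrase in COUNT_PHRASES):
        return None
    for token in text.split():
        table = TABLE_KEYWORDS.get(token)
        if table:
            # The table name comes from the fixed map above, never from user input.
            return f"SELECT COUNT(*) FROM {table}"
    return None


def answer(conn, question):
    """Run the generated query and return the record count, or None."""
    sql = question_to_sql(question)
    return None if sql is None else conn.execute(sql).fetchone()[0]


# Demo database with three made-up customers.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO customers (name) VALUES (?)", [("Anna",), ("Ben",), ("Clara",)]
)

print(answer(conn, "How many customers are there?"))  # 3
print(answer(conn, "Wie viele Kunden sind da?"))      # 3
```

A baseline like this only covers phrasings listed in the map, which is the gap a bilingual model would be expected to close.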
