Fundamental newbie questions

I'm new to NLP / transformers. I tried some examples and it is awesome. Love it! Great work.

I am trying to create a Q&A system to answer questions from a corpus of PDF documents in English.

Questions where I need help correcting my understanding:

  1. Is there any example of fine-tuning a pre-trained model on a custom dataset built from PDF documents? My understanding is that I need to start from a model pre-trained for QA and then fine-tune it with questions and answers from my own corpus to increase accuracy.

  2. I need more info on pipelines, especially text2text-generation. How do I see which model and parameters are used behind the abstraction? In Python, how do I access the metadata of the model a pipeline is using? I love the abstraction but would also like control over tweaking the parameters the pipeline uses.

  3. What's the best way to save models in the cloud so that they can be pointed to for inference instead of being downloaded?

Again, awesome work!

For 1, your use case is a little specific, so there is no example of exactly that in our examples. Otherwise, all our examples are in the examples folder of the repo, and the documentation has a tutorial on how to fine-tune a model on a custom dataset.
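To make the fine-tuning idea concrete, here is a minimal sketch, not an official example: the checkpoint name, the toy example list, and the bare training loop are all illustrative assumptions. In practice you would extract many (question, context, answer) triples from your PDF corpus and train for more steps.

```python
# A minimal extractive-QA fine-tuning sketch. The checkpoint name and the
# toy data are assumptions for illustration only.
import torch
from transformers import AutoTokenizer, AutoModelForQuestionAnswering

checkpoint = "distilbert-base-cased-distilled-squad"  # example QA checkpoint
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForQuestionAnswering.from_pretrained(checkpoint)

# Your own data: the answer must appear verbatim in the context.
examples = [
    {"question": "What is the invoice total?",
     "context": "The invoice total is 1,200 USD, due on receipt.",
     "answer": "1,200 USD"},
]

def encode(ex):
    enc = tokenizer(ex["question"], ex["context"], truncation=True,
                    padding="max_length", max_length=128,
                    return_offsets_mapping=True)
    # Map the character span of the answer to start/end token positions.
    start_char = ex["context"].find(ex["answer"])
    end_char = start_char + len(ex["answer"])
    offsets = enc.pop("offset_mapping")
    sequence_ids = enc.sequence_ids()
    start_tok = end_tok = 0
    for i, ((off_start, off_end), sid) in enumerate(zip(offsets, sequence_ids)):
        if sid != 1:          # only consider tokens from the context
            continue
        if off_start <= start_char < off_end:
            start_tok = i
        if off_start < end_char <= off_end:
            end_tok = i
    enc["start_positions"] = start_tok
    enc["end_positions"] = end_tok
    return {k: torch.tensor(v) for k, v in enc.items()}

batches = [encode(ex) for ex in examples]

optimizer = torch.optim.AdamW(model.parameters(), lr=3e-5)
model.train()
for epoch in range(2):
    for enc in batches:
        batch = {k: v.unsqueeze(0) for k, v in enc.items()}  # batch size 1
        loss = model(**batch).loss
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
        print(f"epoch {epoch} loss {loss.item():.4f}")
```

The Trainer class and the datasets library make this much less manual for real corpora; the loop above just shows what a fine-tuning step boils down to.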

For 2, if you need more control, you should use the tokenizer/model directly instead of the pipeline API. The task summary tutorial shows examples of both approaches for most tasks supported by the library.
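As a sketch of both approaches (using "t5-small" purely as an example checkpoint; passing an explicit model also avoids relying on the pipeline's default): the pipeline object exposes its underlying model and tokenizer, and generation parameters can either be passed through the pipeline call or controlled directly via the model.

```python
from transformers import pipeline, AutoTokenizer, AutoModelForSeq2SeqLM

# The pipeline exposes the model and tokenizer it loaded.
pipe = pipeline("text2text-generation", model="t5-small")
print(pipe.model.name_or_path)   # which checkpoint is in use
print(pipe.model.config)         # full model configuration
print(pipe.tokenizer)

# Generation parameters can be passed straight through the call.
print(pipe("translate English to German: Hello world",
           max_length=40, num_beams=4))

# For full control, use the tokenizer/model pair directly.
tokenizer = AutoTokenizer.from_pretrained("t5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")
inputs = tokenizer("translate English to German: Hello world",
                   return_tensors="pt")
output_ids = model.generate(**inputs, max_length=40, num_beams=4)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```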

For 3, you can upload models to our model hub. There is a (paid) inference API to use them directly without downloading them.
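For the upload step, a sketch of the push_to_hub workflow ("your-username/pdf-qa-model" is a placeholder repo id, and you must be authenticated first, e.g. via `huggingface-cli login`):

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# "t5-small" stands in for whatever model you fine-tuned.
model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")
tokenizer = AutoTokenizer.from_pretrained("t5-small")

# Push both model weights and tokenizer files to a hub repo
# (requires prior authentication).
model.push_to_hub("your-username/pdf-qa-model")
tokenizer.push_to_hub("your-username/pdf-qa-model")

# Afterwards anyone can load it by name, with no manual file management:
model = AutoModelForSeq2SeqLM.from_pretrained("your-username/pdf-qa-model")
```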
