Fixed output length "summarization"/"question-answering"

I’m having some difficulty figuring out how to tackle a particular project. I’m very new to the HF libraries and resources, as well as ML/DL in general.

Project background

The goal is to process an input text and output a fixed-length array of strings whose elements are the data that I need, either inferred or extracted verbatim from the input.

For example, I want to extract the following 4 types of data: name, age, nationality, and specialty.

Given the following text: "My name is John. I was born in 1997. My family immigrated to the United States from Taiwan when I was 5. I studied computer science in college and am now working as a software engineer."

The output would be something like this: ["John", "25", "United States", "STEM"]

Neither "25" nor "STEM" appears verbatim in the text; both are inferred. The age comes from the birth year, and the specialty field is essentially a classification problem: there is a fixed set of possible values, and the right one is chosen based on the input text. In this case, the set could be something like ("STEM", "social sciences", "arts", "sports", "linguistics"), and the result is inferred from the input context.
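One way to keep the inferred fields within bounds is to post-process whatever the model produces. The sketch below is plain Python with no model involved: it computes the age from a birth year and snaps a free-form specialty answer onto the fixed label list above. The helper names (`infer_age`, `snap_to_label`), the reference year passed in, and the 0.6 similarity cutoff are illustrative assumptions, not part of any HF API.

```python
from difflib import get_close_matches
from typing import Optional, Sequence

# Closed label set for the "specialty" field, taken from the example above.
SPECIALTY_LABELS = ("STEM", "social sciences", "arts", "sports", "linguistics")


def infer_age(birth_year: int, reference_year: int) -> int:
    """Age in whole years, assuming the birthday has already passed in reference_year."""
    return reference_year - birth_year


def snap_to_label(raw: str, labels: Sequence[str] = SPECIALTY_LABELS) -> Optional[str]:
    """Map a free-form answer onto one of the allowed labels, or None if nothing is close."""
    # Compare case-insensitively but return the label in its canonical spelling.
    by_lower = {label.lower(): label for label in labels}
    key = raw.strip().lower()
    if key in by_lower:
        return by_lower[key]
    # Fall back to the most similar label (difflib similarity ratio >= 0.6).
    close = get_close_matches(key, list(by_lower), n=1, cutoff=0.6)
    return by_lower[close[0]] if close else None


print(infer_age(1997, 2022))            # 25
print(snap_to_label("Social Science"))  # social sciences
```

Snapping the output this way guarantees the specialty slot only ever holds one of the allowed values (or nothing), whatever the model generates.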


What kind of problem would this be classified as?

I have a feeling that this would be along the lines of text2text generation, maybe something like abstractive summarization. I also tried to look at it as an abstractive question-answering problem, but couldn't find many resources about that on HF.
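Framed as question answering, each field becomes its own question, so the output length is fixed by the number of questions rather than by the model. A minimal sketch that builds one prompt per field; the "question: ... context: ..." layout mirrors the input format T5 used for its SQuAD task, but the exact format any given checkpoint expects is an assumption, and the question wording is hypothetical.

```python
from typing import Dict, List

# Hypothetical question per field; the dict order fixes the order (and length) of the output.
FIELD_QUESTIONS: Dict[str, str] = {
    "name": "What is the person's name?",
    "age": "How old is the person?",
    "nationality": "What is the person's nationality?",
    "specialty": (
        "Which field does the person specialize in: "
        "STEM, social sciences, arts, sports, or linguistics?"
    ),
}


def build_prompts(context: str) -> List[str]:
    """Return one 'question: ... context: ...' prompt per field, in a fixed order."""
    return [f"question: {q} context: {context}" for q in FIELD_QUESTIONS.values()]


for prompt in build_prompts("My name is John. I was born in 1997."):
    print(prompt)
```

Each prompt would then go to a text2text model (for example through the `text2text-generation` pipeline), and the four answers, kept in order, form the array. Because "how old" depends on today's date, asking for the birth year and computing the age in code may be more reliable.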

How should I go about tackling this problem?

Assuming it's an abstractive summarization problem, I can follow the instructions for summarization tasks on HF, but how would I change the model output to what I need, i.e. a fixed-length array instead of a single string?
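If this is treated as text2text generation, the model itself never has to return an array: one common trick is to fine-tune it on target strings with a fixed, delimited layout and parse the generated string afterwards. A sketch of both directions, assuming a hypothetical `field: value | field: value` layout (the delimiter, field names, and helper names are arbitrary choices, not anything HF prescribes).

```python
from typing import Dict, List, Optional

# Fixed field order; the parsed result always has exactly this many slots.
FIELDS = ("name", "age", "nationality", "specialty")


def format_target(values: List[str]) -> str:
    """Serialize a label array into the target string used for fine-tuning."""
    return " | ".join(f"{field}: {value}" for field, value in zip(FIELDS, values))


def parse_output(generated: str) -> List[Optional[str]]:
    """Turn a generated string back into a fixed-length list, one slot per field.

    Fields the model did not produce (or produced malformed) stay None,
    so the result always has len(FIELDS) elements.
    """
    result: Dict[str, Optional[str]] = {field: None for field in FIELDS}
    for part in generated.split("|"):
        key, sep, value = part.partition(":")
        key = key.strip().lower()
        # Keep the first occurrence of each known field and ignore anything else.
        if sep and key in result and result[key] is None:
            result[key] = value.strip()
    return [result[field] for field in FIELDS]


target = format_target(["John", "25", "United States", "STEM"])
print(target)  # name: John | age: 25 | nationality: United States | specialty: STEM
print(parse_output("name: John | specialty: STEM"))  # ['John', None, None, 'STEM']
```

The training target for the example above would be the string printed first. Missing or malformed fields come back as `None`, so the output keeps four slots even when generation goes wrong.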

Any other approaches worth looking into?

Thank you