Greetings, Hugging Face community. I want to generate triples (either simple three-word sentences or, if possible, a slightly more complex RDF format) from a dataset of news texts. The triples will be evaluated by back-translating them into text and comparing the similarity between the original and the synthetic news articles. In addition to the words in the text, the model should learn some persistent vocabulary it can use to communicate hints about the text to the back-translator. So I need three components:
1. A text-to-RDF model where triple quality is trainable.
2. An RDF-to-text model that can learn to imitate journalistic writing.
3. A text-vs-text similarity evaluation that provides a training signal to models 1 and 2.
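For concreteness, this is the kind of target triple syntax I have in mind: a tag-based linearization that a seq2seq model could be trained to emit. The `<s> <p> <o>` tags here are just my own placeholder choice, not an established format:

```python
def linearize(triples):
    """Turn [(subject, predicate, object), ...] into one flat string,
    the kind of target sequence a text-to-RDF model would learn to emit."""
    return " ".join(f"<s> {s} <p> {p} <o> {o}" for s, p, o in triples)

print(linearize([("Berlin", "capitalOf", "Germany")]))
# → <s> Berlin <p> capitalOf <o> Germany
```

The tag tokens would presumably be added to the tokenizer vocabulary, and the "persistent vocabulary" of hints could live in the same tag space.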
My question focuses on 1). I have seen that some models allow control over the generation; is this powerful enough to specify my target triple syntax? I have also seen that adapters are an efficient alternative to full fine-tuning; would it be better to use those, or something else? If there is no obvious way of doing this, what would be the best path to building an extension, and on top of what?
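To make the similarity evaluation (the third component) concrete, here is a minimal stand-in using only the standard library's `difflib`; in practice an embedding-based similarity (e.g. from sentence-transformers) would give a more meaningful training signal than surface overlap:

```python
from difflib import SequenceMatcher

def similarity(original, back_translated):
    """Crude surface-level similarity in [0, 1] between the original
    article and the text back-translated from the triples.
    A placeholder for an embedding-based score."""
    return SequenceMatcher(None, original, back_translated).ratio()

score = similarity(
    "The mayor opened the new bridge on Monday.",
    "The mayor opened a new bridge Monday.",
)
```

The score (or an embedding-based replacement) could then be used as a reward or auxiliary loss for the two generation models.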
Guidance would be greatly appreciated!