How to rebuild the Library of Alexandria?

Hello,

I’m Delse and I would like some help and advice with a personal project that I’ve been thinking about for a long time now.

I’m a total beginner in artificial intelligence and computer coding; I’m a literature and politics major. Nevertheless, I’m aiming to train a language model using data from literary texts that I’ve selected myself.

This idea came to me when I was looking for a particular poem by a Persian author from the ninth century AD. I tried to use AIs, particularly ChatGPT, to help me select his poems; however, the AI was unable to give me original, authentic texts. It could only give me poems that it had composed itself based on an analysis of the author’s style, even if it meant lying to me about the sources (the link was wrong every time). I then tried giving it entire books by the author (which I knew to be authentic) and asking it to do the research in the same prompt, but its answers were very approximate, if not false. I ended up doing the work by hand with a word search, which left me feeling that the job was unfinished.

For several days now, I’ve been trying to learn as much as I can in order to develop this tool, but my computer skills are so poor that it takes me hours to understand a single concept. That’s why I’m sending out this request like a message in a bottle.

To clarify, I’d like a helping hand in finding a method that balances cost, the capabilities of my computer, and my own abilities, in order to train a language model on selected literary data (I have the skills to organise a corpus). The aim is to be able to:

  • Make connections between the texts in the corpus
  • Get perfectly authentic quotations from the texts in the corpus
  • Create styles from the texts in the corpus (more precisely than is possible at present)
  • Let anyone create a digital “phantom” of their knowledge
  • And why not even more

Thank you very much for taking the time to read this. Thanks in advance to everyone who takes the time to reply. I wish you all success in your projects.

Delse


Hello. I find this very interesting. Please don’t worry about your lack of computer skills.
In my opinion, the most important thing is to know what you want to do and to have the ideas and expertise to do it.

First of all, the Hugging Face (HF) Discord has more active users than this forum, so it is better to ask your question there if possible.

I’m relatively unfamiliar with LLM training, but it seems to me that your goal is achievable to some extent by training an LLM. Simply put, you pick an LLM that is less powerful than ChatGPT but makes a good base, then train it further on a large amount of that writer’s work to give it real knowledge of the texts.
They were not made for a project like yours, but there are a variety of such models on HF: for example, models that have learned medical knowledge, models that have learned programming, models that have learned to write long texts, and so on. You can use these as examples to follow.
If you have any questions, please ask and one of our users, including myself, will answer.
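
For the "perfectly authentic quotations" goal in particular, it may be worth starting with plain search before any training, since a search can only return lines that really exist in your files. Below is a minimal sketch in Python, assuming the corpus is a set of plain-text files already loaded as strings. The function names (`normalize`, `find_quotes`) and the two placeholder texts are made up for illustration; this is just the word search you did by hand, automated, not a language model.

```python
import re
import unicodedata


def normalize(text: str) -> str:
    """Lowercase, drop combining marks (accents, vowel signs), collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", text)
    no_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return re.sub(r"\s+", " ", no_marks).strip().lower()


def find_quotes(corpus: dict[str, str], query: str) -> list[tuple[str, int, str]]:
    """Return (source, line_number, original_line) for each line containing the query.

    Matching is done on normalized text, but the line returned is the
    original one, so every quotation is verbatim from the corpus.
    """
    needle = normalize(query)
    hits = []
    if not needle:
        return hits
    for source, text in corpus.items():
        for line_number, line in enumerate(text.splitlines(), start=1):
            if needle in normalize(line):
                hits.append((source, line_number, line.strip()))
    return hits


# Hypothetical corpus: file name -> full text (placeholders, not real poems).
corpus = {
    "placeholder_book_a.txt": "First line of a poem.\nThe rose opens at dawn.\n",
    "placeholder_book_b.txt": "Another text.\nAt dawn the Rose   opens again.\n",
}

for source, line_number, line in find_quotes(corpus, "rose"):
    print(f"{source}:{line_number}: {line}")
```

Each hit comes with its file name and line number, so the source can always be checked. A trained or fine-tuned model could then be layered on top for the style and connection goals, while quotations keep coming from the search.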
