Dark zones of LLMs

The idea: building embeddings from embeddings, or a CNN for LLMs

Let’s imagine a neural network that can assemble a sequence of embeddings into a single vector of higher dimension. It is like a convolutional network, but for embeddings/texts rather than images. That is, we obtain coordinates in the space of meaning not for individual tokens but for their totality: for an entire text or text fragment. The resulting vector cannot be made very large because of the computational cost, so we will not get a good network for long texts, let alone for all the knowledge and texts accumulated by humanity. But the problem is tractable for a single sentence, a theorem, a chemical formula, an engineering solution, a poem, a short story. Losing the original sequence of tokens is acceptable, as long as the meaning of that sequence is preserved.
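As a rough illustration of the encoder half of this idea, here is a minimal sketch that collapses a sequence of token embeddings into one fixed-size vector. Everything here is a hypothetical stand-in: the function name `pool_sequence` and the untrained random projection are mine, not a real SLLM architecture; note how mean-pooling deliberately discards token order while keeping an order-independent summary.

```python
import numpy as np

def pool_sequence(token_embeddings: np.ndarray, out_dim: int, seed: int = 0) -> np.ndarray:
    """Collapse a (seq_len, d) sequence of token embeddings into a single
    out_dim vector: mean-pool over the sequence, then apply a random
    linear projection standing in for a trained encoder."""
    rng = np.random.default_rng(seed)
    d = token_embeddings.shape[1]
    projection = rng.standard_normal((d, out_dim)) / np.sqrt(d)  # untrained stand-in
    pooled = token_embeddings.mean(axis=0)  # the token order is discarded here
    return pooled @ projection              # one vector for the whole fragment

# a toy "sentence": 10 tokens with 16-dimensional embeddings
sentence = np.random.default_rng(1).standard_normal((10, 16))
vec = pool_sequence(sentence, out_dim=32)
print(vec.shape)  # (32,)
```

A real encoder would of course be trained rather than random, and would use something richer than a mean (convolutions, attention), but the input/output contract is the same: a sequence in, one vector out.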

Such an add-on over embeddings (let’s call it Super LLM, or SLLM) would be trained without supervision, as an encoder and a decoder at the same time. The encoder’s task is to produce a single vector of larger dimension: a sequence of embeddings goes in, one vector comes out. The decoder’s task is the inverse: one vector goes in, and a sequence of embeddings is expected at the output. The loss function must be constructed so that the decoder tries to restore the meaning, not merely some sequence of embeddings that maps to the same sequence of tokens. This SLLM should be trained after the base LLM has finished training, but without any alignment.
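One way such a meaning-over-order loss could look is sketched below. This is my own illustrative assumption, not the author’s actual formula: `meaning_aware_loss` mixes a token-level MSE term with a "meaning" term that compares mean-pooled summaries, so the decoder is rewarded for recovering the gist even when individual embeddings differ.

```python
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

def meaning_aware_loss(original: np.ndarray, reconstructed: np.ndarray,
                       alpha: float = 0.5) -> float:
    """Hypothetical SLLM reconstruction loss over (seq_len, d) embedding
    sequences: a token-level MSE term plus a 'meaning' term comparing
    mean-pooled summaries of the two sequences."""
    token_term = float(np.mean((original - reconstructed) ** 2))
    meaning_term = 1.0 - cosine(original.mean(axis=0), reconstructed.mean(axis=0))
    return alpha * token_term + (1 - alpha) * meaning_term
```

Note that any permutation of the sequence leaves the meaning term at essentially zero (the mean is unchanged), so under this sketch the decoder is penalized for changing the gist far more than for reordering tokens.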

A big advantage is that the SLLM need not be trained on the entire corpus of the base LLM. For example, we may be interested in just one area of natural laws, human knowledge, or art, or in some subset of these areas. In human terms, this SLLM together with the base LLM should literally understand the meaning of texts in their entirety, but within a narrow subject area.

Zones of darkness

Now, why is all this needed? The term "dark zone" may not be precise, but I do not imply any occultism. The term carries nothing dark, scary, or dangerous; it simply means the absence of light: we cannot make out what is actually happening at this point, in this vector, in this embedding. A dark zone is a set of parameter values of the final vector that never occur for any of the source data.

The SLLM encoder creates a vector from incoming embeddings. Common sense suggests that, given a wide variety of text types, topics, and content, the encoder should try to occupy the entire available multidimensional space of meanings, using all possible combinations of parameters with small values. But it may turn out that there are voids in the distribution of embeddings: regions with parameter values that never occur while processing the embeddings seen during SLLM training. There may be areas, even closed areas, into which the resulting vectors fall rarely or never. These areas might not even be chaotic; they might have definite sizes or shapes. Why does this happen? Why doesn’t the SLLM try to occupy these areas during training? Is this a random process or not? How common are such areas?
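One simple way to make "a region the encoder never reaches" operational (my own assumption; a real treatment would use a proper density model) is a nearest-neighbor test: a point is dark if no training-time encoder output lies within some radius of it.

```python
import numpy as np

def is_dark(candidate: np.ndarray, train_vectors: np.ndarray,
            radius: float) -> bool:
    """A candidate point is 'dark' if no training vector lies within
    `radius` of it, i.e. the encoder never produced anything nearby."""
    dists = np.linalg.norm(train_vectors - candidate, axis=1)
    return bool(dists.min() > radius)

# pretend the encoder's outputs form two clusters with a gap between them
rng = np.random.default_rng(0)
train = np.concatenate([rng.normal(-5, 0.5, size=(200, 8)),
                        rng.normal(+5, 0.5, size=(200, 8))])
gap_point = np.zeros(8)  # sits in the void between the clusters
print(is_dark(gap_point, train, radius=3.0))  # True
```

The choice of radius (and of distance metric) is exactly the open question the text raises: how large and how sharply bounded such voids are in a trained SLLM is unknown.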

My assumption is that such areas should form and should contain knowledge and patterns that the SLLM managed to capture during training but which do not appear directly in the training texts. Humanity either did not understand these patterns or ignored this knowledge.

Why dark zones might be important

There is a well-known example of a neural network used to analyze chest X-ray (fluorography) images that learned to determine a patient’s race from them, even though people do not know how to do this (if only because they were never interested in it) and could not have taught it. In other words, the network discovered new patterns and knowledge on its own during training. Hasn’t the same thing happened in LLMs/Transformers? Couldn’t dark zones be not mere emptiness but new knowledge, ideas, and concepts? We can force the SLLM and LLM decoders to give us text matching embeddings selected from a dark zone. What will we see there? Nonsense? A hallucination? A platitude? Or an unusual idea, an overlooked pattern? We won’t know until we try.

Is the study of dark zones a new, unexplored, and underappreciated method of scientific research? We train an SLLM on our existing knowledge and experimental results, and then feed embeddings from the dark zones to the decoder input. Could we obtain new knowledge, new theories, unknown chemical compounds? If we train transformers on several sciences at once, will we get new disciplines at the intersections of existing ones? And what if the results obtained from dark-zone embeddings were used to train the next version of the LLM? We might end up with a neural network for understanding the world: it first generalizes what is already known, then tries to extract new knowledge from the dark zones, selects the least delusional candidates, uses them to educate itself further, and the cycle continues.

Dark zones (if they exist) give a clear, formal criterion for where and how to obtain new knowledge: find embeddings that were never encountered during SLLM training and send them to the decoder input, first of the SLLM, then of the base LLM.
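That search criterion can be made concrete with a brute-force sketch (again under my own assumptions: the function name, the uniform bounding-box sampling, and the radius threshold are all illustrative): draw random points inside the bounding box of the encoder’s outputs and keep only those far from every training vector. The survivors are candidate dark-zone embeddings to feed to the decoder.

```python
import numpy as np

def sample_dark_candidates(train_vectors: np.ndarray, n_samples: int,
                           radius: float, seed: int = 0) -> np.ndarray:
    """Draw random points inside the bounding box of the encoder's outputs
    and keep only those farther than `radius` from every training vector."""
    rng = np.random.default_rng(seed)
    lo, hi = train_vectors.min(axis=0), train_vectors.max(axis=0)
    candidates = rng.uniform(lo, hi, size=(n_samples, train_vectors.shape[1]))
    # pairwise distances: (n_samples, n_train)
    d = np.linalg.norm(candidates[:, None, :] - train_vectors[None, :, :], axis=2)
    return candidates[d.min(axis=1) > radius]

# toy encoder outputs: two clusters with a gap between them
rng = np.random.default_rng(0)
train = np.concatenate([rng.normal(-5, 0.5, size=(200, 8)),
                        rng.normal(+5, 0.5, size=(200, 8))])
dark = sample_dark_candidates(train, n_samples=500, radius=3.0)
print(dark.shape[0], "candidate dark-zone vectors found")
```

In high dimensions random sampling like this becomes hopelessly inefficient, which is why the closing section speaks of either learning to find such zones or enumerating only the most suitable candidates.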

A simple experiment to test whether the idea works

Take a set of elementary-school mathematics texts. Exclude from training all text fragments on one chosen topic and train an SLLM on the rest; call this network NN1. Then add the previously excluded texts back and continue training NN1; call the result NN2. Now feed the initially excluded texts into the NN2 encoder to get a set of embeddings that should not have occurred during NN1’s training (this can even be verified if all encountered embeddings were saved). Feed these embeddings into both the NN1 decoder and the NN2 decoder and compare the results. If the idea works, the results should be similar. They can then be fed further into the base LLM decoder and the generated texts compared.

That would mean NN1 already holds knowledge that was never explicitly present during its training, and that this knowledge resides in dark zones. Then it is enough either to learn to find such zones or simply to enumerate the most suitable candidates.
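The comparison step in the experiment above could be quantified with something as simple as mean cosine similarity between the two decoders’ output sequences. This metric is my own hypothetical choice, not prescribed by the experiment; a value near 1.0 would support the hypothesis that NN1 already "knew" the held-out topic.

```python
import numpy as np

def decoder_agreement(out_nn1: np.ndarray, out_nn2: np.ndarray) -> float:
    """Mean per-position cosine similarity between two (seq_len, d)
    embedding sequences produced by the NN1 and NN2 decoders for the
    same input vector."""
    num = np.sum(out_nn1 * out_nn2, axis=1)
    den = np.linalg.norm(out_nn1, axis=1) * np.linalg.norm(out_nn2, axis=1) + 1e-9
    return float(np.mean(num / den))

# sanity check: a sequence agrees perfectly with itself
seq = np.random.default_rng(2).standard_normal((6, 16))
print(round(decoder_agreement(seq, seq), 3))  # 1.0
```

Comparing the generated texts downstream of the base LLM decoder would need a text-level similarity measure instead, but the embedding-level check is the cheaper first filter.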