My goal is to integrate a non-neural LM seamlessly into the Hugging Face ecosystem. The system, MBLM, implements a fast approximate k-NN next-word predictor that can run in autoregressive (CausalLM) mode. Internally, the core next-word prediction step already produces a probability distribution over the token vocabulary, which could be exposed externally.
I’ve been looking into writing a custom PreTrainedModel and would appreciate some guidelines for the case where the model is truly non-neural (but functionally compatible, as sketched above). A rough sketch of the kind of wrapper I have in mind is below.
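To make this concrete, here is a minimal sketch, not a definitive implementation: `MBLMConfig`, `MBLMForCausalLM`, the predictor's `next_token_distribution(prefix_ids)` interface, and the toy `UniformPredictor` are all hypothetical stand-ins for MBLM's internals, not its real API. The idea is that the module holds no real weights and just converts the k-NN probability distribution into last-position logits so `generate()` can drive it.

```python
# Minimal sketch with hypothetical names; assumes a recent transformers
# release, where GenerationMixin must be inherited explicitly alongside
# PreTrainedModel for generate() to work.
import torch
from transformers import GenerationMixin, PretrainedConfig, PreTrainedModel
from transformers.modeling_outputs import CausalLMOutput


class MBLMConfig(PretrainedConfig):
    model_type = "mblm"  # hypothetical registration name

    def __init__(self, vocab_size=32000, **kwargs):
        self.vocab_size = vocab_size
        super().__init__(**kwargs)


class MBLMForCausalLM(PreTrainedModel, GenerationMixin):
    config_class = MBLMConfig

    def __init__(self, config, predictor):
        super().__init__(config)
        # The non-neural core: anything exposing a (hypothetical)
        # next_token_distribution(prefix_ids) -> list[float] over the vocab.
        self.predictor = predictor
        # Dummy parameter so .device/.dtype resolve on a weight-free module.
        self._anchor = torch.nn.Parameter(torch.zeros(1), requires_grad=False)

    def prepare_inputs_for_generation(self, input_ids, **kwargs):
        # No KV cache: re-feed the full prefix at every decoding step.
        return {"input_ids": input_ids}

    def forward(self, input_ids, attention_mask=None, **kwargs):
        batch, seq_len = input_ids.shape
        logits = torch.zeros(batch, seq_len, self.config.vocab_size)
        for i in range(batch):
            probs = torch.tensor(
                self.predictor.next_token_distribution(input_ids[i].tolist())
            )
            # generate() only reads the last position; move the k-NN
            # probabilities to log space so sampling/argmax behave normally.
            logits[i, -1] = probs.clamp_min(1e-12).log()
        return CausalLMOutput(logits=logits)


# Toy smoke test: a uniform "predictor" just to exercise generate().
class UniformPredictor:
    def __init__(self, vocab_size):
        self.vocab_size = vocab_size

    def next_token_distribution(self, prefix_ids):
        return [1.0 / self.vocab_size] * self.vocab_size


model = MBLMForCausalLM(MBLMConfig(vocab_size=100), UniformPredictor(100))
print(model.generate(torch.tensor([[1, 2, 3]]), max_new_tokens=5, do_sample=True))
```

One design question this raises is how `save_pretrained`/`from_pretrained` should treat a model whose "weights" are really a k-NN index rather than tensors, which is part of why I'm asking for guidelines.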
Shameless plug: this is a CPU-only, eco-friendly LLM alternative that scales well, supports incremental learning, is fast, and memorizes its training data explicitly.
Thanks for any tips!
Antal