Are "large language model" and "foundation model" the same thing?

I read a lot about foundation models and large language models.

However, I can't find a clear definition of what exactly a foundation model is. Are "large language model" and "foundation model" the same thing?

From On the Opportunities and Risks of Foundation Models:

We introduce the term foundation models to fill a void in describing the paradigm shift we are witnessing; we briefly recount some of our reasoning for this decision. Existing terms (e.g., pretrained model, self-supervised model) partially capture the technical dimension of these models, but fail to capture the significance of the paradigm shift in an accessible manner for those beyond machine learning. In particular, foundation model designates a model class that are distinctive in their sociological impact and how they have conferred a broad shift in AI research and deployment. In contrast, forms of pretraining and self-supervision that technically foreshadowed foundation models fail to clarify the shift in practices we hope to highlight.

Note that the term "foundation model" is a bit controversial, or at least ambiguous. In contrast, a large language model is "just" a language model that is large, so it's rather unambiguous in my opinion =)

Relevant links courtesy of @lewtun:

A large language model (LLM) is any statistical model of language built on "large" swaths of data with "a lot" of parameters. "Large" and "a lot" are relative to approaches from just a few years ago; think terabytes of training data rather than megabytes.

  • To understand what a "language model" is more generally, look up the "bigram model", which was very common until about a decade ago.
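To make the "language model" idea concrete, here is a minimal sketch of a bigram model: it estimates the probability of the next word given only the current word, by counting adjacent word pairs. The tiny corpus and function name are invented for illustration, not taken from any particular library.

```python
from collections import Counter, defaultdict

def train_bigram_model(corpus):
    """Count adjacent word pairs and normalize them into
    conditional probabilities P(next word | current word)."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        # <s> and </s> mark sentence boundaries
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        for current, nxt in zip(tokens, tokens[1:]):
            counts[current][nxt] += 1
    # Convert raw counts into probabilities per preceding word
    return {
        current: {nxt: c / sum(nxts.values()) for nxt, c in nxts.items()}
        for current, nxts in counts.items()
    }

# Toy corpus (hypothetical example data)
corpus = ["the cat sat", "the dog sat", "the cat ran"]
model = train_bigram_model(corpus)
print(model["the"])  # "cat" is twice as likely as "dog" after "the"
```

An LLM plays the same next-word-prediction game, but with a neural network conditioned on a long context instead of a single preceding word, and with vastly more data and parameters.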

"Foundation model" is a term introduced by the Stanford Center for Research on Foundation Models (CRFM). Although the paper isn't entirely clear, the characteristics of a foundation model as introduced there include:

  1. Large language modeling techniques are used. However, the model may handle language alone, or combinations such as language and images (or language and some other modality). The commonality across these models is the use of modern LLM approaches, among other things.
  2. They are trained using self-supervision.
  3. They can function as a “pretrained” model for further fine-tuning.
  4. They are trained on data collected without consent, which may violate copyright or license terms.

Other terms to consider in this family include “Large Neural Network”, “Base Model”, “Pretrained Model”, “Parent Model”, “Large self-supervised model”, “Self-supervised multimodal model” – and more!