Explain Mamba and Jamba

Hi everyone!
When I started writing this post, I didn't know which category it belonged in, because Mamba and Jamba are new technologies.
I've been trying to understand them by watching videos and reading arXiv papers, but I'm having some trouble.
Can anyone explain Jamba and Mamba compared to the transformer?
I believe these new architectures will play a new role in the AI field in place of the transformer.
Here are some relevant URLs.

https://arxiv.org/pdf/2312.00752

In the Jamba tutorial video, they mentioned that Jamba is powerful for dealing with long-context prompting, but I think RAG using a vector DB is more powerful than Jamba. Am I right?
I have used Jamba, and as I mentioned above, RAG seemed more powerful.
Here is the sample space.


Hey!

First and foremost, it's completely normal not to understand these models in one shot: they rely on very specific principles and, in my opinion, it is impossible to popularize them fully.

To give you some tools to deepen your research: Mamba and Jamba share a common foundation, Selective State Space Models (SSMs). This architecture differs from transformers in that it does not rely on attention. In my view, it is a middle ground between RNNs and CNNs.

Like RNNs:

  • They maintain an internal state that gets updated sequentially
  • They can handle variable-length sequences
  • They have a form of memory that carries information forward

Like CNNs:

  • They can process data in parallel
  • They use convolution-like operations
  • They’re efficient to train and can leverage hardware acceleration

The key difference that makes SSMs special is how they handle state updates:

  1. Unlike RNNs, which can suffer from vanishing/exploding gradients, SSMs use a more stable state update mechanism
  2. Unlike CNNs, which have a fixed receptive field, SSMs can theoretically capture dependencies of any length through their state
  3. They can be implemented efficiently using parallel hardware while maintaining the ability to process sequential information

An analogy might help. If you think of:

  • RNNs as being like reading a book one word at a time and keeping notes
  • CNNs as being like looking at different parts of an image through sliding windows
  • Then SSMs are like scanning a document while maintaining a running summary, but being able to do multiple scans in parallel
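To make the two views concrete, here is a tiny, illustrative linear SSM in Python. This is not Mamba itself: real Mamba makes the A, B, C parameters input-dependent ("selective") and uses a hardware-aware parallel scan. The point is only that the same model can be run step by step like an RNN or as a convolution like a CNN:

```python
import numpy as np

# Toy 1-D linear state space model with a scalar state, for illustration.
A, B, C = 0.9, 1.0, 0.5              # state transition, input, output weights
x = np.array([1.0, 2.0, 0.5, -1.0])  # input sequence

# 1) Recurrent view (like an RNN): update a running state step by step.
h, y_rec = 0.0, []
for x_t in x:
    h = A * h + B * x_t      # h_t = A * h_{t-1} + B * x_t
    y_rec.append(C * h)      # y_t = C * h_t

# 2) Convolutional view (like a CNN): the same map is a convolution
#    with kernel K_k = C * A^k * B, so it can be computed in parallel.
K = C * (A ** np.arange(len(x))) * B
y_conv = np.convolve(x, K)[: len(x)]

print(np.allclose(y_rec, y_conv))  # True: both views agree
```

The convolutional form is what makes training parallelizable; the recurrent form is what makes inference cheap, since the model carries a fixed-size state instead of a growing attention cache.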

The problem is that SSMs don't outperform transformers: compressing the sequence into a fixed-size state limits their ability to capture dependencies between two very distant sentences. Hence, Jamba proposes stacking transformer and SSM blocks, so that the mixture gets the best of both worlds. However, it is still not the best model, since combining both advantages also means combining both drawbacks.
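As a rough sketch of the interleaving idea (the function name and the exact ratio here are illustrative, not the real Jamba implementation; the paper describes its precise attention/Mamba/MoE layout):

```python
# Hypothetical sketch of a Jamba-style hybrid stack: mostly SSM (Mamba)
# layers, with an attention layer inserted periodically so the model
# keeps attention's ability to relate very distant tokens.
def build_hybrid_stack(n_layers: int, attn_every: int = 8) -> list[str]:
    """Return a layer schedule with one attention layer per `attn_every` layers."""
    return [
        "attention" if i % attn_every == attn_every - 1 else "mamba"
        for i in range(n_layers)
    ]

schedule = build_hybrid_stack(16)
# 14 "mamba" layers and 2 "attention" layers, interleaved
```

The design trade-off: more attention layers means better long-range precision but also more memory and compute; more SSM layers means cheaper long contexts but a lossier memory of distant tokens.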

Also, RAG is not a model, so you cannot compare RAG to any LLM. RAG is a framework that uses an LLM at some point, but the whole retrieval part is done by encoders, which are not LLMs.
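To make that separation concrete, here is a minimal, self-contained RAG sketch. The `embed` function is a toy bag-of-letters stand-in for a real encoder (e.g. a Sentence-BERT model), and the final LLM call is left as a comment, because that is the only place a model like Jamba would appear:

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    """Toy bag-of-letters embedding; a real system uses a trained encoder."""
    v = np.zeros(26)
    for ch in text.lower():
        if ch.isalpha():
            v[ord(ch) - ord("a")] += 1
    return v / (np.linalg.norm(v) or 1.0)

docs = [
    "Mamba is a selective state space model.",
    "Jamba interleaves attention and Mamba layers.",
    "RAG retrieves documents before generation.",
]
index = np.stack([embed(d) for d in docs])  # this is the "vector DB"

def retrieve(query: str, k: int = 1) -> list[str]:
    scores = index @ embed(query)           # cosine similarity
    return [docs[i] for i in np.argsort(scores)[::-1][:k]]

context = retrieve("What does Jamba interleave?")
prompt = f"Context: {context[0]}\nQuestion: What does Jamba interleave?"
# answer = llm_generate(prompt)  # hypothetical LLM call: only here is an LLM used
```

Notice that retrieval quality depends entirely on the encoder and the index; the LLM only sees whatever the retriever hands it.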

Hope this helps!


Thanks for your answer.
I think you are misunderstanding my question.
I mean: for a retrieval problem, which is better and more useful, a vector DB using an LLM (transformers) or Jamba?
In this respect, I think a vector DB is better than Jamba.


A vector DB is not better than Jamba because they are simply two different things. A vector DB is a store into which you feed embeddings, produced by encoding a lot of resources, that will later be used for retrieval. Jamba does not produce such encodings because it is “just” an LLM, something trained to generate responses. To do good retrieval you need an encoder or ranker, for example: Sentence-BERT, BM25, ModernBERT…

So VectorDB != Jamba


So you mean that if I want to chat over a large amount of data (e.g., a private company's data), I should use Jamba rather than a vector DB?
I tried both approaches: a vector DB + OpenAI worked better than Jamba alone and than Jamba + a vector DB!


Yes, use a vector DB and OpenAI for RAG; it will probably be better than Jamba, as the benchmarks suggest.
