Explain Mamba and Jamba

Hi everyone!
When I started writing this post, I didn't know which category it belonged in, because Mamba and Jamba are new technologies.
I've been trying to understand them by watching videos and reading arXiv papers, but I'm having some trouble.
Can anyone explain Jamba and Mamba compared to the transformer?
I believe these new architectures will play a new role in the AI field in place of the transformer.
Here are some relevant URLs.

https://arxiv.org/pdf/2312.00752

In the Jamba tutorial video, they mentioned that Jamba is powerful for dealing with long-context prompting, but I think RAG using a vector DB is more powerful than Jamba. Am I right?
I have used Jamba, and as I mentioned above, RAG seemed more powerful.
Here is the sample space.


Hey!

First and foremost, it's completely normal not to understand these models in one shot: they rely on very specific principles and, in my opinion, it is impossible to popularize them fully.

To give you some tools to deepen your research: Mamba and Jamba share a common foundation, Selective State Space Models (SSMs). This architecture differs from transformers in that it does not rely on attention. In my view, it is a middle ground between RNNs and CNNs.

Like RNNs:

  • They maintain an internal state that gets updated sequentially
  • They can handle variable-length sequences
  • They have a form of memory that carries information forward

Like CNNs:

  • They can process data in parallel
  • They use convolution-like operations
  • They’re efficient to train and can leverage hardware acceleration

The key difference that makes SSMs special is how they handle state updates:

  1. Unlike RNNs, which can suffer from vanishing/exploding gradients, SSMs use a more stable state update mechanism
  2. Unlike CNNs, which have a fixed receptive field, SSMs can theoretically capture dependencies of any length through their state
  3. They can be implemented efficiently using parallel hardware while maintaining the ability to process sequential information

An analogy might help. If you think of:

  • RNNs as being like reading a book one word at a time and keeping notes
  • CNNs as being like looking at different parts of an image through sliding windows
  • Then SSMs are like scanning a document while maintaining a running summary, but being able to do multiple scans in parallel
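To make the two views concrete, here is a tiny, illustrative linear SSM in Python. This is not Mamba itself: real Mamba makes the A, B, C parameters input-dependent ("selective") and uses a hardware-aware parallel scan. The point is only that the same model can be run step by step like an RNN or as a convolution like a CNN:

```python
import numpy as np

# Toy 1-D linear state space model with a scalar state, for illustration.
A, B, C = 0.9, 1.0, 0.5              # state transition, input, output weights
x = np.array([1.0, 2.0, 0.5, -1.0])  # input sequence

# 1) Recurrent view (like an RNN): update a running state step by step.
h, y_rec = 0.0, []
for x_t in x:
    h = A * h + B * x_t      # h_t = A * h_{t-1} + B * x_t
    y_rec.append(C * h)      # y_t = C * h_t

# 2) Convolutional view (like a CNN): the same map is a convolution
#    with kernel K_k = C * A^k * B, so it can be computed in parallel.
K = C * (A ** np.arange(len(x))) * B
y_conv = np.convolve(x, K)[: len(x)]

print(np.allclose(y_rec, y_conv))  # True: both views agree
```

The convolutional form is what makes training parallelizable; the recurrent form is what makes inference cheap, since the model carries a fixed-size state instead of a growing attention cache.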

The problem is that SSMs don't outperform transformers: compressing the sequence into a fixed-size state limits their ability to capture dependencies between two very distant sentences. Hence, Jamba proposes stacking transformer and SSM blocks, so that the mixture gets the best of both worlds. However, it is still not the best model, since combining both advantages also means combining both drawbacks.
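As a rough sketch of the interleaving idea (the function name and the exact ratio here are illustrative, not the real Jamba implementation; the paper describes its precise attention/Mamba/MoE layout):

```python
# Hypothetical sketch of a Jamba-style hybrid stack: mostly SSM (Mamba)
# layers, with an attention layer inserted periodically so the model
# keeps attention's ability to relate very distant tokens.
def build_hybrid_stack(n_layers: int, attn_every: int = 8) -> list[str]:
    """Return a layer schedule with one attention layer per `attn_every` layers."""
    return [
        "attention" if i % attn_every == attn_every - 1 else "mamba"
        for i in range(n_layers)
    ]

schedule = build_hybrid_stack(16)
# 14 "mamba" layers and 2 "attention" layers, interleaved
```

The design trade-off: more attention layers means better long-range precision but also more memory and compute; more SSM layers means cheaper long contexts but a lossier memory of distant tokens.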

Also, RAG is not a model, so you cannot compare RAG to any LLM. RAG is a framework that uses an LLM at some point, but the whole retrieval part is done by encoders, which are not LLMs.
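To make that separation concrete, here is a minimal, self-contained RAG sketch. The `embed` function is a toy bag-of-letters stand-in for a real encoder (e.g. a Sentence-BERT model), and the final LLM call is left as a comment, because that is the only place a model like Jamba would appear:

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    """Toy bag-of-letters embedding; a real system uses a trained encoder."""
    v = np.zeros(26)
    for ch in text.lower():
        if ch.isalpha():
            v[ord(ch) - ord("a")] += 1
    return v / (np.linalg.norm(v) or 1.0)

docs = [
    "Mamba is a selective state space model.",
    "Jamba interleaves attention and Mamba layers.",
    "RAG retrieves documents before generation.",
]
index = np.stack([embed(d) for d in docs])  # this is the "vector DB"

def retrieve(query: str, k: int = 1) -> list[str]:
    scores = index @ embed(query)           # cosine similarity
    return [docs[i] for i in np.argsort(scores)[::-1][:k]]

context = retrieve("What does Jamba interleave?")
prompt = f"Context: {context[0]}\nQuestion: What does Jamba interleave?"
# answer = llm_generate(prompt)  # hypothetical LLM call: only here is an LLM used
```

Notice that retrieval quality depends entirely on the encoder and the index; the LLM only sees whatever the retriever hands it.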

Hope this helps!


Thanks for your answer.
I think you are misunderstanding my question.
I mean: for a retrieval problem, which is better and more useful, a vector DB using an LLM (transformers) or Jamba?
In this respect, I think a vector DB is better than Jamba.


A vector DB is not better than Jamba because they are simply two different things. A vector DB is a store into which you feed embeddings, produced by encoding a lot of resources, that will later be used for retrieval. Jamba does not produce such encodings because it is “just” an LLM, something trained to generate responses. To do good retrieval you need an encoder or ranker, for example: Sentence-BERT, BM25, ModernBERT…

So VectorDB != Jamba


So you mean that if I want to chat over a large amount of data (e.g., a private company's data), I should use Jamba rather than a vector DB?
I tried both approaches: a vector DB + OpenAI worked better than Jamba alone and than Jamba + a vector DB!


Yes, use a vector DB and OpenAI for RAG; it will probably be better than Jamba, as the benchmarks suggest.
