I have an RTX 4070 Ti Super, and I want to embed about 315k rows of text locally. When I run the code below on my CPU it works fine, but when I set the device to the GPU, I keep getting this CUDA error:

CUDA error: CUBLAS_STATUS_EXECUTION_FAILED when calling cublasSgemm( handle, opa, opb, m, n, k, &alpha, a, lda, b, ldb, &beta, c, ldc)

This happens even though my GPU's VRAM is not fully used (I checked in Task Manager). I tried reinstalling everything and downgrading my GPU driver to the one bundled with CUDA 12.4, but still no luck. Lowering the batch size and the sentence length only lets it run a few more iterations before the error occurs. What am I doing wrong here? Is my VRAM not being released after each iteration or something?
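For reference, one thing I have not tried yet is forcing synchronous kernel launches so the traceback points at the op that actually fails (CUDA errors are otherwise reported asynchronously, at a later call). A minimal sketch:

import os
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"  # must be set before the first CUDA call

The script itself: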
import pickle

import torch
import torch.nn.functional as F
from sentence_transformers import SentenceTransformer
from tqdm import tqdm

inc = 64                # batch size
iteration = 1
matryoshka_dim = 512    # truncate embeddings to this many dimensions

model = SentenceTransformer("nomic-ai/nomic-embed-text-v1.5", trust_remote_code=True)
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
# device = torch.device("cpu")  # the CPU path works fine

for i in tqdm(range(0, len(rows), inc)):
    end = min(i + inc, len(rows))
    sentences = rows[i:end]
    embeddings = model.encode(sentences, convert_to_tensor=True, device=device)
    # layer-norm, truncate to the Matryoshka dimension, then L2-normalize
    embeddings = F.layer_norm(embeddings, normalized_shape=(embeddings.shape[1],))
    embeddings = embeddings[:, :matryoshka_dim]
    embeddings = F.normalize(embeddings, p=2, dim=1)
    # write each batch to its own pickle in fk_ro_v
    with open("./fk_ro_v/ro_" + str(iteration) + ".pkl", "wb") as f:
        pickle.dump(embeddings, f)
    torch.cuda.empty_cache()
    iteration += 1
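One thing I wondered about: pickle.dump on a CUDA tensor serializes it with its device, so it would also load back onto the GPU later. A variant of the loop tail that copies each batch to the host before serializing (same names as above, not yet verified on my setup):

    embeddings = F.normalize(embeddings, p=2, dim=1)
    embeddings = embeddings.cpu()  # rebinding drops the last reference to the GPU tensor
    with open("./fk_ro_v/ro_" + str(iteration) + ".pkl", "wb") as f:
        pickle.dump(embeddings, f)  # the pickle now holds a CPU tensor
    torch.cuda.empty_cache()        # the freed block can actually be returned here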
[Screenshot: VRAM usage in Task Manager]

Installed versions:
torch 2.5.0.dev20240715+cu124
torchaudio 2.4.0.dev20240715+cu124
torchvision 0.20.0.dev20240715+cu124
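As a sanity check on the install, the CUDA build and device visibility can be queried like this (the comments show what I expect to see, not captured output):

import torch
print(torch.__version__)              # expected: 2.5.0.dev20240715+cu124
print(torch.version.cuda)             # expected: 12.4
print(torch.cuda.is_available())      # expected: True
print(torch.cuda.get_device_name(0))  # expected: the 4070 Ti Super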