I am trying to build a text-to-SQL app with the Hugging Face chatdb/natural-sql-7b model, but it gets stuck every time and never generates a result. Here is my code. Another problem is that it's not working with "cuda": it shows "Torch not compiled with CUDA enabled".

import torch
from db import get_schema
from transformers import AutoModelForCausalLM, AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("chatdb/natural-sql-7b")
model = AutoModelForCausalLM.from_pretrained(
    "chatdb/natural-sql-7b",
    device_map="auto",
    torch_dtype=torch.float16,
)

question = 'How many employees are there?'

prompt = f"""
### Task 

Generate a SQL query to answer the following question: `{question}` 

### PostgreSQL Database Schema 
The query will run on a database with the following schema: 

{get_schema()}


### Answer 
Here is the SQL query that answers the question: `{question}` 
```sql
"""

print ("Question: " + question)
print ("SQL: ")

inputs = tokenizer(prompt, return_tensors="pt").to("cuda")

generated_ids = model.generate(
    **inputs,
    num_return_sequences=1,
    eos_token_id=100001,
    pad_token_id=100001,
    max_new_tokens=400,
    do_sample=False,
    num_beams=1,
)

outputs = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)
print(outputs)
print(outputs[0].split("```sql")[-1])

Output:

Loading checkpoint shards: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 3/3 [00:00<00:00, 39.13it/s]
Some parameters are on the meta device because they were offloaded to the disk and cpu.
Question: How many employees are there?
SQL:


PyTorch will not work with CUDA unless you install a CUDA-enabled build following the instructions on the official site.
Also, a 7B model needs tens of GB of VRAM, so I suspect it is running out of memory (OOM) and stalling.
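
A quick way to confirm whether your installed PyTorch build actually has CUDA support (a minimal check, nothing specific to this model):

```python
import torch

# If this prints False (or torch.version.cuda is None), the installed wheel
# is a CPU-only build and you need to reinstall following pytorch.org.
print(torch.__version__)          # a "+cpu" suffix indicates a CPU-only build
print(torch.version.cuda)         # CUDA version the wheel was compiled against
print(torch.cuda.is_available())  # should be True on a working CUDA setup
```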

Hey John, thanks for the response. I have resolved the CUDA problem and I am getting results now, but it takes too long (about 8 minutes). How can I make the model run faster? Could you please suggest the specs it would need to run really fast?
My machine specs:
i7 10th gen, 16 GB RAM, NVIDIA GTX 1660 Ti 6 GB, 2 TB SSD

This is a VRAM requirement calculator for GGUF; if you multiply the Q4_K_M result by about 4, you get roughly the VRAM required without quantization.
About 25 GB of VRAM would be enough… that's too expensive for a GeForce card…
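
If upgrading the GPU isn't an option, quantizing the model at load time is the usual workaround. Below is a minimal sketch, assuming the bitsandbytes package is installed and the GPU supports it: 4-bit NF4 brings the roughly 14 GB of float16 weights down to around 4 GB, which may just fit in 6 GB of VRAM (the KV cache still needs some headroom, so this is not guaranteed).

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# Back-of-envelope: 7B params x 2 bytes (float16) ~= 14 GB of weights;
# 4-bit NF4 ~= 0.5 bytes/param ~= 4 GB.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

tokenizer = AutoTokenizer.from_pretrained("chatdb/natural-sql-7b")
model = AutoModelForCausalLM.from_pretrained(
    "chatdb/natural-sql-7b",
    quantization_config=bnb_config,
    device_map="auto",  # keep everything on the GPU if it fits
)
```

Keeping the whole model on the GPU (no disk/CPU offload) is what removes the multi-minute stalls; if the 4-bit model still doesn't fit, a smaller model or a GGUF runtime like llama.cpp is the next thing to try.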