Config attn_impl triton

Hey!

I'm very new to using Hugging Face models and the transformers library.

My question is: do I need a Triton server running locally when setting up triton in the config?

My config looks like this:

from transformers import AutoConfig

config = AutoConfig.from_pretrained(
    "replit/replit-code-v1-3b",
    trust_remote_code=True,
    low_cpu_mem_usage=True
)
config.attn_config['attn_impl'] = 'triton'

I always get this error when using the triton attn_impl:

Traceback (most recent call last):
  File "<string>", line 21, in _fwd_kernel
KeyError: ('2-.-0-.-0-f7e0f7a56c159db561be078bc846de51-2b0c5161c53c71b37ae20a9996ee4bb8-c1f92808b4e4644c1732e8338187ac87-d962222789c30252d492a16cca3bf467-12f7ac1ca211e037f62a7c0c323d9990-5c5e32ff210f3b7f56c98ca29917c25e-06f0df2d61979d629033f4a22eff5198-0dd03b0bd512a184b3512b278d9dfa59-d35ab04ae841e2714a253c523530b071', (torch.bfloat16, torch.bfloat16, torch.bfloat16, torch.bfloat16, torch.bfloat16, torch.float32, torch.float32, 'fp32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32'), ('vector', True, 128, False, False, False, 128, 128), (True, True, True, True, True, True, True, (False,), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (True, False), (False, False), (True, False), (True, False), (True, False), (True, False), (True, False), (False, False), (False, False), (True, False), (True, False), (True, False), (True, False)))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
 
***

File "/home/wraken/.local/lib/python3.10/site-packages/triton/compiler.py", line 901, in _compile
    name, asm, shared_mem = _triton.code_gen.compile_ttir(backend, module, device, num_warps, num_stages, extern_libs, cc)
RuntimeError: CUDA: Error- no device

My intuition tells me that I need a Triton server running somewhere, but I'm not sure at all, because it works with other attn_impl settings.

So do I really need a server running, or is it related to something else, like bad CUDA drivers?

I'm using Ubuntu on WSL2 with an RTX 3080 graphics card.

Thanks and have a nice day !

Okay, the problem was WSL2.
Tried on a native Ubuntu system and it works fine!
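In case it helps anyone hitting the same `RuntimeError: CUDA: Error- no device`: as far as I understand, the `triton` attn_impl refers to the Triton GPU kernel compiler (the Python `triton` package), not NVIDIA's Triton Inference Server, so no server is needed; the compiler just has to see a CUDA device, which my WSL2 setup apparently didn't expose. A quick sanity check (assuming PyTorch is installed):

```python
import torch

# The 'triton' attn_impl compiles custom GPU kernels at runtime, so it needs
# a visible CUDA device -- no separate server process is involved.
if torch.cuda.is_available():
    print(f"CUDA device visible: {torch.cuda.get_device_name(0)}")
else:
    print("No CUDA device visible -- the Triton compiler will fail here")
```

If this prints that no device is visible, the problem is the CUDA setup (drivers, or WSL2 GPU passthrough), not the model config.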