When loading `microsoft/Phi-3-mini-4k-instruct` using `transformers.pipeline` with `load_in_4bit=True`, I get the following warnings:
WARNING:transformers_modules.microsoft.Phi-3-mini-4k-instruct.920b6cf52a79ecff578cc33f61922b23cbc88115.modeling_phi3:`flash-attention` package not found, consider installing for better performance: No module named 'flash_attn'.
WARNING:transformers_modules.microsoft.Phi-3-mini-4k-instruct.920b6cf52a79ecff578cc33f61922b23cbc88115.modeling_phi3:Current `flash-attenton` does not support `window_size`. Either upgrade or use `attn_implementation='eager'`.
Have I loaded the model incorrectly? I do have `accelerate` installed in my environment.
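For context, a minimal sketch of the setup that produces these warnings (the model name and `load_in_4bit` are as described above; `trust_remote_code=True` and `device_map="auto"` are assumptions about the rest of the call and may not match my exact script):

```python
from transformers import pipeline

# Sketch of the pipeline call; load_in_4bit requires bitsandbytes,
# and device_map="auto" requires accelerate.
pipe = pipeline(
    "text-generation",
    model="microsoft/Phi-3-mini-4k-instruct",
    trust_remote_code=True,          # Phi-3 uses remote modeling code (modeling_phi3)
    device_map="auto",
    model_kwargs={"load_in_4bit": True},
)

print(pipe("Hello, how are you?", max_new_tokens=20))
```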