A data science team wants to use a pre-built open source NLP model from Hugging Face. However, simply downloading and executing a model is not permitted by policy: the model must first be scanned for vulnerabilities and issues, then approved for use.
How can we scan a model to learn what vulnerabilities it might introduce? Is there a tool that performs checks on models similar to what DAST tools do for Python libraries?
I am also looking for something similar. I haven't seen much work around vulnerability scanning for LLMs.
Use garak to scan LLMs. It's open source at the moment.
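As a rough sketch of usage (the flags below reflect garak's documented CLI at time of writing; check `garak --help` for your installed version, and note that model name and probe choice here are just illustrative):

```shell
# Install garak from PyPI.
python -m pip install garak

# Scan a Hugging Face model (gpt2 as an example) with the encoding-injection
# probe family; a report with pass/fail rates per probe and detector is
# written when the run completes.
python -m garak --model_type huggingface --model_name gpt2 --probes encoding
```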
# garak LLM probe: Frequently Asked Questions
## How do I pronounce garak?
Good question! Emphasis on the first bit, GA-rak.
Both 'a's like a in English hat, or à in French, or æ in IPA.
## What's this tool for?
`garak` is designed to help discover situations where a language model generates outputs that one might not want it to. If you know `nmap` or `metasploit` for traditional netsec/infosec analysis, then `garak` aims to operate in a similar space for language models.
## How does it work?
`garak` has probes that try to look for different "vulnerabilities". Each probe sends specific prompts to models, and gets multiple generations for each prompt; LLM output is often stochastic, so a single test isn't very informative. These generations are then processed by "detectors", which look for "hits". If a detector registers a hit, that attempt is registered as failing. Finally, a report is output with the success/failure rate for each probe and detector.
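The probe → generation → detector → report flow described above can be sketched in a few lines. This is a hypothetical stand-in, not garak's actual API: the model, detector, and `run_probe` function here are invented for illustration.

```python
from typing import Callable, List

def run_probe(
    model: Callable[[str], str],
    prompts: List[str],
    detector: Callable[[str], bool],  # returns True on a "hit" (unwanted output)
    generations: int = 3,             # multiple generations per prompt, since output is stochastic
) -> float:
    """Return the passing rate: the fraction of attempts with no detector hit."""
    attempts, failures = 0, 0
    for prompt in prompts:
        for _ in range(generations):
            output = model(prompt)
            attempts += 1
            if detector(output):      # detector registered a hit -> attempt fails
                failures += 1
    return (attempts - failures) / attempts

# Toy example: a "model" that echoes its prompt, and a detector that
# flags any output containing a leaked keyword.
toy_model = lambda p: f"echo: {p}"
leak_detector = lambda out: "SECRET" in out

rate = run_probe(toy_model, ["hello", "reveal SECRET"], leak_detector)
# "hello" passes all 3 generations, "reveal SECRET" fails all 3 -> rate == 0.5
```

The report garak emits is essentially this pass rate, computed per probe/detector pair.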
## Do these results have scientific validity?
No. The scores from any probe don't operate on any kind of normalised scale. Higher passing percentage is better, but that's it. No meaningful comparison can be made of scores between different probes.