Monitoring ML and LLM models in production for drift, trust, and safety

Hi all —

I wanted to share a quick look at what we’ve been working on at InsightFinder AI, and get feedback from anyone who’s solving similar problems.

Over the past year we’ve seen more teams deploying not just traditional ML models but also LLMs into production systems. The same issues keep coming up:

  • Data drift and model drift (input and output) degrading performance over time (see the sketch after this list for the kind of check we mean).
  • No clear way to evaluate LLM outputs for hallucinations, bias, or sensitive data leakage.
  • Trouble pinpointing why something went wrong when anomalies happen.
  • Lack of visibility into costs and performance metrics across models.
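
To be concrete about the drift point, here’s a minimal, generic sketch (not our actual implementation): compare a production sample of one feature against a training-time reference sample with a two-sample Kolmogorov–Smirnov test. All values and thresholds below are synthetic.

```python
# Minimal, generic input-drift check: flag a feature whose live distribution
# has shifted away from its training-time reference distribution.
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)
reference = rng.normal(loc=0.0, scale=1.0, size=5_000)   # stand-in for training-time feature values
production = rng.normal(loc=0.3, scale=1.1, size=5_000)  # stand-in for the last day of live traffic

stat, p_value = ks_2samp(reference, production)
if p_value < 0.01:  # alert threshold is arbitrary; tune per feature
    print(f"Possible drift: KS statistic={stat:.3f}, p={p_value:.2e}")
else:
    print("No significant drift detected")
```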

We set out to solve these issues, and this demo walks through the current version:
:link: (https://youtu.be/7aPwvO94fXg)

Main highlights:

We built a platform that tries to address those by making it easy to:

:white_check_mark: Onboard a model with its metadata and data sources (Snowflake, Elastic, etc.).
:white_check_mark: Set up monitors for specific use cases (data quality, drift, hallucinations, etc.); a toy example of the kind of output check involved is sketched after this list.
:white_check_mark: Dig into issues with a diagnostic “workbench” for root cause analysis.
:white_check_mark: See dashboards of costs, failed evaluations, and overall model health.
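
To give a flavor of what one of those output monitors looks for (purely illustrative, not our actual implementation), here’s a toy scan of an LLM response for patterns that resemble sensitive data. The patterns and the flag_response helper are hypothetical.

```python
# Toy output check: scan an LLM response for strings that look like
# sensitive data before it reaches users.
import re

SENSITIVE_PATTERNS = {
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "credit_card": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
}

def flag_response(text: str) -> list[str]:
    """Return the names of any sensitive-data patterns found in the text."""
    return [name for name, pattern in SENSITIVE_PATTERNS.items() if pattern.search(text)]

hits = flag_response("Sure, you can reach John at john.doe@example.com.")
print(hits)  # ['email']
```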

We’re still actively improving it, and it’d be really helpful to hear what you’d want to see or what doesn’t resonate. Happy to answer questions about how it works, or to share more details about the underlying implementation if anyone’s curious.

Thanks for taking a look.