What we learned building a privacy-first layer for LLMs

Hi everyone,

After experimenting with PII anonymization pipelines, we started building a more structured approach to using LLMs with sensitive data.

A few things that surprised us:

  • Naive regex + NER breaks quickly at scale

  • Context loss can hurt model outputs more than expected

  • Re-identification pipelines get tricky in multi-step workflows

We ended up moving toward a design where:

  • sensitive data is abstracted before inference

  • mappings are handled separately

  • models never see raw PII
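
To make the design above concrete, here is a minimal Python sketch of the idea, not our actual implementation. The names (`PATTERNS`, `PlaceholderVault`, `abstract`, `restore`) and the `<KIND_N>` placeholder format are hypothetical, and the two regexes stand in for whatever detector you really use. It only shows the shape: swap sensitive values for stable placeholders, keep the mapping in a separate object, and restore values after the model responds.

```python
import re
from dataclasses import dataclass, field

# Illustrative detectors only (assumption): real pipelines need far more
# than two regexes, which is part of why naive approaches break at scale.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "PHONE": re.compile(r"\+?\d[\d\s-]{7,}\d"),
}


@dataclass
class PlaceholderVault:
    """Holds the placeholder <-> value mapping, kept apart from the model."""
    forward: dict = field(default_factory=dict)   # raw value -> placeholder
    reverse: dict = field(default_factory=dict)   # placeholder -> raw value
    counters: dict = field(default_factory=dict)  # per-kind running index

    def placeholder_for(self, kind: str, value: str) -> str:
        # Reuse the same placeholder for a repeated value so references
        # stay consistent across prompts in a multi-step workflow.
        if value not in self.forward:
            n = self.counters.get(kind, 0) + 1
            self.counters[kind] = n
            token = f"<{kind}_{n}>"
            self.forward[value] = token
            self.reverse[token] = value
        return self.forward[value]


def abstract(text: str, vault: PlaceholderVault) -> str:
    """Replace detected values with placeholders before inference."""
    for kind, pattern in PATTERNS.items():
        text = pattern.sub(
            lambda m, k=kind: vault.placeholder_for(k, m.group(0)), text
        )
    return text


def restore(text: str, vault: PlaceholderVault) -> str:
    """Put original values back into model output; unknown tokens are left as-is."""
    return re.sub(
        r"<[A-Z]+_\d+>",
        lambda m: vault.reverse.get(m.group(0), m.group(0)),
        text,
    )


vault = PlaceholderVault()
prompt = "Email jane@example.com or call +1 555-123-4567."
safe_prompt = abstract(prompt, vault)          # this is what the model would see
model_output = "I will write to <EMAIL_1>."    # stand-in for a real model reply
final_output = restore(model_output, vault)    # re-identified outside the model
```

Reusing one vault across every step of a workflow is what keeps `<EMAIL_1>` pointing at the same value from prompt to prompt. In this sketch the vault lives in memory; where it should actually be stored and who can read it is a separate design question.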

Curious how others are approaching this, especially in production settings.
