How are entities actually used in LLM-based systems, e.g., in search-agent data pipelines (Qwen Agent CPT, DS-V3.2)?

I’m trying to understand how entity extraction and entity-based methods are used inside LLM (GenAI) systems, especially in search-agent architectures. Recent papers like Qwen Agent CPT and DS-V3.2 mention using entities in their search-agent pipelines — for example, detecting long-tail entities and generating QA pairs around them.

My previous understanding was that this kind of entity-based approach doesn’t work well with traditional models like BERT or BiLSTM classifiers, especially for long-tail cases or for generating QA data. But with modern LLMs, what does the actual workflow look like?

  • How do LLMs perform entity extraction in this context?

  • What does the prompting process look like?

  • How are entities turned into QA pairs or retrieval units?

  • Are there any best practices, templates, or empirical learnings?

I’m essentially looking for practical insights on how entities are operationalized in LLM-based agents.
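For context, here is my rough mental model of the pipeline as a runnable sketch — this is purely my assumption, not taken from either paper. `call_llm` is a stand-in for any chat-completion API (here it returns canned responses so the snippet runs offline), and the prompt templates are hypothetical:

```python
import json

# Stand-in for a real chat-completion API call (OpenAI, Qwen, etc.).
# Returns canned JSON so this sketch runs without network access.
def call_llm(prompt: str) -> str:
    if "Extract named entities" in prompt:
        return json.dumps(["Qwen", "continual pre-training"])
    return json.dumps([{"question": "What is Qwen?",
                        "answer": "A family of LLMs."}])

# Hypothetical prompt templates; real systems presumably add
# few-shot examples and stricter output-format instructions.
ENTITY_PROMPT = (
    "Extract named entities from the passage below. "
    "Return a JSON list of strings.\n\nPassage:\n{passage}"
)
QA_PROMPT = (
    "Write one question-answer pair about the entity '{entity}', "
    "grounded only in the passage below. Return a JSON list of "
    '{{"question": ..., "answer": ...}} objects.\n\nPassage:\n{passage}'
)

def extract_entities(passage: str) -> list[str]:
    # Step 1: prompt the LLM for entities instead of running a
    # BERT/BiLSTM tagger.
    return json.loads(call_llm(ENTITY_PROMPT.format(passage=passage)))

def entities_to_qa(passage: str, entities: list[str]) -> list[dict]:
    # Step 2: turn each entity into grounded QA pairs, which can then
    # serve as training data or retrieval units.
    pairs: list[dict] = []
    for entity in entities:
        prompt = QA_PROMPT.format(entity=entity, passage=passage)
        pairs.extend(json.loads(call_llm(prompt)))
    return pairs

passage = "Qwen models use continual pre-training on agent trajectories."
entities = extract_entities(passage)
qa_pairs = entities_to_qa(passage, entities)
print(entities, qa_pairs)
```

Is this two-stage prompt-then-generate loop roughly what these pipelines do, or is there a more sophisticated step (e.g., filtering entities by frequency to isolate the long tail)?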

Thanks!


I’ve gathered some resources for now.