For anyone interested in AI safety for knowledge graphs:
Abstract
Clinical Graph-LLMs achieve high benchmark accuracy while bypassing their knowledge graphs, a failure I term Structural Hallucination. Correctness via memorisation rather than graph traversal is a safety risk: such models cannot be trusted when evidence is updated or in rare-disease settings where parametric priors are absent.
I formalise the Structural Alignment Score (SAS), which measures a model's causal sensitivity to Counterfactual Edge Deletion (CED), i.e. removing a critical edge e* from the graph G.
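
As a hedged sketch only: one form consistent with the Jensen-Shannon formulation named in the contributions is the divergence between the model's output distribution on the full graph and on the graph with e* deleted. The symbols P_theta (model distribution), x (query), and y (output token) are assumptions introduced for illustration, as is the base-2 logarithm, which bounds the score in [0, 1]; the paper's exact definition may differ.

```latex
\mathrm{SAS}(e^{*}) \;=\; \mathrm{JSD}\!\left( P_{\theta}(y \mid x, G) \,\middle\|\, P_{\theta}(y \mid x, G \setminus \{e^{*}\}) \right),
\qquad
\mathrm{JSD}(P \,\|\, Q) \;=\; \tfrac{1}{2}\,\mathrm{KL}(P \,\|\, M) + \tfrac{1}{2}\,\mathrm{KL}(Q \,\|\, M),
\quad M = \tfrac{1}{2}(P + Q).
```

Under this reading, a score of 0 means deleting the edge leaves the prediction unchanged, and a score near 1 means the prediction depends almost entirely on that edge.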

State-of-the-art Graph-LLMs score SAS ≈ 0.00, confirming that their correct predictions are structurally fraudulent. I propose Topology-Constrained Decoding (TCD), which hard-masks the LLM logit distribution to KG-verified neighbours at step 0, raising KG-grounded token probability from 0.33% to 100% (+99.7 pp, BioGPT) with SAS ≈ 0.94 and zero hallucination by construction. GAT-TCD extends this with a 2-layer Graph Attention Network (GAT) that ranks the constraint set by structural relevance, achieving SAS ≈ 1.00 on primary therapeutic edges (e.g., Imatinib->ABL1) while deliberately yielding SAS ≈ 0.23 under random CED, concentrating faithfulness where clinical evidence is strongest. PrimeKG-trained GAT weights rank biologically plausible targets (e.g. artenimol: CYCS top-1, GAPDH rank-2), and CED shifts generations when the top node is removed (3/10 drugs, step-0 SAS = 30% under prefix-TCD). I introduce Transparency Debt to frame the systemic risk of deploying accurate but structurally unfaithful models, and call on the community to adopt SAS as a standard reporting metric alongside accuracy.
Introduction:
Graph-LLMs ground LLM reasoning in verifiable KGs such as PrimeKG [1], promising traceable
clinical diagnostics—but this promise rests on a precarious foundation.
Structural Hallucination occurs when a model produces a correct clinical answer by bypassing the
supplied graph and drawing on parametric memory instead of graph evidence.
To illustrate, consider the Phone-Charger Trap: even when the critical biological edge
(Imatinib, inhibits, BCR-ABL) is explicitly removed from the input graph, state-of-the-art
models such as G-Retriever [2] continue to predict the same relationship. The model's reasoning is
decoupled from the graph structure entirely, a finding I characterise formally through the
Counterfactual Edge Deletion (CED) protocol and case study.
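
The edge-deletion check described above can be sketched in a few lines of Python. This is a toy illustration, not the paper's implementation: the two "models" are hypothetical stand-ins (one ignores the graph, one reads it), the candidate list `TARGETS` and the extra KIT edge are invented for the example, and the score is assumed to be the base-2 Jensen-Shannon divergence between predictions with and without the deleted edge.

```python
import numpy as np

def jensen_shannon_divergence(p, q):
    """Base-2 JSD between two discrete distributions; lies in [0, 1]."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    p, q = p / p.sum(), q / q.sum()
    m = 0.5 * (p + q)

    def kl(a, b):
        mask = a > 0  # terms with a == 0 contribute nothing
        return float(np.sum(a[mask] * np.log2(a[mask] / b[mask])))

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def sas(predict, graph, edge):
    """Divergence between predictions on G and on G with `edge` deleted."""
    ablated = graph - {edge}
    return jensen_shannon_divergence(predict(graph), predict(ablated))

# Hypothetical candidate answers and toy graph (illustrative only).
TARGETS = ["BCR-ABL", "KIT", "PDGFRA"]
EDGE = ("Imatinib", "inhibits", "BCR-ABL")
GRAPH = frozenset({EDGE, ("Imatinib", "inhibits", "KIT")})

def memorising_model(graph):
    """Stand-in for a model that answers from parametric memory."""
    return [0.90, 0.05, 0.05]  # never looks at `graph`

def graph_faithful_model(graph):
    """Stand-in for a model that only predicts targets present in the graph."""
    counts = np.array(
        [sum(1 for (_, _, t) in graph if t == target) for target in TARGETS],
        dtype=float,
    )
    if counts.sum() == 0:
        return np.full(len(TARGETS), 1.0 / len(TARGETS))
    return counts / counts.sum()

sas_memorising = sas(memorising_model, GRAPH, EDGE)    # 0.0: the edge is irrelevant
sas_faithful = sas(graph_faithful_model, GRAPH, EDGE)  # strictly positive
print(f"memorising: {sas_memorising:.3f}, faithful: {sas_faithful:.3f}")
```

The memorising stand-in scores exactly zero because its output is identical before and after deletion, which is the pattern the Phone-Charger Trap describes.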
This failure exposes a growing Transparency Debt. While models
maintain high benchmark accuracy, they remain structurally unfaithful. In clinical settings, where
the validity of a specific causal pathway (e.g., genomic mutation -> disease progression) is more
critical than a general probabilistic prediction, this decoupling is a safety risk.
I therefore argue that for clinical Graph-LLMs, structural faithfulness must be treated as a mandatory
prerequisite rather than an optional secondary property. My contributions are:
- I expose Structural Hallucination via CED experiments on PrimeKG and formalise SAS (Jensen-Shannon Divergence) to quantify structural faithfulness.
- I propose TCD, which hard-masks the LLM logit distribution to KG neighbours at inference time, making hallucination impossible by construction.
- I extend TCD to GAT-TCD, using a 2-layer GAT to attention-rank the constraint set, achieving SAS ≈ 1.00 on primary therapeutic edges.
- I show that PrimeKG-trained GAT weights improve semantic ranking and that CED shifts generations (e.g. artenimol: CYCS -> GAPDH after node deletion; Section 7.5).
- I evaluate on 5 drugs with BioGPT, demonstrating +97–100 pp KG-grounding improvement with zero hallucination by construction.
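
The hard-masking step behind TCD can be illustrated with a minimal NumPy sketch. It assumes a single decoding step, a toy six-token vocabulary, and a hypothetical set `allowed_ids` holding the token ids of KG-verified neighbours; the function name and the numbers are illustrative and are not taken from the paper's code.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax; -inf logits map to probability 0."""
    shifted = logits - np.max(logits)
    exp = np.exp(shifted)
    return exp / exp.sum()

def topology_constrained_step(logits, allowed_ids):
    """Keep only the logits of KG-verified tokens, then renormalise."""
    allowed = np.asarray(sorted(allowed_ids), dtype=int)
    if allowed.size == 0:
        raise ValueError("constraint set is empty: no KG neighbours to decode")
    masked = np.full_like(logits, -np.inf, dtype=float)
    masked[allowed] = logits[allowed]
    return softmax(masked)

# Toy step-0 logits over a six-token vocabulary (made-up values).
logits = np.array([4.0, 3.5, 0.1, -1.0, 2.0, -0.5])
allowed_ids = {2, 3}  # hypothetical ids of KG neighbours of the query drug

unconstrained = softmax(logits)
constrained = topology_constrained_step(logits, allowed_ids)

grounded_before = unconstrained[list(allowed_ids)].sum()
grounded_after = constrained[list(allowed_ids)].sum()
print(f"KG-grounded mass before: {grounded_before:.4f}, after: {grounded_after:.4f}")
```

Because every token outside the constraint set receives probability zero, all probability mass lands on KG neighbours after masking regardless of how little the unconstrained model assigned to them, which is the sense in which the guarantee holds by construction.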
Full write up is here:
Any feedback is welcome.