When Retrieval Hurts: where RAG actively degrades Solidity vulnerability detection

2026-06-06 · Sergei Solovev, HSE University

Most RAG evaluations show you the wins. I ran the experiment the other way.

In "When Retrieval Hurts," I systematically tested where retrieval-augmented generation (RAG) degrades Solidity vulnerability detection relative to a plain LLM baseline, rather than where it helps. The result: RAG reliably hurts in three regimes: on novel vulnerability patterns with no close analogues in the retrieval corpus, on under-specified queries that pull noisy neighbors, and when retrieved context introduces plausible-but-wrong code paths that the model treats as authoritative. In those regimes, the F1 drops are material, not noise.
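To make the comparison concrete, here is a minimal sketch of a per-regime F1 delta between a baseline run and a RAG run. It is an illustration only, not the paper's evaluation code: the record fields (`regime`, `label`, `baseline_pred`, `rag_pred`), the regime names, and the toy records are all assumptions, and the toy values are synthetic placeholders rather than results from the preprint.

```python
from collections import defaultdict


def f1_score(y_true, y_pred):
    """Binary F1 for the positive class (1 = vulnerable)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    denom = 2 * tp + fp + fn
    # Convention assumed here: F1 is 0.0 when there are no positives at all.
    return 2 * tp / denom if denom else 0.0


def f1_delta_by_regime(records):
    """Group records by regime and compare RAG F1 against baseline F1.

    A negative delta means retrieval hurt detection in that regime.
    """
    groups = defaultdict(list)
    for rec in records:
        groups[rec["regime"]].append(rec)

    report = {}
    for regime, rows in groups.items():
        labels = [r["label"] for r in rows]
        base = f1_score(labels, [r["baseline_pred"] for r in rows])
        rag = f1_score(labels, [r["rag_pred"] for r in rows])
        report[regime] = {"baseline_f1": base, "rag_f1": rag, "delta": rag - base}
    return report


def _rows(regime, labels, baseline_preds, rag_preds):
    """Helper (hypothetical) to build records from parallel lists."""
    return [
        {"regime": regime, "label": y, "baseline_pred": b, "rag_pred": r}
        for y, b, r in zip(labels, baseline_preds, rag_preds)
    ]


# Synthetic placeholder data, NOT results from the preprint.
TOY_RECORDS = (
    _rows("novel_pattern", [1, 1, 0, 0], [1, 1, 0, 0], [1, 0, 1, 0])
    + _rows("close_analogue", [1, 1, 0, 0], [1, 0, 0, 0], [1, 1, 0, 0])
)

if __name__ == "__main__":
    for regime, row in f1_delta_by_regime(TOY_RECORDS).items():
        print(f"{regime}: delta F1 = {row['delta']:+.3f}")
```

Reporting the delta per regime, rather than one pooled F1, is what lets a degradation in one slice show up instead of being averaged away by gains in another.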

This matters for anyone deploying AI in smart-contract security pipelines. A retrieval layer you haven't stress-tested for failure modes is a liability, not a safety net. The honest-eval method (falsification first, confirmation second) is the only framework I trust for high-stakes automated auditing, and this paper is the proof of method before it becomes a production component of the ai-yield-vault security stack.

Preprint: https://figshare.com/articles/preprint/When_Retrieval_Hurts/32141182

#SmartContracts #RAG #ML