An empirical study of naive retrieval-augmented generation (RAG) for Solidity vulnerability detection on SolidiFI, showing that the sign of the measured effect reverses with sample size: a +2.0% Macro-F1 change at n=100 flips to -2.7% at n=250. The study argues that any RAG evaluation should report bootstrap confidence intervals.
Keywords: smart contract security; retrieval-augmented generation; Solidity; vulnerability detection; bootstrap confidence intervals
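
To illustrate the kind of interval the summary argues for, here is a minimal sketch of a paired percentile bootstrap over the Macro-F1 difference between a baseline and a RAG-augmented detector evaluated on the same contracts. It is not the study's actual procedure: the function names, the 2000 resamples, the 95% level, and the fixed seed are all assumptions chosen for illustration.

```python
import random
from typing import Hashable, List, Sequence, Tuple


def macro_f1(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Unweighted mean of per-class F1 over every label seen in y_true or y_pred."""
    labels = set(y_true) | set(y_pred)
    scores: List[float] = []
    for label in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores) if scores else 0.0


def paired_bootstrap_delta_ci(
    y_true: Sequence[Hashable],
    pred_base: Sequence[Hashable],
    pred_rag: Sequence[Hashable],
    n_resamples: int = 2000,   # assumed value, not taken from the study
    alpha: float = 0.05,       # assumed 95% interval
    seed: int = 0,
) -> Tuple[float, float, float]:
    """Return (point estimate, lower, upper) for Macro-F1(RAG) - Macro-F1(baseline).

    Resampling is paired: each draw picks the same contract indices for both
    systems, so per-contract difficulty cancels out of the difference.
    """
    if not (len(y_true) == len(pred_base) == len(pred_rag)):
        raise ValueError("all three sequences must have the same length")
    n = len(y_true)
    rng = random.Random(seed)
    deltas: List[float] = []
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        t = [y_true[i] for i in idx]
        b = [pred_base[i] for i in idx]
        r = [pred_rag[i] for i in idx]
        deltas.append(macro_f1(t, r) - macro_f1(t, b))
    deltas.sort()
    lower = deltas[int((alpha / 2) * n_resamples)]
    upper = deltas[min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))]
    point = macro_f1(y_true, pred_rag) - macro_f1(y_true, pred_base)
    return point, lower, upper
```

If the resulting interval contains zero, the sign of the point estimate is not supported by the sample, which is the situation a reversal between n=100 and n=250 would suggest. Note that in this sketch the class set is recomputed per resample; fixing it to the full-data label set is an equally defensible choice.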