SecurityAI

Researcher Spends $1,500 Testing If LLMs Can Hack a Vulnerable Firebase App

A security researcher built a deliberately vulnerable React Native app with a Firebase backend to evaluate whether LLMs could independently discover and exploit a common broken access control vulnerability. The researcher spent $1,500 testing multiple models, including GPT-5.5, DeepSeek V4 Pro, and Claude Sonnet 4.6. GPT-5.5 achieved the best results, with a 7/10 solve rate, by correctly targeting Firebase misconfigurations rather than the hardened API. Behavior varied significantly across models: many refused, fixated on incorrect attack vectors, or burned through millions of tokens without success. The results highlight both the potential and the current limitations of autonomous AI security testing.
