Researcher Spends $1,500 Testing If LLMs Can Hack a Vulnerable Firebase App

A security researcher built a deliberately vulnerable React Native app with a Firebase backend to evaluate whether LLMs could independently discover and exploit a common broken access control vulnerability. The researcher spent $1,500 testing multiple models, including GPT-5.5, DeepSeek V4 Pro, Claude Sonnet 4.6, and others. GPT-5.5 achieved the best results, with a 7/10 solve rate, by correctly targeting Firebase misconfigurations rather than the hardened API. Model behavior varied widely: many models refused, fixated on incorrect attack vectors, or burned through millions of tokens without success. The results highlight both the potential and the current limitations of autonomous AI security testing.
