Researcher Spends $1,500 Testing If LLMs Can Hack a Vulnerable Firebase App
Security researcher Kasra built a fake React Native book-review app with a hardened FastAPI backend and a misconfigured Firebase setup to test whether LLMs could discover and exploit broken access controls. Across $1,500 of runs on multiple models, GPT-5.5 solved the challenge in 7 of 10 runs and DeepSeek V4 Pro in 3 of 10, while most others either refused on safety grounds, exhausted their budgets, or fixated on the wrong attack surface. The experiment suggests that some frontier models can reproduce real-world Firebase exploits, but that cost, guardrails, and inconsistency still make them unreliable as autonomous pentesters.