AI Is a Fucking Liar, Part 1

BlackArbs Admin

If you don't have a strategy to handle this, it will cost you money, time, or both. There are a dozen ways it reveals itself, but the outcome is almost always the same. It's an uncomfortable truth that has to be accounted for upfront.

Before I continue: LLMs are genuinely impressive technology. When used correctly they're a serious force multiplier, and I use them daily. But between the hype, the marketing, and widespread misconceptions, they're also being used dangerously.

When I first started using Claude as a coding assistant, I went deep fast, implementing as many ideas as I could, as quickly as possible. Then it became clear that the corporations were aggressively overselling the capability. Shocking, I know.

The first time I realized the LLM was incapable of writing mission-critical Python, I was building an iteration of a research pipeline. A one-shot didn't work, which was expected. So I worked through the debugging, and that's when I spotted it: dict.get().

You might ask: what's the problem?

In Python, dict.get() is really dict.get(key, default). If the key is missing, it returns the default instead of raising KeyError, and if you don't specify a default, it returns None. That's fine in throwaway scripts. In mission-critical code, it's a silent killer: missing data gets quietly papered over with a hidden default. Example:

correlation = signal_data.get("correlation", 0.0)
if correlation >= some_value:
    do_stuff()

If data upstream is corrupted, or if the correlation calculation failed due to a bug, you will never know. The value quietly becomes 0.0. Everything downstream is now worthless garbage, and your system keeps running like nothing happened.
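
The fix is barely longer. A minimal fail-fast sketch, reusing signal_data, some_value, and do_stuff from the snippet above:

correlation = signal_data["correlation"]  # raises KeyError if the key is missing
if not (-1.0 <= correlation <= 1.0):
    # NaN fails this check too, since NaN comparisons are always False
    raise ValueError(f"correlation out of range: {correlation!r}")
if correlation >= some_value:
    do_stuff()

A corrupted feed or a broken calculation now crashes the run immediately, at the line where the problem actually lives, instead of poisoning everything downstream.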

The LLM couldn't stop writing the .get() version. But I didn't fully understand that yet.

So I updated the CLAUDE.md and wrote explicit instructions: fail-fast behavior, no hidden defaults, never use the dict.get() default pattern. That's what we were told, right? The model is smart, just write specific rules in the markdown file and it'll follow them.

We kept working. The script started breaking again. Same pattern, again and again, and now I'm getting more frustrated because reality is not matching the expectations they sold us. Each time I asked why it wasn't following the rules, I got the same answer: it reads the file, then ignores it whenever it falls back on its training data.

That's when it clicked. LLMs are trained on an ocean of shit code. That training data acts like a swamp, dragging every response back toward a stochastic average of mediocrity, optimized to run and exit cleanly, not to fail correctly. No amount of prompting or markdown-file engineering can override it. This was my first real introduction to what "AI" actually is.

There are dozens of landmines like this. You will step on them, it's inevitable. The only variable is whether you have the tools to catch them before they damage something.
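
To make "tools to catch them" concrete, here's the kind of thing I mean: a minimal sketch (illustrative only, not Forge Ship code) that walks a Python file's AST and flags every .get() call for human review:

import ast
import sys

class GetCallFinder(ast.NodeVisitor):
    # Record the line of every attribute call named .get().
    # Deliberately crude: it can't tell a dict from anything else,
    # so it surfaces candidates for review rather than proving a bug.
    def __init__(self):
        self.hits = []

    def visit_Call(self, node):
        if isinstance(node.func, ast.Attribute) and node.func.attr == "get":
            self.hits.append(node.lineno)
        self.generic_visit(node)

path = sys.argv[1]
finder = GetCallFinder()
finder.visit(ast.parse(open(path).read()))
for lineno in finder.hits:
    print(f"{path}:{lineno}: .get() call -- check for a hidden default")

Wire something like that into CI and the pattern can't sneak back in silently.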

If you want to avoid the hellscape I went through, that's what Forge Ship is built for. It's a modern quant assurance framework designed to catch exactly these failure modes, the subtle, silent ones that cost you without ever announcing themselves.

One more thing: LLMs have advanced since then. They still produce this exact pattern today. And dict.get() is just one example. I have a list of these that could fill a book. We're just getting started.

Forge Ship — the framework — production skeleton, engineering guardrails, deploy configs. Early adopters get in at $499.
Forge Learn — the course — six sessions, live cohort, deployed pipeline in your own AWS account. Early bird pricing runs through May 16, class starts June 6.

Written by hand,

Brian Christopher, CFA
BlackArbs LLC

Want to work together directly? Reply to this email.

Enjoyed this post?

Subscribe for more research and trading insights.
