So happy that I finally got around to reading this. Very informative!
I think there's an implicit assumption in this post that your target customers need consistent results and predictable outcomes. Understandably so, btw. But since agents are built on non-deterministic LLMs, achieving enterprise-grade quality seems crazy... unless you're framing the product for early-adopter customers, mid-market and smaller. I think one way to unlock better outcomes with agents is to involve humans in their workflows, enabling them to review any proposed actions the agent wants to take (i.e. spend some money, push some code, etc.). It might be crazy, but I've been thinking about whether there's an opportunity to build agentic software for everyday consumers that leans into this interaction paradigm instead of trying to eliminate it entirely, at this stage.
Curious what you think.
totally! the push for determinism in agents is rooted in reliability, with a few major exceptions:
- building a product where you want high variance (like writing poems)
- building in a pass@k-style architecture, where you generate many results and choose the best one
- problems intrinsically unsolvable by the agent (like requiring non-indexed data to answer)
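The pass@k idea above fits in a few lines (a toy sketch; `generate` and `score` are hypothetical stand-ins for your non-deterministic model call and whatever quality heuristic you use to pick a winner):

```python
def pass_at_k_best(generate, score, k: int = 5):
    """Generate k candidate outputs and keep the highest-scoring one."""
    candidates = [generate() for _ in range(k)]
    return max(candidates, key=score)

# toy usage: pick the longest of three draft strings
drafts = iter(["ok", "a longer draft", "mid"])
best = pass_at_k_best(lambda: next(drafts), len, k=3)  # "a longer draft"
```

The point is that even with high per-call variance, selecting the best of k samples can give you a much more reliable end result.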
I think "human in the loop" (or "agent in the loop") workflows will be around for many years yet, and the UX is key, just as ChatGPT was fundamentally a UX advance that made LLMs easy to interact with
Love this post! I'm interested in the caching section. Do you have an example for that in action?
thanks! you can get up and running with Helicone easily just by swapping a URL; otherwise, a Postgres table with a string key and a JSON payload field is all you need, depending on your preference. Here's the Python to serialize:
```python
import hashlib
import json

def _stable_serialize(request: ChatCompletionRequest) -> str:
    # sort_keys gives a canonical JSON string, so identical requests
    # always serialize (and therefore hash) identically
    return json.dumps(request.get_request_body_dict(), sort_keys=True)

def _get_cache_key(request: ChatCompletionRequest) -> str:
    request_serialized = _stable_serialize(request)
    return hashlib.sha256(request_serialized.encode()).hexdigest()
```
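To show the key in action end to end, here's a sketch of the lookup side. sqlite3 stands in for the Postgres table described above, a plain dict stands in for the request object, and `call_llm` is a hypothetical stand-in for your actual completion call:

```python
import hashlib
import json
import sqlite3

# in-memory sqlite standing in for the Postgres table:
# one string key column, one JSON payload column
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE llm_cache (key TEXT PRIMARY KEY, payload TEXT)")

def cache_key(request_body: dict) -> str:
    # stable serialization -> identical requests hash to the same key
    return hashlib.sha256(
        json.dumps(request_body, sort_keys=True).encode()
    ).hexdigest()

def cached_completion(request_body: dict, call_llm) -> dict:
    """Return the cached response for this exact request if we've seen
    it before; otherwise call the model and store the result."""
    key = cache_key(request_body)
    row = conn.execute(
        "SELECT payload FROM llm_cache WHERE key = ?", (key,)
    ).fetchone()
    if row:
        return json.loads(row[0])
    response = call_llm(request_body)
    conn.execute(
        "INSERT INTO llm_cache VALUES (?, ?)", (key, json.dumps(response))
    )
    return response
```

Repeated identical requests then hit the cache instead of the model, which is what makes re-running an eval suite fast.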
Thanks! I initially didn't fully grasp the purpose of caching and how it could speed up evaluation, so I was hoping to understand it with some code for context.
I have spent the last few weeks building a chatbot and I resonate with your article. Caching is a great strategy to speed up testing, thanks!
Thanks! would love to check out the chatbot if it's public