Michael McAndrew

2025-04-21

Making Long Context Prompts Efficient in my Telegram Bot

Not long after I added @ symbol support to my Telegram LLM bot, Simon Willison released an update to the llm CLI and library with a feature that turned out to be a very nice fit for improving what I had just built. The feature is called Fragments, and it’s designed to store long pieces of input context in the database in a more efficient way.

To quickly recap, my original implementation of @link was to detect an inline URL with regex, fetch the associated content using Firecrawl, and insert it directly into the prompt. This works just fine, but because my Telegram bot stores every prompt and its response in a SQLite DB, making multiple prompts against the same @link stored a duplicate of that long piece of context as part of the prompt every time. For reference, the original flow boils down to something like the sketch below.
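This is an illustrative sketch rather than the bot's exact code; the regex and the fetch_markdown helper are stand-ins for the real detection logic and the Firecrawl call:

```python
import re

# Simplified sketch of the original approach: find an inline @https://... link,
# fetch its content, and splice the Markdown straight into the prompt text.
# fetch_markdown is a hypothetical stand-in for the Firecrawl call.
LINK_PATTERN = re.compile(r"@(https?://\S+)")

def expand_links(prompt_text: str, fetch_markdown) -> str:
    return LINK_PATTERN.sub(lambda m: fetch_markdown(m.group(1)), prompt_text)
```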

Fragments offers a better way to handle this.

How Fragments Help

The main idea behind Fragments is to treat long-form context as a reusable object. Rather than stuffing raw content into each new prompt, you store it once in a Fragments DB table. Each Fragment has a hash associated with it, generated from its text, which makes it easy to look up and reuse. In the llm CLI, this is backed by a SQLite database, so a given piece of content is only ever stored once, keyed by its hash.
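To illustrate the idea, here is a toy sketch of that kind of content-addressed storage. It mirrors the concept, not the llm library's actual table schema:

```python
import hashlib
import sqlite3

# Hash the text and only insert a row if that hash hasn't been seen before,
# so identical content is stored exactly once.
def store_fragment(db: sqlite3.Connection, content: str) -> str:
    fragment_hash = hashlib.sha256(content.encode("utf-8")).hexdigest()
    db.execute(
        "CREATE TABLE IF NOT EXISTS fragments (hash TEXT PRIMARY KEY, content TEXT)"
    )
    db.execute(
        "INSERT OR IGNORE INTO fragments (hash, content) VALUES (?, ?)",
        (fragment_hash, content),
    )
    db.commit()
    return fragment_hash
```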

The Fragment storage mechanism isn’t yet part of the stable, documented Python API for the LLM library. Willison notes in the Python API docs that Fragments are of limited use through the Python API alone:

This mechanism has limited utility in Python, as you can also assemble the contents of these strings together into the prompt= and system= strings directly. Fragments become more interesting if you are working with LLM’s mechanisms for storing prompts to a SQLite database, which are not yet part of the stable, documented Python API.

However, my Telegram bot happens to use the same underlying database that the CLI relies on. This made it possible to take advantage of Fragment persistence without much additional work.

Now, when a user sends something like:

@https://en.wikipedia.org/wiki/The_True_Record

in a prompt, the web page at the link is converted to Markdown with Firecrawl as before. However, when the prompt and its response are stored after the request to the model completes, the fetched context is no longer duplicated inside the prompt. Instead, the text (the Markdown from Firecrawl in this case) is hashed and looked up in the Fragments table; if a Fragment with that hash already exists, it is reused. Pretty neat!
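Roughly, the flow now looks like the sketch below. The fragments= parameter is the documented way to pass fragments in recent llm releases, but the logging step leans on the same internals the CLI uses (which, as quoted above, aren't part of the stable Python API), and fetch_markdown again stands in for the Firecrawl call, so treat this as an approximation rather than my bot's exact code:

```python
import llm
import sqlite_utils
from llm.cli import logs_db_path  # internal CLI helper, not a stable API

def answer_with_link(question: str, url: str, fetch_markdown) -> str:
    # Fetch the page as Markdown (Firecrawl in the bot; fetch_markdown is a stand-in)
    page_markdown = fetch_markdown(url)

    model = llm.get_model("anthropic/claude-3-7-sonnet-latest")
    # Pass the long context as a fragment rather than pasting it into the prompt text
    response = model.prompt(question, fragments=[page_markdown])
    answer = response.text()

    # Log to the same SQLite DB the CLI uses; the fragment is stored once, keyed by its hash
    db = sqlite_utils.Database(logs_db_path())
    response.log_to_db(db)
    return answer
```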

One nice side effect of this is that I can query all of the prompts that a particular Fragment was used in. For example, by running:

llm logs -f my_fragment_hash

I can get a list of all of the prompts that the Fragment was used in. For example, here are all of the recent prompts I made against the Wikipedia article for IP over Avian Carriers:

# 2025-04-21T13:12:03    conversation: 01jsc7m8fvg3yfecvbhen4w6j7 id: 01jsc7nbas0fv9e8fwms3p015b

Model: **anthropic/claude-3-7-sonnet-latest**

## Prompt

Can you summarise this wiki for me? @https://en.wikipedia.org/wiki/IP_over_Avian_Carriers

# 2025-04-22T20:02:58    conversation: 01jsfhha8028ccdy9qxs37khsv id: 01jsfhhqwva0n7mpxwn88ecc5r

Model: **anthropic/claude-3-7-sonnet-latest**

## Prompt

Summarise @https://en.wikipedia.org/wiki/IP_over_Avian_Carriers

Future Improvements

This feels like a starting point for trying out prompt caching for long-context threads in the future. I can see a world where the long context is kept warm in the cache for quicker responses and cheaper requests, but I think this will require a bit of thought on the implementation details.

I have now made the repo public, so you can take a look. If you're interested in the specific Fragments implementation details, you can check out the PR for it here too.