
    5 Powerful Python Decorators to Optimize LLM Applications

    By gvfx00@gmail.com | March 6, 2026 | 5 Mins Read



    Image by Editor

     

    Table of Contents

    • # Introduction
    • # 1. In-memory Caching
    • # 2. Caching On Persistent Disk
    • # 3. Network-resilient Apps
    • # 4. Client-side Throttling
    • # 5. Structured Output Binding
    • # Wrapping Up

    # Introduction

     
    Python decorators are a natural fit for simplifying complex logic in a variety of applications, including LLM-based ones. Working with LLMs often means coping with unpredictable, slow, and frequently expensive third-party APIs, and decorators make this task cleaner by wrapping, for instance, API calls with optimization logic.

    Let’s take a look at five useful Python decorators that will help you optimize your LLM-based applications without noticeable extra burden.

    The accompanying examples illustrate the syntax and approach for using each decorator. Some are shown without an actual LLM call, but all are code excerpts ultimately designed to be part of larger applications.

     

    # 1. In-memory Caching

     
    This solution comes from Python’s functools standard library, and it is useful for expensive functions like those calling LLMs. If the function defined below made an LLM API call, wrapping it with the LRU (Least Recently Used) decorator would add a cache that prevents redundant requests for identical inputs (prompts) within the same session. This is an elegant way to cut latency and API cost.

    This example illustrates its use:

    from functools import lru_cache
    import time
    
    @lru_cache(maxsize=100)
    def summarize_text(text: str) -> str:
        print("Sending text to LLM...")
        time.sleep(1) # A simulation of network delay
        return f"Summary of {len(text)} characters."
    
    print(summarize_text("The quick brown fox.")) # Takes one second
    print(summarize_text("The quick brown fox.")) # Instant
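One caveat worth knowing: lru_cache only accepts hashable arguments, and its cache lives only for the lifetime of the process. The decorator also exposes cache_info(), which is handy for verifying hits and tuning maxsize. A minimal sketch (the LLM call is simulated):

```python
from functools import lru_cache

@lru_cache(maxsize=100)
def summarize_text(text: str) -> str:
    # Stand-in for an expensive LLM API call
    return f"Summary of {len(text)} characters."

summarize_text("The quick brown fox.")
summarize_text("The quick brown fox.")  # second call is served from the cache

info = summarize_text.cache_info()
print(info)  # CacheInfo(hits=1, misses=1, maxsize=100, currsize=1)
```

If hits stay at zero in your own code, check that the inputs are truly identical: even a trailing space in a prompt produces a separate cache entry.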

     

    # 2. Caching On Persistent Disk

     
    Speaking of caching, the external library diskcache takes it a step further by implementing a persistent cache on disk, backed by a SQLite database: very useful for storing the results of time-consuming functions such as LLM API calls, so they can be retrieved quickly in later calls. Consider this decorator pattern when in-memory caching is not sufficient because the script or application may be restarted between runs.

    import time
    from diskcache import Cache
    
    # Creating a lightweight local SQLite database directory
    cache = Cache(".local_llm_cache")
    
    @cache.memoize(expire=86400) # Cached for 24 hours
    def fetch_llm_response(prompt: str) -> str:
        print("Calling expensive LLM API...") # Replace this with an actual LLM API call
        time.sleep(2) # API latency simulation
        return f"Response to: {prompt}"
    
    print(fetch_llm_response("What is quantum computing?")) # 1st function call
    print(fetch_llm_response("What is quantum computing?")) # Instant load from disk happens here!
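To make the mechanism concrete, here is a simplified stand-in built only on the standard library's shelve module — a sketch of what persistent memoization does, without diskcache's expiry or SQLite backend (the decorator name disk_memoize is ours, not part of any library):

```python
import functools
import hashlib
import json
import shelve

def disk_memoize(path=".llm_cache_shelf"):
    """Naive persistent memoizer: cached results survive process restarts."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # Build a stable key from the function name and its arguments
            raw = json.dumps([func.__name__, args, kwargs], sort_keys=True)
            key = hashlib.sha256(raw.encode()).hexdigest()
            with shelve.open(path) as db:
                if key not in db:
                    db[key] = func(*args, **kwargs)  # cache miss: compute once
                return db[key]
        return wrapper
    return decorator

@disk_memoize()
def fetch_llm_response(prompt: str) -> str:
    # Stand-in for an expensive LLM API call
    return f"Response to: {prompt}"

print(fetch_llm_response("What is quantum computing?"))
```

Unlike this sketch, diskcache also handles expiry, eviction, and safe concurrent access, which is why the library is preferable in production.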

     

    # 3. Network-resilient Apps

     
    Since LLM API calls often fail due to transient errors, timeouts, and “502 Bad Gateway” responses, a network resilience library like tenacity, through its @retry decorator, can intercept and recover from these common network failures.

    The example below illustrates this resilient behavior by simulating a 70% chance of failure on each attempt. Run it several times: usually a retry eventually succeeds, but occasionally all four attempts fail and the error surfaces, which is entirely expected and intended!

    import random
    from tenacity import retry, wait_exponential, stop_after_attempt, retry_if_exception_type
    
    class RateLimitError(Exception): pass
    
    # Up to 4 attempts, with exponential waits of 2, 4, and 8 seconds between them
    @retry(
        wait=wait_exponential(multiplier=2, min=2, max=10),
        stop=stop_after_attempt(4),
        retry=retry_if_exception_type(RateLimitError)
    )
    def call_flaky_llm_api(prompt: str):
        print("Attempting to call API...")
        if random.random() < 0.7: # Simulating a 70% chance of API failure
            raise RateLimitError("Rate limit exceeded! Backing off.")
        return "Text has been successfully generated!"
    
    print(call_flaky_llm_api("Write a haiku"))
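tenacity handles the retry loop for you; to show what it is doing under the hood, here is a minimal hand-rolled equivalent in the standard library — a sketch, not a substitute for the library, with a deterministic failure so the behavior is reproducible:

```python
import functools
import time

class RateLimitError(Exception):
    pass

def retry_with_backoff(max_attempts=4, base_wait=2.0, max_wait=10.0):
    """Retry on RateLimitError, doubling the wait after each failed attempt."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, max_attempts + 1):
                try:
                    return func(*args, **kwargs)
                except RateLimitError:
                    if attempt == max_attempts:
                        raise  # out of attempts: propagate the error
                    wait = min(base_wait * 2 ** (attempt - 1), max_wait)
                    time.sleep(wait)
        return wrapper
    return decorator

attempts = []

@retry_with_backoff(base_wait=0.01)  # tiny waits so the demo runs fast
def flaky_call():
    attempts.append(1)
    if len(attempts) < 3:  # fail deterministically on the first two tries
        raise RateLimitError("Rate limit exceeded")
    return "Text has been successfully generated!"

print(flaky_call())  # succeeds on the third attempt
```

The exponential schedule (2, 4, 8 seconds) mirrors the tenacity configuration above; in real code, prefer the library, which also adds jitter options and per-exception filtering.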

     

    # 4. Client-side Throttling

     
    This combined decorator uses the ratelimit library to control how frequently a (usually highly demanded) function is called: useful for staying under client-side limits when using external APIs. The following example enforces a window of three calls per ten seconds. Without such throttling, a provider may reject requests when a client fires off too many in a short burst.

    from ratelimit import limits, sleep_and_retry
    import time
    
    # Strictly enforcing a 3-call limit per 10-second window
    @sleep_and_retry
    @limits(calls=3, period=10)
    def generate_text(prompt: str) -> str:
        print(f"[{time.strftime('%X')}] Processing: {prompt}")
        return f"Processed: {prompt}"
    
    # First 3 print immediately, the 4th pauses, thereby respecting the limit
    for i in range(5):
        generate_text(f"Prompt {i}")
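The ratelimit decorators use a fixed time window; the same idea can be sketched in the standard library with a sliding window of call timestamps. This is illustrative only — not thread-safe, and with the window shortened so the demo finishes quickly:

```python
import functools
import time
from collections import deque

def throttle(calls: int, period: float):
    """Block until fewer than `calls` invocations occurred in the last `period` seconds."""
    timestamps = deque()
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            now = time.monotonic()
            # Drop timestamps that have fallen out of the window
            while timestamps and now - timestamps[0] > period:
                timestamps.popleft()
            if len(timestamps) >= calls:
                # Sleep until the oldest call in the window ages out
                time.sleep(period - (now - timestamps[0]))
            timestamps.append(time.monotonic())
            return func(*args, **kwargs)
        return wrapper
    return decorator

@throttle(calls=3, period=0.2)  # short window so the demo finishes quickly
def generate_text(prompt: str) -> str:
    return f"Processed: {prompt}"

start = time.monotonic()
for i in range(5):
    generate_text(f"Prompt {i}")
elapsed = time.monotonic() - start
print(f"5 calls took {elapsed:.2f}s")  # the 4th call had to wait for the window
```

The first three calls run immediately; the fourth blocks until the window frees up, exactly the behavior sleep_and_retry provides in the example above.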

     

    # 5. Structured Output Binding

     
    The fifth decorator on the list uses the magentic library in conjunction with Pydantic to provide an efficient mechanism for interacting with LLMs via API and obtaining structured responses. This matters for coaxing LLMs to return formatted data, like JSON objects, in a reliable fashion. The decorator handles the underlying system prompts and Pydantic-led parsing, optimizing token usage and helping keep a cleaner codebase.

    To try this example out, you will need an OpenAI API key.

    # IMPORTANT: an OPENAI_API_KEY environment variable is required to run this example
    from magentic import prompt
    from pydantic import BaseModel
    
    class CapitalInfo(BaseModel):
        capital: str
        population: int
    
    # A decorator that easily maps the prompt to the Pydantic return type
    @prompt("What is the capital and population of {country}?")
    def get_capital_info(country: str) -> CapitalInfo:
        ... # No function body needed here!
    
    info = get_capital_info("France")
    print(f"Capital: {info.capital}, Population: {info.population}")
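Under the hood, this pattern boils down to asking the model for JSON and validating the reply against a schema. The validation half can be seen without an API key using only the standard library — a simplified stand-in for the Pydantic model above, with a simulated model reply:

```python
import json
from dataclasses import dataclass

@dataclass
class CapitalInfo:
    capital: str
    population: int

def parse_capital_info(raw: str) -> CapitalInfo:
    """Parse and type-check a JSON reply, as a structured-output binder would."""
    data = json.loads(raw)
    info = CapitalInfo(**data)
    if not isinstance(info.population, int):
        raise TypeError("population must be an integer")
    return info

# Simulated LLM reply; a real call would produce this text
raw_reply = '{"capital": "Paris", "population": 2100000}'
info = parse_capital_info(raw_reply)
print(f"Capital: {info.capital}, Population: {info.population}")
```

magentic and Pydantic add much more on top of this (schema-driven prompting, retries on malformed output), but the core contract is the same: text in, validated typed object out.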

     

    # Wrapping Up

     
    In this article, we listed and illustrated five Python decorators, drawn from diverse libraries, that take on particular significance in LLM-based applications: they simplify logic, make processes more efficient, and improve network resilience, among other benefits.
     
     

    Iván Palomares Carrascosa is a leader, writer, speaker, and adviser in AI, machine learning, deep learning & LLMs. He trains and guides others in harnessing AI in the real world.
