# Introduction
Python is one of the most beginner-friendly languages out there. But if you’ve worked with it for a while, you’ve probably run into loops that take minutes to finish or data processing jobs that hog all your memory.
You don’t need to become a performance optimization expert to make significant improvements. Most slow Python code comes down to a handful of common issues that are straightforward to fix once you know what to look for.
In this article, you’ll learn five practical techniques to speed up slow Python code, with before-and-after examples that show the difference.
You can find the code for this article on GitHub.
# Prerequisites
Before we get started, make sure you have:
- Python 3.10 or higher installed
- Familiarity with functions, loops, and lists
- Some familiarity with the time module from the standard library
For a couple of examples, you will also need the following libraries:
- NumPy
- pandas
# 1. Measuring Before Optimizing
Before modifying a single line of code, you need to know where the slowness actually is. Optimizing the wrong part of your code wastes time and can even make things worse.
Python’s standard library includes a simple way to time any block of code: the time module. For more detailed profiling, cProfile shows you exactly which functions are taking the longest.
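If you want a per-function breakdown rather than manual timing, a minimal `cProfile` run looks like the sketch below. The `process()` function here is just a stand-in for your own code:

```python
import cProfile
import io
import pstats

def process():
    # Stand-in workload: sum the squares of the first 100,000 integers
    return sum(i * i for i in range(100_000))

# Profile a single call and capture the statistics
profiler = cProfile.Profile()
profiler.enable()
process()
profiler.disable()

# Print the 5 most time-consuming functions, sorted by cumulative time
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream).sort_stats("cumulative")
stats.print_stats(5)
print(stream.getvalue())
```

The report lists each function with its call count and cumulative time, which is often enough to spot the bottleneck without adding any timing code by hand.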
Let’s say you have a script that processes a list of sales records. Here is how to find the slow part:
```python
import time

def load_records():
    # Simulate loading 100,000 records
    return list(range(100_000))

def filter_records(records):
    return [r for r in records if r % 2 == 0]

def generate_report(records):
    return sum(records)

# Time each step
start = time.perf_counter()
records = load_records()
print(f"Load   : {time.perf_counter() - start:.4f}s")

start = time.perf_counter()
filtered = filter_records(records)
print(f"Filter : {time.perf_counter() - start:.4f}s")

start = time.perf_counter()
report = generate_report(filtered)
print(f"Report : {time.perf_counter() - start:.4f}s")
```

Output:

```
Load   : 0.0034s
Filter : 0.0060s
Report : 0.0012s
```
Now you know where to focus. filter_records() is the slowest step, followed by load_records(). So that’s where any optimization effort will pay off. Without measuring, you might have spent time optimizing generate_report(), which was already fast.
The time.perf_counter() function is more precise than time.time() for short measurements. Use it whenever you are timing code performance.
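If you find yourself repeating the `start = time.perf_counter()` boilerplate, you can wrap it in a small context manager. The `timer` helper below is just a convenience sketch, not a standard-library utility:

```python
import time
from contextlib import contextmanager

@contextmanager
def timer(label):
    # Record the start time, hand control back, then print the elapsed time
    start = time.perf_counter()
    try:
        yield
    finally:
        print(f"{label}: {time.perf_counter() - start:.4f}s")

# Each block is timed and labeled automatically
with timer("Load"):
    records = list(range(100_000))

with timer("Filter"):
    filtered = [r for r in records if r % 2 == 0]
```

The `try/finally` ensures the elapsed time is printed even if the timed block raises an exception.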
Rule of thumb: never guess where the bottleneck is. Measure first, then optimize.
# 2. Using Built-in Functions and Standard Library Tools
Python’s built-in functions — sum(), map(), filter(), sorted(), min(), max() — are implemented in C under the hood. They are significantly faster than writing equivalent logic in pure Python loops.
Let’s compare manually summing a list versus using the built-in:
```python
import time

numbers = list(range(1_000_000))

# Manual loop
start = time.perf_counter()
total = 0
for n in numbers:
    total += n
print(f"Manual loop : {time.perf_counter() - start:.4f}s → {total}")

# Built-in sum()
start = time.perf_counter()
total = sum(numbers)
print(f"Built-in    : {time.perf_counter() - start:.4f}s → {total}")
```

Output:

```
Manual loop : 0.1177s → 499999500000
Built-in    : 0.0103s → 499999500000
```
As you can see, the built-in is more than 10x faster here.
The same principle applies to sorting. If you need to sort a list of dictionaries by a key, Python’s sorted() with a key argument is both faster and cleaner than sorting manually. Here is another example:
```python
orders = [
    {"id": "ORD-003", "amount": 250.0},
    {"id": "ORD-001", "amount": 89.99},
    {"id": "ORD-002", "amount": 430.0},
]

# Slow: manual comparison logic
def manual_sort(orders):
    for i in range(len(orders)):
        for j in range(i + 1, len(orders)):
            if orders[i]["amount"] > orders[j]["amount"]:
                orders[i], orders[j] = orders[j], orders[i]
    return orders

# Fast: built-in sorted()
sorted_orders = sorted(orders, key=lambda o: o["amount"])
print(sorted_orders)
```

Output:

```
[{'id': 'ORD-001', 'amount': 89.99}, {'id': 'ORD-003', 'amount': 250.0}, {'id': 'ORD-002', 'amount': 430.0}]
```
As an exercise, try to time the above approaches.
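The same `key` argument works with `min()` and `max()`, so finding the cheapest or most expensive order needs no loop at all. Using the same `orders` list as above:

```python
orders = [
    {"id": "ORD-003", "amount": 250.0},
    {"id": "ORD-001", "amount": 89.99},
    {"id": "ORD-002", "amount": 430.0},
]

# Built-in min()/max() with a key function, no manual loop needed
cheapest = min(orders, key=lambda o: o["amount"])
priciest = max(orders, key=lambda o: o["amount"])

print(cheapest["id"])  # ORD-001
print(priciest["id"])  # ORD-002
```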
Rule of thumb: before writing a loop to do something common — summing, sorting, finding the max — check if Python already has a built-in for it. It almost always does, and it is almost always faster.
# 3. Avoiding Repeated Work Inside Loops
One of the most common performance mistakes is doing expensive work inside a loop that could be done once outside it. Every iteration pays the cost, even when the result never changes.
Here is an example: validating a list of product codes against an approved list.
```python
import time

approved = ["SKU-001", "SKU-002", "SKU-003", "SKU-004", "SKU-005"] * 1000
incoming = [f"SKU-{str(i).zfill(3)}" for i in range(5000)]

# Slow: list membership check on every iteration
start = time.perf_counter()
valid = []
for code in incoming:
    if code in approved:  # list search is O(n) — slow
        valid.append(code)
print(f"List check : {time.perf_counter() - start:.4f}s → {len(valid)} valid")

# Fast: convert approved to a set once, before the loop
start = time.perf_counter()
approved_set = set(approved)  # set lookup is O(1) — fast
valid = []
for code in incoming:
    if code in approved_set:
        valid.append(code)
print(f"Set check  : {time.perf_counter() - start:.4f}s → {len(valid)} valid")
```

Output:

```
List check : 0.3769s → 5 valid
Set check  : 0.0014s → 5 valid
```
The second approach is much faster, and the fix was just moving one conversion outside the loop.
The same pattern applies to anything expensive that does not change between iterations, like reading a config file, compiling a regex pattern, or opening a database connection. Do it once before the loop, not once per iteration.
```python
import re

# Slower: the pattern string is looked up (and possibly recompiled) on every call
def extract_slow(text):
    return re.findall(r'\d+', text)

# Fast: compile once, reuse
DIGIT_PATTERN = re.compile(r'\d+')

def extract_fast(text):
    return DIGIT_PATTERN.findall(text)
```
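A related trap is calling the same expensive function with the same arguments over and over inside a loop. If the function is pure (the same input always produces the same output), `functools.lru_cache` memoizes it so repeated calls become dictionary lookups. A minimal sketch, with a hypothetical `shipping_zone` lookup standing in for the expensive work:

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def shipping_zone(country_code):
    # Stand-in for an expensive lookup (e.g. reading a config file)
    return {"US": 1, "DE": 2, "IN": 3}.get(country_code, 0)

order_countries = ["US", "DE", "US", "IN", "US"] * 1000

# Only 3 real computations happen; the other 4,997 calls hit the cache
zones = [shipping_zone(c) for c in order_countries]
print(shipping_zone.cache_info())
```

`cache_info()` reports the hit and miss counts, which makes it easy to verify the cache is actually helping.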
Rule of thumb: if a line inside your loop produces the same result every iteration, move it outside.
# 4. Choosing the Right Data Structure
Python gives you several built-in data structures — lists, sets, dictionaries, tuples — and choosing the wrong one for the job can make your code much slower than it needs to be.
The most important distinction is between lists and sets for membership checks using the in operator:
- Checking whether an item exists in a list takes longer as the list grows, as you have to scan through it one by one
- A set uses hashing to answer the same question in constant time, regardless of size
Let’s look at an example: finding which customer IDs from a large dataset have already placed an order.
```python
import time
import random

all_customers = [f"CUST-{i}" for i in range(100_000)]
ordered = [f"CUST-{i}" for i in random.sample(range(100_000), 10_000)]

# Slow: ordered is a list
start = time.perf_counter()
repeat_customers = [c for c in all_customers if c in ordered]
print(f"List : {time.perf_counter() - start:.4f}s → {len(repeat_customers)} found")

# Fast: ordered is a set
ordered_set = set(ordered)
start = time.perf_counter()
repeat_customers = [c for c in all_customers if c in ordered_set]
print(f"Set  : {time.perf_counter() - start:.4f}s → {len(repeat_customers)} found")
```

Output:

```
List : 16.7478s → 10000 found
Set  : 0.0095s → 10000 found
```
The same logic applies to dictionaries when you need fast key lookups, and to the collections module’s deque when you are frequently adding or removing items from both ends of a sequence — something lists are slow at.
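To see the deque difference in practice, compare draining a list from the front (each `pop(0)` shifts every remaining element) with `deque.popleft()`. A sketch; exact timings will vary by machine:

```python
import time
from collections import deque

items = list(range(100_000))

# Slow: list.pop(0) shifts all remaining elements on every call
as_list = list(items)
start = time.perf_counter()
while as_list:
    as_list.pop(0)
print(f"list.pop(0) : {time.perf_counter() - start:.4f}s")

# Fast: deque.popleft() removes from the front in constant time
as_deque = deque(items)
start = time.perf_counter()
while as_deque:
    as_deque.popleft()
print(f"popleft     : {time.perf_counter() - start:.4f}s")
```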
Here is a quick reference for when to reach for which structure:
| Need | Data Structure to Use |
|---|---|
| Ordered sequence, index access | list |
| Fast membership checks | set |
| Key-value lookups | dict |
| Counting occurrences | collections.Counter |
| Queue or deque operations | collections.deque |
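For the counting row in the table above, `collections.Counter` replaces the usual dict-with-default loop with a single line:

```python
from collections import Counter

statuses = ["shipped", "pending", "shipped", "cancelled", "shipped", "pending"]

# Counter tallies occurrences in one pass
counts = Counter(statuses)
print(counts.most_common(1))  # [('shipped', 3)]
```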
Rule of thumb: if you are checking `if x in something` inside a loop and `something` has more than a few hundred items, it should probably be a set.
# 5. Vectorizing Operations on Numeric Data
If your code processes numbers — calculations across rows of data, statistical operations, transformations — writing Python loops is almost always the slowest possible approach. Libraries like NumPy and pandas are built for exactly this: applying operations to entire arrays at once, in optimized C code, without a Python loop in sight.
This is called vectorization. Instead of telling Python to process each element one at a time, you hand the whole array to a function that handles everything internally at C speed.
```python
import time
import numpy as np
import pandas as pd

prices = [round(10 + i * 0.05, 2) for i in range(500_000)]
discount_rate = 0.15

# Slow: Python loop
start = time.perf_counter()
discounted = []
for price in prices:
    discounted.append(round(price * (1 - discount_rate), 2))
print(f"Python loop : {time.perf_counter() - start:.4f}s")

# Fast: NumPy vectorization
prices_array = np.array(prices)
start = time.perf_counter()
discounted = np.round(prices_array * (1 - discount_rate), 2)
print(f"NumPy       : {time.perf_counter() - start:.4f}s")

# Fast: pandas vectorization
prices_series = pd.Series(prices)
start = time.perf_counter()
discounted = (prices_series * (1 - discount_rate)).round(2)
print(f"Pandas      : {time.perf_counter() - start:.4f}s")
```

Output:

```
Python loop : 1.0025s
NumPy       : 0.0122s
Pandas      : 0.0032s
```
NumPy is over 80x faster for this operation, and pandas faster still. The code is also shorter and cleaner: no loop, no `append()`, just a single expression.
If you are already working with a pandas DataFrame, the same principle applies to column operations. Always prefer column-level operations over looping through rows with iterrows():
```python
df = pd.DataFrame({"price": prices})

# Slow: row-by-row with iterrows
start = time.perf_counter()
for idx, row in df.iterrows():
    df.at[idx, "discounted"] = round(row["price"] * 0.85, 2)
print(f"iterrows   : {time.perf_counter() - start:.4f}s")

# Fast: vectorized column operation
start = time.perf_counter()
df["discounted"] = (df["price"] * 0.85).round(2)
print(f"Vectorized : {time.perf_counter() - start:.4f}s")
```

Output:

```
iterrows   : 34.5615s
Vectorized : 0.0051s
```
The iterrows() function is one of the most common performance traps in pandas. If you see it in your code and you are working on more than a few thousand rows, replacing it with a column operation is almost always worth doing.
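Vectorization also covers conditionals. Instead of an `if` inside a loop, `np.where` applies a condition across the whole array at once. A small sketch with made-up discount rules:

```python
import numpy as np

prices = np.array([5.0, 25.0, 80.0, 120.0])

# 20% off prices over 100, 10% off everything else, no loop required
discounted = np.where(prices > 100, prices * 0.80, prices * 0.90)
print(discounted)
```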
Rule of thumb: if you are looping over numbers or DataFrame rows, ask whether NumPy or pandas can do the same thing as a vectorized operation.
# Conclusion
Slow Python code is usually a pattern problem. Measuring before optimizing, leaning on built-ins, avoiding repeated work in loops, picking the right data structure, and using vectorization for numeric work will cover the vast majority of performance issues you will run into as a beginner.
Start with tip one every time: measure. Find the actual bottleneck, fix that, and measure again. You will be surprised how much headroom there is before you need anything more advanced.
The five techniques in this article cover the most common causes of slow Python code. But sometimes you need to go further:
- Multiprocessing — if your task is CPU-bound and you have a multi-core machine, Python’s `multiprocessing` module can split the work across cores
- Async I/O — if your code spends most of its time waiting on network requests or file reads, `asyncio` can handle many tasks concurrently
- Dask or Polars — for datasets too large to fit in memory, these libraries scale beyond what pandas can handle
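As a quick taste of the async option, here is a sketch where three simulated 0.1-second network calls finish in roughly 0.1 seconds total instead of 0.3, because the waits overlap. Real code would await an HTTP client rather than `asyncio.sleep`, and the URLs are placeholders:

```python
import asyncio
import time

async def fetch(url):
    # Simulate a 0.1s network request
    await asyncio.sleep(0.1)
    return f"response from {url}"

async def main():
    urls = ["a.example", "b.example", "c.example"]
    # gather() runs all three waits concurrently
    return await asyncio.gather(*(fetch(u) for u in urls))

start = time.perf_counter()
results = asyncio.run(main())
elapsed = time.perf_counter() - start
print(f"{len(results)} responses in {elapsed:.2f}s")
```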
These are worth exploring once you have applied the basics and still need more headroom. Happy coding!
Bala Priya C is a developer and technical writer from India. She likes working at the intersection of math, programming, data science, and content creation. Her areas of interest and expertise include DevOps, data science, and natural language processing. She enjoys reading, writing, coding, and coffee! Currently, she’s working on learning and sharing her knowledge with the developer community by authoring tutorials, how-to guides, opinion pieces, and more. Bala also creates engaging resource overviews and coding tutorials.
