# Introduction

Claude Code is an agentic coding environment. Unlike a chatbot that answers questions and waits, Claude Code can read your files, run commands, make changes, and independently work through problems while you watch, redirect, or step away entirely.
This changes how you work. Instead of writing code yourself and asking Claude to review it, you describe what you want and Claude figures out how to build it. Claude explores, plans, and implements. But this autonomy still comes with a learning curve. Claude works within certain constraints you need to understand.
In this article, you will learn practical techniques for using Claude Code on the Claude.ai web interface to accelerate your data science work. It covers core workflows, from initial data cleaning to final model evaluation, with specific examples in pandas, matplotlib, and scikit-learn.

# Core Principles For Effective Collaboration

First, adopt these foundational practices for working with Claude on the web interface. They help Claude understand your context and provide better, more relevant assistance.
- Use the @ symbol for context: The most powerful feature for data science is file referencing. Type @ in the chat and select a data file such as customer_data.csv, or a script such as model_training.py, to give Claude its full content. For directories, @src/ provides a file listing. This ensures Claude’s advice is based on your actual data and code.
- Make use of Plan Mode for complex tasks: Before making changes to multiple files, like refactoring a data processing pipeline, activate Plan Mode. Claude will analyze your code and propose a step-by-step plan. Review and refine this plan before any code is executed, preventing missteps in complex projects.
- Enable extended thinking: For challenging problems like optimizing a slow data transformation or debugging a model accuracy issue, ensure Claude’s “thinking” is enabled. This allows it to reason through complex logic step-by-step before giving you a final, well-considered answer.

# Intelligent Data Cleaning And Exploration

Data cleaning is often the most time-consuming step of a project. Claude can help you navigate and clean datasets efficiently.
- Rapid data profiling: After uploading a comma-separated values (CSV) file using @, ask Claude for a summary. A prompt like, “Analyze @customer_data.csv. Provide summary statistics, check for missing values, and identify potential outliers in numerical columns” gives you an instant diagnostic report.
- Automating cleaning steps: Describe the issue and let Claude write the pandas code. For example, if a “Duration” column has impossible values — like a 450-minute workout — you can ask Claude to cap or remove outliers.
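
The two prompts above typically lead to code along the following lines. This is a minimal sketch, not Claude's actual output: the small DataFrame is a hypothetical stand-in for customer_data.csv, the `Calories` column and the `profile` helper are assumptions made for illustration, and the 1.5 × IQR rule is just one common way to flag outliers.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for customer_data.csv; column names are assumptions.
df = pd.DataFrame({
    "Duration": [30, 45, 60, 450, 50, 40, np.nan, 55],
    "Calories": [210, 300, 410, 380, 330, 280, 290, np.nan],
})


def profile(frame: pd.DataFrame) -> pd.DataFrame:
    """Return missing-value counts and IQR-based outlier counts per numeric column."""
    numeric = frame.select_dtypes(include="number")
    q1 = numeric.quantile(0.25)
    q3 = numeric.quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    # Comparisons against a Series align on column names; NaN compares as False.
    outliers = ((numeric < lower) | (numeric > upper)).sum()
    return pd.DataFrame({
        "missing": numeric.isna().sum(),
        "outliers": outliers,
        "lower_fence": lower,
        "upper_fence": upper,
    })


report = profile(df)
print(df.describe())
print(report)

# Cap Duration at its upper IQR fence instead of dropping the row.
df["Duration_capped"] = df["Duration"].clip(upper=report.loc["Duration", "upper_fence"])
```

Capping keeps the row for other analyses, whereas removing it would be the safer choice if the value is likely a data-entry error. Which one is right depends on your data.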

// Example Prompt and Output
Your prompt: “In the df DataFrame, the ‘Age’ column has some values over 120. Write pandas code to replace any value in ‘Age’ greater than 120 with the column’s median age.”
Claude:
```python
import pandas as pd

# Assuming df is already loaded
median_age = df['Age'].median()
df.loc[df['Age'] > 120, 'Age'] = median_age
print(f"Replaced outliers with median age: {median_age}")
```

Claude can also help with more advanced cleaning, like dropping unnecessary columns, fixing indexes, or using .str methods to clean text data.
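
As a rough illustration of those three tasks, here is a short sketch. The DataFrame and its column names (`Unnamed: 0`, `customer_id`, `City`, `notes`) are hypothetical and chosen only to show the pattern.

```python
import pandas as pd

# Hypothetical data; the column names are assumptions for illustration.
df = pd.DataFrame({
    "Unnamed: 0": [0, 1, 2],
    "customer_id": [103, 101, 102],
    "City": ["  new york ", "LAGOS", "london  "],
    "notes": [None, None, None],
})

cleaned = (
    df.drop(columns=["Unnamed: 0", "notes"])  # drop columns that carry no information
      .set_index("customer_id")               # use the natural key as the index
      .sort_index()
)

# Normalize text with .str methods: trim whitespace, then apply title case.
cleaned["City"] = cleaned["City"].str.strip().str.title()
print(cleaned)
```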

# Creating An Effective Visualization With Claude Code

Claude helps you move from raw data to insightful matplotlib or seaborn plots quickly.
- From question to chart: Describe what you want to see. For example: “Create a matplotlib figure with two subplots. On the left, a histogram of ‘Transaction_Amount’ with 30 bins. On the right, a scatter plot of ‘Transaction_Amount’ vs. ‘Customer_Age’, colored by ‘Purchase_Category’.”
- Style and polish: Ask Claude to improve an existing chart: “Take this plot code and make it publication-quality. Add a clear title, format the axis labels, adjust the color palette for colorblind readers, and ensure the layout is tight.”
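
For reference, the two-subplot prompt above could be answered with something like the following sketch. The data is synthetic (generated with NumPy) because the original dataset is not available; only the column names come from the prompt.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend for scripts; remove in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic stand-in data; real values would come from your own file.
rng = np.random.default_rng(42)
df = pd.DataFrame({
    "Transaction_Amount": rng.gamma(shape=2.0, scale=50.0, size=200),
    "Customer_Age": rng.integers(18, 80, size=200),
    "Purchase_Category": rng.choice(["Grocery", "Electronics", "Clothing"], size=200),
})

fig, (ax_hist, ax_scatter) = plt.subplots(1, 2, figsize=(12, 5))

# Left: histogram of transaction amounts with 30 bins.
ax_hist.hist(df["Transaction_Amount"], bins=30, edgecolor="white")
ax_hist.set(title="Distribution of Transaction_Amount",
            xlabel="Transaction_Amount", ylabel="Count")

# Right: one scatter call per category so each gets its own color and legend entry.
for category, group in df.groupby("Purchase_Category"):
    ax_scatter.scatter(group["Transaction_Amount"], group["Customer_Age"],
                       label=category, alpha=0.7)
ax_scatter.set(title="Transaction_Amount vs. Customer_Age",
               xlabel="Transaction_Amount", ylabel="Customer_Age")
ax_scatter.legend(title="Purchase_Category")

fig.tight_layout()
```

Plotting each category in its own `scatter` call is a simple way to get a legend without mapping category labels to colors by hand.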

// Example Prompt for a Common Plot
Your prompt: “Write code to create a grouped bar chart showing the average ‘Sales’ for each ‘Region’ (x-axis) broken down by ‘Product_Line’. Use the ‘Set3’ colormap from matplotlib.cm.”
Claude will generate the complete figure code, including data grouping with pandas and the plotting logic with matplotlib.
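
A response to that prompt might resemble the sketch below. The sales figures are invented for illustration, and the colormap is passed by name through pandas' plotting wrapper rather than pulled from matplotlib.cm directly, which is an assumption about how Claude might choose to write it.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend for scripts; remove in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Invented example data; only the column names come from the prompt.
df = pd.DataFrame({
    "Region": ["North", "North", "South", "South", "East", "East", "North", "South"],
    "Product_Line": ["A", "B", "A", "B", "A", "B", "A", "B"],
    "Sales": [100.0, 150.0, 80.0, 120.0, 90.0, 60.0, 200.0, 180.0],
})

# Rows become regions, columns become product lines, cells hold mean sales.
avg_sales = df.pivot_table(index="Region", columns="Product_Line",
                           values="Sales", aggfunc="mean")

fig, ax = plt.subplots(figsize=(8, 5))
avg_sales.plot(kind="bar", ax=ax, colormap="Set3", edgecolor="black")
ax.set(title="Average Sales by Region and Product_Line",
       xlabel="Region", ylabel="Average Sales")
ax.legend(title="Product_Line")
ax.tick_params(axis="x", rotation=0)
fig.tight_layout()
```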

# Streamlining Model Prototyping

Claude is well suited to building the foundation for machine learning projects, which frees you to focus on analysis and interpretation.
- Build the model pipeline: Provide your feature and target DataFrames and ask Claude to construct a robust training script. A good prompt looks like this: “Using scikit-learn, write a script that:
  - Splits the data in @features.csv and @target.csv with a 70/30 ratio and a random state of 42.
  - Creates a preprocessing column transformer that scales numerical features and one-hot encodes categorical ones.
  - Trains a RandomForestClassifier.
  - Outputs a classification report and a confusion matrix plot.”
- Interpret results and iterate: Paste your model’s output (for example, a classification report or feature importance array) and ask for insights: “Explain this confusion matrix. Which classes are most commonly confused? Suggest two ways to improve precision for the minority class.”
Following scikit-learn’s estimator application programming interface (API) is key for building compatible and reusable models. This involves properly implementing __init__, fit, and predict, and using trailing underscores for attributes learned during fitting, e.g. coef_.
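
To make those conventions concrete, here is a minimal sketch of a custom classifier. `CentroidClassifier` and its `metric` parameter are hypothetical names invented for this example; the base classes and validation helpers are scikit-learn's own.

```python
import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin
from sklearn.utils.validation import check_array, check_is_fitted, check_X_y


class CentroidClassifier(ClassifierMixin, BaseEstimator):
    """Hypothetical classifier: predicts the class whose centroid is closest."""

    def __init__(self, metric="euclidean"):
        # __init__ only stores hyperparameters; no validation or computation here.
        self.metric = metric

    def fit(self, X, y):
        X, y = check_X_y(X, y)
        # Learned attributes end with an underscore.
        self.classes_ = np.unique(y)
        self.centroids_ = np.stack([X[y == c].mean(axis=0) for c in self.classes_])
        return self  # returning self allows method chaining

    def predict(self, X):
        check_is_fitted(self)
        X = check_array(X)
        order = 1 if self.metric == "manhattan" else 2
        # Distance from every sample to every centroid: shape (n_samples, n_classes).
        distances = np.linalg.norm(
            X[:, None, :] - self.centroids_[None, :, :], ord=order, axis=2
        )
        return self.classes_[distances.argmin(axis=1)]


# Tiny usage example with made-up points.
X = np.array([[0.0, 0.0], [0.2, 0.1], [5.0, 5.0], [5.1, 4.9]])
y = np.array([0, 0, 1, 1])
model = CentroidClassifier().fit(X, y)
preds = model.predict(np.array([[0.1, 0.0], [4.8, 5.2]]))
print(preds)
```

Because the class inherits from `BaseEstimator`, it gets `get_params` and `set_params` for free, which is what lets tools such as grid search clone and reconfigure it.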
Claude can quickly generate standard boilerplate as well, such as this simple train-test workflow:
```python
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error

# Load your data
# X = features, y = target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Initialize and train the model
model = RandomForestRegressor(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

# Evaluate
predictions = model.predict(X_test)
print(f"Model MAE: {mean_absolute_error(y_test, predictions):.2f}")
```

// Key File Reference Methods in Claude Code

| Method | Syntax Example | Best Use Case |
|---|---|---|
| Reference Single File | Explain the model in @train.py | Getting help with a specific script or data file |
| Reference Directory | List the main files in @src/data_pipeline/ | Understanding project structure |
| Upload Image/Chart | Use the upload button | Debugging a plot or discussing a diagram |

# Conclusion

Learning the fundamentals of Claude Code for data science is about using it as a collaborative partner. Start your session by providing context with @ references. Use Plan Mode to scope out major changes safely. For deep analysis, ensure extended thinking is enabled.
The true power emerges when you iteratively refine prompts: use Claude’s initial code output, then ask it to “optimize for speed,” “add detailed comments,” or “create a validation function” based on the result. This turns Claude from a code generator into a force multiplier for your problem-solving skills.
Shittu Olumide is a software engineer and technical writer passionate about leveraging cutting-edge technologies to craft compelling narratives, with a keen eye for detail and a knack for simplifying complex concepts. You can also find Shittu on Twitter.
