
    Building Time-Series Machine Learning Models with sktime in Python

    By gvfx00@gmail.com, June 15, 2026



     

    Table of Contents

    • # Introduction
    • # Prerequisites
    • # What Makes sktime Useful
    • # Setting Up the Dataset
    • # Splitting Time Series Data for Training and Testing
    • # Defining the Forecasting Horizon
    • # Building a Preprocessing and Forecasting Pipeline
    • # Evaluating the Forecast
    • # Swapping in a Different Forecaster
    • # Cross-Validating Across Time
    • # Next Steps

    # Introduction

     
    If you work with sensor readings, server metrics, or any data that arrives over time, you already know that standard scikit-learn pipelines don’t quite fit. Time series data has structure that tabular models ignore: seasonality, trend, temporal ordering, and the fact that future values depend on past ones.

    sktime is a Python library built specifically for this. It gives you a scikit-learn-style API — fit, predict, transform — but designed from the ground up for time series. You can do forecasting, classification, regression, and clustering on time series, all with a consistent interface.

    In this article, you’ll work through an example problem: forecasting temperature readings from an industrial HVAC sensor. You’ll learn how sktime handles time series data, how to build preprocessing pipelines, how to fit forecasters, and how to evaluate them.

    You can get the code on GitHub.

     

    # Prerequisites

     
    You’ll need Python 3.10 or higher and a basic familiarity with pandas. Install everything you need with:

    pip install sktime pmdarima statsmodels

     

    If you’d rather have all optional dependencies in one shot, pip install "sktime[all_extras]" covers them. The quotes keep shells such as zsh from interpreting the square brackets.

     

    # What Makes sktime Useful

     
    It helps to understand the problem sktime is solving. In scikit-learn, your data is a 2D table — rows are samples, columns are features. Time series data breaks this assumption because each “row” is actually a sequence of values over time, and the order of those values matters.

    The main data containers you’ll use are:

     

    Data Type | Representation | Description
    Series | pd.Series or pd.DataFrame | A single time series, used in vanilla forecasting.
    Panel | pd.DataFrame with a 2-level MultiIndex | A collection of multiple independent time series.
    Hierarchical | pd.DataFrame with a 3+ level MultiIndex | A structured set of time series with aggregation levels across multiple dimensions.

     

    For the time index itself, sktime supports DatetimeIndex, PeriodIndex, RangeIndex, and an integer-valued pd.Index (called Int64Index in pandas versions before 2.0) on your pandas objects. The index must be monotonic. If you’re using DatetimeIndex, the freq attribute should be set.
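The index requirements above can be checked with plain pandas before any data reaches sktime. The sketch below uses a small hypothetical series (the values and timestamps are made up for illustration) that arrives out of order and is missing one hourly reading. It assumes that sorting and then calling asfreq is an acceptable way to get a regular grid for your data.

```python
import pandas as pd

# Hypothetical raw readings: out of order and missing the 02:00 row
raw = pd.Series(
    [18.5, 17.1, 20.2, 19.8],
    index=pd.to_datetime(
        [
            "2026-01-01 01:00",
            "2026-01-01 00:00",
            "2026-01-01 03:00",
            "2026-01-01 04:00",
        ]
    ),
)

print(raw.index.is_monotonic_increasing)  # False: rows are not in time order
print(raw.index.freq)                     # None: no frequency is attached

# Sort chronologically, then reindex onto a regular hourly grid.
# asfreq inserts NaN for timestamps that have no reading.
regular = raw.sort_index().asfreq("h")

print(regular.index.is_monotonic_increasing)
print(regular.index.freq)
print(int(regular.isna().sum()))  # number of gaps exposed by the regular grid
```

The NaN that appears at 02:00 is the kind of gap an imputation step can fill later.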

     

    # Setting Up the Dataset

     
    Let’s create a realistic dataset. Imagine an HVAC sensor in a factory that records temperature every hour. The readings have a daily seasonal pattern (higher during working hours), a slight upward trend as the weather warms over the 90 days, and some noise.

    import numpy as np
    import pandas as pd
    
    np.random.seed(42)
    
    # 90 days of hourly readings starting Jan 1, 2026
    n_hours = 90 * 24
    timestamps = pd.date_range(start="2026-01-01", periods=n_hours, freq="h")
    
    # Trend: gradual 5-degree rise over 90 days
    trend = np.linspace(0, 5, n_hours)
    
    # Daily seasonality: sine cycle that peaks at 10am and dips at 10pm
    hour_of_day = np.arange(n_hours) % 24
    daily_cycle = 4 * np.sin(2 * np.pi * (hour_of_day - 4) / 24)
    
    # Noise
    noise = np.random.normal(0, 0.8, n_hours)
    
    # Base temperature around 20°C
    temperature = 20 + trend + daily_cycle + noise
    
    # Introduce a few missing values (sensor dropout)
    dropout_indices = [300, 301, 302, 1440, 1441]
    temperature[dropout_indices] = np.nan
    
    y = pd.Series(temperature, index=timestamps, name="temp_celsius")
    y.index.freq = pd.tseries.frequencies.to_offset("h")
    
    print(y.head())
    print(f"\nShape: {y.shape}")
    print(f"Missing values: {y.isna().sum()}")
    print(f"Index type: {type(y.index)}")

     

    Output:

    2026-01-01 00:00:00    16.933270
    2026-01-01 01:00:00    17.063277
    2026-01-01 02:00:00    18.522783
    2026-01-01 03:00:00    20.190095
    2026-01-01 04:00:00    19.821941
    Freq: h, Name: temp_celsius, dtype: float64
    
    Shape: (2160,)
    Missing values: 5
    Index type: <class 'pandas.core.indexes.datetimes.DatetimeIndex'>
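To see where the daily cycle in the synthetic data peaks, you can evaluate the sine term from the dataset snippet for each hour of the day. This is a small standalone check using only the standard library; the helper name daily_cycle is introduced here for illustration.

```python
import math


def daily_cycle(hour):
    """Same sine term as the dataset snippet, for one hour of the day."""
    return 4 * math.sin(2 * math.pi * (hour - 4) / 24)


hours = range(24)
peak_hour = max(hours, key=daily_cycle)
trough_hour = min(hours, key=daily_cycle)

print(peak_hour, trough_hour)  # 10 22
```

The cycle crosses zero at 4am, reaches its maximum of +4 °C at 10am, and its minimum of -4 °C at 10pm.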

     

     

    # Splitting Time Series Data for Training and Testing

     
    Splitting time series data is different from tabular data — you can’t shuffle rows. You must always split chronologically: train on earlier data, test on later data.

    sktime provides temporal_train_test_split for this purpose:

    from sktime.split import temporal_train_test_split
    
    # Hold out the last 7 days (168 hours) as the test set
    y_train, y_test = temporal_train_test_split(y, test_size=168)
    
    print(f"Train: {y_train.index[0]} → {y_train.index[-1]}")
    print(f"Test:  {y_test.index[0]} → {y_test.index[-1]}")
    print(f"Train size: {len(y_train)}, Test size: {len(y_test)}")

     

    Output:

    Train: 2026-01-01 00:00:00 → 2026-03-24 23:00:00
    Test:  2026-03-25 00:00:00 → 2026-03-31 23:00:00
    Train size: 1992, Test size: 168

     

    The function ensures the split is clean and chronological — no data leakage from the future into the training set.
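Conceptually, a chronological split with a fixed test size is just a slice at the end of the series. The following dependency-free sketch reproduces the boundaries printed above with plain datetime arithmetic. It illustrates the idea only and is not how sktime implements temporal_train_test_split.

```python
from datetime import datetime, timedelta

# Same time grid as the article: 90 days of hourly timestamps
start = datetime(2026, 1, 1)
timestamps = [start + timedelta(hours=i) for i in range(90 * 24)]

# Hold out the last 168 hours; everything earlier is training data
test_size = 168
train, test = timestamps[:-test_size], timestamps[-test_size:]

print(len(train), len(test))  # 1992 168
print(train[-1])              # 2026-03-24 23:00:00
print(test[0])                # 2026-03-25 00:00:00

# No leakage: every training timestamp precedes every test timestamp
print(max(train) < min(test))  # True
```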

     

    # Defining the Forecasting Horizon

     
    Before fitting any model, you need to tell sktime which time steps you want to predict. This is the ForecastingHorizon.

    from sktime.forecasting.base import ForecastingHorizon
    
    # Predict 168 steps ahead (7 days of hourly data)
    # is_relative=False means we're using absolute timestamps
    fh = ForecastingHorizon(y_test.index, is_relative=False)
    
    print(f"Horizon length: {len(fh)}")
    print(f"First forecast point: {fh[0]}")
    print(f"Last forecast point:  {fh[-1]}")

     

    This gives:

    Horizon length: 168
    First forecast point: 2026-03-25 00:00:00
    Last forecast point:  2026-03-31 23:00:00

     

    You can also use relative horizons like fh = [1, 2, 3, ..., 168], which means “1 step ahead, 2 steps ahead, …”. Absolute horizons are cleaner when you have actual timestamps you want predictions for.
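The relationship between the two kinds of horizon is simple arithmetic: a relative step k maps to the last training timestamp plus k time steps. The sketch below shows this with the standard library. The names cutoff, relative_fh, and absolute_fh are illustrative, and this is not sktime's own conversion code.

```python
from datetime import datetime, timedelta

cutoff = datetime(2026, 3, 24, 23)  # last timestamp in the training set
step = timedelta(hours=1)           # sampling interval of the series

# Relative horizon: 1 step ahead, 2 steps ahead, ..., 168 steps ahead
relative_fh = list(range(1, 169))

# Equivalent absolute horizon: one timestamp per relative step
absolute_fh = [cutoff + k * step for k in relative_fh]

print(len(absolute_fh))  # 168
print(absolute_fh[0])    # 2026-03-25 00:00:00
print(absolute_fh[-1])   # 2026-03-31 23:00:00
```

These match the first and last forecast points of the absolute horizon built from y_test.index.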

     

    # Building a Preprocessing and Forecasting Pipeline

     
    Real sensor data has missing values, seasonal patterns, and trend — you need to handle all of these before or during forecasting. sktime’s TransformedTargetForecaster lets you chain transformations with a forecaster into a single estimator. The transformations are applied to the target series y before fitting, and automatically reversed on the way out during prediction.

    from sktime.forecasting.exp_smoothing import ExponentialSmoothing
    from sktime.forecasting.compose import TransformedTargetForecaster
    from sktime.transformations.series.impute import Imputer
    from sktime.transformations.series.detrend import Deseasonalizer, Detrender
    
    pipeline = TransformedTargetForecaster(
        steps=[
            # Step 1: Fill missing sensor readings using linear interpolation
            ("imputer", Imputer(method="linear")),
            # Step 2: Remove the linear trend so the forecaster sees a stationary series
            ("detrender", Detrender()),
            # Step 3: Remove the daily seasonality (sp=24 for hourly data with 24-hour cycles)
            ("deseasonalizer", Deseasonalizer(model="additive", sp=24)),
            # Step 4: Forecast the cleaned, stationary residuals
            ("forecaster", ExponentialSmoothing(trend=None, seasonal=None)),
        ]
    )
    
    pipeline.fit(y_train, fh=fh)
    y_pred = pipeline.predict()
    
    print(y_pred.head())

     

    Output:

    2026-03-25 00:00:00    21.210066
    2026-03-25 01:00:00    21.788986
    2026-03-25 02:00:00    22.615184
    2026-03-25 03:00:00    23.688449
    2026-03-25 04:00:00    24.621127
    Freq: h, Name: temp_celsius, dtype: float64

     

    Here’s what each step does:

    • Imputer(method="linear") fills missing values by linearly interpolating between the surrounding readings, which suits short dropouts in a slowly varying sensor signal.
    • Detrender() fits a linear trend to the training series and subtracts it; on prediction it adds the trend back.
    • Deseasonalizer(sp=24) removes the 24-hour cycle from the residuals; sp stands for seasonal period.
    • Finally, ExponentialSmoothing forecasts the detrended, deseasonalized residuals.
    • When predict() is called, all inverse transformations are applied in reverse order automatically, and you get back predictions in the original temperature scale.
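The transform, forecast, and inverse-transform flow described in the list above can be sketched without any libraries. This is a deliberately simplified stand-in and not sktime's implementation: the trend is a least-squares line, the seasonal component is a per-phase mean, and the "forecaster" just predicts the mean of what is left. The function names and the noise-free toy series are assumptions made for illustration.

```python
def fit_line(values):
    """Least-squares intercept and slope of values against t = 0..n-1."""
    n = len(values)
    t_mean = (n - 1) / 2
    v_mean = sum(values) / n
    cov = sum((t - t_mean) * (v - v_mean) for t, v in enumerate(values))
    var = sum((t - t_mean) ** 2 for t in range(n))
    slope = cov / var
    return v_mean - slope * t_mean, slope


def forecast(values, sp, steps):
    """Detrend, deseasonalize, forecast the remainder, then invert both steps."""
    n = len(values)

    # Transform 1: subtract a fitted linear trend
    intercept, slope = fit_line(values)
    detrended = [v - (intercept + slope * t) for t, v in enumerate(values)]

    # Transform 2: subtract the mean detrended value at each phase of the cycle
    seasonal = [sum(detrended[p::sp]) / len(detrended[p::sp]) for p in range(sp)]
    remainder = [d - seasonal[t % sp] for t, d in enumerate(detrended)]

    # Stand-in forecaster: predict the mean of the remainder for every step
    level = sum(remainder) / n

    # Inverse transforms in reverse order: add seasonality back, then trend
    return [
        level + seasonal[t % sp] + intercept + slope * t
        for t in range(n, n + steps)
    ]


# Hypothetical noise-free series: a line plus a repeating 4-step pattern
pattern = [1.0, -1.0, -1.0, 1.0]
history = [10 + 0.5 * t + pattern[t % 4] for t in range(40)]

predictions = forecast(history, sp=4, steps=4)
print([round(p, 6) for p in predictions])  # [31.0, 29.5, 30.0, 32.5]
```

Because the toy series has no noise, the predictions land back on the original scale and continue the line-plus-pattern exactly, which is the behavior the pipeline's automatic inverse transforms provide.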

     

    # Evaluating the Forecast

     
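    sktime ships standard forecasting metrics in sktime.performance_metrics.forecasting. Mean absolute error (MAE) and mean absolute percentage error (MAPE) are common choices. MAE is in the units of the series, while MAPE is returned as a fraction, so the code below multiplies it by 100 to print a percentage.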

    from sktime.performance_metrics.forecasting import (
        mean_absolute_error,
        mean_absolute_percentage_error,
    )
    
    mae = mean_absolute_error(y_test, y_pred)
    mape = mean_absolute_percentage_error(y_test, y_pred)
    
    print(f"MAE:  {mae:.3f} °C")
    print(f"MAPE: {mape*100:.2f}%")

     

    Output:

    MAE:  0.584 °C
    MAPE: 2.40%
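For reference, both metrics are short formulas. The sketch below computes them by hand on three hypothetical readings. The function names mae_score and mape_score are made up here to avoid confusion with the sktime functions, and the MAPE shown is the plain, non-symmetric version.

```python
def mae_score(actual, predicted):
    """Mean of the absolute errors, in the units of the series."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mape_score(actual, predicted):
    """Mean of the absolute errors relative to the actual values (a fraction)."""
    return sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical actual and predicted temperatures in °C
actual = [20.0, 22.0, 25.0]
predicted = [21.0, 21.0, 24.0]

mae = mae_score(actual, predicted)
mape = mape_score(actual, predicted)

print(f"MAE:  {mae:.3f} °C")     # MAE:  1.000 °C
print(f"MAPE: {mape * 100:.2f}%")  # MAPE: 4.52%
```

Note that MAPE divides by the actual values, so it becomes unstable when the series passes near zero. That is not a concern for temperatures around 20 °C.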

     

     

    # Swapping in a Different Forecaster

     
    One of the biggest advantages of the sktime interface is that swapping the underlying algorithm requires changing just one line. Let’s try an ARIMA model in place of exponential smoothing and compare.

    from sktime.forecasting.arima import ARIMA
    
    pipeline_arima = TransformedTargetForecaster(
        steps=[
            ("imputer", Imputer(method="linear")),
            ("detrender", Detrender()),
            ("deseasonalizer", Deseasonalizer(model="additive", sp=24)),
            # ARIMA(1,1,1) on the cleaned residuals
            ("forecaster", ARIMA(order=(1, 1, 1), suppress_warnings=True)),
        ]
    )
    
    pipeline_arima.fit(y_train, fh=fh)
    y_pred_arima = pipeline_arima.predict()
    
    mae_arima = mean_absolute_error(y_test, y_pred_arima)
    mape_arima = mean_absolute_percentage_error(y_test, y_pred_arima)
    
    print(f"ARIMA MAE:  {mae_arima:.3f} °C")
    print(f"ARIMA MAPE: {mape_arima*100:.2f}%")

     

    Output:

    ARIMA MAE:  0.586 °C
    ARIMA MAPE: 2.41%

     

    The key point is that the preprocessing steps — imputation, detrending, deseasonalization — stayed identical. You only changed the final forecaster, and everything else composed cleanly around it.

     

    # Cross-Validating Across Time

     
    Holding out a single test window can be misleading. sktime provides time series cross-validation through splitters that respect temporal ordering.

    SlidingWindowSplitter uses a rolling window: the training window slides forward in time, always staying the same length. ExpandingWindowSplitter grows the training set cumulatively as you move forward, which is more appropriate when you want to use all available history.

    from sktime.split import ExpandingWindowSplitter
    from sktime.forecasting.model_evaluation import evaluate
    
    # Expanding window: start with 1800-hour train set, evaluate on 168-hour windows
    cv = ExpandingWindowSplitter(
        initial_window=1800,
        fh=list(range(1, 169)),
        step_length=168,
    )
    
    results = evaluate(
        forecaster=pipeline,
        y=y,
        cv=cv,
        scoring=mean_absolute_error,
        return_data=False,
    )
    
    print(results[["test__DynamicForecastingErrorMetric", "fit_time"]].round(3))
    print(f"\nMean CV MAE: {results['test__DynamicForecastingErrorMetric'].mean():.3f} °C")

     

    Output:

       test__DynamicForecastingErrorMetric  fit_time
    0                                0.627     0.274
    1                                0.585     0.100
    
    Mean CV MAE: 0.606 °C

     

    evaluate returns a DataFrame with per-fold metrics and timing. Two folds are a small sample, but the per-fold MAEs (0.627 and 0.585 °C) sit close to the single hold-out MAE of 0.584 °C, which suggests the model performs consistently across different time windows in the data.
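The number of folds in the output above follows from simple index arithmetic. The sketch below generates train and test index ranges for an expanding and a sliding window in plain Python, assuming a contiguous horizon of 168 steps. The helper names are hypothetical and this illustrates the windowing logic only, not sktime's splitter code.

```python
def expanding_folds(n, initial_window, horizon, step_length):
    """Training set always starts at 0 and grows by step_length per fold."""
    folds = []
    train_end = initial_window
    while train_end + horizon <= n:
        folds.append((range(0, train_end), range(train_end, train_end + horizon)))
        train_end += step_length
    return folds


def sliding_folds(n, window_length, horizon, step_length):
    """Training set keeps a fixed length and moves forward by step_length."""
    folds = []
    train_end = window_length
    while train_end + horizon <= n:
        train = range(train_end - window_length, train_end)
        folds.append((train, range(train_end, train_end + horizon)))
        train_end += step_length
    return folds


n = 90 * 24  # 2160 hourly observations, as in the article
expanding = expanding_folds(n, initial_window=1800, horizon=168, step_length=168)
sliding = sliding_folds(n, window_length=1800, horizon=168, step_length=168)

for train, test in expanding:
    print(f"expanding: train {train.start}-{train.stop - 1}, test {test.start}-{test.stop - 1}")
for train, test in sliding:
    print(f"sliding:   train {train.start}-{train.stop - 1}, test {test.start}-{test.stop - 1}")
```

With 2160 observations, a third fold would need test data up to index 2303, so only two folds fit. In the second fold the expanding window trains on indices 0 to 1967, while the sliding window trains on 168 to 1967.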

     

    # Next Steps

     
    This article covered the core forecasting workflow in sktime, but the library extends far beyond basic prediction tasks.

    It also supports time-series classification, probabilistic forecasting with uncertainty estimates, training shared models across multiple related time series, adapting traditional machine learning algorithms for sequential forecasting, and automating model selection and tuning workflows.

    One of sktime’s biggest strengths is its consistent API and integration with the broader Python machine learning ecosystem, making experimentation easier for both beginners and experienced practitioners. The sktime docs and example notebooks are especially well-written and are worth bookmarking if you regularly work with forecasting or temporal data problems.
     
     

    Bala Priya C is a developer and technical writer from India. She likes working at the intersection of math, programming, data science, and content creation. Her areas of interest and expertise include DevOps, data science, and natural language processing. She enjoys reading, writing, coding, and coffee! Currently, she’s working on learning and sharing her knowledge with the developer community by authoring tutorials, how-to guides, opinion pieces, and more. Bala also creates engaging resource overviews and coding tutorials.


