    3 Hyperparameter Tuning Techniques That Go Beyond Grid Search
    By gvfx00@gmail.com | January 20, 2026 | 9 Mins Read
    # Introduction

     
    When building machine learning models of moderate to high complexity, many settings are not learned from data but must be chosen by us beforehand: these are known as hyperparameters. Models like random forest ensembles and neural networks have several hyperparameters, each of which can take many different values. As a result, the number of ways to configure even a small subset of hyperparameters grows multiplicatively and quickly becomes too large to enumerate. This creates a problem: identifying the best configuration of these hyperparameters, i.e. the one(s) yielding the best model performance, can feel like trying to find a needle in a haystack, or even worse, in an ocean.

    This article builds on a previous guide from Machine Learning Mastery regarding the art of hyperparameter tuning, and adopts a hands-on approach to illustrate the use of intermediate to advanced hyperparameter tuning techniques in practice.

    Specifically, you will learn how to apply these three hyperparameter tuning techniques:

    • randomized search
    • Bayesian optimization
    • successive halving

     

    # Performing Initial Setup

     
    Before beginning, we will import the necessary libraries and dependencies. If you get a ModuleNotFoundError for any of these, install the library in question with pip first. We will be using NumPy, scikit-learn, and Optuna:

    import numpy as np
    import time
    from sklearn.datasets import load_digits
    from sklearn.model_selection import train_test_split, cross_val_score
    from sklearn.ensemble import RandomForestClassifier
    import optuna
    import warnings
    warnings.filterwarnings('ignore')

     

    We will also load the dataset used in the three examples: scikit-learn's digits dataset, a small collection of 8x8 grayscale images of handwritten digits. It is often used as a lightweight stand-in for MNIST (Modified National Institute of Standards and Technology), but it is a separate and much smaller dataset.

    print("=" * 70)
    print("LOADING DIGITS DATASET FOR IMAGE CLASSIFICATION")
    print("=" * 70)
    
    # Load scikit-learn's digits dataset (MNIST-like but much smaller: 8x8 images, 1797 samples)
    digits = load_digits()
    X, y = digits.data, digits.target
    
    # Train-test split
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42
    )
    
    print(f"Training instances: {X_train.shape[0]}")
    print(f"Test instances: {X_test.shape[0]}")
    print(f"Features: {X_train.shape[1]}")
    print(f"Classes: {len(np.unique(y))}")
    print()

     

    Next, we define a hyperparameter search space; that is, we identify which parameters and subsets of values within each one we want to try in combination.

    print("=" * 70)
    print("HYPERPARAMETER SEARCH SPACE")
    print("=" * 70)
    
    # Typical hyperparameters to explore in a random forest ensemble
    param_space = {
        'n_estimators': (10, 200),      # Number of trees
        'max_depth': (5, 50),            # Maximum tree depth
        'min_samples_split': (2, 20),   # Min samples to split node
        'min_samples_leaf': (1, 10),    # Min samples in leaf node
        'max_features': (0.1, 1.0)      # Fraction of features to consider
    }
    
    print("Search space:")
    for param, bounds in param_space.items():
        print(f"  {param}: {bounds}")
    print()
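To get a feel for how large this space is, the sketch below counts how many configurations an exhaustive grid would contain. It reuses the integer bounds from the search space above and assumes, purely for illustration, that max_features is discretized into 10 values; the variable names are hypothetical.

```python
import math

# Same integer bounds as the search space above, treated as inclusive ranges
int_bounds = {
    "n_estimators": (10, 200),
    "max_depth": (5, 50),
    "min_samples_split": (2, 20),
    "min_samples_leaf": (1, 10),
}

# Assumption: max_features is discretized into 10 grid points (0.1, 0.2, ..., 1.0)
n_max_features_values = 10

# Number of distinct integer values each hyperparameter can take
values_per_param = {name: hi - lo + 1 for name, (lo, hi) in int_bounds.items()}

# An exhaustive grid contains the product of all per-parameter counts
grid_size = math.prod(values_per_param.values()) * n_max_features_values

print(values_per_param)
print(f"Exhaustive grid size: {grid_size:,} configurations")
```

Under these assumptions the grid holds about 16.7 million configurations, which is why the techniques below evaluate only a few dozen carefully chosen ones.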

     

    As a final preparatory step, we define a function that will be reused. It encapsulates the process of training and evaluating a random forest ensemble model under one specific hyperparameter configuration, using cross-validation (CV) alongside classification accuracy to determine the model’s quality. Note that this function may be called a large number of times by each of the three techniques we will implement — as many as there are hyperparameter value combinations to try.

    def evaluate_model(params, X_train, y_train, cv=3):
        # Instantiate a random forest model with given hyperparameters
        model = RandomForestClassifier(
            n_estimators=int(params['n_estimators']),
            max_depth=int(params['max_depth']),
            min_samples_split=int(params['min_samples_split']),
            min_samples_leaf=int(params['min_samples_leaf']),
            max_features=float(params['max_features']),
            random_state=42,
            n_jobs=-1  # Use all CPU cores for speed
        )
        
        # Use CV to measure performance
        # This gives us a more robust estimate than a single train/val split
        scores = cross_val_score(model, X_train, y_train, cv=cv, 
                                 scoring='accuracy', n_jobs=-1)
        # Return the average cross-validation accuracy
        return np.mean(scores)
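Before tuning anything, it can help to know what an untuned model achieves under the same evaluation protocol. The following sketch, which is not part of the original workflow, scores a random forest with scikit-learn's default hyperparameters using the same 3-fold CV, and also reports the spread across folds, which the mean alone hides.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score, train_test_split

# Recreate the same training split used throughout the article
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Baseline: scikit-learn's default random forest hyperparameters
baseline = RandomForestClassifier(random_state=42)

# One accuracy value per fold (3-fold CV on the training split)
fold_scores = cross_val_score(baseline, X_train, y_train, cv=3, scoring="accuracy")

print(f"Per-fold accuracy: {np.round(fold_scores, 4)}")
print(f"Mean: {fold_scores.mean():.4f}  Std: {fold_scores.std():.4f}")
```

If the fold-to-fold standard deviation is of the same order as the gap between two configurations, that gap should be read with caution.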

     

    Now we are ready to try the three techniques!

     

    # Implementing Randomized Search

     
    As its name suggests, randomized search randomly samples hyperparameter combinations from the search space, rather than exhaustively trying all possible combinations in a pre-defined search space, like grid search does. Every trial is independent, with no knowledge gained from previous trials. Still, this is a highly effective method in many situations, usually finding high-quality solutions more quickly than grid search.
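The independent sampling step can be sketched in isolation. The helper below is hypothetical (sample_config is not part of any library) and assumes a seeded NumPy Generator so that draws are reproducible, with integer bounds treated as inclusive.

```python
import numpy as np

# Same bounds as the search space defined earlier
param_space = {
    "n_estimators": (10, 200),
    "max_depth": (5, 50),
    "min_samples_split": (2, 20),
    "min_samples_leaf": (1, 10),
    "max_features": (0.1, 1.0),
}

def sample_config(rng):
    """Draw one configuration; integer bounds are treated as inclusive."""
    config = {}
    for name, (low, high) in param_space.items():
        if isinstance(low, int):
            # endpoint=True makes the upper bound reachable
            config[name] = int(rng.integers(low, high, endpoint=True))
        else:
            config[name] = float(rng.uniform(low, high))
    return config

# Each draw ignores all previous draws: there is no feedback between trials
rng = np.random.default_rng(seed=42)
configs = [sample_config(rng) for _ in range(3)]
for c in configs:
    print(c)
```

Passing the generator explicitly keeps the sampling reproducible without relying on NumPy's global random state.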

    Here is how a randomized search can be implemented and used on random forest ensembles to classify the digits data:

    def randomized_search(n_trials=30):
        start_time = time.time() # Optional: used to measure execution time
        results = []
        
        print(f"\nRunning {n_trials} random trials...")
        
        for i in range(n_trials):
            # RANDOM SAMPLING: hyperparameters are sampled independently using numpy's random number generation
            # np.random.randint excludes its upper bound, so we add 1 to make each range inclusive
            params = {
                'n_estimators': np.random.randint(param_space['n_estimators'][0], 
                    param_space['n_estimators'][1] + 1),
                'max_depth': np.random.randint(param_space['max_depth'][0], 
                    param_space['max_depth'][1] + 1),
                'min_samples_split': np.random.randint(param_space['min_samples_split'][0], 
                    param_space['min_samples_split'][1] + 1),
                'min_samples_leaf': np.random.randint(param_space['min_samples_leaf'][0], 
                    param_space['min_samples_leaf'][1] + 1),
                'max_features': np.random.uniform(param_space['max_features'][0], 
                    param_space['max_features'][1])
            }
            
            # Evaluate a randomly defined configuration
            score = evaluate_model(params, X_train, y_train)
            results.append({'params': params, 'score': score})
            
            # Provide a progress update every 10 trials, for informative purposes
            if (i + 1) % 10 == 0:
                best_so_far = max(results, key=lambda x: x['score'])
                print(f"  Trial {i+1}/{n_trials}: Best score so far = {best_so_far['score']:.4f}")
        
        # Measure total time taken
        elapsed_time = time.time() - start_time
        
        # Identify best configuration found
        best_result = max(results, key=lambda x: x['score'])
        
        print(f"\n✓ Completed in {elapsed_time:.2f} seconds")
        print(f"Best validation accuracy: {best_result['score']:.4f}")
        print(f"Best parameters: {best_result['params']}")
        
        return best_result, results
    
    # Call the method to perform randomized search over 30 trials
    random_best, random_results = randomized_search(n_trials=30)

     

    Comments are provided alongside the code to facilitate understanding. The results obtained will be similar to the following:

    Running 30 random trials...
      Trial 10/30: Best score so far = 0.9617
      Trial 20/30: Best score so far = 0.9617
      Trial 30/30: Best score so far = 0.9617
    
    ✓ Completed in 64.59 seconds
    Best validation accuracy: 0.9617
    Best parameters: {'n_estimators': 195, 'max_depth': 16, 'min_samples_split': 8, 'min_samples_leaf': 2, 'max_features': 0.28306570555707966}

     

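    Take note of the time it took to run the hyperparameter search process, as well as the best validation accuracy achieved. In this run, the best configuration was already found within the first 10 trials, and the remaining 20 trials did not improve on it. This does not guarantee that it is the optimal configuration in the search space, only the best one among those sampled.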
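For comparison, scikit-learn ships its own randomized search, RandomizedSearchCV, which handles the sampling, cross-validation, and bookkeeping in one object. The sketch below is an alternative to the hand-written loop, not a reproduction of it: it uses SciPy distributions for the same bounds and, as an assumption to keep it quick, only 10 iterations.

```python
from scipy.stats import randint, uniform
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV, train_test_split

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# scipy's randint excludes the upper bound, hence the +1;
# uniform(loc, scale) samples from [loc, loc + scale]
param_distributions = {
    "n_estimators": randint(10, 200 + 1),
    "max_depth": randint(5, 50 + 1),
    "min_samples_split": randint(2, 20 + 1),
    "min_samples_leaf": randint(1, 10 + 1),
    "max_features": uniform(0.1, 0.9),
}

search = RandomizedSearchCV(
    RandomForestClassifier(random_state=42),
    param_distributions=param_distributions,
    n_iter=10,            # fewer trials than the 30 above, to keep the sketch quick
    cv=3,
    scoring="accuracy",
    random_state=42,      # makes the sampled configurations reproducible
    n_jobs=-1,
)
search.fit(X_train, y_train)

print(f"Best CV accuracy: {search.best_score_:.4f}")
print(f"Best parameters: {search.best_params_}")
```

The full trial history is available afterwards in search.cv_results_, which plays the role of the results list in the manual version.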

     

    # Applying Bayesian Optimization

     
    This method employs an auxiliary or surrogate model (specifically, a probabilistic model based on Gaussian processes or tree-based structures) to predict which hyperparameter settings are likely to perform best. Trials are not independent; each trial “learns” from previous trials. Additionally, this method attempts to balance exploration (trying new areas in the solution space) and exploitation (refining promising areas). In short, it uses the results gathered so far to decide where to look next, which often lets it reach good configurations in fewer trials than grid or randomized search.

    The Optuna library provides an implementation of Bayesian optimization for hyperparameter tuning based on the Tree-structured Parzen Estimator (TPE). It splits past trials into “good” and “bad” groups according to their scores, models a probability density over the hyperparameter values in each group, and samples new candidates from regions where the “good” density is high relative to the “bad” one.

    The whole process can be implemented as follows:

    def bayesian_optimization(n_trials=30):
        """
        Implementation of Bayesian optimization using Optuna library.
        """
        start_time = time.time()
        
        def objective(trial):
            """
            Optuna objective function: given a trial, returns a score.
            """
            # Optuna can suggest values based on past performance
            params = {
                'n_estimators': trial.suggest_int('n_estimators', 
                    param_space['n_estimators'][0],
                    param_space['n_estimators'][1]),
                'max_depth': trial.suggest_int('max_depth',
                    param_space['max_depth'][0],
                    param_space['max_depth'][1]),
                'min_samples_split': trial.suggest_int('min_samples_split',
                    param_space['min_samples_split'][0],
                    param_space['min_samples_split'][1]),
                'min_samples_leaf': trial.suggest_int('min_samples_leaf',
                    param_space['min_samples_leaf'][0],
                    param_space['min_samples_leaf'][1]),
                'max_features': trial.suggest_float('max_features',
                    param_space['max_features'][0],
                    param_space['max_features'][1])
            }
            
            # Evaluate and return the score (Optuna minimizes by default, so the study below sets direction='maximize')
            return evaluate_model(params, X_train, y_train)
        
        # The create_study() function is used in Optuna to manage and run
        # the overall optimization process
        print(f"\nRunning {n_trials} Bayesian optimization trials...")
        
        study = optuna.create_study(
            direction='maximize',  # We want to maximize accuracy
            sampler=optuna.samplers.TPESampler(seed=42)  # Bayesian algorithm
        )
        
        # Perform optimization process with progress callback
        def callback(study, trial):
            if trial.number % 10 == 9:
                print(f"  Trial {trial.number + 1}/{n_trials}: Best score = {study.best_value:.4f}")
        
        study.optimize(objective, n_trials=n_trials, callbacks=[callback], show_progress_bar=False)
        
        elapsed_time = time.time() - start_time
        
        print(f"\n✓ Completed in {elapsed_time:.2f} seconds")
        print(f"Best validation accuracy: {study.best_value:.4f}")
        print(f"Best parameters: {study.best_params}")
        
        return study.best_params, study.best_value, study
    
    bayesian_best_params, bayesian_best_score, bayesian_study = bayesian_optimization(n_trials=30)

     

    Output (summarized):

    ✓ Completed in 62.66 seconds
    Best validation accuracy: 0.9673
    Best parameters: {'n_estimators': 150, 'max_depth': 33, 'min_samples_split': 2, 'min_samples_leaf': 1, 'max_features': 0.19145126698170384}
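To make the TPE idea described earlier more concrete, here is a deliberately simplified one-dimensional illustration. It is not Optuna's implementation: the toy objective, the 20% "good" fraction, the use of SciPy's gaussian_kde, and the evenly spaced candidate grid (TPE samples candidates from the "good" density instead) are all assumptions made for the sake of a short example.

```python
import numpy as np
from scipy.stats import gaussian_kde

def objective(x):
    # Toy score to maximize; its peak is at x = 0.3
    return -(x - 0.3) ** 2

rng = np.random.default_rng(0)

# 1) History of past trials: 50 random points and their scores
xs = rng.uniform(0.0, 1.0, size=50)
scores = objective(xs)

# 2) Split trials into "good" (top 20% of scores) and "bad" (the rest)
threshold = np.quantile(scores, 0.8)
good = xs[scores >= threshold]
bad = xs[scores < threshold]

# 3) Model a density over each group
l_good = gaussian_kde(good)
g_bad = gaussian_kde(bad)

# 4) Rank candidates by the density ratio l(x) / g(x) and pick the highest
candidates = np.linspace(0.0, 1.0, 101)
ratio = l_good(candidates) / (g_bad(candidates) + 1e-12)
next_x = candidates[np.argmax(ratio)]

print(f"Good trials: {len(good)}, bad trials: {len(bad)}")
print(f"Next suggested x: {next_x:.2f}")
```

The suggested point lands where good trials are dense and bad trials are sparse, which is how past results steer the next trial.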

     

    # Utilizing Successive Halving

     
    The last of the three methods, successive halving, trades off the number of configurations explored against the computing resources allocated to each one. It starts with a large set of configurations but limited resources (e.g. training data) per configuration, then repeatedly removes poor performers and allocates more resources to the promising ones, similar to a real-world tournament where stronger contestants “survive.”
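As a rough illustration of this tournament, the sketch below traces how many configurations survive each round without training any model. It assumes 32 starting configurations, an initial budget of 25% of the training data, and a budget that doubles each round up to 100%; halving_schedule is a hypothetical helper written for this example.

```python
def halving_schedule(n_initial=32, min_resource=0.25, max_resource=1.0):
    """Return (n_configs, resource_fraction) for each elimination round."""
    schedule = []
    n_configs, resource = n_initial, min_resource
    while n_configs > 1:
        schedule.append((n_configs, resource))
        n_configs = max(1, n_configs // 2)           # keep the top half
        resource = min(resource * 2, max_resource)   # double the budget, capped
    return schedule

schedule = halving_schedule()
for round_num, (n_configs, resource) in enumerate(schedule, start=1):
    print(f"Round {round_num}: {n_configs} configs on {resource:.0%} of the data")

# Total number of configuration evaluations across all rounds
total_evaluations = sum(n for n, _ in schedule)
print(f"Configuration evaluations in total: {total_evaluations}")
```

Under these assumptions there are five rounds and 62 evaluations, but half of those evaluations happen in the first round on only a quarter of the data, which is where the savings come from.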

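    The following implementation applies successive halving with the training set size as the resource: each round keeps the top half of the configurations and doubles the fraction of training data used, up to the full set.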

    def successive_halving(n_initial=32, min_resource=0.25, max_resource=1.0):
        
        start_time = time.time()
        
        # Step 1: Defining initial hyperparameter configurations at random
        print(f"\nGenerating {n_initial} initial random configurations...")
        configs = []
        for _ in range(n_initial):
            # np.random.randint excludes its upper bound, so we add 1 to make each range inclusive
            config = {
                'n_estimators': np.random.randint(param_space['n_estimators'][0], 
                    param_space['n_estimators'][1] + 1),
                'max_depth': np.random.randint(param_space['max_depth'][0], 
                    param_space['max_depth'][1] + 1),
                'min_samples_split': np.random.randint(param_space['min_samples_split'][0], 
                    param_space['min_samples_split'][1] + 1),
                'min_samples_leaf': np.random.randint(param_space['min_samples_leaf'][0], 
                    param_space['min_samples_leaf'][1] + 1),
                'max_features': np.random.uniform(param_space['max_features'][0], 
                    param_space['max_features'][1])
            }
            configs.append(config)
        
        # Step 2: apply tournament-like successive rounds of elimination
        current_configs = configs
        current_resource = min_resource
        round_num = 1
        
        while len(current_configs) > 1 and current_resource <= max_resource:
            # Determine amount of training instances to use in the current round
            n_samples = int(len(X_train) * current_resource)
            print(f"\n--- Round {round_num}: Evaluating {len(current_configs)} configs ---")
            print(f"    Using {current_resource*100:.0f}% of training data ({n_samples} samples)")
            
            # Subsample training instances
            indices = np.random.choice(len(X_train), size=n_samples, replace=False)
            X_subset = X_train[indices]
            y_subset = y_train[indices]
            
            # Evaluate all current configs with the current resources
            scores = []
            for i, config in enumerate(current_configs):
                score = evaluate_model(config, X_subset, y_subset, cv=2)  # Use cv=2 (minimum)
                scores.append(score)
                
                if (i + 1) % 10 == 0 or (i + 1) == len(current_configs):
                    print(f"    Evaluated {i+1}/{len(current_configs)} configs...")
            
            # Elimination policy: keep top-performing half only
            n_keep = max(1, len(current_configs) // 2)
            sorted_indices = np.argsort(scores)[::-1]  # Descending order
            current_configs = [current_configs[i] for i in sorted_indices[:n_keep]]
            
            best_score = scores[sorted_indices[0]]
            print(f"    → Keeping top {n_keep} configs. Best score: {best_score:.4f}")
            
            # Update resources, doubling them for the next round
            current_resource = min(current_resource * 2, max_resource)
            round_num += 1
        
        # Final evaluation of best config found, given full training set
        best_config = current_configs[0]
        final_score = evaluate_model(best_config, X_train, y_train, cv=3)
        
        elapsed_time = time.time() - start_time
        
        print(f"\n✓ Completed in {elapsed_time:.2f} seconds")
        print(f"Best validation accuracy: {final_score:.4f}")
        print(f"Best parameters: {best_config}")
        
        return best_config, final_score
    
    halving_best, halving_score = successive_halving(n_initial=32, min_resource=0.25, max_resource=1.0)

     

    The final result obtained may look like the following:

    ✓ Completed in 56.18 seconds
    Best validation accuracy: 0.9645
    Best parameters: {'n_estimators': 158, 'max_depth': 39, 'min_samples_split': 5, 'min_samples_leaf': 2, 'max_features': 0.2269785516325355}
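scikit-learn also offers a built-in variant of this technique, HalvingRandomSearchCV, which is still marked experimental and must be enabled with an explicit import. The sketch below is an alternative to the manual loop rather than an equivalent of it: the 8 candidates, the starting budget of 150 samples, and the narrower n_estimators range are assumptions chosen to keep it fast.

```python
from scipy.stats import randint, uniform
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.experimental import enable_halving_search_cv  # noqa: F401 (enables the class below)
from sklearn.model_selection import HalvingRandomSearchCV, train_test_split

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

param_distributions = {
    "n_estimators": randint(10, 100 + 1),   # narrower than the article's range, for speed
    "max_depth": randint(5, 50 + 1),
    "min_samples_split": randint(2, 20 + 1),
    "min_samples_leaf": randint(1, 10 + 1),
    "max_features": uniform(0.1, 0.9),
}

search = HalvingRandomSearchCV(
    RandomForestClassifier(random_state=42),
    param_distributions=param_distributions,
    n_candidates=8,          # configurations in the first round
    factor=2,                # keep 1/2 of the candidates, double the samples
    resource="n_samples",    # the budget is the number of training samples
    min_resources=150,       # samples used in the first round
    cv=3,
    scoring="accuracy",
    random_state=42,
    n_jobs=-1,
)
search.fit(X_train, y_train)

print(f"Candidates per round: {search.n_candidates_}")
print(f"Samples per round:    {search.n_resources_}")
print(f"Best CV accuracy (last round): {search.best_score_:.4f}")
print(f"Best parameters: {search.best_params_}")
```

Note that best_score_ here is measured on the sample budget of the last round, so it is not directly comparable to a score computed on the full training set.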

     

     

    # Comparing the Final Results

     
    In summary, all three methods found strong configurations, with validation accuracies between 96% and 97%, and Bayesian optimization achieved the best result by a small margin (0.9673, versus 0.9645 for successive halving and 0.9617 for randomized search). The methods differ somewhat more in running time: successive halving finished fastest, in just over 56 seconds, compared to roughly 63 and 65 seconds for the other two techniques. Keep in mind that these figures come from a single run of each method, so differences this small may not hold across runs or machines.
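All scores above are cross-validation accuracies on the training split; the held-out test set created during setup has not been used yet. As a possible final step, which the original walkthrough does not include, the sketch below refits a random forest with the best parameters reported in the Bayesian optimization output and measures its accuracy on the test set.

```python
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Recreate the same split used throughout the article
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Best configuration copied from the Bayesian optimization output above
best_params = {
    "n_estimators": 150,
    "max_depth": 33,
    "min_samples_split": 2,
    "min_samples_leaf": 1,
    "max_features": 0.19145126698170384,
}

# Refit on the full training split, then score once on the untouched test split
final_model = RandomForestClassifier(**best_params, random_state=42, n_jobs=-1)
final_model.fit(X_train, y_train)

test_accuracy = accuracy_score(y_test, final_model.predict(X_test))
print(f"Held-out test accuracy: {test_accuracy:.4f}")
```

Because the test set played no part in choosing the hyperparameters, this number is a less optimistic estimate of real-world performance than the validation accuracies used during the search.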
     
     

    Iván Palomares Carrascosa is a leader, writer, speaker, and adviser in AI, machine learning, deep learning & LLMs. He trains and guides others in harnessing AI in the real world.
