
    Build a Smart Recommendation System with Collaborative Filtering

    By Mounish V · February 15, 2026 · 8 Mins Read


    Recommendation systems are the invisible engines that personalize our social media feeds, streaming (OTT) platforms and e-commerce stores. Whether you are scrolling through Netflix for a new show or browsing Amazon for a gadget, these algorithms work behind the scenes to predict what you are likely to enjoy next. One of the most effective ways to do this is to look at how other people with similar tastes have behaved, and that idea is the core of modern personalization. In this article, we will explore how to build one of these systems using collaborative filtering and make it smarter using OpenAI. Without any further ado, let’s dive in.

    Table of Contents

    • What is Collaborative Filtering?
      • Downsides of a few Collaborative Filtering techniques
    • Singular Value Decomposition (SVD)
    • Movie Recommendation System
      • Understanding the Dataset
      • Pre-requisites
    • The Surprise Library for Data Splitting & SVD
      • Python Code
        • 1. Data Preparation
        • 2. Splitting the data and Finding the Best Model
        • 3. The Smart Twist
        • 4. Running the system
    • Conclusion
    • Frequently Asked Questions
    What is Collaborative Filtering?

    Collaborative filtering is a technique that makes recommendations by learning from the behaviour of many different users. The intuition here is that if User 1 and User 2 both liked the same movies, they probably have similar tastes. If User 1 then watches a new movie and likes it, the system will recommend that movie to User 2. It does not need to know anything else, such as the genre or actors; it only needs to know who liked what.

    Collaborative filtering is performed on a user-item matrix. This is generally created by pivoting the ratings data so that each row is a user, each column is an item (for example a movie title) and each cell holds that user’s rating, as sketched below.
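
    As a quick illustration with made-up data (the column names here are only for this example), this is how such a pivot can be built with pandas:

    import pandas as pd

    # Hypothetical toy ratings: one row per (user, movie, rating) event
    toy_ratings = pd.DataFrame({
        'userId': [1, 1, 2, 2, 3],
        'title': ['Inception', 'Up', 'Inception', 'Avatar', 'Up'],
        'rating': [5.0, 4.0, 4.5, 3.0, 5.0],
    })

    # Pivot into a user-item matrix: rows = users, columns = movies,
    # values = ratings; unrated combinations stay as NaN
    user_item = toy_ratings.pivot_table(index='userId', columns='title', values='rating')
    print(user_item)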

    Read more: Guide to Collaborative Filtering

    Downsides of a few Collaborative Filtering techniques

    There are two common ways to perform collaborative filtering, but both have downsides: 

    1. User-User Filtering: This finds users who are similar to you. The problem is that the number of users in a system can grow to millions, making it computationally very slow to compare everyone. Also, people’s tastes change over time, which can confuse the system or require very frequent retraining.
    2. Item-Item Filtering: This finds recommendations based on item-to-item similarity. While this is more stable than user-user filtering, it still struggles with sparsity, because most users only rate a fraction of the thousands of movies available. A toy similarity computation for both approaches is sketched after this list.
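
    To make both ideas concrete, here is a small sketch with made-up ratings (0 meaning unrated) that computes user-user and item-item similarities with cosine similarity; note how the user-user matrix grows with the number of users:

    import numpy as np
    from sklearn.metrics.pairwise import cosine_similarity

    # Toy user-item matrix (rows = users, columns = movies), 0 = unrated
    R = np.array([
        [5.0, 4.0, 0.0, 1.0],
        [4.5, 0.0, 0.0, 1.5],
        [0.0, 1.0, 5.0, 4.0],
    ])

    # User-User: compare every row (user) with every other row.
    # With millions of users this pairwise comparison becomes the bottleneck.
    user_sim = cosine_similarity(R)

    # Item-Item: compare columns (movies) instead; more stable, but the
    # many zeros (sparsity) still distort the similarities.
    item_sim = cosine_similarity(R.T)

    print(user_sim.round(2))
    print(item_sim.round(2))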

    Singular Value Decomposition (SVD)

    The idea here is matrix factorization: Singular Value Decomposition (SVD) decomposes the sparse user-item matrix into lower-dimensional latent factor matrices. This is a user-item collaborative filtering technique, and it is the one we’ll pick for our recommendation system.
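
    Here is a raw NumPy/SciPy sketch of the idea on a toy matrix. Surprise’s SVD (used later) adds biases and regularization on top of this, so treat it as an illustration of matrix factorization rather than the exact algorithm:

    import numpy as np
    from scipy.sparse.linalg import svds

    # Toy user-item matrix (0 = unrated); real matrices are far larger and sparser
    R = np.array([
        [5.0, 4.0, 0.0, 1.0],
        [4.0, 0.0, 0.0, 1.0],
        [1.0, 1.0, 0.0, 5.0],
        [1.0, 0.0, 4.0, 4.0],
    ])

    # Factorize into k latent factors: R ≈ U @ diag(s) @ Vt
    U, s, Vt = svds(R, k=2)

    # Reconstructing from the low-rank factors fills in the unrated cells;
    # those filled-in values act as rating predictions
    R_hat = U @ np.diag(s) @ Vt
    print(np.round(R_hat, 2))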

    Movie Recommendation System

    Let’s understand the data and build our recommendation system with the SVD collaborative filtering technique discussed above.

    Note: Due to the size of the code, only the important parts have been explained. You can refer to the complete notebook here: https://www.kaggle.com/code/mounishv/movie-recommender

    Understanding the Dataset 

    For this project, we are using The Movies Dataset (https://www.kaggle.com/datasets/rounakbanik/the-movies-dataset), which is a collection of metadata for over 45,000 films. While the full dataset is massive, we specifically use the ratings_small.csv file. This smaller version contains about 100,000 ratings from 700 users on 9,000 movies. We use the small version because it allows us to train models quickly.  

    Pre-requisites 

    We will use pandas for handling the data, the Surprise library for splitting the data and training the SVD model, and the OpenAI API for the LLM-powered step.

    The Surprise Library for Data Splitting & SVD

    The Surprise library is built specifically for recommendation systems. It simplifies the process of loading data and testing different algorithms. Before training, we split our data into a training set and a test set using Surprise, and we also use its built-in implementation of SVD.

    Python Code 

    The provided code follows a professional workflow for building and refining a model. 

    Requirements 

    !pip install "numpy<2"
    !pip install -q openai
    !pip install -q scikit-surprise  # provides the surprise package, in case it is not preinstalled

    Note: Restart your Colab session before you proceed  

    1. Data Preparation 

    The code first merges movie IDs from different files to ensure the ratings and movie titles match up correctly. 

    import pandas as pd 
    from surprise import Dataset, Reader, SVD 
    from surprise.model_selection import GridSearchCV, train_test_split 
    from surprise import accuracy 
    
    # Kaggle path for The Movies Dataset 
    path="/kaggle/input/the-movies-dataset/" 
    
    # Loading relevant files 
    ratings = pd.read_csv(path + 'ratings_small.csv') 
    
    metadata = pd.read_csv(path + 'movies_metadata.csv', low_memory=False) 
    links = pd.read_csv(path + 'links_small.csv') 
    
    ratings['movieId'] = pd.to_numeric(ratings['movieId'], errors="coerce").astype('Int32') 
    
    ratings = ratings.merge(links[['movieId', 'tmdbId']], on='movieId', how='left')

    2. Splitting the data and Finding the Best Model 

    # Initialize the Reader for Surprise (ratings range from 0.5 to 5.0)
    reader = Reader(rating_scale=(0.5, 5.0)) 
    
    # Load the dataframe into Surprise format 
    data = Dataset.load_from_df( 
       ratings[['userId', 'movieId', 'rating']], 
       reader 
    ) 
    
    # Split into 75% training and 25% testing 
    trainset, testset = train_test_split(data, test_size=0.25, random_state=42)

    Instead of guessing the best settings, the code uses GridSearchCV. This automatically tests different SVD configurations to find the one with the lowest RMSE.

    # Define the parameter grid 
    param_grid = { 
       'n_factors': [10, 20, 50], 
       'n_epochs': [10, 20], 
       'lr_all': [0.005, 0.01], # learning rate 
       'reg_all': [0.02, 0.1]   # regularization 
    } 
    
    # Run Grid Search with 3-fold cross-validation 
    gs = GridSearchCV(SVD, param_grid, measures=['rmse'], cv=3, n_jobs=-1) 
    gs.fit(data) 
    
    # Best RMSE score 
    print(f"Best RMSE score found: {gs.best_score['rmse']}") 
    
    # Combination of parameters that gave the best RMSE score 
    print(f"Best parameters: {gs.best_params['rmse']}")
    Best RMSE score found: 0.8902760026938319 

    Best parameters: {'n_factors': 50, 'n_epochs': 20, 'lr_all': 0.01, 'reg_all': 0.1}
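
    The snippets that follow use final_model and predictions, which the full notebook creates right after the grid search. A minimal sketch of that bridging step, assuming those variable names, looks like this:

    # Refit the best configuration found by the grid search on the training split
    final_model = gs.best_estimator['rmse']
    final_model.fit(trainset)

    # Evaluate on the held-out test set; `predictions` is also reused later
    # by get_top_5_for_user to rank candidate movies per user
    predictions = final_model.test(testset)
    print(f"Test RMSE: {accuracy.rmse(predictions, verbose=False):.4f}")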

    3. The Smart Twist 

    The most distinctive part of this code is how it uses an LLM to help the user. Once the SVD model predicts the top 5 movies for a user, an LLM (GPT-4.1 mini) asks a question to help the user pick just one.

    import numpy as np 
    from openai import OpenAI 
    from collections import defaultdict 
    from sklearn.metrics.pairwise import cosine_similarity 
    
    # OPENAI_API_KEY should hold your OpenAI API key (for example, loaded from an
    # environment variable or your notebook's secrets manager)
    client = OpenAI(api_key=OPENAI_API_KEY)

    We’ll define two functions to implement our idea. One function, get_top_5_for_user, retrieves 5 recommendations for the user (a minimal sketch of it follows this list), and the other, llm_recommendation, performs the following tasks:

    • Uses metadata to get more context on the 5 movies
    • Passes them to an LLM, which phrases a question to the user
    • Uses the user’s answer to pick the final recommendation via cosine similarity
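
    The full definition of get_top_5_for_user lives in the linked notebook; a minimal sketch of how it can be written over Surprise predictions (the exact signature here is an assumption) is:

    from collections import defaultdict

    def get_top_5_for_user(predictions, target_uid, n=5):
        """Group predicted ratings by user and return the top-n
        (movieId, estimated rating) pairs for the requested user."""
        per_user = defaultdict(list)
        for uid, iid, true_r, est, _ in predictions:
            per_user[uid].append((iid, est))

        # Sort this user's candidates by predicted rating, highest first
        ranked = sorted(per_user[target_uid], key=lambda x: x[1], reverse=True)
        return ranked[:n]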

    Question-Creation logic 

    movie_list_str = "\n".join([f"- {m['title']}: {m['desc']}" for m in movie_info]) 
    prompt = f"I have selected these 5 movies for a user based on their history:\n{movie_list_str}\n\n" \ 
            "Frame one short, engaging question to help the user choose between these specific options." 
    
    question = client.chat.completions.create( 
       model="gpt-4.1-mini", 
       messages=[{"role": "user", "content": prompt}] 
    ).choices[0].message.content

    Semantic matching logic (using cosine similarity)

    resp_vec = client.embeddings.create( 
       input=[user_response], 
       model="text-embedding-3-small" 
    ).data[0].embedding 
    
    movie_texts = [f"{m['title']} {m['desc']}" for m in movie_info] 
    
    movie_vecs = [e.embedding for e in client.embeddings.create( 
       input=movie_texts,
       model="text-embedding-3-small" 
    ).data] 
    
    scores = cosine_similarity([resp_vec], movie_vecs)[0] 
    winner_idx = np.argmax(scores)

    4. Running the system 

    Predicting User Rating: 

    # Pick a user and movie from the test set (here, the first entry)
    uid = testset[0][0] 
    iid = testset[0][1] 
    true_r = testset[0][2]
    
    pred = final_model.predict(uid, iid) 
    
    print(f"\nUser: {uid}") 
    print(f"Movie: {iid}") 
    print(f"Actual Rating: {true_r}") 
    print(f"Predicted Rating: {pred.est:.2f}")
    User: 30 
    Movie: 2856 
    Actual Rating: 4.0
    Predicted Rating: 3.72

    Smart Recommender 

    top_5 = get_top_5_for_user(predictions, target_uid=testset[0][0]) 
    final_movie, score = llm_recommendation(top_5, metadata, links) 
    
    print(f"\nFinal Recommendation: {final_movie['title']} (Match Score: {score:.2f})")
    Agent: Are you in the mood for a gripping drama, a thrilling action-packed story, a classic comedy adventure, or an enchanting animated fantasy? 

    Your answer:  animated movie 

    Final Recommendation: How to Train Your Dragon (Match Score: 0.32)

    As you can see, when I said “animated movie”, the system recommended “How to Train Your Dragon” based on my current mood, using the cosine similarity between my answer and the movie descriptions to pick the final recommendation.

    Conclusion

    We have successfully built our smart recommendation system. By using SVD through the Surprise library, we have mitigated the issues of the other collaborative filtering techniques. Adding an LLM to the mix makes the system mood-aware rather than static, although the level of personalization could be even higher if the user’s own data were included in the question as well. It is also important to note that a collaborative filtering model has to be retrained frequently on the latest data to keep the recommendations relevant.

    Frequently Asked Questions

    Q1. What is the similarity used in User-User Collaborative Filtering?

    A. Pearson correlation is commonly used. It measures the similarity between two users by comparing their rating patterns and checking how strongly their preferences move together.
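
    For illustration, with made-up ratings from two users on the same five movies, the Pearson correlation can be computed directly with NumPy:

    import numpy as np

    # Hypothetical ratings by two users on the same five movies
    user_a = np.array([5.0, 3.0, 4.0, 4.0, 2.0])
    user_b = np.array([4.5, 2.5, 4.0, 3.5, 1.5])

    # Pearson correlation: covariance of the two rating vectors
    # divided by the product of their standard deviations
    similarity = np.corrcoef(user_a, user_b)[0, 1]
    print(f"Pearson similarity: {similarity:.3f}")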

    Q2. What is cosine similarity?

    A. Cosine similarity measures how similar two vectors are by calculating the angle between them; it is commonly used for text and embeddings.
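
    A tiny sketch of the computation on made-up vectors:

    import numpy as np

    def cosine_sim(a, b):
        # Dot product divided by the product of the vector norms;
        # 1.0 means identical direction, 0.0 means unrelated
        return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

    a = np.array([0.2, 0.8, 0.1])
    b = np.array([0.25, 0.7, 0.05])
    print(f"Cosine similarity: {cosine_sim(a, b):.3f}")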

    Q3. What is association rule mining?

    A. Association rule mining finds relationships between items in datasets, like products frequently bought together, using support, confidence, and lift metrics. 
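
    As a small worked example on made-up basket data, the three metrics for the rule “bread → butter” can be computed directly with pandas:

    import pandas as pd

    # Hypothetical one-hot basket data: each row is one transaction
    baskets = pd.DataFrame({
        'bread':  [1, 1, 0, 1, 1],
        'butter': [1, 1, 0, 0, 1],
        'milk':   [0, 1, 1, 1, 0],
    })

    # Rule: bread -> butter
    support = (baskets['bread'] & baskets['butter']).mean()  # P(bread and butter)
    confidence = support / baskets['bread'].mean()           # P(butter | bread)
    lift = confidence / baskets['butter'].mean()             # confidence / P(butter)

    print(f"support={support:.2f}, confidence={confidence:.2f}, lift={lift:.2f}")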


    Mounish V

    Passionate about technology and innovation, a graduate of Vellore Institute of Technology. Currently working as a Data Science Trainee, focusing on Data Science. Deeply interested in Deep Learning and Generative AI, eager to explore cutting-edge techniques to solve complex problems and create impactful solutions.
