
    Guardrails for LLMs: Measuring AI ‘Hallucination’ and Verbosity

    By gvfx00@gmail.com | May 12, 2026 | 6 Mins Read



     

    Table of Contents

    • Introduction
    • Setting a Complexity Budget with Textstat
    • Implementing the LangChain Pipeline
    • Wrapping Up
    # Introduction

     
    Large language models (LLMs) have a taste for “flowery”, often overly verbose language in their responses. Ask a simple question, and chances are you will be flooded with paragraphs of overly detailed, enthusiastic, and complex prose. This behavior is rooted in their training: they are optimized to be as helpful and conversational as possible.

    Unfortunately, verbosity is more than a stylistic nuisance: it arguably correlates with increased odds of a more serious issue, hallucinations. The more words a response contains, the higher the chance of drifting from grounded knowledge into outright fabrication.

    In short, robust guardrails are needed to address this two-sided problem, starting with verbosity checks. This article shows how to use the Textstat Python library to measure readability and detect overly complex responses before they reach the end user, forcing the model to refine its answer.

     

    # Setting a Complexity Budget with Textstat

     
    The Textstat Python library computes scores such as the automated readability index (ARI), which estimates the school grade level needed to understand a piece of text, such as a model response. If this complexity metric exceeds a budget or threshold, say 10.0, roughly a 10th-grade reading level, a re-prompting loop can be triggered automatically to request a more concise, simpler response. This strategy not only curbs flowery language but may also reduce hallucination risk, since the model is pushed to stick to core facts.
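Under the hood, the ARI is a simple formula: 4.71 × (characters/words) + 0.5 × (words/sentences) - 21.43. Here is a minimal hand-rolled sketch of that formula (Textstat's implementation differs in tokenization details, so scores will not match it exactly):

```python
import re

def ari(text: str) -> float:
    """Automated Readability Index:
    4.71 * (chars/words) + 0.5 * (words/sentences) - 21.43."""
    words = re.findall(r"[A-Za-z0-9']+", text)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    if not words or not sentences:
        return 0.0
    chars = sum(len(w) for w in words)
    return 4.71 * (chars / len(words)) + 0.5 * (len(words) / len(sentences)) - 21.43

simple = "The cat sat on the mat. It was warm."
complex_text = (
    "The multifaceted ramifications of computational linguistics "
    "necessitate extraordinarily comprehensive interdisciplinary collaboration."
)
print(ari(simple))        # well below a 10.0 budget
print(ari(complex_text))  # well above a 10.0 budget
```

Short, plain sentences score low; long words in long sentences push the grade level up fast, which is exactly what the complexity budget exploits.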

     

    # Implementing the LangChain Pipeline

     
    Let’s implement the strategy described above as a LangChain pipeline that can easily be run in a Google Colab notebook. You will need a Hugging Face API token, obtainable for free at https://huggingface.co/settings/tokens. In Colab’s left-hand menu, click the “Secrets” icon (it looks like a key) and create a new secret named HF_TOKEN. Paste the generated API token into the “Value” field, and you are all set.

    To start, install the necessary libraries:

    !pip install textstat langchain_huggingface langchain_community

     

    The following code is Google Colab-specific; adjust it accordingly if you are working in a different environment. It retrieves the stored API token:

    from google.colab import userdata
    
    # Obtain Hugging Face API token saved in your Colab session's Secrets
    HF_TOKEN = userdata.get('HF_TOKEN')
    
    # Verify token recovery
    if not HF_TOKEN:
        print("WARNING: The token 'HF_TOKEN' wasn't found. This may cause errors.")
    else:
        print("Hugging Face Token loaded successfully.")
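Outside Colab, a common equivalent is to read the token from an environment variable. The helper below is a hypothetical sketch of that approach (`get_hf_token` is not part of any library):

```python
import os

def get_hf_token(env_var: str = "HF_TOKEN"):
    # Hypothetical helper: read the token from an environment variable
    # instead of Colab's userdata store. Returns None if it is not set.
    token = os.environ.get(env_var)
    if not token:
        print(f"WARNING: environment variable '{env_var}' is not set.")
    return token

HF_TOKEN = get_hf_token()
```

You can set the variable beforehand with `export HF_TOKEN=...` in your shell, keeping the token out of your source code.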

     

    The next block of code does two things. First, it sets up components for local text generation with a pre-trained Hugging Face model — specifically distilgpt2. Then, the model is wrapped for use in a LangChain pipeline.

    import textstat
    import torch
    from langchain_core.prompts import PromptTemplate
    # Importing necessary classes for local Hugging Face pipelines
    from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline
    from langchain_huggingface import HuggingFacePipeline
    
    # Initializing a free-tier, local-friendly LLM for text generation
    model_id = "distilgpt2"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)
    
    # Creating a text-generation pipeline
    pipe = pipeline(
        "text-generation",
        model=model,
        tokenizer=tokenizer,
        max_new_tokens=100,
        device=0 if torch.cuda.is_available() else -1  # GPU if available, else CPU
    )
    
    # Wrapping the pipeline in HuggingFacePipeline
    llm = HuggingFacePipeline(pipeline=pipe)

     

    Our core mechanism for measuring and managing verbosity comes next. The following function summarizes the text passed to it (assumed to be an LLM’s response) and tries to ensure the summary does not exceed a complexity threshold. Note that, with an appropriate prompt template, generation models like distilgpt2 can produce summaries, although their quality may not match that of heavier, summarization-focused models. We chose this model because it runs reliably in a constrained local environment.

    def safe_summarize(text_input, complexity_budget=10.0):
        print("\n--- Starting Summary Process ---")
        print(f"Input text length: {len(text_input)} characters")
        print(f"Target complexity budget (ARI score): {complexity_budget}")
    
        # Step 1: Initial Summary Generation
        print("Generating initial comprehensive summary...")
        base_prompt = PromptTemplate.from_template(
            "Provide a comprehensive summary of the following: {text}"
        )
        chain = base_prompt | llm
        summary = chain.invoke({"text": text_input})
        print("Initial Summary generated:")
        print("-------------------------")
        print(summary)
        print("-------------------------")
    
        # Step 2: Measure Readability
        ari_score = textstat.automated_readability_index(summary)
        print(f"Initial ARI Score: {ari_score:.2f}")
    
        # Step 3: Enforce Complexity Budget
        if ari_score > complexity_budget:
            print("Budget exceeded! Initial summary is too complex.")
            print("Triggering simplification guardrail...")
            simplification_prompt = PromptTemplate.from_template(
                "The following text is too verbose. Rewrite it concisely "
                "using simple vocabulary, stripping away flowery language:\n\n{text}"
            )
            simplify_chain = simplification_prompt | llm
            simplified_summary = simplify_chain.invoke({"text": summary})
    
            new_ari = textstat.automated_readability_index(simplified_summary)
            print("Simplified Summary generated:")
            print("-------------------------")
            print(simplified_summary)
            print("-------------------------")
            print(f"Revised ARI Score: {new_ari:.2f}")
            summary = simplified_summary
        else:
            print("Initial summary is within complexity budget. No simplification needed.")
    
        print("--- Summary Process Finished ---")
        return summary

     

    Notice also in the code above that the ARI score is recomputed after simplification, so you can verify whether the guardrail actually reduced the text’s complexity.

    The final part of the code example tests the function defined previously, passing sample text and a complexity budget of 10.0, and printing the final results.

    # 1. Providing some highly verbose, complex sample text
    sample_text = """
    The inextricably intertwined permutations of cognitive computational arrays within the 
    realm of Large Language Models often precipitate a cascade of unnecessarily labyrinthine 
    lexical structures. This propensity for circumlocution, whilst seemingly indicative of 
    profound erudition, frequently obfuscates the foundational semantic payload, thereby 
    rendering the generated discourse significantly less accessible to the quintessential layperson.
    """
    
    # 2. Calling the function
    print("Running summarizer pipeline...\n")
    final_output = safe_summarize(sample_text, complexity_budget=10.0)
    
    # 3. Printing the final result
    print("\n--- Final Guardrailed Summary ---")
    print(final_output)

     

    The resulting printed messages may be quite lengthy, but you will see a modest decrease in the ARI score after calling the pre-trained model for simplification. Do not expect miraculous results, though: the chosen model, while lightweight, is not great at summarizing text, so the reduction is small. You can try other models such as google/flan-t5-small to see how they perform; be aware that it is a sequence-to-sequence model, so you would use the "text2text-generation" pipeline task instead of "text-generation".
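The guardrail above retries only once. A natural generalization is to re-prompt up to a fixed number of times until the budget is met. The sketch below is hypothetical: `generate` stands in for any LLM call and `score` for any readability metric (such as Textstat's ARI), so it can be wired into the LangChain chains shown earlier:

```python
def enforce_budget(generate, score, text, budget=10.0, max_retries=3):
    # generate(prompt) -> str stands in for any LLM call;
    # score(text) -> float stands in for any readability metric.
    response = generate(f"Provide a comprehensive summary of the following: {text}")
    for _ in range(max_retries):
        if score(response) <= budget:
            break  # within budget: stop re-prompting
        response = generate(
            "The following text is too verbose. Rewrite it concisely "
            f"using simple vocabulary:\n\n{response}"
        )
    return response

# Toy demo with stubs: each re-prompt yields a progressively simpler draft.
responses = iter(["very complex draft", "simpler draft", "plain draft"])
scores = {"very complex draft": 18.0, "simpler draft": 12.0, "plain draft": 8.0}
result = enforce_budget(lambda prompt: next(responses), scores.get, "some input text")
print(result)  # plain draft
```

Capping the retries matters: a weak model may never reach the budget, and an unbounded loop would burn tokens indefinitely.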

     

    # Wrapping Up

     
    This article showed how to measure and control overly verbose LLM responses by calling an auxiliary model to summarize them and checking their complexity before approval. In many scenarios, hallucinations are a byproduct of high verbosity. While the implementation here focuses on assessing verbosity, dedicated checks also exist for measuring hallucinations — such as semantic consistency checks, natural language inference (NLI) cross-encoders, and LLM-as-a-judge approaches.
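To give a flavor of what a semantic consistency check might look like, here is a deliberately crude sketch: sample several answers to the same question and flag low mutual word overlap as a hallucination warning sign. Production systems would use NLI models or embeddings rather than raw word overlap; everything below is an illustrative assumption, not an established API:

```python
def jaccard(a: str, b: str) -> float:
    # Word-level Jaccard similarity between two strings.
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb) if wa | wb else 1.0

def consistency_score(samples):
    # Average pairwise overlap across sampled answers; a low value suggests
    # the model gives different answers each time, i.e. it may be fabricating.
    pairs = [(i, j) for i in range(len(samples)) for j in range(i + 1, len(samples))]
    return sum(jaccard(samples[i], samples[j]) for i, j in pairs) / len(pairs)

consistent = ["paris is the capital of france"] * 3
inconsistent = [
    "paris is the capital",
    "the moon is made of cheese",
    "blue whales fly south",
]
print(consistency_score(consistent))    # 1.0
print(consistency_score(inconsistent))  # much lower
```

The same budget-and-retry pattern used for verbosity applies here: if the consistency score falls below a threshold, re-prompt or refuse rather than return the answer.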
     
     

    Iván Palomares Carrascosa is a leader, writer, speaker, and adviser in AI, machine learning, deep learning & LLMs. He trains and guides others in harnessing AI in the real world.
