Mastering Human in the Loop: Building Powerful, Reasoning AI Agents

Prince Pal
12 min read

Learn how to implement Human in the Loop (HiL) patterns using LangGraph to build robust AI agents that pause for human input, ensuring accuracy and trustworthiness in critical workflows.


Building sophisticated AI agents is the cutting edge of modern engineering. But even the smartest Large Language Models (LLMs) can occasionally stumble—generating inaccuracies, attempting critical actions without oversight, or missing vital information needed to complete a task.

This is where Human in the Loop (HiL) becomes absolutely indispensable.

HiL is a powerful design pattern that allows your AI agents to pause indefinitely, wait for human feedback or clarification, and then seamlessly resume their task. By integrating human input at key stages, you enable validation, corrections, and informed decisions—ultimately building more powerful, trustworthy reasoning agents.

The Core Problem: When Agents Go Rogue

Without HiL, agents can act too quickly, leading to costly mistakes that could have been easily prevented with a simple human checkpoint.

Consider this scenario: You build a refund agent designed to automatically process student refund requests from emails. You ask the agent, "Do we have any refund requests? I want to process them today."

Without HiL, the agent might immediately process the request because it has access to the necessary tools. But refund processes are critical financial actions—you cannot rely 100% on AI, as models can hallucinate or misinterpret context. You need a human to review if the request is legitimate before initiating the refund.

By implementing HiL, the agent will interrupt the process upon finding a request and ask for human approval: "Type 1 to approve, 2 to reject." This pattern ensures the agent only performs critical actions once approval is explicitly given.

Why Your AI Needs a Human Touch

Human in the Loop transforms simple automatons into robust, trustworthy systems. Here's why it's essential:

  • Accuracy: Validate AI decisions before executing critical actions
  • Safety: Prevent expensive mistakes or unauthorized operations
  • Flexibility: Handle edge cases that require human judgment
  • Trust: Build confidence in your AI systems by maintaining oversight
  • Learning: Gather feedback to improve agent performance over time

The key insight is that AI agents don't need to be perfect—they just need to know when to ask for help.

The HiL Workflow: Ask, Pause, Resume

One of the most common HiL applications is gathering clarifying information from users. This is crucial when the agent realizes it lacks data needed to achieve its goal.

Let's walk through a weather lookup example:

Step-by-Step Flow

1. The Agent Reasons

The agent receives a query: "What's the weather like?" It immediately recognizes that it's missing the user's location—a critical piece of information.

2. Calling the Ask Human Tool

The agent calls a special function called the ask human tool. This tool is provided to the LLM so it can invoke it whenever human input is needed.

3. Interruption

The ask human tool immediately pauses the entire workflow and passes the clarifying question to the human: "Where are you currently located?"

4. Human Input

The workflow waits indefinitely. The human provides the required answer: "Delhi, India" or "San Francisco, California."

5. Resumption

The human response is propagated back to the agent. The agent now has the necessary information and resumes the workflow from the exact point of interruption.

6. Tool Execution

The agent uses a web search tool to look up current weather conditions at the specified location.

7. Final Response

The result is returned to the agent, which provides the final answer: "It's currently 28°C and sunny in Delhi, India."

# Simplified conceptual example (ask_human, search_weather, and the
# location helpers are placeholders)
async def weather_agent(query):
    # Agent realizes it needs the user's location
    if location_missing(query):
        # Interrupt and ask the human for it
        location = await ask_human("Where are you located?")
    else:
        location = extract_location(query)

    # Resume with the provided location
    weather = await search_weather(location)
    return f"Current weather in {location}: {weather}"

Figure 1: HiL workflow showing interruption and resumption points

How LangGraph Makes HiL Possible

Implementing HiL requires a robust way to manage state—the process must halt mid-execution and later restart exactly where it left off. This is where LangGraph shines.

The Interrupt Function

The core mechanism is LangGraph's specialized interrupt function. This function is designed to stop your workflow execution mid-stream to collect user inputs. It pauses the workflow "gracefully," saves the program state, and allows execution to continue later.

LangGraph offers two ways to use interrupts:

1. Configuration-Based Interrupts

Configure the agent to interrupt on a specific tool or node during the compile step:

from langgraph.graph import StateGraph
from langgraph.checkpoint.memory import MemorySaver

# Create graph with checkpointer
# (AgentState, agent_node, and tool_node are defined elsewhere)
memory = MemorySaver()
graph = StateGraph(AgentState)

# Add nodes and edges
graph.add_node("agent", agent_node)
graph.add_node("tools", tool_node)
graph.set_entry_point("agent")
graph.add_edge("agent", "tools")

# Compile with interrupt configuration
app = graph.compile(
    checkpointer=memory,
    interrupt_before=["tools"]  # Pause before tool execution
)

2. Explicit Interrupt Calls

Use the interrupt function directly within a node:

from langgraph.types import interrupt

def approval_node(state):
    # Pause for human approval
    approval = interrupt("Please approve this action (yes/no):")

    if approval.lower() == "yes":
        return {"approved": True}
    else:
        return {"approved": False}

Persistence Layer and Checkpoints

The interrupt function relies on LangGraph's persistence layer, which saves the entire graph state, allowing execution to be paused indefinitely.

Think of this like checkpoints or autosave in a video game. LangGraph automatically checkpoints the graph state after each step.

How it works:

  • When the workflow is interrupted, the checkpoint saves the state at that exact point
  • The system uses threads and checkpoints to manage state
  • A unique thread ID must be passed via configuration when invoking the agent
  • Multiple concurrent conversations or sessions can be managed simultaneously

from langgraph.types import Command

# Invoke with thread ID
config = {"configurable": {"thread_id": "user-123"}}
result = await app.ainvoke({"messages": [user_message]}, config)

# Later, resume from the same checkpoint
config = {"configurable": {"thread_id": "user-123"}}
resume_result = await app.ainvoke(
    Command(resume="Delhi, India"),
    config
)

Why Not Use Simple Input Methods?

You might wonder: why not just use Python's input() function? Here's why LangGraph's approach is superior:

  • Web and API Compatible: Works in web applications and APIs, not just CLI (see the sketch after this list)
  • Multi-User Support: Handles multiple users and sessions concurrently
  • Crash Recovery: Survives program crashes and restarts
  • Asynchronous: Non-blocking, allowing for better performance
  • Scalable: Works in distributed systems and microservices
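
To make the contrast with input() concrete, here is a rough sketch of how interrupt and resume map onto two HTTP endpoints. It assumes FastAPI and the compiled graph app from the examples above; the endpoint names and payloads are illustrative, not a standard API:

from fastapi import FastAPI
from pydantic import BaseModel
from langgraph.types import Command

api = FastAPI()

class StartRequest(BaseModel):
    thread_id: str
    message: str

class ResumeRequest(BaseModel):
    thread_id: str
    answer: str

@api.post("/start")
async def start(req: StartRequest):
    # Kick off the graph; if it hits interrupt(), this call returns with the
    # run paused at a checkpoint keyed by the thread ID.
    config = {"configurable": {"thread_id": req.thread_id}}
    state = await app.ainvoke({"messages": [req.message]}, config)
    # "__interrupt__" is how recent LangGraph versions flag a paused run
    return {"thread_id": req.thread_id, "paused": "__interrupt__" in state}

@api.post("/resume")
async def resume(req: ResumeRequest):
    # A later request (even on another worker sharing the checkpointer)
    # resumes the paused thread with the human's answer.
    config = {"configurable": {"thread_id": req.thread_id}}
    state = await app.ainvoke(Command(resume=req.answer), config)
    return {"thread_id": req.thread_id, "done": True}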

Beyond Clarification: Key HiL Design Patterns

While asking clarifying questions is powerful, HiL supports several sophisticated patterns for integrating human oversight:

1. Approve or Reject Pattern

Pause the graph before a critical step to allow human review and approval. If rejected, the graph can take an alternative path.

Use Case: Preventing accidental API calls or unauthorized financial transactions.

def refund_workflow(state):
    # Agent identifies refund request
    request = analyze_email(state["email"])

    # Interrupt for approval
    decision = interrupt(
        f"Refund request for ${request.amount}. Approve? (1=yes, 2=no)"
    )

    if decision == "1":
        process_refund(request)
        return {"status": "refunded", "amount": request.amount}
    else:
        return {"status": "rejected", "reason": "human_declined"}

2. Review and Edit State Pattern

Allow humans to review and edit the graph state, useful for correcting mistakes or refining LLM output.

Use Case: A human reviews a generated LinkedIn post draft, provides feedback ("make this shorter"), and the agent iterates until approval.

def content_generation_workflow(state):
    # Generate initial draft
    draft = generate_linkedin_post(state["topic"])

    # Show to human for review
    feedback = interrupt(f"Draft:\n{draft}\n\nFeedback (or 'approve'):")

    if feedback.lower() != "approve":
        # Incorporate the feedback and route back for another review pass
        revised = revise_post(draft, feedback)
        return {"draft": revised, "approved": False}

    return {"final_post": draft, "approved": True}

3. Provide Additional Context Pattern

Explicitly require human input for clarification or additional details to complete a complex task.

Use Case: Supporting complex multi-turn conversations where the agent needs domain-specific information.
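
This pattern also reduces to a single interrupt call. A minimal sketch mirroring the weather walkthrough (the node and state field names are illustrative):

def clarify_location_node(state):
    # Only pause when the needed detail is actually missing
    if not state.get("location"):
        # The question is surfaced to the human; the graph resumes with
        # their answer via Command(resume="...")
        location = interrupt("Where are you currently located?")
        return {"location": location}
    return {}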

Design Pattern Summary Table

| HiL Design Pattern | Description | Example Use Case |
| :--- | :--- | :--- |
| Approve or Reject | Pause before critical steps for human approval | Preventing accidental API calls, authorizing refunds |
| Review and Edit State | Allow humans to review and modify agent output | Refining generated content, correcting mistakes |
| Provide Additional Context | Request clarification or missing information | Location for weather, preferences for recommendations |

Best Practices for Implementing HiL

1. Be Strategic About Interruptions

Don't interrupt for every minor decision—only for:

  • Critical or expensive operations
  • Actions that can't be easily undone
  • Situations where the agent lacks confidence
  • Requests for sensitive information

2. Provide Clear Context

When interrupting, give the human enough context to make an informed decision:

# ❌ Poor context
approval = interrupt("Approve?")

# ✅ Clear context
approval = interrupt(
    f"About to send email to {recipient}\n"
    f"Subject: {subject}\n"
    f"Preview: {body[:100]}...\n"
    f"Send? (yes/no)"
)

3. Handle Timeouts Gracefully

Set reasonable timeouts and default behaviors:

# Conceptual sketch: interrupt() has no timeout parameter; the graph simply
# stays paused until it is resumed. Timeouts therefore belong to the
# application layer, which resumes the paused thread with a safe default
# once its own deadline expires.
def resume_with_default(app, config, default="reject"):
    # Called by the application once the human-response timer runs out
    return app.invoke(Command(resume=default), config)

4. Log All Human Decisions

Keep an audit trail of human interventions:

from datetime import datetime

def log_human_decision(state, decision, context):
    state["audit_log"].append({
        "timestamp": datetime.now(),
        "decision": decision,
        "context": context,
        "user": state["user_id"]
    })
    return state

Real-World Example: Email Assistant

Let's build a complete example of an email assistant that uses HiL to ensure safe operations:

from langgraph.graph import StateGraph, END
from langgraph.checkpoint.memory import MemorySaver
from langgraph.types import interrupt, Command
from typing import TypedDict

class EmailState(TypedDict):
    inbox: list
    action: str
    approved: bool
    result: str

def check_inbox(state: EmailState):
    # Simulate checking inbox
    emails = fetch_emails()
    urgent = [e for e in emails if e.priority == "high"]
    return {"inbox": urgent, "action": "review"}

def request_approval(state: EmailState):
    if not state["inbox"]:
        return {"approved": True, "action": "none"}

    # Show urgent emails to human
    email_summary = "\n".join([
        f"- From: {e.sender}, Subject: {e.subject}"
        for e in state["inbox"]
    ])

    decision = interrupt(
        f"Found {len(state['inbox'])} urgent emails:\n"
        f"{email_summary}\n\n"
        f"Reply to all? (yes/no)"
    )

    return {"approved": decision.lower() == "yes"}

def process_emails(state: EmailState):
    if not state["approved"]:
        return {"result": "Skipped by user"}

    # Process approved emails
    for email in state["inbox"]:
        send_reply(email)

    return {"result": f"Replied to {len(state['inbox'])} emails"}

# Build the graph
workflow = StateGraph(EmailState)
workflow.add_node("check", check_inbox)
workflow.add_node("approve", request_approval)
workflow.add_node("process", process_emails)

workflow.set_entry_point("check")
workflow.add_edge("check", "approve")
workflow.add_conditional_edges(
    "approve",
    lambda s: "process" if s["approved"] else END
)
workflow.add_edge("process", END)

# Compile with checkpointing
memory = MemorySaver()
app = workflow.compile(checkpointer=memory)

# Use the agent (run inside an async function / event loop)
config = {"configurable": {"thread_id": "email-session-1"}}
result = await app.ainvoke({"inbox": []}, config)

# If the run paused at interrupt(), resume it with the human's decision
result = await app.ainvoke(Command(resume="yes"), config)

Common Pitfalls to Avoid

1. Over-Interrupting

Problem: Asking for approval at every step frustrates users.

Solution: Batch related decisions or only interrupt for truly critical actions.
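
One way to batch is to collect the pending actions and surface them through a single interrupt (a sketch; the state fields are illustrative and the human's reply is assumed to be a string):

def batched_approval_node(state):
    pending = state["pending_actions"]
    summary = "\n".join(f"{i+1}. {a['description']}" for i, a in enumerate(pending))

    # One interruption for the whole batch instead of one per action
    decision = interrupt(
        f"{len(pending)} actions awaiting approval:\n{summary}\n"
        "Reply 'all', 'none', or a comma-separated list of numbers to approve:"
    )

    if decision.strip().lower() == "all":
        approved = pending
    elif decision.strip().lower() == "none":
        approved = []
    else:
        picked = {int(n) for n in decision.split(",") if n.strip().isdigit()}
        approved = [a for i, a in enumerate(pending, start=1) if i in picked]

    return {"approved_actions": approved}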

2. Poor Error Messages

Problem: Vague interrupt messages confuse users.

Solution: Provide clear context, options, and consequences.

3. No Timeout Handling

Problem: Workflows hang indefinitely if the user doesn't respond.

Solution: Implement reasonable timeouts with safe defaults.

4. Losing State

Problem: Not using proper checkpointing causes lost work.

Solution: Always use LangGraph's checkpointer with persistent storage.
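
For durability across restarts, swap MemorySaver for a persistent checkpointer. A minimal sketch using the SQLite saver (assumes the langgraph-checkpoint-sqlite package; the database path is illustrative):

import sqlite3
from langgraph.checkpoint.sqlite import SqliteSaver

# Checkpoints now live on disk, so paused threads survive a process restart
conn = sqlite3.connect("checkpoints.db", check_same_thread=False)
checkpointer = SqliteSaver(conn)
app = workflow.compile(checkpointer=checkpointer)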

Advanced: Conditional HiL

Sometimes you want HiL only in specific circumstances:

def smart_agent_node(state):
    # agent_reasoning and fallback_reasoning are placeholder helpers
    result = agent_reasoning(state)

    # Only interrupt if confidence is low
    if result["confidence"] < 0.7:
        verification = interrupt(
            f"Low confidence ({result['confidence']:.0%}). "
            f"Verify: {result['action']}? (yes/no)"
        )
        if verification.lower() != "yes":
            # Try alternative approach
            result = fallback_reasoning(state)

    return result

Testing HiL Workflows

Test your HiL implementations thoroughly:

import pytest
from langgraph.types import Command

# Assumes `app` is a compiled refund-approval graph whose nodes set a
# "status" field in state at each stage

@pytest.mark.asyncio
async def test_approval_workflow():
    config = {"configurable": {"thread_id": "test-1"}}

    # Start workflow
    result = await app.ainvoke({"action": "refund"}, config)
    assert result["status"] == "awaiting_approval"

    # Simulate human approval
    result = await app.ainvoke(
        Command(resume="approve"),
        config
    )
    assert result["status"] == "completed"

@pytest.mark.asyncio
async def test_rejection_workflow():
    config = {"configurable": {"thread_id": "test-2"}}

    # Start and reject
    await app.ainvoke({"action": "refund"}, config)
    result = await app.ainvoke(
        Command(resume="reject"),
        config
    )
    assert result["status"] == "rejected"

Summary: Building the Future of Intelligent Agents

Human in the Loop is a vital concept for building robust, reliable AI agents. It moves agents beyond simple, linear execution, allowing them to dynamically seek external input when uncertainty, cost, or missing information arises.

Key Takeaways:

  • HiL prevents costly mistakes by adding human checkpoints at critical moments
  • LangGraph's interrupt function provides the foundation for pausable workflows
  • Checkpoints and persistence enable workflows to resume exactly where they left off
  • Three main patterns: Approve/Reject, Review/Edit, and Provide Context
  • Best practices: Be strategic, provide context, handle timeouts, and log decisions

By utilizing frameworks like LangGraph, engineers can harness the power of checkpoints and the interrupt function to seamlessly integrate human feedback—whether asking for a simple location, gathering research insights, or reviewing critical tool execution.

This ability to pause, interact, and resume is the key to creating intelligent systems that operate with transparency, accuracy, and trustworthiness.

Resources and Further Reading