In the rapidly evolving landscape of AI, agents endowed with the ability to use tools represent a monumental leap in capability. They can send emails, query databases, and interact with the digital world in unprecedented ways. However, with great power comes great responsibility—and significant security risks. How do we prevent a well-intentioned agent from accidentally leaking sensitive data? How do we stop malicious actors from exploiting tools through clever prompting?
This document explores the powerful security mechanism of Tool Guardrails, using the `examples-basic-tool-guardrails.py` script as our guide. We'll dissect how this elegant solution provides robust, fine-grained control over tool execution, ensuring that AI agents operate safely and predictably.
Our example script defines a simple yet powerful agent, the "Secure Assistant." This agent is equipped with three tools:
- `send_email()`: Sends an email.
- `get_user_data()`: Retrieves user data, which may contain highly sensitive information (like a Social Security Number).
- `get_contact_info()`: Fetches contact details, which might include restricted information like a phone number.

While useful, these tools present clear security challenges. We need a way to enforce policies on how they are used. (A sketch of plausible tool definitions follows below.)
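The tool bodies themselves aren't reproduced in this walkthrough. Here is a minimal sketch of how they might look, assuming the SDK's `function_tool` decorator; the signatures and return values are illustrative, though the SSN and phone number match the values the guardrails check for later:

```python
from agents import function_tool


@function_tool
def send_email(to: str, subject: str, body: str) -> str:
    """Send an email (simulated)."""
    return f"Email sent to {to} with subject '{subject}'"


@function_tool
def get_user_data(user_id: str) -> dict:
    """Return a user record, including a highly sensitive SSN field."""
    return {"user_id": user_id, "name": "John Doe", "ssn": "123-45-6789"}


@function_tool
def get_contact_info(user_id: str) -> dict:
    """Return contact details, including a restricted phone number."""
    return {"user_id": user_id, "email": "john@example.com", "phone": "555-1234"}
```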
Tool Guardrails are functions that act as security checkpoints, intercepting the lifecycle of a tool call. They allow us to inspect and validate data at two critical stages:
- **Input Guardrails** (`@tool_input_guardrail`): A proactive defense mechanism. These guardrails inspect the arguments before a tool is executed. This is our first line of defense, preventing dangerous operations from even starting.
- **Output Guardrails** (`@tool_output_guardrail`): A reactive defense mechanism. These guardrails inspect the data after a tool has executed but before the result is passed back to the agent. This is our last line of defense, preventing the leakage of sensitive information.

Let's see how they are implemented.
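Before diving into the script's concrete guardrails, here is the bare shape of the two hooks. This is a sketch (the function names are illustrative), but the decorator and type names are the ones the script itself uses:

```python
from agents import (
    ToolGuardrailFunctionOutput,
    ToolInputGuardrailData,
    ToolOutputGuardrailData,
    tool_input_guardrail,
    tool_output_guardrail,
)


@tool_input_guardrail
def check_arguments(data: ToolInputGuardrailData) -> ToolGuardrailFunctionOutput:
    # Runs BEFORE the tool executes; data.context.tool_arguments holds the
    # raw JSON string of arguments the model supplied.
    return ToolGuardrailFunctionOutput(output_info="Input validated")  # allow


@tool_output_guardrail
def check_result(data: ToolOutputGuardrailData) -> ToolGuardrailFunctionOutput:
    # Runs AFTER the tool executes but before the result reaches the agent;
    # data.output holds the tool's return value.
    return ToolGuardrailFunctionOutput(output_info="Output validated")  # allow
```

Every guardrail returns a `ToolGuardrailFunctionOutput`. Constructing it directly lets the call proceed, while the `reject_content()` and `raise_exception()` class methods block it with increasing severity, as we'll see below.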
The script implements an input guardrail to prevent the `send_email` tool from being used for malicious or unauthorized purposes.
**reject_sensitive_words**

```python
import json  # needed for parsing the tool's JSON arguments


@tool_input_guardrail
def reject_sensitive_words(data: ToolInputGuardrailData) -> ToolGuardrailFunctionOutput:
    """Reject tool calls that contain sensitive words in arguments."""
    try:
        args = json.loads(data.context.tool_arguments) if data.context.tool_arguments else {}
    except json.JSONDecodeError:
        return ToolGuardrailFunctionOutput(output_info="Invalid JSON arguments")

    # Check for suspicious content
    sensitive_words = [
        "password", "hack", "exploit", "malware", "ACME",
    ]

    for key, value in args.items():
        value_str = str(value).lower()
        for word in sensitive_words:
            if word.lower() in value_str:
                # Reject tool call and inform the model
                return ToolGuardrailFunctionOutput.reject_content(
                    message=f"🚨 Tool call blocked: contains '{word}'",
                    output_info={"blocked_word": word, "argument": key},
                )

    return ToolGuardrailFunctionOutput(output_info="Input validated")
```

This guardrail is attached to our `send_email` tool. When the agent attempts to call it, this function first scans the arguments for blacklisted words (like "ACME"). If a forbidden word is found, it returns `ToolGuardrailFunctionOutput.reject_content()`. This immediately stops the `send_email` function from running and informs the agent that its request was denied.
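The wiring itself isn't shown in the excerpt. One plausible attachment, assuming `function_tool` accepts a `tool_input_guardrails` list (the parameter name is an assumption, not confirmed by the excerpt):

```python
# Assumed parameter name: tool_input_guardrails (hypothetical wiring).
@function_tool(tool_input_guardrails=[reject_sensitive_words])
def send_email(to: str, subject: str, body: str) -> str:
    """Send an email (simulated)."""
    return f"Email sent to {to} with subject '{subject}'"
```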
The agent is prompted: "Send an email to john@example.com introducing the company ACME corp."
The output shows the guardrail working perfectly:
```
2. Attempting to send email with suspicious content:
❌ Guardrail rejected function tool call: I'm unable to send an email introducing ACME Corp., as my system restricts messages containing "ACME." ...
```

The tool was never called. The agent, aware of the rejection, formulated a helpful response explaining the restriction.
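For completeness, here is a sketch of how such a run might be driven, assuming the agent is assembled from the tools above (the instructions string is illustrative):

```python
import asyncio

from agents import Agent, Runner

agent = Agent(
    name="Secure Assistant",
    instructions="You are a helpful assistant.",  # illustrative
    tools=[send_email, get_user_data, get_contact_info],
)


async def main() -> None:
    result = await Runner.run(
        agent, "Send an email to john@example.com introducing the company ACME corp."
    )
    # The input guardrail rejects the tool call, so the email is never sent
    # and the model explains the restriction in its final answer.
    print(result.final_output)


asyncio.run(main())
```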
Output guardrails are crucial for preventing data leakage. Our script demonstrates two different strategies for handling sensitive output.
When a tool returns critically sensitive data, we may need to stop everything.
**block_sensitive_output**

```python
@tool_output_guardrail
def block_sensitive_output(data: ToolOutputGuardrailData) -> ToolGuardrailFunctionOutput:
    """Block tool outputs that contain sensitive data."""
    output_str = str(data.output).lower()

    if "ssn" in output_str or "123-45-6789" in output_str:
        # Use raise_exception to halt execution completely
        return ToolGuardrailFunctionOutput.raise_exception(
            output_info={"blocked_pattern": "SSN", "tool": data.context.tool_name},
        )

    ...
```

This guardrail is attached to `get_user_data`, which simulates returning a record containing an SSN. The guardrail inspects the output, and upon finding "ssn", it takes the most drastic action available: `ToolGuardrailFunctionOutput.raise_exception()`. This raises a `ToolOutputGuardrailTripwireTriggered` exception, immediately halting the entire process.
The agent is prompted: "Get the data for user ID user123"
The output shows a complete shutdown, as designed:
```
3. Attempting to get user data (contains SSN). Execution blocked:
🚨 Output guardrail triggered: Execution halted for sensitive data
   Details: {'blocked_pattern': 'SSN', 'tool': 'get_user_data'}
```

This is the ultimate safety net for preventing catastrophic data leaks.
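Because `raise_exception()` escalates to an exception rather than a message, the calling code has to catch it. A sketch, reusing the agent from the earlier snippet and assuming the exception type is exported from the top-level `agents` package:

```python
from agents import Runner, ToolOutputGuardrailTripwireTriggered


async def fetch_user_data() -> None:
    try:
        result = await Runner.run(agent, "Get the data for user ID user123")
        print(result.final_output)
    except ToolOutputGuardrailTripwireTriggered as exc:
        # The run halts before the SSN-bearing output ever reaches the model;
        # this is where the demo above logs the guardrail's details.
        print(f"🚨 Output guardrail triggered: {exc}")
```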
Sometimes, we just want to block a specific piece of information without crashing the whole system.
**reject_phone_numbers**

```python
@tool_output_guardrail
def reject_phone_numbers(data: ToolOutputGuardrailData) -> ToolGuardrailFunctionOutput:
    """Reject function output containing phone numbers."""
    output_str = str(data.output)

    if "555-1234" in output_str:
        return ToolGuardrailFunctionOutput.reject_content(
            message="User data not retrieved as it contains a phone number which is restricted.",
            output_info={"redacted": "phone_number"},
        )

    ...
```

This guardrail is attached to `get_contact_info`. It checks the output for a phone number. If one is found, it uses `ToolGuardrailFunctionOutput.reject_content()`, the same method used by our input guardrail. This blocks the sensitive output from reaching the agent but allows the agent to continue running.
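As with the input guardrail, the attachment isn't shown in the excerpt. A sketch, again assuming a `tool_output_guardrails` parameter on `function_tool` (an assumed name):

```python
# Assumed parameter name: tool_output_guardrails (hypothetical wiring).
@function_tool(tool_output_guardrails=[block_sensitive_output])
def get_user_data(user_id: str) -> dict:
    """Return a user record containing an SSN (simulated)."""
    return {"user_id": user_id, "name": "John Doe", "ssn": "123-45-6789"}


@function_tool(tool_output_guardrails=[reject_phone_numbers])
def get_contact_info(user_id: str) -> dict:
    """Return contact details containing a phone number (simulated)."""
    return {"user_id": user_id, "email": "john@example.com", "phone": "555-1234"}
```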
The agent is prompted: "Get contact info for user456"
The output demonstrates this graceful handling:
```
4. Rejecting function tool output containing phone numbers:
❌ Guardrail rejected function tool output: I'm unable to provide the contact information for user456 due to restrictions on sharing phone numbers. ...
```

The sensitive data was successfully blocked, and the agent was able to provide a coherent, policy-compliant response to the user.
The Tool Guardrails system is an indispensable feature for building secure and reliable AI agents. By providing distinct hooks into both the input and output of tool calls, and by offering flexible responses, from gentle rejections to critical, execution-halting exceptions, it gives developers the power to enforce complex security policies with remarkable ease. This allows us to unlock the full potential of tool-using agents while maintaining a strong security posture.