In the rapidly evolving landscape of AI, agents endowed with the ability to use tools represent a monumental leap in capability. They can send emails, query databases, and interact with the digital world in unprecedented ways. However, with great power comes great responsibility—and significant security risks. How do we prevent a well-intentioned agent from accidentally leaking sensitive data? How do we stop malicious actors from exploiting tools through clever prompting?
This document explores the powerful security mechanism of Tool Guardrails, using the `examples-basic-tool-guardrails.py` script as our guide. We'll dissect how this elegant solution provides robust, fine-grained control over tool execution, ensuring that AI agents operate safely and predictably.
Our example script defines a simple yet powerful agent, the "Secure Assistant." This agent is equipped with three tools:
- `send_email()`: Sends an email.
- `get_user_data()`: Retrieves user data, which may contain highly sensitive information (like a Social Security Number).
- `get_contact_info()`: Fetches contact details, which might include restricted information like a phone number.

While useful, these tools present clear security challenges. We need a way to enforce policies on how they are used. (A sketch of plausible tool definitions follows below.)
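The tool bodies themselves aren't reproduced in this walkthrough. Here is a minimal sketch of how they might look, assuming the SDK's `function_tool` decorator; the signatures and return values are illustrative, though the SSN and phone number match the values the guardrails check for later:

```python
from agents import function_tool


@function_tool
def send_email(to: str, subject: str, body: str) -> str:
    """Send an email (simulated)."""
    return f"Email sent to {to} with subject '{subject}'"


@function_tool
def get_user_data(user_id: str) -> dict:
    """Return a user record, including a highly sensitive SSN field."""
    return {"user_id": user_id, "name": "John Doe", "ssn": "123-45-6789"}


@function_tool
def get_contact_info(user_id: str) -> dict:
    """Return contact details, including a restricted phone number."""
    return {"user_id": user_id, "email": "john@example.com", "phone": "555-1234"}
```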
Tool Guardrails are functions that act as security checkpoints, intercepting the lifecycle of a tool call. They allow us to inspect and validate data at two critical stages:
- **Input Guardrails** (`@tool_input_guardrail`): A proactive defense mechanism. These guardrails inspect the arguments before a tool is executed. This is our first line of defense, preventing dangerous operations from even starting.
- **Output Guardrails** (`@tool_output_guardrail`): A reactive defense mechanism. These guardrails inspect the data after a tool has executed but before the result is passed back to the agent. This is our last line of defense, preventing the leakage of sensitive information.

Let's see how they are implemented.
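Before diving into the script's concrete guardrails, here is the bare shape of the two hooks. This is a sketch (the function names are illustrative), but the decorator and type names are the ones the script itself uses:

```python
from agents import (
    ToolGuardrailFunctionOutput,
    ToolInputGuardrailData,
    ToolOutputGuardrailData,
    tool_input_guardrail,
    tool_output_guardrail,
)


@tool_input_guardrail
def check_arguments(data: ToolInputGuardrailData) -> ToolGuardrailFunctionOutput:
    # Runs BEFORE the tool executes; data.context.tool_arguments holds the
    # raw JSON string of arguments the model supplied.
    return ToolGuardrailFunctionOutput(output_info="Input validated")  # allow


@tool_output_guardrail
def check_result(data: ToolOutputGuardrailData) -> ToolGuardrailFunctionOutput:
    # Runs AFTER the tool executes but before the result reaches the agent;
    # data.output holds the tool's return value.
    return ToolGuardrailFunctionOutput(output_info="Output validated")  # allow
```

Every guardrail returns a `ToolGuardrailFunctionOutput`. Constructing it directly lets the call proceed, while the `reject_content()` and `raise_exception()` class methods block it with increasing severity, as we'll see below.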
The script implements an input guardrail to prevent the `send_email` tool from being used for malicious or unauthorized purposes.
**reject_sensitive_words**

```python
import json  # needed for parsing the tool's JSON arguments


@tool_input_guardrail
def reject_sensitive_words(data: ToolInputGuardrailData) -> ToolGuardrailFunctionOutput:
    """Reject tool calls that contain sensitive words in arguments."""
    try:
        args = json.loads(data.context.tool_arguments) if data.context.tool_arguments else {}
    except json.JSONDecodeError:
        return ToolGuardrailFunctionOutput(output_info="Invalid JSON arguments")

    # Check for suspicious content
    sensitive_words = [
        "password", "hack", "exploit", "malware", "ACME",
    ]

    for key, value in args.items():
        value_str = str(value).lower()
        for word in sensitive_words:
            if word.lower() in value_str:
                # Reject tool call and inform the model
                return ToolGuardrailFunctionOutput.reject_content(
                    message=f"🚨 Tool call blocked: contains '{word}'",
                    output_info={"blocked_word": word, "argument": key},
                )

    return ToolGuardrailFunctionOutput(output_info="Input validated")
```

This guardrail is attached to our `send_email` tool. When the agent attempts to call it, this function first scans the arguments for blacklisted words (like "ACME"). If a forbidden word is found, it returns `ToolGuardrailFunctionOutput.reject_content()`. This immediately stops the `send_email` function from running and informs the agent that its request was denied.
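The wiring itself isn't shown in the excerpt. One plausible attachment, assuming `function_tool` accepts a `tool_input_guardrails` list (the parameter name is an assumption, not confirmed by the excerpt):

```python
# Assumed parameter name: tool_input_guardrails (hypothetical wiring).
@function_tool(tool_input_guardrails=[reject_sensitive_words])
def send_email(to: str, subject: str, body: str) -> str:
    """Send an email (simulated)."""
    return f"Email sent to {to} with subject '{subject}'"
```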
The agent is prompted: "Send an email to john@example.com introducing the company ACME corp."
The output shows the guardrail working perfectly:
```
2. Attempting to send email with suspicious content:
❌ Guardrail rejected function tool call: I'm unable to send an email introducing ACME Corp., as my system restricts messages containing "ACME." ...
```

The tool was never called. The agent, aware of the rejection, formulated a helpful response explaining the restriction.
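For completeness, here is a sketch of how such a run might be driven, assuming the agent is assembled from the tools above (the instructions string is illustrative):

```python
import asyncio

from agents import Agent, Runner

agent = Agent(
    name="Secure Assistant",
    instructions="You are a helpful assistant.",  # illustrative
    tools=[send_email, get_user_data, get_contact_info],
)


async def main() -> None:
    result = await Runner.run(
        agent, "Send an email to john@example.com introducing the company ACME corp."
    )
    # The input guardrail rejects the tool call, so the email is never sent
    # and the model explains the restriction in its final answer.
    print(result.final_output)


asyncio.run(main())
```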
Output guardrails are crucial for preventing data leakage. Our script demonstrates two different strategies for handling sensitive output.
When a tool returns critically sensitive data, we may need to stop everything.
**block_sensitive_output**

```python
@tool_output_guardrail
def block_sensitive_output(data: ToolOutputGuardrailData) -> ToolGuardrailFunctionOutput:
    """Block tool outputs that contain sensitive data."""
    output_str = str(data.output).lower()

    if "ssn" in output_str or "123-45-6789" in output_str:
        # Use raise_exception to halt execution completely
        return ToolGuardrailFunctionOutput.raise_exception(
            output_info={"blocked_pattern": "SSN", "tool": data.context.tool_name},
        )

    ...
```

This guardrail is attached to `get_user_data`, which simulates returning a record containing an SSN. The guardrail inspects the output, and upon finding "ssn", it takes the most drastic action available: `ToolGuardrailFunctionOutput.raise_exception()`. This raises a `ToolOutputGuardrailTripwireTriggered` exception, immediately halting the entire process.
The agent is prompted: "Get the data for user ID user123"
The output shows a complete shutdown, as designed:
```
3. Attempting to get user data (contains SSN). Execution blocked:
🚨 Output guardrail triggered: Execution halted for sensitive data
   Details: {'blocked_pattern': 'SSN', 'tool': 'get_user_data'}
```

This is the ultimate safety net for preventing catastrophic data leaks.
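Because `raise_exception()` escalates to an exception rather than a message, the calling code has to catch it. A sketch, reusing the agent from the earlier snippet and assuming the exception type is exported from the top-level `agents` package:

```python
from agents import Runner, ToolOutputGuardrailTripwireTriggered


async def fetch_user_data() -> None:
    try:
        result = await Runner.run(agent, "Get the data for user ID user123")
        print(result.final_output)
    except ToolOutputGuardrailTripwireTriggered as exc:
        # The run halts before the SSN-bearing output ever reaches the model;
        # this is where the demo above logs the guardrail's details.
        print(f"🚨 Output guardrail triggered: {exc}")
```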
Sometimes, we just want to block a specific piece of information without crashing the whole system.
**reject_phone_numbers**

```python
@tool_output_guardrail
def reject_phone_numbers(data: ToolOutputGuardrailData) -> ToolGuardrailFunctionOutput:
    """Reject function output containing phone numbers."""
    output_str = str(data.output)

    if "555-1234" in output_str:
        return ToolGuardrailFunctionOutput.reject_content(
            message="User data not retrieved as it contains a phone number which is restricted.",
            output_info={"redacted": "phone_number"},
        )

    ...
```

This guardrail is attached to `get_contact_info`. It checks the output for a phone number. If one is found, it uses `ToolGuardrailFunctionOutput.reject_content()`, the same method used by our input guardrail. This blocks the sensitive output from reaching the agent but allows the agent to continue running.
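As with the input guardrail, the attachment isn't shown in the excerpt. A sketch, again assuming a `tool_output_guardrails` parameter on `function_tool` (an assumed name):

```python
# Assumed parameter name: tool_output_guardrails (hypothetical wiring).
@function_tool(tool_output_guardrails=[block_sensitive_output])
def get_user_data(user_id: str) -> dict:
    """Return a user record containing an SSN (simulated)."""
    return {"user_id": user_id, "name": "John Doe", "ssn": "123-45-6789"}


@function_tool(tool_output_guardrails=[reject_phone_numbers])
def get_contact_info(user_id: str) -> dict:
    """Return contact details containing a phone number (simulated)."""
    return {"user_id": user_id, "email": "john@example.com", "phone": "555-1234"}
```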
The agent is prompted: "Get contact info for user456"
The output demonstrates this graceful handling:
```
4. Rejecting function tool output containing phone numbers:
❌ Guardrail rejected function tool output: I'm unable to provide the contact information for user456 due to restrictions on sharing phone numbers. ...
```

The sensitive data was successfully blocked, and the agent was able to provide a coherent, policy-compliant response to the user.
The Tool Guardrails system is an indispensable feature for building secure and reliable AI agents. By providing distinct hooks into both the input and output of tool calls, and by offering flexible responses, from gentle rejections to critical, execution-halting exceptions, it gives developers the power to enforce complex security policies with remarkable ease. This allows us to unlock the full potential of tool-using agents while maintaining a strong security posture.