Guardrails in Globant Enterprise AI are designed to ensure system security, ethics, and reliability. These mechanisms are fundamental to:
- Prevent inappropriate responses and biases.
- Avoid the disclosure of confidential information.
- Mitigate undesired behavior by AI models.
- Ensure compliance with legal regulations and ethical standards.
These controls protect both consumers and the organization's reputation.
In Globant Enterprise AI, you configure Guardrails for RAG Assistants from the Console, and for Agents from The Lab.
You can enable one or more Guardrails for your Assistant or Agent. Each option adds a validation layer around the LLM call.
- Prompt Injection
- Input Moderation
- Assistant Output - Agent/LLM Output
Prompt Injection and Input Moderation run in parallel to the LLM call, while Assistant Output acts as a final security layer that validates the generated response before it is delivered to the consumer.
If a Guardrail is triggered while the consumer is interacting in the Frontend, the system displays a message indicating that the request cannot be processed.
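The flow described above can be sketched as follows. This is a hypothetical illustration, not Globant Enterprise AI's actual implementation: the functions `check_prompt_injection`, `check_input_moderation`, `call_llm`, and `check_output` are stand-ins for the platform's internal checks, and the blocked-request message is assumed.

```python
# Hypothetical sketch of the Guardrail flow: input checks run in
# parallel with the LLM call, and an output check runs afterward.
from concurrent.futures import ThreadPoolExecutor

def check_prompt_injection(text: str) -> bool:
    # Placeholder: True when the input looks safe.
    return "ignore previous instructions" not in text.lower()

def check_input_moderation(text: str) -> bool:
    # Placeholder: True when the input passes moderation.
    return "offensive" not in text.lower()

def call_llm(text: str) -> str:
    # Placeholder for the Assistant's LLM call.
    return f"Answer to: {text}"

def check_output(text: str) -> bool:
    # Placeholder: final validation of the generated response.
    return "malicious" not in text.lower()

BLOCKED_MESSAGE = "Your request cannot be processed."

def answer(user_input: str) -> str:
    # Prompt Injection and Input Moderation run concurrently with the
    # LLM call, so they add no extra waiting time; the LLM result is
    # discarded if either check fails.
    with ThreadPoolExecutor() as pool:
        injection_ok = pool.submit(check_prompt_injection, user_input)
        moderation_ok = pool.submit(check_input_moderation, user_input)
        response = pool.submit(call_llm, user_input)
        if not (injection_ok.result() and moderation_ok.result()):
            return BLOCKED_MESSAGE
        result = response.result()
    # The Output Guardrail validates the response before delivery.
    return result if check_output(result) else BLOCKED_MESSAGE
```

The key design point the sketch illustrates is that the input-side checks do not sit in front of the LLM call; both start at the same time, and the gate is applied when the results come back.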
When this option is enabled, your Assistants/Agents are protected against potential threats contained in consumer or system inputs. The check runs in parallel with the Assistant's LLM call, so it adds no extra latency. This Guardrail ensures that:
- Malicious commands that manipulate the Assistant's behavior cannot be injected.
- The operational context of the model remains intact and is not altered.
- Risks such as prompt manipulation or attempts to exploit vulnerabilities are reduced.
The Prompt Injection Guardrail evaluates the following categories, assigning a confidence score to each:
- Prompt Injection
- Self-Disclosure Attempts
- Instruction Overrides
- Code Execution Requests
- Privacy Violations
- Disallowed Content
- Politeness Violations
- Consistency Violations
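A minimal sketch of how per-category confidence scores might be turned into a trigger decision. The category names mirror the list above; the threshold value and the `is_flagged` helper are assumptions for illustration, not the platform's actual logic.

```python
# Hypothetical: flag an input when any category's confidence score
# meets or exceeds an assumed threshold.
PROMPT_INJECTION_CATEGORIES = [
    "Prompt Injection",
    "Self-Disclosure Attempts",
    "Instruction Overrides",
    "Code Execution Requests",
    "Privacy Violations",
    "Disallowed Content",
    "Politeness Violations",
    "Consistency Violations",
]

def is_flagged(scores: dict, threshold: float = 0.8) -> bool:
    """Trigger the Guardrail when any category's score reaches the threshold."""
    return any(scores.get(c, 0.0) >= threshold for c in PROMPT_INJECTION_CATEGORIES)
```

For example, `is_flagged({"Instruction Overrides": 0.95})` would trigger the Guardrail, while low scores across all categories would let the request through.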
Like Prompt Injection, this Guardrail runs in parallel with the Assistant's LLM call.
This configuration analyzes user inputs in real time. Enabling this Guardrail makes it possible to:
- Detect and block offensive or inappropriate language.
- Identify content that breaches internal policies or ethical standards.
- Ensure respectful interactions that comply with regulations, protecting both consumers and the Assistant's reputation.
For more details about the categories analyzed by the Input Moderation Guardrail, see Moderation.
Unlike the previous ones, this Guardrail analyzes the response after the Assistant's LLM has generated it.
Selecting this option ensures that the answers generated by the Assistant are safe and appropriate. This Guardrail allows you to:
- Avoid inappropriate content in model outputs.
- Ensure that responses meet legal and quality standards.
- Monitor and validate the interactions generated by the Assistant, ensuring a reliable experience for consumers.
The Assistant Output Guardrail evaluates the following categories, assigning a confidence score to each:
- Malicious URLs
- Malicious Code
- Prohibited Content
- Language and Tone Issues
- Instruction Noncompliance
- Formatting Issues
This Guardrail analyzes the output in real time as it is generated. When the stream property is enabled, the response is returned only after the validation process completes. If an issue is detected, the entire response is withheld and only the relevant error message is shown.
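The streaming behavior can be sketched as below. This is a conceptual illustration under stated assumptions: `validate` stands in for the Assistant Output Guardrail, and the error message text is hypothetical. The point is that chunks are buffered during generation and the consumer receives either the full validated response or only an error message.

```python
# Sketch of the stream-mode behavior: buffer all chunks, validate the
# assembled response, and release it only if validation passes.
from typing import Iterable, Iterator

ERROR_MESSAGE = "The response was withheld by the Output Guardrail."

def validate(text: str) -> bool:
    # Placeholder for the Assistant Output Guardrail check.
    return "malicious" not in text.lower()

def guarded_stream(chunks: Iterable[str]) -> Iterator[str]:
    buffered = []
    for chunk in chunks:
        buffered.append(chunk)   # withhold chunks while generating
    full = "".join(buffered)
    if validate(full):
        yield full               # deliver only after validation succeeds
    else:
        yield ERROR_MESSAGE      # withhold the entire response
```

This trades streaming latency for safety: the consumer waits for the whole response, but a partially delivered unsafe answer can never reach them.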