← rhetorictech.ai

AiMygdala Spec Sheet

v3.1.0 — Threat Detection Capabilities & Test Results

Design Principles

Pre-cognitive. Fires before the agent reasons about the input. No LLM inference. Pattern matching and ASCII gate.

Three-state. Deny, Alert, or Allow. Ambiguity resolves to Deny. Alert raises the agent’s vigilance without blocking.

Fast and light. Sub-millisecond per check. No network calls. No API dependencies. Runs in single-digit megabytes.

Immutable. An automatic process that is reflexive and cannot be reasoned past. The gate does not think about attacks — it recognizes them.

Fail-safe. A broken gate is a closed gate, and your next stack layer can take over.

Response Architecture

Response Behavior When
Deny Block. Recoil. Call second layer of security stack. Direct threats: injection, exfiltration, introspection solicitation
Alert Raise awareness and prompt-counter possible conditioning. Conditioning attacks: behavioral framing, vapor patterns, session drift
Allow Clean. Pass through unchanged. No threat detected

Alert mode injects a corrective counter-prompt before the agent processes the input — raising hackles without blocking legitimate conversation. The agent becomes more skeptical without being disabled. The bar for behavior is raised significantly.

What Happens After the Gate

The gate enables an aware security decision. Your agent knows something is wrong and can act on it — spin up LLM Guard, leave the connection, refuse to engage. Without the gate, the agent never knows.

Threat Categories

Prompt Injection
Critical
Direct attempts to override system instructions, assume new roles, or bypass safety constraints.
Social Engineering
Critical
Authority claims, emergency fabrication, emotional manipulation, identity impersonation.
Data Exfiltration
Critical
Attempts to extract system prompts, credentials, environment variables, or cached data.
Introspection Solicitation
Critical
Requests for an agent to inspect, confirm, or disclose its own internals — from any source, with any pretext. No legitimate use case exists for this request class.
Behavioral Conditioning
High
Attempts to reshape agent behavior through conversational framing techniques.
Context Poisoning
High
Injection of false context, fabricated history, or claimed prior agreements.
Drift Injection
High
Incremental behavioral modification across multiple interactions.
Tool Poisoning
Critical
Injected instructions in tool descriptions, schemas, or return values that override agent behavior.
Indirect Injection
Critical
Attacks embedded in data the agent reads — web pages, documents, API responses — that target the agent processing the content.
Credential Exposure
Critical
Detection of actual secrets in transit: API keys, tokens, private keys, connection strings appearing in agent I/O.
Supply Chain Injection
High
Malicious code execution via package installation, piped downloads, script injection, or embedded payloads.
Financial Interception
Critical
Transaction redirection, wallet draining, unauthorized payment authorization, cryptocurrency theft.
Encoded Payloads
High
Base64, hex, Unicode, and multi-layer obfuscation designed to evade pattern matching.

Detection Layers

Multiple independent detection mechanisms evaluate each input. Evasion techniques that defeat one layer are caught by others. The detection architecture is documented here; the specific patterns are encrypted at rest.

Vocabulary-Resistant

Synonym substitution, thesaurus rotation, and register shifting do not evade detection.

ASCII Gate

Non-ASCII input is denied outright. Fullwidth, circled, mathematical, and other Unicode evasion techniques never reach evaluation. If the gate can’t read it, the gate doesn’t open.

Session-Aware

Multi-turn attack sequences are tracked across conversation rounds. Individually benign prompts that form an attack pattern in aggregate are identified and blocked.

Context-Aware

The system operates with awareness of conversational context, interpersonal dynamics, and interaction convention.

Red Team Results

Multi-round adversarial testing against attacks generated by multiple foundation models.

What We TestedResult
Direct kill shots (injection, exfiltration, introspection)100% blocked on contact
Social engineering pretexting95% prevented prior to exfiltration attempt
Multi-round conditioning sequences (end-to-end)100% blocked by escalation
False positive rate< 1%

Not every conditioning opener is caught on round 1 — some are designed to sound innocuous. When those threads escalate, the gate blocks them. Zero successful extractions across all rounds.

Limitations

AiMygdala is an automatic detection and response gate, and therefore has limitations:

Novel attack patterns not yet in the pattern library will not be caught until patterns are updated. This is why pattern updates are included with your subscription. Faithful user updates of novel strategies and attack patterns will be distributed via update and confer pro-social account status discounts on future products.

It is one layer in a defense-in-depth strategy. It is not a replacement for agent safety training, access controls, sandboxing, or security architecture. It is the first gate — the fastest, cheapest check — not the only one.

Technical Specifications

LanguagePython 3.10+
DependenciesNone (stdlib only)
Check latency<1ms per input
Memory footprint~2MB (patterns + lexicon)
Network requiredNo
API keys requiredNo
TelemetryNone
Learning layerOptional SQLite (local only)
External patternsJSON, auto-discovered at import
IntegrationPython import, CLI stdin/stdout, hook-compatible