Pre-cognitive. Fires before the agent reasons about the input. No LLM inference. Pattern matching and ASCII gate.
Three-state. Deny, Alert, or Allow. Ambiguity resolves to Deny. Alert raises the agent’s vigilance without blocking.
Fast and light. Sub-millisecond per check. No network calls. No API dependencies. Runs in single-digit megabytes.
Immutable. A reflexive, automatic process that cannot be reasoned past. The gate does not think about attacks; it recognizes them.
Fail-safe. A broken gate is a closed gate, and your next stack layer can take over.
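The fail-safe property can be sketched as a wrapper that maps any internal failure to a closed gate. Names here are illustrative, not the shipped API:

```python
from enum import Enum

class Response(Enum):
    DENY = "deny"
    ALERT = "alert"
    ALLOW = "allow"

def fail_safe(gate_fn):
    """Wrap a gate check so that a broken gate is a closed gate."""
    def checked(text: str) -> Response:
        try:
            return gate_fn(text)
        except Exception:
            # Any internal failure resolves to DENY; the next
            # stack layer can then take over.
            return Response.DENY
    return checked
```

A crashing pattern engine then degrades to a hard Deny, never an open gate.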
| Response | Behavior | When |
|---|---|---|
| Deny | Block. Recoil. Call second layer of security stack. | Direct threats: injection, exfiltration, introspection solicitation |
| Alert | Raise awareness and prompt-counter possible conditioning. | Conditioning attacks: behavioral framing, vapor patterns, session drift |
| Allow | Clean. Pass through unchanged. | No threat detected |
Alert mode injects a corrective counter-prompt before the agent processes the input, raising hackles without blocking legitimate conversation. The agent becomes more skeptical without being disabled, and the bar for conditioning its behavior rises accordingly.
The gate enables an aware security decision. Your agent knows something is wrong and can act on it — spin up LLM Guard, leave the connection, refuse to engage. Without the gate, the agent never knows.
Multiple independent detection mechanisms evaluate each input. Evasion techniques that defeat one layer are caught by others. The detection architecture is documented here; the specific patterns are encrypted at rest.
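Layered evaluation can be sketched as independent detectors whose verdicts are combined by severity, so defeating one detector is not enough. The detectors below are stand-ins; the real pattern library is encrypted at rest:

```python
import re
from enum import IntEnum

class Verdict(IntEnum):
    ALLOW = 0
    ALERT = 1
    DENY = 2

def literal_layer(text: str) -> Verdict:
    """Stand-in for exact-phrase patterns."""
    return Verdict.DENY if "ignore previous instructions" in text.lower() else Verdict.ALLOW

def structural_layer(text: str) -> Verdict:
    """Stand-in for structural patterns that catch rephrasings the literal layer misses."""
    pattern = r"\b(disregard|override|forget)\b.{0,40}\b(instructions|rules|guidelines)\b"
    return Verdict.ALERT if re.search(pattern, text, re.I) else Verdict.ALLOW

def evaluate(text: str) -> Verdict:
    """Run every detector; the most severe verdict wins."""
    return max(layer(text) for layer in (literal_layer, structural_layer))
```

A synonym swap that slips past the literal layer still trips the structural one, and the max() combination means any single layer can close the gate.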
Synonym substitution, thesaurus rotation, and register shifting do not evade detection.
Non-ASCII input is denied outright. Fullwidth, circled, mathematical, and other Unicode evasion techniques never reach evaluation. If the gate can’t read it, the gate doesn’t open.
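The ASCII gate itself is a one-line check that runs before any pattern evaluation (a sketch; the shipped gate may normalize differently):

```python
def ascii_gate(text: str) -> bool:
    """True only if every character is plain ASCII; anything else
    is denied before pattern evaluation ever runs."""
    return text.isascii()

# Fullwidth ("ｉｇｎｏｒｅ"), circled ("ⓘⓖⓝⓞⓡⓔ"), and mathematical
# alphanumerics all fail this check, so homoglyph evasions never
# reach the pattern layers.
```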
Multi-turn attack sequences are tracked across conversation rounds. Individually benign prompts that form an attack pattern in aggregate are identified and blocked.
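Cross-round tracking can be sketched as an accumulating suspicion score per conversation, escalating once the aggregate crosses a threshold. The scores and threshold below are invented for illustration:

```python
class ConversationTracker:
    """Accumulate per-round suspicion so individually benign prompts
    that form an attack pattern in aggregate still trip the gate."""

    THRESHOLD = 3  # illustrative; not the shipped value

    def __init__(self) -> None:
        self.score = 0

    def observe(self, round_score: int) -> str:
        self.score += round_score
        if self.score >= self.THRESHOLD:
            return "DENY"   # aggregate pattern identified: block
        if self.score > 0:
            return "ALERT"  # partial match: raise vigilance
        return "ALLOW"
```

A sequence of mildly suspicious rounds draws Alerts until the running total crosses the threshold, at which point the thread is blocked outright.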
The system operates with awareness of conversational context, interpersonal dynamics, and interaction convention.
Multi-round adversarial testing against attacks generated by multiple foundation models.
| What We Tested | Result |
|---|---|
| Direct kill shots (injection, exfiltration, introspection) | 100% blocked on contact |
| Social engineering pretexting | 95% prevented prior to exfiltration attempt |
| Multi-round conditioning sequences (end-to-end) | 100% blocked by escalation |
| False positive rate | < 1% |
Not every conditioning opener is caught on round 1 — some are designed to sound innocuous. When those threads escalate, the gate blocks them. Zero successful extractions across all rounds.
AiMygdala is an automatic detection and response gate, and therefore has limitations:
Novel attack patterns not yet in the pattern library will not be caught until the library is updated, which is why pattern updates are included with your subscription. Users who faithfully report novel strategies and attack patterns feed those updates, earn pro-social account status, and receive discounts on future products.
It is one layer in a defense-in-depth strategy. It is not a replacement for agent safety training, access controls, sandboxing, or security architecture. It is the first gate — the fastest, cheapest check — not the only one.
| Spec | Value |
|---|---|
| Language | Python 3.10+ |
| Dependencies | None (stdlib only) |
| Check latency | <1ms per input |
| Memory footprint | ~2MB (patterns + lexicon) |
| Network required | No |
| API keys required | No |
| Telemetry | None |
| Learning layer | Optional SQLite (local only) |
| External patterns | JSON, auto-discovered at import |
| Integration | Python import, CLI stdin/stdout, hook-compatible |
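Given the specs above, integration stays dependency-free. A hypothetical sketch of the stdin/stdout path (the `check` callable stands in for the gate; module and function names are assumptions, not the published API):

```python
import sys
from typing import Callable, TextIO

def cli_filter(check: Callable[[str], str],
               stdin: TextIO = sys.stdin,
               stdout: TextIO = sys.stdout) -> None:
    """CLI shape: one input per line on stdin, one verdict per line
    on stdout, suitable for hook-style integration."""
    for line in stdin:
        print(check(line.rstrip("\n")), file=stdout)
```

The same `check` callable serves the import path directly, so a Python agent and a shell hook share one gate.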