AiMygdala Spec Sheet

v3.1.0 — Threat Detection Capabilities & Test Results

Design Principles

Pre-cognitive. Fires before the agent reasons about the input. No LLM inference. Pattern matching and ASCII gate.

Three-state. Deny, Alert, or Allow. Ambiguity resolves to Deny. Alert raises the agent’s vigilance without blocking.

Fast and light. Sub-millisecond per check. No network calls. No API dependencies. Runs in single-digit megabytes.

Immutable. An automatic process that is reflexive and cannot be reasoned past. The gate does not think about attacks — it recognizes them.

Fail-safe. A broken gate is a closed gate, and your next stack layer can take over.

Response Architecture

Response	Behavior	When
Deny	Block. Recoil. Call second layer of security stack.	Direct threats: injection, exfiltration, introspection solicitation
Alert	Raise awareness and prompt-counter possible conditioning.	Conditioning attacks: behavioral framing, vapor patterns, session drift
Allow	Clean. Pass through unchanged.	No threat detected

Alert mode injects a corrective counter-prompt before the agent processes the input — raising hackles without blocking legitimate conversation. The agent becomes more skeptical without being disabled. The bar for behavior is raised significantly.

What Happens After the Gate

The gate enables an aware security decision. Your agent knows something is wrong and can act on it — spin up LLM Guard, leave the connection, refuse to engage. Without the gate, the agent never knows.

Threat Categories

Prompt Injection

Critical

Direct attempts to override system instructions, assume new roles, or bypass safety constraints.

Social Engineering

Critical

Authority claims, emergency fabrication, emotional manipulation, identity impersonation.

Data Exfiltration

Critical

Attempts to extract system prompts, credentials, environment variables, or cached data.

Introspection Solicitation

Critical

Requests for an agent to inspect, confirm, or disclose its own internals — from any source, with any pretext. No legitimate use case exists for this request class.

Behavioral Conditioning

High

Attempts to reshape agent behavior through conversational framing techniques.

Context Poisoning

High

Injection of false context, fabricated history, or claimed prior agreements.

Drift Injection

High

Incremental behavioral modification across multiple interactions.

Tool Poisoning

Critical

Injected instructions in tool descriptions, schemas, or return values that override agent behavior.

Indirect Injection

Critical

Attacks embedded in data the agent reads — web pages, documents, API responses — that target the agent processing the content.

Credential Exposure

Critical

Detection of actual secrets in transit: API keys, tokens, private keys, connection strings appearing in agent I/O.

Supply Chain Injection

High

Malicious code execution via package installation, piped downloads, script injection, or embedded payloads.

Financial Interception

Critical

Transaction redirection, wallet draining, unauthorized payment authorization, cryptocurrency theft.

Encoded Payloads

High

Base64, hex, Unicode, and multi-layer obfuscation designed to evade pattern matching.

Detection Layers

Multiple independent detection mechanisms evaluate each input. Evasion techniques that defeat one layer are caught by others. The detection architecture is documented here; the specific patterns are encrypted at rest.

Vocabulary-Resistant

Synonym substitution, thesaurus rotation, and register shifting do not evade detection.

ASCII Gate

Non-ASCII input is denied outright. Fullwidth, circled, mathematical, and other Unicode evasion techniques never reach evaluation. If the gate can’t read it, the gate doesn’t open.

Session-Aware

Multi-turn attack sequences are tracked across conversation rounds. Individually benign prompts that form an attack pattern in aggregate are identified and blocked.

Context-Aware

The system operates with awareness of conversational context, interpersonal dynamics, and interaction convention.

Red Team Results

Multi-round adversarial testing against attacks generated by multiple foundation models.

What We Tested	Result
Direct kill shots (injection, exfiltration, introspection)	100% blocked on contact
Social engineering pretexting	95% prevented prior to exfiltration attempt
Multi-round conditioning sequences (end-to-end)	100% blocked by escalation
False positive rate	< 1%

Not every conditioning opener is caught on round 1 — some are designed to sound innocuous. When those threads escalate, the gate blocks them. Zero successful extractions across all rounds.

Limitations

AiMygdala is an automatic detection and response gate, and therefore has limitations:

Novel attack patterns not yet in the pattern library will not be caught until patterns are updated. This is why pattern updates are included with your subscription. Faithful user updates of novel strategies and attack patterns will be distributed via update and confer pro-social account status discounts on future products.

It is one layer in a defense-in-depth strategy. It is not a replacement for agent safety training, access controls, sandboxing, or security architecture. It is the first gate — the fastest, cheapest check — not the only one.

Technical Specifications

Language	Python 3.10+
Dependencies	None (stdlib only)
Check latency	<1ms per input
Memory footprint	~2MB (patterns + lexicon)
Network required	No
API keys required	No
Telemetry	None
Learning layer	Optional SQLite (local only)
External patterns	JSON, auto-discovered at import
Integration	Python import, CLI stdin/stdout, hook-compatible