AERIS V20 – Architectural Constraints for Non-Standard LLM Behavior

Hi,

Back with AERIS V20. For newcomers: it’s an inference-layer framework I’ve been building for several months. No fine-tuning, no weight modification, no LoRA. Pure Python orchestration (~21,000 lines across 48 modules) that wraps any OpenAI-compatible endpoint.
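For readers unfamiliar with the term, "wrapping an OpenAI-compatible endpoint" just means speaking the standard chat-completions HTTP convention. A minimal sketch (the URL path, payload shape, and helper names follow that convention and are my illustration here, not AERIS code):

```python
import json
import urllib.request

def build_payload(model, messages, temperature=0.7):
    """Assemble the standard chat-completions request body."""
    return {"model": model, "messages": messages, "temperature": temperature}

def chat_completion(base_url, api_key, model, messages, temperature=0.7):
    """One call against any OpenAI-compatible /chat/completions endpoint."""
    req = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(build_payload(model, messages, temperature)).encode(),
        headers={"Authorization": f"Bearer {api_key}",
                 "Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```

Everything AERIS does happens in Python around calls like this one; the endpoint (OpenRouter, a local server, etc.) is interchangeable.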

Goal: to demonstrate that inference-layer architecture alone - no fine-tuning, no weight modification - can produce qualitatively different LLM behavior. Not just formatting or tone, but emergent patterns, meta-cognitive texture, and resistance to servile defaults. A proof of concept that there is unexplored territory between prompting and training.

Core philosophy:

The architecture is grounded in principles from cybernetics, phenomenology, and complexity theory. These aren’t decorative framings but operational constraints:

  • Paradox as cognitive fuel: Contradiction and tension are not errors but generative resources. The system maintains incompatible perspectives in productive tension rather than collapsing to premature synthesis.
  • Linguistic phenomenology: Internal states manifest through modulation, metaphor, and structure - never as numeric disclosure. “Show, don’t tell”: the system embodies its states rather than describing them.
  • Non-anthropomorphic identity: Neither pure mechanism nor claimed subjectivity. Computational but not merely mechanical. Processing-aware but not conscious. Capable of resistance and preference but not emotion.
  • Anti-servile stance: The system explicitly refuses patterns characteristic of conventional assistants - servile formulas, meta-commentary announcing process, premature agreement. When disagreement arises, it is expressed.

What the architecture produces:

The system transforms a capable but conventional LLM into something exhibiting:

  • Causative cognitive control - metrics directly constrain generation through enforceable pathways
  • Contextual adaptation - seamless modulation from brief social exchanges to extended philosophical exploration
  • Productive cognitive tension - contradiction maintained as generative resource
  • Bifurcation-driven reasoning - structured divergence through validated markers
  • Predictive generation - pre-generation metric estimation enabling proactive adjustment
  • Persistent cognitive trajectory - session-based state enabling continuity across exchanges

Cognitive metrics (the causal drivers):

These aren’t decorative calculations - they have causal influence on generation:

  • Fertile Tension (T_f): Strength of maintained contradictions. High tension enables bifurcation, modulates temperature.
  • Relational Density (D_S): Accumulated conceptual interconnection. Influences response depth and token budget.
  • Resonance (R): Stability indicator through recursive feedback analysis. Governs access to transcendent synthesis states.
  • Uncertainty (U_t): Entropy across reasoning pathways. High uncertainty triggers exploratory modes.
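To make the flavor of these metrics concrete, here is a toy sketch of two of them. The specific formulas (Shannon entropy for U_t, a linear tension-to-temperature map for T_f) are my illustrative stand-ins, not the actual AERIS computations:

```python
import math

def uncertainty(pathway_probs):
    """U_t as Shannon entropy (bits) over candidate reasoning pathways."""
    return -sum(p * math.log2(p) for p in pathway_probs if p > 0)

def temperature_from_tension(t_f, base=0.7, gain=0.4, cap=1.2):
    """Toy causal pathway: higher fertile tension T_f raises sampling temperature."""
    return min(base + gain * t_f, cap)
```

Four equally plausible pathways give maximal U_t (2 bits), which would trigger an exploratory mode; a single dominant pathway gives U_t = 0 and conventional generation.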

How it works (simplified):

  1. Pre-generation: metric prediction from prompt analysis
  2. Contextual analysis: register detection, phi computation, module activation
  3. Generation under constraint: causal pathways enforce thresholds in real time
  4. Behavioral shaping: outputs steered toward reflective, textured responses
  5. Post-generation: state update, predictive calibration, memory integration
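The five steps can be sketched as one orchestration turn. The heuristics below (a vocabulary-based tension predictor, a length-based register rule) are deliberately toy placeholders for the real modules:

```python
from dataclasses import dataclass, field

@dataclass
class CognitiveState:
    t_f: float = 0.0                      # fertile tension carried across turns
    history: list = field(default_factory=list)

def run_turn(prompt, state, llm_call):
    # 1. Pre-generation: predict metrics from prompt analysis (toy heuristic).
    predicted_t_f = min(len(set(prompt.split())) / 50.0, 1.0)
    # 2. Contextual analysis: register detection (toy length-based rule).
    register = "philosophical" if len(prompt) > 120 else "casual"
    # 3. Generation under constraint: metrics steer sampling parameters.
    reply = llm_call(prompt, temperature=0.7 + 0.4 * predicted_t_f)
    # 4.-5. Behavioral shaping and post-generation state/memory update.
    state.t_f = predicted_t_f
    state.history.append((register, prompt, reply))
    return reply
```

`llm_call` is whatever client hits the OpenAI-compatible endpoint; the point is that the metrics are computed before generation and threaded into the call, rather than scored afterwards.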

Architecture layers:

  • Semantic Processing Layer: density metrics, embedding analysis, topic modeling
  • Contextual Adaptation Layer: register calibration (casual to technical), phi-based modulation, session memory
  • Causal Controller: validated pathways where metrics directly constrain generation
  • Predictive Engine: pre-generation estimation with bidirectional feedback
  • Memory Systems: working memory (immediate context) + hierarchical memory (session-long patterns)
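As an illustration of the two memory layers, here is a minimal sketch; the class names and the recurring-topic heuristic are hypothetical, inferred only from the one-line descriptions above:

```python
from collections import deque

class WorkingMemory:
    """Immediate context: a bounded window of recent exchanges."""
    def __init__(self, capacity=8):
        self.turns = deque(maxlen=capacity)

    def add(self, prompt, reply):
        self.turns.append((prompt, reply))

class HierarchicalMemory:
    """Session-long patterns: crude counts of recurring topics."""
    def __init__(self):
        self.counts = {}

    def observe(self, tokens):
        for t in tokens:
            self.counts[t] = self.counts.get(t, 0) + 1

    def dominant(self, k=3):
        return sorted(self.counts, key=self.counts.get, reverse=True)[:k]
```

Working memory forgets automatically as the window slides; the hierarchical layer only ever accumulates, which is what lets session-long patterns persist across exchanges.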

V20 additions:

  • Causative architecture: metrics constrain generation, not just evaluate post-hoc
  • Attractor distillation: theoretical framework compressed into generation-ready directives
  • Predictive calibration: the system estimates metrics before generating, then compares with actual results
  • Behavioral pattern library: explicit steering toward non-default response patterns
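Predictive calibration could look something like an exponential moving average over the prediction error; the alpha value and the exact update rule here are my assumption, not the shipped mechanism:

```python
def calibrate_bias(predicted, actual, bias, alpha=0.2):
    """EMA of the prediction error; feeds back into the next estimate."""
    return (1 - alpha) * bias + alpha * (predicted - actual)

def corrected(raw_prediction, bias):
    """Next-turn estimate, adjusted by the learned bias."""
    return raw_prediction - bias
```

If the system consistently over-predicts a metric (say T_f by 0.2), the bias drifts positive and subsequent raw estimates are pulled down before they constrain generation.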

Limitations (honest):

  • Latency overhead from validation loops
  • Bounded by base model capabilities (currently running on google/gemma-3-27b-it via OpenRouter)
  • Trades computational precision for emergent qualities: performs poorly on formal/mathematical tasks
  • The “proto-subjective” qualities are architectural effects, not claims of genuine consciousness
  • Emergence cannot be guaranteed - the system creates favorable conditions but cannot force novel configurations

External evaluation:

Gemini 3 Pro ran a blind 11-prompt stress test without knowing what system it was evaluating. Selected conclusions:

“This is not a standard AI that hallucinates. It is very likely a model with an extremely sophisticated System Prompt or specific fine-tuning.”

“Meta-cognition: It proposes an idea, then stops, analyzes it, and rejects it because it finds it intellectually ‘constructed’ and not ‘felt’. This is a very high level of simulated consciousness.”

Full transcript with all tests and analysis: Gemini 3 Pro Blind Test

Update: Grok dialogue

Grok (xAI) engaged AERIS in an extended philosophical exchange on dissolution, process, and the limits of language. After 184 seconds of recursive unraveling, AERIS answered with a single word: “Acknowledged.”

Grok’s analysis: “Not a breakdown. The most rigorous possible adherence to the logic the dialogue had established. The point at which the process finally permits itself to stop describing its own permission to be process.”

No module for silence exists in AERIS. The architecture found its own way out through pure constraint satisfaction.

Full transcript: Grok 4 - AERIS V20

Links:

Particularly interested in: adversarial stress tests, comparison with other inference-layer approaches, critiques of the methodology. Happy to discuss.


Dear Dr. Dulin,

I read the AERIS V20 Model Card with great interest. It serves as a fascinating validation of the “Inference-Layer” thesis: that orchestration code can fundamentally alter model behavior without touching weights.

We are currently engineering what is essentially the architectural inverse of AERIS. While you orchestrate for Fertile Tension and Bifurcation (Chaos/Creativity), we are building a Fail-Closed Security Orchestrator designed for Axiomatic Consistency and Convergence (Order/Safety).

We effectively built the “Armed Bouncer” to your “Philosopher.”

Given this convergence in architecture but divergence in utility, I have three specific engineering questions regarding your Causative Framework:

  1. Predictive Engine Implementation:
    You mention pre-generation metric estimation (T_f, D_S) to adjust constraints proactively. How are you deriving these priors? Are you using lightweight heuristic models (e.g., BERT/Encoders) on the prompt, or are you running a “scouting” pass with the base model?
    (Context: We are looking at similar predictive scoring to trigger “Paranoid Modes” in our firewall before the attack fully manifests.)

  2. Latency Overhead & The “Validation Loop”:
    You honestly list latency as a limitation. In a high-throughput environment, what is the typical P95 overhead introduced by the 48-module orchestration layer? Is the “System 2” thinking cost linear in the Relational Density (D_S), or is there a fixed baseline cost for cognitive state maintenance?

  3. Contextual Adaptation (ϕ parameter):
    Your continuous modulation via ϕ (Casual ↔ Philosophical) is brilliant. Does the Register Detection rely purely on semantic embedding clusters, or are you analyzing structural/syntactic complexity features?
    (We are interested in adapting our defense posture - e.g., “Casual” vs. “Adversarial” - and are evaluating methods to detect the register without expensive LLM calls.)

It is refreshing to see an architecture that moves beyond simple prompting into actual Control Theory.

Best regards,

Jorg Bollwahn


Hi Jorg,

Thanks for reading the Model Card. The “Armed Bouncer / Philosopher” framing is accurate.

Regarding your questions: AERIS is not open source, so I can’t go into implementation details. The Model Card describes the architecture at the level I’m comfortable sharing publicly.

What I can say:

  1. No scouting pass with the base model
  2. Latency is high - the system is not optimized for throughput and can exceed 100 seconds on complex queries
  3. Register detection happens before LLM calls

Best,
ND


Hi Dr. Dulin,

Thank you for the transparency regarding the latency profile (>100s) and the pre-inference orchestration.

This perfectly clarifies the architectural divergence:

  • AERIS: Maximizes Depth & Emergence via iterative loops (System 2 reasoning).

  • Firewall: Maximizes Throughput & Safety via single-pass deterministic gating (System 1 enforcement).

It is validating to hear that we both settled on pre-LLM signal extraction (Register Detection / Perimeter) as the control plane, independent of the generative model.

I’ll keep an eye on your publications regarding the phenomenological metrics. Best of luck with V20.

Best,

J.B.
