01.5 — Project Deep Dive

Cultivated Learning

Complete Cognitive Architecture · AI Research

Longitudinal Behavioral Development in Frozen Language Models

Through Memory-Augmented Reflective Interaction

The model is the organism. The human is the gardener.
Traditional training changes the brain. Cultivated Learning changes the environment the brain operates in.
PyTorch · Mistral 7B · ChromaDB · Sentence Transformers · Recursive Reflection · Memory Systems

The Thesis

Every LLM chatbot forgets you the moment the conversation ends. Memory systems exist — but they treat persistence as a feature bolted onto the side. Cultivated Learning inverts the entire priority: accumulated knowledge about you outranks the conversation you're currently having.

The system wraps a frozen Mistral 7B in a stateful cognitive shell. Persistent memory with semantic search. Dynamic context assembly that budgets tokens like a finite resource. Recursive self-reflection that generates behavioral directives. Human feedback translated into salience adjustments. The model never changes — not one weight. But its behavior evolves through memory alone.

The question this project asks is precise: can inference-time architecture alone produce measurable behavioral development over extended interaction?
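In outline, one interaction turn under this architecture looks like the following. This is a minimal sketch; the component and method names (retrieve, build_prompt, generate, store, reflect) are illustrative placeholders, not the project's actual API:

```python
# One turn of the stateful shell around a frozen, stateless model.
# All component interfaces here are hypothetical, for illustration only.
def interaction_turn(user_msg, memory, assembler, model, reflector):
    recalled = memory.retrieve(user_msg)                  # semantic search over stored memories
    prompt = assembler.build_prompt(recalled, user_msg)   # token-budgeted packing, memories first
    reply = model.generate(prompt)                        # frozen weights, stateless inference
    memory.store(user_msg, reply)                         # persist this episode for future turns
    reflector.reflect(user_msg, reply)                    # post-hoc analysis, may emit directives
    return reply
```

All state lives in the components passed in; the model call itself carries nothing over between turns.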

System Architecture
Interaction Layer
User ↔ System Interface · Gradio UI
↓
Memory Store
Semantic vectors + metadata · ChromaDB · Salience decay
Context Assembler
Token-budgeted prompt packing · Memory over history
Reflection Engine
4-depth recursive self-analysis · Directive generation
Feedback Integrator
Human signal → salience adjustments · Corrections
↓
Base LLM
Mistral 7B Instruct v0.3 · Frozen · Stateless Inference
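The feedback path in the diagram (human signal into salience adjustment) can be sketched as follows; the signal names and multipliers are illustrative assumptions, not values from the project:

```python
# Hypothetical mapping from human feedback signals to salience updates.
# The multipliers below are placeholders chosen for illustration.
FEEDBACK_MULTIPLIERS = {
    "thumbs_up": 1.5,    # reinforce the memories used in this turn
    "thumbs_down": 0.6,  # dampen them
    "correction": 0.3,   # strongly dampen a contradicted memory
}

def apply_feedback(signal, used_memories):
    """Scale the salience of each memory that informed the reply, capped at 1.0."""
    factor = FEEDBACK_MULTIPLIERS[signal]
    for m in used_memories:
        m["salience"] = min(1.0, m["salience"] * factor)
    return used_memories
```

Because salience feeds retrieval ranking, this is how a human signal changes future behavior without touching the model.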
Core Systems
Semantic Memory Store
ChromaDB-backed vector store with four memory types — episodic, semantic, procedural, reflective. Blended retrieval ranks by 60% semantic similarity + 40% salience score. Exponential decay ensures stale memories fade while reinforced knowledge persists.
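As a sketch, the blended ranking and decay could look like this; only the 60/40 split comes from the description above, while the one-week half-life is an assumed parameter:

```python
# Illustrative blended retrieval scoring. The 60/40 weights match the
# project description; the half-life is an assumption for this sketch.
W_SIM, W_SAL = 0.6, 0.4          # 60% semantic similarity, 40% salience
HALF_LIFE_S = 7 * 24 * 3600      # assumed half-life: one week

def decayed_salience(salience, age_seconds):
    """Exponential decay: salience halves every HALF_LIFE_S seconds."""
    return salience * 0.5 ** (age_seconds / HALF_LIFE_S)

def blended_score(similarity, salience, age_seconds):
    """Rank a candidate memory by similarity plus decayed salience."""
    return W_SIM * similarity + W_SAL * decayed_salience(salience, age_seconds)
```

Reinforcement resets a memory's age, so frequently used knowledge holds its rank while untouched memories fade.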
Context Assembler
Token-budgeted prompt packing that allocates context like a finite resource. Behavioral directives first, then retrieved memories, then conversation history with remaining space. Memories outrank recency — always.
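A minimal sketch of memory-first packing, with token counts approximated by whitespace splitting for illustration (the real budget would use the model tokenizer):

```python
# Hypothetical token-budgeted packer. Sections are consumed in priority
# order: directives, then memories, then history with whatever remains.
def assemble_context(directives, memories, history, budget):
    def tokens(text):
        return len(text.split())   # crude stand-in for a real tokenizer

    packed, used = [], 0
    for section in (directives, memories, history):
        for item in section:
            cost = tokens(item)
            if used + cost > budget:
                break              # this section has exhausted the budget
            packed.append(item)
            used += cost
    return "\n".join(packed)
```

With a tight budget, conversation history is the first thing dropped, never the accumulated memories.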
Reflection Engine
Post-interaction self-analysis at four depths. D0: factual evaluation. D1: pattern recognition across interactions. D2: prescriptive directive generation. D3: meta-coherence check for contradictions. Each depth feeds the next.
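The cascade could be wired roughly like this; the prompt templates and the llm callable are placeholders, not the project's actual prompts:

```python
# Hypothetical four-depth reflection cascade. Each depth's output becomes
# the input to the next depth, as described above.
DEPTH_PROMPTS = [
    "D0: Evaluate the factual accuracy of this exchange: {x}",
    "D1: What patterns recur across recent interactions? Context: {x}",
    "D2: Write one behavioral directive based on: {x}",
    "D3: Check this directive set for contradictions: {x}",
]

def reflect(exchange, llm):
    """Run the cascade, returning one output per depth."""
    current = exchange
    outputs = []
    for template in DEPTH_PROMPTS:
        current = llm(template.format(x=current))
        outputs.append(current)
    return outputs   # [evaluation, patterns, directive, coherence check]
```

Depth 2's directive is what feeds back into the context assembler; depth 3 guards against directives that contradict each other.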
Consolidation & Cold Storage
Fading episodic memories get distilled into durable semantic memories through LLM-powered consolidation. Below the salience floor, memories archive to cold storage with semantic resurfacing on strong query match. Nothing is permanently lost.
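One way this triage could look in code; the salience floor, consolidation band, and summarize hook are assumed values and names, not the project's:

```python
# Hypothetical routing of a memory record by its decayed salience.
SALIENCE_FLOOR = 0.1     # at or below this, archive to cold storage
CONSOLIDATE_BAND = 0.3   # fading episodic memories in (floor, band] get distilled

def triage(memory, summarize):
    """Archive, consolidate, or leave a memory record untouched."""
    s = memory["salience"]
    if s <= SALIENCE_FLOOR:
        memory["tier"] = "cold"          # archived, resurfaceable on strong match
    elif s <= CONSOLIDATE_BAND and memory["type"] == "episodic":
        memory["type"] = "semantic"      # distill into a durable fact
        memory["text"] = summarize(memory["text"])
    return memory
```

In the real system the summarize step would be the LLM-powered consolidation pass; here it is any callable.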
Memory Taxonomy
Type · Purpose · Example
Episodic · Raw interaction records · "User asked about ML basics on Feb 14"
Semantic · Distilled facts and preferences · "User prefers concise, direct answers"
Procedural · Behavioral directives · "Ground explanations in practical examples"
Reflective · Self-analysis observations · "Technical responses tend toward verbosity"
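One plausible way to model the taxonomy above in code; the field names and defaults are illustrative, not the project's schema:

```python
from dataclasses import dataclass, field
from enum import Enum
import time

# Hypothetical record type for the four-way memory taxonomy.
class MemoryType(Enum):
    EPISODIC = "episodic"      # raw interaction records
    SEMANTIC = "semantic"      # distilled facts and preferences
    PROCEDURAL = "procedural"  # behavioral directives
    REFLECTIVE = "reflective"  # self-analysis observations

@dataclass
class Memory:
    text: str
    type: MemoryType
    salience: float = 1.0                            # decays over time
    created: float = field(default_factory=time.time)
```

In the actual store, each record would also carry its embedding vector and ChromaDB metadata.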

Design Philosophy

Most chatbots drop old memories to keep recent turns. This system does the opposite — it will forget what was just said before it forgets what it has learned about you. Accumulated knowledge is more valuable than short-term context for longitudinal development. The context assembler enforces this at the token level: memories get budget priority, conversation history fills whatever space remains.

Tech Stack
Base Model · Mistral 7B · Instruct v0.3
Embeddings · all-MiniLM-L6-v2 · 384-dim
Vector DB · ChromaDB · Cosine Space
GPU · RTX 5090 · 32 GB VRAM
Framework · PyTorch 2.9.1 · Transformers 4.57
UI · Gradio · Tabbed Interface
Environment · Docker · Ubuntu 24.04
Language · Python · 73% Jupyter / 27% Py

Research Questions

1. Can a frozen model exhibit measurable behavioral evolution through inference-time architecture alone?
2. Does the reflection engine produce better outcomes than feedback alone?
3. Where does inference-time learning hit its ceiling, and what causes it?
4. Does memory consolidation improve retrieval quality over time?
Project Roadmap
Phase 1 · Memory + Context + Interaction Loop
Phase 1.5 · Dedicated Embeddings + Blended Retrieval + Gradio UI
Phase 2 · Reflection Engine
Phase 2 · Memory Consolidation
Phase 2 · Cold Storage
Phase 3 · Longitudinal Evaluation — 100+ Interactions
Phase 3 · A/B Testing — Framework vs. Vanilla Model
Open Source · MIT License
The code is the argument.