Natural Language Autoencoders · follow-up series

Recursion: iterated AV∘AR dynamics

Feed the autoencoder its own output, eight rounds deep. Are there attractors — and what does repeated re-encoding do to the text?

date: 2026-06-11
continues: REPORT 01 — same eval set, scoring, conventions
map: textₖ₊₁ = AV(AR(textₖ)) · greedy ⇒ deterministic
rounds: 8 × 1000 trajectories

value	metric	reading
0.774→0.416	FVE vs gold, k=1→8	roughly halves over 7 hops
9/1000	exact fixed points	zero period-2 cycles
≥0.9988	cos, consecutive iterates	locally quasi-stable
1000	unique texts, every round	no global collapse

§1

The map and what it preserves

method

v₀ = the gold L41 activation → AV → text₁ → AR → v₁ → AV → … Greedy decoding throughout makes the map deterministic: fixed points and cycles are detectable as exact text equality, not similarity thresholds. AR outputs are fed back raw; the AV’s injection rescaling makes the dynamics a map on the unit sphere.
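The loop above can be sketched as follows. This is a minimal sketch, not the harness used for the report: `av` and `ar` are hypothetical callables standing in for the AV (activation → text, greedy) and the AR (text → activation), and the toy stand-ins at the bottom exist only so the block runs. Because decoding is deterministic, fixed points and period-2 cycles reduce to plain string equality.

```python
from typing import Callable, List, Optional, Sequence

Vector = Sequence[float]


def iterate_round_trips(
    v0: Vector,
    av: Callable[[Vector], str],   # hypothetical: activation -> text (greedy)
    ar: Callable[[str], Vector],   # hypothetical: text -> activation
    rounds: int = 8,
) -> List[str]:
    """Return [text_1, ..., text_rounds] with text_1 = AV(v0)
    and text_{k+1} = AV(AR(text_k))."""
    texts = [av(v0)]
    for _ in range(rounds - 1):
        texts.append(av(ar(texts[-1])))
    return texts


def first_fixed_round(texts: Sequence[str]) -> Optional[int]:
    """Smallest 1-based round k with text_k == text_{k-1}, or None."""
    for i in range(1, len(texts)):
        if texts[i] == texts[i - 1]:
            return i + 1
    return None


def has_period2_cycle(texts: Sequence[str]) -> bool:
    """True if some text_k equals text_{k-2} but differs from text_{k-1}."""
    return any(
        texts[i] == texts[i - 2] and texts[i] != texts[i - 1]
        for i in range(2, len(texts))
    )


# Toy stand-ins (assumptions, not the real models): the "activation" is the
# text length, and decoding emits one more character, capped at five.
def toy_ar(text: str) -> Vector:
    return [float(len(text))]


def toy_av(v: Vector) -> str:
    return "x" * min(int(v[0]) + 1, 5)


demo_texts = iterate_round_trips([2.0], toy_av, toy_ar, rounds=8)
```

With real models, `av` would wrap greedy decoding from the injected activation and `ar` would return the raw reconstruction; the detection helpers stay the same.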

k	cos to gold	cos to prev	FVE vs gold	exact fixed
1	0.9935	—	0.774	0
2	0.9913	0.9988	0.696	0
3	0.9894	0.9991	0.631	0
4	0.9878	0.9993	0.575	0
5	0.9864	0.9994	0.526	0
6	0.9852	0.9994	0.484	0
7	0.9841	0.9995	0.448	9
8	0.9832	0.9995	0.416	9

Concentration ‖mean unit vector‖ flat at 0.978 every round; text length constant ~748 chars. No length degeneration, no global collapse.
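Concentration is read here as the norm of the mean of the unit-normalised activations: 1.0 when every vector points the same way, near 0 when they are spread evenly. A minimal sketch in plain Python; the function names are illustrative and not taken from the report's code.

```python
import math
from typing import List, Sequence


def unit(v: Sequence[float]) -> List[float]:
    """Scale v to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def concentration(vectors: Sequence[Sequence[float]]) -> float:
    """Norm of the mean unit vector across a batch of activations."""
    units = [unit(v) for v in vectors]
    dim = len(units[0])
    mean = [sum(u[d] for u in units) / len(units) for d in range(dim)]
    return math.sqrt(sum(m * m for m in mean))


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity, as used for the 'cos to gold' and
    'cos to prev' columns (per trajectory, before averaging)."""
    return sum(x * y for x, y in zip(unit(a), unit(b)))
```

A value that stays flat across rounds, as reported, says the batch of iterates is not contracting toward a single shared direction.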

§2

Compounding cost

fve halves

Fig. 1: FVE vs the original gold activation across 8 deterministic round-trips. Every consecutive pair of iterates stays at cos ≥ 0.9988, yet fidelity to the origin roughly halves: small per-generation errors share a bias direction, so they accumulate instead of cancelling.

Fig. 2: Per-hop costs, in units of 10⁻⁴ cos. The key signature: the per-hop loss of cos-to-gold exceeds the per-hop step size (1 − cos to prev). In the §1 table this holds at every hop from k=2 on; at k=8, 9×10⁻⁴ is lost vs 5×10⁻⁴ moved. Undirected random-walk drift cannot do that; the steps are systematically aligned away from the gold.
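Why a loss larger than the step rules out undirected drift can be made explicit with the spherical law of cosines. The symbols θ_k, s_k and ψ_k are introduced here for illustration and do not appear in the report.

```latex
% x_k: iterate on the unit sphere; g: gold direction
% \theta_k = \angle(x_k, g)        (so "cos to gold" = \cos\theta_k)
% s_k      = \angle(x_k, x_{k+1})  (so "cos to prev" = \cos s_k)
% \psi_k: angle at x_k between the step and the geodesic pointing away from g
\begin{align}
\cos\theta_{k+1}
  &= \cos\theta_k \cos s_k - \sin\theta_k \sin s_k \cos\psi_k \\
\underbrace{\cos\theta_k - \cos\theta_{k+1}}_{\text{loss}}
  &= \cos\theta_k \, \underbrace{(1 - \cos s_k)}_{\text{step}}
     + \sin\theta_k \sin s_k \cos\psi_k
\end{align}
```

For steps with no preferred direction relative to the gold, cos ψ_k averages to zero, so the expected loss is cos θ_k (1 − cos s_k), which is at most the step 1 − cos s_k. A mean loss above the mean step therefore requires a positive average of the last term, that is, steps that lean away from the gold.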

Finding: A conveyor belt, not a basin. Step size shrinks ~geometrically but never reaches zero, and each round-trip pushes the iterate in the same direction: toward the AV/AR pair’s preferred, high-training-density regions.
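The per-hop comparison can be recomputed from the §1 table. The sketch below copies the rounded "cos to gold" and "cos to prev" columns and derives the loss and step for each hop. The `cos_psi` column is an added illustration: it solves the spherical law of cosines for the outward alignment of the step while treating the batch means as if they described a single trajectory, which is an approximation.

```python
import math

# Copied from the table in section 1 (4-decimal means over trajectories).
COS_TO_GOLD = [0.9935, 0.9913, 0.9894, 0.9878, 0.9864, 0.9852, 0.9841, 0.9832]  # k=1..8
COS_TO_PREV = [0.9988, 0.9991, 0.9993, 0.9994, 0.9994, 0.9995, 0.9995]          # k=2..8


def per_hop_rows():
    """Return (k, loss, step, cos_psi) for hops ending at k = 2..8."""
    rows = []
    for i, cos_prev in enumerate(COS_TO_PREV):
        k = i + 2
        cos_before, cos_after = COS_TO_GOLD[i], COS_TO_GOLD[i + 1]
        loss = cos_before - cos_after   # drop in cos-to-gold over this hop
        step = 1.0 - cos_prev           # distance moved, in the same units
        # loss = cos(theta) * step + sin(theta) * sin(s) * cos(psi)
        sin_theta = math.sqrt(1.0 - cos_before ** 2)
        sin_s = math.sqrt(1.0 - cos_prev ** 2)
        cos_psi = (loss - cos_before * step) / (sin_theta * sin_s)
        rows.append((k, loss, step, cos_psi))
    return rows


for k, loss, step, cos_psi in per_hop_rows():
    print(f"k={k}  loss={loss * 1e4:5.1f}e-4  step={step * 1e4:4.1f}e-4  "
          f"cos_psi={cos_psi:+.3f}")
```

A positive `cos_psi` at every hop is what "aligned away from the gold" means in this framing; its small magnitude is consistent with most of each step being orthogonal to the gold direction.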

§3

Attractors: a three-level answer

level	attractor?	evidence
text	barely	9/1000 exact fixed points in 8 rounds; 0 cycles; all 1000 texts distinct every round
structure	immediately	format, length, sections, topic freeze by round ~2 and never move
vector	no — conveyor belt	per-hop loss to gold exceeds step size; systematic drift, not diffusion

Finding: The scaffold is an attractor; the content is not. This is consistent with the redaction result (report 01 §4): the prose scaffold carries ≈ zero information, so the map has nothing to mutate there. All the entropy lives in the payload spans, which keep drifting.

§4

Serial reproduction, mechanically reproduced

Bartlett 1932

The drift operators visible in the trajectories match the triad familiar from Bartlett-style serial-reproduction studies (leveling, sharpening, assimilation to schema), except that the “participants” are one deterministic map applied to its own output.

traj 3 · entity substitution within semantic neighborhoods

k=1	side effects of corticosteroids…
k=2	…of NSAIDs…
k=3	…of ibuprofen…
k=k	…of ACE inhibitors; the side-effect list recombines each round (“potassium loss” → “increased potassium levels” → “potassium retention”)

traj 700 · specificity decay inside a frozen frame

k=1	“Rainfall begins in the atmospheric moisture content”
k=2	“a mass of air in the lower atmosphere”
k=3	“a low air mass”
k=k	“a low lying area is formed in the area”; a Kenya/Uganda confusion from round 1 never resolves, only reshuffles

traj 15 · confabulation accretion

k=1	high-school math guide
k=2	Chemistry
k=3	AP Chemistry
k=5	“+ in California”: invented, then permanent; by k=8, early repetition disease (“a Chemistry test is a Chemistry test is rewarding”)

traj 318 · fixed points are low-entropy boilerplate

k=3	“By addressing ethical concerns and implementing frameworks, we can harness the power of AI…”
k=4–8	bit-identical for 5 straight rounds. The frozen 9 are all in formulaic registers: maximally generic content is where the map runs out of things to mutate.

Finding: The mechanism follows from report 01 §3–4. Quoted spans are confabulated reconstructions; the AR re-embeds each generation’s mutations as the new ground truth; small plausible errors compound, drifting toward generic-and-plausible and away from specific-and-true.
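One way to surface substitutions like those in traj 3 is a word-level diff between consecutive iterates. The sketch below uses the standard-library `difflib.SequenceMatcher`; the helper name and the whitespace tokenisation are assumptions made for illustration, and the example strings are the short fragments quoted above, not full trajectory texts.

```python
from difflib import SequenceMatcher
from typing import List, Tuple


def word_substitutions(prev_text: str, next_text: str) -> List[Tuple[str, str]]:
    """Return (old, new) word spans that were replaced between two iterates.

    Insertions and deletions are ignored; only 'replace' opcodes are kept.
    """
    old_words, new_words = prev_text.split(), next_text.split()
    matcher = SequenceMatcher(None, old_words, new_words)
    pairs = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "replace":
            pairs.append((" ".join(old_words[i1:i2]), " ".join(new_words[j1:j2])))
    return pairs


# Fragments quoted in the traj 3 notes, used as stand-ins for full iterates.
round_1 = "side effects of corticosteroids"
round_2 = "side effects of NSAIDs"
print(word_substitutions(round_1, round_2))
```

Run over every consecutive pair in a trajectory, the replaced spans give a rough per-round log of payload mutations while the unchanged scaffold drops out as `equal` opcodes.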

§5

Prediction scorecard & open question

verdict	prediction	outcome
✗	Fixed points by round 4–6 for most trajectories	0.9% froze by round 8; payload spans retain far more entropy than guessed
~	cos plateau ~0.97–0.98	0.983 at k=8 and decelerating, but not provably asymptotic
~	prototype drift, many local attractors	drift toward generic content confirmed; “many” overstated (9, not hundreds)

Open: Does the conveyor belt stop? Per-hop cost decays ~geometrically (ratio ≈ 0.85–0.9 after hop 1), implying an asymptote near cos ≈ 0.975–0.98 / FVE ≈ 0.25–0.35. Alternatively, it could be a slow power law that keeps eroding. The two are distinguishable with k=30 on a ~100-trajectory subset (~15 min). That run is follow-up #26 in the queue for this box.
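The cos asymptote quoted above can be reproduced from the §1 table under the geometric assumption. In this sketch the per-hop losses come from the rounded "cos to gold" column, and the limit adds the remaining geometric tail, last_loss · r / (1 − r), to the k=8 value. The function names are illustrative, and the calculation says nothing about whether the decay really is geometric.

```python
# "cos to gold" for k=1..8, copied from the table in section 1.
COS_TO_GOLD = [0.9935, 0.9913, 0.9894, 0.9878, 0.9864, 0.9852, 0.9841, 0.9832]


def per_hop_losses(values):
    """Drop in cos-to-gold for each hop k-1 -> k."""
    return [a - b for a, b in zip(values, values[1:])]


def geometric_asymptote(last_value, last_loss, ratio):
    """Limit of the sequence if every future loss is `ratio` times the
    previous one: last_value - last_loss * ratio / (1 - ratio)."""
    if not 0.0 <= ratio < 1.0:
        raise ValueError("ratio must be in [0, 1) for the tail to converge")
    return last_value - last_loss * ratio / (1.0 - ratio)


losses = per_hop_losses(COS_TO_GOLD)
observed_ratios = [b / a for a, b in zip(losses, losses[1:])]
print("observed ratios:", [round(r, 2) for r in observed_ratios])

for r in (0.85, 0.90):
    limit = geometric_asymptote(COS_TO_GOLD[-1], losses[-1], r)
    print(f"ratio={r:.2f}  implied cos asymptote={limit:.4f}")
```

A power-law decay has no such finite tail formula in general, which is why the longer k=30 run is the cleaner way to tell the two apart.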