Per-activation error = mean of the 10 temp-1.0 sample NMSEs from the best-of-N run (mean-of-10 FVE 0.772 ≈ greedy 0.775). Bucket FVEs use the train denominator 0.0579 as throughout the series. The 200 eval docs (UFW en 100000–100199) were re-streamed and positions re-derived with the repo’s sampler (seed 42, 5/doc, min 50): all 1000 (doc, pos) pairs matched meta_1000.json exactly, and the recomputed variance baseline reproduced (0.0575).
Thirteen features per sample, in four families: gold geometry (atypicality = per-sample predict-the-mean error, whose mean 0.0575 is the FVE denominator — so it is each sample’s own baseline; raw norm; doc-mate cos), token (word-start vs subword, punctuation, digit, length, Zipf frequency, POS via spaCy on the 240-char context tail), document (length, relative position, alpha fraction, line length), and explanation (char length, n quoted spans, quoted fraction).
§1
What correlates with error
Spearman · n=1000
Fig. 1 — Spearman correlation of per-activation NMSE with each feature. Gold-vector geometry dominates (atypicality +0.672, raw norm −0.645, doc-mate cos −0.503); document-position features are noise-level; expl_nspans is not significant.
§2
Atypicality: big gradient, mostly mechanical
own-baseline rescoring
FVE by atypicality quartile under the global denominator falls from 0.861 to 0.651 — apparently a steep difficulty gradient. But each sample’s atypicality is its own predict-the-mean baseline, so far-from-mean samples have more variance to explain by construction. Rescoring each sample against its own baseline (own-FVE = 1 − msei/atypi) nearly flattens it.
FVE, global denominator
Q1
0.861 FVE
Q2
0.814 FVE
Q3
0.762 FVE
Q4
0.651 FVE
FVE, own baseline (1 − mse/atyp)
Q1
0.796 FVE
Q2
0.779 FVE
Q3
0.762 FVE
Q4
0.760 FVE
0.6FVE0.9
Fig. 2 — FVE by atypicality quartile (Q1 most typical → Q4 most atypical). Global denominator: 0.861 / 0.814 / 0.762 / 0.651. Own-baseline: 0.796 / 0.779 / 0.762 / 0.760 — the gradient nearly vanishes. Spearman(atyp, own-FVE) = −0.14: a real but small residual penalty.
Finding —Most of the r=+0.67 is denominator mechanics; the AV’s proportional fidelity is nearly flat across the activation distribution — it tracks unusual activations almost as well, proportionally, as typical ones. Robustness echo of the adjacent-layer result (REPORT_01 §6): the vector input side degrades gracefully.
§3
Token-type strata: the digit hole
non-lexical material
Fig. 3 — FVE by token type. Word-start tokens (the bulk of the eval) decode at 0.799; digit tokens at 0.561 — the channel's true weak spot. Dashed line: the 0.775 headline.
POS among word-start tokens is flat — the verbalization channel handles all word classes equally; its weak spot is non-lexical material. Only PROPN dips.
word-start POS classes
PROPN
0.759 FVE
ADP
0.788 FVE
PRON
0.793 FVE
NOUN
0.796 FVE
VERB
0.802 FVE
ADJ
0.807 FVE
ADV
0.823 FVE
AUX
0.845 FVE
0.7FVE0.9
Fig. 4 — FVE by POS among word-start tokens, on a deliberately tight domain. NOUN 0.796, VERB 0.802, ADJ 0.807, ADV 0.823, AUX 0.845, ADP 0.788, PRON 0.793 cluster tightly; PROPN (0.759) is the only dip. Cells under n=15 suppressed.
The worst-12 list says it plainly — citation, date, and numeric contexts dominate:
11:145-50page range, journal citation
(2015citation year
(00:24timestamp
the 1of “1990s”-style
,comma in an address list
GSTRtax-form date
Finding —Best-6 samples are function/content words in clean argumentative prose (mse ≈ 0.003–0.004, FVE > 0.93). Coherent with REPORT_01 §3: spans are semantic reconstructions, lexically inexact — a 148-token confabulated-span explanation cannot pin arbitrary digit strings; that is exactly the content semantic paraphrase cannot carry. Direct follow-up: idea #12, random-string channel capacity.
§4
Document effects: cleanliness yes, depth no
quartile FVEs
doc alpha-frac (dirty → clean)
Q1
0.713 FVE
Q2
0.759 FVE
Q3
0.797 FVE
Q4
0.819 FVE
doc length (72 → 2048 tok)
Q1
0.763 FVE
Q2
0.755 FVE
Q3
0.781 FVE
Q4
0.789 FVE
relative position in doc
Q1
0.783 FVE
Q2
0.781 FVE
Q3
0.764 FVE
Q4
0.760 FVE
0.7FVE0.8
Fig. 5 — FVE by document-feature quartile, shared domain. Alpha-fraction (dirty→clean) spans 0.713→0.819 — boilerplate/listy docs cost ~0.11 FVE. Doc length (72→2048 tok) and relative position are near-flat (≤0.03 spread).
Finding —The ~0.11 FVE cleanliness cost is consistent with “pure UFW prose is the clean end of the train mix” (REPORT_01 §1). Depth-in-context is worth ≤0.03: what L41 encodes at a position, as read by this NLA, is mostly local. This deflates the expected payoff of the context-length-isolation follow-up (REPORT_02 §8.6).
§5
Explanation length: marker, not mechanism
demeaned within-activation
Across activations, longer and quote-denser explanations accompany lower error (expl_len r=−0.41, expl_qfrac r=−0.36). But within an activation — across its 10 temp-1 samples, activation fixed effects removed via demeaning (10,000 samples) — the correlation evaporates. The contrast is the result.
Fig. 6 — Spearman(mse, expl_len) across vs within activations. Across: −0.406. Within (demeaned, 10,000 samples): −0.002, p=0.86; n_spans −0.034, p<0.001 but negligible.
Finding —The AV writes longer when there is decodable content; sampling a longer explanation for a fixed vector buys nothing. Independent cheap-side confirmation that the 148-token RL-pinned budget isn’t the binding constraint sample-to-sample — more test-time tokens don’t help, matching the best-of-N convergence (REPORT_01 §5).
§6
Predictability of failure
doc-level 5-fold CV
0.556
cv-ρ², all 13 features
rank-OLS, no doc leakage
0.451
cv-ρ², atypicality alone
one feature, no decode
pre-decode
strongest predictors
atypicality · raw norm · doc-mate cos
free
coverage flag
no AV call needed
Over half of per-sample error rank is predictable before any decode — and the strongest predictors are computable from the activation alone.
Operational consequence —A calibrated “this sample will decode badly” flag is free. The OpenInterp critique’s legitimate half — FVE won’t warn you off-distribution (REPORT_04 §11) — has a cheap mitigation: distance-from-training-mean as a coverage gate, no AV call needed.
§7
Caveats
observational
#
caveat
1
Per-activation error is temp-1 mean-of-10, not greedy (aggregate gap 0.003 FVE; rank order essentially shared).
2
Strata are observational and correlated (dirty docs contain the digits and punctuation; atypicality correlates with doc_alpha_frac) — the rank-OLS handles joint attribution only in rank space. No causal claims.
3
n=39 for digit tokens (FVE 0.561 ± ~0.05); POS cells under n=15 suppressed.
4
Same 200-doc clustering caveat as the whole series; spaCy tags on 240-char tails are approximate for the first words of the tail (the target token is always tail-final, so its tag is reliable).
§8
Verdict
Verdict7/10
The headline 0.775 decomposes into ~0.80–0.86 on the clean-prose lexical material that dominates the eval mix, with a tolerated tail of digits (0.56), punctuation (0.71), boilerplate docs (0.71), and far-from-mean activations (0.65 global / ~0.76 own-baseline). Nothing here weakens the release — the strata are exactly what the confabulated-semantic-span mechanism (REPORT_01 §3–4) predicts, and the proportional-fidelity flatness across the activation distribution is a genuinely good property. The two actionable outputs: digits as the measured capacity boundary of the text channel, and a free pre-decode difficulty flag. Cheap, clarifying, no surprises that contradict the series; raises the value of follow-up #12 (random-string capacity) and lowers #6 (context-length isolation).