Excitement Index — Methodology

Reference document for the Excitement Index column shown on every dataset card. Authored 2026-04-26 from the 2026-04-25 candidate-metrics literature review and the 2026-04-26 rerun on permissive positives across all four lenses.

source · methodology/excitement_index.html

One-paragraph summary

For each lens, every latent dimension is a column of roughly 17,000 factor loadings, one per item. We want a label-free, column-only statistic that tells us which dimensions are likely to be selected by the downstream stepwise evaluator (ACE-rank → EBIC stop) when fit against any held-out target. Across 114 candidate column statistics evaluated on permissive-selected positives across all four lenses (Health, Behavioral Sciences, Social Economics, Technology), the simplest one wins: the raw standard deviation of the loading column. We call it the Excitement Index because high-SD columns "get excited" about specific subsets of items rather than spreading their variance evenly. We display each lens's top-25 most-excited dimensions on every dataset card, and color a dimension green if it was selected by that card's mirroring regression and red if it was not.
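As a rough illustration of the card display rule, the sketch below ranks dimensions by Excitement and attaches a color. The function name `top_excited`, the dictionary layout, and the dimension names are hypothetical and are not taken from the card-rendering code.

```python
def top_excited(excitement, selected, k=25):
    """Return the k highest-Excitement dimensions with a display color.

    excitement: dict mapping dimension name -> Excitement value (assumed layout)
    selected:   set of dimension names chosen by the card's mirroring regression
    """
    ranked = sorted(excitement.items(), key=lambda kv: kv[1], reverse=True)[:k]
    return [
        (dim, score, "green" if dim in selected else "red")
        for dim, score in ranked
    ]

# Toy values for illustration only.
toy_excitement = {"dim_001": 0.91, "dim_002": 0.12, "dim_003": 0.55}
toy_selected = {"dim_003"}
rows = top_excited(toy_excitement, toy_selected, k=2)
print(rows)
```

With these toy inputs the two rows returned are `dim_001` (red, not selected) followed by `dim_003` (green, selected).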

Definition & formula

Let $L_i$ be the loading vector of latent dimension $i$ across all items in the lens's training corpus (≈17,180 items for the Health Lens, similar for the others). Then:

$$\mathrm{Excitement}_i = \mathrm{sd}(L_i)$$

That is, the unbiased sample standard deviation of the raw, un-standardized loadings on a single dimension across all items. No target, no covariates, no transforms.
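A minimal sketch of the definition, using only the Python standard library. The function name `excitement_index` and the dict-of-columns layout are assumptions made for illustration; the production scripts may store loadings differently.

```python
import statistics

def excitement_index(loadings_by_dim):
    """Map each dimension to the unbiased sample SD of its raw loadings.

    statistics.stdev divides by (n - 1), matching the "unbiased sample
    standard deviation" in the definition.
    """
    return {dim: statistics.stdev(col) for dim, col in loadings_by_dim.items()}

# Toy columns (hypothetical values; real columns have about 17,000 items).
toy = {
    "dim_a": [0.0, 0.0, 0.0, 2.0],  # one item loads heavily
    "dim_b": [0.5, 0.5, 0.5, 0.5],  # every item loads identically
}
scores = excitement_index(toy)
print(scores)
```

Here `dim_a` scores 1.0 and `dim_b` scores 0.0. For a full loading matrix held as a NumPy array with items in rows, `numpy.std(L, axis=0, ddof=1)` computes the same quantity column by column.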

Why this matters

The PsyProxy stepwise evaluator picks features greedily by ACE-rank, stopping at the EBIC criterion (γ=1 strict, γ=0.5 permissive). Two dimensions with the same mean loading magnitude can have very different selection probabilities: the one whose loadings are sharply concentrated on a small subset of items carries more discriminative information, while the one whose loadings are spread evenly across all items behaves like noise. Excitement Index captures that "sharp vs. spread" distinction in a single number that requires no target.
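The "sharp vs. spread" contrast can be made concrete with two synthetic columns that share the same mean loading magnitude. The values below are invented for illustration and do not come from any lens.

```python
import statistics

n = 1000
# "Spread": every item loads at magnitude 0.1, with alternating sign.
spread = [0.1 if i % 2 == 0 else -0.1 for i in range(n)]
# "Sharp": the first 5% of items load at magnitude 2.0 (alternating sign),
# and the remaining 95% load at exactly zero.
sharp = [(2.0 if i % 2 == 0 else -2.0) if i < 50 else 0.0 for i in range(n)]

def mean_abs(x):
    """Mean absolute loading."""
    return sum(abs(v) for v in x) / len(x)

spread_sd = statistics.stdev(spread)
sharp_sd = statistics.stdev(sharp)
print(mean_abs(spread), mean_abs(sharp))  # both approximately 0.1
print(spread_sd, sharp_sd)                # sharp column has the larger SD
```

Both columns have a mean absolute loading of about 0.1, but the sharp column's SD is roughly 0.45 against roughly 0.10 for the spread column, so raw SD separates them where mean magnitude cannot.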

The 2026-04-26 rerun — result leaderboard

Permissive-selected dimensions across all four lenses (n = 50 positives per lens) served as the "ever-selected" class. We ran ACE-transformed Wilcoxon rank-sum tests for every candidate metric against that label, separately per lens, then averaged the absolute z-statistic across lenses. Top 10:

 #  metric              mean |z|  sd |z|  n lenses  min p
 1  raw_sd                11.918   0.047         4  5.7e-33
 2  cvar95_abs            11.739   0.106         4  2.1e-32
 3  cvar99_abs            10.788   0.282         4  9.8e-29
 4  l2_raw                 9.209   1.276         4  2.3e-25
 5  winsor_sd10_raw        5.103   2.174         4  2.9e-13
 6  trimmed_sd10_raw       4.725   2.164         4  1.9e-12
 7  clr_var_absmass        4.648   2.560         4  1.3e-13
 8  gini_index_signal      4.326   2.669         4  1.1e-13
 9  top_k_mass_5pct        4.301   2.634         4  1.4e-13
10  atkinson_eps1_abs      4.292   2.748         4  1.2e-13

Source: data/bmv3/derived/proxy_selection_metrics/ace_metric_ranking_2026-04-26.csv.

Raw SD leads every other candidate on mean |z|, and its sd of |z| across lenses (0.047, vs. 0.106 or more for the runners-up) indicates it is also the most stable across lenses.

Discrimination test methodology

For each candidate metric and each lens:

1. Compute the metric for every dimension in the lens's loading matrix (e.g., 1,100 dims for the Health Lens).
2. Apply the ACE smoother (Alternating Conditional Expectations, Breiman & Friedman 1985) to the metric values, transforming them onto a monotone scale that maximizes their correlation with the binary "ever-selected" label. This handles non-linear relationships symmetrically across candidates.
3. Run a Wilcoxon rank-sum test (Mann-Whitney U) on the ACE-transformed values, separating positives (ever-selected dims for this lens) from negatives (never-selected).
4. Record the absolute z-statistic and the two-sided p-value.
5. Aggregate across the four lenses by averaging |z|. Higher mean |z| ⇒ better discrimination. Stable mean |z| (low sd across lenses) ⇒ generalizes.
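The rank-sum and aggregation steps above can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the rerun script: the inputs are assumed to be already ACE-transformed (the ACE step is omitted), the z-statistic uses the normal approximation without tie or continuity correction, and the names `rank_sum_test`, `aggregate`, and the toy lens data are hypothetical.

```python
import math
import statistics

def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # mean of 1-based positions start+1..end+1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks

def rank_sum_test(positives, negatives):
    """Wilcoxon rank-sum z (normal approximation) and two-sided p-value."""
    n1, n2 = len(positives), len(negatives)
    ranks = average_ranks(list(positives) + list(negatives))
    w = sum(ranks[:n1])                      # rank sum of the positive class
    mu = n1 * (n1 + n2 + 1) / 2              # expected rank sum under H0
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - mu) / sigma
    p = math.erfc(abs(z) / math.sqrt(2))     # equals 2 * (1 - Phi(|z|))
    return z, p

def aggregate(per_lens):
    """Mean and sample SD of |z| across lenses."""
    abs_z = [abs(rank_sum_test(pos, neg)[0]) for pos, neg in per_lens.values()]
    return statistics.mean(abs_z), statistics.stdev(abs_z)

# Toy metric values per lens: (ever-selected dims, never-selected dims).
toy = {
    "lens_1": ([5.0, 6.0, 7.0, 8.0], [1.0, 2.0, 3.0, 4.0]),
    "lens_2": ([4.0, 6.0, 7.0, 8.0], [1.0, 2.0, 3.0, 5.0]),
}
z1, p1 = rank_sum_test(*toy["lens_1"])
mean_abs_z, sd_abs_z = aggregate(toy)
print(z1, p1, mean_abs_z, sd_abs_z)
```

A metric is ranked by `mean_abs_z`, and `sd_abs_z` plays the role of the "sd |z|" column in the leaderboard.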

Why we tested 114 candidate metrics

The original 2026-03 benchmark was 62 statistics: 2 loading bases (raw, z-scored) × 4 transforms (none, SUPSMU, monotone, linear) × 7 base statistics (sd, IQR, kurtosis, |skew|, mean|x|, Gini, prop_above_1sd) + self-fit R². Raw SD already won that round at mean |z| ≈ 2.16. To make sure we hadn't missed an obvious better candidate, we cast a wider net for the 2026-04-25 expansion: 52 additional statistics drawn from twelve literatures (sparsity/inequality, entropy, multimodality, robust scale, higher moments, tail-weight, complexity/fractality, spectral analysis, factor-analysis utility, item-discrimination psychometrics, randomness/structure, and matrix-spectral feature ranking). The full annotated table follows.

The 52 expanded candidate metrics — with formulas, justifications, and citations

Drawn from research/excitement_lit_review/claude_lit_review_2026-04-25.md. Let x be the length-n vector of loadings on a single dimension (n ≈ 17,000). Most metrics are O(n) or O(n log n); a few are subsampled to 5,000 to stay tractable.

1. Sparsity / Inequality (12 metrics)

gini_index_signal
    formula sketch: 2·Σ(i · x_sorted_abs[i]) / (n · Σ|x|) − (n+1)/n
    citation: Hurley & Rickard (2009). Comparing Measures of Sparsity. IEEE TIT 55(10):4723–41.
hoyer_sparsity
    formula sketch: (√n − L1/L2) / (√n − 1)
    citation: Hoyer (2004). Non-negative Matrix Factorization with Sparseness Constraints. JMLR 5:1457–69.
theil_index_T
    formula sketch: (1/n)·Σ ((|x_i|/mean|x|) · log(|x_i|/mean|x|))
    citation: Theil (1967). Economics and Information Theory. North-Holland.
theil_index_L
    formula sketch: (1/n)·Σ log(mean|x|/|x_i|)
    citation: Theil (1967).
atkinson_eps_0p5
    formula sketch: 1 − (mean|x|^0.5)² / mean|x|
    citation: Atkinson (1970). On the Measurement of Inequality. J. Econ. Theory 2(3):244–63.
atkinson_eps_2
    formula sketch: 1 − 1/mean|x| · (mean|x|⁻¹)⁻¹
    citation: Atkinson (1970).
pietra_robin_hood
    formula sketch: 0.5 · Σ |(|x_i| − mean|x|)| / Σ|x|
    citation: Pietra (1915); Hoover (1936).
lorenz_area
    formula sketch: 1 − 2 · trapz(Lorenz_curve(|x|))
    citation: Gastwirth (1972). Estimation of the Lorenz Curve and Gini Index. RES 54(3).
herfindahl
    formula sketch: Σ((|x_i|/Σ|x|)²)
    citation: Herfindahl (1950); Hirschman (1945).
top_k_mass_5pct
    formula sketch: Σ(top 5% of |x|) / Σ|x|
    citation: Concentration ratio / Pareto principle.
effective_rank_entropy
    formula sketch: exp(−Σ p_i log p_i) where p_i = x_i² / Σx²
    citation: Roy & Vetterli (2007). Effective Rank: A Measure of Effective Dimensionality. EUSIPCO.
participation_ratio
    formula sketch: (Σx²)² / Σx⁴
    citation: Bell & Dean (1970). Atomic vibrations in vitreous silica. Discuss. Faraday Soc. 50:55–61.

2. Entropy (12 metrics)

shannon_entropy_kde
    formula sketch: −∫ f(x) log f(x) dx via KDE
    citation: Beirlant et al. (1997). Nonparametric Entropy Estimation. Int. J. Math. Stat. Sci. 6:17–39.
histogram_entropy_50bin
    formula sketch: −Σ p_b

The original 62-metric structure (2026-03 benchmark)

For completeness: the 2026-03 benchmark covered 2 loading bases × 4 transforms × 7 base statistics + self-fit R² = 56 + 6 = 62 metrics:

- Loading bases: raw loadings; z-scored loadings (mean 0, sd 1 per column).
- Transforms: none, SUPSMU (Friedman 1984 super smoother on the sorted x), monotone (PAVA isotonic regression of the sorted x against rank), linear (best-fit OLS of the sorted x against rank).
- Base statistics: sd, IQR (q75 − q25), excess kurtosis, |skew|, mean|x|, Gini coefficient, prop_above_1sd (fraction of items with |x_i| greater than 1 sd of x).
- Plus self-fit R²: how well the dim's loading vector linearly fits its own rank-curve under each transform.

Of those 62, raw_sd (no transform) was the winner at mean |z| ≈ 2.16. The 2026-04-26 rerun expanded the candidate set with the 52 above, broadened positives to permissive selections across all four lenses, and saw raw_sd's effect size jump to 11.92 simply because the larger positives pool gave the test more power.
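For orientation, the seven base statistics on the raw base with no transform might be computed as in the sketch below. Several details are assumptions rather than facts about scripts/compute_ace_transformed_metrics.py: moments use the population (divide-by-n) form, quartiles use the "inclusive" method, and the Gini coefficient is taken over absolute loadings using the sorted-rank formula. The function name `base_statistics` is hypothetical.

```python
import statistics

def base_statistics(x):
    """Seven base statistics for one loading column (assumes a non-constant column)."""
    n = len(x)
    mean = statistics.fmean(x)
    sd = statistics.stdev(x)  # unbiased sample SD, as in the Excitement definition
    q1, _, q3 = statistics.quantiles(x, n=4, method="inclusive")

    # Central moments in population form (an assumption for this sketch).
    m2 = sum((v - mean) ** 2 for v in x) / n
    m3 = sum((v - mean) ** 3 for v in x) / n
    m4 = sum((v - mean) ** 4 for v in x) / n

    # Gini over absolute loadings, ascending sort, 1-based rank weights.
    abs_sorted = sorted(abs(v) for v in x)
    total = sum(abs_sorted)
    gini = (2 * sum(i * v for i, v in enumerate(abs_sorted, start=1))
            / (n * total) - (n + 1) / n)

    return {
        "sd": sd,
        "iqr": q3 - q1,
        "excess_kurtosis": m4 / m2 ** 2 - 3,
        "abs_skew": abs(m3 / m2 ** 1.5),
        "mean_abs": total / n,
        "gini": gini,
        "prop_above_1sd": sum(1 for v in x if abs(v) > sd) / n,
    }

stats = base_statistics([0.0, 0.0, 0.0, 2.0])  # toy column, illustration only
print(stats)
```

On the toy column, where a single item carries all the loading mass, `sd` is 1.0, `gini` is 0.75 (the maximum for four items), and `prop_above_1sd` is 0.25.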

Why "Excitement"?

Dimensions with high raw SD have sharply differentiated loading profiles — some items load heavily, most don't. The dimension "gets excited" about a specific subset. Low-SD dimensions spread their variance evenly across all items; they fail to discriminate. The PsyProxy stepwise evaluator empirically agrees: ever-selected dimensions have systematically higher raw SD than never-selected ones, by ≈12 standard deviations on the Wilcoxon statistic.

Source files

- Lit review (52 candidates, full annotated): research/excitement_lit_review/claude_lit_review_2026-04-25.md
- Original 62-metric implementation: scripts/compute_ace_transformed_metrics.py
- 21 raw-base metrics implementation: scripts/compute_proxy_selection_metrics.py
- 2026-04-26 rerun script: scripts/research/rerun_excitement_metrics.py
- 2026-04-26 rerun log: data/processed/excitement_rerun_2026-04-26.log
- Output leaderboard: data/bmv3/derived/proxy_selection_metrics/ace_metric_ranking_2026-04-26.csv
- Per-dimension Excitement values (DB-resident, served to dataset cards): registry.db.proxy_excitement table.

References (consolidated)

Anderson, T. W. & Darling, D. A. (1952). Asymptotic theory of certain "goodness-of-fit" criteria. Annals Math. Stat. 23:193–212.
Atkinson, A. B. (1970). On the Measurement of Inequality. J. Econ. Theory 2(3):244–263.
Bandt, C. & Pompe, B. (2002). Permutation Entropy: A Natural Complexity Measure for Time Series. Phys. Rev. Lett. 88:174102.
Bell, R. J. & Dean, P. (1970). Atomic vibrations in vitreous silica. Discuss. Faraday Soc. 50:55–61.
Birnbaum, A. (1968). Some latent trait models, in F. M. Lord & M. R. Novick, Statistical Theories of Mental Test Scores. Addison-Wesley.
Breiman, L. & Friedman, J. H. (1985). Estimating Optimal Transformations for Multiple Regression and Correlation. JASA 80(391):580–598.
Comrey, A. & Lee, H. (1992). A First Course in Factor Analysis 2e. Erlbaum.
Cover, T. & Thomas, J. (2006). Elements of Information Theory 2e. Wiley.
Daubechies, I. (1992). Ten Lectures on Wavelets. SIAM.
Fano, U. (1947). Ionization yield of radiations. Phys. Rev. 72(1):26–29.
Friedman, J. H. (1984). A Variable Span Smoother. Stanford LCS Tech. Rep. 5.
Hampel, F. R. (1974). The Influence Curve and its Role in Robust Estimation. JASA 69:383–393.
Hartigan, J. A. & Hartigan, P. M. (1985). The Dip Test of Unimodality. Annals of Statistics 13(1):70–84.
Higuchi, T. (1988). Approach to an irregular time series on the basis of the fractal theory. Physica D 31(2):277–283.
Hill, B. M. (1975). A Simple General Approach to Inference About the Tail of a Distribution. Annals of Stat. 3(5):1163–1174.
Hjorth, B. (1970). EEG analysis based on time domain properties. EEG Clin. Neurophys. 29:306–310.
Hosking, J. R. M. (1990). L-moments. JRSSB 52(1):105–124.
Hoyer, P. O. (2004). Non-negative Matrix Factorization with Sparseness Constraints. JMLR 5:1457–1469.
Hurley, N. & Rickard, S. (20