PsyProxy Context Validity — Top 3 Cross-Dataset Transfer (strict)

Frame from Larsen et al. (2025) — Context validity = the extent to which a measurement or model trained in one substantive context behaves consistently when applied to a different one.

source · context_validity_transfer_top3.html

For each target dataset, the table lists the three transfer models that scored highest on the target's holdout fold. Only cross-dataset transfer is shown; within-dataset transfer across different DVs is excluded. The best lens is picked per (target, source) pair.

| Type | # | Target dataset | Transfer model | Proxy overlap | Best lens | n_h | Score |
|---|---|---|---|---|---|---|---|
| ord | 1 | Amazon video-game reviews: star rating (regression variant) | Disneyland reviews star rating (ordinal) | 7 / 15 | Social Economics Lens | 59,998 | r=0.656 |
| ord | 2 | Amazon video-game reviews: star rating (regression variant) | ACL-IMDB movie-review sentiment | 6 / 14 | Behavioral Sciences Lens | 59,998 | r=0.583 |
| ord | 3 | Amazon video-game reviews: star rating (regression variant) | Sentiment140 Twitter sentiment | 3 / 15 | Health Lens | 59,998 | r=0.576 |
| ord | 1 | Disneyland reviews star rating (ordinal) | Amazon video-game reviews: star rating (regression variant) | 5 / 14 | Health Lens | 8,532 | r=0.591 |
| ord | 2 | Disneyland reviews star rating (ordinal) | ACL-IMDB movie-review sentiment | 6 / 14 | Behavioral Sciences Lens | 8,532 | r=0.568 |
| ord | 3 | Disneyland reviews star rating (ordinal) | Sentiment140 Twitter sentiment | 2 / 14 | Health Lens | 8,532 | r=0.509 |
| ord | 1 | Douban movie reviews, 100K native Chinese (Eval 11) | Amazon video-game reviews: star rating (regression variant) | 0 / 15 | Behavioral Sciences Lens | 20,000 | r=0.451 |
| ord | 2 | Douban movie reviews, 100K native Chinese (Eval 11) | ACL-IMDB movie-review sentiment | 0 / 14 | Behavioral Sciences Lens | 20,000 | r=0.433 |
| ord | 3 | Douban movie reviews, 100K native Chinese (Eval 11) | Disneyland reviews star rating (ordinal) | 0 / 15 | Behavioral Sciences Lens | 20,000 | r=0.428 |
| ord | 1 | Druglib drug-overall-rating reviews | Amazon video-game reviews: star rating (regression variant) | 0 / 12 | Social Economics Lens | 824 | r=0.297 |
| ord | 2 | Druglib drug-overall-rating reviews | Disneyland reviews star rating (ordinal) | 2 / 12 | Social Economics Lens | 824 | r=0.269 |
| ord | 3 | Druglib drug-overall-rating reviews | RateMyProfessor full (Eval 15) | 0 / 15 | Behavioral Sciences Lens | 824 | r=0.199 |
| ord | 1 | Druglib drug-effectiveness reviews | Amazon video-game reviews: star rating (regression variant) | 0 / 10 | Health Lens | 824 | r=0.225 |
| ord | 2 | Druglib drug-effectiveness reviews | Douban movie reviews, 100K native Chinese (Eval 11) | 0 / 14 | Social Economics Lens | 824 | r=0.207 |
| ord | 3 | Druglib drug-effectiveness reviews | Amazon video-game reviews: verified-purchase binary | 1 / 14 | Social Economics Lens | 824 | r=0.182 |
| ord | 1 | LIAR political-claim truth ratings | Depression vs Anxiety (4-lens eval) | 0 / 2 | Health Lens | 2,568 | r=0.221 |
| ord | 2 | LIAR political-claim truth ratings | Depression vs Anxiety ZAV sentences (4-lens eval) | 0 / 1 | Health Lens | 2,5
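The selection rule described for this table (best lens per (target, source) pair, cross-dataset only, top three per target) can be sketched as follows. This is a minimal illustration, not the report's actual pipeline: the `cells` rows, the short dataset identifiers, and the function name `top_k_transfers` are hypothetical, and the sketch treats "within-dataset" as target and source sharing the same identifier.

```python
from collections import defaultdict

# Hypothetical cells: (target, source, lens, score). Scores are made up.
cells = [
    ("amazon", "disney", "Social Economics Lens", 0.66),
    ("amazon", "disney", "Health Lens", 0.61),
    ("amazon", "imdb", "Behavioral Sciences Lens", 0.58),
    ("amazon", "sent140", "Health Lens", 0.57),
    ("amazon", "douban", "Technology Lens", 0.40),
    ("amazon", "amazon", "Health Lens", 0.90),  # within-dataset: excluded
    ("disney", "amazon", "Health Lens", 0.59),
    ("disney", "imdb", "Behavioral Sciences Lens", 0.57),
]


def top_k_transfers(cells, k=3):
    """Return the k best sources per target, using the best lens per pair."""
    best = {}  # (target, source) -> (lens, score)
    for target, source, lens, score in cells:
        if target == source:
            continue  # keep cross-dataset transfer only
        key = (target, source)
        if key not in best or score > best[key][1]:
            best[key] = (lens, score)

    by_target = defaultdict(list)
    for (target, source), (lens, score) in best.items():
        by_target[target].append((source, lens, score))

    # Sort each target's sources by score, highest first, and keep k of them.
    return {
        target: sorted(rows, key=lambda row: row[2], reverse=True)[:k]
        for target, rows in by_target.items()
    }


top3 = top_k_transfers(cells)
for target, rows in top3.items():
    for rank, (source, lens, score) in enumerate(rows, start=1):
        print(target, rank, source, lens, score)
```

A real run would need a dataset identifier that is separate from the DV, since the table contains two Amazon video-game DVs (star rating and verified-purchase binary) that belong to the same dataset.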

Strict vs permissive (paired on identical target / source / lens cells)

| Statistic | Value |
|---|---|
| paired cells | 986 |
| mean strict | 0.2833 |
| mean permissive | 0.2774 |
| mean diff (perm − strict) | -0.0059 |
| Wilcoxon signed-rank | W = 199370, p = 0.06137 |
| strict better | 470 / 986 (47.7%) |
| permissive better | 456 / 986 (46.2%) |
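As a consistency check, the reported p-value can be approximately recovered from the summary statistics alone. The sketch below rests on assumptions that the report does not state: that the 60 cells counted for neither variant are exact ties dropped before ranking, that W is the smaller of the two signed-rank sums, and that the p-value is the two-sided normal approximation without tie or continuity correction.

```python
import math

# Values reported in the strict vs permissive table.
n_cells = 986
strict_better = 470
permissive_better = 456
w_reported = 199370

# Assumption: cells where neither variant wins are zero differences and are dropped.
n = strict_better + permissive_better  # non-zero paired differences
ties = n_cells - n

# Null distribution of the signed-rank sum under the normal approximation.
mean_w = n * (n + 1) / 4
sd_w = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)

z = (w_reported - mean_w) / sd_w
p_two_sided = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * Phi(-|z|)

print(f"n={n}, ties={ties}, z={z:.3f}, p={p_two_sided:.4f}")
```

Under these assumptions the approximation gives p ≈ 0.061, in line with the reported 0.06137, which is above the conventional 0.05 threshold.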

Per-lens performance (strict only, all cross-dataset cells)

| Lens | n cells | mean primary metric | fair-pair win-rate* |
|---|---|---|---|
| Behavioral Sciences Lens | 224 | 0.2880 | 27 / 154 (17.5%) |
| Health Lens | 314 | 0.2781 | 51 / 154 (33.1%) |
| Social Economics Lens | 224 | 0.2905 | 45 / 154 (29.2%) |
| Technology Lens | 224 | 0.2789 | 31 / 154 (20.1%) |

* Fair-pair win-rate = how often this lens scored highest on (target, source) pairs where all 4 lenses ran (154 such pairs).
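The fair-pair win-rate defined in the footnote can be computed along these lines. This is a sketch under stated assumptions: the `scores` mapping, its dataset identifiers, and its values are hypothetical, and ties for the top score are broken by lens order, which the report does not specify.

```python
LENSES = (
    "Behavioral Sciences Lens",
    "Health Lens",
    "Social Economics Lens",
    "Technology Lens",
)

# Hypothetical per-pair scores: (target, source) -> {lens: score}.
scores = {
    ("amazon", "disney"): {
        "Behavioral Sciences Lens": 0.60, "Health Lens": 0.61,
        "Social Economics Lens": 0.66, "Technology Lens": 0.58,
    },
    ("disney", "amazon"): {
        "Behavioral Sciences Lens": 0.55, "Health Lens": 0.59,
        "Social Economics Lens": 0.57, "Technology Lens": 0.52,
    },
    ("douban", "imdb"): {
        "Behavioral Sciences Lens": 0.43, "Health Lens": 0.39,
        "Social Economics Lens": 0.41, "Technology Lens": 0.40,
    },
    # Only one lens ran here, so this pair is not a fair pair.
    ("liar", "dep_anx"): {"Health Lens": 0.22},
}


def fair_pair_win_rates(scores, lenses=LENSES):
    """Count wins per lens over pairs where every lens has a score."""
    fair = [s for s in scores.values() if all(lens in s for lens in lenses)]
    wins = {lens: 0 for lens in lenses}
    for pair_scores in fair:
        # max() returns the first lens on ties (an arbitrary tie-break).
        winner = max(lenses, key=lambda lens: pair_scores[lens])
        wins[winner] += 1
    return {lens: (wins[lens], len(fair)) for lens in lenses}


rates = fair_pair_win_rates(scores)
for lens, (won, total) in rates.items():
    print(f"{lens}: {won} / {total} ({100 * won / total:.1f}%)")
```

With one winner per fair pair the win counts must sum to the number of fair pairs, which holds for the reported table: 27 + 51 + 45 + 31 = 154.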

Patterns observed

Strict regressions transfer slightly better than permissive ones, but the difference is not significant at the 0.05 level. The mean difference (perm − strict) is -0.0059, in strict's favor (Wilcoxon p = 0.0614), and the per-cell win counts are nearly even (470 vs 456 of 986). At most, this suggests that the 15-feature cap limits overfitting without a measurable cost to transfer.

The four lenses split between two roles. Social Economics Lens and Behavioral Sciences Lens lead by per-cell mean (0.2905 and 0.2880), while Health Lens (BMv3) leads by fair-pair win-rate (33.1%) despite having the lowest per-cell mean (0.2781). Technology Lens leads on neither measure. BMv3 appears to win on the harder pairs (lowering its mean) while the other two compete on easier pairs (raising theirs).

Proxy overlap tracks transfer score within type. The brightest-green ordinal cells (Disney ↔ Amazon, IMDB → Disney) tend to share the largest fraction of selected proxies; the orange/yellow cells (Druglib, LIAR) share much less, consistent with their construct-space distance.

The "consumer-review polarity manifold" is the dominant cross-dataset signal: Disney ↔ Amazon ↔ IMDB ↔ Sentiment140 ↔ Douban all transfer to each other at AUC/|ρ| = 0.43–0.92, often dwarfing what the targets get from any in-domain source.

Health Lens dominates only when the construct space matches. LIAR truth-ordinal is the only target where a depression-vs-anxiety eval source outperforms the polarity-manifold sources, and BMv3 is the only winning lens for that target.
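The proxy-overlap pattern can be checked against the rows that are visible in the top-3 table. The sketch below computes a Spearman rank correlation between overlap fraction and score over the 16 complete rows (the truncated final row is left out). Reading "a / b" as a fraction of selected proxies is an assumption, and the helper names are illustrative.

```python
import math

# (shared proxies, proxies selected, r) for the 16 complete rows of the top-3 table.
rows = [
    (7, 15, 0.656), (6, 14, 0.583), (3, 15, 0.576),  # target: Amazon star rating
    (5, 14, 0.591), (6, 14, 0.568), (2, 14, 0.509),  # target: Disneyland
    (0, 15, 0.451), (0, 14, 0.433), (0, 15, 0.428),  # target: Douban
    (0, 12, 0.297), (2, 12, 0.269), (0, 15, 0.199),  # target: Druglib overall rating
    (0, 10, 0.225), (0, 14, 0.207), (1, 14, 0.182),  # target: Druglib effectiveness
    (0, 2, 0.221),                                   # target: LIAR (first row only)
]


def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho as the Pearson correlation of average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


overlap = [shared / total for shared, total, _ in rows]
score = [r for _, _, r in rows]
rho = spearman(overlap, score)
print(f"Spearman rho over {len(rows)} rows: {rho:.2f}")
```

On these 16 rows the rank correlation is about 0.72, which is consistent with the stated pattern. The sample is small and covers only top-3 cells, and the Douban rows show that zero overlap does not rule out moderate transfer (r = 0.43 to 0.45).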