Gold Medal Page

An Olympic-style medal tally across the v5 dataset cards. Each (dataset, target) pair is one competition. Only datasets with at least 5 criterion-model families competing count toward the medal table; uncontested entries (PsyProxy alone) are excluded. PsyProxy competes as one family (best of 4 lenses × strict/permissive, plus Sinhala translation arms). OpenAI competes as one family (best of Rathje-construct regression on GPT-4o-mini, GPT-4.1-nano, and GPT-5-nano). Lexicon-based and topic-model baselines each compete as their own singleton family. Primary metric per task type: binary = FVE (binomial), ordinal = quadratic κ, regression = R², multilabel/multiclass = macro-F1.
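The selection rule described above can be sketched in a few lines. This is a minimal illustration, not the page's actual pipeline: the function name `podium`, the input layout, and the example scores are all hypothetical, and it assumes every primary metric is higher-is-better.

```python
MIN_FAMILIES = 5  # eligibility threshold stated in the text


def podium(variant_scores, min_families=MIN_FAMILIES):
    """Return up to three (family, best_score) pairs, or [] if ineligible.

    variant_scores maps a family name to a list of primary-metric scores,
    one per variant (lens, model, ...). Higher is assumed to be better.
    """
    # Each family is represented by its best-scoring variant.
    best = {fam: max(vals) for fam, vals in variant_scores.items() if vals}
    # Competitions with too few families do not award medals.
    if len(best) < min_families:
        return []
    ranked = sorted(best.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:3]


# Hypothetical scores for one competition (not taken from the dataset cards).
example = {
    "PsyProxy": [0.41, 0.55, 0.48],
    "OpenAI (Rathje)": [0.30, 0.52],
    "Topic models": [0.33],
    "LIWC": [0.21],
    "VADER": [0.12],
}
print(podium(example))
```

Taking the maximum per family before ranking is what lets a multi-variant family such as PsyProxy or OpenAI occupy only one podium slot per competition.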

source · methodology/gold_medal_page.html

Medal table

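| Family | Gold | Silver | Bronze | Total |
|---|---|---|---|---|
| PsyProxy | 10 | 4 | 0 | 14 |
| OpenAI (Rathje) | 3 | 1 | 2 | 6 |
| Topic models | 1 | 4 | 1 | 6 |
| LIWC | 0 | 2 | 6 | 8 |
| TextDescriptives | 0 | 2 | 1 | 3 |
| VADER | 0 | 1 | 1 | 2 |
| Empath | 0 | 0 | 1 | 1 |
| TAACO | 0 | 0 | 1 | 1 |
| TAASSC | 0 | 0 | 1 | 1 |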

Metric policy

| Task type | Primary metric | Secondary checks |
|---|---|---|
| Binary | FVE | AUC, F1 |
| Ordinal | Quadratic κ | Within-one, MAE |
| Regression | R² | RMSE, MAE |
| Multiclass | Macro-F1 | Accuracy, macro-AUC |
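As an illustration of the policy above, the mapping from task type to primary metric can be written as a lookup, with the ordinal metric spelled out. This is a sketch under assumptions: `PRIMARY_METRIC` and `quadratic_weighted_kappa` are hypothetical names, labels are assumed to be integers from 0 to `n_classes - 1`, and the formula is the standard quadratically weighted Cohen's kappa, which may differ in detail from the implementation behind this page.

```python
from collections import Counter

# Primary metric per task type, as listed in the metric policy.
PRIMARY_METRIC = {
    "binary": "FVE",
    "ordinal": "quadratic kappa",
    "regression": "R2",
    "multiclass": "macro-F1",
}


def quadratic_weighted_kappa(y_true, y_pred, n_classes):
    """Cohen's kappa with quadratic disagreement weights.

    Assumes integer labels in 0..n_classes-1 and at least two distinct
    labels overall (otherwise the expected disagreement is zero).
    """
    n = len(y_true)
    observed = Counter(zip(y_true, y_pred))
    true_hist = Counter(y_true)
    pred_hist = Counter(y_pred)
    observed_disagreement = 0.0
    expected_disagreement = 0.0
    for i in range(n_classes):
        for j in range(n_classes):
            weight = (i - j) ** 2 / (n_classes - 1) ** 2
            observed_disagreement += weight * observed[(i, j)] / n
            expected_disagreement += weight * true_hist[i] * pred_hist[j] / (n * n)
    return 1.0 - observed_disagreement / expected_disagreement


print(quadratic_weighted_kappa([0, 1, 2, 3], [0, 1, 2, 3], n_classes=4))
```

Because the weights grow with the squared distance between labels, a prediction that is off by one rating step is penalised far less than one that is off by three, which is why this metric suits ordinal targets such as star ratings.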

Per-dataset podium

| Dataset | Task type | Metric | Families | Gold | Silver | Bronze |
|---|---|---|---|---|---|---|
| Druglib drug-effectiveness reviews | ordinal | Quad. κ | 10 | PsyProxy 0.590 | Topic models 0.322 | OpenAI (Rathje) 0.234 |
| Druglib drug-overall-rating reviews | regression | R² | 10 | PsyProxy 0.307 | TextDescriptives 0.163 | TAASSC 0.145 |
| LIAR political-claim truth ratings | ordinal | Quad. κ | 10 | PsyProxy 0.269 | LIWC 0.208 | TextDescriptives 0.183 |
| Disneyland reviews: Hong Kong vs California branch | binary | FVE | 10 | PsyProxy 0.622 | Topic models 0.524 | LIWC 0.114 |
| ACL-IMDB movie-review sentiment | binary | FVE | 10 | OpenAI (Rathje) 0.898 | PsyProxy 0.646 | LIWC 0.305 |
| GoEmotions Reddit comments multi-label emotion | multiclass | Macro-F1 | 9 | PsyProxy 0.165 | LIWC 0.089 | Empath 0.055 |
| Empathetic Dialogues context labels | multiclass | Macro-F1 | 9 | PsyProxy 0.257 | Topic models 0.097 | LIWC 0.085 |
| Amazon video-game reviews: verified-purchase binary | binary | FVE | 10 | PsyProxy 0.232 | TextDescriptives 0.187 | TAACO 0.177 |
| Sentiment140 Twitter sentiment | binary | FVE | 10 | OpenAI (Rathje) 0.435 | PsyProxy 0.303 | LIWC 0.195 |
| CFPB consumer complaints: explanation vs relief response | binary | FVE | 10 | Topic models 0.074 | PsyProxy 0.065 | LIWC 0.043 |
| Amazon video-game reviews: star rating (ordinal) | ordinal | Quad. κ | 10 | PsyProxy 0.767 | OpenAI (Rathje) 0.548 | VADER 0.452 |
| Amazon video-game reviews: star rating (regression variant) | regression | R² | 8 | PsyProxy 0.671 | VADER 0.339 | LIWC 0.316 |
| Reddit suicidality subset (binary) | binary | FVE | 10 | OpenAI (Rathje) 0.879 | PsyProxy 0.805 | Topic models 0.742 |
| Disneyland reviews: star rating | regression | R² | 10 | PsyProxy 0.604 | Topic models 0.579 | OpenAI (Rathje) 0.344 |
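The medal table can be recomputed from the fourteen podiums listed above. The sketch below hard-codes the gold, silver, and bronze families per competition in page order and sorts Olympic-style (gold, then silver, then bronze). The function name `medal_table` is hypothetical, and the alphabetical tie-break is an assumption that happens to reproduce the published row order.

```python
from collections import defaultdict

# (gold, silver, bronze) family per competition, in the order listed on the page.
PODIUMS = [
    ("PsyProxy", "Topic models", "OpenAI (Rathje)"),
    ("PsyProxy", "TextDescriptives", "TAASSC"),
    ("PsyProxy", "LIWC", "TextDescriptives"),
    ("PsyProxy", "Topic models", "LIWC"),
    ("OpenAI (Rathje)", "PsyProxy", "LIWC"),
    ("PsyProxy", "LIWC", "Empath"),
    ("PsyProxy", "Topic models", "LIWC"),
    ("PsyProxy", "TextDescriptives", "TAACO"),
    ("OpenAI (Rathje)", "PsyProxy", "LIWC"),
    ("Topic models", "PsyProxy", "LIWC"),
    ("PsyProxy", "OpenAI (Rathje)", "VADER"),
    ("PsyProxy", "VADER", "LIWC"),
    ("OpenAI (Rathje)", "PsyProxy", "Topic models"),
    ("PsyProxy", "Topic models", "OpenAI (Rathje)"),
]


def medal_table(podiums):
    """Return rows of (family, gold, silver, bronze, total), Olympic-sorted."""
    counts = defaultdict(lambda: [0, 0, 0])
    for row in podiums:
        for place, family in enumerate(row):
            counts[family][place] += 1
    rows = [(fam, g, s, b, g + s + b) for fam, (g, s, b) in counts.items()]
    # Sort by gold, then silver, then bronze (descending); name breaks ties.
    return sorted(rows, key=lambda r: (-r[1], -r[2], -r[3], r[0]))


table = medal_table(PODIUMS)
for row in table:
    print(row)
```

Running this yields the same counts as the medal table, for example 10 gold and 4 silver for PsyProxy and 6 bronze for LIWC.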