Gold Medal Page

Olympic-style medal tally across the v5 dataset cards. Each (dataset, target) pair is one competition. Only competitions with at least 5 criterion-model families competing count toward the medal table; uncontested entries (PsyProxy alone) are excluded. PsyProxy competes as one family (best of 4 lenses × strict/permissive + Sinhala translation arms). OpenAI competes as one family (best of Rathje-construct regression on GPT-4o-mini, GPT-4.1-nano, GPT-5-nano). Lexicon-based and topic-model baselines compete as their own singletons. Primary metric per task type: binary = FVE.Binomial, ordinal = quadratic κ, regression = R², multilabel/multiclass = macro-F1.

source · methodology/gold_medal_page.html
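The tallying rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code behind the page: the function names `family_best` and `tally_medals`, the input layout (a dict of per-family scores keyed by competition), and the tie-breaking behaviour (ties keep input order) are all hypothetical. It relies on the fact that every primary metric listed is higher-is-better.

```python
from collections import defaultdict

MEDALS = ("gold", "silver", "bronze")


def family_best(variant_scores, family_of):
    """Collapse per-variant scores to one score per family (best variant wins).

    variant_scores: {variant_name: score}; family_of: {variant_name: family}.
    Both layouts are assumptions made for this sketch.
    """
    best = {}
    for variant, score in variant_scores.items():
        fam = family_of[variant]
        if fam not in best or score > best[fam]:
            best[fam] = score
    return best


def tally_medals(competitions, min_families=5):
    """Award gold/silver/bronze per competition and sum them per family.

    competitions: {(dataset, target): {family: primary_metric_score}}.
    Competitions with fewer than `min_families` entrants award no medals.
    Ties are not handled specially (assumption): sorted() keeps input order.
    """
    table = defaultdict(lambda: dict.fromkeys(MEDALS, 0))
    for scores in competitions.values():
        if len(scores) < min_families:
            continue  # uncontested or thin fields are excluded
        ranked = sorted(scores, key=scores.get, reverse=True)
        for medal, fam in zip(MEDALS, ranked):
            table[fam][medal] += 1
    return {fam: dict(row, total=sum(row.values())) for fam, row in table.items()}
```

Calling `family_best` first reduces, for example, the three OpenAI model variants to a single family score; `tally_medals` then ranks families within each competition and skips any competition with fewer than five entrants.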

Medal table

family           | gold | silver | bronze | total
PsyProxy         | 10   | 4      | 0      | 14
OpenAI (Rathje)  | 3    | 1      | 2      | 6
Topic models     | 1    | 4      | 1      | 6
LIWC             | 0    | 2      | 6      | 8
TextDescriptives | 0    | 2      | 1      | 3
VADER            | 0    | 1      | 1      | 2
Empath           | 0    | 0      | 1      | 1
TAACO            | 0    | 0      | 1      | 1
TAASSC           | 0    | 0      | 1      | 1
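The medal table can be cross-checked with simple arithmetic: each row's gold, silver, and bronze counts should sum to its total, and each medal column should sum to the number of competitions (14 are listed under the per-dataset podium). The sketch below copies the counts from the table; the variable names are illustrative only.

```python
# Counts copied from the medal table: (gold, silver, bronze, total).
medal_table = {
    "PsyProxy": (10, 4, 0, 14),
    "OpenAI (Rathje)": (3, 1, 2, 6),
    "Topic models": (1, 4, 1, 6),
    "LIWC": (0, 2, 6, 8),
    "TextDescriptives": (0, 2, 1, 3),
    "VADER": (0, 1, 1, 2),
    "Empath": (0, 0, 1, 1),
    "TAACO": (0, 0, 1, 1),
    "TAASSC": (0, 0, 1, 1),
}

# Row check: gold + silver + bronze must equal the reported total.
row_errors = [
    family
    for family, (gold, silver, bronze, total) in medal_table.items()
    if gold + silver + bronze != total
]

# Column check: one gold, one silver, one bronze per competition.
column_sums = tuple(sum(row[i] for row in medal_table.values()) for i in range(3))

print(row_errors)   # → []
print(column_sums)  # → (14, 14, 14)
```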

Metric policy

binary: FVE, with AUC/F1 as secondary checks

ordinal: quadratic κ, with within-one/MAE as secondary checks

regression: R², with RMSE/MAE as secondary checks

multiclass: macro-F1, with accuracy/macro-AUC as secondary checks
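For reference, three of the four primary metrics have standard textbook definitions that fit in a short dependency-free sketch. The function names and the integer coding of ordinal levels (0 to n_levels - 1) are assumptions made here for illustration; the page does not say which implementation it uses. FVE is left out because the page does not define it precisely.

```python
from collections import Counter


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all labels seen in either list."""
    labels = sorted(set(y_true) | set(y_pred))
    f1_scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)


def quadratic_kappa(y_true, y_pred, n_levels):
    """Cohen's kappa with quadratic weights.

    Assumes ratings are integers in 0..n_levels-1 and that the two
    rating distributions are not both concentrated on one level
    (otherwise the expected disagreement is zero).
    """
    n = len(y_true)
    observed = Counter(zip(y_true, y_pred))
    true_hist = Counter(y_true)
    pred_hist = Counter(y_pred)
    disagreement = expected = 0.0
    for i in range(n_levels):
        for j in range(n_levels):
            weight = (i - j) ** 2 / (n_levels - 1) ** 2
            disagreement += weight * observed[(i, j)] / n
            expected += weight * true_hist[i] * pred_hist[j] / n ** 2
    return 1.0 - disagreement / expected
```

All three are higher-is-better, which is what allows families to be ranked the same way across task types.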

Per-dataset podium

Druglib drug-effectiveness reviews

ordinal · Quad. κ · 10 families

Silver · Topic models · 0.322

Bronze · OpenAI (Rathje) · 0.234

Druglib drug-overall-rating reviews

regression · R² · 10 families

Silver · TextDescriptives · 0.163

LIAR political-claim truth ratings

ordinal · Quad. κ · 10 families

Bronze · TextDescriptives · 0.183

Disneyland reviews: Hong Kong vs California branch

binary · FVE · 10 families

Silver · Topic models · 0.524

ACL-IMDB movie-review sentiment

binary · FVE · 10 families

Gold · OpenAI (Rathje) · 0.898

GoEmotions Reddit comments multi-label emotion

multiclass · Macro-F1 · 9 families

Empathetic Dialogues context labels

multiclass · Macro-F1 · 9 families

Silver · Topic models · 0.097

Amazon video-game reviews: verified-purchase binary

binary · FVE · 10 families

Silver · TextDescriptives · 0.187

Sentiment140 Twitter sentiment

binary · FVE · 10 families

Gold · OpenAI (Rathje) · 0.435

CFPB consumer complaints: explanation vs relief response

binary · FVE · 10 families

Gold · Topic models · 0.074

Amazon video-game reviews: star rating (ordinal)

ordinal · Quad. κ · 10 families

Silver · OpenAI (Rathje) · 0.548

Amazon video-game reviews: star rating (regression variant)

regression · R² · 8 families

Reddit suicidality subset (binary)

binary · FVE · 10 families

Gold · OpenAI (Rathje) · 0.879

Bronze · Topic models · 0.742

Disneyland reviews: star rating

regression · R² · 10 families

Silver · Topic models · 0.579

Bronze · OpenAI (Rathje) · 0.344