
GoEmotions Reddit comments multi-label emotion

GoEmotions is a public Google AI corpus of 56,984 Reddit comments labeled with one or more of 27 fine-grained emotions plus neutral. The texts frequently engage with personal experiences and social observations, often reflecting on relationships, emotional struggles, and social dynamics. Discussions often include complaints about interpersonal conflicts, such as manipulative behavior and mental games, alongside expressions of support or advice for navigating these challenges. There is also commentary on media consumption and its impact on emotional well-being, as well as reflections on cultural phenomena and public figures. The tone ranges from humorous to serious, reflecting a mix of emotions and opinions within the community. [Summary on 50 random texts by ChatGPT 4o Mini].

Distribution of Primary emotion (28 classes, multilabel)
[Figure: distribution over classes 1 to 28; 3,414 at floor, 1,131 at ceiling]
Items: 56,984
Holdout n: 11,397
Target: Primary emotion (28 classes, multilabel)
Kind: Multi-class
Systems compared: 13
Criterion validity

Reported holdout systems from the verified card

Multiclass classification uses Macro-F1 as the task-primary metric. Secondary columns keep the companion metrics visible so binary, ordinal, regression, and multiclass cards are not compared through one flattened score.
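To make the distinction between Macro-F1 and accuracy concrete, here is a minimal sketch of the standard definitions in plain Python. The card does not say how its scores were computed, so the zero-division convention (a class with no true or predicted instances scores 0) and the toy labels below are assumptions for illustration only.

```python
from collections import Counter


def macro_f1(y_true, y_pred, labels=None):
    """Unweighted mean of per-class F1 over single-label predictions."""
    if labels is None:
        labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[t] += 1  # true class gains a false negative
    scores = []
    for c in labels:
        denom = 2 * tp[c] + fp[c] + fn[c]
        # Assumed convention: F1 is 0 when the class never occurs or is never predicted.
        scores.append(2 * tp[c] / denom if denom else 0.0)
    return sum(scores) / len(scores)


def accuracy(y_true, y_pred):
    """Fraction of items whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy, imbalanced data (hypothetical): a predictor that always says "neutral".
y_true = ["neutral"] * 6 + ["joy"] * 2 + ["anger"] * 2
y_pred = ["neutral"] * 10

acc = accuracy(y_true, y_pred)
mf1 = macro_f1(y_true, y_pred)
print(f"accuracy={acc:.3f} macro_f1={mf1:.3f}")
```

In this toy case the majority-class predictor reaches 0.600 accuracy but only 0.250 Macro-F1, because the two classes it never predicts each contribute an F1 of 0 to the average. With 28 imbalanced classes the same effect can leave accuracy well above Macro-F1.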

Source podium · Macro-F1 · 9 families
Gold · PsyProxy · 0.165
Silver · LIWC · 0.089
Bronze · Empath · 0.055
Model-family mix
PsyProxy · 4, Lexicon · 2, Topic model · 2, Baseline · 5
| System | Family | Variant | Macro-F1 | Accuracy | Macro-AUC | Primary scale |
| --- | --- | --- | --- | --- | --- | --- |
| PsyProxy — Behavioral Sciences Lens v0.5 · 1000d | PsyProxy | permissive | 0.165 | 0.389 | -1.000 | |
| PsyProxy — Social Economics Lens v0.5 · 1000d | PsyProxy | permissive | 0.162 | 0.391 | -1.000 | |
| PsyProxy — Technology Lens v0.5 · 800d | PsyProxy | permissive | 0.156 | 0.387 | -1.000 | |
| PsyProxy — Health Lens v0.9 · 1100d | PsyProxy | permissive | 0.138 | 0.389 | -1.000 | |
| Linguistic Inquiry and Word Count (LIWC) | Lexicon | permissive | 0.083 | 0.346 | -1.000 | |
| BERTopic | Topic model | permissive | 0.038 | 0.333 | 0.537 | |
| Tool for the Automatic Analysis of Syntactic Sophistication and Complexity (TAASSC) | Baseline | permissive | 0.038 | 0.329 | -1.000 | |
| Valence Aware Dictionary and sEntiment Reasoner (VADER) | Lexicon | permissive | 0.035 | 0.327 | -1.000 | |
| Empath | Baseline | permissive | 0.024 | 0.325 | -1.000 | |
| TextDescriptives | Baseline | permissive | 0.022 | 0.322 | -1.000 | |
| Tool for the Automatic Analysis of Lexical Sophistication (TAALES) | Baseline | permissive | 0.020 | 0.323 | -1.000 | |
| Tool for the Automatic Analysis of Cohesion (TAACO) | Baseline | permissive | 0.017 | 0.323 | -1.000 | |
| Hierarchical Dirichlet Process (tomotopy HDP) | Topic model | permissive | 0.017 | 0.323 | 0.509 | |