Emotion Classification Model

Model Description

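This model is a fine-tuned version of xlm-roberta-large, a multilingual encoder, for emotion classification. Although the base model is multilingual, the fine-tuning data was German (see Training Data), so the model is best suited to German text. It classifies text into 9 distinct emotion categories: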

Anger (0)
Fear (1)
Disgust (2)
Sadness (3)
Joy (4)
Enthusiasm (5)
Hope (6)
Pride (7)
No emotion (8)
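The ID-to-name mapping above can be kept as a plain dictionary. The sketch below is hypothetical: it assumes the model's output labels are either the emotion names themselves or generic `LABEL_<id>` strings (the default Hugging Face naming when a config defines no label names). This card does not state which form the hosted model uses, so `to_emotion_name` is an illustrative helper, not part of any library.

```python
# Hypothetical helper built from the ID-to-name mapping listed in this card.
ID2LABEL = {
    0: "Anger",
    1: "Fear",
    2: "Disgust",
    3: "Sadness",
    4: "Joy",
    5: "Enthusiasm",
    6: "Hope",
    7: "Pride",
    8: "No emotion",
}
LABEL2ID = {name: idx for idx, name in ID2LABEL.items()}


def to_emotion_name(label: str) -> str:
    """Map a generic 'LABEL_<id>' string to its emotion name; pass known names through."""
    if label.startswith("LABEL_"):
        return ID2LABEL[int(label[len("LABEL_"):])]
    if label in LABEL2ID:
        return label
    raise ValueError(f"Unknown label: {label!r}")


print(to_emotion_name("LABEL_4"))     # Joy
print(to_emotion_name("No emotion"))  # No emotion
```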

The model is designed to analyze input text and predict the corresponding emotion, including the neutral "No emotion" category.

Model Performance

The model was evaluated on an evaluation split of 12,022 examples (10% of the full dataset). The per-category performance metrics are summarized below:

Emotion	Precision	Recall	F1-Score	Support
Anger (0)	0.70	0.50	0.59	2936
Fear (1)	0.56	0.13	0.21	317
Disgust (2)	0.56	0.35	0.43	105
Sadness (3)	0.69	0.40	0.51	334
Joy (4)	0.58	0.56	0.57	427
Enthusiasm (5)	0.42	0.15	0.23	544
Hope (6)	0.50	0.20	0.29	777
Pride (7)	0.57	0.32	0.41	354
No emotion (8)	0.64	0.88	0.74	6228

Overall Metrics

Accuracy: 64%
Macro Average: Precision: 0.58, Recall: 0.39, F1-Score: 0.44
Weighted Average: Precision: 0.63, Recall: 0.64, F1-Score: 0.61
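The macro average is the unweighted mean over the nine categories, while the weighted average weights each category by its support. The sketch below recomputes both from the rounded per-class values in the table; under that assumption (rounded inputs), it reproduces the reported aggregates to two decimals. It also illustrates that weighted recall equals accuracy (0.64), since the sum of recall times support is the number of correct predictions.

```python
# Per-class values copied from the table above: (precision, recall, f1, support).
PER_CLASS = {
    "Anger": (0.70, 0.50, 0.59, 2936),
    "Fear": (0.56, 0.13, 0.21, 317),
    "Disgust": (0.56, 0.35, 0.43, 105),
    "Sadness": (0.69, 0.40, 0.51, 334),
    "Joy": (0.58, 0.56, 0.57, 427),
    "Enthusiasm": (0.42, 0.15, 0.23, 544),
    "Hope": (0.50, 0.20, 0.29, 777),
    "Pride": (0.57, 0.32, 0.41, 354),
    "No emotion": (0.64, 0.88, 0.74, 6228),
}


def averages(per_class):
    """Return (total support, macro averages, weighted averages) of (precision, recall, f1)."""
    total = sum(row[3] for row in per_class.values())
    n = len(per_class)
    macro = tuple(sum(row[i] for row in per_class.values()) / n for i in range(3))
    weighted = tuple(
        sum(row[i] * row[3] for row in per_class.values()) / total for i in range(3)
    )
    return total, macro, weighted


total, macro, weighted = averages(PER_CLASS)
print(total)                            # 12022
print([round(x, 2) for x in macro])     # [0.58, 0.39, 0.44]
print([round(x, 2) for x in weighted])  # [0.63, 0.64, 0.61]
```

The large gap between macro recall (0.39) and weighted recall (0.64) reflects how strongly the dominant "No emotion" class drives the weighted figures.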

Usage

Input

The model expects UTF-8 encoded text as input, such as a sentence or a paragraph. Because it is based on xlm-roberta-large, inputs longer than 512 tokens must be truncated or split.

Output

The model outputs the predicted emotion label together with its confidence score; scores for all nine categories can also be requested.
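When scores for every category are requested (for example by passing `top_k=None` to a `transformers` text-classification pipeline in recent versions), the result is a list of `{"label", "score"}` dictionaries. The sketch below picks the top emotion from such a list. The scores are made up, the list is shortened to four classes for brevity, and the `min_score` cut-off is a hypothetical design choice, not something this model provides.

```python
# Made-up scores in the shape a text-classification pipeline returns
# when all class scores are requested (shortened to four classes).
scores = [
    {"label": "No emotion", "score": 0.10},
    {"label": "Joy", "score": 0.62},
    {"label": "Hope", "score": 0.20},
    {"label": "Enthusiasm", "score": 0.08},
]


def top_emotion(scores, min_score=0.0):
    """Return (label, score) of the highest-scoring class, or None if it is below min_score."""
    best = max(scores, key=lambda d: d["score"])
    if best["score"] < min_score:
        return None
    return best["label"], best["score"]


print(top_emotion(scores))                 # ('Joy', 0.62)
print(top_emotion(scores, min_score=0.7))  # None
```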

Example

from transformers import pipeline

classifier = pipeline("text-classification", model="uvegesistvan/wildmann_german_proposal_0")

# German: "I am so happy about the progress I have made!"
text = "Ich bin so glücklich über die Fortschritte, die ich gemacht habe!"
prediction = classifier(text)

print(prediction)
# Illustrative output (the label string and score depend on the model config and input):
# [{'label': 'Joy', 'score': 0.85}]

Training Data

The model was trained on a dataset of labeled examples covering the 9 emotion categories. All training data was in German. The "No emotion" category is by far the most represented in the dataset (6,228 of the 12,022 evaluation examples).
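The class imbalance can be quantified from the support column of the evaluation table. Assuming the 10% evaluation split mirrors the overall class distribution (the card does not state this explicitly), "No emotion" covers about 52% of the data, so a trivial predictor that always outputs "No emotion" would reach roughly 0.52 accuracy. This puts the reported 64% accuracy in context.

```python
# Support counts copied from the evaluation table.
support = {
    "Anger": 2936, "Fear": 317, "Disgust": 105, "Sadness": 334, "Joy": 427,
    "Enthusiasm": 544, "Hope": 777, "Pride": 354, "No emotion": 6228,
}
total = sum(support.values())
shares = {label: n / total for label, n in support.items()}

# Accuracy of a constant majority-class predictor equals the majority share.
majority_label = max(shares, key=shares.get)
print(majority_label, round(shares[majority_label], 3))  # No emotion 0.518

# The rarest category in the evaluation split.
print(min(shares, key=shares.get))  # Disgust
```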

Limitations and Bias

Because the model was fine-tuned only on German text, its performance may be noticeably lower on other languages and on cultural contexts that are not well represented in the training data.
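The "Fear" (recall 0.13, F1 0.21) and "Enthusiasm" (recall 0.15, F1 0.23) categories have the lowest recall and F1 scores, followed by "Hope" (recall 0.20, F1 0.29), so the model misses most instances of these emotions. The high recall for "No emotion" (0.88) suggests that predictions are skewed toward the majority class.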
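The weakest categories can be identified directly from the evaluation table. This sketch ranks categories by recall and lists those whose F1 falls below a threshold; the 0.30 cut-off is an arbitrary, hypothetical choice for illustration.

```python
# (recall, f1) per category, copied from the evaluation table.
metrics = {
    "Anger": (0.50, 0.59), "Fear": (0.13, 0.21), "Disgust": (0.35, 0.43),
    "Sadness": (0.40, 0.51), "Joy": (0.56, 0.57), "Enthusiasm": (0.15, 0.23),
    "Hope": (0.20, 0.29), "Pride": (0.32, 0.41), "No emotion": (0.88, 0.74),
}

# Sort categories from lowest to highest recall.
by_recall = sorted(metrics, key=lambda name: metrics[name][0])
print(by_recall[:3])  # ['Fear', 'Enthusiasm', 'Hope']

# Categories whose F1 is below the hypothetical 0.30 threshold.
weak = [name for name, (_, f1) in metrics.items() if f1 < 0.30]
print(weak)  # ['Fear', 'Enthusiasm', 'Hope']
```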