WRAPresentations -- A TACO-based Embedder For Inference and Information-Driven Argument Mining on Twitter
Introducing WRAPresentations, a sentence-transformers model that maps tweets into a 768-dimensional dense vector space according to the four classes Reason, Statement, Notification and None. The model is tailored for argument mining on Twitter and is derived from BERTweet-base, a model pre-trained on Twitter data. Through fine-tuning on the TACO dataset, WRAPresentations is effective at encoding inference and information in tweets.
Class Semantics
The TACO framework revolves around the two key elements of an argument, as defined by the Cambridge Dictionary. It defines inference as "a guess that you make or an opinion that you form based on the information that you have", and information as "facts or details about a person, company, product, etc."
WRAPresentations, to some degree, captures the semantics of these critical components in its embedding space.
Consequently, it has also learned the class semantics, in which inferences and information are grouped into four distinct classes according to which of these components a tweet contains:
- Statement, which refers to unique cases where only the inference is presented as something that someone says or writes officially, or an action done to express an opinion.
- Reason, which represents a full argument where the inference is based on direct information mentioned in the tweet, such as a source-reference or quotation, and thus reveals the author’s motivation to try to understand and to make judgments based on practical facts.
- Notification, which refers to a tweet that limits itself to providing information, such as media channels promoting their latest articles.
- None, a tweet that provides neither inference nor information.
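In its entirety, WRAPresentations encodes the following hierarchy for tweets:
- Argument (inference present): Reason (with information) and Statement (without information)
- No argument (no inference): Notification (with information) and None (neither inference nor information)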
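The hierarchy amounts to two binary decisions: does the tweet contain an inference, and does it contain information? The minimal Python sketch below encodes the class definitions above as a lookup. The function names `taco_class` and `is_argument` and their boolean inputs are hypothetical illustrations, not part of the model's API; the model itself only outputs embeddings.

```python
# Hypothetical helpers that mirror the TACO class definitions.
# has_inference: the tweet contains an inference (a guess or opinion).
# has_information: the tweet contains information (facts or details).
def taco_class(has_inference: bool, has_information: bool) -> str:
    if has_inference:
        # Argument classes: inference with or without supporting information
        return "Reason" if has_information else "Statement"
    # Non-argument classes: information only, or neither component
    return "Notification" if has_information else "None"


def is_argument(label: str) -> bool:
    """Reason and Statement are the argument classes."""
    return label in {"Reason", "Statement"}
```

For example, a tweet that states an opinion and backs it with a quotation would fall under `taco_class(True, True)`, i.e. Reason.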
Class Semantic Transfer to Embeddings
When observing how tweets are distributed within the embedding space of WRAPresentations, using the CLS tokens that are later employed for classification, we noted that pre-classification fine-tuning via contrastive learning caused the expected class sectors to emerge more densely than in the embeddings of BERTweet, as shown in the following figure.
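The figure is a visual comparison. One simple way to put a number on "denser class sectors" is to compare the mean cosine similarity of tweets within the same class against tweets from different classes. The NumPy sketch below does that; it is an illustrative measure we assume here, not the method used to produce the figure, and `class_separation` is a hypothetical name. It expects at least two classes, each with at least two tweets.

```python
from typing import List, Tuple

import numpy as np


def class_separation(embeddings: np.ndarray, labels: List[str]) -> Tuple[float, float]:
    """Return (mean intra-class, mean inter-class) cosine similarity."""
    # L2-normalise rows so that dot products equal cosine similarities
    normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sims = normed @ normed.T
    labels_arr = np.asarray(labels)
    same = labels_arr[:, None] == labels_arr[None, :]
    # Exclude each tweet's similarity with itself
    off_diag = ~np.eye(len(labels_arr), dtype=bool)
    intra = sims[same & off_diag].mean()
    inter = sims[~same].mean()
    return float(intra), float(inter)
```

With real data, `embeddings` would come from `model.encode(tweets)`; a larger gap between the two returned values indicates more compact, better separated class regions.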
Usage (Sentence-Transformers)
Using this model is easy once you have sentence-transformers installed:
pip install -U sentence-transformers
Then you can use the model to generate tweet representations like this:
from sentence_transformers import SentenceTransformer
tweets = ["This is an example #tweet", "Each tweet is converted"]
model = SentenceTransformer("TomatenMarc/WRAPresentations")
embeddings = model.encode(tweets)
print(embeddings)
Note: Tweets must be preprocessed according to the specifications for BERTweet-base.
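BERTweet-base was pre-trained on tweets in which user mentions were replaced with `@USER` and URLs with `HTTPURL` (emojis were additionally translated into text). The sketch below approximates the first two steps with regular expressions, assuming simple whitespace-separated input; `normalize_tweet` is a hypothetical helper, and emoji translation is deliberately left out because it needs an extra package.

```python
import re

# Approximate patterns; real tweets may need more careful handling
MENTION_RE = re.compile(r"@\w+")
URL_RE = re.compile(r"(?:https?://|www\.)\S+")


def normalize_tweet(text: str) -> str:
    """Replace URLs with HTTPURL and user mentions with @USER, then tidy whitespace."""
    text = URL_RE.sub("HTTPURL", text)
    text = MENTION_RE.sub("@USER", text)
    return " ".join(text.split())
```

Usage: `model.encode([normalize_tweet(t) for t in tweets])`.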
Usage (HuggingFace Transformers)
Without sentence-transformers, you can use the model as follows: first, pass your input through the transformer model, then apply the right pooling operation on top of the contextualized word embeddings.
from transformers import AutoTokenizer, AutoModel
import torch

# Mean Pooling - Take attention mask into account for correct averaging
def mean_pooling(model_output, attention_mask):
    token_embeddings = model_output[0]  # First element of model_output contains all token embeddings
    input_mask_expanded = attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
    return torch.sum(token_embeddings * input_mask_expanded, 1) / torch.clamp(input_mask_expanded.sum(1), min=1e-9)

# Tweets we want embeddings for
tweets = ["This is an example #tweet", "Each tweet is converted"]
# Load model from HuggingFace Hub
tokenizer = AutoTokenizer.from_pretrained("TomatenMarc/WRAPresentations")
model = AutoModel.from_pretrained("TomatenMarc/WRAPresentations")
# Tokenize tweets
encoded_input = tokenizer(tweets, padding=True, truncation=True, return_tensors="pt")
# Compute token embeddings
with torch.no_grad():
    model_output = model(**encoded_input)
# Perform pooling. In this case, mean pooling.
sentence_embeddings = mean_pooling(model_output, encoded_input["attention_mask"])
print("Sentence embeddings:")
print(sentence_embeddings)
Furthermore, WRAPresentations is well suited as the embedding component of an AutoModelForSequenceClassification, enabling further fine-tuning for tweet classification over the four classes Reason, Statement, Notification, and None. The grouping of Reason and Statement as argument classes, and of Notification and None as non-argument classes, is learned implicitly during fine-tuning. This setup facilitates the efficient identification and analysis of argumentative and non-argumentative content in tweets.
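A minimal sketch of such a setup follows. It assumes that the Hub repository exposes the underlying RobertaModel weights at its root, as the architecture listing below suggests; the label order and the names `LABELS` and `load_classifier` are hypothetical choices. The classification head is newly initialised and still has to be fine-tuned on labelled tweets.

```python
# Hypothetical label order; any fixed mapping works if used consistently
LABELS = ["Reason", "Statement", "Notification", "None"]
ID2LABEL = dict(enumerate(LABELS))
LABEL2ID = {label: idx for idx, label in ID2LABEL.items()}


def load_classifier(model_id: str = "TomatenMarc/WRAPresentations"):
    """Load the encoder with a new, randomly initialised 4-way classification head."""
    # Imported lazily so the label mapping above is usable without transformers
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSequenceClassification.from_pretrained(
        model_id,
        num_labels=len(LABELS),
        id2label=ID2LABEL,
        label2id=LABEL2ID,
    )
    return tokenizer, model
```

The returned model can then be trained with the usual transformers tooling, for example the `Trainer` class or a plain PyTorch loop.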
Training
WRAPresentations was fine-tuned on 1,219 golden tweets from the TACO dataset, covering six topics. Five topics, comprising 925 tweets (75.88%), were chosen for optimization: #brexit (33.3%), #got (17.0%), #lotrrop (18.8%), #squidgame (17.1%), and #twittertakeover (13.8%). The optimization data was divided with a stratified 60/40 training/testing split. The remaining 294 golden tweets (24.12%), on the topic of #abortion, served as the holdout set for final evaluation.
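A stratified split keeps each class's share roughly equal in training and testing. The pure-Python sketch below shows the idea, assuming stratification by class label (the card does not say which attribute was stratified on) and a fixed random seed; `stratified_split` is a hypothetical helper, not the code used for training.

```python
import random
from collections import defaultdict
from typing import Hashable, List, Sequence, Tuple


def stratified_split(
    items: Sequence, labels: Sequence[Hashable], test_fraction: float = 0.4, seed: int = 42
) -> Tuple[List[tuple], List[tuple]]:
    """Split (item, label) pairs so each label keeps about the same train/test ratio."""
    by_label = defaultdict(list)
    for item, label in zip(items, labels):
        by_label[label].append(item)
    rng = random.Random(seed)
    train, test = [], []
    for label in sorted(by_label, key=str):
        group = list(by_label[label])
        rng.shuffle(group)
        # Per-class test size, rounded to the nearest whole tweet
        n_test = round(len(group) * test_fraction)
        test.extend((x, label) for x in group[:n_test])
        train.extend((x, label) for x in group[n_test:])
    return train, test
```

Because rounding happens per class, the overall split can deviate slightly from an exact 60/40 ratio.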
Before fine-tuning, we built a copy of the dataset by creating an augmentation of each tweet. The augmentation consisted of replacing all topic words and entities in a tweet and then randomly masking 10% of its words, which were then filled in using BERTweet-base as a fill-mask model. We chose to mask 10% of the words because this resulted in the smallest possible average cosine distance (~0.08) between the tweets and their augmentations, making the augmentation itself a regularizing factor during pre-classification fine-tuning, before any overfitting to the later test data could occur.
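The augmentation steps can be sketched as follows, under several assumptions: whitespace tokenization, a hypothetical `replacements` table for topic words and entities, and a hypothetical `fill_fn` callback standing in for BERTweet-base as a fill-mask model (masks are filled one at a time, left to right). The `cosine_distance` helper shows the measure used to compare a tweet's embedding with its augmentation's embedding.

```python
import math
import random
from typing import Callable, Dict, List

MASK = "<mask>"  # mask token of RoBERTa-style tokenizers such as BERTweet's


def augment(
    tweet: str,
    replacements: Dict[str, str],
    fill_fn: Callable[[List[str], int], str],
    ratio: float = 0.1,
    seed: int = 0,
) -> str:
    """Replace topic words/entities, mask about `ratio` of the words, then refill them."""
    # Step 1: swap topic words and entities using a (hypothetical) lookup table
    words = [replacements.get(w, w) for w in tweet.split()]
    if not words:
        return tweet
    # Step 2: mask roughly 10% of the words (at least one)
    rng = random.Random(seed)
    n_mask = max(1, round(len(words) * ratio))
    positions = sorted(rng.sample(range(len(words)), n_mask))
    for i in positions:
        words[i] = MASK
    # Step 3: let a fill-mask model predict each masked word
    for i in positions:
        words[i] = fill_fn(words, i)
    return " ".join(words)


def cosine_distance(u: List[float], v: List[float]) -> float:
    """1 - cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm
```

In practice `fill_fn` would query BERTweet-base, for instance through a transformers fill-mask pipeline; that wiring is omitted here to keep the sketch self-contained.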
During fine-tuning, we formed pairs by matching each tweet with all remaining tweets in the same data split (training, testing, holdout) that had the same or a different class label. For the training and testing sets, we used the augmentations, while for the holdout set we used the original tweet texts, in order to test both the fine-tuning process and the usefulness of the augmentations for real tweets. For each split, we chose the largest possible set of pairs in which similar and dissimilar pairs are equally represented while covering all tweets of that split.
This process created 162,064 pairs for training and 71,812 pairs for testing. An additional 53,560 pairs were used for the final evaluation on the holdout data. Moreover, we used MEAN pooling during fine-tuning to enhance the sentence representations.
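The pairing procedure can be sketched as below. Assumptions: pairs are ordered (the card does not state this), pairs are labelled 1 for the same class and 0 for different classes, which is the convention sentence-transformers' ContrastiveLoss expects, and the larger group is randomly downsampled to the size of the smaller one. Unlike the procedure described above, this sketch does not enforce that every tweet appears in at least one pair; `balanced_pairs` is a hypothetical name.

```python
import itertools
import random
from typing import List, Tuple

Pair = Tuple[str, str, int]


def balanced_pairs(texts: List[str], labels: List[str], seed: int = 0) -> List[Pair]:
    """Pair every tweet with every other tweet, then balance similar vs dissimilar pairs."""
    by_target = {1: [], 0: []}
    # Ordered pairs (i, j) with i != j; 1 = same class label, 0 = different label
    for i, j in itertools.permutations(range(len(texts)), 2):
        target = int(labels[i] == labels[j])
        by_target[target].append((texts[i], texts[j], target))
    # Largest balanced set: downsample the bigger group to the smaller one's size
    n = min(len(by_target[1]), len(by_target[0]))
    rng = random.Random(seed)
    pairs = rng.sample(by_target[1], n) + rng.sample(by_target[0], n)
    rng.shuffle(pairs)
    return pairs
```

With four tweets labelled Reason, Reason, Statement and None, only two ordered pairs share a label, so the function returns two similar and two dissimilar pairs.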
The model was trained with the following parameters:
DataLoader: torch.utils.data.dataloader.DataLoader of length 5065 with parameters:
{'batch_size': 32, 'sampler': 'torch.utils.data.sampler.RandomSampler', 'batch_sampler': 'torch.utils.data.sampler.BatchSampler'}
Loss: sentence_transformers.losses.ContrastiveLoss.ContrastiveLoss with parameters:
{'distance_metric': 'SiameseDistanceMetric.COSINE_DISTANCE', 'margin': 0.5, 'size_average': True}
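For reference, with `SiameseDistanceMetric.COSINE_DISTANCE` the distance between two pooled tweet embeddings u and v is one minus their cosine similarity. To our understanding of the sentence-transformers implementation, the contrastive loss for a pair with label y (1 = same class, 0 = different class) and margin m = 0.5 is then:

```latex
d(u, v) = 1 - \frac{u \cdot v}{\lVert u \rVert \, \lVert v \rVert}

\mathcal{L}(u, v, y) = \frac{1}{2} \Big( y \, d(u, v)^{2} + (1 - y) \, \max\big(0,\; m - d(u, v)\big)^{2} \Big)
```

With `size_average: True`, the per-pair losses are averaged over the batch. Dissimilar pairs stop contributing once their cosine distance exceeds the margin of 0.5, while similar pairs are always pulled closer together.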
Parameters of the fit() method:
{
"epochs": 5,
"evaluation_steps": 1000,
"evaluator": "sentence_transformers.evaluation.BinaryClassificationEvaluator.BinaryClassificationEvaluator",
"max_grad_norm": 1,
"optimizer_class": "<class 'torch.optim.adamw.AdamW'>",
"optimizer_params": {
"lr": 4e-05
},
"scheduler": "WarmupLinear",
"steps_per_epoch": null,
"warmup_steps": 2533,
"weight_decay": 0.01
}
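The reported DataLoader length and warm-up steps are consistent with the training pair count. The short check below reproduces them; the 10% warm-up ratio is our inference (a common choice in sentence-transformers training examples), not something stated in this card.

```python
import math

num_train_pairs = 162_064  # training pairs reported above
batch_size = 32
epochs = 5

# DataLoader length = batches per epoch (the last batch is partial)
steps_per_epoch = math.ceil(num_train_pairs / batch_size)
total_steps = steps_per_epoch * epochs
# Assumed warm-up ratio of 10% of all training steps
warmup_steps = math.ceil(total_steps * 0.1)

print(steps_per_epoch, total_steps, warmup_steps)  # 5065 25325 2533
```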
Evaluation Results
We optimized several BERTweet models with CLS or MEAN pooling and evaluated them using SBERT's BinaryClassificationEvaluator, with the standard CLS tokens used for classification. The results are as follows:
Model | Precision | Recall | F1 | Support |
---|---|---|---|---|
Vanilla BERTweet-CLS | 50.00% | 100.00% | 66.67% | 53,560 |
Augmented BERTweet-CLS | 65.69% | 86.66% | 74.73% | 53,560 |
WRAPresentations-CLS | 66.00% | 84.32% | 74.04% | 53,560 |
WRAPresentations-MEAN (current model) | 63.05% | 88.91% | 73.78% | 53,560 |
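The evaluator scores each pair by similarity and reports precision, recall and F1 at the threshold that maximizes F1. The sketch below is a simplified re-implementation of that idea, not the library's code: it predicts "similar" when a score is at or above the chosen threshold, and `best_f1_threshold` is a hypothetical name. Because the holdout pairs are balanced, a threshold that labels every pair as similar yields exactly 50% precision and 100% recall, the pattern seen in the Vanilla BERTweet-CLS row. The `f1` helper also reproduces the F1 column from the precision and recall columns.

```python
from typing import List, Optional, Tuple


def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def best_f1_threshold(
    scores: List[float], labels: List[int]
) -> Tuple[float, float, float, Optional[float]]:
    """Return (f1, precision, recall, threshold), predicting similar when score >= threshold."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    total_pos = sum(labels)
    best = (0.0, 0.0, 0.0, None)
    tp = 0
    for k, (score, label) in enumerate(ranked, start=1):
        tp += label
        # Only cut between distinct scores, so tied pairs share one prediction
        if k < len(ranked) and ranked[k][0] == score:
            continue
        if tp == 0:
            continue
        precision, recall = tp / k, tp / total_pos
        if f1(precision, recall) > best[0]:
            best = (f1(precision, recall), precision, recall, score)
    return best


for name, p, r in [
    ("Vanilla BERTweet-CLS", 0.5000, 1.0000),
    ("Augmented BERTweet-CLS", 0.6569, 0.8666),
    ("WRAPresentations-CLS", 0.6600, 0.8432),
    ("WRAPresentations-MEAN", 0.6305, 0.8891),
]:
    print(f"{name}: F1 = {f1(p, r):.2%}")
```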
The results for WRAPresentations-MEAN are affected by the use of CLS pooling during testing, whereas MEAN pooling was used during fine-tuning. Despite this mismatch, using MEAN pooling during fine-tuning still improved the CLS representation, particularly in terms of recall. When WRAPresentations-MEAN is tested with MEAN pooling, its F1 score is 74.07%.
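The two pooling modes differ only in how one tweet vector is derived from the token embeddings: CLS pooling takes the first token (`<s>` in BERTweet's RoBERTa-style vocabulary), while MEAN pooling averages all non-padding tokens, as in the `mean_pooling` function of the HuggingFace usage example. The NumPy sketch below illustrates both on a toy array; in the model the token embeddings are 768-dimensional transformer outputs.

```python
import numpy as np


def cls_pooling(token_embeddings: np.ndarray) -> np.ndarray:
    """Take the first token of each sequence (shape: batch x tokens x dim)."""
    return token_embeddings[:, 0]


def mean_pooling(token_embeddings: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Average the token embeddings, ignoring padding positions."""
    mask = attention_mask[:, :, None].astype(token_embeddings.dtype)
    summed = (token_embeddings * mask).sum(axis=1)
    counts = np.clip(mask.sum(axis=1), 1e-9, None)
    return summed / counts


# Toy batch: one tweet, three token positions (the last one is padding)
tokens = np.array([[[1.0, 1.0], [3.0, 3.0], [100.0, 100.0]]])
mask = np.array([[1, 1, 0]])
print(cls_pooling(tokens))         # [[1. 1.]]
print(mean_pooling(tokens, mask))  # [[2. 2.]]
```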
Full Model Architecture
SentenceTransformer(
(0): Transformer({'max_seq_length': 128, 'do_lower_case': False}) with Transformer model: RobertaModel
(1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False})
)
Environmental Impact
- Hardware Type: A100 PCIe 40GB
- Hours used: 2h
- Cloud Provider: Google Cloud Platform
- Compute Region: asia-southeast1 (Singapore)
- Carbon Emitted: 0.21 kg CO2
Licensing
WRAPresentations © 2023 is licensed under CC BY-NC-SA 4.0.
Citation
@inproceedings{feger-dietze-2024-bertweets,
title = "{BERT}weet{'}s {TACO} Fiesta: Contrasting Flavors On The Path Of Inference And Information-Driven Argument Mining On {T}witter",
author = "Feger, Marc and
Dietze, Stefan",
editor = "Duh, Kevin and
Gomez, Helena and
Bethard, Steven",
booktitle = "Findings of the Association for Computational Linguistics: NAACL 2024",
month = jun,
year = "2024",
address = "Mexico City, Mexico",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2024.findings-naacl.146",
doi = "10.18653/v1/2024.findings-naacl.146",
pages = "2256--2266"
}
Model tree for TomatenMarc/WRAPresentations
Base model
vinai/bertweet-base