title: null
emoji: 🤗
colorFrom: blue
colorTo: red
sdk: gradio
sdk_version: 3.0.2
app_file: app.py
pinned: false
tags:
- evaluate
- metric
description: >-
  FrugalScore is a reference-based metric for NLG model evaluation. It is based
  on a distillation approach that makes it possible to learn a fixed, low-cost
  version of any expensive NLG metric while retaining most of its original performance.
Metric Description
FrugalScore is a reference-based metric for Natural Language Generation (NLG) model evaluation. It is based on a distillation approach that makes it possible to learn a fixed, low-cost version of any expensive NLG metric while retaining most of its original performance.
The FrugalScore models are obtained by continuing the pretraining of small models on a synthetic dataset constructed using summarization, backtranslation and denoising models. During training, the small models learn the internal mapping of the expensive metric, including any similarity function.
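To make the distillation idea concrete, here is a toy sketch in plain Python: a cheap "student" is fitted by least squares to reproduce the scores of a more expensive "teacher" on a small synthetic set of pairs. Everything in it is a hypothetical stand-in (difflib's character similarity plays the teacher, token-overlap Jaccard is the student's only feature, and the student is a one-variable linear model); FrugalScore itself distils metrics into small pretrained transformer models, not into a linear regressor.

```python
from difflib import SequenceMatcher


def teacher_score(prediction: str, reference: str) -> float:
    """Hypothetical stand-in for an expensive metric: character-level similarity."""
    return SequenceMatcher(None, prediction, reference).ratio()


def cheap_feature(prediction: str, reference: str) -> float:
    """Hypothetical cheap input for the student: token-level Jaccard overlap."""
    p, r = set(prediction.lower().split()), set(reference.lower().split())
    return len(p & r) / len(p | r) if p | r else 1.0


def fit_student(pairs):
    """Fit y = a * x + b by ordinary least squares so the student mimics the teacher."""
    xs = [cheap_feature(p, r) for p, r in pairs]
    ys = [teacher_score(p, r) for p, r in pairs]  # teacher labels on the synthetic set
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx if sxx else 0.0
    b = my - a * mx
    # The returned student never calls the teacher again.
    return lambda p, r: a * cheap_feature(p, r) + b


# Hypothetical synthetic (prediction, reference) pairs.
pairs = [
    ("hello world", "hello world"),
    ("hello there", "hello world"),
    ("huggingface", "hugging face"),
    ("the cat sat", "a cat sat down"),
    ("completely different", "nothing alike here"),
]
student = fit_student(pairs)
```

The split mirrors the description above: the teacher is only called to produce training labels, and scoring new pairs afterwards only needs the cheap student.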
How to use
When loading FrugalScore, you can indicate the model you wish to use to compute the score. The default model is moussaKam/frugalscore_tiny_bert-base_bert-score, and a full list of models can be found in the Limitations and bias section.
>>> import evaluate
>>> frugalscore = evaluate.load("frugalscore", "moussaKam/frugalscore_medium_bert-base_mover-score")
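The two checkpoint names in this section share a visible pattern, frugalscore_<size>_<model>_<metric> under the moussaKam namespace. Assuming that pattern holds for the other checkpoints (an assumption based only on these two names), a hypothetical helper such as parse_checkpoint can split a name into its parts, for example to prefer a tiny model when loading time matters.

```python
from typing import NamedTuple


class CheckpointName(NamedTuple):
    size: str    # e.g. "tiny" or "medium"
    model: str   # middle component, e.g. "bert-base" (meaning assumed, not documented)
    metric: str  # distilled metric, e.g. "bert-score" or "mover-score"


def parse_checkpoint(name: str) -> CheckpointName:
    """Split a FrugalScore checkpoint id into parts (hypothetical, assumed naming pattern)."""
    # Drop an optional "namespace/" prefix, then split on underscores.
    parts = name.rpartition("/")[2].split("_")
    if len(parts) != 4 or parts[0] != "frugalscore":
        raise ValueError(f"not a FrugalScore checkpoint name: {name!r}")
    _, size, model, metric = parts
    return CheckpointName(size, model, metric)
```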
FrugalScore computes, for each prediction, a score of how good it is with respect to its reference.
The inputs it takes are:
- predictions: a list of strings representing the predictions to score.
- references: a list of strings representing the reference for each prediction.
Its optional arguments are:
- batch_size: the batch size for predictions (default value is 32).
- max_length: the maximum sequence length (default value is 128).
- device: either "gpu" or "cpu" (default value is None).
>>> results = frugalscore.compute(predictions=['hello there', 'huggingface'], references=['hello world', 'hugging face'], batch_size=16, max_length=64, device="gpu")
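The arguments above can be mirrored in a small pre-flight check before calling compute, together with a sketch of what batch_size controls. Both helpers (check_inputs, batches) are hypothetical and do not reproduce FrugalScore's internals; requiring exactly one reference per prediction is an assumption based on the description of references.

```python
from typing import Iterator, List, Optional, Tuple


def check_inputs(predictions: List[str], references: List[str],
                 batch_size: int = 32, max_length: int = 128,
                 device: Optional[str] = None) -> None:
    """Hypothetical validation of the documented arguments and their defaults."""
    if len(predictions) != len(references):
        raise ValueError("expected exactly one reference per prediction")
    if batch_size < 1 or max_length < 1:
        raise ValueError("batch_size and max_length must be positive")
    if device not in (None, "gpu", "cpu"):
        raise ValueError('device must be "gpu", "cpu" or None')


def batches(predictions: List[str], references: List[str],
            batch_size: int = 32) -> Iterator[List[Tuple[str, str]]]:
    """Yield (prediction, reference) pairs in chunks of at most batch_size."""
    pairs = list(zip(predictions, references))
    for start in range(0, len(pairs), batch_size):
        yield pairs[start:start + batch_size]
```

With the call shown above, the two pairs fit into a single batch of size 16; a larger batch_size trades memory for fewer model calls.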
Output values
The output of FrugalScore is a dictionary containing a list of scores, one for each prediction-reference pair:
{'scores': [0.6307541, 0.6449357]}
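Since the output holds one score per pair, a common next step is to pair the scores back with their inputs and, if a single number is needed, average them. The sketch below uses the output shown above; it assumes the scores come back in input order, and the corpus-level mean is our own summary choice, not something FrugalScore returns.

```python
# Output copied from the example above.
results = {'scores': [0.6307541, 0.6449357]}
predictions = ['hello there', 'huggingface']
references = ['hello world', 'hugging face']

# Assumes scores are in the same order as the input pairs.
per_pair = list(zip(predictions, references, results['scores']))

# Plain arithmetic mean; sum/len also works if the scores are NumPy floats.
corpus_score = sum(results['scores']) / len(results['scores'])
```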
Values from popular papers
The original FrugalScore paper reported that FrugalScore-Tiny retains 97.7/94.7% of the original performance compared to BertScore while running 54 times faster and having 84 times fewer parameters.
Examples
Maximal values (exact match between references and predictions):
>>> import evaluate
>>> frugalscore = evaluate.load("frugalscore")
>>> results = frugalscore.compute(predictions=['hello world'], references=['hello world'])
>>> print(results)
{'scores': [0.9891098]}
Partial values:
>>> frugalscore = evaluate.load("frugalscore")
>>> results = frugalscore.compute(predictions=['hello world'], references=['hugging face'])
>>> print(results)
{'scores': [0.42482382]}
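As the two examples show, FrugalScore scores come from a learned model, so even an exact match scores slightly below 1.0. A safer way to use them is relative comparison rather than testing against a fixed maximum. The sketch below ranks the two references from the examples for the prediction 'hello world'; the scores are copied from the outputs above.

```python
# Scores copied from the two examples above (prediction: 'hello world').
scored = {
    'hello world': 0.9891098,    # reference identical to the prediction
    'hugging face': 0.42482382,  # unrelated reference
}

# Rank by score instead of checking for score == 1.0.
ranking = sorted(scored, key=scored.get, reverse=True)
closest_reference = ranking[0]
```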
Limitations and bias
FrugalScore is based on BertScore and MoverScore, and the models used are based on the original models used for these scores.
The full list of available models for FrugalScore is:
Depending on the size of the model picked, the loading time will vary: the tiny models will load very quickly, whereas the medium ones can take several minutes, depending on your Internet connection.
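Because loading a medium model can be slow, it can help to load each checkpoint once per process and reuse it. The sketch below is a hypothetical wrapper (make_cached_loader is not part of the evaluate library) built on functools.lru_cache; the loader is passed in so the caching logic can be tested without downloading anything. Downloaded weights are typically also cached on disk by the Hugging Face libraries, so this mainly saves re-instantiating the model.

```python
from functools import lru_cache
from typing import Any, Callable


def make_cached_loader(load: Callable[[str], Any]) -> Callable[[str], Any]:
    """Wrap a loader so each checkpoint name is loaded at most once per process."""
    @lru_cache(maxsize=None)
    def cached(checkpoint: str) -> Any:
        return load(checkpoint)
    return cached


# Possible usage with the real library (assumes `evaluate` is installed):
#   import evaluate
#   get_metric = make_cached_loader(lambda ckpt: evaluate.load("frugalscore", ckpt))
#   metric = get_metric("moussaKam/frugalscore_medium_bert-base_mover-score")
```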
Citation
@article{eddine2021frugalscore,
title={FrugalScore: Learning Cheaper, Lighter and Faster Evaluation Metrics for Automatic Text Generation},
author={Eddine, Moussa Kamal and Shang, Guokan and Tixier, Antoine J-P and Vazirgiannis, Michalis},
journal={arXiv preprint arXiv:2110.08559},
year={2021}
}