Fix error when force_tokens includes a multi-word sequence to preserve
Right now an error occurs when `force_tokens` includes a sequence that contains spaces and that sequence also appears in the prompt. The cause is [`word, label = line.split(label_sep)`](https://huggingface.co/spaces/microsoft/llmlingua-2/blob/main/app.py#L50). For example, `line` may be `The answer is 1`, where "The answer is" is the sequence in `force_tokens` and `1` is its label. `line.split(label_sep)` then returns `['The', 'answer', 'is', '1']`, which is too many values to unpack into `word, label`, so a `ValueError` is raised. The fix is to split only once, at the last occurrence of `label_sep` (`line.rsplit(label_sep, 1)`), so the label is always the final field and the preserved sequence keeps its spaces.
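A minimal reproduction of the failure and the fix, assuming `label_sep` is a single space (which is what the `['The', 'answer', 'is', '1']` result above implies). The labeled line is the example from the description, not output captured from the Space:

```python
# Assumption: label_sep is a single space, as implied by the example above.
label_sep = " "

# One entry of the labeled prompt: a multi-word force_tokens sequence plus its label.
line = "The answer is 1"

# Old behaviour: split() breaks at every space, giving four fields,
# so unpacking into two names raises ValueError.
try:
    word, label = line.split(label_sep)
except ValueError as err:
    print("split failed:", err)

# Fixed behaviour: rsplit(..., 1) splits once, at the last separator only,
# so the multi-word sequence stays intact and the label is the final field.
word, label = line.rsplit(label_sep, 1)
print(repr(word), repr(label))
```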
app.py
```diff
@@ -47,7 +47,7 @@ def compress(original_prompt, compression_rate, base_model="xlm-roberta-large",
     lines = results["fn_labeled_original_prompt"].split(word_sep)
     preserved_tokens = []
     for line in lines:
-        word, label = line.split(label_sep)
+        word, label = line.rsplit(label_sep, 1)
         preserved_tokens.append((word, '+') if label == '1' else (word, None))
 
     return compressed_prompt, preserved_tokens, n_word_compressed
```
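The patched loop can be sketched as a standalone function. `parse_preserved_tokens` is a hypothetical name, and the `WORD_SEP` / `LABEL_SEP` values are assumptions chosen for illustration; the real separators are defined in `app.py`. The `(word, '+')` / `(word, None)` tuples match the diff above:

```python
# Hypothetical separator values; the real ones are defined in app.py.
WORD_SEP = "\t\t|\t\t"
LABEL_SEP = " "


def parse_preserved_tokens(labeled_prompt, word_sep=WORD_SEP, label_sep=LABEL_SEP):
    """Turn a word-labeled prompt into (word, tag) pairs, mirroring the patched loop."""
    preserved_tokens = []
    for line in labeled_prompt.split(word_sep):
        # Split once from the right: the label is always the last field,
        # while the word part may itself contain label_sep.
        word, label = line.rsplit(label_sep, 1)
        preserved_tokens.append((word, "+") if label == "1" else (word, None))
    return preserved_tokens


example = WORD_SEP.join(["The answer is 1", "probably 0", "42 1"])
print(parse_preserved_tokens(example))
# [('The answer is', '+'), ('probably', None), ('42', '+')]
```

Note that `rsplit(label_sep, 1)` still returns a single field for a line that contains no `label_sep` at all, so such a line would raise `ValueError` as before; the patch only addresses words that themselves contain the separator.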