---
language:
- de
- en
- multilingual
widget:
- text: "In December 1903 in France the Royal Swedish Academy of Sciences awarded Pierre Curie, Marie Curie, and Henri Becquerel the Nobel Prize in Physics."
- text: "Für Richard Phillips Feynman war es immer wichtig in New York, die unanschaulichen Gesetzmäßigkeiten der Quantenphysik Laien und Studenten nahezubringen und verständlich zu machen."
- text: "My name is Julian and I live in montreal"
- text: "My name is clara and I live in berkeley, california."
- text: "My name is wolfgang and I live in berlin"
tags:
- roberta
license: mit
datasets:
- wikiann
---

# RoBERTa for Multilingual Named Entity Recognition

## Model description

#### Limitations and bias

This model is limited by its training dataset of entity-annotated news articles from a specific span of time, so it may not generalize well to other domains or use cases.

## Training data

## Usage

```python
import torch
from transformers import AutoTokenizer, RobertaForTokenClassification

model_id = "julian-schelb/roberta-ner-multilingual"

# Load the tokenizer and the fine-tuned token classification model
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = RobertaForTokenClassification.from_pretrained(model_id)

text = "Für Richard Phillips Feynman war es immer wichtig in New York, die unanschaulichen Gesetzmäßigkeiten der Quantenphysik Laien und Studenten nahezubringen und verständlich zu machen."

inputs = tokenizer(
    text,
    add_special_tokens=False,
    return_tensors="pt"
)

# Run inference without tracking gradients
with torch.no_grad():
    logits = model(**inputs).logits

# Pick the highest-scoring class for every token
predicted_token_class_ids = logits.argmax(-1)

# Note that tokens are classified rather than input words, which means that
# there might be more predicted token classes than words.
# Multiple token classes might account for the same word.
predicted_tokens_classes = [model.config.id2label[t.item()] for t in predicted_token_class_ids[0]]
predicted_tokens_classes
```
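
Because the model predicts one class per subword token, getting one label per word takes an extra aggregation step. The following is a minimal sketch of one common convention, which keeps the label of each word's first subword token. It is not necessarily how this model was evaluated. `first_subtoken_labels` is a hypothetical helper, and the word indices and label names in the toy example are made up for illustration. With a fast tokenizer, word indices of this shape can typically be obtained from `inputs.word_ids()`.

```python
from typing import List, Optional


def first_subtoken_labels(
    word_ids: List[Optional[int]], token_labels: List[str]
) -> List[str]:
    """Collapse token-level labels to word level (hypothetical helper).

    word_ids[i] is the index of the word that token i belongs to,
    or None for special tokens. The label of the first subword token
    of each word is kept; labels of later subword tokens are dropped.
    """
    word_labels = []
    previous_word_id = None
    for word_id, label in zip(word_ids, token_labels):
        if word_id is None:
            # Special tokens do not belong to any word.
            continue
        if word_id != previous_word_id:
            # First subword token of a new word.
            word_labels.append(label)
        previous_word_id = word_id
    return word_labels


# Toy input with made-up values: four words split into seven tokens.
example_word_ids = [0, 1, 1, 2, 3, 3, 3]
example_token_labels = ["O", "B-PER", "I-PER", "O", "B-LOC", "I-LOC", "I-LOC"]

print(first_subtoken_labels(example_word_ids, example_token_labels))
```

Other strategies, such as majority voting over a word's subword tokens or averaging their logits, are equally plausible. Taking the first subword token is simply the shortest one to write down.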