dslim committed on
Commit 504bc89 · verified · 1 Parent(s): 539f299

Update README.md

  1. README.md +77 -36
README.md CHANGED
@@ -12,57 +12,98 @@ model-index:
12
  - name: distilbert-NER
13
  results: []
14
  ---
 
15
 
16
- <!-- This model card has been generated automatically according to the information the Trainer had access to. You
17
- should probably proofread and complete it, then remove this comment. -->
18
 
19
- # distilbert-NER
20
 
21
- This model is a fine-tuned version of [distilbert-base-cased](https://huggingface.co/distilbert-base-cased) on an unknown dataset.
22
- It achieves the following results on the evaluation set:
23
- - Loss: 0.0710
24
- - Precision: 0.9202
25
- - Recall: 0.9232
26
- - F1: 0.9217
27
- - Accuracy: 0.9810
28
 
29
- ## Model description
30
 
31
- More information needed
 
 
 
 
32
 
33
  ## Intended uses & limitations
34
 
35
- More information needed
36
 
37
- ## Training and evaluation data
38
 
39
- More information needed
 
 
40
 
41
- ## Training procedure
 
 
 
 
42
 
43
- ### Training hyperparameters
 
 
44
 
45
- The following hyperparameters were used during training:
46
- - learning_rate: 2e-05
47
- - train_batch_size: 16
48
- - eval_batch_size: 16
49
- - seed: 42
50
- - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
51
- - lr_scheduler_type: linear
52
- - num_epochs: 3
53
 
54
- ### Training results
55
 
56
- | Training Loss | Epoch | Step | Validation Loss | Precision | Recall | F1 | Accuracy |
57
- |:-------------:|:-----:|:----:|:---------------:|:---------:|:------:|:------:|:--------:|
58
- | 0.2748 | 1.0 | 878 | 0.0959 | 0.8886 | 0.8976 | 0.8931 | 0.9739 |
59
- | 0.0635 | 2.0 | 1756 | 0.0721 | 0.9199 | 0.9228 | 0.9213 | 0.9805 |
60
- | 0.0411 | 3.0 | 2634 | 0.0710 | 0.9202 | 0.9232 | 0.9217 | 0.9810 |
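As a rough consistency check on the table above, the step counts can be related to the batch size and the linear learning-rate schedule. The sketch below assumes single-device training without gradient accumulation and without warmup, none of which the README states. `steps_per_epoch` and `linear_lr` are hypothetical helper names, and the 14,041-sentence figure is the size of the commonly distributed CoNLL-2003 English training split, assumed here rather than taken from the README.

```python
import math


def steps_per_epoch(num_examples: int, batch_size: int) -> int:
    """Number of optimizer steps needed to see every example once."""
    return math.ceil(num_examples / batch_size)


def linear_lr(step: int, total_steps: int, base_lr: float = 2e-5,
              warmup_steps: int = 0) -> float:
    """Linear warmup (assumed 0 steps here) followed by linear decay to zero."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


# Assumed training-set size (not stated in the README).
ASSUMED_TRAIN_EXAMPLES = 14_041
per_epoch = steps_per_epoch(ASSUMED_TRAIN_EXAMPLES, batch_size=16)
total = per_epoch * 3  # num_epochs: 3

print(per_epoch, total)
print(linear_lr(0, total), linear_lr(total // 2, total), linear_lr(total, total))
```

Under these assumptions the helper reproduces the 878 steps per epoch and 2634 total steps reported in the table, and the learning rate falls from 2e-05 at step 0 to zero at the final step.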
61
 
 
62
 
63
- ### Framework versions
64
 
65
- - Transformers 4.35.2
66
- - Pytorch 2.1.0+cu121
67
- - Datasets 2.16.1
68
- - Tokenizers 0.15.1
12
  - name: distilbert-NER
13
  results: []
14
  ---
15
+ # distilbert-NER
16
 
17
+ ## Model description
 
18
 
19
+ **distilbert-NER** is a fine-tuned version of **DistilBERT**, a distilled variant of the BERT model. DistilBERT has fewer parameters than BERT (about 66M versus 110M for BERT-base), making it smaller and faster. distilbert-NER is fine-tuned specifically for **Named Entity Recognition (NER)**.
20
 
21
+ This model identifies the same four entity types as its BERT counterparts: location (LOC), organization (ORG), person (PER), and miscellaneous (MISC). Although it is more compact, distilbert-NER performs robustly on NER tasks, balancing size, speed, and accuracy.
 
 
 
 
 
 
22
 
23
+ The model was fine-tuned on the English version of the [CoNLL-2003 Named Entity Recognition](https://www.aclweb.org/anthology/W03-0419.pdf) dataset, a standard NER benchmark annotated with these four entity types.
24
 
25
+ ### Available NER models
26
+ | Model Name | Description | Parameters |
27
+ |-------------------|-------------|------------------|
28
+ | [bert-base-NER](https://huggingface.co/dslim/bert-base-NER) | Fine-tuned BERT-base model for NER - balanced performance | 110M |
29
+ | [distilbert-NER](https://huggingface.co/dslim/distilbert-NER) | Fine-tuned DistilBERT - smaller, faster, lighter than BERT | 66M |
30
 
31
  ## Intended uses & limitations
32
 
33
+ #### How to use
34
 
35
+ This model can be used with the Transformers *pipeline* for NER, in the same way as the BERT models.
36
 
37
+ ```python
+ from transformers import AutoModelForTokenClassification, AutoTokenizer, pipeline
+
+ # Load the fine-tuned tokenizer and token-classification model from the Hub.
+ tokenizer = AutoTokenizer.from_pretrained("dslim/distilbert-NER")
+ model = AutoModelForTokenClassification.from_pretrained("dslim/distilbert-NER")
+
+ # Build an NER pipeline that returns per-token entity predictions.
+ nlp = pipeline("ner", model=model, tokenizer=tokenizer)
+ example = "My name is Wolfgang and I live in Berlin"
+
+ ner_results = nlp(example)
+ print(ner_results)
+ ```
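Without aggregation, the pipeline returns one dictionary per tagged token (with keys such as `entity`, `word`, `start`, and `end`), so a multi-token name arrives as several pieces. The sketch below shows one way to merge those pieces into entity spans. It assumes the model uses `B-`/`I-` prefixed labels such as `B-PER` and `I-PER`, which this README does not state. `group_entities` is a hypothetical helper, and the sample predictions are hand-written for illustration, not real model output.

```python
def group_entities(sentence, token_results):
    """Merge per-token predictions into entity spans using B-/I- prefixes."""
    spans = []
    for tok in token_results:
        label = tok["entity"]
        if label == "O":
            continue  # not part of any entity
        prefix, _, entity_type = label.partition("-")
        # An I- token extends the previous span only if the type matches
        # and the token is adjacent (at most one character apart).
        continues = (
            bool(spans)
            and prefix == "I"
            and spans[-1]["type"] == entity_type
            and tok["start"] - spans[-1]["end"] <= 1
        )
        if continues:
            spans[-1]["end"] = tok["end"]
        else:
            spans.append(
                {"type": entity_type, "start": tok["start"], "end": tok["end"]}
            )
    # Recover the surface text from character offsets in the original sentence.
    for span in spans:
        span["text"] = sentence[span["start"]:span["end"]]
    return spans


sentence = "My name is Wolfgang and I live in Berlin"
# Hand-written sample in the pipeline's output shape (illustrative only).
sample = [
    {"entity": "B-PER", "word": "Wolf", "start": 11, "end": 15},
    {"entity": "I-PER", "word": "##gang", "start": 15, "end": 19},
    {"entity": "B-LOC", "word": "Berlin", "start": 34, "end": 40},
]
entities = group_entities(sentence, sample)
print(entities)
```

In practice, passing `aggregation_strategy="simple"` to `pipeline` asks Transformers to do this grouping itself; the manual version is mainly useful for seeing what the grouping does.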
50
 
51
+ #### Limitations and bias
 
 
 
 
 
 
 
52
 
53
+ The performance of distilbert-NER is tied to its training data, the CoNLL-2003 dataset, so it may be less effective on text that differs significantly from that data. Users should be aware of potential biases inherent in the training data and of the possibility of entity misclassification in complex sentences.
54
 
55
+ ## Training data
 
 
 
 
56
 
57
+ The model was fine-tuned on the English version of the standard [CoNLL-2003 Named Entity Recognition](https://www.aclweb.org/anthology/W03-0419.pdf) dataset, a widely used benchmark for training and evaluating NER models.
58
 
59
+ ## Training procedure
60
 
61
+ Training details, including hardware specifications, are not documented here. Training followed common practice for DistilBERT models, aiming to balance training efficiency and model accuracy.
62
+
63
+ ## Eval results
64
+ | Metric | Score |
65
+ |------------|-------|
66
+ | Loss | 0.0710|
67
+ | Precision | 0.9202|
68
+ | Recall | 0.9232|
69
+ | F1 | 0.9217|
70
+ | Accuracy | 0.9810|
71
+
72
+ Training and validation losses decreased over the epochs, indicating effective learning. Precision, recall, and F1 are all above 0.92 on the evaluation set, showing that the model performs robustly on NER tasks.
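The reported F1 can be checked against the reported precision and recall, since F1 is their harmonic mean. The snippet below is a small sanity check and not part of the model's evaluation code; `f1_score` is a hypothetical helper, and the first two rows come from the per-epoch results in the previous version of this README.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) for epochs 1 to 3; the last row is the final eval result.
reported = [
    (0.8886, 0.8976, 0.8931),
    (0.9199, 0.9228, 0.9213),
    (0.9202, 0.9232, 0.9217),
]

for precision, recall, reported_f1 in reported:
    computed = round(f1_score(precision, recall), 4)
    print(precision, recall, computed, computed == reported_f1)
```

Each recomputed value matches the reported F1 to four decimal places.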
73
+
74
+ ### BibTeX entry and citation info
75
+
76
+ For DistilBERT:
77
+
78
+ ```
79
+ @article{sanh2019distilbert,
80
+ title={DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter},
81
+ author={Sanh, Victor and Debut, Lysandre and Chaumond, Julien and Wolf, Thomas},
82
+ journal={arXiv preprint arXiv:1910.01108},
83
+ year={2019}
84
+ }
85
+ ```
86
+
87
+ For the underlying BERT model:
88
+
89
+ ```
90
+ @article{DBLP:journals/corr/abs-1810-04805,
91
+ author = {Jacob Devlin and
92
+ Ming{-}Wei Chang and
93
+ Kenton Lee and
94
+ Kristina Toutanova},
95
+ title = {{BERT:} Pre-training of Deep Bidirectional Transformers for Language
96
+ Understanding},
97
+ journal = {CoRR},
98
+ volume = {abs/1810.04805},
99
+ year = {2018},
100
+ url = {http://arxiv.org/abs/1810.04805},
101
+ archivePrefix = {arXiv},
102
+ eprint = {1810.04805},
103
+ timestamp = {Tue, 30 Oct 2018 20:39:56 +0100},
104
+ biburl = {https://dblp.org/rec/journals/corr/abs-1810-04805.bib},
105
+ bibsource = {dblp computer science bibliography, https://dblp.org}
108
+ }
109
+ ```