BERTić-COMtext-SR-legal-NER-ekavica

BERTić-COMtext-SR-legal-NER-ekavica is a variant of the BERTić model, fine-tuned on the task of named entity recognition in Serbian legal texts written in the Ekavian pronunciation. The model was fine-tuned for 20 epochs on the Ekavian variant of the COMtext.SR.legal dataset.
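
The model can be loaded through the standard Hugging Face token-classification interface. The snippet below is a minimal usage sketch, not taken from the official repository; the input sentence is illustrative and the aggregation strategy is an assumption.

```python
# Minimal usage sketch via the standard transformers pipeline API.
# The example sentence is illustrative; the label set comes from the
# model's own config.
from transformers import AutoTokenizer, AutoModelForTokenClassification, pipeline

model_id = "ICEF-NLP/bcms-bertic-comtext-sr-legal-ner-ekavica"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForTokenClassification.from_pretrained(model_id)

ner = pipeline(
    "token-classification",
    model=model,
    tokenizer=tokenizer,
    aggregation_strategy="simple",  # merge B-/I- pieces into whole entity spans
)

print(ner("Osnovni sud u Novom Sadu doneo je presudu 15. marta 2023. godine."))
```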

Benchmarking

This model was evaluated on the task of named entity recognition in Serbian legal texts. The model uses a newly developed named entity schema consisting of 21 entity types, tailored for the domain of Serbian legal texts, and encoded according to the IOB2 standard. The full entity list is available on the COMtext.SR GitHub repository.
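
As a brief illustration of the IOB2 encoding (the entity types below are taken from the results table; see the repository for the authoritative schema), each entity span opens with a "B-" tag, continues with "I-" tags, and non-entity tokens are tagged "O":

```python
# Illustrative IOB2-encoded sentence ("The verdict was delivered by the
# Basic Court in Niš."); tags are aligned one-to-one with tokens.
tokens = ["Presudu", "je", "doneo", "Osnovni", "sud",     "u",       "Nišu",    "."]
tags   = ["O",       "O",  "O",     "B-COURT", "I-COURT", "I-COURT", "I-COURT", "O"]
```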

This model was compared with SrBERTa, a model specially trained on Serbian legal texts, which was likewise fine-tuned for 20 epochs for named entity recognition on the Ekavian variant of the COMtext.SR.legal corpus. Token-level accuracy and F1 scores (macro-averaged and per-class) were used as evaluation metrics, with gold-tokenized text taken as input.

Two evaluation settings for both models were considered:

  • Default - only the entity type portion of the NE tag is considered, effectively ignoring the "B-" and "I-" prefixes
  • Strict - the entire NE tag is considered

For the strict setting, per-class results are given separately for each B-CLASS and I-CLASS tag. In addition, macro-averaged F1 scores are presented in two variants - one where the O (outside) class is ignored, and another where it is treated equally to other named entity classes.
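
The sketch below shows how the two settings and the O-excluding macro F1 can be computed, assuming scikit-learn and aligned gold/predicted IOB2 tag sequences; the official evaluation code is in the COMtext.SR GitHub repository.

```python
# Sketch of the two evaluation settings on toy gold/predicted tags;
# not the official evaluation script.
from sklearn.metrics import accuracy_score, f1_score

gold = ["B-PER", "I-PER", "O", "B-LOC", "O"]
pred = ["B-PER", "B-PER", "O", "B-LOC", "B-LOC"]

def to_default(tags):
    # Drop the "B-"/"I-" prefix, keeping only the entity type;
    # "O" has no prefix and passes through unchanged.
    return [t.split("-", 1)[-1] for t in tags]

strict_acc = accuracy_score(gold, pred)                           # full IOB2 tags must match
default_acc = accuracy_score(to_default(gold), to_default(pred))  # only entity types must match

# Macro F1 "without O": exclude the O class from the average.
entity_labels = sorted(set(to_default(gold)) - {"O"})
macro_f1_no_o = f1_score(to_default(gold), to_default(pred),
                         labels=entity_labels, average="macro")
print(strict_acc, default_acc, macro_f1_no_o)
```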

BERTić-COMtext-SR-legal-NER-ekavica and SrBERTa were fine-tuned and evaluated on the COMtext.SR.legal.ekavica corpus using 10-fold cross-validation.
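
For concreteness, here is a hedged sketch of a 10-fold cross-validation loop; the corpus loader and fold construction below are illustrative placeholders, not the official splits.

```python
# Hypothetical 10-fold CV skeleton; load_comtext_sr_legal_ekavica is a
# placeholder loader, and the real fold assignments may differ.
from sklearn.model_selection import KFold

documents = load_comtext_sr_legal_ekavica()  # hypothetical corpus loader
kf = KFold(n_splits=10, shuffle=True, random_state=42)

for fold, (train_idx, test_idx) in enumerate(kf.split(documents)):
    train_docs = [documents[i] for i in train_idx]
    test_docs = [documents[i] for i in test_idx]
    # Fine-tune for 20 epochs on train_docs, evaluate on test_docs,
    # and aggregate the per-fold metrics into the results below.
    ...
```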

The code and data to run these experiments are available on the COMtext.SR GitHub repository.

Results

| Metric | BERTić-COMtext-SR-legal-NER-ekavica (default) | BERTić-COMtext-SR-legal-NER-ekavica (strict, B / I) | SrBERTa (default) | SrBERTa (strict, B / I) |
| --- | --- | --- | --- | --- |
| Accuracy | 0.9849 | 0.9837 | 0.9685 | 0.9670 |
| Macro F1 (with O) | 0.8522 | 0.8418 | 0.7270 | 0.7152 |
| Macro F1 (without O) | 0.8355 | 0.8335 | 0.7033 | 0.7028 |
| Per-class F1 | | | | |
| PER | 0.9811 | 0.9734 / 0.9713 | 0.8695 | 0.8216 / 0.8901 |
| LOC | 0.9027 | 0.9016 / 0.8520 | 0.6858 | 0.6770 / 0.6557 |
| ADR | 0.9252 | 0.8803 / 0.9168 | 0.8448 | 0.7841 / 0.8297 |
| COURT | 0.9450 | 0.9424 / 0.9408 | 0.7809 | 0.7440 / 0.7867 |
| INST | 0.7848 | 0.7912 / 0.8087 | 0.6346 | 0.6487 / 0.6376 |
| COM | 0.7577 | 0.6932 / 0.7435 | 0.4719 | 0.3685 / 0.4461 |
| OTHORG | 0.4458 | 0.3223 / 0.5464 | 0.3054 | 0.2471 / 0.3597 |
| LAW | 0.9583 | 0.9565 / 0.9572 | 0.9133 | 0.8793 / 0.9130 |
| REF | 0.8315 | 0.7611 / 0.8200 | 0.7706 | 0.6386 / 0.7609 |
| IDPER | 0.9630 | 0.9630 / N/A | 1.0000 | 1.0000 / N/A |
| IDCOM | 0.9779 | 0.9779 / N/A | 0.9018 | 0.9018 / N/A |
| IDTAX | 1.0000 | 1.0000 / N/A | 0.9667 | 0.9667 / N/A |
| NUMACC | 1.0000 | 1.0000 / N/A | 0.6667 | 0.6667 / N/A |
| NUMDOC | 0.5333 | 0.5333 / N/A | 0.3333 | 0.3333 / N/A |
| NUMCAR | 0.6111 | 0.5079 / 0.4286 | 0.3879 | 0.4333 / 0.0 |
| NUMPLOT | 0.7143 | 0.7143 / N/A | 0.4928 | 0.4928 / N/A |
| IDOTH | 0.6161 | 0.6161 / N/A | 0.3967 | 0.3967 / N/A |
| CONTACT | 0.8000 | 0.8000 / N/A | 0.1333 | 0.1333 / N/A |
| DATE | 0.9602 | 0.9383 / 0.9544 | 0.9491 | 0.9079 / 0.9492 |
| MONEY | 0.9703 | 0.9543 / 0.9662 | 0.8885 | 0.8926 / 0.8852 |
| MISC | 0.4445 | 0.4032 / 0.4149 | 0.2113 | 0.2154 / 0.1962 |
| O | 0.9946 | 0.9946 | 0.9870 | 0.9870 |