Model Card for en_textcat_resume_sections

This model is designed to classify sections within English-language resumes, including labels such as Skills, Education, Experience, and others.

Model Details

Model Description

This model uses spaCy's text classification component to assign each resume section one of a set of predefined labels. It was trained on the ganchengguang/resume_seven_class dataset, which contains labeled examples of resume sections.

Model type: Text Classification
Language(s) (NLP): English
Finetuned from model: spacy/en_core_web_md

Uses

Direct Use

This model can be used to automatically classify sections within English-language resumes, which helps extract structured information from unstructured resume text. It currently classifies only five section types reliably: Skills, Education, Experience, Profile, and Summary.

Downstream Use

This model can serve as a component in larger systems for resume parsing, candidate screening, or any application requiring the identification of specific sections within resumes.
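
As a rough illustration of how a larger parser might consume the classifier's output, the sketch below merges consecutive lines that received the same label into one section. It assumes the model is applied line by line and that predictions arrive as (text, label) pairs; the function name group_sections and the sample lines are hypothetical and not part of this model or its repository.

```python
from itertools import groupby
from typing import Iterable, List, Tuple


def group_sections(labeled_lines: Iterable[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Merge runs of consecutive lines that share a predicted label.

    `labeled_lines` holds (text, label) pairs in document order.
    Returns (label, joined_text) pairs, one per contiguous run.
    """
    sections = []
    for label, run in groupby(labeled_lines, key=lambda pair: pair[1]):
        text = "\n".join(line for line, _ in run)
        sections.append((label, text))
    return sections


# Hypothetical per-line predictions, for illustration only.
predictions = [
    ("BSc Computer Science, 2019", "Education"),
    ("MSc Data Science, 2021", "Education"),
    ("Python, SQL, Docker", "Skills"),
]
sections = group_sections(predictions)
```

Grouping only adjacent lines keeps the original section order, so a resume with two separate Experience blocks would yield two Experience entries rather than one merged block.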

Out-of-Scope Use

This model is not designed for tasks outside of resume section classification, such as general text classification or Named Entity Recognition (NER) in non-resume texts.

Bias, Risks, and Limitations

The model's performance depends on the quality and diversity of its training data. It may perform poorly on resumes that differ significantly from the training examples, and it may reflect biases present in the dataset. On the test set, Experience sections are classified least reliably (F1-score 80.6%).

Recommendations

Users should be aware of the model's limitations and biases. It is recommended to evaluate the model's performance on a diverse set of resumes before deploying it in production environments.

How to Get Started with the Model

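For a starting point, see the Wozify-CV-Parser repository: https://github.com/ssobii2/Wozify-CV-Parser
For general guidance on loading and running spaCy pipelines, check spaCy's website.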
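
A minimal sketch of what classification with this model might look like in Python. It assumes the model has been installed as a package named en_textcat_resume_sections (or that a path to the unpacked model folder is passed instead); the helper names load_pipeline and classify_section and the 0.5 threshold are hypothetical choices for illustration. spacy.load and doc.cats are standard spaCy API.

```python
from typing import Callable, Optional, Tuple


def load_pipeline(name: str = "en_textcat_resume_sections"):
    """Load the pipeline with spaCy.

    Assumes the model is installed under this package name; spacy.load
    also accepts a path to an unpacked model directory.
    """
    import spacy  # imported lazily so classify_section has no hard dependency

    return spacy.load(name)


def classify_section(
    nlp: Callable, text: str, threshold: float = 0.5
) -> Tuple[Optional[str], float]:
    """Return (label, score) for the highest-scoring category in doc.cats.

    The label is None when the best score falls below `threshold`
    (a hypothetical cut-off; tune it on your own data).
    """
    doc = nlp(text)
    label, score = max(doc.cats.items(), key=lambda item: item[1])
    if score < threshold:
        return None, score
    return label, score
```

With the model installed, classify_section(load_pipeline(), "Python, SQL, Docker") would return the predicted label and its score. Loading the pipeline once and reusing it avoids paying the load cost for every section.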

Training Details

Training Data

The model was trained on the ganchengguang/resume_seven_class dataset, which contains examples of various resume sections.

Training Procedure

The model was fine-tuned using spaCy's text classification component. The training involved the following steps:

Data preprocessing: Tokenization and vectorization of resume text.
Model training: Fine-tuning the spacy/en_core_web_md model on the preprocessed data.
Evaluation: Assessing the model's performance on a validation set.
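
The data preparation step can be sketched roughly as follows. This is not the author's training script: it assumes examples are available as (text, label) pairs and uses the five labels this card reports results for, since the dataset's full label set is not listed here. The helper names one_hot_cats and write_docbin are hypothetical; DocBin, make_doc, and doc.cats are standard spaCy API.

```python
from typing import Dict, Iterable, List, Tuple

# The five labels this card reports results for. The dataset's full label
# set is not listed in this card, so treat this list as an assumption.
LABELS: List[str] = ["Skills", "Education", "Experience", "Profile", "Summary"]


def one_hot_cats(label: str, labels: Iterable[str] = LABELS) -> Dict[str, float]:
    """Build a doc.cats dict: 1.0 for the gold label, 0.0 for every other label."""
    labels = list(labels)
    if label not in labels:
        raise ValueError(f"unknown label: {label!r}")
    return {name: float(name == label) for name in labels}


def write_docbin(examples: Iterable[Tuple[str, str]], path: str) -> None:
    """Serialize (text, label) pairs to a .spacy file for spaCy's training CLI.

    Requires spaCy to be installed.
    """
    import spacy
    from spacy.tokens import DocBin

    nlp = spacy.blank("en")  # a blank English pipeline is enough for tokenization
    doc_bin = DocBin()
    for text, label in examples:
        doc = nlp.make_doc(text)  # tokenize only; no other components run
        doc.cats = one_hot_cats(label)
        doc_bin.add(doc)
    doc_bin.to_disk(path)
```

Files written this way can be passed to python -m spacy train together with a config file, using the paths.train and paths.dev overrides.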

Preprocessing

The text data was cleaned by removing special characters, normalizing whitespace, and converting text to lowercase. Tokenization was performed using spaCy's tokenizer.
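
The cleaning steps described above could look roughly like this. The exact set of characters the author removed is not stated, so the character class below is an assumption, and clean_text is a hypothetical name.

```python
import re


def clean_text(text: str) -> str:
    """Lowercase, strip special characters, and collapse whitespace."""
    text = text.lower()
    # Assumption: keep ASCII letters, digits, whitespace, and a few characters
    # common in resumes (. , + # @ / -); everything else becomes a space.
    # Note that this also drops accented letters.
    text = re.sub(r"[^a-z0-9\s.,+#@/-]", " ", text)
    # Collapse any run of whitespace (spaces, tabs, newlines) to one space.
    return re.sub(r"\s+", " ", text).strip()


print(clean_text("  Skills:\tPython & C++ (5 yrs) "))  # skills python c++ 5 yrs
```

Whatever cleaning is used at training time should be applied to new text before classification as well, so the model sees input in the form it was trained on.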

Evaluation

Testing Data, Factors & Metrics

Testing Data

The model was evaluated on a separate test set from the ganchengguang/resume_seven_class dataset, containing examples of resume sections not seen during training.

Factors

The evaluation considered factors such as resume length, formatting, and the presence of uncommon sections.

Metrics

The model's performance was measured using accuracy, precision, recall, and F1-score.
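
For reference, per-class precision, recall, and F1-score can be computed from gold and predicted labels as in the sketch below. The function name per_class_prf and the toy labels are hypothetical; this is not the evaluation code used for this model.

```python
from typing import Dict, Sequence


def per_class_prf(gold: Sequence[str], pred: Sequence[str], label: str) -> Dict[str, float]:
    """Precision, recall, and F1 for one label, treating it as the positive class."""
    pairs = list(zip(gold, pred))
    tp = sum(g == label and p == label for g, p in pairs)  # correctly predicted
    fp = sum(g != label and p == label for g, p in pairs)  # predicted but wrong
    fn = sum(g == label and p != label for g, p in pairs)  # missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Toy data, for illustration only.
gold = ["Skills", "Skills", "Education", "Experience"]
pred = ["Skills", "Education", "Education", "Skills"]
skills_scores = per_class_prf(gold, pred, "Skills")
# {'precision': 0.5, 'recall': 0.5, 'f1': 0.5}
```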

Results

The model achieved the following results on the test set:

Text Categorization Model Performance Metrics

Summary Section

Precision: 88.4%
Recall: 89.8%
F1-score: 89.1%

Profile Section

Precision: 95.2%
Recall: 88.3%
F1-score: 91.6%

Education Section

Precision: 93.2%
Recall: 90.5%
F1-score: 91.9%

Experience Section

Precision: 78.8%
Recall: 82.5%
F1-score: 80.6%

Skills Section

Precision: 88.5%
Recall: 88.5%
F1-score: 88.5%

Overall Model Performance

Micro Precision: 88.3%
Micro Recall: 87.7%
Micro F1-score: 88.0%
Macro Precision: 88.8%
Macro Recall: 87.9%
Macro F1-score: 88.3%
Macro AUC: 97.8%
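
Macro-averaged scores are the unweighted mean of the per-class scores, so they can be rechecked from the numbers above. The short sketch below does that; the variable and function names are hypothetical. Micro-averaged scores cannot be recomputed this way because they depend on per-class example counts, which this card does not report.

```python
from statistics import mean

# Per-class scores (in percent) copied from the results above.
per_class = {
    "Summary": {"precision": 88.4, "recall": 89.8, "f1": 89.1},
    "Profile": {"precision": 95.2, "recall": 88.3, "f1": 91.6},
    "Education": {"precision": 93.2, "recall": 90.5, "f1": 91.9},
    "Experience": {"precision": 78.8, "recall": 82.5, "f1": 80.6},
    "Skills": {"precision": 88.5, "recall": 88.5, "f1": 88.5},
}


def macro_average(scores: dict, metric: str) -> float:
    """Unweighted mean of one metric across all classes."""
    return mean(row[metric] for row in scores.values())


macro = {m: round(macro_average(per_class, m), 1) for m in ("precision", "recall", "f1")}
print(macro)  # {'precision': 88.8, 'recall': 87.9, 'f1': 88.3}
```

The recomputed values match the reported macro precision, recall, and F1-score.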

Summary

The model performs best on Education (F1-score 91.9%) and Profile (F1-score 91.6%) sections, while Experience is the weakest (F1-score 80.6%). The Skills section shows identical precision and recall (88.5%).

Technical Specifications

Model Architecture and Objective

The model is based on spaCy's text classification component, utilizing the spacy/en_core_web_md base model. The objective is to classify resume sections into predefined categories.

Compute Infrastructure

The model was trained on my personal gaming laptop. The config file can be found inside the model folder.

Hardware

CPU: Intel Core i7-13620H
RAM: 16 GB
GPU: RTX 4070 Laptop GPU, 8 GB VRAM

Software

Operating System: Windows 11
Libraries: spaCy
