Model Card for en_textcat_resume_sections
This model is designed to classify sections within English-language resumes, including labels such as Skills, Education, Experience, and others.
Model Details
Model Description
This model utilizes spaCy's text classification component to categorize sections of resumes into predefined labels. It is trained on the ganchengguang/resume_seven_class dataset, which contains examples of various resume sections.
- Model type: Text Classification
- Language(s) (NLP): English
- Finetuned from model: spacy/en_core_web_md
Uses
Direct Use
This model can be used to automatically classify sections within English-language resumes, facilitating the extraction of structured information from unstructured resume text. At present it reliably classifies only five sections: Skills, Education, Experience, Profile, and Summary.
Downstream Use
This model can serve as a component in larger systems for resume parsing, candidate screening, or any application requiring the identification of specific sections within resumes.
Out-of-Scope Use
This model is not designed for tasks outside of resume section classification, such as general text classification or Named Entity Recognition (NER) in non-resume texts.
Bias, Risks, and Limitations
The model's performance is dependent on the quality and diversity of the training data. It may not perform well on resumes that differ significantly from the training examples. Additionally, the model may have biases based on the dataset it was trained on.
Recommendations
Users should be aware of the model's limitations and biases. It is recommended to evaluate the model's performance on a diverse set of resumes before deploying it in production environments.
How to Get Started with the Model
- https://github.com/ssobii2/Wozify-CV-Parser
- Check spaCy's website
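A minimal inference sketch, assuming the pipeline has been installed as a Python package named `en_textcat_resume_sections` and that its classifier writes scores to `doc.cats`, as spaCy's text classification component does. The helper names `top_section` and `classify_section` and the sample text are illustrative and not part of the model.

```python
from typing import Dict, Tuple


def top_section(cats: Dict[str, float]) -> Tuple[str, float]:
    """Return the highest-scoring label and its score from a doc.cats-style dict."""
    if not cats:
        raise ValueError("no category scores; does the pipeline include a text classifier?")
    label = max(cats, key=cats.get)
    return label, cats[label]


def classify_section(nlp, text: str) -> Tuple[str, float]:
    """Run a loaded spaCy pipeline on one resume section and return its top label."""
    doc = nlp(text)
    return top_section(doc.cats)


# Usage (assumes the package name below is installed; adjust to your setup):
#   import spacy
#   nlp = spacy.load("en_textcat_resume_sections")
#   print(classify_section(nlp, "Python, SQL, Docker, and Kubernetes"))
```

Loading the pipeline once and reusing it for every section avoids paying the load cost on each call.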
Training Details
Training Data
The model was trained on the ganchengguang/resume_seven_class dataset, which contains examples of various resume sections.
Training Procedure
The model was fine-tuned using spaCy's text classification component. The training involved the following steps:
- Data preprocessing: Tokenization and vectorization of resume text.
- Model training: Fine-tuning the spacy/en_core_web_md model on the preprocessed data.
- Evaluation: Assessing the model's performance on a validation set.
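The data preparation for these steps can be sketched as follows, assuming each training example is a (text, label) pair and that the labels are mutually exclusive. The label list (the five sections this card reports results for), the helper names, and the output path are assumptions for illustration; the actual training settings are in the config file shipped with the model.

```python
from typing import Dict, Iterable, Sequence, Tuple

# Assumed label set: the five sections this card reports results for.
LABELS: Tuple[str, ...] = ("Summary", "Profile", "Education", "Experience", "Skills")


def make_cats(label: str, labels: Sequence[str] = LABELS) -> Dict[str, float]:
    """Build a one-hot category dict in the shape spaCy expects for doc.cats."""
    if label not in labels:
        raise ValueError(f"unknown label: {label!r}")
    return {name: 1.0 if name == label else 0.0 for name in labels}


def write_docbin(examples: Iterable[Tuple[str, str]], path: str) -> None:
    """Serialize (text, label) pairs to a .spacy file usable by `spacy train`."""
    import spacy  # imported here so make_cats stays usable without spaCy
    from spacy.tokens import DocBin

    nlp = spacy.blank("en")  # tokenizer only; no trained components needed
    doc_bin = DocBin()
    for text, label in examples:
        doc = nlp.make_doc(text)
        doc.cats = make_cats(label)
        doc_bin.add(doc)
    doc_bin.to_disk(path)
```

Files written this way can be passed to spaCy's training command, for example `python -m spacy train config.cfg --paths.train train.spacy --paths.dev dev.spacy`, where the file names are placeholders.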
Preprocessing
The text data was cleaned by removing special characters, normalizing whitespace, and converting text to lowercase. Tokenization was performed using spaCy's tokenizer.
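A minimal sketch of the cleaning described above, using only the standard library. Which characters count as "special" is not stated in this card, so the pattern below (keep letters, digits, whitespace, and a few punctuation marks common in resumes) is an assumption, as is the function name `clean_text`.

```python
import re

# Assumption: keep word characters, whitespace, and a few marks common in resumes.
_SPECIAL = re.compile(r"[^\w\s.,+#@/&-]")
_WHITESPACE = re.compile(r"\s+")


def clean_text(text: str) -> str:
    """Remove special characters, normalize whitespace, and lowercase the text."""
    text = _SPECIAL.sub(" ", text)             # replace special characters with spaces
    text = _WHITESPACE.sub(" ", text).strip()  # collapse runs of whitespace
    return text.lower()
```

The same cleaning should be applied at inference time so that input text matches what the classifier saw during training.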
Evaluation
Testing Data, Factors & Metrics
Testing Data
The model was evaluated on a separate test set from the ganchengguang/resume_seven_class dataset, containing examples of resume sections not seen during training.
Factors
The evaluation considered factors such as resume length, formatting, and the presence of uncommon sections.
Metrics
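The model's performance was measured using accuracy, precision, recall, and F1-score. The results below report precision, recall, and F1-score per section and as micro and macro averages, along with macro AUC.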
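For reference, per-label precision, recall, and F1-score can be computed from gold and predicted labels as sketched below. This follows the standard definitions and is not the evaluation script used for this model; the function name and the sample labels are illustrative.

```python
from typing import Dict, List


def per_label_prf(gold: List[str], pred: List[str], label: str) -> Dict[str, float]:
    """Compute precision, recall, and F1 for one label from parallel label lists."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    pairs = list(zip(gold, pred))
    tp = sum(1 for g, p in pairs if g == label and p == label)  # true positives
    fp = sum(1 for g, p in pairs if g != label and p == label)  # false positives
    fn = sum(1 for g, p in pairs if g == label and p != label)  # false negatives
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


gold = ["Skills", "Skills", "Education", "Experience"]
pred = ["Skills", "Education", "Education", "Skills"]
skills = per_label_prf(gold, pred, "Skills")
```

Running this per label on a held-out set of your own resumes gives numbers directly comparable to the per-section figures in this card.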
Results
The model achieved the following results on the test set:
Text Categorization Model Performance Metrics
Summary Section
- Precision: 88.4%
- Recall: 89.8%
- F1-score: 89.1%
Profile Section
- Precision: 95.2%
- Recall: 88.3%
- F1-score: 91.6%
Education Section
- Precision: 93.2%
- Recall: 90.5%
- F1-score: 91.9%
Experience Section
- Precision: 78.8%
- Recall: 82.5%
- F1-score: 80.6%
Skills Section
- Precision: 88.5%
- Recall: 88.5%
- F1-score: 88.5%
Overall Model Performance
- Micro Precision: 88.3%
- Micro Recall: 87.7%
- Micro F1-score: 88.0%
- Macro Precision: 88.8%
- Macro Recall: 87.9%
- Macro F1-score: 88.3%
- Macro AUC: 97.8%
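As a consistency check on the figures above, the macro scores are the unweighted means of the five per-section scores. A short sketch using only the numbers reported in this card (the names `SCORES` and `macro_average` are illustrative):

```python
from typing import Dict, Tuple

# Per-section (precision, recall, F1) in percent, copied from the results above.
SCORES: Dict[str, Tuple[float, float, float]] = {
    "Summary": (88.4, 89.8, 89.1),
    "Profile": (95.2, 88.3, 91.6),
    "Education": (93.2, 90.5, 91.9),
    "Experience": (78.8, 82.5, 80.6),
    "Skills": (88.5, 88.5, 88.5),
}


def macro_average(scores: Dict[str, Tuple[float, float, float]], index: int) -> float:
    """Unweighted mean of one metric across all sections, rounded to one decimal."""
    values = [row[index] for row in scores.values()]
    return round(sum(values) / len(values), 1)


macro_precision = macro_average(SCORES, 0)  # 88.8
macro_recall = macro_average(SCORES, 1)     # 87.9
macro_f1 = macro_average(SCORES, 2)         # 88.3
```

Because macro averaging weights every section equally, the weaker Experience scores pull the macro figures down as much as any other section regardless of how many Experience examples the test set contains.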
Summary
The model performs best on Education and Profile sections, while the Experience section has relatively lower performance metrics. The Skills section shows balanced precision and recall.
Technical Specifications
Model Architecture and Objective
The model is based on spaCy's text classification component, utilizing the spacy/en_core_web_md base model. The objective is to classify resume sections into predefined categories.
Compute Infrastructure
The model was trained on my personal gaming laptop. The config file can be found inside the model folder.
Hardware
- Intel Core i7-13620H
- 16GB RAM
- RTX 4070 Laptop GPU 8GB VRAM
Software
- Operating System: Windows 11
- Libraries: spaCy
Model tree for ThunderJaw/en_textcat_resume_sections
Base model
spacy/en_core_web_md