---
datasets:
- theeseus-ai/RiskClassifier
base_model:
- meta-llama/Llama-3.1-8B-Instruct
---
# RiskClassifier: Fine-Tuned LLaMA 3.1 8B Model
## Model Summary
**RiskClassifier** is a fine-tuned version of **meta-llama/Llama-3.1-8B-Instruct** that assesses the risk level of a described scenario using structured critical thinking, returning a risk label together with a numeric risk score. It is fine-tuned on the **theeseus-ai/RiskClassifier** dataset, which pairs each scenario with a risk label and score while preserving the reasoning behind the assessment. The model targets risk classification, fraud detection, and analytical reasoning tasks.
## Model Details
- **Base Model**: [meta-llama/Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct)
- **Fine-tuned Dataset**: [theeseus-ai/RiskClassifier](https://huggingface.co/datasets/theeseus-ai/RiskClassifier)
- **Model Size**: 8 Billion Parameters
- **Language**: English
- **License**: Apache 2.0
- **Use Case**: Risk assessment, fraud detection, critical thinking tasks
## Dataset Information
The **RiskClassifier** dataset provides structured scenarios with:
- **Context**: A description of the event requiring analysis.
- **Query**: A critical-thinking question tied to the scenario.
- **Answers**: Four risk level options ("Low risk," "Moderate risk," "High risk," "Very high risk").
- **Risk Score**: A numeric value (0–100) representing the raw risk assessment.
- **Conversations**: The same scenario reformatted as ShareGPT-style chat turns (system, user, assistant), used to train the model to produce reasoned, structured responses.

Example reformatted record:
```json
{
  "context": "A customer used a credit card in a high-fraud region for a large purchase.",
  "query": "What is the risk level of this transaction?",
  "answers": ["Low risk", "Moderate risk", "High risk", "Very high risk"],
  "risk_score": 85,
  "conversations": [
    {"role": "system", "content": "You are a helpful AI that assesses risk levels and provides explanations."},
    {"role": "user", "content": "Context: A customer used a credit card in a high-fraud region for a large purchase.\nQuestion: What is the risk level of this transaction?\nAnswers: [Low risk, Moderate risk, High risk, Very high risk]"},
    {"role": "assistant", "content": "Risk Level: Very high risk (Score: 85)"}
  ]
}
```
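
The mapping from a raw record to its `conversations` field can be sketched as follows. This is an illustrative reconstruction inferred from the single example record, not the dataset's actual preprocessing script: the helper name `to_conversations` is hypothetical, and the chosen label is passed in explicitly because the card does not state how scores map to labels.

```python
SYSTEM_PROMPT = "You are a helpful AI that assesses risk levels and provides explanations."


def to_conversations(record: dict, label: str) -> list:
    """Hypothetical helper: build chat turns in the layout of the example record."""
    # User turn: context, question, and the bracketed answer options on separate lines
    options = ", ".join(record["answers"])
    user = (
        f"Context: {record['context']}\n"
        f"Question: {record['query']}\n"
        f"Answers: [{options}]"
    )
    # Assistant turn: the chosen risk label followed by the raw numeric score
    assistant = f"Risk Level: {label} (Score: {record['risk_score']})"
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": user},
        {"role": "assistant", "content": assistant},
    ]


record = {
    "context": "A customer used a credit card in a high-fraud region for a large purchase.",
    "query": "What is the risk level of this transaction?",
    "answers": ["Low risk", "Moderate risk", "High risk", "Very high risk"],
    "risk_score": 85,
}
conversation = to_conversations(record, "Very high risk")
```

Passing the label in keeps the sketch honest about the one step the card leaves unspecified: how a 0–100 score is bucketed into the four risk levels.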
## Intended Use
### Applications
- **Fraud Detection**: Evaluating suspicious transactions and identifying high-risk activities.
- **Risk Analysis**: Assigning risk levels and scores to scenarios to support financial and operational decisions.
- **Critical Thinking Tasks**: Enhancing AI's ability to reason about uncertainty and complex situations.
- **Educational Tools**: Training AI systems to provide explanations for risk assessments.
### Limitations
- **Context Dependency**: Accuracy may degrade with ambiguous or incomplete context.
- **Bias Risk**: Outputs may inherit biases present in training data; manual review is advised for high-impact decisions.
- **Numeric Risk Scores**: The numerical scores may require post-processing to fit domain-specific thresholds.
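
A minimal post-processing sketch, assuming generations follow the assistant format from the training data (`Risk Level: <label> (Score: <n>)`). The regular expression, the function names, and the `THRESHOLDS` cut-offs and action names are all hypothetical placeholders to be replaced with your own domain policy.

```python
import re

# Assumes the assistant format seen in training, e.g. "Risk Level: Very high risk (Score: 85)"
OUTPUT_RE = re.compile(r"Risk Level:\s*(?P<label>[A-Za-z ]+?)\s*\(Score:\s*(?P<score>\d+)\)")

# Hypothetical domain thresholds: (minimum score, action); not part of the model or dataset
THRESHOLDS = [(90, "Block"), (70, "Manual review"), (0, "Approve")]


def parse_output(text: str):
    """Return (label, score) from a generated response, or None if the format is absent."""
    match = OUTPUT_RE.search(text)
    if match is None:
        return None
    return match.group("label"), int(match.group("score"))


def apply_thresholds(score: int, thresholds=THRESHOLDS) -> str:
    """Map a 0-100 score to the action of the highest threshold it meets."""
    for minimum, action in sorted(thresholds, reverse=True):
        if score >= minimum:
            return action
    raise ValueError(f"score {score} is below every threshold")


parsed = parse_output("Risk Level: Very high risk (Score: 85)")
```

Returning `None` on a format mismatch, rather than raising, makes it easy to route malformed generations to manual review.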
## How to Use
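### Example Code
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch
model_name = "theeseus-ai/RiskClassifier"
tokenizer = AutoTokenizer.from_pretrained(model_name)
# Load bf16 weights; device_map="auto" (requires `accelerate`) places them on available GPUs
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16, device_map="auto")
# Mirror the training prompt format; assumes the tokenizer keeps Llama 3.1's chat template
messages = [
    {"role": "system", "content": "You are a helpful AI that assesses risk levels and provides explanations."},
    {"role": "user", "content": "Context: A large transaction flagged for manual review.\nQuestion: What is the risk level?\nAnswers: [Low risk, Moderate risk, High risk, Very high risk]"},
]
inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)
# max_new_tokens bounds only the generated continuation (max_length would also count the prompt)
outputs = model.generate(inputs, max_new_tokens=100)
# Decode only the newly generated tokens and drop special tokens
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```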
## Evaluation Metrics
- **Accuracy**: Agreement between predicted risk levels and the labeled risk levels.
- **Reasoning Completeness**: Explanations were assessed for clarity and consistency with the scenario context.
- **Risk Score Consistency**: Correlation between the numeric risk scores and the predicted risk labels.
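
The card does not say exactly how these metrics were computed. One plausible reading of the first and third is sketched below: exact-match accuracy over risk labels, and a Pearson correlation between numeric scores and each label's ordinal position. Treating the four labels as ordinal indices 0–3 is an assumption; a rank correlation such as Spearman's would be an equally reasonable choice.

```python
LABELS = ["Low risk", "Moderate risk", "High risk", "Very high risk"]


def accuracy(predicted: list, gold: list) -> float:
    """Fraction of predicted risk levels that exactly match the labeled risk level."""
    if len(predicted) != len(gold) or not gold:
        raise ValueError("need two non-empty lists of equal length")
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)


def score_label_correlation(scores: list, labels: list) -> float:
    """Pearson correlation between numeric scores and each label's ordinal index (0-3)."""
    ranks = [LABELS.index(label) for label in labels]
    n = len(scores)
    mean_s, mean_r = sum(scores) / n, sum(ranks) / n
    cov = sum((s - mean_s) * (r - mean_r) for s, r in zip(scores, ranks))
    var_s = sum((s - mean_s) ** 2 for s in scores)
    var_r = sum((r - mean_r) ** 2 for r in ranks)
    if var_s == 0 or var_r == 0:
        raise ValueError("correlation is undefined when scores or labels are constant")
    return cov / (var_s * var_r) ** 0.5
```

A value near 1 means higher scores consistently accompany higher risk labels; values near 0 or below suggest the score and label outputs disagree.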
## Training Configuration
- **Optimizer**: AdamW
- **Batch Size**: 32
- **Learning Rate**: 2e-5
- **Epochs**: 3
- **Hardware**: NVIDIA A100 GPUs
- **Precision**: bf16 mixed precision
## Environmental Impact
- **Hardware**: NVIDIA A100 GPUs
- **Training Hours**: ~2 hours
- **Carbon Emissions**: Estimated using [ML CO2 Calculator](https://mlco2.github.io/impact)
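
The card does not report the GPU count, power draw, or grid region, so no emission figure is given here. The basic calculation behind such estimates (energy consumed multiplied by the grid's carbon intensity) can be sketched as below; the function name and every input in the usage line are hypothetical placeholders, not this model's actual figures, and the optional `pue` factor for data-center overhead is an added assumption.

```python
def estimate_emissions_kg(num_gpus: int, gpu_power_kw: float, hours: float,
                          carbon_intensity_kg_per_kwh: float, pue: float = 1.0) -> float:
    """Rough CO2-equivalent estimate in kg: energy drawn times grid carbon intensity."""
    # Total energy in kWh; pue scales GPU draw up to include data-center overhead
    energy_kwh = num_gpus * gpu_power_kw * hours * pue
    return energy_kwh * carbon_intensity_kg_per_kwh


# Placeholder inputs only: substitute the real GPU count, power, runtime, and region intensity
example_kg = estimate_emissions_kg(num_gpus=1, gpu_power_kw=0.4, hours=2.0,
                                   carbon_intensity_kg_per_kwh=0.4)
```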
## Citation
```bibtex
@misc{RiskClassifier2024,
  title={RiskClassifier: Fine-Tuned LLaMA 3.1 8B Model for Risk Assessment},
  author={{Theeseus AI}},
  year={2024},
  howpublished={\url{https://huggingface.co/theeseus-ai/RiskClassifier}}
}
```
## Contact
For inquiries, please reach out to **[email protected]** or visit [LinkedIn](https://www.linkedin.com/in/theeseus).