Model Card for suruti94/llama-3.21B-tamil-base-0.2
This model is Llama-3.2-1B with continued pretraining on Tamil text from uonlp/CulturaX.
Model Details
Model Description
This model builds on the original Llama-3.2 and extends it with a Tamil vocabulary of 20,000 tokens. The approach is very similar to that of abhinand/tamil-llama-7b-base-v0.1.
- Developed by: Mohan Parthasarathy
- Funded by [optional]: Self
- Shared by [optional]: Self
- Model type: Pretrained (base) causal language model
- Language(s) (NLP): Tamil
- License: Apache 2.0
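Conceptually, extending a vocabulary means appending new Tamil tokens after the existing Llama-3.2 token ids, so the pretrained embedding rows keep their meaning and only the rows for new tokens have to be learned. The sketch below shows that idea on plain Python dictionaries. `merge_vocab` is a hypothetical helper, and a real BPE tokenizer such as Llama-3.2's also needs merge rules rather than just a token list, so this is an illustration under those assumptions, not the exact procedure used for this model.

```python
from typing import Dict, List


def merge_vocab(base_vocab: Dict[str, int], new_tokens: List[str]) -> Dict[str, int]:
    """Append tokens from a new (e.g. Tamil) tokenizer to a base vocabulary.

    Existing token ids are kept unchanged so pretrained embedding rows still
    line up; tokens not already present receive fresh ids after the last one.
    """
    merged = dict(base_vocab)
    next_id = max(base_vocab.values()) + 1 if base_vocab else 0
    for token in new_tokens:
        if token not in merged:
            merged[token] = next_id
            next_id += 1
    return merged


# Hypothetical toy example: two base tokens and three candidate Tamil pieces,
# one of which ("▁த") already exists in the base vocabulary.
base = {"▁the": 0, "▁த": 1}
tamil_pieces = ["▁த", "மிழ்", "▁வணக்கம்"]
merged = merge_vocab(base, tamil_pieces)
print(len(merged))  # 4
```

In Hugging Face `transformers`, the model's embedding matrix is then grown to match the enlarged tokenizer with `model.resize_token_embeddings(len(tokenizer))`.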
Model Sources [optional]
- Repository: [More Information Needed]
- Paper [optional]: [More Information Needed]
- Demo [optional]: [More Information Needed]
Uses
Direct Use
[More Information Needed]
Downstream Use [optional]
[More Information Needed]
Out-of-Scope Use
[More Information Needed]
Bias, Risks, and Limitations
[More Information Needed]
Recommendations
Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
How to Get Started with the Model
Use the code below to get started with the model.
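```python
from peft import AutoPeftModelForCausalLM  # loads the base model plus the LoRA adapter
from transformers import AutoTokenizer
model = AutoPeftModelForCausalLM.from_pretrained("suruti94/llama-3.21B-tamil-base-0.2")
tokenizer = AutoTokenizer.from_pretrained("suruti94/llama-3.21B-tamil-base-0.2")
```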
Training Details
Training follows the steps described in https://arxiv.org/pdf/2311.05845.
- A new tokenizer was built with SentencePiece by sampling 1 million of the 4.7 million Tamil documents in uonlp/CulturaX.
- The model was trained on Tamil data from uonlp/CulturaX in bfloat16. The original model was loaded in 8-bit precision and trained with LoRA.
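The card does not say how the 1 million tokenizer-training documents were drawn. Uniform reservoir sampling (Algorithm R) is one way to take such a sample in a single pass over a corpus too large to hold in memory; the sketch below is an assumption for illustration, not the author's script, and `reservoir_sample` is a hypothetical helper.

```python
import random
from typing import Iterable, List, TypeVar

T = TypeVar("T")


def reservoir_sample(stream: Iterable[T], k: int, seed: int = 0) -> List[T]:
    """Uniformly sample k items from a stream of unknown length (Algorithm R)."""
    rng = random.Random(seed)
    reservoir: List[T] = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Keep item i with probability k / (i + 1), replacing a random slot.
            j = rng.randint(0, i)
            if j < k:
                reservoir[j] = item
    return reservoir


# Toy stand-in for the corpus: 47 "documents", keep 10 (roughly the 1 : 4.7 ratio).
docs = (f"doc-{n}" for n in range(47))
sample = reservoir_sample(docs, k=10)
print(len(sample))  # 10
```

The sampled text would then be written to a file and passed to SentencePiece's trainer, for example `sentencepiece.SentencePieceTrainer.train(input=..., model_prefix=..., vocab_size=20000)`; the exact trainer options used for this model are not recorded in this card.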
Training Data
https://huggingface.co/datasets/uonlp/CulturaX/tree/main/ta
Training Procedure
Preprocessing [optional]
[More Information Needed]
Training Hyperparameters
- Training regime: bfloat16, with LoRA adapters on the base model loaded in 8-bit precision
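This regime pairs a frozen, 8-bit-quantized base model with small trainable low-rank LoRA adapters. The NumPy sketch below illustrates the combination on a single toy linear layer: a simplified per-row absmax int8 quantization of the frozen weight `W`, plus the LoRA update `(alpha / r) * B @ A` added to the layer output. The card does not record the LoRA rank, alpha, or target modules, so the sizes and `alpha=16.0` are hypothetical, and bitsandbytes' actual 8-bit scheme (LLM.int8) also keeps outlier features in 16-bit instead of simply dequantizing as done here.

```python
import numpy as np


def quantize_absmax_int8(w: np.ndarray):
    """Simplified per-row absmax quantization of a weight matrix to int8."""
    scale = np.abs(w).max(axis=1, keepdims=True) / 127.0
    scale[scale == 0] = 1.0  # avoid division by zero for all-zero rows
    q = np.round(w / scale).astype(np.int8)
    return q, scale


def lora_forward(x, q, scale, a, b, alpha: float):
    """y = dequant(W) x + (alpha / r) * B (A x); W stays frozen, only A and B train."""
    r = a.shape[0]
    w = q.astype(np.float32) * scale  # dequantize the frozen base weight
    return w @ x + (alpha / r) * (b @ (a @ x))


rng = np.random.default_rng(0)
d_out, d_in, r = 8, 16, 4  # hypothetical toy sizes; real Llama layers are far larger
w = rng.normal(size=(d_out, d_in)).astype(np.float32)
q, scale = quantize_absmax_int8(w)
a = rng.normal(scale=0.01, size=(r, d_in)).astype(np.float32)
b = np.zeros((d_out, r), dtype=np.float32)  # B starts at zero, so LoRA is a no-op at init
x = rng.normal(size=(d_in,)).astype(np.float32)
y = lora_forward(x, q, scale, a, b, alpha=16.0)
```

In practice this setup is usually assembled with `BitsAndBytesConfig(load_in_8bit=True)` from `transformers` and `LoraConfig` / `get_peft_model` from `peft`, rather than by hand.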
Speeds, Sizes, Times [optional]
epoch = 0.9995
total_flos = 516494476GF
train_loss = 2.7275
train_runtime = 3:37:51.19
train_samples = 70222
train_samples_per_second = 5.372
train_steps_per_second = 0.084
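The logged throughput figures can be cross-checked against the runtime and sample counts. The short sketch below (the helper name `runtime_to_seconds` is hypothetical) converts the `H:MM:SS.ss` runtime to seconds and recomputes samples per second.

```python
def runtime_to_seconds(runtime: str) -> float:
    """Convert an "H:MM:SS.ss" runtime string, as logged above, to seconds."""
    hours, minutes, seconds = runtime.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


train_seconds = runtime_to_seconds("3:37:51.19")  # 13071.19 s
samples_per_second = 70222 / train_seconds
approx_steps = 0.084 * train_seconds  # approximate, since 0.084 is itself rounded
print(round(samples_per_second, 3))  # 5.372
```

The same arithmetic reproduces the evaluation throughput (7803 samples in 349.70 s ≈ 22.313 samples/s). Multiplying `train_steps_per_second` by the runtime gives roughly 1,100 optimizer steps, or about 64 training samples per step; because 0.084 is rounded, this is only an approximation.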
Evaluation
epoch = 0.9995
eval_accuracy = 0.5318
eval_loss = 2.2674
eval_runtime = 0:05:49.70
eval_samples = 7803
eval_samples_per_second = 22.313
eval_steps_per_second = 2.791
perplexity = 9.6547
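The reported perplexity is the exponential of the evaluation loss: exp(2.2674) ≈ 9.654, which matches 9.6547 once the rounding of the logged loss is taken into account. The metric names also match those printed by the Hugging Face `run_clm.py` example script, where `eval_accuracy` is next-token accuracy (the prediction at position t compared with the token at t+1); that these numbers came from that script is an assumption. The sketch below shows both computations in plain Python; the helper names are hypothetical.

```python
import math
from typing import List


def perplexity(mean_loss: float) -> float:
    """Perplexity is the exponential of the mean per-token cross-entropy (in nats)."""
    return math.exp(mean_loss)


def next_token_accuracy(pred_ids: List[int], label_ids: List[int], ignore_index: int = -100) -> float:
    """Fraction of positions where the prediction made after token t equals token t + 1.

    pred_ids[t] is the argmax prediction after reading tokens 0..t, so it is
    compared with label_ids[t + 1]; labels equal to ignore_index are skipped.
    """
    pairs = [(p, l) for p, l in zip(pred_ids[:-1], label_ids[1:]) if l != ignore_index]
    return sum(p == l for p, l in pairs) / len(pairs)


print(round(perplexity(2.2674), 3))  # 9.654
```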
Testing Data, Factors & Metrics
Testing Data
[More Information Needed]
Factors
[More Information Needed]
Metrics
[More Information Needed]
Results
[More Information Needed]
Summary
Model Examination [optional]
[More Information Needed]
Environmental Impact
Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019).
- Hardware Type: 1x A100 SXM4
- Hours used: 3 h 38 min (about 3.6 hours)
- Cloud Provider: vast.ai
- Compute Region: US
- Carbon Emitted: [More Information Needed]
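The calculator's basic approach multiplies the energy drawn by the carbon intensity of the local power grid. The sketch below is a simplified version under stated assumptions: it uses a constant power draw and ignores data-center overhead (PUE). The 400 W figure is the rated TDP of an A100 SXM4, not a measured draw, and the grid intensity is a hypothetical placeholder, not a measured value for the vast.ai host, so the result is not a substitute for a real Carbon Emitted figure.

```python
def estimate_co2_kg(power_watts: float, hours: float, grid_kg_co2_per_kwh: float) -> float:
    """Rough CO2eq estimate: energy (kWh) times grid carbon intensity (kg CO2eq per kWh)."""
    energy_kwh = power_watts * hours / 1000.0
    return energy_kwh * grid_kg_co2_per_kwh


HOURS_USED = 3 + 38 / 60  # 3 h 38 min, from this card
A100_SXM4_TDP_WATTS = 400.0  # rated TDP, not a measured power draw
PLACEHOLDER_GRID_INTENSITY = 0.4  # hypothetical kg CO2eq/kWh; replace with the real regional value

estimate = estimate_co2_kg(A100_SXM4_TDP_WATTS, HOURS_USED, PLACEHOLDER_GRID_INTENSITY)
```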
Technical Specifications [optional]
Model Architecture and Objective
[More Information Needed]
Compute Infrastructure
[More Information Needed]
Hardware
[More Information Needed]
Software
[More Information Needed]
Citation [optional]
BibTeX:
[More Information Needed]
APA:
[More Information Needed]
Glossary [optional]
[More Information Needed]
More Information [optional]
[More Information Needed]
Model Card Authors [optional]
[More Information Needed]
Model Card Contact
Model tree for suruti94/llama-3.21B-tamil-base-0.2
- Base model: meta-llama/Llama-3.2-1B