---
title: "GPTWithMoE + MCTS for Text Generation"
summary: "A GPT model enhanced with Mixture of Experts and FlashAttention, incorporating Monte Carlo Tree Search (MCTS) for controlled text generation."
tags:
- text-generation
- gpt
- mixture-of-experts
- mcts
- flashattention
license: "Apache-2.0"
datasets:
- finewebedu
library_name: "pytorch"
language: "en"
---
# **GPTWithMoE + MCTS for Text Generation**
## Model Summary
This model is a custom implementation of GPT enhanced with a Mixture of Experts (MoE) architecture and FlashAttention for efficient computation. The model incorporates Monte Carlo Tree Search (MCTS) for decoding, making it suitable for tasks that require controlled and exploratory text generation.
The model was trained on the **FinewebEdu dataset**, achieving a training loss of **1.579923** and a validation loss of **7.792485**.
### Key Features
- **Mixture of Experts (MoE)**: Dynamically selects the most relevant experts for each input, improving efficiency and specialization.
- **FlashAttention**: Optimized attention mechanism for long-sequence processing.
- **MCTS Decoding**: Uses Monte Carlo Tree Search to explore possible outputs, providing fine-grained control over text generation.
- **Custom Configurations**:
  - 6 Transformer layers
  - 4 attention heads
  - Embedding dimension: 256
  - Block size: 512 tokens
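
A top-k gated MoE layer sends each token to a small number of expert MLPs chosen by a learned router. The following is a minimal, hypothetical PyTorch illustration of that idea. It assumes softmax top-k routing over `num_experts=3` expert MLPs (the value passed to `GPTWithMoE` in the usage example). The class name `TopKMoE`, the `top_k` parameter and the 4× hidden width are illustrative choices, not details taken from `q_star.py`, whose routing may differ.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class TopKMoE(nn.Module):
    """Hypothetical top-k gated mixture of expert MLPs (not the q_star.py code)."""

    def __init__(self, n_embd: int, num_experts: int = 3, top_k: int = 1):
        super().__init__()
        self.top_k = top_k
        # The router produces one score per expert for every token
        self.router = nn.Linear(n_embd, num_experts, bias=False)
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(n_embd, 4 * n_embd), nn.GELU(), nn.Linear(4 * n_embd, n_embd))
            for _ in range(num_experts)
        ])

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        B, T, C = x.shape
        flat = x.reshape(-1, C)                        # (B*T, C)
        logits = self.router(flat)                     # (B*T, num_experts)
        weights, idx = logits.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)           # renormalise over the chosen experts only
        out = torch.zeros_like(flat)
        for e, expert in enumerate(self.experts):
            token_ids, slot = (idx == e).nonzero(as_tuple=True)
            if token_ids.numel() == 0:
                continue                               # no token was routed to this expert
            gate = weights[token_ids, slot].unsqueeze(-1)
            out.index_add_(0, token_ids, gate * expert(flat[token_ids]))
        return out.reshape(B, T, C)


moe = TopKMoE(n_embd=256, num_experts=3, top_k=1)
y = moe(torch.randn(2, 8, 256))
print(y.shape)  # torch.Size([2, 8, 256])
```

With `top_k=1` each token is processed by a single expert, so per-token compute stays close to that of one MLP as experts are added. Production MoE layers usually also add a load-balancing loss, which this sketch omits.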
---
## Intended Use
This model is designed for text generation tasks such as:
- Story generation
- Dialogue systems
- Content creation with controlled exploration
---
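## Load Files from the Repository

`GPTWithMoE` is a custom PyTorch module defined in `q_star.py`, not a Transformers architecture, so load it by downloading the checkpoint and the model code from the Hub and building the model in PyTorch:

```python
import os, sys
import torch
from huggingface_hub import hf_hub_download
from transformers import GPT2Tokenizer

repo_id = "RobbiePasquale/gpt-moe-mcts"
device = "cuda" if torch.cuda.is_available() else "cpu"
# Download the weights plus the model definition and the MCTS decoding script
weights_path = hf_hub_download(repo_id=repo_id, filename="moe_mcts_new.pt")
code_path = hf_hub_download(repo_id=repo_id, filename="q_star.py")
hf_hub_download(repo_id=repo_id, filename="mcts_text_gen.py")
sys.path.insert(0, os.path.dirname(code_path))  # make the downloaded modules importable
from q_star import GPTConfig, GPTWithMoE
from mcts_text_gen import generate_text_with_mcts
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")  # the model uses the GPT-2 tokenizer
tokenizer.pad_token = tokenizer.eos_token
config = GPTConfig(vocab_size=50304, block_size=512, n_layer=6, n_head=4, n_embd=256)
model = GPTWithMoE(config, num_experts=3, expert_layers=3, block_size_q=32,
                   block_size_kv=32, num_blocks_kv=4, device=device)
model.load_state_dict(torch.load(weights_path, map_location=device))
model.eval()
prompt = "Once upon a time in a distant galaxy,"
print(generate_text_with_mcts(model=model, tokenizer=tokenizer, prompt=prompt, max_length=50,
                              num_simulations=50, c_puct=1.5, top_k=5, device=device))
```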
## Using `huggingface_hub` for Direct Access to Files

The `huggingface_hub` library lets you download individual files or a full snapshot of the repository.
### 1. Install the Library

```bash
pip install huggingface_hub
```
### 2. Download the Full Repository

To access every file in the repository, download a snapshot of it:

```python
from huggingface_hub import snapshot_download

# Download all repository files
repo_path = snapshot_download(repo_id="RobbiePasquale/gpt-moe-mcts")
print(f"Repository downloaded to {repo_path}")
```
This downloads every file in the repository, including:

- `moe_mcts_new.pt`
- `q_star.py`
- `mcts_text_gen.py`

The files are stored in the local Hugging Face cache with the same directory structure as the repository, and `repo_path` points to that local copy.
### 3. Download Specific Files

To download a single file programmatically:

```python
from huggingface_hub import hf_hub_download

# Download a specific file
weights_path = hf_hub_download(repo_id="RobbiePasquale/gpt-moe-mcts", filename="moe_mcts_new.pt")
print(f"Downloaded weights to {weights_path}")
```
### Command-Line Access

You can also clone the repository using Git:

```bash
git lfs install
git clone https://huggingface.co/RobbiePasquale/gpt-moe-mcts
```
## How to Use

### Installation

Ensure you have the following libraries installed:

- PyTorch (`pip install torch`)
- Transformers (`pip install transformers`)
### Usage Instructions

Download the following three files into your working directory:

- `moe_mcts_new.pt`: The pre-trained weights.
- `q_star.py`: Model definition, training, and validation logic.
- `mcts_text_gen.py`: Script for MCTS-based text generation.
### Example: Generating Text

Load the weights and model configuration:

```python
from q_star import GPTConfig, GPTWithMoE
from transformers import GPT2Tokenizer
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"

# The model uses the GPT-2 tokenizer; reuse EOS as the padding token
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token

# Hyperparameters must match those used to train moe_mcts_new.pt
config = GPTConfig(vocab_size=50304, block_size=512, n_layer=6, n_head=4, n_embd=256)
model = GPTWithMoE(config, num_experts=3, expert_layers=3, block_size_q=32,
                   block_size_kv=32, num_blocks_kv=4, device=device)
model.load_state_dict(torch.load("moe_mcts_new.pt", map_location=device))
model.eval()  # put the model in evaluation mode
```
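
The `block_size_q` and `block_size_kv` arguments suggest that `GPTWithMoE` computes attention over query and key/value tiles, which is the core idea behind FlashAttention. The sketch below assumes that interpretation; it does not model `num_blocks_kv`, and the function name `tiled_causal_attention` is hypothetical. It computes causal attention one tile at a time with an online softmax, so the full sequence-by-sequence score matrix is never materialised. It is plain PyTorch for illustration, not a fused GPU kernel and not the code in `q_star.py`.

```python
import math
import torch
import torch.nn.functional as F


def tiled_causal_attention(q, k, v, block_size_q=32, block_size_kv=32):
    """Causal attention over (batch, heads, seq_len, head_dim) tensors, tile by tile."""
    B, H, T, D = q.shape
    scale = 1.0 / math.sqrt(D)
    out = torch.empty_like(q)
    for qs in range(0, T, block_size_q):
        qe = min(qs + block_size_q, T)
        q_blk = q[:, :, qs:qe] * scale
        # Running row max (m), softmax denominator (l) and unnormalised output (acc)
        m = torch.full((B, H, qe - qs, 1), float("-inf"), dtype=q.dtype, device=q.device)
        l = torch.zeros(B, H, qe - qs, 1, dtype=q.dtype, device=q.device)
        acc = torch.zeros(B, H, qe - qs, D, dtype=q.dtype, device=q.device)
        q_pos = torch.arange(qs, qe, device=q.device)[:, None]
        for ks in range(0, qe, block_size_kv):  # key tiles entirely in the future are skipped
            ke = min(ks + block_size_kv, T)
            s = q_blk @ k[:, :, ks:ke].transpose(-2, -1)            # (B, H, tq, tk) scores
            k_pos = torch.arange(ks, ke, device=q.device)[None, :]
            s = s.masked_fill(k_pos > q_pos, float("-inf"))         # causal mask
            m_new = torch.maximum(m, s.amax(dim=-1, keepdim=True))
            p = torch.exp(s - m_new)
            correction = torch.exp(m - m_new)                        # rescale earlier tiles
            l = l * correction + p.sum(dim=-1, keepdim=True)
            acc = acc * correction + p @ v[:, :, ks:ke]
            m = m_new
        out[:, :, qs:qe] = acc / l
    return out


torch.manual_seed(0)
# (batch, heads, seq_len, head_dim) with 4 heads of 256 / 4 = 64 dimensions
q, k, v = (torch.randn(2, 4, 100, 64) for _ in range(3))
out = tiled_causal_attention(q, k, v)
ref = F.scaled_dot_product_attention(q, k, v, is_causal=True)
print((out - ref).abs().max().item())  # only float32 rounding error remains
```

On supported GPUs, `torch.nn.functional.scaled_dot_product_attention` can dispatch to a fused FlashAttention kernel that performs this tiling in on-chip memory instead of Python loops.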
Use the `mcts_text_gen.py` script for text generation:

```python
from mcts_text_gen import generate_text_with_mcts

prompt = "Once upon a time in a distant galaxy,"
generated_text = generate_text_with_mcts(
    model=model,
    tokenizer=tokenizer,
    prompt=prompt,
    max_length=100,
    num_simulations=50,
    c_puct=1.5,
    top_k=5,
    device=device,
)
print("Generated Text:")
print(generated_text)
```
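
`mcts_text_gen.py` is not reproduced in this card, so the following is only a hedged sketch of how PUCT-style MCTS decoding generally works. It assumes, without confirmation from the script, that `top_k` limits expansion to the k most probable next tokens, `c_puct` scales the exploration bonus, and `num_simulations` is the number of simulations per generated token. `toy_policy`, `toy_value` and `VOCAB` are hypothetical stand-ins for the language model and for whatever scoring the real script uses.

```python
import math
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

PolicyFn = Callable[[List[int], int], List[Tuple[int, float]]]
ValueFn = Callable[[List[int]], float]


@dataclass
class Node:
    prior: float                      # P(token | prefix) from the policy
    visits: int = 0                   # N
    value_sum: float = 0.0            # W, so Q = W / N
    children: Dict[int, "Node"] = field(default_factory=dict)

    def q(self) -> float:
        return self.value_sum / self.visits if self.visits else 0.0


def puct(parent: Node, child: Node, c_puct: float) -> float:
    # AlphaZero-style PUCT: exploitation (Q) plus a prior-weighted exploration bonus
    return child.q() + c_puct * child.prior * math.sqrt(parent.visits) / (1 + child.visits)


def mcts_next_token(seq: List[int], policy_fn: PolicyFn, value_fn: ValueFn,
                    num_simulations: int = 50, c_puct: float = 1.5, top_k: int = 5) -> int:
    root = Node(prior=1.0)
    for _ in range(num_simulations):
        node, path, tokens = root, [root], list(seq)
        # Selection: walk down expanded nodes, taking the child with the best PUCT score
        while node.children:
            parent = node
            tok, node = max(parent.children.items(), key=lambda kv: puct(parent, kv[1], c_puct))
            path.append(node)
            tokens.append(tok)
        # Expansion: only the top_k most likely next tokens become children
        for tok, p in policy_fn(tokens, top_k):
            node.children[tok] = Node(prior=p)
        # Evaluation and backup: score the leaf prefix and update every node on the path
        value = value_fn(tokens)
        for n in path:
            n.visits += 1
            n.value_sum += value
    # Commit to the most-visited child of the root as the next token
    return max(root.children.items(), key=lambda kv: kv[1].visits)[0]


# Hypothetical toy "language model" that prefers the next word in VOCAB order
VOCAB = ["the", "cat", "sat", "on", "mat", "."]


def toy_policy(tokens: List[int], top_k: int) -> List[Tuple[int, float]]:
    logits = [2.0 if i == (tokens[-1] + 1) % len(VOCAB) else 0.0 for i in range(len(VOCAB))]
    z = sum(math.exp(x) for x in logits)
    probs = sorted(((i, math.exp(x) / z) for i, x in enumerate(logits)), key=lambda t: -t[1])
    return probs[:top_k]


def toy_value(tokens: List[int]) -> float:
    # Hypothetical reward: fraction of adjacent token pairs that follow VOCAB order
    pairs = list(zip(tokens, tokens[1:]))
    return sum(b == (a + 1) % len(VOCAB) for a, b in pairs) / max(len(pairs), 1)


def generate(prompt_ids: List[int], max_new_tokens: int, **mcts_kwargs) -> List[int]:
    seq = list(prompt_ids)
    for _ in range(max_new_tokens):
        seq.append(mcts_next_token(seq, toy_policy, toy_value, **mcts_kwargs))
    return seq


ids = generate([0], max_new_tokens=4, num_simulations=50, c_puct=1.5, top_k=5)
print(" ".join(VOCAB[i] for i in ids))  # the cat sat on mat
```

In a real decoder the priors would come from the softmax of the model's next-token logits. Committing to the most-visited root child rather than the child with the highest Q is a common choice, because visit counts are less noisy than value estimates.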
## Training Details

- Dataset: FinewebEdu
- Batch Size: 16
- Sequence Length: 512
- Optimizer: AdamW with weight decay
- Learning Rate: 3e-3
- Gradient Accumulation: used to reach a total batch size of 262,144 tokens per optimizer step.
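
With 16 sequences of 512 tokens, each forward/backward pass sees 8,192 tokens, so reaching 262,144 tokens per update takes 262,144 / 8,192 = 32 accumulation steps. The sketch below works through that arithmetic and shows the usual accumulation pattern with AdamW at the listed learning rate. It assumes single-process training; the tiny `nn.Linear` model, the random data and the `weight_decay=0.1` value are hypothetical stand-ins, since the card does not state the weight-decay coefficient.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Values from the training details (single-process training assumed)
batch_size, seq_len, total_batch_tokens = 16, 512, 262144
tokens_per_micro_step = batch_size * seq_len      # 8,192 tokens per forward/backward pass
assert total_batch_tokens % tokens_per_micro_step == 0
grad_accum_steps = total_batch_tokens // tokens_per_micro_step
print(f"gradient accumulation steps: {grad_accum_steps}")  # gradient accumulation steps: 32

# Hypothetical stand-ins for GPTWithMoE and FinewebEdu batches
model = nn.Linear(32, 32)
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-3, weight_decay=0.1)  # weight_decay assumed

optimizer.zero_grad(set_to_none=True)
for micro_step in range(grad_accum_steps):
    x = torch.randn(batch_size, 32)
    loss = F.mse_loss(model(x), x)
    # Scale each micro-batch loss so the accumulated gradient is the full-batch mean
    (loss / grad_accum_steps).backward()
optimizer.step()                                  # one update per 262,144 tokens
optimizer.zero_grad(set_to_none=True)
```

Under data-parallel training across several GPUs, the number of accumulation steps would also be divided by the number of processes.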
### Loss Metrics

- Training Loss: 1.579923
- Validation Loss: 7.792485
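
Cross-entropy losses are often easier to compare as perplexity, $\mathrm{PPL} = e^{\text{loss}}$. The short calculation below assumes the reported figures are mean per-token cross-entropy in nats, which is the default reduction and log base of `torch.nn.functional.cross_entropy`. The card does not state this explicitly.

```python
import math

# Reported losses, assumed to be mean per-token cross-entropy in nats
train_loss, val_loss = 1.579923, 7.792485
train_ppl, val_ppl = math.exp(train_loss), math.exp(val_loss)
print(f"train perplexity: {train_ppl:.2f}")      # train perplexity: 4.85
print(f"validation perplexity: {val_ppl:.0f}")   # validation perplexity: 2422
```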
## Limitations

- The model may struggle with highly domain-specific language not represented in FinewebEdu.
- Computationally intensive due to MCTS decoding.
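
To give a rough sense of the cost, assume that each MCTS simulation needs about one model forward pass and that `max_length` counts generated tokens. Both are assumptions, since the script may batch, cache, or count the prompt. Under those assumptions the generation example above needs about 50 times as many forward passes as greedy decoding:

```python
# Values from the MCTS generation example; the cost model is an assumption
max_length, num_simulations = 100, 50
greedy_forward_passes = max_length                  # one pass per generated token
mcts_forward_passes = max_length * num_simulations  # one pass per simulation per token
print(mcts_forward_passes, mcts_forward_passes // greedy_forward_passes)  # 5000 50
```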
## Citation

If you use this model, please cite:

```bibtex
@article{gptmoe_mcts,
  title={GPTWithMoE + MCTS for Controlled Text Generation},
  author={Robbie Pasquale},
  year={2024}
}
```