Mistral-7B-WikiFineTuned

This project fine-tunes the Mistral-7B-Instruct model on the Wikitext-103-raw-v1 dataset, a corpus drawn from Wikipedia articles. The goal is a model that generates accurate, informative text in coherent, well-structured language.

Model Description

  • Base Model: Mistral-7B
  • Fine-Tuned on: Wikitext-103-raw-v1
  • Purpose: Get as much benefit as possible from a short training run, producing accurate and informative content in coherent, well-structured language.
  • License: MIT

How to Use

To use this model, you can load it with the Hugging Face transformers library. Below is a basic example of how to use the model for text generation:

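from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
    pipeline,
)

MODEL_ID = "Mesutby/mistral-7B-wikitext-finetuned"

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

# Load the model in 4-bit (requires the bitsandbytes package).
# Passing load_in_4bit directly to from_pretrained is deprecated in recent
# transformers releases; a BitsAndBytesConfig is the supported form.
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    device_map="auto",
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),
)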

# Create the pipeline
generator = pipeline("text-generation", model=model, tokenizer=tokenizer)

# Generate text
prompt = "The future of AI is"
output = generator(prompt, max_new_tokens=50)
print(output[0]['generated_text'])

Inference API

If the model is deployed on the Hugging Face Inference API, you can also query it over HTTP:

import requests

API_URL = "https://api-inference.huggingface.co/models/Mesutby/mistral-7B-wikitext-finetuned"
headers = {"Authorization": "Bearer YOUR_HF_TOKEN"}  # replace with your access token

def query(payload):
    response = requests.post(API_URL, headers=headers, json=payload)
    return response.json()

output = query({"inputs": "The future of AI is"})
print(output)
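
The API does not always return generated text. When a model is loading or unavailable, the JSON body is typically an object with an "error" field rather than a list of results. Below is a minimal sketch of a response check under that assumption. `extract_generated_text` is a hypothetical helper name, not part of any library.

```python
def extract_generated_text(payload):
    """Return the first generated string from a text-generation response.

    Assumes a successful response looks like [{"generated_text": "..."}]
    and a failed one looks like {"error": "..."}.
    """
    if isinstance(payload, dict) and "error" in payload:
        raise RuntimeError(f"Inference API error: {payload['error']}")
    if (
        isinstance(payload, list)
        and payload
        and isinstance(payload[0], dict)
        and "generated_text" in payload[0]
    ):
        return payload[0]["generated_text"]
    raise ValueError(f"Unexpected response shape: {payload!r}")
```

With the `query` function above, this would be called as `extract_generated_text(query({"inputs": "The future of AI is"}))`.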

Training Details

  • Framework Used: PyTorch
  • Optimization Techniques:
    • 4-bit quantization using bitsandbytes to reduce memory usage.
    • Parameter-efficient fine-tuning (LoRA) with peft, run with accelerate.
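
To give a rough sense of why 4-bit quantization matters, the arithmetic below estimates the memory needed for the weights alone. It is a back-of-the-envelope sketch: the parameter count is a round 7 billion (an assumption), and it ignores quantization constants, activations, LoRA adapters, and optimizer state.

```python
def weight_memory_gb(num_params, bits_per_param):
    """Approximate memory for the weights alone, in gigabytes (1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


NUM_PARAMS = 7e9  # assumption: round figure for a 7B-parameter model

fp16_gb = weight_memory_gb(NUM_PARAMS, 16)
int4_gb = weight_memory_gb(NUM_PARAMS, 4)

print(f"16-bit weights: {fp16_gb:.1f} GB")  # 14.0 GB
print(f"4-bit weights:  {int4_gb:.1f} GB")  # 3.5 GB
```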

Dataset

The model was fine-tuned on the Wikitext-103-raw-v1 dataset, split into training and evaluation subsets.
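
The exact split used for training is not described here. As one possible sketch, the dataset can be loaded with the Hugging Face datasets library and reduced to small training and evaluation subsets. The subset sizes and helper names below are hypothetical. Blank rows are filtered out because the raw Wikitext files contain many empty lines.

```python
def has_content(text):
    """True for rows that contain something other than whitespace."""
    return bool(text.strip())


def load_wikitext_subsets(train_rows=5000, eval_rows=500):
    """Load Wikitext-103-raw-v1 and return small train/eval subsets.

    The row counts are hypothetical defaults, not values from this card.
    """
    # Imported lazily so the helper above works without the package installed.
    from datasets import load_dataset

    dataset = load_dataset("Salesforce/wikitext", "wikitext-103-raw-v1")
    train = dataset["train"].select(range(train_rows)).filter(
        lambda row: has_content(row["text"])
    )
    evaluation = dataset["validation"].select(range(eval_rows)).filter(
        lambda row: has_content(row["text"])
    )
    return train, evaluation
```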

Training Configuration

  • Learning Rate: 2e-4
  • Batch Size: 4 (with gradient accumulation)
  • Max Steps: 125 (for demonstration; should ideally be higher, e.g., 1000)
  • Optimizer: Paged AdamW (32-bit)
  • Evaluation Strategy: Evaluation every 25 steps
  • PEFT Configuration: LoRA with rank 8 (r=8) and dropout of 0.1
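
The settings above could be expressed with transformers and peft roughly as follows. This is a sketch, not the original training script. The gradient accumulation steps, `lora_alpha`, and `output_dir` are not stated in this card and are placeholder assumptions. The `eval_strategy` argument is named `evaluation_strategy` in older transformers releases.

```python
# Values taken from this card
TRAINING_KWARGS = {
    "learning_rate": 2e-4,
    "per_device_train_batch_size": 4,
    "max_steps": 125,
    "optim": "paged_adamw_32bit",
    "eval_steps": 25,
    # Assumptions (not stated in the card)
    "gradient_accumulation_steps": 4,
    "output_dir": "./outputs",
}

LORA_KWARGS = {
    "r": 8,               # LoRA rank, from this card
    "lora_dropout": 0.1,  # from this card
    "lora_alpha": 16,     # assumption
    "task_type": "CAUSAL_LM",
}


def build_configs():
    """Build the peft and transformers config objects from the dicts above."""
    # Imported lazily; requires `peft` and `transformers` to be installed.
    from peft import LoraConfig
    from transformers import TrainingArguments

    lora_config = LoraConfig(**LORA_KWARGS)
    training_args = TrainingArguments(eval_strategy="steps", **TRAINING_KWARGS)
    return lora_config, training_args


# Effective batch size per device under the assumed accumulation steps
effective_batch = (
    TRAINING_KWARGS["per_device_train_batch_size"]
    * TRAINING_KWARGS["gradient_accumulation_steps"]
)
```

Under these assumptions, each optimizer step sees 4 × 4 = 16 sequences per device, so 125 steps cover about 2,000 sequences.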

Evaluation

The model was evaluated on a subset of the Wikitext dataset every 25 steps during training. Evaluation metrics are logged during the training run; no final figures are reported here.
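
If the logged metric is a mean per-token cross-entropy loss in nats (an assumption; this is what Trainer-style evaluation typically reports as eval_loss), it can be converted to perplexity, a common language-modeling metric, by exponentiating it. A minimal sketch:

```python
import math


def perplexity(eval_loss):
    """Convert a mean cross-entropy loss (in nats) to perplexity."""
    return math.exp(eval_loss)


# Illustrative value only, not a result from this model
example_loss = 2.0
print(f"loss {example_loss} -> perplexity {perplexity(example_loss):.2f}")
```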

Limitations and Biases

While the model performs well on a variety of text generation tasks, it may still exhibit biases present in the training data. Users should be cautious when deploying this model in sensitive or high-stakes applications.

License

This model is licensed under the MIT License. See the LICENSE file for more details.

Contact

For any questions or issues, please contact [email protected].
