shellwork/ChatParts-qwen2.5-14b
🤖 XJTLU-Software RAG GitHub Repository • 📊 ChatParts Dataset
shellwork/ChatParts-qwen2.5-14b is a specialized dialogue model fine-tuned from Qwen2.5-14B-Instruct by the XJTLU-Software iGEM Competition team. This model is tailored for the synthetic biology domain, aiming to assist competition participants and researchers in efficiently collecting and organizing relevant information. It serves as the local model component of the XJTLU-developed Retrieval-Augmented Generation (RAG) software, enhancing search and summarization capabilities within synthetic biology data.
📚 Dataset Information
The model is trained on a comprehensive synthetic biology-specific dataset curated from multiple authoritative sources:
- iGEM Wiki Pages (2004-2023): Comprehensive coverage of synthetic biology topics from over two decades of iGEM competitions.
- Synthetic Biology Review Papers: More than 1,000 high-quality review articles providing in-depth insights into various aspects of synthetic biology.
- iGEM Parts Registry Documentation: Detailed documentation of parts used in iGEM projects, facilitating accurate information retrieval.
In total, the dataset comprises over 200,000 question-answer pairs, meticulously assembled to cover a wide spectrum of synthetic biology topics. For more detailed information about the dataset, please visit our training data repository.
🛠️ How to Use
This repository supports usage with the transformers library. Below is a straightforward example of how to deploy the shellwork/ChatParts-qwen2.5-14b model using transformers.
📋 Requirements
- Transformers Library: Ensure you have transformers version >= 4.43.0 installed. You can update your installation using:
  pip install --upgrade transformers
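The minimum version stated above can also be checked from Python before loading the model. The following is a minimal sketch using only the standard library; the helper names (parse_release, meets_minimum, installed_transformers_version) are hypothetical and not part of transformers, and the parsing is deliberately simple: it compares only the leading numeric release components, so a pre-release such as 4.43.0rc1 is treated like 4.43.0.

```python
from importlib.metadata import PackageNotFoundError, version

MIN_TRANSFORMERS = "4.43.0"  # minimum stated in the requirements above


def parse_release(version_string: str) -> tuple:
    """Turn '4.43.0' or '4.44.0.dev0' into a tuple of leading integers."""
    parts = []
    for piece in version_string.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break  # stop at the first non-numeric component, e.g. 'dev0'
        parts.append(int(digits))
    # Pad short versions such as '4.43' so they compare like '4.43.0'.
    return tuple(parts + [0] * (3 - len(parts)))


def meets_minimum(installed: str, minimum: str = MIN_TRANSFORMERS) -> bool:
    """Return True if the installed release is at least the minimum."""
    return parse_release(installed) >= parse_release(minimum)


def installed_transformers_version():
    """Return the installed transformers version string, or None if absent."""
    try:
        return version("transformers")
    except PackageNotFoundError:
        return None


found = installed_transformers_version()
if found is None:
    print("transformers is not installed")
elif meets_minimum(found):
    print(f"transformers {found} satisfies >= {MIN_TRANSFORMERS}")
else:
    print(f"transformers {found} is older than {MIN_TRANSFORMERS}; please upgrade")
```

For stricter comparisons that distinguish pre-releases, a dedicated version parser would be a better fit than this simplified helper.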
⚙️ Example: Deploying with Transformers
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the tokenizer and model
model_name = "shellwork/ChatParts-qwen2.5-14b"
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Define the prompt and messages
prompt = "Give me a short introduction to synthetic biology."
messages = [
    {"role": "system", "content": "You are ChatParts, a model specialized in synthetic biology created by XJTLU-Software."},
    {"role": "user", "content": prompt}
]

# Apply chat template
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)

# Tokenize the input
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

# Generate the response
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512
)

# Extract the generated tokens
generated_ids = [
    output_ids[len(input_ids):]
    for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

# Decode the response
response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
🔍 Explanation
- Import Libraries: Import torch, along with AutoModelForCausalLM and AutoTokenizer from transformers.
- Load Model and Tokenizer: Use AutoModelForCausalLM and AutoTokenizer from transformers to load the pre-trained model and tokenizer.
- Define Prompt and Messages: Create a prompt and define the conversation messages, including system and user roles.
- Apply Chat Template: Use the apply_chat_template method to format the messages appropriately for the model.
- Tokenize Input: Tokenize the formatted text and move it to the same device as the model (CPU/GPU).
- Generate Response: Use the generate method to produce a response, limited here to 512 new tokens via max_new_tokens.
- Extract Generated Tokens: Slice the prompt tokens off the front of each output sequence so that only the newly generated tokens remain.
- Decode and Print: Decode the generated tokens to obtain the final text response and print it.
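The slicing step in the example is needed because, for a causal language model, generate returns each prompt followed by its continuation in the same sequence. The sketch below reproduces that slicing on plain Python lists so it can be run without loading the model; the helper name strip_prompt_tokens and the token ids are made up for illustration.

```python
def strip_prompt_tokens(input_ids, output_ids):
    """Drop the echoed prompt from the front of each generated sequence."""
    return [
        generated[len(prompt):]
        for prompt, generated in zip(input_ids, output_ids)
    ]


# Made-up token ids: one prompt of four tokens, followed by three new tokens.
prompt_batch = [[101, 7, 8, 9]]
generated_batch = [[101, 7, 8, 9, 42, 43, 2]]

new_tokens = strip_prompt_tokens(prompt_batch, generated_batch)
print(new_tokens)  # [[42, 43, 2]]
```

Without this step, decoding would return the formatted prompt text together with the model's answer.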
📄 License
This model is released under the Apache License 2.0. For more details, please refer to the license information in the repository.
🔗 Additional Resources
- RAG Software: Explore the full capabilities of our Retrieval-Augmented Generation software in the XJTLU-Software RAG GitHub repository.
- Training Data: Access and review the extensive training dataset in the ChatParts Dataset repository.
- Support & Contributions: For support or to contribute to the project, visit our GitHub Issues page.
Feel free to reach out through our GitHub repository for any questions, issues, or contributions related to shellwork/ChatParts-qwen2.5-14b.