---
tags:
- ctranslate2
- int8
- float16
license: apache-2.0
---

# Fast-Inference with CTranslate2

Speed up inference while reducing memory by 2x-4x using int8 inference in C++ on CPU or GPU.

Quantized version of [openllmplayground/openalpaca_7b_700bt_preview](https://huggingface.co/openllmplayground/openalpaca_7b_700bt_preview).

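One way to read the 2x-4x figure is as back-of-the-envelope arithmetic on weight storage: int8 uses 1 byte per weight, versus 2 bytes for float16 and 4 bytes for float32. The sketch below assumes a nominal 7 billion parameters and counts weights only (no activations, KV cache, or runtime overhead). It also treats `int8_float16` as exactly 1 byte per weight, although some tensors may stay in float16, so real savings are likely somewhat lower.

```python
# Back-of-the-envelope weight memory for a model with a nominal 7 billion
# parameters. Assumptions: weights only (no activations, KV cache or runtime
# overhead), and int8 counted as exactly 1 byte per weight.
NUM_PARAMS = 7_000_000_000

BYTES_PER_PARAM = {"float32": 4, "float16": 2, "int8": 1}


def weight_memory_gb(num_params: int, dtype: str) -> float:
    """Approximate weight storage in gigabytes (1 GB = 10**9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


for dtype in BYTES_PER_PARAM:
    print(f"{dtype:>7}: {weight_memory_gb(NUM_PARAMS, dtype):.1f} GB")
# Reports 28.0, 14.0 and 7.0 GB: int8 is 2x smaller than float16
# and 4x smaller than float32.
```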

```bash
pip install "hf-hub-ctranslate2>=2.0.8" "ctranslate2>=3.14.0"
```

Converted on 2023-06-02 using

```bash
ct2-transformers-converter --model openllmplayground/openalpaca_7b_700bt_preview \
    --output_dir /home/michael/tmp-ct2fast-openalpaca_7b_700bt_preview \
    --force \
    --copy_files README.md tokenizer_config.json generation_config.json special_tokens_map.json .gitattributes \
    --quantization int8_float16 \
    --trust_remote_code
```

Checkpoint compatible with [ctranslate2>=3.14.0](https://github.com/OpenNMT/CTranslate2) and [hf-hub-ctranslate2>=2.0.8](https://github.com/michaelfeil/hf-hub-ctranslate2):

- `compute_type=int8_float16` for `device="cuda"`
- `compute_type=int8` for `device="cpu"`

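The two bullets amount to a simple device-to-compute-type mapping. As a minimal sketch, it could be wrapped in a small helper; `pick_compute_type` is a hypothetical name introduced here for illustration and is not part of `ctranslate2` or `hf-hub-ctranslate2`.

```python
def pick_compute_type(device: str) -> str:
    """Hypothetical helper: return the compute_type this model card
    recommends for a given CTranslate2 device string."""
    if device == "cuda":
        # int8 weights, float16 for the remaining computations (GPU)
        return "int8_float16"
    if device == "cpu":
        # int8 computations on CPU
        return "int8"
    raise ValueError(f"unsupported device: {device!r}")


print(pick_compute_type("cuda"))  # int8_float16
print(pick_compute_type("cpu"))   # int8
```

The result could then be passed as `compute_type=pick_compute_type(device)` when constructing `GeneratorCT2fromHfHub`, with `device` set to `"cuda"` or `"cpu"`.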

```python
from hf_hub_ctranslate2 import TranslatorCT2fromHfHub, GeneratorCT2fromHfHub
from transformers import AutoTokenizer

model_name = "michaelfeil/ct2fast-openalpaca_7b_700bt_preview"
# use either TranslatorCT2fromHfHub (encoder-decoder) or GeneratorCT2fromHfHub
# (decoder-only, as here), depending on the model.
model = GeneratorCT2fromHfHub(
    # load in int8 on CUDA
    model_name_or_path=model_name,
    device="cuda",
    compute_type="int8_float16",
    # tokenizer=AutoTokenizer.from_pretrained("openllmplayground/openalpaca_7b_700bt_preview")
)
outputs = model.generate(
    text=["def fibonacci(", "User: How are you doing? Bot:"],
    max_length=64,
    include_prompt_in_result=False,
)
print(outputs)
```

# License and other remarks

This is just a quantized version. License conditions are intended to be identical to the original Hugging Face repo.

# Original description

# OpenAlpaca: A Fully Open-Source Instruction-Following Model Based On OpenLLaMA

In this repo, we release a permissively licensed, open-source instruction-following model based on [OpenLLaMA](https://github.com/openlm-research/open_llama). This release is a public preview of the 7B OpenAlpaca model, built on [the preview version of OpenLLaMA](https://huggingface.co/openlm-research/open_llama_7b_700bt_preview), a 7B model trained on 700 billion tokens. We provide PyTorch weights of OpenAlpaca. Stay tuned for our forthcoming updates!

**[Project Page]** [https://github.com/yxuansu/OpenAlpaca](https://github.com/yxuansu/OpenAlpaca)

# Dataset and Training

We train our model on the [dolly 15k dataset](https://huggingface.co/datasets/databricks/databricks-dolly-15k) released by Databricks. The training configurations are listed in the table below. Training runs on 8 x A100 (40GB) GPUs and takes around 30 minutes.

| Configuration | Value |
|:-------------:|:-------------:|
| **Batch Size** | 64 |
| **Learning rate** | 2e-5 |
| **Epochs** | 3 |
| **Max length** | 1024 |

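The table also allows a rough estimate of the number of optimizer steps. This is a minimal sketch under assumptions not stated in the card: the dolly 15k dataset is treated as about 15,000 examples, the batch size of 64 is taken as the global batch split evenly across the 8 GPUs with no gradient accumulation, and the final partial batch of each epoch is kept rather than dropped.

```python
import math

# Values from the training-configuration table.
BATCH_SIZE = 64
EPOCHS = 3
NUM_GPUS = 8

# Assumption: dolly 15k treated as roughly 15,000 examples.
NUM_EXAMPLES = 15_000

# Assumption: the last, partial batch of each epoch is kept (ceil).
steps_per_epoch = math.ceil(NUM_EXAMPLES / BATCH_SIZE)
total_steps = steps_per_epoch * EPOCHS

# Assumption: 64 is the global batch, split evenly across GPUs,
# without gradient accumulation.
per_gpu_batch = BATCH_SIZE // NUM_GPUS

print(steps_per_epoch, total_steps, per_gpu_batch)  # 235 705 8
```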

# Example Usage

The example below shows how to use OpenAlpaca:

```python
import torch
from transformers import LlamaForCausalLM, LlamaTokenizer

# the previewed version of OpenAlpaca
model_path = r'openllmplayground/openalpaca_7b_700bt_preview'
tokenizer = LlamaTokenizer.from_pretrained(model_path)
model = LlamaForCausalLM.from_pretrained(model_path).cuda()
tokenizer.bos_token_id, tokenizer.eos_token_id = 1, 2  # see https://github.com/openlm-research/open_llama#preview-weights-release-and-usage

# same prompt as provided in https://crfm.stanford.edu/2023/03/13/alpaca.html
instruction = r'What is an alpaca? How is it different from a llama?'
# Other example instructions:
# instruction = r'Write an e-mail to congratulate new Stanford admits and mention that you are excited about meeting all of them in person.'
# instruction = r'What is the capital of Tanzania?'
# instruction = r'Write a well-thought out abstract for a machine learning paper that proves that 42 is the optimal seed for training neural networks.'

prompt_no_input = f'Below is an instruction that describes a task. Write a response that appropriately completes the request.\n\n### Instruction:\n{instruction}\n\n### Response:'
tokens = tokenizer.encode(prompt_no_input)

# Batch of one, moved to the same device as the model (the GPU).
tokens = torch.LongTensor(tokens).unsqueeze(0).to(model.device)
instance = {'input_ids': tokens,
            'top_k': 50,
            'top_p': 0.9,
            'generate_len': 128}

length = len(tokens[0])
with torch.no_grad():
    rest = model.generate(
        input_ids=tokens,
        max_length=length + instance['generate_len'],
        use_cache=True,
        do_sample=True,
        top_p=instance['top_p'],
        top_k=instance['top_k']
    )

# Drop the prompt tokens and keep only the newly generated continuation.
output = rest[0][length:]
string = tokenizer.decode(output, skip_special_tokens=True)
print(f'[!] Generation results: {string}')
```

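The prompt in the example follows the Alpaca "no input" template. Factoring it into a function makes it easier to reuse, for instance with the CTranslate2 example at the top of this card. This is a minimal sketch: `build_prompt` and `strip_prompt` are hypothetical helper names, and the template string is copied verbatim from the example.

```python
# Alpaca-style "no input" template, copied from the example above.
PROMPT_NO_INPUT = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Response:"
)


def build_prompt(instruction: str) -> str:
    """Hypothetical helper: fill the template with one instruction."""
    return PROMPT_NO_INPUT.format(instruction=instruction)


def strip_prompt(output_ids: list[int], prompt_len: int) -> list[int]:
    """Hypothetical helper: keep only the newly generated token ids,
    mirroring rest[0][length:] in the example."""
    return output_ids[prompt_len:]


print(build_prompt("What is the capital of Tanzania?"))
```

The string returned by `build_prompt(...)` could, for example, be placed in the `text` list passed to `model.generate` in the CTranslate2 example; outputs may differ somewhat from the PyTorch model because of int8 quantization.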
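The example samples with `do_sample=True`, `top_k=50` and `top_p=0.9`. The following is a simplified, pure-Python sketch of what these two filters do to a single next-token distribution, applying top-k first and then top-p (nucleus) filtering. It works on a toy dictionary of logits and is only an illustration of the technique, not the transformers implementation, which operates on batched logit tensors.

```python
import math
import random


def top_k_top_p_filter(logits: dict[str, float], top_k: int, top_p: float) -> dict[str, float]:
    """Simplified sketch: apply top-k, then top-p (nucleus) filtering to one
    next-token distribution and return renormalized probabilities."""
    # Softmax over the raw logits (subtract the max for numerical stability).
    max_logit = max(logits.values())
    exps = {tok: math.exp(v - max_logit) for tok, v in logits.items()}
    total = sum(exps.values())
    ranked = sorted(((tok, e / total) for tok, e in exps.items()),
                    key=lambda item: item[1], reverse=True)

    # Top-k: keep the k most probable tokens and renormalize among them.
    ranked = ranked[:top_k]
    k_total = sum(p for _, p in ranked)
    ranked = [(tok, p / k_total) for tok, p in ranked]

    # Top-p: keep the smallest prefix whose cumulative probability reaches top_p.
    kept, cumulative = [], 0.0
    for tok, p in ranked:
        kept.append((tok, p))
        cumulative += p
        if cumulative >= top_p:
            break

    # Renormalize the surviving tokens so their probabilities sum to 1.
    p_total = sum(p for _, p in kept)
    return {tok: p / p_total for tok, p in kept}


def sample_token(dist: dict[str, float], rng: random.Random) -> str:
    """Draw one token from the filtered distribution."""
    tokens = list(dist)
    return rng.choices(tokens, weights=[dist[t] for t in tokens], k=1)[0]


# Toy logits for a four-token vocabulary; with top_k=3 and top_p=0.9,
# only "a" and "b" survive (their renormalized mass is about 0.91).
toy_logits = {"a": 2.0, "b": 1.0, "c": 0.0, "d": -1.0}
dist = top_k_top_p_filter(toy_logits, top_k=3, top_p=0.9)
print(dist)
print(sample_token(dist, random.Random(0)))
```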

# License and Usage

OpenAlpaca is permissively licensed under the Apache 2.0 license and can be used freely for academic and commercial purposes.

# Contact

We would love to get feedback from the community. If you have any questions, please open an issue or contact us.

OpenAlpaca is developed by: [Yixuan Su](https://yxuansu.github.io/)<sup>\*</sup>, [Tian Lan](https://github.com/gmftbyGMFTBY)<sup>\*</sup>, and [Deng Cai](https://jcyk.github.io/) (The first two members<sup>\*</sup> contributed equally.)

# Reference

If you find OpenAlpaca useful in your research or applications, please cite it using the following BibTeX:

```
@misc{openalpaca,
  author = {Yixuan Su and Tian Lan and Deng Cai},
  title = {OpenAlpaca: A Fully Open-Source Instruction-Following Model Based On OpenLLaMA},
  year = {2023},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/yxuansu/OpenAlpaca}},
}
```

```
@software{openlm2023openllama,
  author = {Xinyang Geng and Hao Liu},
  title = {OpenLLaMA: An Open Reproduction of LLaMA},
  month = May,
  year = {2023},
  url = {https://github.com/openlm-research/open_llama}
}
```

```
@misc{alpaca,
  author = {Rohan Taori and Ishaan Gulrajani and Tianyi Zhang and Yann Dubois and Xuechen Li and Carlos Guestrin and Percy Liang and Tatsunori B. Hashimoto},
  title = {Stanford Alpaca: An Instruction-following LLaMA model},
  year = {2023},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/tatsu-lab/stanford_alpaca}},
}
```

```
@article{touvron2023llama,
  title = {Llama: Open and efficient foundation language models},
  author = {Hugo Touvron and Thibaut Lavril and Gautier Izacard and Xavier Martinet and Marie{-}Anne Lachaux and Timoth{\'{e}}e Lacroix and Baptiste Rozi{\`{e}}re and Naman Goyal and Eric Hambro and Faisal Azhar and Aur{\'{e}}lien Rodriguez and Armand Joulin and Edouard Grave and Guillaume Lample},
  journal = {arXiv preprint arXiv:2302.13971},
  year = {2023}
}
```