Commit a8b5979 by michaelfeil (1 parent: de97112)

Upload openllmplayground/openalpaca_7b_700bt_preview ctranslate fp16 weights

README.md ADDED
---
tags:
- ctranslate2
- int8
- float16

license: apache-2.0
---
# Fast-Inference with CTranslate2
Speed up inference while reducing memory use by 2x-4x, using int8 inference in C++ on CPU or GPU.

Quantized version of [openllmplayground/openalpaca_7b_700bt_preview](https://huggingface.co/openllmplayground/openalpaca_7b_700bt_preview)
```bash
pip install "hf-hub-ctranslate2>=2.0.8" "ctranslate2>=3.14.0"
```
Converted on 2023-06-02 using
```bash
ct2-transformers-converter --model openllmplayground/openalpaca_7b_700bt_preview --output_dir /home/michael/tmp-ct2fast-openalpaca_7b_700bt_preview --force --copy_files README.md tokenizer_config.json generation_config.json special_tokens_map.json .gitattributes --quantization int8_float16 --trust_remote_code
```
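As a rough check of the 2x-4x memory reduction claimed above, the sketch below estimates weight storage for a model of about 6.7B parameters at different precisions. The parameter count is an assumption (approximately the LLaMA-7B size), and activations, the KV cache and quantization scales are ignored, so real memory use will be higher.

```python
# Approximate parameter count of a LLaMA-7B-sized model (assumption).
NUM_PARAMS = 6.7e9

# Bytes needed to store a single weight at each precision.
BYTES_PER_WEIGHT = {"float32": 4, "float16": 2, "int8": 1}


def weight_gb(num_params: float, dtype: str) -> float:
    """Weight storage in GB (1 GB = 1e9 bytes), ignoring scales and activations."""
    return num_params * BYTES_PER_WEIGHT[dtype] / 1e9


def reduction_vs_int8(dtype: str) -> float:
    """How many times smaller int8 weights are than weights stored in `dtype`."""
    return BYTES_PER_WEIGHT[dtype] / BYTES_PER_WEIGHT["int8"]


if __name__ == "__main__":
    for dtype in ("float32", "float16", "int8"):
        print(f"{dtype:>8}: ~{weight_gb(NUM_PARAMS, dtype):.1f} GB")
    print(f"int8 vs float16: {reduction_vs_int8('float16'):.0f}x smaller")
    print(f"int8 vs float32: {reduction_vs_int8('float32'):.0f}x smaller")
```

The int8 estimate is in the same ballpark as the `model.bin` listed further down (6,744,405,708 bytes, about 6.7 GB).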

Checkpoint compatible with [ctranslate2>=3.14.0](https://github.com/OpenNMT/CTranslate2)
and [hf-hub-ctranslate2>=2.0.8](https://github.com/michaelfeil/hf-hub-ctranslate2)
- `compute_type=int8_float16` for `device="cuda"`
- `compute_type=int8` for `device="cpu"`
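The two bullets above amount to a simple device-to-compute-type mapping. A minimal sketch, where `pick_compute_type` is a hypothetical helper (not part of `hf-hub-ctranslate2` or `ctranslate2`) that only mirrors this recommendation; its result would be passed as `compute_type=` alongside the matching `device=`.

```python
def pick_compute_type(device: str) -> str:
    """Hypothetical helper: return the compute_type this README recommends for a device."""
    recommended = {
        "cuda": "int8_float16",  # recommended for GPU inference
        "cpu": "int8",           # recommended for CPU inference
    }
    try:
        return recommended[device]
    except KeyError:
        raise ValueError(f"unsupported device {device!r}; expected 'cuda' or 'cpu'") from None


if __name__ == "__main__":
    for device in ("cuda", "cpu"):
        print(device, "->", pick_compute_type(device))
```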

```python
from hf_hub_ctranslate2 import GeneratorCT2fromHfHub
from transformers import AutoTokenizer

model_name = "michaelfeil/ct2fast-openalpaca_7b_700bt_preview"
# GeneratorCT2fromHfHub is used because this is a decoder-only (causal LM) model;
# TranslatorCT2fromHfHub is the counterpart for encoder-decoder models.
# Load in int8 on CUDA.
model = GeneratorCT2fromHfHub(
    model_name_or_path=model_name,
    device="cuda",
    compute_type="int8_float16",
    # tokenizer=AutoTokenizer.from_pretrained("openllmplayground/openalpaca_7b_700bt_preview")
)
outputs = model.generate(
    text=["def fibonacci(", "User: How are you doing? Bot:"],
    max_length=64,
    include_prompt_in_result=False,
)
print(outputs)
```
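The prompts in the snippet above are plain completions. Since OpenAlpaca is instruction-tuned, it will likely respond better when the instruction is wrapped in the same template that the Example Usage section uses (`prompt_no_input`). A minimal sketch with hypothetical helper names `build_prompt` and `extract_response`; the commented-out call assumes the `model` object created above and is not executed here.

```python
# Same wording as `prompt_no_input` in the Example Usage section.
PROMPT_NO_INPUT = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Response:"
)


def build_prompt(instruction: str) -> str:
    """Wrap an instruction (surrounding whitespace stripped) in the no-input template."""
    return PROMPT_NO_INPUT.format(instruction=instruction.strip())


def extract_response(generated: str) -> str:
    """Return the text after the last '### Response:' marker, or the whole text if absent."""
    return generated.split("### Response:")[-1].strip()


if __name__ == "__main__":
    prompt = build_prompt("What is the capital of Tanzania?")
    print(prompt)
    # With the GeneratorCT2fromHfHub `model` from the snippet above (not run here):
    # outputs = model.generate(text=[prompt], max_length=128, include_prompt_in_result=False)
```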

# License and other remarks
This is just a quantized version. License conditions are intended to be identical to those of the original Hugging Face repo.

# Original description

# OpenAlpaca: A Fully Open-Source Instruction-Following Model Based On OpenLLaMA

In this repo, we release a permissively licensed open-source instruction-following model based on [OpenLLaMA](https://github.com/openlm-research/open_llama). This release is a public preview of the 7B OpenAlpaca model based on [the previewed version of OpenLLaMA](https://huggingface.co/openlm-research/open_llama_7b_700bt_preview), a 7B model trained on 700 billion tokens. We provide PyTorch weights of OpenAlpaca. Stay tuned for our forthcoming updates!

**[Project Page]** [https://github.com/yxuansu/OpenAlpaca](https://github.com/yxuansu/OpenAlpaca)

# Dataset and Training

We train our model on the [dolly 15k dataset](https://huggingface.co/datasets/databricks/databricks-dolly-15k) released by Databricks. The training configurations are provided in the table below. Training runs on 8 x A100 (40GB) GPUs and takes around 30 minutes.

|Hyperparameter|Value|
|:-------------:|:-------------:|
|**Batch Size**|64|
|**Learning rate**|2e-5|
|**Epochs**|3|
|**Max length**|1024|
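To put the table above in concrete terms, here is a back-of-the-envelope sketch of the number of optimizer steps. It assumes that the batch size of 64 is the global batch across all 8 GPUs with no gradient accumulation, and it uses roughly 15,000 examples for dolly 15k; neither assumption is stated in the original description.

```python
import math

NUM_EXAMPLES = 15_000  # approximate size of databricks-dolly-15k (assumption)
GLOBAL_BATCH = 64      # "Batch Size" from the table, assumed to be the global batch
EPOCHS = 3
NUM_GPUS = 8


def steps_per_epoch(num_examples: int, batch_size: int) -> int:
    """Optimizer steps per epoch, keeping the last partial batch."""
    return math.ceil(num_examples / batch_size)


def per_gpu_batch(global_batch: int, num_gpus: int) -> int:
    """Examples per GPU per step, assuming an even data-parallel split."""
    if global_batch % num_gpus:
        raise ValueError("global batch must divide evenly across GPUs")
    return global_batch // num_gpus


if __name__ == "__main__":
    per_epoch = steps_per_epoch(NUM_EXAMPLES, GLOBAL_BATCH)
    print("steps per epoch:", per_epoch)
    print("total steps:", per_epoch * EPOCHS)
    print("examples per GPU per step:", per_gpu_batch(GLOBAL_BATCH, NUM_GPUS))
```

Under these assumptions, three epochs come to roughly 700 optimizer steps with 8 examples per GPU per step.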

# Example Usage

The example below shows how to use OpenAlpaca:

```python
import torch
from transformers import LlamaForCausalLM, LlamaTokenizer

# the previewed version of OpenAlpaca
model_path = r'openllmplayground/openalpaca_7b_700bt_preview'
tokenizer = LlamaTokenizer.from_pretrained(model_path)
model = LlamaForCausalLM.from_pretrained(model_path).cuda()
tokenizer.bos_token_id, tokenizer.eos_token_id = 1, 2  # see https://github.com/openlm-research/open_llama#preview-weights-release-and-usage

# same prompt as provided in https://crfm.stanford.edu/2023/03/13/alpaca.html
instruction = r'What is an alpaca? How is it different from a llama?'
# Other example instructions to try:
# instruction = r'Write an e-mail to congratulate new Stanford admits and mention that you are excited about meeting all of them in person.'
# instruction = r'What is the capital of Tanzania?'
# instruction = r'Write a well-thought out abstract for a machine learning paper that proves that 42 is the optimal seed for training neural networks.'

prompt_no_input = f'Below is an instruction that describes a task. Write a response that appropriately completes the request.\n\n### Instruction:\n{instruction}\n\n### Response:'
tokens = tokenizer.encode(prompt_no_input)

# batch of one prompt, moved to the same device as the model
tokens = torch.LongTensor(tokens).unsqueeze(0).cuda()
instance = {'input_ids': tokens,
            'top_k': 50,
            'top_p': 0.9,
            'generate_len': 128}

length = len(tokens[0])
with torch.no_grad():
    rest = model.generate(
        input_ids=tokens,
        max_length=length + instance['generate_len'],
        use_cache=True,
        do_sample=True,
        top_p=instance['top_p'],
        top_k=instance['top_k']
    )

# keep only the newly generated tokens (drop the prompt)
output = rest[0][length:]
string = tokenizer.decode(output, skip_special_tokens=True)
print(f'[!] Generation results: {string}')
```
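The `top_k=50` and `top_p=0.9` arguments above restrict sampling to a subset of the vocabulary. The sketch below illustrates the idea on a toy list of logits, applying top-k first and then top-p (nucleus) filtering before sampling. It is a simplified pure-Python illustration with hypothetical function names, not the `transformers` implementation.

```python
import math
import random


def top_k_top_p_filter(logits, top_k=50, top_p=0.9):
    """Return {token_id: probability} restricted by top-k, then top-p, and renormalized."""
    # Softmax over the raw logits (subtracting the max keeps exp() from overflowing).
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Top-k: keep the k most likely token ids, most likely first.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]

    # Top-p: within the renormalized top-k set, keep the smallest prefix
    # whose cumulative probability reaches top_p.
    k_mass = sum(probs[i] for i in ranked)
    kept, cumulative = [], 0.0
    for i in ranked:
        kept.append(i)
        cumulative += probs[i] / k_mass
        if cumulative >= top_p:
            break

    # Renormalize so the surviving probabilities sum to 1.
    kept_mass = sum(probs[i] for i in kept)
    return {i: probs[i] / kept_mass for i in kept}


def sample_token(logits, top_k=50, top_p=0.9, rng=random):
    """Draw one token id from the filtered distribution."""
    dist = top_k_top_p_filter(logits, top_k, top_p)
    ids = list(dist)
    return rng.choices(ids, weights=[dist[i] for i in ids], k=1)[0]


if __name__ == "__main__":
    toy_logits = [2.0, 1.0, 0.5, -1.0, -3.0]  # toy vocabulary of 5 tokens
    print(top_k_top_p_filter(toy_logits, top_k=3, top_p=0.8))
    print(sample_token(toy_logits, top_k=3, top_p=0.8, rng=random.Random(0)))
```

With these toy logits, `top_k=3` keeps tokens 0 to 2, and `top_p=0.8` then trims the set to tokens 0 and 1, because those two already carry about 86% of the top-3 probability mass.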

# License and Usage

OpenAlpaca is permissively licensed under the Apache 2.0 license and can be used freely for academic and commercial purposes.

# Contact
We would love to get feedback from the community. If you have any questions, please open an issue or contact us.

OpenAlpaca is developed by: [Yixuan Su](https://yxuansu.github.io/)<sup>\*</sup>, [Tian Lan](https://github.com/gmftbyGMFTBY)<sup>\*</sup>, and [Deng Cai](https://jcyk.github.io/) (The first two members<sup>\*</sup> contributed equally.)

# Reference

If you find OpenAlpaca useful in your research or applications, please cite it using the following BibTeX:
```
@misc{openalpaca,
  author = {Yixuan Su and Tian Lan and Deng Cai},
  title = {OpenAlpaca: A Fully Open-Source Instruction-Following Model Based On OpenLLaMA},
  year = {2023},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/yxuansu/OpenAlpaca}},
}
```
```
@software{openlm2023openllama,
  author = {Xinyang Geng and Hao Liu},
  title = {OpenLLaMA: An Open Reproduction of LLaMA},
  month = May,
  year = 2023,
  url = {https://github.com/openlm-research/open_llama}
}
```
```
@misc{alpaca,
  author = {Rohan Taori and Ishaan Gulrajani and Tianyi Zhang and Yann Dubois and Xuechen Li and Carlos Guestrin and Percy Liang and Tatsunori B. Hashimoto},
  title = {Stanford Alpaca: An Instruction-following LLaMA model},
  year = {2023},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/tatsu-lab/stanford_alpaca}},
}
```
```
@article{touvron2023llama,
  title={Llama: Open and efficient foundation language models},
  author={Hugo Touvron and Thibaut Lavril and Gautier Izacard and Xavier Martinet and Marie{-}Anne Lachaux and Timoth{\'{e}}e Lacroix and Baptiste Rozi{\`{e}}re and Naman Goyal and Eric Hambro and Faisal Azhar and Aur{\'{e}}lien Rodriguez and Armand Joulin and Edouard Grave and Guillaume Lample},
  journal={arXiv preprint arXiv:2302.13971},
  year={2023}
}
```
config.json ADDED
{
  "bos_token": "<s>",
  "eos_token": "</s>",
  "unk_token": ""
}
generation_config.json ADDED
{
  "_from_model_config": true,
  "bos_token_id": 1,
  "eos_token_id": 2,
  "pad_token_id": 0,
  "transformers_version": "4.29.1"
}
model.bin ADDED
version https://git-lfs.github.com/spec/v1
oid sha256:801a0db6049886245e7b4cd0c55e68b862f5069db423ea686f93fc85df9c45e7
size 6744405708
special_tokens_map.json ADDED
{
  "bos_token": "<s>",
  "eos_token": "</s>",
  "pad_token": "</s>",
  "unk_token": {
    "content": "",
    "lstrip": false,
    "normalized": true,
    "rstrip": false,
    "single_word": false
  }
}
tokenizer_config.json ADDED
{
  "add_bos_token": true,
  "add_eos_token": false,
  "bos_token": {
    "__type": "AddedToken",
    "content": "",
    "lstrip": false,
    "normalized": true,
    "rstrip": false,
    "single_word": false
  },
  "clean_up_tokenization_spaces": false,
  "eos_token": {
    "__type": "AddedToken",
    "content": "",
    "lstrip": false,
    "normalized": true,
    "rstrip": false,
    "single_word": false
  },
  "model_max_length": 1000000000000000019884624838656,
  "pad_token": null,
  "sp_model_kwargs": {},
  "tokenizer_class": "LlamaTokenizer",
  "unk_token": {
    "__type": "AddedToken",
    "content": "",
    "lstrip": false,
    "normalized": true,
    "rstrip": false,
    "single_word": false
  }
}
vocabulary.txt ADDED
The diff for this file is too large to render.