michaelfeil committed
Commit · a8b5979
Parent(s): de97112
Upload openllmplayground/openalpaca_7b_700bt_preview ctranslate fp16 weights
Browse files
- README.md +168 -0
- config.json +5 -0
- generation_config.json +7 -0
- model.bin +3 -0
- special_tokens_map.json +12 -0
- tokenizer_config.json +33 -0
- vocabulary.txt +0 -0
README.md
ADDED
@@ -0,0 +1,168 @@
---
tags:
- ctranslate2
- int8
- float16

license: apache-2.0
---
# Fast Inference with CTranslate2
Speed up inference while reducing memory by 2x-4x using int8 inference in C++ on CPU or GPU.

Quantized version of [openllmplayground/openalpaca_7b_700bt_preview](https://huggingface.co/openllmplayground/openalpaca_7b_700bt_preview).
```bash
pip install "hf-hub-ctranslate2>=2.0.8" "ctranslate2>=3.14.0"
```
Converted on 2023-06-02 using
```bash
ct2-transformers-converter --model openllmplayground/openalpaca_7b_700bt_preview --output_dir /home/michael/tmp-ct2fast-openalpaca_7b_700bt_preview --force --copy_files README.md tokenizer_config.json generation_config.json special_tokens_map.json .gitattributes --quantization int8_float16 --trust_remote_code
```

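If you prefer the plain `ctranslate2` API over the `hf-hub-ctranslate2` wrapper, a minimal sketch along these lines should work. It assumes the repo is downloaded locally (e.g. via `huggingface_hub.snapshot_download`) and that the tokenizer is loaded from the original repo, since the SentencePiece model is not copied into this one:

```python
import ctranslate2
from huggingface_hub import snapshot_download
from transformers import LlamaTokenizer

# download the quantized CTranslate2 directory and load the original tokenizer
model_dir = snapshot_download("michaelfeil/ct2fast-openalpaca_7b_700bt_preview")
tokenizer = LlamaTokenizer.from_pretrained("openllmplayground/openalpaca_7b_700bt_preview")
tokenizer.bos_token_id, tokenizer.eos_token_id = 1, 2  # preview-weights quirk, see the original description below

generator = ctranslate2.Generator(model_dir, device="cuda", compute_type="int8_float16")

prompt = "User: How are you doing? Bot:"
# CTranslate2 expects token strings, not ids
tokens = tokenizer.convert_ids_to_tokens(tokenizer.encode(prompt))
results = generator.generate_batch(
    [tokens],
    max_length=64,
    sampling_topk=50,
    include_prompt_in_result=False,
)
print(tokenizer.decode(results[0].sequences_ids[0]))
```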
Checkpoint compatible with [ctranslate2>=3.14.0](https://github.com/OpenNMT/CTranslate2)
and [hf-hub-ctranslate2>=2.0.8](https://github.com/michaelfeil/hf-hub-ctranslate2):
- `compute_type=int8_float16` for `device="cuda"`
- `compute_type=int8` for `device="cpu"`

```python
from hf_hub_ctranslate2 import TranslatorCT2fromHfHub, GeneratorCT2fromHfHub
from transformers import AutoTokenizer

model_name = "michaelfeil/ct2fast-openalpaca_7b_700bt_preview"
# use either TranslatorCT2fromHfHub or GeneratorCT2fromHfHub here, depending on the model
model = GeneratorCT2fromHfHub(
    # load in int8 on CUDA
    model_name_or_path=model_name,
    device="cuda",
    compute_type="int8_float16",
    # tokenizer=AutoTokenizer.from_pretrained("openllmplayground/openalpaca_7b_700bt_preview")
)
outputs = model.generate(
    text=["def fibonacci(", "User: How are you doing? Bot:"],
    max_length=64,
    include_prompt_in_result=False
)
print(outputs)
```

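OpenAlpaca is instruction-tuned, so prompts typically follow the Alpaca template shown in the original description further below. A small sketch of wrapping an instruction before calling the quantized model, reusing `model` from the snippet above (the template string comes from the original card; everything else is illustrative):

```python
# `model` is the GeneratorCT2fromHfHub instance created above
instruction = "What is an alpaca? How is it different from a llama?"
prompt = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    f"### Instruction:\n{instruction}\n\n### Response:"
)
outputs = model.generate(
    text=[prompt],
    max_length=128,
    include_prompt_in_result=False
)
print(outputs)
```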
# License and other remarks
This is just a quantized version. License conditions are intended to be identical to those of the original Hugging Face repo.

# Original description

# OpenAlpaca: A Fully Open-Source Instruction-Following Model Based On OpenLLaMA

In this repo, we release a permissively licensed open-source instruction-following model based on [OpenLLaMA](https://github.com/openlm-research/open_llama). This release is a public preview of the 7B OpenAlpaca model, built on [the previewed version of OpenLLaMA](https://huggingface.co/openlm-research/open_llama_7b_700bt_preview), a 7B model trained on 700 billion tokens. We provide PyTorch weights of OpenAlpaca. Stay tuned for our forthcoming updates!

**[Project Page]** [https://github.com/yxuansu/OpenAlpaca](https://github.com/yxuansu/OpenAlpaca)

# Dataset and Training

We train our model on the [dolly 15k dataset](https://huggingface.co/datasets/databricks/databricks-dolly-15k) released by Databricks. The training configurations are provided in the table below. Training runs on 8 x A100 (40G) GPUs and takes around 30 minutes.

|Hyperparameter|Value|
|:-------------:|:-------------:|
|**Batch Size**|64|
|**Learning rate**|2e-5|
|**Epochs**|3|
|**Max length**|1024|

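The table roughly maps onto the following `transformers` `TrainingArguments`; this is only an illustrative sketch (per-device batch size and precision are assumptions), not the authors' actual training script:

```python
from transformers import TrainingArguments

# Illustrative sketch of the hyperparameters above, not the authors' script.
# Assumes 8 GPUs, so a per-device batch size of 8 gives a global batch size of 64.
training_args = TrainingArguments(
    output_dir="openalpaca_7b_700bt",
    per_device_train_batch_size=8,   # 8 GPUs x 8 = 64 (table: Batch Size)
    learning_rate=2e-5,              # table: Learning rate
    num_train_epochs=3,              # table: Epochs
    bf16=True,                       # assumption for A100 training
)
MAX_LENGTH = 1024                    # table: Max length, applied during tokenization
```
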
# Example Usage

Below is an example of how to use OpenAlpaca.

```python
import torch
from transformers import LlamaForCausalLM, LlamaTokenizer

# the previewed version of OpenAlpaca
model_path = r'openllmplayground/openalpaca_7b_700bt_preview'
tokenizer = LlamaTokenizer.from_pretrained(model_path)
model = LlamaForCausalLM.from_pretrained(model_path).cuda()
tokenizer.bos_token_id, tokenizer.eos_token_id = 1, 2  # see https://github.com/openlm-research/open_llama#preview-weights-release-and-usage

# same prompt as provided in https://crfm.stanford.edu/2023/03/13/alpaca.html
instruction = r'What is an alpaca? How is it different from a llama?'
'''
instruction = r'Write an e-mail to congratulate new Stanford admits and mention that you are excited about meeting all of them in person.'
instruction = r'What is the capital of Tanzania?'
instruction = r'Write a well-thought-out abstract for a machine learning paper that proves that 42 is the optimal seed for training neural networks.'
'''

prompt_no_input = f'Below is an instruction that describes a task. Write a response that appropriately completes the request.\n\n### Instruction:\n{instruction}\n\n### Response:'
tokens = tokenizer.encode(prompt_no_input)

tokens = torch.LongTensor(tokens).unsqueeze(0).cuda()  # move inputs to the same device as the model
instance = {'input_ids': tokens,
            'top_k': 50,
            'top_p': 0.9,
            'generate_len': 128}

length = len(tokens[0])
with torch.no_grad():
    rest = model.generate(
        input_ids=tokens,
        max_length=length + instance['generate_len'],
        use_cache=True,
        do_sample=True,
        top_p=instance['top_p'],
        top_k=instance['top_k']
    )

output = rest[0][length:]
string = tokenizer.decode(output, skip_special_tokens=True)
print(f'[!] Generation results: {string}')
```

# License and Usage

OpenAlpaca is permissively licensed under the Apache 2.0 license and can be used freely for academic/commercial purposes.


# Contact
We would love to get feedback from the community. If you have any questions, please open an issue or contact us.

OpenAlpaca is developed by: [Yixuan Su](https://yxuansu.github.io/)<sup>\*</sup>, [Tian Lan](https://github.com/gmftbyGMFTBY)<sup>\*</sup>, and [Deng Cai](https://jcyk.github.io/) (the first two members<sup>\*</sup> contributed equally).

# Reference

If you find OpenAlpaca useful in your research or applications, please kindly cite it using the following BibTeX:
```bibtex
@misc{openalpaca,
  author = {Yixuan Su and Tian Lan and Deng Cai},
  title = {OpenAlpaca: A Fully Open-Source Instruction-Following Model Based On OpenLLaMA},
  year = {2023},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/yxuansu/OpenAlpaca}},
}
```
```bibtex
@software{openlm2023openllama,
  author = {Xinyang Geng and Hao Liu},
  title = {OpenLLaMA: An Open Reproduction of LLaMA},
  month = {May},
  year = {2023},
  url = {https://github.com/openlm-research/open_llama}
}
```
```bibtex
@misc{alpaca,
  author = {Rohan Taori and Ishaan Gulrajani and Tianyi Zhang and Yann Dubois and Xuechen Li and Carlos Guestrin and Percy Liang and Tatsunori B. Hashimoto},
  title = {Stanford Alpaca: An Instruction-following LLaMA model},
  year = {2023},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/tatsu-lab/stanford_alpaca}},
}
```
```bibtex
@article{touvron2023llama,
  title = {LLaMA: Open and Efficient Foundation Language Models},
  author = {Hugo Touvron and Thibaut Lavril and Gautier Izacard and Xavier Martinet and Marie{-}Anne Lachaux and Timoth{\'{e}}e Lacroix and Baptiste Rozi{\`{e}}re and Naman Goyal and Eric Hambro and Faisal Azhar and Aur{\'{e}}lien Rodriguez and Armand Joulin and Edouard Grave and Guillaume Lample},
  journal = {arXiv preprint arXiv:2302.13971},
  year = {2023}
}
```
config.json
ADDED
@@ -0,0 +1,5 @@
{
  "bos_token": "<s>",
  "eos_token": "</s>",
  "unk_token": ""
}
generation_config.json
ADDED
@@ -0,0 +1,7 @@
{
  "_from_model_config": true,
  "bos_token_id": 1,
  "eos_token_id": 2,
  "pad_token_id": 0,
  "transformers_version": "4.29.1"
}
model.bin
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:801a0db6049886245e7b4cd0c55e68b862f5069db423ea686f93fc85df9c45e7
size 6744405708
special_tokens_map.json
ADDED
@@ -0,0 +1,12 @@
{
  "bos_token": "<s>",
  "eos_token": "</s>",
  "pad_token": "</s>",
  "unk_token": {
    "content": "",
    "lstrip": false,
    "normalized": true,
    "rstrip": false,
    "single_word": false
  }
}
tokenizer_config.json
ADDED
@@ -0,0 +1,33 @@
{
  "add_bos_token": true,
  "add_eos_token": false,
  "bos_token": {
    "__type": "AddedToken",
    "content": "",
    "lstrip": false,
    "normalized": true,
    "rstrip": false,
    "single_word": false
  },
  "clean_up_tokenization_spaces": false,
  "eos_token": {
    "__type": "AddedToken",
    "content": "",
    "lstrip": false,
    "normalized": true,
    "rstrip": false,
    "single_word": false
  },
  "model_max_length": 1000000000000000019884624838656,
  "pad_token": null,
  "sp_model_kwargs": {},
  "tokenizer_class": "LlamaTokenizer",
  "unk_token": {
    "__type": "AddedToken",
    "content": "",
    "lstrip": false,
    "normalized": true,
    "rstrip": false,
    "single_word": false
  }
}
vocabulary.txt
ADDED
The diff for this file is too large to render.
See raw diff