A Chinese LLaMA model built by adding a Chinese vocabulary, continuing to pre-train the Chinese embeddings, and then fine-tuning on instruction datasets.

For details, see: https://github.com/ymcui/Chinese-LLaMA-Alpaca

Usage

  1. Install the required packages
pip install sentencepiece
pip install "transformers>=4.28.0"
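To confirm the installation step worked, the installed versions can be checked from Python. The following is a minimal sketch using only the standard library; the helper names `parse_version` and `meets_minimum` are hypothetical, and the version parsing is deliberately simplistic (it compares only the leading numeric components).

```python
import itertools
from importlib import metadata


def parse_version(version, width=3):
    """Turn '4.28.0' or '4.30.0.dev0' into a tuple of leading integers,
    padded with zeros to at least `width` components."""
    numbers = []
    for piece in version.split("."):
        digits = "".join(itertools.takewhile(str.isdigit, piece))
        if not digits:
            break  # stop at the first non-numeric component, e.g. 'dev0'
        numbers.append(int(digits))
    numbers += [0] * (width - len(numbers))
    return tuple(numbers)


def meets_minimum(package, minimum):
    """Return True if `package` is installed at version >= `minimum`."""
    try:
        installed = metadata.version(package)
    except metadata.PackageNotFoundError:
        return False
    return parse_version(installed) >= parse_version(minimum)


# The minimum for transformers comes from the install step above;
# no minimum is stated for sentencepiece, so "0" only checks presence.
for package, minimum in [("sentencepiece", "0"), ("transformers", "4.28.0")]:
    status = "OK" if meets_minimum(package, minimum) else "missing or too old"
    print(f"{package}: {status}")
```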
  2. Generate text
import torch
import transformers
from transformers import LlamaTokenizer, LlamaForCausalLM

def generate_prompt(text):
    return f"""Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
{text}

### Response:"""


tokenizer = LlamaTokenizer.from_pretrained('minlik/chinese-alpaca-7b-merged')
model = LlamaForCausalLM.from_pretrained('minlik/chinese-alpaca-7b-merged').half().to('cuda')
model.eval()

# Example question: "Who was the first person to set foot on the Moon?"
text = '第一个登上月球的人是谁?'
prompt = generate_prompt(text)
input_ids = tokenizer.encode(prompt, return_tensors='pt').to('cuda')


with torch.no_grad():
    output_ids = model.generate(
        input_ids=input_ids,
        max_new_tokens=128,
        do_sample=True,  # without this, temperature/top_k/top_p are ignored (greedy decoding)
        temperature=1,
        top_k=40,
        top_p=0.9,
        repetition_penalty=1.15
    )
output = tokenizer.decode(output_ids[0], skip_special_tokens=True)
print(output.replace(prompt, '').strip())
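Removing the prompt with `output.replace(prompt, '')` relies on the decoded text reproducing the prompt character for character, which a tokenizer round trip does not always guarantee. A possible alternative, sketched below, is to split the decoded text on the `### Response:` marker from the prompt template. The helper name `extract_response` and the sample string are hypothetical and only illustrate the idea; no model is needed to run it.

```python
RESPONSE_MARKER = "### Response:"


def extract_response(decoded: str) -> str:
    """Return the text after the first '### Response:' marker.

    If the marker is absent, fall back to the whole string, stripped.
    """
    _, found, tail = decoded.partition(RESPONSE_MARKER)
    return tail.strip() if found else decoded.strip()


# Hypothetical decoded output, shaped like the prompt template plus an answer.
sample = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\nWho was the first person on the Moon?\n\n"
    "### Response: Neil Armstrong"
)
print(extract_response(sample))
```

With the generation script above, this would be called as `extract_response(output)` in place of `output.replace(prompt, '').strip()`.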