---
pipeline_tag: text-generation
inference: true
widget:
  - text: 'Hello!'
    example_title: Hello world
    group: Python
library_name: transformers
---

# yujiepan/falcon-40b-awq-w4g128

This repo contains [tiiuae/falcon-40b](https://huggingface.co/tiiuae/falcon-40b) quantized with AutoAWQ: 4-bit weights, group_size=128, zero_point=True.

## Accuracy

| task                                        | tiiuae/falcon-40b (fp16) | this repo |
|---------------------------------------------|--------------------------|-----------|
| wikitext perplexity (lm-evaluation-harness) | 8.410                    | 8.497     |

## Usage

Requires the `autoawq` package (`pip install autoawq`).

```python
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

model_name_or_path = "yujiepan/falcon-40b-awq-w4g128"

# Load the quantized model and its tokenizer
model = AutoAWQForCausalLM.from_quantized(
    model_name_or_path,
    fuse_layers=False,
    trust_remote_code=False,
)
tokenizer = AutoTokenizer.from_pretrained(model_name_or_path)

# Tokenize the prompt and move it to the GPU
prompt = "Tell me about AI"
tokens = tokenizer(prompt, return_tensors='pt').input_ids.cuda()

# Sample a short completion
generation_output = model.generate(
    tokens,
    do_sample=True,
    temperature=0.7,
    top_p=0.95,
    top_k=40,
    max_new_tokens=10,
)
print("Output: ", tokenizer.decode(generation_output[0]))
```
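Recent `transformers` releases (4.35+) can also load AWQ checkpoints directly, without the `awq` wrapper class. This is an untested sketch: it assumes `autoawq` is installed as the backend and that this repo ships an AWQ `quantization_config` that `from_pretrained` can pick up.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumption: transformers>=4.35 with AWQ integration, autoawq installed,
# and an AWQ quantization_config present in the repo.
model_id = "yujiepan/falcon-40b-awq-w4g128"
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_id)

inputs = tokenizer("Tell me about AI", return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=10)[0]))
```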
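## Reproducing the quantization

The exact script used for this repo is not published here, but a quantization matching the stated settings (4-bit, group_size=128, zero_point=True) would look roughly like the sketch below. The kernel `version` and the calibration data (AutoAWQ's default) are assumptions.

```python
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

base_model = "tiiuae/falcon-40b"
quant_path = "falcon-40b-awq-w4g128"

# Settings from the model card; "version" is an assumed default
quant_config = {
    "w_bit": 4,           # 4-bit weights
    "q_group_size": 128,  # group_size=128
    "zero_point": True,   # zero_point=True
    "version": "GEMM",    # assumption: AutoAWQ's default kernel
}

# Load the fp16 model, quantize with AutoAWQ's default calibration set,
# and save the result
model = AutoAWQForCausalLM.from_pretrained(base_model)
tokenizer = AutoTokenizer.from_pretrained(base_model)
model.quantize(tokenizer, quant_config=quant_config)
model.save_quantized(quant_path)
tokenizer.save_pretrained(quant_path)
```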
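## Evaluation

The perplexity numbers above come from lm-evaluation-harness; the exact invocation is not recorded in this card. A sketch using the harness's Python API (0.4+), assuming the quantized checkpoint loads through its `hf` backend:

```python
import lm_eval

# Assumption: lm-eval>=0.4 and an AWQ-aware transformers stack, so the
# "hf" backend can load the quantized checkpoint.
results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=yujiepan/falcon-40b-awq-w4g128",
    tasks=["wikitext"],
)
print(results["results"]["wikitext"])
```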