---
language:
- ko
- en
library_name: transformers
---

![image/png](https://cdn-uploads.huggingface.co/production/uploads/64c0c845a04a514ba62bcd1a/RFpsPxlc_3cK0kmWj-tYR.png)

# **Introduction**

We introduce Llama-3-Motif, a new family of language models from [**Moreh**](https://moreh.io/), specialized in Korean and English.\
[Llama-3-Motif-102B-Instruct](https://huggingface.co/moreh/Motif-102B-Instruct) is a chat model tuned from this base model.

## Training Platform

- Llama-3-Motif-102B was trained on the [MoAI platform](https://moreh.io/product); see the link for more information.

## Quick Usage

The base model is not served directly. Instead, you can chat with [Llama-3-Motif-102B-Instruct](https://huggingface.co/moreh/Motif-102B-Instruct) through our [Model hub](https://model-hub.moreh.io/).

## Details

More details will be provided in the upcoming technical report.

### Release Date

2024.12.02

### Benchmark Results

|Provider|Model|kmmlu_direct score|Note|
|---|---|---|---|
|Moreh|**Llama-3-Motif-102B**|**64.74**|+|
|Meta|Llama3-70B-instruct|54.5*||
|Meta|Llama3.1-70B-instruct|52.1*||
|Meta|Llama3.1-405B-instruct|65.8*||
|Alibaba|Qwen2-72B-instruct|64.1*||
|OpenAI|GPT-4-0125-preview|59.95*||
|OpenAI|GPT-4o-2024-05-13|64.11**||
|Google|gemini pro|50.18*||
|LG|exaone 3.0|44.5*|+|
|Naver|HyperCLOVA X|53.4*|+|
|Upstage|SOLAR-10.7B|41.65*|+|

\* : Community report\
\*\* : Measured by Moreh\
\+ : Claimed to have better capability in Korean

## How to use

We do not recommend using the base model directly!
### Use with vLLM

- Refer to the [vLLM repository](https://github.com/vllm-project/vllm) for installation instructions.

```python
from transformers import AutoTokenizer
from vllm import LLM, SamplingParams

# Set tensor_parallel_size to the number of GPUs you have available
model = LLM("moreh/Llama-3-Motif-102B", tensor_parallel_size=4)
tokenizer = AutoTokenizer.from_pretrained("moreh/Llama-3-Motif-102B")

messages = [
    {"role": "system", "content": "You are a helpful assistant"},
    # User prompt in Korean: "Explain the concept of the Big Bang theory to a kindergartener."
    {"role": "user", "content": "유치원생에게 빅뱅 이론의 개념을 설명해보세요"},
]

# Render the conversation into a prompt string; generate() takes a list of prompts
messages_batch = [tokenizer.apply_chat_template(conversation=messages, add_generation_prompt=True, tokenize=False)]

# vLLM does not support the Hugging Face generation_config, so the sampling parameters are set explicitly
sampling_params = SamplingParams(max_tokens=512, temperature=0, repetition_penalty=1.0, stop_token_ids=[tokenizer.eos_token_id])

responses = model.generate(messages_batch, sampling_params=sampling_params)

print(responses[0].outputs[0].text)
```

### Use with transformers

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "moreh/Llama-3-Motif-102B"

# All generation settings are read from generation_config.json
# .cuda() places the whole model on one device; if it does not fit,
# passing device_map="auto" to from_pretrained (requires accelerate) is the usual alternative
model = AutoModelForCausalLM.from_pretrained(model_id).cuda()
tokenizer = AutoTokenizer.from_pretrained(model_id)

messages = [
    {"role": "system", "content": "You are a helpful assistant"},
    # User prompt in Korean: "Explain the concept of the Big Bang theory to a kindergartener."
    {"role": "user", "content": "유치원생에게 빅뱅 이론의 개념을 설명해보세요"},
]

# Render the conversation into a prompt string, then tokenize it
messages_batch = tokenizer.apply_chat_template(conversation=messages, add_generation_prompt=True, tokenize=False)
input_ids = tokenizer(messages_batch, padding=True, return_tensors='pt')['input_ids'].cuda()

outputs = model.generate(input_ids)

# Decode only the newly generated tokens, skipping the prompt
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```
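### Building several conversations

Both examples above send a single conversation. To prepare several prompts at once, the `messages` lists can be built with a small helper. The following is a minimal sketch, not part of `transformers` or vLLM: the helper name `build_conversations` and the constant `DEFAULT_SYSTEM_PROMPT` are hypothetical, introduced only for illustration, and the system prompt is simply the one used in the examples above.

```python
from typing import Dict, List

Message = Dict[str, str]

# Same system prompt as in the examples above (hypothetical constant name)
DEFAULT_SYSTEM_PROMPT = "You are a helpful assistant"


def build_conversations(
    user_prompts: List[str],
    system_prompt: str = DEFAULT_SYSTEM_PROMPT,
) -> List[List[Message]]:
    """Return one [system, user] conversation per user prompt."""
    return [
        [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": prompt},
        ]
        for prompt in user_prompts
    ]


conversations = build_conversations([
    "유치원생에게 빅뱅 이론의 개념을 설명해보세요",
    "Explain the concept of the Big Bang theory to a kindergartener.",
])
print(len(conversations))
```

Each conversation can then be rendered with `tokenizer.apply_chat_template(conversation=..., add_generation_prompt=True, tokenize=False)` as in the vLLM example, and the resulting list of prompt strings passed to `model.generate` in one call.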