RAG with Phi-3-mini-4k-instruct-q4.gguf - Local GGUF file
Hi,
I have the model file downloaded to a local folder. The offline model works fine with Ollama and llama-cpp-python.
I am now trying to test RAG locally with the Phi-3-mini-4k-instruct-q4.gguf file.
The RAG code and the Phi-3-mini-4k-instruct-q4.gguf file are in the same folder.
I am getting the error below:
OSError: Phi-3-mini-4k-instruct is not a local folder and is not a valid model identifier listed on 'https://huggingface.co/models'
If this is a private repository, make sure to pass a token having permission to this repo either by logging in with `huggingface-cli login` or by passing `token=<your_token>`
I am guessing that the values of the "tokenizer_name" and "model_name" parameters are incorrect. Any clues would be welcome.
tokenizer_name="Phi-3-mini-4k-instruct",
model_name="Phi-3-mini-4k-instruct",
and
embed_model = HuggingFaceEmbedding(model_name="Phi-3-mini-4k-instruct-q4.gguf")
I guess I am doing something very wrong; please advise.
What are the correct values for these parameters? I don't want to download the file every time; it should work offline. Do I need to download
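For reference, the wording of the error suggests the loader treats the string either as a local model folder or as a repo id to look up on the Hub. The sketch below is only a rough illustration of that decision, not the actual transformers logic; `classify_model_ref` is a hypothetical helper, and the assumption that a local model folder contains a `config.json` is mine. It shows why a bare name such as "Phi-3-mini-4k-instruct" matches neither case when only the .gguf file sits in the folder.

```python
from pathlib import Path


def classify_model_ref(ref: str, base_dir: str = ".") -> str:
    """Hypothetical helper: guess how a model reference string might be interpreted.

    This is an illustration only; it does not reproduce the real loader.
    """
    path = Path(base_dir) / ref
    if path.is_dir():
        # Assumption: a usable local model folder ships a config.json.
        if (path / "config.json").is_file():
            return "local_folder"
        return "folder_without_config"
    if path.is_file() and path.suffix == ".gguf":
        # A single GGUF file is neither a model folder nor a Hub repo id.
        return "gguf_file"
    # Anything else would have to be resolved as a repo id on the Hub.
    return "hub_id_lookup"


if __name__ == "__main__":
    for ref in ("Phi-3-mini-4k-instruct", "Phi-3-mini-4k-instruct-q4.gguf"):
        print(ref, "->", classify_model_ref(ref))
```

Run from the folder that holds the script and the model file, this should make it visible which of the two strings actually points at something on disk.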
--------------------- Code is below ---------------------
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader, ServiceContext
from llama_index.llms.huggingface import HuggingFaceLLM
import torch
documents = SimpleDirectoryReader("content").load_data()
from llama_index.core.prompts.prompts import SimpleInputPrompt
system_prompt = "You are a Q&A assistant. Your goal is to answer questions as accurately as possible based on the instructions and context provided."
# This will wrap the default prompts that are internal to llama-index
query_wrapper_prompt = SimpleInputPrompt("<|USER|>{query_str}<|ASSISTANT|>")
llm = HuggingFaceLLM(
    context_window=4096,
    max_new_tokens=256,
    generate_kwargs={"temperature": 0.0, "do_sample": False},
    system_prompt=system_prompt,
    query_wrapper_prompt=query_wrapper_prompt,
    # --------------------------------------------------------------------->>>>
    tokenizer_name="Phi-3-mini-4k-instruct",
    model_name="Phi-3-mini-4k-instruct",
    # <<<<<< ----------------------------------------------------------------------------
    # device_map="cuda",
    # uncomment this if using CUDA to reduce memory usage
    model_kwargs={"torch_dtype": torch.bfloat16},
)
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
# load model locally
# --------------------------------------------------------------------->>>>
embed_model = HuggingFaceEmbedding(model_name="Phi-3-mini-4k-instruct-q4.gguf")
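Since the same file already works with llama-cpp-python, one direction that may be worth trying is the llama.cpp-backed LLM class from the separate `llama-index-llms-llama-cpp` package, which takes a file path instead of a model name. The following is a minimal sketch under that assumption, not a confirmed fix: `llama_cpp_kwargs` and `build_llm` are hypothetical helper names, and the settings simply mirror the values used above.

```python
from pathlib import Path


def llama_cpp_kwargs(model_file: str) -> dict:
    """Hypothetical helper: collect arguments for a llama.cpp-backed LLM."""
    return {
        # Absolute path to the local GGUF file, so nothing is fetched online.
        "model_path": str(Path(model_file).resolve()),
        "temperature": 0.0,
        "max_new_tokens": 256,
        "context_window": 4096,
        "verbose": False,
    }


def build_llm(model_file: str):
    """Create the LLM; assumes llama-index-llms-llama-cpp and llama-cpp-python are installed."""
    # Imported lazily so the helper above can be used without the package.
    from llama_index.llms.llama_cpp import LlamaCPP

    return LlamaCPP(**llama_cpp_kwargs(model_file))


if __name__ == "__main__":
    print(llama_cpp_kwargs("Phi-3-mini-4k-instruct-q4.gguf"))
```

The embedding side is probably a separate matter: as far as I understand, HuggingFaceEmbedding expects an embedding model (a local model folder or a Hub id) rather than the chat model's GGUF file, so that line would likely need its own locally saved embedding model.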