RAG with Phi-3-mini-4k-instruct-q4.gguf - Local GGUF file

#18
by sunil-pathak - opened

Hi,

I have the model file downloaded to a local folder. The offline model works fine with Ollama and llama-cpp-python.

I am now trying to test RAG with the Phi-3-mini-4k-instruct-q4.gguf file locally (the model is downloaded to a local folder).
The RAG code and the Phi-3-mini-4k-instruct-q4.gguf file are in the same folder.

I am getting the error below:
OSError: Phi-3-mini-4k-instruct is not a local folder and is not a valid model identifier listed on 'https://huggingface.co/models'
If this is a private repository, make sure to pass a token having permission to this repo either by logging in with huggingface-cli login or by passing token=<your_token>

I am guessing that the "tokenizer_name" and "model_name" parameters are incorrect. Please suggest clues...


tokenizer_name="Phi-3-mini-4k-instruct",
model_name="Phi-3-mini-4k-instruct",

and

embed_model = HuggingFaceEmbedding(model_name="Phi-3-mini-4k-instruct-q4.gguf")

I guess I am doing something very wrong... please suggest.

What are the correct values for these parameters? I don't want to download the file every time; everything should work offline. Do I need to download...
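
For context, the error text says the name must be either a local folder or a model identifier on the Hub. A minimal sketch of that distinction is below, using only the standard library. The helper name `classify_model_ref` and its return labels are hypothetical and are not part of transformers or llama-index; it assumes a Hugging Face style local model folder is recognizable by a config.json file, and that a single .gguf file is what llama.cpp-based loaders (such as llama-cpp-python) read.

```python
from pathlib import Path


def classify_model_ref(ref: str, base_dir: str = ".") -> str:
    """Guess how a model reference string would be interpreted.

    Hypothetical helper for illustration only; not a library API.
    """
    path = Path(base_dir) / ref
    if path.is_dir():
        # A Hugging Face style local model folder normally contains config.json.
        if (path / "config.json").is_file():
            return "local_hf_folder"
        return "local_folder_without_config"
    if path.is_file() and path.suffix == ".gguf":
        # A single GGUF file, the format read by llama.cpp-based loaders.
        return "gguf_file"
    # Neither a folder nor a GGUF file: a loader would try it as a Hub identifier.
    return "hub_identifier_or_missing"


if __name__ == "__main__":
    # The two names used in this post, resolved against the current folder.
    for name in ("Phi-3-mini-4k-instruct", "Phi-3-mini-4k-instruct-q4.gguf"):
        print(name, "->", classify_model_ref(name))
```

If this prints "hub_identifier_or_missing" for "Phi-3-mini-4k-instruct", that would match the OSError above: there is no folder with that name next to the script, only the .gguf file.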

--------------------- Code is below ---------------------
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader, ServiceContext
from llama_index.llms.huggingface import HuggingFaceLLM
import torch

documents = SimpleDirectoryReader("content").load_data()
from llama_index.core.prompts.prompts import SimpleInputPrompt

system_prompt = "You are a Q&A assistant. Your goal is to answer questions as accurately as possible based on the instructions and context provided."

# This will wrap the default prompts that are internal to llama-index
query_wrapper_prompt = SimpleInputPrompt("<|USER|>{query_str}<|ASSISTANT|>")

llm = HuggingFaceLLM(
    context_window=4096,
    max_new_tokens=256,
    generate_kwargs={"temperature": 0.0, "do_sample": False},
    system_prompt=system_prompt,
    query_wrapper_prompt=query_wrapper_prompt,
    # --------------------------------------------------------------------->>>>
    tokenizer_name="Phi-3-mini-4k-instruct",
    model_name="Phi-3-mini-4k-instruct",
    # <<<<<< ----------------------------------------------------------------------------
    # device_map="cuda",
    # uncomment this if using CUDA to reduce memory usage
    model_kwargs={"torch_dtype": torch.bfloat16},
)

from llama_index.embeddings.huggingface import HuggingFaceEmbedding

# Load model locally
# --------------------------------------------------------------------->>>>
embed_model = HuggingFaceEmbedding(model_name="Phi-3-mini-4k-instruct-q4.gguf")
# <<<<<< ----------------------------------------------------------------------------
