gghfez committed
Commit 303a309
1 Parent(s): 3e138d7

Update app.py

llama.cpp no longer supports building via `make`; update to the CMake instructions in the generated model card.
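A quick local sanity check for the new card text, written as a hedged sketch (it assumes an up-to-date llama.cpp checkout, CMake's default `build/bin` output directory, and that `llama-cli` accepts `--version`):

```bash
# From inside a current llama.cpp checkout: configure and build with CMake,
# then confirm the binaries the generated card now points at really exist.
cmake -B build
cmake --build build --config Release
./build/bin/llama-cli --version
test -x ./build/bin/llama-server && echo "llama-server built"
```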

Files changed (1)
  1. app.py +21 -15
app.py CHANGED
@@ -228,45 +228,51 @@ def process_model(model_id, q_method, use_imatrix, imatrix_q_method, private_rep
 # {new_repo_id}
 This model was converted to GGUF format from [`{model_id}`](https://huggingface.co/{model_id}) using llama.cpp via the ggml.ai's [GGUF-my-repo](https://huggingface.co/spaces/ggml-org/gguf-my-repo) space.
 Refer to the [original model card](https://huggingface.co/{model_id}) for more details on the model.
-
+
 ## Use with llama.cpp
 Install llama.cpp through brew (works on Mac and Linux)
-
 ```bash
 brew install llama.cpp
-
 ```
 Invoke the llama.cpp server or the CLI.
-
+
 ### CLI:
 ```bash
 llama-cli --hf-repo {new_repo_id} --hf-file {quantized_gguf_name} -p "The meaning to life and the universe is"
 ```
-
+
 ### Server:
 ```bash
 llama-server --hf-repo {new_repo_id} --hf-file {quantized_gguf_name} -c 2048
 ```
-
+
 Note: You can also use this checkpoint directly through the [usage steps](https://github.com/ggerganov/llama.cpp?tab=readme-ov-file#usage) listed in the Llama.cpp repo as well.
 
 Step 1: Clone llama.cpp from GitHub.
-```
+```bash
 git clone https://github.com/ggerganov/llama.cpp
+cd llama.cpp
 ```
 
-Step 2: Move into the llama.cpp folder and build it with `LLAMA_CURL=1` flag along with other hardware-specific flags (for ex: LLAMA_CUDA=1 for Nvidia GPUs on Linux).
-```
-cd llama.cpp && LLAMA_CURL=1 make
+Step 2: Build using CMake. For CPU-only use:
+```bash
+cmake -B build
+cmake --build build --config Release
 ```
 
-Step 3: Run inference through the main binary.
-```
-./llama-cli --hf-repo {new_repo_id} --hf-file {quantized_gguf_name} -p "The meaning to life and the universe is"
+For CUDA support on Linux/Windows:
+```bash
+cmake -B build -DGGML_CUDA=ON
+cmake --build build --config Release
 ```
-or
+
+Step 3: Run inference through the binary (from the llama.cpp folder):
+```bash
+./build/bin/llama-cli --hf-repo {new_repo_id} --hf-file {quantized_gguf_name} -p "The meaning to life and the universe is"
 ```
-./llama-server --hf-repo {new_repo_id} --hf-file {quantized_gguf_name} -c 2048
+or
+```bash
+./build/bin/llama-server --hf-repo {new_repo_id} --hf-file {quantized_gguf_name} -c 2048
 ```
 """
 )
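The `llama-server` command in the new Step 3 starts a local HTTP server. The sketch below shows one way to query it, assuming llama-server's default `127.0.0.1:8080` binding and its OpenAI-compatible chat endpoint; adjust the address if you pass `--host`/`--port`.

```bash
# With ./build/bin/llama-server --hf-repo <repo> --hf-file <file> -c 2048 running
# in another terminal, send a chat request to the OpenAI-compatible endpoint.
curl http://127.0.0.1:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "messages": [
          {"role": "user", "content": "The meaning to life and the universe is"}
        ]
      }'
```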