google/gemma-2b-it - W8A8_FP8 Compression

This is a compressed version of google/gemma-2b-it, produced with llmcompressor.

Compression Configuration

  • Base Model: google/gemma-2b-it
  • Compression Scheme: W8A8_FP8
  • Dataset: HuggingFaceH4/ultrachat_200k
  • Dataset Split: train_sft
  • Number of Samples: 512
  • Preprocessor: chat
  • Maximum Sequence Length: 8192
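
The settings above could be reproduced roughly as follows. This is a minimal sketch, not the script that produced this checkpoint. Several details are assumptions that the card does not state: the mapping of the card's `W8A8_FP8` label to llmcompressor's `"FP8"` preset scheme, the `lm_head` exclusion, the shuffle seed, and the helper names `CompressionConfig`, `oneshot_kwargs`, and `compress`. The heavy imports are kept inside `compress` so the configuration can be inspected without the libraries installed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CompressionConfig:
    """Values copied from the Compression Configuration list above."""
    base_model: str = "google/gemma-2b-it"
    scheme: str = "W8A8_FP8"  # label as written on this card
    dataset: str = "HuggingFaceH4/ultrachat_200k"
    split: str = "train_sft"
    num_samples: int = 512
    max_seq_length: int = 8192


# ASSUMPTION: the card's "W8A8_FP8" label corresponds to the "FP8" preset
# scheme (FP8 weights and activations). The card does not confirm this.
ASSUMED_PRESET = {"W8A8_FP8": "FP8"}


def oneshot_kwargs(cfg: CompressionConfig) -> dict:
    """Calibration arguments derived from the card's settings."""
    return {
        "max_seq_length": cfg.max_seq_length,
        "num_calibration_samples": cfg.num_samples,
    }


def compress(cfg: CompressionConfig, output_dir: str) -> None:
    """Run one-shot calibration and quantization (downloads model and data)."""
    from datasets import load_dataset
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from llmcompressor.modifiers.quantization import QuantizationModifier
    try:
        from llmcompressor import oneshot  # newer releases
    except ImportError:
        from llmcompressor.transformers import oneshot  # older releases

    model = AutoModelForCausalLM.from_pretrained(cfg.base_model, torch_dtype="auto")
    tokenizer = AutoTokenizer.from_pretrained(cfg.base_model)

    # Seed 42 is an arbitrary choice for this sketch.
    ds = load_dataset(cfg.dataset, split=cfg.split)
    ds = ds.shuffle(seed=42).select(range(cfg.num_samples))

    def to_text(example):
        # "chat" preprocessing: render each conversation with the chat template.
        return {"text": tokenizer.apply_chat_template(example["messages"], tokenize=False)}

    def tokenize(example):
        # The rendered template already starts with <bos>, so skip special tokens.
        return tokenizer(
            example["text"],
            max_length=cfg.max_seq_length,
            truncation=True,
            add_special_tokens=False,
        )

    ds = ds.map(to_text)
    ds = ds.map(tokenize, remove_columns=ds.column_names)

    # ASSUMPTION: quantize all Linear layers except the output head.
    recipe = QuantizationModifier(
        targets="Linear",
        scheme=ASSUMED_PRESET[cfg.scheme],
        ignore=["lm_head"],
    )
    oneshot(model=model, dataset=ds, recipe=recipe, output_dir=output_dir, **oneshot_kwargs(cfg))
    tokenizer.save_pretrained(output_dir)
```

Calling `compress(CompressionConfig(), "gemma-2b-it-W8A8_FP8")` would start the run; the output directory name here is only an example.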

Sample Output

Prompt:

<bos><start_of_turn>user
Who is Alan Turing?<end_of_turn>
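
The prompt above follows the Gemma turn format. As a minimal sketch, a single user turn in that shape can be built with a small helper; `format_user_turn` is a hypothetical name, and the trailing newline after `<end_of_turn>` is an assumption based on how the prompt is displayed.

```python
def format_user_turn(message: str, add_bos: bool = True) -> str:
    """Render one user turn in the shape shown in the prompt above."""
    prefix = "<bos>" if add_bos else ""
    return f"{prefix}<start_of_turn>user\n{message}<end_of_turn>\n"


prompt = format_user_turn("Who is Alan Turing?")
print(prompt)
```

If the prompt string already contains `<bos>`, tokenizing it with special tokens enabled may prepend a second one. That would be consistent with the doubled `<bos>` in the output below, although the card does not state the cause. Passing `add_bos=False` here, or `add_special_tokens=False` to the tokenizer, is one way to avoid the duplicate.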

Output:

<bos><bos><start_of_turn>user
Who is Alan Turing?<end_of_turn>
Alan Turing was a British mathematician, computer scientist, cryptanalyst, and philosopher. He is considered one of the pioneers of computer science and artificial intelligence. Turing is best known for his contributions to cryptography, particularly in the development of the modern theory of cryptography. He also made significant contributions to the development of artificial intelligence, including the Turing test, which is a measure of a machine's ability to exhibit intelligent behavior indistinguishable from a human.

Here are some of Turing's most important contributions to computer science and artificial intelligence:

* **Cryptography:** Turing developed the modern theory of cryptography, which is the study of methods for protecting

Evaluation

Safetensors

  • Model size: 3.03B params
  • Tensor types: BF16, F8_E4M3