|
--- |
|
license: bigcode-openrail-m |
|
pipeline_tag: text-generation |
|
library_name: peft |
|
base_model: mistralai/Mistral-7B-v0.1 |
|
widget: |
|
- example_title: "Proof: n = n + 0" |
|
text: "Lemma plus_n_O : forall n:nat, n = n + 0." |
|
- example_title: "Proof: 7 + 3 = 10" |
|
text: "Lemma ex1: 7 + 3 = 10." |
|
- example_title: "Wrong proposition" |
|
text: "Lemma mult_S : forall m n : nat, S (m * n) = m * n + n." |
|
--- |
|
# Model Card for CoqLLM-FineTuned-Experiment-Gen0 |
|
|
|
This model is an experiment in the realm of formal theorem proving, |
|
specifically tailored for generating and interpreting Coq code. By |
|
leveraging a comprehensive dataset derived from over 10,000 Coq source |
|
files, *CoqLLM-FineTuned-Experiment-Gen0* exhibits an enhanced |
|
proficiency in understanding the syntax and semantics unique to Coq, |
|
thereby facilitating significant strides in automated theorem proving. |
|
|
|
## Model Details |
|
|
|
### Model Description |
|
|
|
- **Developed by:** Andreas Florath |
|
- **Model type:** Fine-tuned Large Language Model |
|
- **Finetuned from model:** Mistral-7b (`mistralai/Mistral-7B-v0.1`) |
|
- **Language(s) (NLP):** Coq (only) |
|
- **License:** bigcode-openrail-m |
|
|
|
## Model Sources |
|
|
|
- **Paper:** "Enhancing Formal Theorem Proving: A Comprehensive |
|
Dataset for Training AI Models on Coq Code" by Andreas Florath. |
|
|
|
## Uses |
|
|
|
### Prompt Format |
|
|
|
No special prompt format needed. The model was fine-tuned with Coq |
|
source code. Just providing the proposal let's the model generate a |
|
proof, like: |
|
|
|
``` |
|
Lemma plus_n_O : forall n:nat, n = n + 0. |
|
``` |
|
|
|
No special characters or delimiters are needed. |
|
|
|
### Direct Use |
|
|
|
*CoqLLM-FineTuned-Experiment-Gen0* is an experiment to show the |
|
usefulness of the dataset used for fine-tuning the model. The model |
|
might be used to check if short proofs can be automatically generated. |
|
Another possible use-case is to curate the existing Coq source code |
|
and curate and generate new Coq source code. |
|
|
|
### Out-of-Scope Use |
|
|
|
The model is not intended for general-purpose language tasks outside |
|
the domain of theorem proving and formal verification. Misuse includes |
|
but is not limited to non-Coq programming tasks, natural language |
|
processing outside technical documentation, or any form of deployment |
|
in critical systems without adequate supervision and validation. |
|
|
|
### Bias, Risks, and Limitations |
|
|
|
The model inherits biases from its base mode, training data, potentially |
|
reflecting the diversity or lack thereof in the collected Coq |
|
files. Users should be wary of these limitations, particularly when |
|
applying the model in new or underserved areas of theorem proving. |
|
|
|
### Recommendations |
|
|
|
To mitigate risks and biases, it's recommended to supplement model use |
|
with human oversight or an environment where the generated Coq source |
|
code can be automatically verified. Continuous monitoring for |
|
unexpected behaviors or outputs is advised, alongside efforts to |
|
diversify and expand the training dataset to cover a broader spectrum |
|
of Coq use cases. |
|
|
|
## How to Get Started with the Model |
|
|
|
Here is a code snippet using the fine-tuned model. The shown setup |
|
should work using GPUs with <= 24GByte RAM. You might want to adapt |
|
and experiment with different settings, like different temperatures. |
|
|
|
```python |
|
import torch |
|
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig |
|
import sys |
|
from peft import PeftModel |
|
|
|
base_model_id = "mistralai/Mistral-7B-v0.1" |
|
bnb_config = BitsAndBytesConfig( |
|
load_in_4bit=True, bnb_4bit_use_double_quant=True, |
|
bnb_4bit_quant_type="nf4", bnb_4bit_compute_dtype=torch.bfloat16 |
|
) |
|
|
|
base_model = AutoModelForCausalLM.from_pretrained( |
|
base_model_id, quantization_config=bnb_config, |
|
device_map="auto", trust_remote_code=True, |
|
) |
|
|
|
eval_tokenizer = AutoTokenizer.from_pretrained( |
|
base_model_id, add_bos_token=True, trust_remote_code=True) |
|
|
|
ft_model = PeftModel.from_pretrained( |
|
base_model, "florath/CoqLLM-FineTuned-Experiment-Gen0") |
|
|
|
eval_prompt = "Lemma plus_n_O : forall n:nat, n = n + 0." |
|
|
|
ft_model.eval() |
|
with torch.no_grad(): |
|
|
|
for idx in range(10): |
|
|
|
res = eval_tokenizer.decode( |
|
ft_model.generate( |
|
**model_input, |
|
max_new_tokens=50, |
|
do_sample=True, |
|
temperature=0.4, |
|
pad_token_id=eval_tokenizer.eos_token_id, |
|
repetition_penalty=1.15)[0], skip_special_tokens=False) |
|
|
|
print("Result [%2d] [%s]" % (idx, res)) |
|
|
|
``` |
|
|
|
## Training Details |
|
|
|
The model was fine-tuned with the `florath/coq-facts-props-proofs-v1` |
|
dataset. Only entries with permissive licenses were used during the |
|
fine-tuning process. |
|
|
|
|