File size: 5,167 Bytes
12b1d20
 
6b45168
 
 
19056b6
 
 
 
 
 
 
12b1d20
6b45168
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
9c1e499
 
6b45168
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
4e5cfaf
 
6b45168
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
9c1e499
6b45168
 
 
9c1e499
 
 
 
 
 
 
 
 
 
 
 
 
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
---
license: bigcode-openrail-m
pipeline_tag: text-generation
library_name: peft
base_model: mistralai/Mistral-7B-v0.1
widget:
- example_title: "Proof: n = n + 0"
  text: "Lemma plus_n_O : forall n:nat, n = n + 0."
- example_title: "Proof: 7 + 3 = 10"
  text: "Lemma ex1: 7 + 3 = 10."
- example_title: "Wrong proposition"
  text: "Lemma mult_S : forall m n : nat, S (m * n) = m * n + n."
---
# Model Card for CoqLLM-FineTuned-Experiment-Gen0

This model is an experiment in the realm of formal theorem proving,
specifically tailored for generating and interpreting Coq code.  By
leveraging a comprehensive dataset derived from over 10,000 Coq source
files, *CoqLLM-FineTuned-Experiment-Gen0* exhibits an enhanced
proficiency in understanding the syntax and semantics unique to Coq,
thereby facilitating significant strides in automated theorem proving.

## Model Details

### Model Description

- **Developed by:** Andreas Florath
- **Model type:** Fine-tuned Large Language Model
- **Finetuned from model:** Mistral-7b (`mistralai/Mistral-7B-v0.1`)
- **Language(s) (NLP):** Coq (only)
- **License:** bigcode-openrail-m

## Model Sources

- **Paper:** [Enhancing Formal Theorem Proving: A Comprehensive Dataset for Training AI Models on Coq Code](https://arxiv.org/abs/2403.12627)
- **Dataset:** [florath/coq-facts-props-proofs-gen0-v1](https://huggingface.co/datasets/florath/coq-facts-props-proofs-gen0-v1)

## Uses

### Prompt Format

No special prompt format needed.  The model was fine-tuned with Coq
source code.  Just providing the proposal let's the model generate a
proof, like:

```
Lemma plus_n_O : forall n:nat, n = n + 0.
```

No special characters or delimiters are needed.

### Direct Use

*CoqLLM-FineTuned-Experiment-Gen0* is an experiment to show the
usefulness of the dataset used for fine-tuning the model.  The model
might be used to check if short proofs can be automatically generated.
Another possible use-case is to curate the existing Coq source code
and curate and generate new Coq source code.

### Out-of-Scope Use

The model is not intended for general-purpose language tasks outside
the domain of theorem proving and formal verification.  Misuse includes
but is not limited to non-Coq programming tasks, natural language
processing outside technical documentation, or any form of deployment
in critical systems without adequate supervision and validation.

### Bias, Risks, and Limitations

The model inherits biases from its base mode, training data, potentially
reflecting the diversity or lack thereof in the collected Coq
files.  Users should be wary of these limitations, particularly when
applying the model in new or underserved areas of theorem proving.

### Recommendations

To mitigate risks and biases, it's recommended to supplement model use
with human oversight or an environment where the generated Coq source
code can be automatically verified.  Continuous monitoring for
unexpected behaviors or outputs is advised, alongside efforts to
diversify and expand the training dataset to cover a broader spectrum
of Coq use cases.

## How to Get Started with the Model

Here is a code snippet using the fine-tuned model.  The shown setup
should work using GPUs with <= 24GByte RAM.  You might want to adapt
and experiment with different settings, like different temperatures.

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
import sys
from peft import PeftModel

base_model_id = "mistralai/Mistral-7B-v0.1"
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True, bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4", bnb_4bit_compute_dtype=torch.bfloat16
)

base_model = AutoModelForCausalLM.from_pretrained(
    base_model_id, quantization_config=bnb_config,
    device_map="auto", trust_remote_code=True,
)

eval_tokenizer = AutoTokenizer.from_pretrained(
    base_model_id, add_bos_token=True, trust_remote_code=True)

ft_model = PeftModel.from_pretrained(
    base_model, "florath/CoqLLM-FineTuned-Experiment-Gen0")

eval_prompt = "Lemma plus_n_O : forall n:nat, n = n + 0."

model_input = eval_tokenizer(eval_prompt, return_tensors="pt").to("cuda")

ft_model.eval()
with torch.no_grad():

    for idx in range(10):
    
        res = eval_tokenizer.decode(
            ft_model.generate(
                **model_input,
                max_new_tokens=50,
                do_sample=True,
                temperature=0.4,
                pad_token_id=eval_tokenizer.eos_token_id,
                repetition_penalty=1.15)[0], skip_special_tokens=False)
				
	    print("Result [%2d] [%s]" % (idx, res))

```

## Training Details

The model was fine-tuned with the [florath/coq-facts-props-proofs-gen0-v1](https://huggingface.co/datasets/florath/coq-facts-props-proofs-gen0-v1)
dataset.  Only entries with permissive licenses were used during the
fine-tuning process.

## Cite

```
@misc{florath2024enhancing,
      title={Enhancing Formal Theorem Proving: A Comprehensive Dataset for Training AI Models on Coq Code}, 
      author={Andreas Florath},
      year={2024},
      eprint={2403.12627},
      archivePrefix={arXiv},
      primaryClass={cs.AI}
}
```