metadata

tags:
  - generated_from_trainer
model-index:
  - name: completed-model
    results: []

completed-model

This model was trained from scratch on an unspecified dataset. Its evaluation-set results are reported in the training results table at the end of this card.

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

The hyperparameters used during training are not recorded in this card. The following metrics were logged at each evaluation step during training:

Training Loss	Epoch	Step	Validation Loss	Rewards/chosen	Rewards/rejected	Rewards/accuracies	Rewards/margins	Logps/rejected	Logps/chosen	Logits/rejected	Logits/chosen
0.4772	0.1	61	0.4037	-0.3256	-1.2099	0.8095	0.8843	-282.8301	-422.4691	-2.1649	-1.6678
0.3859	0.2	122	0.3681	-0.3816	-1.7445	0.7143	1.3629	-288.1762	-423.0287	-2.2536	-1.7385
0.3061	0.3	183	0.3546	-0.4969	-2.1025	0.8095	1.6056	-291.7559	-424.1818	-2.1989	-1.7108
0.3765	0.4	244	0.3374	-0.5153	-2.1301	0.7619	1.6148	-292.0326	-424.3660	-2.2182	-1.7222
0.2819	0.5	305	0.3303	-0.4402	-2.1809	0.8095	1.7407	-292.5404	-423.6147	-2.1835	-1.6998
0.3009	0.6	366	0.3314	-0.8026	-2.7756	0.8571	1.9730	-298.4871	-427.2388	-2.2430	-1.7529
0.3015	0.7	427	0.3228	-0.6439	-2.5710	0.9048	1.9271	-296.4410	-425.6519	-2.2258	-1.7303
0.3407	0.8	488	0.3185	-0.7270	-2.7118	0.8571	1.9847	-297.8488	-426.4829	-2.2530	-1.7496
0.3149	0.9	549	0.3186	-0.6296	-2.5591	0.8571	1.9295	-296.3221	-425.5087	-2.2481	-1.7413
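
The column names in the table above (Rewards/chosen, Rewards/rejected, Rewards/margins) resemble the metrics logged by preference-optimization trainers, where the margin is the chosen reward minus the rejected reward. The card does not name the training method, so that reading is an assumption. The following minimal sketch checks it against the logged values and finds the evaluation step with the lowest validation loss. The `ROWS` list, `margin_matches`, and `best_step` are hypothetical names introduced for illustration; the numbers are copied from the table.

```python
# Values copied from the results table above:
# (step, validation loss, rewards/chosen, rewards/rejected, rewards/margins)
ROWS = [
    (61, 0.4037, -0.3256, -1.2099, 0.8843),
    (122, 0.3681, -0.3816, -1.7445, 1.3629),
    (183, 0.3546, -0.4969, -2.1025, 1.6056),
    (244, 0.3374, -0.5153, -2.1301, 1.6148),
    (305, 0.3303, -0.4402, -2.1809, 1.7407),
    (366, 0.3314, -0.8026, -2.7756, 1.9730),
    (427, 0.3228, -0.6439, -2.5710, 1.9271),
    (488, 0.3185, -0.7270, -2.7118, 1.9847),
    (549, 0.3186, -0.6296, -2.5591, 1.9295),
]


def margin_matches(chosen, rejected, margin, tol=2e-4):
    """Return True if margin equals chosen - rejected, up to rounding.

    Assumption: margin = chosen reward - rejected reward. The tolerance
    allows for the four-decimal rounding used in the table.
    """
    return abs((chosen - rejected) - margin) <= tol


def best_step(rows):
    """Return the step whose validation loss is lowest."""
    return min(rows, key=lambda row: row[1])[0]


all_consistent = all(margin_matches(c, r, m) for _, _, c, r, m in ROWS)
lowest_loss_step = best_step(ROWS)
print(all_consistent, lowest_loss_step)
```

Under that assumption every row is consistent to within rounding, and the lowest validation loss (0.3185) occurs at step 488, marginally ahead of the final logged step.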