Kewen Zhao committed
Commit 54f1bd3 · 1 Parent(s): 7e44765

update README

Files changed (1): README.md (+26, −18)
README.md CHANGED
@@ -1,5 +1,5 @@
 ---
-title: Code Eval
+title: Code Eval Stdio
 emoji: 🤗
 colorFrom: blue
 colorTo: red
@@ -20,6 +20,8 @@ description: >-
 
 ## Metric description
 
+The stdio version of the ["code eval"](https://huggingface.co/spaces/evaluate-metric/code_eval) metric. It handles Python programs that read their input from STDIN and print their answer to STDOUT, as is common in competitive programming (e.g. Codeforces, USACO).
+
 The CodeEval metric estimates the pass@k metric for code synthesis.
 
 It implements the evaluation harness for the HumanEval problem solving dataset described in the paper ["Evaluating Large Language Models Trained on Code"](https://arxiv.org/abs/2107.03374).
@@ -31,7 +33,9 @@ The Code Eval metric calculates how good are predictions given a set of referenc
 
 `predictions`: a list of candidates to evaluate. Each candidate should be a list of strings with several code candidates to solve the problem.
 
-`references`: a list with a test for each prediction. Each test should evaluate the correctness of a code candidate.
+`references`: a list of expected outputs, one for each prediction.
+
+`inputs`: a list of inputs, one for each problem.
 
 `k`: number of code candidates to consider in the evaluation. The default value is `[1, 10, 100]`.
 
@@ -41,10 +45,11 @@ The Code Eval metric calculates how good are predictions given a set of referenc
 
 ```python
 from evaluate import load
-code_eval = load("code_eval")
-test_cases = ["assert add(2,3)==5"]
-candidates = [["def add(a,b): return a*b", "def add(a, b): return a+b"]]
-pass_at_k, results = code_eval.compute(references=test_cases, predictions=candidates, k=[1, 2])
+code_eval_stdio = load("hage2000/code_eval_stdio")
+inputs = ["2 3"]
+references = ["5"]
+candidates = [["nums = list(map(int, input().split()))\nprint(sum(nums))"]]
+pass_at_k, results = code_eval_stdio.compute(references=references, predictions=candidates, inputs=inputs, k=[1, 2])
 ```
 
 N.B.
@@ -73,10 +78,11 @@ Full match at `k=1`:
 
 ```python
 from evaluate import load
-code_eval = load("code_eval")
-test_cases = ["assert add(2,3)==5"]
-candidates = [["def add(a, b): return a+b"]]
-pass_at_k, results = code_eval.compute(references=test_cases, predictions=candidates, k=[1])
+code_eval_stdio = load("hage2000/code_eval_stdio")
+inputs = ["2 3"]
+references = ["5"]
+candidates = [["nums = list(map(int, input().split()))\nprint(sum(nums))"]]
+pass_at_k, results = code_eval_stdio.compute(references=references, predictions=candidates, inputs=inputs, k=[1])
 print(pass_at_k)
 {'pass@1': 1.0}
 ```
@@ -85,10 +91,11 @@ No match for k = 1:
 
 ```python
 from evaluate import load
-code_eval = load("code_eval")
-test_cases = ["assert add(2,3)==5"]
-candidates = [["def add(a,b): return a*b"]]
-pass_at_k, results = code_eval.compute(references=test_cases, predictions=candidates, k=[1])
+code_eval_stdio = load("hage2000/code_eval_stdio")
+inputs = ["2 3"]
+references = ["5"]
+candidates = [["nums = list(map(int, input().split()))\nprint(nums[0]*nums[1])"]]
+pass_at_k, results = code_eval_stdio.compute(references=references, predictions=candidates, inputs=inputs, k=[1])
 print(pass_at_k)
 {'pass@1': 0.0}
 ```
@@ -97,10 +104,11 @@ Partial match at k=1, full match at k=2:
 
 ```python
 from evaluate import load
-code_eval = load("code_eval")
-test_cases = ["assert add(2,3)==5"]
-candidates = [["def add(a, b): return a+b", "def add(a,b): return a*b"]]
-pass_at_k, results = code_eval.compute(references=test_cases, predictions=candidates, k=[1, 2])
+code_eval_stdio = load("hage2000/code_eval_stdio")
+inputs = ["2 3"]
+references = ["5"]
+candidates = [["nums = list(map(int, input().split()))\nprint(sum(nums))", "nums = list(map(int, input().split()))\nprint(nums[0]*nums[1])"]]
+pass_at_k, results = code_eval_stdio.compute(references=references, predictions=candidates, inputs=inputs, k=[1, 2])
 print(pass_at_k)
 {'pass@1': 0.5, 'pass@2': 1.0}
 ```
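
The committed README does not show how a candidate program is judged internally. As a rough, hypothetical sketch of the kind of stdin/stdout check such a metric performs (the helper name `run_candidate` is made up here, and the metric's actual implementation may differ, e.g. adding sandboxing, per-problem workers and output normalization):

```python
import subprocess

def run_candidate(code: str, stdin_text: str, timeout: float = 3.0) -> str:
    """Run one candidate program, feeding the problem input on STDIN
    and capturing whatever it prints to STDOUT."""
    result = subprocess.run(
        ["python", "-c", code],
        input=stdin_text,
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return result.stdout.strip()

candidate = "nums = list(map(int, input().split()))\nprint(sum(nums))"
# True: the candidate's output for input "2 3" matches the reference "5"
print(run_candidate(candidate, "2 3") == "5")
```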
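
For reference, the pass@k number these examples report is the unbiased estimator defined in the paper the README cites ("Evaluating Large Language Models Trained on Code"). A minimal sketch of that formula, illustrative only and not the metric's own code:

```python
import numpy as np

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: pass@k = 1 - C(n - c, k) / C(n, k),
    with n generated samples per problem and c of them correct."""
    if n - c < k:
        return 1.0
    return 1.0 - float(np.prod(1.0 - k / np.arange(n - c + 1, n + 1)))

# Matches the last example above: 2 candidates per problem, 1 correct.
print(pass_at_k(2, 1, 1), pass_at_k(2, 1, 2))  # 0.5 1.0
```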