lewtun (HF staff) committed f44258c · Parent: 35712ae

Enable figure scrolling on Safari

Files changed (1): app/src/index.html (+13 −1)
app/src/index.html CHANGED
@@ -7,6 +7,18 @@
   <meta charset="utf8">
   <base target="_blank">
   <title>Scaling test-time compute for open models: How we implemented DeepMind’s compute-optimal recipe to solve hard math problems like OpenAI’s o1</title>
+  <style>
+    figure {
+      max-width: 100%;
+      overflow-x: auto;
+      -webkit-overflow-scrolling: touch; /* Smooth scrolling on iOS */
+    }
+
+    figure img {
+      max-width: none; /* Allows image to maintain original size */
+      height: auto;
+    }
+  </style>
   <link rel="stylesheet" href="style.css">
   <script id="MathJax-script" async src="https://cdn.jsdelivr.net/npm/mathjax@3/es5/tex-mml-chtml.js"></script>
 </head>
@@ -267,7 +279,7 @@
 
 <details><summary style="font-weight:600;font-size:1.25em;line-height:1.3;margin:0">Implementation detail</summary><div class="indented">
 <p>The pass@k metric measures the probability, computed over a set of problems, that at least one of the top \(k\) generated outputs for each problem contains the correct solution. In practice, computing pass@k naively leads to high variance; for example, if we compute pass@1 from a single completion per problem, we can get significantly different values from repeated evaluations due to sampling. To combat this, OpenAI's <a href="https://huggingface.co/papers/2107.03374">Codex paper</a> introduced an unbiased estimator that accounts for the total number of generated samples \(n\), the number of correct samples \(c\), and the desired \(k\) value. The estimator is formulated as:
-
+
 $$\text{pass@k} = \mathbb{E}_{\text{problems}} \left[ 1 - \frac{\binom{n - c}{k}}{\binom{n}{k}} \right]$$
 
 This formula calculates the expected value over all problems and determines the likelihood that at least one of the top \(k\) samples is correct. The term \(\binom{n - c}{k}/\binom{n}{k}\) represents the probability of selecting \(k\) incorrect samples from the total, and subtracting from 1 gives the probability of having at least one correct sample among the top \(k\).<d-footnote>See the <a href="https://samuelalbanie.com/files/digest-slides/2022-07-codex.pdf?utm_source=chatgpt.com">wonderful notes</a> from Samuel Albanie for many more details on pass@k.</d-footnote></p>
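
For reference on the pass@k passage in the diff above: the binomial-coefficient form overflows for large \(n\), so in practice it is evaluated as a numerically stable product, following the reference implementation from the Codex paper. A minimal Python sketch (the per-problem (n, c) counts in the usage example are hypothetical):

import numpy as np

def pass_at_k(n: int, c: int, k: int) -> float:
    # Unbiased pass@k estimator: 1 - C(n-c, k) / C(n, k),
    # with n total samples and c correct samples for one problem.
    if n - c < k:
        # Fewer than k incorrect samples exist, so any draw of k
        # samples must contain at least one correct one.
        return 1.0
    # Numerically stable product form of 1 - C(n-c, k) / C(n, k).
    return 1.0 - np.prod(1.0 - k / np.arange(n - c + 1, n + 1))

# pass@k is the mean of the per-problem estimates.
counts = [(16, 4), (16, 0), (16, 9)]  # hypothetical (n, c) pairs
print(np.mean([pass_at_k(n, c, k=8) for n, c in counts]))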