lewtun (HF staff) committed f44258c · Parent: 35712ae

Enable figure scrolling on Safari

Files changed (1): app/src/index.html (+13 −1)
app/src/index.html CHANGED
@@ -7,6 +7,18 @@
   <meta charset="utf8">
   <base target="_blank">
   <title>Scaling test-time compute for open models: How we implemented DeepMind’s compute-optimal recipe to solve hard math problems like OpenAI’s o1</title>
+  <style>
+    figure {
+      max-width: 100%;
+      overflow-x: auto;
+      -webkit-overflow-scrolling: touch; /* Smooth scrolling on iOS */
+    }
+
+    figure img {
+      max-width: none; /* Allows image to maintain original size */
+      height: auto;
+    }
+  </style>
   <link rel="stylesheet" href="style.css">
   <script id="MathJax-script" async src="https://cdn.jsdelivr.net/npm/mathjax@3/es5/tex-mml-chtml.js"></script>
 </head>
@@ -267,7 +279,7 @@
 
 <details><summary style="font-weight:600;font-size:1.25em;line-height:1.3;margin:0">Implementation detail</summary><div class="indented">
 <p>The pass@k metric measures the probability, computed over a set of problems, that at least one of the top \(k\) generated outputs for each problem contains the correct solution. In practice, computing pass@k naively leads to high variance; for example, if we compute pass@1 from a single completion per problem, we can get significantly different values from repeated evaluations due to sampling. To combat this, OpenAI's <a href="https://huggingface.co/papers/2107.03374">Codex paper</a> introduced an unbiased estimator that accounts for the total number of generated samples \(n\), the number of correct samples \(c\), and the desired \(k\) value. The estimator is formulated as:
-
+
 $$\text{pass@k} = \mathbb{E}_{\text{problems}} \left[ 1 - \frac{\binom{n - c}{k}}{\binom{n}{k}} \right]$$
 
 This formula calculates the expected value over all problems and determines the likelihood that at least one of the top \(k\) samples is correct. The term \(\binom{n - c}{k}/\binom{n}{k}\) represents the probability of selecting \(k\) incorrect samples from the total, and subtracting from 1 gives the probability of having at least one correct sample among the top \(k\).<d-footnote>See the <a href="https://samuelalbanie.com/files/digest-slides/2022-07-codex.pdf?utm_source=chatgpt.com">wonderful notes</a> from Samuel Albanie for many more details on pass@k.</d-footnote></p>
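
For reference on the pass@k passage in the diff above: the binomial-coefficient form overflows for large \(n\), so in practice it is evaluated as a numerically stable product, following the reference implementation from the Codex paper. A minimal Python sketch (the per-problem (n, c) counts in the usage example are hypothetical):

import numpy as np

def pass_at_k(n: int, c: int, k: int) -> float:
    # Unbiased pass@k estimator: 1 - C(n-c, k) / C(n, k),
    # with n total samples and c correct samples for one problem.
    if n - c < k:
        # Fewer than k incorrect samples exist, so any draw of k
        # samples must contain at least one correct one.
        return 1.0
    # Numerically stable product form of 1 - C(n-c, k) / C(n, k).
    return 1.0 - np.prod(1.0 - k / np.arange(n - c + 1, n + 1))

# pass@k is the mean of the per-problem estimates.
counts = [(16, 4), (16, 0), (16, 9)]  # hypothetical (n, c) pairs
print(np.mean([pass_at_k(n, c, k=8) for n, c in counts]))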