Models and datasets used in our blog post: https://huggingface.co/spaces/HuggingFaceH4/blogpost-scaling-test-time-compute