Revisiting TemplateGSM: Advancing Mathematical Reasoning in Language Models with Template-based Data Generation

Community Article Published November 14, 2024

Authors: Yifan Zhang et al.
Originally Published: February 2024
Revisited: November 2024


In February 2024, we introduced TemplateGSM, a dataset aimed at advancing mathematical reasoning in language models. The field has evolved rapidly since then, so we are revisiting TemplateGSM to highlight its contributions and to explain in more depth the Template-based Data Generation (TDG) method that underpins it.



Introduction

The field of natural language processing (NLP) has witnessed remarkable progress with the advent of large language models (LLMs) like GPT-3, PaLM, and Llama. These models have demonstrated unprecedented capabilities in language understanding and generation. However, when it comes to tasks requiring complex reasoning, especially mathematical problem-solving, these models often fall short. One significant barrier is the scarcity of large-scale, high-quality, domain-specific datasets necessary for training models to develop sophisticated reasoning abilities.

To address this challenge, we introduced Template-based Data Generation (TDG), an approach that uses GPT-4 to automatically generate parameterized meta-templates. These meta-templates serve as the structures from which a large number of high-quality problems and solutions are synthesized by varying their parameters.

Utilizing TDG, we present TemplateGSM, a dataset comprising over 7 million synthetically generated grade school math problems. Each problem is accompanied by both code-based and natural language solutions, providing a rich resource for training and evaluating LLMs in mathematical reasoning.


What is Template-based Data Generation (TDG)?

Overview

Template-based Data Generation (TDG) is a method for systematically producing mathematical problems, together with their solutions, from parameterized templates. We use GPT-4 to generate the meta-templates, which lets us capture a wide variety of problem structures and linguistic styles. Varying the parameters within each template then yields many distinct problems, and executing the accompanying solution code verifies each answer.

Methodology

The TDG process involves several key components that work together to generate high-quality mathematical datasets:

1. Generation of Meta-Templates with GPT-4

We begin by utilizing GPT-4 to generate meta-templates that encapsulate various mathematical problem types. These templates include placeholders for variable components such as names, quantities, items, dates, and locations. GPT-4's advanced language generation capabilities allow us to produce a diverse set of templates that encompass a wide range of mathematical concepts.
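As a rough illustration of what such a template might look like, the sketch below stores a problem as a string with named placeholders and lists the placeholders it expects. The template wording and the `placeholders` helper are hypothetical and are not taken from the dataset.

```python
from string import Formatter

# Hypothetical meta-template; the wording is illustrative, not from TemplateGSM.
META_TEMPLATE = (
    "{name} sold {amount} {item} in {first_month}, {year}. "
    "In {second_month}, they sold {percent}% of that amount. "
    "How many {item} did {name} sell in total?"
)


def placeholders(template: str) -> list[str]:
    """Return the distinct placeholder names in order of first appearance."""
    seen: list[str] = []
    for _, field, _, _ in Formatter().parse(template):
        if field and field not in seen:
            seen.append(field)
    return seen


print(placeholders(META_TEMPLATE))
```

Listing the placeholders up front makes it easy to check that a parameter generator supplies every value the template needs.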

2. Parameter Generation

To instantiate the templates, we develop functions that generate parameters satisfying specific conditions, ensuring the solvability and validity of the problems. Parameters are carefully selected to avoid trivial or overly complex problems, striking a balance appropriate for the target educational level.
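One simple way to enforce such conditions is rejection sampling: draw candidate values and keep only those that satisfy the constraint. The sketch below is a minimal example under assumed ranges and an assumed constraint (the percentage of the amount must be a whole number); `sample_parameters` is a hypothetical name, not a function from the project.

```python
import random


def sample_parameters(rng: random.Random) -> dict:
    """Draw an amount and a percentage whose product is a whole number."""
    while True:
        amount = rng.randint(50, 150)
        percent = rng.choice([50, 60, 70, 80, 90])
        # Reject draws that would give a fractional number of items.
        if (amount * percent) % 100 == 0:
            return {"amount": amount, "percent": percent}


rng = random.Random(0)  # fixed seed so the draws are reproducible
params = sample_parameters(rng)
print(params)
```

Working with integer percentages instead of floating-point ratios keeps the divisibility check exact.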

3. Problem Instantiation

The generated parameters are substituted into the GPT-4-generated meta-templates to create specific problem statements. Each instantiated problem is unique in its details but retains the structural characteristics defined by the template.
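In Python, this substitution step can be as small as a call to `str.format`. The template and parameter values below are made up for illustration.

```python
# Hypothetical template and parameters, for illustration only.
template = (
    "{name} has {count} {item}. After buying {extra} more, "
    "how many {item} does {name} have?"
)
params = {"name": "Jamie", "count": 10, "item": "apples", "extra": 3}

# Fill every named placeholder with its parameter value.
problem = template.format(**params)
print(problem)
```

If a placeholder has no matching parameter, `str.format` raises a `KeyError`, which surfaces incomplete parameter sets early.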

4. Solution Generation and Verification

For each problem, we generate solution code—typically in Python—that can automatically solve the problem. By executing the solution code, we obtain the results and verify the correctness of the solutions. In addition to the code-based solutions, we also generate natural language explanations that describe the solution steps without using code.
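The verification step can be sketched as follows: run the generated code in a fresh namespace, read back its `result` variable, and compare it with an independently computed answer. The helper name `run_solution` and the sample code are assumptions made for this sketch.

```python
def run_solution(solution_code: str):
    """Execute generated solution code and return its `result` variable."""
    namespace: dict = {}
    exec(solution_code, namespace)  # only run code you generated yourself
    return namespace["result"]


# Hypothetical generated solution for a sales problem (120 items, then 80%).
solution_code = """
first_month = 120
second_month = first_month * 80 // 100
result = first_month + second_month
"""

expected = 120 + 96  # independent recomputation of the answer
answer = run_solution(solution_code)
assert answer == expected
print(answer)
```

Because `exec` runs arbitrary code, this pattern is only appropriate for code produced by your own templates.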

Illustrative Example

An illustrative example of our TDG method is presented below. The code snippet demonstrates how we generate problems involving sales over two consecutive months. The meta-template for this problem type was generated using GPT-4, capturing a realistic scenario that can be varied through parameter substitution.

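import random

# Assumes the lists first_names, last_names, items, places and us_counties
# (dicts with 'CountyName' and 'StateName' keys) are defined elsewhere.
def generate_problem_and_solution_code():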
    # Lists of random terms
    months = ["January and February", "March and April", "May and June",
              "July and August", "September and October", "November and December"]
    
    # Get initial amount and subsequent ratio; the amount is a multiple of 10
    # so that every ratio below yields a whole number of items
    initial_amount = random.randint(5, 15) * 10
    subsequent_ratio = random.choice([0.5, 0.6, 0.7, 0.8, 0.9])
    
    # Randomly select terms
    name = random.choice(first_names) + ' ' + random.choice(last_names)
    item = random.choice(items)
    month = random.choice(months)
    year = random.randint(2010, 2023)
    place = random.choice(places)
    county = random.choice(us_counties)
    county_name = county['CountyName'] + ", " + county["StateName"]
    
    # Construct problem statement
    problem_statement = f"{name} sold {initial_amount} {item} in {month.split(' and ')[0]}, {year} at {place} in {county_name}. "
    problem_statement += f"In {month.split(' and ')[1]}, they sold {int(subsequent_ratio*100)}% of the amount sold in the previous month. "
    problem_statement += f"How many {item} did {name} sell in total during {month}?"
    
    # Generate solution code
    solution_code = f"""# Number of {item} sold by {name} in {month.split(' and ')[0]}, {year}
{item}_sold_in_{month.split(' and ')[0]} = {initial_amount}

# Sales ratio for the next month
{item}_ratio = {subsequent_ratio}

# Calculating the amount of {item} sold in {month.split(' and ')[1]}
subsequent_{item}_sold = {item}_sold_in_{month.split(' and ')[0]} * {item}_ratio

# Calculating the total number of {item} sold during {month}
total_{item}_sold = {item}_sold_in_{month.split(' and ')[0]} + subsequent_{item}_sold

result = total_{item}_sold
"""
    
    # Execute the solution code in a fresh namespace and read back the answer
    namespace = {}
    exec(solution_code, namespace)
    result = round(namespace['result'])
    
    # Generate the solution without code
    solution_wocode = f"{name} sold {initial_amount} {item} in {month.split(' and ')[0]}, {year}. "
    solution_wocode += f"In {month.split(' and ')[1]}, they sold {int(subsequent_ratio*100)}% of the previous month's sales, which is {round(subsequent_ratio*initial_amount)} {item}. "
    solution_wocode += f"In total, they sold {initial_amount} + {round(subsequent_ratio*initial_amount)} = {result} {item} during {month}."
    
    return problem_statement, solution_code, result, solution_wocode

Generated Problem and Solution Example

Problem Statement:

Alex Johnson sold 120 books in March, 2021 at The Bookstore in Orange County, California. In April, they sold 80% of the amount sold in the previous month. How many books did Alex Johnson sell in total during March and April?

Solution Code:

# Number of books sold by Alex Johnson in March, 2021
books_sold_in_March = 120

# Sales ratio for the next month
books_ratio = 0.8

# Calculating the amount of books sold in April
subsequent_books_sold = books_sold_in_March * books_ratio

# Calculating the total number of books sold during March and April
total_books_sold = books_sold_in_March + subsequent_books_sold

result = total_books_sold

Natural Language Solution:

Alex Johnson sold 120 books in March, 2021. In April, they sold 80% of the previous month's sales, which is 96 books. In total, they sold 120 + 96 = 216 books during March and April.


Advantages of TDG

The TDG method offers several significant advantages:

  • Scalability: TDG can generate a practically unlimited number of problems by varying parameters within GPT-4-generated templates.
  • Quality Assurance: By using code execution for solution verification, we ensure that each problem has a correct and reliable solution.
  • Diversity: The use of GPT-4 to generate meta-templates introduces a wide variety of problem structures and linguistic styles.
  • Elevated Data Augmentation: Incorporating GPT-4 into template generation lets us synthesize data that is both varied and high-quality.
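To give a feel for the scalability point, the sketch below multiplies the number of choices for each parameter of a single template. The parameter names and counts are assumptions chosen for illustration; real templates differ in how many parameters they expose.

```python
import math

# Hypothetical number of possible values per parameter for one template.
value_counts = {
    "quantity": 100,
    "ratio": 5,
    "month_pair": 6,
    "year": 14,
}

# Independent parameters multiply, so the variant count grows quickly.
variants = math.prod(value_counts.values())
print(variants)  # 42000
```

Names, items, and places multiply this count further, although they change only the surface wording and not the arithmetic.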

Dataset Structure

TemplateGSM is organized into configurations that differ in the number of templates they cover. Every configuration contains 1,000 problems per template.

Configurations

  • templategsm-1000-1k: Contains 1,000 problems generated from each of the first 1,000 templates.
  • templategsm-2000-1k: Contains 1,000 problems from each of the first 2,000 templates.
  • templategsm-4000-1k: Contains 1,000 problems from each of the first 4,000 templates.
  • templategsm-7473-1k: Contains 1,000 problems from each of all 7,473 templates.
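The total size of each configuration follows from multiplying its template count by 1,000. The short calculation below assumes every template contributes exactly 1,000 problems, as the descriptions above state.

```python
PROBLEMS_PER_TEMPLATE = 1_000

# Number of templates in each configuration, as listed above.
CONFIG_TEMPLATES = {
    "templategsm-1000-1k": 1_000,
    "templategsm-2000-1k": 2_000,
    "templategsm-4000-1k": 4_000,
    "templategsm-7473-1k": 7_473,
}

for name, n_templates in CONFIG_TEMPLATES.items():
    print(f"{name}: {n_templates * PROBLEMS_PER_TEMPLATE:,} problems")
```

The largest configuration therefore holds 7,473,000 problems, which matches the "over 7 million" figure quoted in the introduction.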

Data Fields

Each problem in the dataset includes the following fields:

  • problem: The problem statement.
  • solution_code: Commented Python code that solves the problem.
  • result: The final answer to the problem.
  • solution_wocode: The solution explained in natural language without code.
  • source: Indicates the data source and seed used in problem generation.
  • template_id: The ID of the template from which the problem was generated.
  • problem_id: A unique index for each problem within its template.
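When processing records in your own pipeline, a small check that all seven fields are present can catch malformed rows early. The helper below is a hypothetical sketch, and the sample record values are placeholders rather than real dataset entries.

```python
EXPECTED_FIELDS = {
    "problem", "solution_code", "result", "solution_wocode",
    "source", "template_id", "problem_id",
}


def missing_fields(record: dict) -> set[str]:
    """Return the expected field names that the record lacks."""
    return EXPECTED_FIELDS - set(record)


# Placeholder record with one field deliberately left out.
partial_record = {name: "" for name in EXPECTED_FIELDS if name != "result"}
print(missing_fields(partial_record))
```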

How to Use TemplateGSM

The dataset is hosted on the Hugging Face Hub and can be loaded with the datasets library.

Installation

First, install the Hugging Face Datasets library if you haven't already:

pip install datasets

Loading the Dataset

You can load a specific configuration of the TemplateGSM dataset as follows:

from datasets import load_dataset

# Load a specific configuration
dataset = load_dataset("math-ai/TemplateGSM", "templategsm-4000-1k")

Replace "templategsm-4000-1k" with any other valid configuration name, such as "templategsm-1000-1k" or "templategsm-7473-1k".
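If you switch between configurations often, a small wrapper can build the configuration name and guard against typos. The helper names below are hypothetical. The sketch also passes `streaming=True`, an option of `load_dataset` that iterates over the data without downloading a configuration in full first.

```python
VALID_TEMPLATE_COUNTS = (1000, 2000, 4000, 7473)


def config_name(num_templates: int) -> str:
    """Build a configuration name such as 'templategsm-4000-1k'."""
    if num_templates not in VALID_TEMPLATE_COUNTS:
        raise ValueError(
            f"expected one of {VALID_TEMPLATE_COUNTS}, got {num_templates}"
        )
    return f"templategsm-{num_templates}-1k"


def load_templategsm(num_templates: int, streaming: bool = True):
    """Load one configuration of TemplateGSM from the Hugging Face Hub."""
    # Imported lazily so config_name works without the datasets package.
    from datasets import load_dataset

    return load_dataset(
        "math-ai/TemplateGSM", config_name(num_templates), streaming=streaming
    )


print(config_name(4000))  # templategsm-4000-1k
```

Calling `load_templategsm(1000)` would then return the smallest configuration in streaming mode.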

Exploring the Data

Here's how you might explore the dataset:

# Access the first problem
first_problem = dataset['train'][0]

print("Problem Statement:\n", first_problem['problem'])
print("\nSolution Code:\n", first_problem['solution_code'])
print("\nNatural Language Solution:\n", first_problem['solution_wocode'])
print("\nFinal Answer:\n", first_problem['result'])

Example

An example of a problem and its solutions from the dataset:

Problem Statement:

Jamie bought 15 apples from the market. She gave 5 apples to her friend and then bought 3 more apples. How many apples does Jamie have now?

Solution Code:

# Initial number of apples Jamie bought
initial_apples = 15

# Apples given to friend
apples_given = 5

# Apples bought later
apples_bought_later = 3

# Calculating the remaining apples after giving some away
apples_after_giving = initial_apples - apples_given

# Total apples Jamie has now
total_apples = apples_after_giving + apples_bought_later

result = total_apples

Natural Language Solution:

Jamie initially bought 15 apples. She gave 5 apples to her friend, so she had 15 - 5 = 10 apples left. Then she bought 3 more apples, so now she has 10 + 3 = 13 apples.

Final Answer:

13
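Because each record carries three views of the same answer, they can be cross-checked against one another. The sketch below rebuilds the example above as a dictionary, executes the solution code, and compares the outcome with the last number in the natural language solution and with the stored result. Storing `result` as a string and treating the last number in the prose as the final answer are both assumptions.

```python
import re

# Record mirroring the example above; the string type of `result` is assumed.
record = {
    "solution_code": (
        "initial_apples = 15\n"
        "apples_given = 5\n"
        "apples_bought_later = 3\n"
        "apples_after_giving = initial_apples - apples_given\n"
        "total_apples = apples_after_giving + apples_bought_later\n"
        "result = total_apples\n"
    ),
    "solution_wocode": (
        "Jamie initially bought 15 apples. She gave 5 apples to her friend, "
        "so she had 15 - 5 = 10 apples left. Then she bought 3 more apples, "
        "so now she has 10 + 3 = 13 apples."
    ),
    "result": "13",
}


def answers_agree(rec: dict) -> bool:
    """Check that code, prose, and stored answer give the same number."""
    namespace: dict = {}
    exec(rec["solution_code"], namespace)  # run only code you trust
    from_code = float(namespace["result"])
    numbers = re.findall(r"\d+(?:\.\d+)?", rec["solution_wocode"])
    from_text = float(numbers[-1])  # assumes the last number is the answer
    return from_code == from_text == float(rec["result"])


print(answers_agree(record))  # True
```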


Access the Dataset

You can access the TemplateGSM dataset on Hugging Face:

https://huggingface.co/datasets/math-ai/TemplateGSM


License

TemplateGSM is made available under the Creative Commons Attribution 4.0 International (CC BY 4.0) license.


Citation

If you utilize Template-based Data Generation (TDG) or the TemplateGSM dataset in your research or applications, please consider citing:

@article{zhang2024training,
  title={Training and Evaluating Language Models with Template-based Data Generation},
  author={Zhang, Yifan and others},
  journal={arXiv preprint},
  year={2024}
}

Concluding Remarks

TemplateGSM provides a large, diverse, and verified dataset of more than 7 million grade school math problems and solutions. With it, we aim to help researchers and practitioners develop language models with stronger problem-solving abilities.

The TDG method combines GPT-4-generated meta-templates, which provide diversity, with parameter variation, which provides scale. We encourage the research community to use TemplateGSM for training, fine-tuning, and evaluating LLMs on mathematical reasoning tasks.

Together, we can advance the capabilities of AI systems to understand and solve complex problems, bringing us closer to truly intelligent machines.


For more details and to contribute, please visit our GitHub repository:

https://github.com/iiis-ai/TemplateMath


Happy coding and modeling!