PRefLexOR: Preference-based Recursive Language Modeling for Exploratory Optimization of Reasoning and Agentic Thinking

We introduce PRefLexOR (Preference-based Recursive Language Modeling for Exploratory Optimization of Reasoning), a framework that combines preference optimization with concepts from Reinforcement Learning (RL) to enable models to self-teach through iterative reasoning improvements. Central to PRefLexOR are thinking tokens, which explicitly mark reflective reasoning phases within model outputs, allowing the model to recursively engage in multi-step reasoning, revisiting, and refining intermediate steps before producing a final output. The foundation of PRefLexOR lies in Odds Ratio Preference Optimization (ORPO), where the model learns to align its reasoning with human-preferred decision paths by optimizing the log odds between preferred and non-preferred responses. The integration of Direct Preference Optimization (DPO) further enhances model performance by using rejection sampling to fine-tune reasoning quality, ensuring nuanced preference alignment. This hybrid approach between ORPO and DPO mirrors key aspects of RL, where the model is continuously guided by feedback to improve decision-making and reasoning. Active learning mechanisms allow PRefLexOR to dynamically generate new tasks, reasoning steps, and rejected answers on-the-fly during training. This adaptive process enables the model to self-teach as it continually improves through real-time feedback and recursive processing.
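The two preference objectives named above can be illustrated on scalar log-probabilities. The following is a minimal sketch, not the training code of this model: it assumes the standard published forms of the ORPO odds-ratio term and the DPO loss, takes sequence log-probabilities as plain floats (length-normalized for ORPO, so that the probability lies strictly between 0 and 1), and all function names are hypothetical.

```python
import math


def log_sigmoid(z: float) -> float:
    """Numerically stable log(sigmoid(z))."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def log_odds(logp: float) -> float:
    """log(p / (1 - p)) computed from log p, for p strictly in (0, 1)."""
    return logp - math.log1p(-math.exp(logp))


def orpo_odds_ratio_loss(logp_chosen: float, logp_rejected: float) -> float:
    """Odds-ratio term of ORPO: -log sigmoid(log odds(chosen) - log odds(rejected)).

    The full ORPO objective adds this term, scaled by a weight, to the usual
    supervised (negative log-likelihood) loss on the chosen response.
    """
    return -log_sigmoid(log_odds(logp_chosen) - log_odds(logp_rejected))


def dpo_loss(
    logp_chosen: float,
    logp_rejected: float,
    ref_logp_chosen: float,
    ref_logp_rejected: float,
    beta: float = 0.1,  # assumed default, not taken from this model card
) -> float:
    """DPO loss: -log sigmoid(beta * (policy margin relative to a frozen reference model))."""
    margin = (logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected)
    return -log_sigmoid(beta * margin)
```

Both losses equal log 2 when the model does not distinguish the two responses, and fall below log 2 as the preferred response becomes relatively more likely.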

Our method diverges from traditional approaches by not relying on pre-generated datasets; instead, it dynamically generates new tasks, reasoning steps, and feedback on the fly, allowing the model to continuously adapt and improve in real time. Recursive optimization within the thinking token framework introduces iterative feedback loops in which the model refines its reasoning, much like policy refinement in RL, achieving deeper coherence, consistency, and adaptability. By recursively optimizing reasoning through feedback-driven learning, PRefLexOR gains considerable flexibility in handling complex tasks, learning and evolving its cognitive abilities autonomously. This framework advances the field of cognitive alignment by demonstrating that models can iteratively teach themselves to reason with greater depth and reflectivity, akin to an RL-based self-improving system capable of solving open-domain problems with superior reasoning depth and logic. Our implementation is straightforward and can be incorporated into any existing pretrained LLM. The approach is demonstrated in materials design use cases, where a small language model is trained to develop sophisticated reasoning capabilities. To do so, PRefLexOR builds a dynamic knowledge graph by generating questions from random text and using Retrieval-Augmented Generation (RAG) to retrieve contextually relevant data from the entire corpus, facilitating recursive reasoning through complex interactions between similar nodes in the embedding space.
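The retrieval step described above can be pictured with a toy example. This is only a sketch under strong assumptions: a bag-of-words vector stands in for a learned embedding model, the three-sentence corpus is invented for illustration, and the helper names (`embed`, `cosine`, `retrieve`) are hypothetical rather than part of the PRefLexOR code.

```python
import math
import random
from collections import Counter


def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding'; a real setup would use a learned embedding model."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, corpus: list[str], k: int = 2) -> list[str]:
    """Return the k corpus chunks most similar to the query."""
    q = embed(query)
    ranked = sorted(corpus, key=lambda chunk: cosine(q, embed(chunk)), reverse=True)
    return ranked[:k]


# Invented example corpus (illustration only).
corpus = [
    "spider silk combines high strength and high toughness",
    "collagen fibrils give bone its toughness",
    "the guitar soundboard is usually made of spruce",
]

# Pick a random chunk as the seed for a new question, then gather related context.
rng = random.Random(0)
seed_chunk = rng.choice(corpus)
context = retrieve(seed_chunk, corpus, k=2)
```

In this picture, the seed chunk would be handed to a model to write a question, and `context` would supply the related material used to answer it.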

Source code: https://github.com/lamm-mit/PRefLexOR

Figure 1: Illustration of the workflow and design principles behind generative materials informatics. Panel a: The process of transforming information into knowledge and actionable outcomes. Each individual piece of information (left) is synthesized into a network of interconnected knowledge, leading to informed decisions and innovative designs (right). Panel b: Conventional approaches in materials science rely on data-driven models, partial differential equations (PDEs), and experimental results, focusing on single-step predictions. Panel c: In contrast, generative materials informatics models built on the PRefLexOR framework proposed in this paper use 'thinking' and 'reflection' explicitly by incorporating iterative reasoning and contextual understanding, allowing for more complex, multi-step predictions. This approach expands from single inference steps, includes multiple modalities of data and responses, integrates real-world feedback and physics, and leverages self-assessment and self-learning. Using reinforcement learning (RL) principles, the discovery of principles or the solution of specific tasks is further inspired by biological paradigms, using bio-inspired neural network designs. These advanced methods support continuous improvement in material predictions, enabling more adaptable and intelligent designs.

Figure 2: PRefLexOR Recursive Reasoning Algorithm: An iterative approach leveraging a fine-tuned Reasoning Model and a general-purpose Critic Model to generate, refine, and optionally integrate responses. The process involves generating initial responses, extracting reflections, improving thinking processes, and creating new responses based on refined thinking, with an optional final integration step. The algorithm relies on extracting thinking processes (indicated via <|thinking|>...<|/thinking|>) and reflection processes (indicated via <|reflect|>...<|/reflect|>). The use of special tokens allows us to easily construct such agentic modeling as it facilitates pausing inference, improving the strategy, and re-generating improved answers. The sampled responses can either be used in their final state or integrated into an amalgamated response that shows very rich facets in the scientific process.
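The loop described in this caption can be sketched in a few lines. This is a simplified illustration, not the library's `recursive_response` implementation: the `reason` and `critic` callables are hypothetical stand-ins for the Reasoning Model and the Critic Model, the way the improved thinking is fed back into the prompt is an assumption, and the optional integration step is omitted.

```python
import re
from typing import Callable


def extract(text: str, start: str, end: str) -> str:
    """Return the text between the first start/end token pair, or '' if absent."""
    pattern = re.escape(start) + r"(.*?)" + re.escape(end)
    match = re.search(pattern, text, flags=re.DOTALL)
    return match.group(1).strip() if match else ""


def recursive_reasoning(
    question: str,
    reason: Callable[[str], str],       # hypothetical: prompt -> response with special tokens
    critic: Callable[[str, str], str],  # hypothetical: (thinking, reflection) -> improved thinking
    n_rounds: int = 3,
) -> list[str]:
    """Generate a response, then repeatedly improve the thinking and regenerate."""
    responses = [reason(question)]
    for _ in range(n_rounds - 1):
        thinking = extract(responses[-1], "<|thinking|>", "<|/thinking|>")
        reflection = extract(responses[-1], "<|reflect|>", "<|/reflect|>")
        improved = critic(thinking, reflection)
        # Assumption: the improved thinking is placed back into the prompt so the
        # reasoning model continues from it when producing the next response.
        prompt = f"{question}\n<|thinking|>\n{improved}\n<|/thinking|>"
        responses.append(reason(prompt))
    return responses
```

With `n_rounds=3`, the function returns three responses: the initial one and two regenerated from critic-improved thinking. Any of them can be used directly or passed on to a separate integration step.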

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = 'lamm-mit/PRefLexOR_ORPO_DPO_EXO_REFLECT_10222024'
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",  # requires the flash-attn package
    device_map="auto",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True, use_fast=False)

This model produces both thinking and reflection sections, marked by these special tokens:

# Names match the variables used in the inference examples
think_start = '<|thinking|>'
think_end = '<|/thinking|>'
reflect_start = '<|reflect|>'
reflect_end = '<|/reflect|>'
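To show how these tokens partition a response, here is a small dependency-free sketch that splits raw output into its thinking, reflection, and final-answer parts. The function name `split_response` is hypothetical and is not part of the PRefLexOR package, which ships its own `extract_text` helper.

```python
def split_response(text: str) -> dict[str, str]:
    """Split a model response into thinking, reflection, and final answer.

    Sections whose tokens are missing are returned as empty strings; whatever
    remains outside the tagged sections is treated as the answer.
    """
    sections: dict[str, str] = {}
    rest = text
    for name, start, end in (
        ("thinking", "<|thinking|>", "<|/thinking|>"),
        ("reflection", "<|reflect|>", "<|/reflect|>"),
    ):
        before, found_start, after_start = rest.partition(start)
        body, found_end, after_end = after_start.partition(end)
        if found_start and found_end:
            sections[name] = body.strip()
            rest = before + after_end  # drop the tagged section from the remainder
        else:
            sections[name] = ""
    sections["answer"] = rest.strip()
    return sections
```

For a response that follows the layout in the example output (thinking, then reflection, then the answer), `split_response(output_text)["answer"]` holds only the text after the closing reflection token.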

Inference examples

We provide several examples for how to use this model, using Colab notebooks and Hugging Face transformers.

Colab PRefLexOR Inference: Thinking and Reflection and Agentic Reasoning

Simple inference:

from PRefLexOR import *

txt = 'What is the relationship between materials and music?' + f' Use {think_start}.'

output_text, messages = generate_local_model(
    model=model, 
    tokenizer=tokenizer, 
    prompt=txt, 
    system_prompt='',  
    num_return_sequences=1, 
    repetition_penalty=1.0, 
    temperature=0.1, 
    max_new_tokens=2024, 
    messages=[], 
    do_sample=True
)

print(output_text)

Extract thinking and output:

thinking    = extract_text(output_text, thinking_start=think_start, thinking_end=think_end)[0].strip()
reflection  = extract_text(output_text, thinking_start=reflect_start, thinking_end=reflect_end)[0].strip()
answer_only = extract_text(output_text, thinking_start=reflect_end, thinking_end="NONE").strip()

print("THINKING:\n\n", thinking)
print("REFLECTION:\n\n", reflection)
print("ANSWER:\n\n", answer_only)

Example output text:

<|thinking|>
**Reasoning Steps**:

Materials influence music through:
1. **Composition**: Materials can be used as sound sources, creating audio frequencies.
2. **Production**: Materials can be employed in musical instruments, producing sound waves.
3. **Design**: Materials can guide the design of instruments, determining their acoustic properties.
4. **Experimentation**: Materials can be manipulated to create unusual sounds or textures, experimenting with sound.

**Relevant Materials or Concepts**:

Materials, Sound Waves, Musical Instruments, Acoustic Properties, Experimentation

**Design Principles**:

Materials significantly influence music through four design principles:

1. **Composition as Sound Source**: Materials can be used as sound sources, creating audio frequencies. This is evident in instruments like the piano or guitar, where the material properties of the string or soundboard directly affect the sound produced.

2. **Instrument Design**: Materials play a crucial role in the design of musical instruments. Different materials possess unique physical properties that can enhance or modify the sound produced by an instrument. For instance, the hollow body of a guitar resonates sound waves, producing a distinctive sound.

3. **Material Manipulation for Unusual Sounds**: Materials can be manipulated to create unusual or experimental sounds. This is evident in the study's example of the metal flute, where playing with the properties of metal allow for the production of unusual sounds.

**Material Properties or Features**:

Materials' properties and features significantly influence music through four primary aspects: composition, production, design, and experimentation.

**Hypothesis**:

"Materials inherently influence music through composition, production, design, and experimentation."

**Additional Background**:

Materials play a significant role in music, extending from the composition and production of sound to the design and manipulation of musical instruments. Here are some additional aspects where materials influence music:

1. **Composition and Production**: Materials can be used as sound sources, such as those found in nature, like seashells or wood, which can be struck or blown to produce sound. Additionally, materials can be used to create musical instruments, like metal wind chimes or glass harmonicas, which produce sound when vibrated or bent.

2. **Design**: Materials can guide the design of musical instruments, determining their acoustic properties, such as frequency range, resonance, and timbre. For instance, the trumpet's conical shape and metal alloy contribute to its bright, piercing sound, while the piano's wooden body and string material produce a warm, nuanced sound.

3. **Experimentation**: Materials can be manipulated to create unusual or experimental sounds, opening up new possibilities for musical expression. For example, the sounds produced by striking or blowing on various materials, such as glass, wood, or metal, can be unique and interesting, allowing for creative sound design and improvisation.

4. **Cultural Significance**: Materials have significant cultural and historical significance in music, reflecting the traditions, beliefs, and practices of societies. For instance, the use of wood in stringed instruments, such as violins and guitars, is common in Western classical music, while in other cultures, like African or Asian, materials like wood, bone, or silk may be used. Additionally, the decoration and ornamentation of instruments, often using materials like metal, can convey meaning and identity.

5. **Sustainability and Technology**: As technology advances and sustainability becomes more prevalent, materials are likely to play an even greater role in music, particularly in the production and distribution of music. For example, carbon fiber, a lightweight, high-stiffness material, is already used in some musical instruments, such as flutes, while advancements in biotechnology could lead to the development of sustainable, functional materials for music-related applications.

These aspects highlight the diverse and interconnected ways in which materials influence music, from composition and production to design, experimentation, and cultural significance.

<|/thinking|>
<|reflect|>
**Improvements/Corrections**:

1. **Clarify the Scope**: Specify whether the analysis focuses on traditional or classical music, or if it also considers other genres and forms of music.

2. **Consider Environmental Factors**: While not the primary focus, materials' environmental properties (e.g., density, conductivity, durability) can also influence music, especially in the design and production of instruments.

3. **Acknowledge Limitations**: Recognize that materials have inherent limitations, such as frequency ranges or resonance characteristics, which can constrain musical expression.

4. **Emphasize Interactions**: Highlight the complex interactions between materials, such as their internal structure, surface properties, and their influence on sound production and manipulation.

5. **Provide Examples**: Include specific examples of materials and their roles in music to make the analysis more concrete and relatable.
<|/reflect|>

Materials play a significant role in music, extending from the composition and production of sound to the design and manipulation of musical instruments. Here are some key aspects where materials influence music:

1. **Composition and Production**: Materials can serve as sound sources, such as those found in nature, like seashells or wood, which can be struck or blown to produce sound. Additionally, materials can be used to create musical instruments, like metal wind chimes or glass harmonicas, which produce sound when vibrated or bent.

2. **Instrument Design**: Materials play a crucial role in the design of musical instruments. They can enhance or modify the sound produced by an instrument. For instance, the hollow body of a guitar resonates sound waves, producing a distinctive sound. Similarly, the body of a drum acts as a resonator, amplifying the sound produced by the drumhead and bass.

3. **Experimentation**: Materials can be manipulated to create unusual sounds or textures, opening up new possibilities for musical expression. For example, the sounds produced by striking or blowing on various materials, such as glass, wood, or metal, can be unique and interesting, allowing for creative sound design and improvisation.

4. **Cultural Significance**: Materials have significant cultural and historical significance in music, reflecting the traditions, beliefs, and practices of societies. For instance, the use of wood in stringed instruments, such as violins and guitars, is common in Western classical music. In other cultures, like African or Asian, materials like wood, bone, or silk may be used for different musical purposes.

5. **Sustainability and Technology**: As technology advances and sustainability becomes more prevalent, materials are likely to play an even greater role in music, particularly in the production and distribution of music. For example, carbon fiber, a lightweight, high-stiffness material, is already used in some musical instruments, such as flutes, while advancements in biotechnology could lead to the development of sustainable, functional materials for music-related applications.

The model response provides a thorough assessment of how materials impact music through sound production, instrument design, experimentation, cultural significance, and sustainability. It covers each area effectively, particularly by showing how materials like wood, metal, and glass influence sound creation and resonate differently in instruments, adding unique acoustic qualities such as timbre. The response is well-structured, progressing from foundational concepts to broader impacts, and it is made more engaging by specific examples, like the guitar’s hollow body and carbon fiber’s role in modern instruments.

Recursive inference using multi-agentic modeling

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load reasoning model
model_name = 'lamm-mit/PRefLexOR_ORPO_DPO_EXO_REFLECT_10222024'
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",  # requires the flash-attn package
    device_map="auto",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True, use_fast=False)
# Load critic model
model_name_critic = "meta-llama/Llama-3.2-3B-Instruct"

critic_model = AutoModelForCausalLM.from_pretrained(
    model_name_critic, 
    torch_dtype=torch.bfloat16, 
    attn_implementation="flash_attention_2", 
    device_map="auto", 
    trust_remote_code=True
)

Example inference

output_text, output_list, output_text_integrated = recursive_response(
    model=model, 
    tokenizer=tokenizer, 
    model_critic=critic_model,  # the critic model loaded above
    tokenizer_critic=tokenizer, 
    question='How do biological materials fail gracefully? Brief answer.', 
    N=3, 
    temperature=0.1, 
    temperature_improvement=0.1, 
    system_prompt='You are a helpful assistant.', 
    system_prompt_critic='You carefully improve responses, with attention to detail, and following all directions.'
)

Printing the output:

for i, item in enumerate(output_list):
    answer_only = extract_text(item, thinking_start="<|/reflect|>", thinking_end="NONE")
    print(f"i={i}", 64 * "-")
    print(answer_only)

print(64 * "#")
print("INTEGRATED RESPONSE:")
print(output_text_integrated)
print(64 * "#")

Citation

@article{buehler2024PRefLexOR,
      title={PRefLexOR: Preference-based Recursive Language Modeling for Exploratory Optimization of Reasoning and Agentic Thinking}, 
      author={Markus J. Buehler},
      year={2024},
      eprint={2410.12375},
      archivePrefix={arXiv},
      primaryClass={cs.AI},
      url={https://arxiv.org/abs/2410.12375}, 
}

Model size: 3.61B params (Safetensors, tensor type BF16)