VisionReward-Image

Introduction

We present VisionReward, a general strategy to aligning visual generation models——both image and video generation——with human preferences through a fine-grainedand multi-dimensional framework. We decompose human preferences in images and videos into multiple dimensions,each represented by a series of judgment questions, linearly weighted and summed to an interpretable and accuratescore. To address the challenges of video quality assess-ment, we systematically analyze various dynamic features of videos, which helps VisionReward surpass VideoScore by 17.2% and achieve top performance for video preference prediction. Here, we present the model of VisionReward-Image.

Merging and Extracting Checkpoint Files

Use the following command to merge the split files into a single .tar file and then extract it into the specified directory:

cat ckpts/split_part_* > ckpts/visionreward_image.tar
tar -xvf ckpts/visionreward_image.tar

Using this model

You can quickly install the Python package dependencies and run model inference in our github.