PosterReward: Achieving Accurate Assessment of Graphic Design using AI Preferences
1. Part 2. Poster Reward Model
PosterReward: Achieving Accurate Assessment of Graphic Design using AI Preferences
GitHub: https://github.com/MeiGen-AI/PosterReward
Web: https://alexlai2860.github.io/PosterReward/
Model: https://huggingface.co/MeiGen-AI/PosterReward_v1
ArXiv: https://arxiv.org/abs/2603.29855
2. 1. Introduction – Reward Model in Visual Generation
Introduction:
Reward Modeling for Visual Generation:
A neural network trained to output a scalar score for a given image and prompt, acting as a measurable proxy for human preference to guide model optimization.
Preference Datasets:
A dataset containing prompts and generated image pairs labeled with rankings (e.g., "winner" vs. "loser") to capture human judgments on quality and alignment.
Figure: Preference dataset example from ImageReward
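To make the two definitions above concrete, here is a minimal sketch of a preference record and of a pairwise training objective. The Bradley-Terry style loss is a common choice for reward models and is an assumption here; the slides do not state which loss PosterReward uses. The field names `winner_image` and `loser_image` are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class PreferencePair:
    """One preference record: a prompt plus a preferred and a rejected image."""
    prompt: str
    winner_image: str  # path or ID of the preferred image (hypothetical field)
    loser_image: str   # path or ID of the rejected image (hypothetical field)


def bradley_terry_loss(score_winner: float, score_loser: float) -> float:
    """Negative log-likelihood that the winner outranks the loser.

    Equals -log(sigmoid(score_winner - score_loser)), computed as a
    numerically stable softplus of the negated margin.
    """
    margin = score_winner - score_loser
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


def pairwise_accuracy(scored_pairs) -> float:
    """Fraction of (winner_score, loser_score) tuples ranked correctly."""
    scored_pairs = list(scored_pairs)
    correct = sum(1 for winner, loser in scored_pairs if winner > loser)
    return correct / len(scored_pairs)
```

The loss shrinks as the reward model scores the winner further above the loser, and pairwise accuracy is the usual way to check a reward model against held-out preference pairs.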
3. 1. Introduction – Motivation of PosterReward
Background:
• Recent text-to-image models have made end-to-end poster generation increasingly feasible.
• Current reward models evaluate only general preferences, which has limited progress in poster generation.
Current Challenges:
• Training a reward model requires preference data, but human preference annotation is costly and hard to scale.
• Discriminative reward models are efficient but often black-box.
• Dedicated benchmarks for poster reward modeling are missing.
Key contributions:
- Propose and validate an AI-based pipeline for scalable poster preference data construction.
- Develop PosterReward with both design analysis and scalar scoring.
- Build PosterRewardBench and PosterBench for reward model and generation evaluation.
Figure: Scoring result of PosterReward
4. 2. Method – Dataset Construction
We propose using MLLMs for labeling, which significantly reduces dataset construction costs through a carefully designed pipeline. Small models first screen the data to reduce its volume; the consensus of multiple MLLMs is then used as the preference label.
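The two-stage idea above (cheap screening, then multi-MLLM consensus) can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: the slides do not specify the screening criterion or how consensus is defined, so the threshold-based filter, the unanimity rule, and all function names are hypothetical.

```python
from typing import Callable, List, Optional, Sequence, Tuple, TypeVar

Pair = TypeVar("Pair")


def screen(candidates: Sequence[Pair],
           small_model_score: Callable[[Pair], float],
           threshold: float) -> List[Pair]:
    """Stage 1: keep candidates whose cheap small-model score passes a threshold."""
    return [c for c in candidates if small_model_score(c) >= threshold]


def consensus_label(votes: Sequence[str]) -> Optional[str]:
    """Stage 2: return the label only if all judges agree (assumed rule), else None."""
    if not votes:
        return None
    first = votes[0]
    return first if all(vote == first for vote in votes) else None


def build_preference_labels(candidates: Sequence[Pair],
                            small_model_score: Callable[[Pair], float],
                            judges: Sequence[Callable[[Pair], str]],
                            threshold: float = 0.5) -> List[Tuple[Pair, str]]:
    """Run screening, then keep only pairs on which every MLLM judge agrees."""
    labeled = []
    for pair in screen(candidates, small_model_score, threshold):
        label = consensus_label([judge(pair) for judge in judges])
        if label is not None:
            labeled.append((pair, label))
    return labeled
```

Running the expensive MLLM judges only on pairs that survive screening is what keeps labeling cost down; dropping pairs without agreement trades dataset size for label reliability.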
5. 2. Method – Dataset Construction
Figure: Preference dataset samples in Poster-Preference-70k
… beer house poster with a neon-lit beer tap on the left, vinyl records beside it, and a frothy mug in the center … [Prompt Following]
… The title “终结者2” is presented in a futuristic, three-dimensional sans-serif font at the bottom center of the poster … [Layout Design]
… At the top, the title "Neon Sculpture Expo 2024" glows in bold white neon font, while below the bench is the tagline "Where Art Meets Future" in a curved neon-blue … [Text Rendering]
… The poster for "Glitter" presents a close-up of Mariah Carey bathed in light, her head tilted slightly with a radiant smile and windswept hair … [Aesthetic Value]
Figures: AI-human preference consistency; data distribution diagram
6. 2. Method – Model Design
Figure: PosterReward training pipeline and model structure diagram. We offer three reward models with different structures, and the bottom shows the training pipeline. Our training pipeline consists of four cascaded stages: Joint Supervised Fine-Tuning, Joint Rejection Sampling, Score-Module Training, and Reinforcement Learning.
Model variants shown: PosterReward-Pairwise, PosterReward-Lite, PosterReward.
Components shown: Vision Encoder, Qwen3 LM Decoder, Generative Head, Score Head, Analysis Module, Score Module.
Inputs shown: Image1, Image2, Prompt + Question; Image, Prompt; Prompt + Analysis.
Example outputs shown: "Yes. Image 1 is better … / No. Image 2 is …"; "Image Analysis : 1. Fundamental Image Quality …"; scalar scores (4.86 …, 5.45 …).
Stage labels shown:
- Joint Supervised Fine-tuning: Task 1: Pairwise Compare; Task 2: Pointwise Analyze; Answer Generation
- Joint Rejection Sampling: 3 samplings (Answer 1, Answer 2, Answer 3), Best Answer
- Score-Module Training: Offline Analysis Construction, Analysis Module (PosterReward-Pairwise)
- Reinforcement Learning: Analysis Module, Rollout, Score Module, Rewards
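The Joint Rejection Sampling stage is shown in the diagram as three samplings from which a best answer is kept. A minimal best-of-n sketch of that idea follows. The slides do not say how "best" is judged, so the `score` callable, the optional `min_score` cutoff, and the function name are all hypothetical.

```python
from typing import Callable, Optional, TypeVar

Answer = TypeVar("Answer")


def best_of_n(generate: Callable[[], Answer],
              score: Callable[[Answer], float],
              num_samples: int = 3,
              min_score: Optional[float] = None) -> Optional[Answer]:
    """Sample several answers and keep the highest-scoring one.

    Returns None when min_score is set and even the best sample falls
    below it, i.e. every sample is rejected.
    """
    candidates = [generate() for _ in range(num_samples)]
    best = max(candidates, key=score)
    if min_score is not None and score(best) < min_score:
        return None
    return best
```

In a pipeline like the one described, `generate` would call the fine-tuned model on one image and prompt, and the kept answers would form the training data for the next stage.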
7. 2. Method – Experiment
Design validity verification 1:
Jointly fine-tuning on the understanding and evaluation tasks yields better performance on both tasks.
Design validity verification 2:
Including analytical text and applying the GRPO strategy of "Scorer as MLLM reward" effectively improves model performance.
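For readers unfamiliar with GRPO, its core step is to normalize rewards within a group of rollouts for the same input. The sketch below shows only that normalization and assumes the rewards are scalar scores produced by a scorer for each rollout; how PosterReward wires the scorer into training is not detailed in the slides. The function name and the `eps` constant are illustrative, and the population standard deviation is used here although implementations differ.

```python
from statistics import mean, pstdev
from typing import List, Sequence


def group_relative_advantages(rewards: Sequence[float],
                              eps: float = 1e-6) -> List[float]:
    """Normalize rewards within one rollout group: (r - mean) / (std + eps).

    Rollouts scored above the group mean get a positive advantage,
    those below get a negative one.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]
```

Because advantages are relative to the group, only the ordering and spread of the scorer's outputs matter, not their absolute scale.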
8. 2. Method – Experiment
PosterReward achieves SOTA performance on both in-domain and out-of-domain datasets.
PosterReward is used as the reward model for reinforcement learning on Qwen-Image with Flow-GRPO.
Figure columns: Qwen-Image, +PosterReward, +HPSv3, +UnifiedReward, +PickScore
9. 2. Method – Experiment
Figure: Visual comparison of SD3.5-Medium fine-tuned with various reward models.
From left to right: SD3.5-Medium, PosterReward, HPSv3, UnifiedReward, PaddleOCR, UnifiedReward + PaddleOCR
10. 2. Method – Experiment
We further propose PosterBench, which uses PosterReward to evaluate the performance of different advanced generative models in graphic design scenarios.
The ranking obtained using PosterReward is consistent with human subjective perception, which further indicates that PosterReward generalizes well.
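A benchmark of this kind can be sketched as ranking generators by their mean reward score and then comparing that ranking with a human one. The slides do not say how consistency with human perception was measured; Spearman rank correlation is used below as one plausible choice, and the function names are hypothetical.

```python
from statistics import mean
from typing import Dict, List, Sequence


def rank_models(scores_by_model: Dict[str, Sequence[float]]) -> List[str]:
    """Return model names sorted from highest to lowest mean reward score."""
    return sorted(scores_by_model,
                  key=lambda name: mean(scores_by_model[name]),
                  reverse=True)


def spearman_rho(ranking_a: Sequence[str], ranking_b: Sequence[str]) -> float:
    """Spearman rank correlation between two tie-free rankings of the same items.

    Requires at least two items; 1.0 means identical order, -1.0 reversed.
    """
    n = len(ranking_a)
    position_b = {item: i for i, item in enumerate(ranking_b)}
    squared_diffs = sum((i - position_b[item]) ** 2
                        for i, item in enumerate(ranking_a))
    return 1 - 6 * squared_diffs / (n * (n * n - 1))
```

Here `scores_by_model` would hold the reward model's scores for each generator's posters over a shared prompt set, and the human ranking would come from a separate user study.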
11. Thank You