PosterReward: Achieving Accurate Assessment of Graphic Design using AI Preferences


1. Part 2. Poster Reward Model
PosterReward: Achieving Accurate Assessment of Graphic Design using AI Preferences
GitHub: https://github.com/MeiGen-AI/PosterReward
Web: https://alexlai2860.github.io/PosterReward/
Model: https://huggingface.co/MeiGen-AI/PosterReward_v1
ArXiv: https://arxiv.org/abs/2603.29855

2. 1. Introduction – Reward Model in Visual Generation
Reward modeling for visual generation: a neural network trained to output a scalar score for a given image and prompt, acting as a measurable proxy for human preference to guide model optimization.
Preference datasets: a dataset containing prompts and generated image pairs labeled with rankings (e.g., "winner" vs. "loser") to capture human judgments on quality and alignment.
[Figure: preference dataset example from ImageReward]
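To make the link between a scalar score and winner/loser labels concrete, here is a minimal sketch of a Bradley-Terry style pairwise loss, one common way to train a scalar reward model on preference pairs. The slides do not state which objective PosterReward uses, so the loss form and the function names are illustrative assumptions.

```python
import math


def pairwise_preference_loss(score_winner: float, score_loser: float) -> float:
    """Bradley-Terry style loss: -log(sigmoid(score_winner - score_loser)).

    Assumption: this objective is illustrative; it is not confirmed to be
    the one used by PosterReward.
    """
    margin = score_winner - score_loser
    # Numerically stable form of log(1 + exp(-margin)).
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


def preference_accuracy(pairs) -> float:
    """Fraction of (winner_score, loser_score) pairs the model ranks correctly."""
    correct = sum(1 for winner, loser in pairs if winner > loser)
    return correct / len(pairs)
```

The loss shrinks as the winner's score rises above the loser's, so minimizing it pushes the model to reproduce the labeled ranking.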

3. 1. Introduction – Motivation of PosterReward
Background:
- Recent text-to-image models have made end-to-end poster generation increasingly feasible.
- Current reward models only evaluate general preferences. This has limited the development of poster generation.
Current Challenges:
- Training a reward model requires preference data. Human preference annotation is costly and hard to scale.
- Discriminative reward models are efficient but often black-box.
- Dedicated benchmarks for poster reward modeling are missing.
Key contributions:
- Propose and validate an AI-based pipeline for scalable poster preference data construction.
- Develop PosterReward with both design analysis and scalar scoring.
- Build PosterRewardBench and PosterBench for reward model and generation evaluation.
[Figure: scoring result of PosterReward]

4. 2. Method – Dataset Construction
We propose using MLLMs for labeling, which significantly reduces dataset construction costs through a carefully designed pipeline. Small models first screen the data to reduce its volume; the consensus of multiple MLLMs then serves as the preference label.
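The two-step idea above (cheap screening, then multi-MLLM consensus) can be sketched as follows. This is a sketch under stated assumptions: the record layout, the `prefilter` and `judges` callables, and the unanimous-agreement default are all hypothetical and stand in for components the slides do not specify.

```python
from collections import Counter
from typing import Callable, Iterable, List, Optional

Pair = dict                      # hypothetical record for one image pair
Judge = Callable[[Pair], str]    # hypothetical MLLM wrapper returning "a" or "b"


def consensus_label(pair: Pair, judges: List[Judge],
                    min_agreement: float = 1.0) -> Optional[str]:
    """Return the majority vote if enough judges agree, else None.

    Assumption: min_agreement=1.0 (unanimous) is an illustrative default.
    """
    if not judges:
        raise ValueError("at least one judge is required")
    votes = Counter(judge(pair) for judge in judges)
    label, count = votes.most_common(1)[0]
    return label if count / len(judges) >= min_agreement else None


def build_preference_data(pairs: Iterable[Pair],
                          prefilter: Callable[[Pair], bool],
                          judges: List[Judge],
                          min_agreement: float = 1.0) -> List[Pair]:
    """Screen pairs with a cheap filter, then keep those with a consensus label."""
    labeled = []
    for pair in pairs:
        if not prefilter(pair):          # stage 1: small-model screening
            continue
        label = consensus_label(pair, judges, min_agreement)
        if label is not None:            # stage 2: multi-MLLM consensus
            labeled.append({**pair, "preferred": label})
    return labeled
```

Running the expensive judges only on pairs that survive the filter is what keeps labeling cost down; pairs without consensus are dropped rather than labeled noisily.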

5. 2. Method – Dataset Construction
Preference dataset samples in Poster-Preference-70k, each tagged with a category:
- [Prompt Following] … beer house poster with a neon-lit beer tap on the left, vinyl records beside it, and a frothy mug in the center …
- [Layout Design] … The title “终结者2” is presented in a futuristic, three-dimensional sans-serif font at the bottom center of the poster …
- [Text Rendering] … At the top, the title "Neon Sculpture Expo 2024" glows in bold white neon font, while below the bench is the tagline "Where Art Meets Future" in a curved neon-blue …
- [Aesthetic Value] … The poster for "Glitter" presents a close-up of Mariah Carey bathed in light, her head tilted slightly with a radiant smile and windswept hair …
[Figures: AI-human preference consistency; data distribution diagram]

6. 2. Method – Model Design
[Figure: PosterReward training pipeline and model structure diagram. Model panels: PosterReward-Pairwise, PosterReward-Lite, PosterReward. Components shown: vision encoder, Qwen3 LM decoder, generative head, score head, analysis module, score module. Example outputs: a pairwise verdict ("Yes. Image 1 is better …"), an image analysis ("1. Fundamental Image Quality …"), and scalar scores (4.86, 5.45).]
We offer three reward models with different structures; the bottom of the figure shows the training pipeline. It consists of four cascaded stages, listed here with the labels from the figure:
- Joint Supervised Fine-Tuning (Task 1: pairwise compare; Task 2: pointwise analyze)
- Joint Rejection Sampling (3 samplings, best answer kept)
- Score-Module Training (offline analysis construction with the analysis module, PosterReward-Pairwise)
- Reinforcement Learning (analysis module rollouts; rewards from the score module given prompt + analysis)
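The rejection sampling stage draws three answers and keeps the best one. A minimal best-of-n sketch is shown below; the slides do not say how "best" is decided, so the `score` and `accept` callables are hypothetical placeholders for whatever selection criterion is actually used.

```python
from typing import Callable, Optional


def rejection_sample(generate: Callable[[int], str],
                     score: Callable[[str], float],
                     n: int = 3,
                     accept: Optional[Callable[[str], bool]] = None) -> Optional[str]:
    """Draw n candidate answers, drop rejected ones, return the top-scoring one.

    Assumptions: `generate(i)` produces the i-th sampled answer, `score`
    rates an answer, and `accept` is an optional validity check. None of
    these interfaces come from the PosterReward code.
    """
    candidates = [generate(i) for i in range(n)]
    if accept is not None:
        candidates = [c for c in candidates if accept(c)]
    if not candidates:
        return None                      # every sample was rejected
    return max(candidates, key=score)    # keep the best remaining answer
```

The selected answers can then serve as training targets for the next stage, which is the usual purpose of rejection sampling in such pipelines.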

7. 2. Method – Experiment
Design validity verification 1: Jointly fine-tuning on the understanding and evaluation tasks yields better performance on both tasks.
Design validity verification 2: Including analytical text and the GRPO strategy of "Scorer as MLLM reward" effectively improves model performance.
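For readers unfamiliar with GRPO, its core step is normalizing rewards within a group of rollouts for the same input. The sketch below shows that step only, assuming the rewards come from a scorer such as the score module; the population standard deviation and the `eps` value are illustrative choices, not details taken from the slides.

```python
import statistics
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """Normalize rewards within one rollout group: (r - mean) / (std + eps).

    Assumption: `rewards` holds one scalar per rollout for the same input,
    e.g. scores assigned by a reward model to sampled analyses.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)     # population std over the group
    return [(r - mean) / (std + eps) for r in rewards]
```

Rollouts that score above the group mean receive positive advantages and are reinforced; those below are discouraged, without needing a separate value network.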

8. 2. Method – Experiment
PosterReward achieves SOTA performance on both in-domain and out-of-domain datasets.
We use PosterReward as the reward model and perform reinforcement learning on Qwen-Image using Flow-GRPO.
[Figure columns: Qwen-Image, +PosterReward, +HPSv3, +UnifiedReward, +PickScore]

9. 2. Method – Experiment
Visual comparison of SD3.5-Medium fine-tuned with various reward models. From left to right: SD3.5-Medium, PosterReward, HPSv3, UnifiedReward, PaddleOCR, UnifiedReward + PaddleOCR.

10. 2. Method – Experiment
We further propose PosterBench, which uses PosterReward to evaluate advanced generative models in graphic design scenarios. The ranking obtained with PosterReward is consistent with human subjective perception, which further supports PosterReward's generalization ability.
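One way to quantify how consistent a model-based ranking is with a human ranking is a rank correlation such as Kendall's tau. The slides do not say how consistency was measured, so the following is only an illustrative sketch (tau-a, which ignores tie corrections), with hypothetical per-model scores as input.

```python
from itertools import combinations
from typing import Sequence


def kendall_tau(scores_a: Sequence[float], scores_b: Sequence[float]) -> float:
    """Kendall's tau-a between two score lists over the same items.

    Returns 1.0 for identical orderings and -1.0 for reversed orderings.
    Tied pairs count as neither concordant nor discordant.
    """
    n = len(scores_a)
    if n != len(scores_b) or n < 2:
        raise ValueError("need two equally long lists with at least 2 items")
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        sign = (scores_a[i] - scores_a[j]) * (scores_b[i] - scores_b[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    return (concordant - discordant) / (n * (n - 1) // 2)
```

Here `scores_a` could hold the mean reward per generative model and `scores_b` the corresponding human ratings; a value close to 1 would indicate that both orderings largely agree.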

11. Thank You