- Obtain a Gemini API key from here.
- The system prompt (`verifier_prompt.txt`) comes from Inference-Time Scaling for Diffusion Models beyond Scaling Denoising Steps.
- To keep things simple, the script currently doesn't make concurrent API calls. For the batched setup, concurrency would be crucial to speed things up.
- An online demo to instantly try it out is available here.
- To use an open model like Qwen2.5, refer here (TODO).
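The concurrency point can be sketched with Python's standard `concurrent.futures` module. This is a minimal sketch, not part of the script: `grade_one` and `grade_batch` are hypothetical names, and `grade_one` is only a placeholder for a single Gemini grading request. Because each request is I/O-bound, a thread pool lets several of them be in flight at once.

```python
from concurrent.futures import ThreadPoolExecutor


def grade_one(prompt: str, image_path: str) -> dict:
    """Hypothetical stand-in for one Gemini grading request.

    A real version would send the system prompt, the text prompt and the
    image to the Gemini API and parse the JSON reply. This placeholder only
    echoes its inputs so the sketch runs without network access.
    """
    return {"prompt": prompt, "image": image_path}


def grade_batch(prompts: list[str], image_paths: list[str], max_workers: int = 4) -> list[dict]:
    """Grade (prompt, image) pairs concurrently, preserving input order."""
    if len(prompts) != len(image_paths):
        raise ValueError("prompts and image_paths must have the same length")
    # Threads suit I/O-bound API calls: each worker waits on its own request.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map returns results in the same order as the inputs.
        return list(pool.map(grade_one, prompts, image_paths))


if __name__ == "__main__":
    print(grade_batch(["a shiny black SUV", "a green funny creature"],
                      ["suv.png", "creature.png"]))
```

Keeping `max_workers` modest is a reasonable default, since hosted APIs such as Gemini enforce rate limits.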
When `grade_images_with_gemini.py` runs successfully without any changes, it should print something like:
```json
[
  {
    "accuracy_to_prompt": {
      "score": 8.0,
      "explanation": "The image includes a black SUV and a mountain in the background, aligning with the prompt's description. However, the car isn't very shiny."
    },
    "creativity_and_originality": {
      "score": 6.0,
      "explanation": "The image does not show much originality, but is a fairly standard and realistic depiction of the prompt."
    },
    "visual_quality_and_realism": {
      "score": 9.0,
      "explanation": "The visual quality and realism are high, with good detail in the car and the mountain landscape."
    },
    "consistency_and_cohesion": {
      "score": 9.0,
      "explanation": "The image is consistent and cohesive, with the car appropriately placed in the landscape and the perspective making sense."
    },
    "emotional_or_thematic_resonance": {
      "score": 7.0,
      "explanation": "The image evokes a sense of adventure or travel, fitting the scene and the presence of the SUV."
    },
    "overall_score": {
      "score": 7.4,
      "explanation": "Overall, the image is a good representation of the prompt, with minor issues in shine and originality."
    }
  },
  {
    "accuracy_to_prompt": {
      "score": 10.0,
      "explanation": "The image accurately depicts a green and funny creature standing in front of a forest. The prompt is completely fulfilled."
    },
    "creativity_and_originality": {
      "score": 9.0,
      "explanation": "The design of the creature is quite creative and unusual, adding an element of originality to the image."
    },
    "visual_quality_and_realism": {
      "score": 10.0,
      "explanation": "The visual quality is high with good detail. Realism is stylized, but well-rendered and visually appealing."
    },
    "consistency_and_cohesion": {
      "score": 10.0,
      "explanation": "The image is cohesive, with the character fitting naturally into the forest background."
    },
    "emotional_or_thematic_resonance": {
      "score": 10.0,
      "explanation": "The image successfully evokes a sense of humor and whimsy, matching the 'funny' aspect of the prompt."
    },
    "overall_score": {
      "score": 9.8,
      "explanation": "Overall, the image is an excellent representation of the prompt with high accuracy, visual quality, and thematic resonance."
    }
  }
]
```
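In a search loop, one natural way to consume this output is to rank candidate images by their `overall_score`. The following is a minimal sketch, assuming the printed list has been captured as a JSON string; `best_candidate` is a hypothetical helper, not a function in the script.

```python
import json


def best_candidate(raw: str) -> tuple[int, float]:
    """Return (index, score) of the grade with the highest overall score."""
    grades = json.loads(raw)
    scores = [g["overall_score"]["score"] for g in grades]
    # max() keeps the first index on ties.
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores[best]


# Trimmed-down version of the output shown above.
example = json.dumps([
    {"overall_score": {"score": 7.4, "explanation": "..."}},
    {"overall_score": {"score": 9.8, "explanation": "..."}},
])
print(best_candidate(example))  # (1, 9.8)
```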
Note: Thanks to Google for providing Google Cloud Project credits to support the project.