Florence-2 Large Model Struggles with Color Recognition in Phrase Grounding Task
Hello everyone,
I recently experimented with the Florence-2 Large model on the caption-to-phrase grounding task. Here's what I observed:
Scenario: I used two images containing electric cables of various colors and provided the text input "yellow electric cables" for the phrase grounding task.
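For anyone who wants to try reproducing this, here is a minimal sketch of such a run. It assumes the Hugging Face transformers loading path and the `<CAPTION_TO_PHRASE_GROUNDING>` task token from the Florence-2 model card. The helper names (`build_prompt`, `ground_phrase`), the generation settings, and the image path in the usage note are illustrative assumptions, not necessarily the exact setup used for the results described here.

```python
TASK = "<CAPTION_TO_PHRASE_GROUNDING>"


def build_prompt(phrase: str) -> str:
    """Return the task token immediately followed by the phrase to ground."""
    return TASK + phrase


def ground_phrase(image_path: str, phrase: str,
                  model_id: str = "microsoft/Florence-2-large") -> dict:
    """Run phrase grounding on one image (hypothetical helper, not an official API)."""
    # Heavy imports are kept local so the prompt helper works without them installed.
    from PIL import Image
    from transformers import AutoModelForCausalLM, AutoProcessor

    # Florence-2 ships custom modeling code, hence trust_remote_code=True.
    model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)
    processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)

    image = Image.open(image_path).convert("RGB")
    inputs = processor(text=build_prompt(phrase), images=image, return_tensors="pt")

    # Generation settings here are assumptions; adjust as needed.
    generated_ids = model.generate(
        input_ids=inputs["input_ids"],
        pixel_values=inputs["pixel_values"],
        max_new_tokens=1024,
        num_beams=3,
    )
    generated_text = processor.batch_decode(generated_ids, skip_special_tokens=False)[0]

    # Converts the generated location tokens into pixel-space boxes and labels.
    parsed = processor.post_process_generation(
        generated_text, task=TASK, image_size=(image.width, image.height)
    )
    return parsed[TASK]
```

A call such as `ground_phrase("cables.jpg", "yellow electric cables")` (with `cables.jpg` as a placeholder path) should return a dict with `bboxes` and `labels` keys.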
Issue: The model did not localize the yellow cables in either image. In the first image, it drew a single bounding box around the entire picture and labeled it "yellow electric cables." In the second image, it drew the box around red electric cables and gave it the same "yellow electric cables." label.
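The first failure (one box covering the whole picture) can be detected automatically. Below is a small sketch, assuming the grounding result is a dict with `bboxes` (as `[x1, y1, x2, y2]` in pixels) and `labels`. The function names and the 0.9 threshold are arbitrary choices for illustration. This check would not catch the second failure, where the box is tight but sits on the wrong color.

```python
def box_coverage(bbox, image_size):
    """Fraction of the image area covered by bbox = [x1, y1, x2, y2]."""
    x1, y1, x2, y2 = bbox
    width, height = image_size
    # Clamp to the image bounds so slightly out-of-range boxes cannot exceed 1.0.
    x1, x2 = max(0.0, min(x1, width)), max(0.0, min(x2, width))
    y1, y2 = max(0.0, min(y1, height)), max(0.0, min(y2, height))
    box_area = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    return box_area / (width * height)


def flag_full_image_boxes(result, image_size, threshold=0.9):
    """Return (label, coverage) pairs for boxes covering at least `threshold` of the image."""
    flagged = []
    for bbox, label in zip(result["bboxes"], result["labels"]):
        coverage = box_coverage(bbox, image_size)
        if coverage >= threshold:
            flagged.append((label, coverage))
    return flagged
```

Running `flag_full_image_boxes(result, (image.width, image.height))` on each output makes it easy to count how often the model falls back to a whole-image box across a larger set of test images.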
This suggests that the model may have difficulty using a color attribute to single out specific objects when it is given a phrase to ground. Has anyone else seen similar color-recognition problems in phrase grounding with Florence-2 Large? If so, how did you address them?
I'm looking forward to hearing your thoughts and suggestions!
Thank you!