microsoft/Florence-2-large · Florence-2 Large Model Struggles with Color Recognition in Phrase Grounding Task

Hello everyone,

I recently experimented with the Florence-2 Large model on the caption-to-phrase-grounding task. Here's what I observed:

Scenario: I used two images containing electric cables of various colors and provided the text input "yellow electric cables" for the phrase grounding task.

Issue: Instead of drawing bounding boxes around only the yellow cables, the model failed in a different way on each image. In the first image, it drew a single bounding box around the entire picture and labeled it "yellow electric cables." In the second image, it drew a bounding box around the red cables and labeled it "yellow electric cables."
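The first failure (one box around the whole picture) can be caught automatically by measuring how much of the image each returned box covers. The sketch below assumes the parsed grounding result is a dict with `bboxes` (each `[x1, y1, x2, y2]` in pixels) and `labels`, which is the shape shown in the Florence-2 model card examples. The function names, the 0.9 threshold, and the example coordinates are made up for illustration.

```python
def box_area_fraction(bbox, image_size):
    """Return the fraction of the image area covered by bbox = [x1, y1, x2, y2]."""
    width, height = image_size
    x1, y1, x2, y2 = bbox
    # Clip to the image bounds so out-of-range coordinates do not inflate the area.
    x1, x2 = max(0.0, min(x1, width)), max(0.0, min(x2, width))
    y1, y2 = max(0.0, min(y1, height)), max(0.0, min(y2, height))
    box_area = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    return box_area / float(width * height)


def flag_oversized_boxes(grounding, image_size, max_fraction=0.9):
    """Return (label, fraction) for every box covering more than max_fraction of the image."""
    flagged = []
    for bbox, label in zip(grounding["bboxes"], grounding["labels"]):
        fraction = box_area_fraction(bbox, image_size)
        if fraction > max_fraction:
            flagged.append((label, fraction))
    return flagged


# Hypothetical result for a 640x480 image: one near-full-image box, one small box.
example = {
    "bboxes": [[0.5, 0.5, 639.5, 479.5], [100.0, 120.0, 300.0, 200.0]],
    "labels": ["yellow electric cables", "yellow electric cables"],
}
print(flag_oversized_boxes(example, image_size=(640, 480)))
```

A flagged box only tells you the grounding was not selective. It does not say which cables the model should have picked.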

This suggests that the model may have difficulty distinguishing colors, or localizing the specific objects a phrase refers to, when given a direct phrase to ground. Has anyone else experienced similar issues with color recognition in phrase grounding tasks using Florence-2 Large? If so, how did you address it?
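One possible way to address the second failure (red cables labeled as yellow) is to post-filter the returned boxes with a simple color check. This is a sketch of an idea, not something verified against Florence-2. It assumes the image is available as a 2D list `pixels[y][x]` of `(r, g, b)` tuples in the 0-255 range and that boxes are `[x1, y1, x2, y2]` in pixels. The function name `yellow_fraction`, the hue range of 40 to 70 degrees, and the saturation and brightness cutoffs are hypothetical values chosen for illustration.

```python
import colorsys


def yellow_fraction(pixels, bbox, hue_range=(40.0, 70.0),
                    min_saturation=0.4, min_value=0.4):
    """Return the fraction of pixels inside bbox whose HSV color looks yellow."""
    x1, y1, x2, y2 = (int(round(v)) for v in bbox)
    total = 0
    matching = 0
    # Iterate over the box, clipped to the image bounds.
    for y in range(max(0, y1), min(len(pixels), y2)):
        row = pixels[y]
        for x in range(max(0, x1), min(len(row), x2)):
            r, g, b = row[x]
            h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
            total += 1
            hue_degrees = h * 360.0
            if (hue_range[0] <= hue_degrees <= hue_range[1]
                    and s >= min_saturation and v >= min_value):
                matching += 1
    return matching / total if total else 0.0


# Synthetic 4x4 image: left half pure yellow, right half pure red.
YELLOW, RED = (255, 255, 0), (255, 0, 0)
pixels = [[YELLOW, YELLOW, RED, RED] for _ in range(4)]

print(yellow_fraction(pixels, [0, 0, 2, 4]))  # box over the yellow half
print(yellow_fraction(pixels, [2, 0, 4, 4]))  # box over the red half
```

A box labeled "yellow electric cables" with a low yellow fraction could then be dropped or flagged for review. Thin cables fill only a small part of their bounding box, so any threshold on this fraction would need tuning on real images.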

I'm looking forward to hearing your thoughts and suggestions!

Thank you!