Keypoint RCNN visibility flag for keypoints #5872

Open
@mbadal1996

🚀 The feature

Hello All,

This is only my first day posting a request here so I apologize for any errors on my part. Also, sorry for the long post below.

The purpose of this post is to request an improvement/correction to the visibility flag behavior of Keypoint RCNN. Based on my results and those of other users I have encountered on different forums and sites, Keypoint RCNN always predicts a flag value of v=1 for every keypoint, regardless of the flag value used in training (v>0, or even v=0), and it predicts coordinates for them as well. In other words, the model does not appear to actually learn the flag value. My understanding is that the flag should be learned and should follow the COCO convention (v=0 ‘not in image’; v=1 ‘occluded’; v=2 ‘visible’), but it does not.
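
For reference, the COCO convention mentioned above stores each keypoint as an (x, y, v) triplet in one flat list per annotation. A minimal sketch of decoding that layout follows; the helper name `split_coco_keypoints` and the sample values are made up for illustration and are not part of any library.

```python
from collections import Counter

# Flag meanings as described in this post (COCO keypoint convention).
COCO_VISIBILITY = {0: "not in image", 1: "occluded", 2: "visible"}


def split_coco_keypoints(flat):
    """Turn a flat COCO list [x1, y1, v1, x2, y2, v2, ...] into (x, y, v) triplets."""
    if len(flat) % 3 != 0:
        raise ValueError("COCO keypoints must come in groups of three")
    return [tuple(flat[i:i + 3]) for i in range(0, len(flat), 3)]


# Hypothetical annotation with three keypoints, one per flag value.
annotation = [0, 0, 0, 120, 85, 1, 140, 90, 2]
triplets = split_coco_keypoints(annotation)

# Count how many keypoints carry each flag in the training target.
flag_counts = Counter(v for _, _, v in triplets)
```

Counting flags this way on the training targets and again on the predictions is one quick way to confirm the behavior described here, namely that predictions only ever carry v=1.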

Motivation, pitch

Given the usefulness of the visibility flags, being able to predict them accurately and use them during inference to mark occluded vs. visible keypoints would be an important addition to the model's capabilities. My understanding is that this is already supposed to be the case, but both the documentation and the model behavior fall short here. I have found the overall performance of Keypoint RCNN to be very good, and I have successfully fine-tuned it on my custom (multiclass) dataset, with good results in predicting the class, bbox, and keypoints. It would be very helpful to be able to distinguish between keypoints using the visibility flag.

Alternatives

No response

Additional context

My hope in writing here is to request and encourage an update to the model that addresses the issue described above. Failing that, I would appreciate help tracking down the source code where Keypoint RCNN converts all flags to v=1 and where it handles flags during training, so that I can modify this behavior myself, since the model does not presently seem to learn the flag values. In my use case, I want Keypoint RCNN to predict the right flag (e.g. v=0) so that I can use it later on, or at least to predict a coordinate of (0.0, 0.0) (or some other fixed value) for keypoints with v=0. The need is to distinguish between visible and occluded keypoints. Even just two learned flags that work as expected (v=0 and v=1) would be very useful. Any suggestions or guidance would be great. Thanks for taking the time to reply.
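
Until the flag is actually learned, one possible stopgap is to derive a flag from a per-keypoint confidence score. The sketch below assumes the prediction dict exposes `keypoints` (rows of x, y, v) together with `keypoints_scores`, as torchvision's Keypoint R-CNN does at inference. The helper name `mask_low_confidence` and the threshold value are hypothetical, and the scores are not necessarily probabilities, so the threshold would need tuning per dataset. This gives a confidence proxy, not a learned visibility flag.

```python
def mask_low_confidence(keypoints, scores, threshold=0.0):
    """Set v=0 and (0.0, 0.0) coordinates for keypoints scoring below threshold.

    keypoints: per-instance lists of [x, y, v] rows,
               e.g. output["keypoints"].tolist()
    scores:    per-instance lists of floats,
               e.g. output["keypoints_scores"].tolist()
    """
    masked = []
    for instance_kps, instance_scores in zip(keypoints, scores):
        rows = []
        for (x, y, _v), score in zip(instance_kps, instance_scores):
            if score < threshold:
                rows.append([0.0, 0.0, 0])  # treat as v=0, fixed coordinate
            else:
                rows.append([x, y, 1])      # keep the predicted location
        masked.append(rows)
    return masked


# Made-up values standing in for one detected instance with three keypoints.
example_keypoints = [[[50.0, 60.0, 1.0], [70.0, 80.0, 1.0], [90.0, 10.0, 1.0]]]
example_scores = [[8.2, -1.5, 3.0]]
result = mask_low_confidence(example_keypoints, example_scores, threshold=2.0)
```

With these made-up inputs, only the second keypoint falls below the threshold, so it is the one reset to (0.0, 0.0) with v=0.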

cc @datumbox @YosuaMichael
