Today's scene graph generation (SGG) task is still far from practical, mainly due to severe training bias, e.g., collapsing diverse "human walk on / sit on / lay on beach" into "human on beach". Given such SGG, downstream tasks such as VQA can hardly infer better scene structures than merel...