Saturday Sep 23, 2023
arXiv Preprint - LLM-Grounder: Open-Vocabulary 3D Visual Grounding with Large Language Model as an Agent
In this episode we discuss LLM-Grounder: Open-Vocabulary 3D Visual Grounding with Large Language Model as an Agent by Jianing Yang, Xuweiyi Chen, Shengyi Qian, Nikhil Madaan, Madhavan Iyengar, David F. Fouhey, and Joyce Chai. The paper introduces LLM-Grounder, an approach for grounding natural language queries in 3D scenes. It uses a Large Language Model (LLM) to break down complex queries and a visual grounding tool to identify objects in the scene. The method does not require labeled training data and achieves state-of-the-art zero-shot accuracy on the ScanRefer benchmark.