Single View 3D Reconstruction and Parsing Using Geometric Commonsense for Scene Understanding

Author: Chengcheng Yu
Total Pages: 105
Release: 2017
ISBN-10: OCLC:1078238026

Book Synopsis: Single View 3D Reconstruction and Parsing Using Geometric Commonsense for Scene Understanding, by Chengcheng Yu

Download or read book Single View 3D Reconstruction and Parsing Using Geometric Commonsense for Scene Understanding, written by Chengcheng Yu and released in 2017, with a total of 105 pages. Available in PDF, EPUB and Kindle.

Book excerpt: My thesis studies this topic from three perspectives: (1) 3D scene reconstruction, to understand the 3D structure of a scene; (2) geometry and physics reasoning, to understand the relationships among objects in a scene; and (3) the interaction between human actions and objects in a scene.

Specifically, the 3D reconstruction builds a unified grammatical framework capable of reconstructing a variety of scene types (e.g., urban, campus, county, etc.) from a single input image. The key idea of our approach is a novel commonsense reasoning framework that exploits two types of prior knowledge: (i) prior distributions over a single dimension of an object, e.g., that the length of a sedan is about 4.5 meters; and (ii) pairwise relationships between the dimensions of scene entities, e.g., that a sedan is shorter than a bus. This unary and relative geometric knowledge, once extracted, is fairly stable across different types of natural scenes and is informative for understanding various scenes in both 2D images and the 3D world. Methodologically, we construct a hierarchical graph as a unified representation of the input image and the related geometric knowledge, formulate these objectives in a unified probabilistic model, and develop a data-driven Monte Carlo method to infer the optimal solution with both bottom-up and top-down computations. Comparisons on public datasets show that our method clearly outperforms alternative methods.

For geometry and physics reasoning, we present an approach to scene understanding that reasons about the physical stability of objects from point clouds. We exploit a simple observation: by human design, objects in static scenes should be stable with respect to gravity. This assumption applies to all scene categories and provides useful constraints on the plausible interpretations (parses) in scene understanding. Our method consists of two major steps: 1) geometric reasoning, recovering solid 3D volumetric primitives from a defective point cloud; and 2) physical reasoning, grouping unstable primitives into physically stable objects by optimizing stability and a scene prior. We propose a novel disconnectivity graph (DG) to represent the energy landscape and use a Swendsen-Wang Cut (MCMC) method for optimization. In experiments, we demonstrate that the algorithm achieves substantially better performance for i) object segmentation, ii) 3D volumetric recovery of the scene, and iii) scene parsing, in comparison to state-of-the-art methods on both a public dataset and our own new dataset.
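The unary and pairwise geometric priors described in the excerpt's first part can be pictured with a small scoring function. The sketch below is an illustration only, not code from the thesis: the 4.5 m sedan length and the sedan-shorter-than-bus relation come from the excerpt, while every other constant, name, and the Gaussian/hinge form of the terms are assumptions.

```python
import math

# Illustrative unary priors: (mean, std) of one dimension per class, in metres.
# The 4.5 m sedan length is quoted in the excerpt; the other values are assumptions.
UNARY_PRIORS = {
    "sedan_length": (4.5, 0.3),
    "bus_length": (11.0, 1.5),
    "person_height": (1.7, 0.1),
}

# Illustrative pairwise relations: (smaller, larger), e.g. "a sedan is shorter than a bus".
PAIRWISE_RELATIONS = [
    ("sedan_length", "bus_length"),
]

def log_prior(dims, margin_scale=1.0):
    """Log-probability (up to an additive constant) of hypothesised dimensions
    under Gaussian unary priors plus soft pairwise ordering penalties."""
    lp = 0.0
    for name, value in dims.items():                      # unary terms
        if name in UNARY_PRIORS:
            mu, sigma = UNARY_PRIORS[name]
            lp += -0.5 * ((value - mu) / sigma) ** 2 - math.log(sigma * math.sqrt(2.0 * math.pi))
    for small, large in PAIRWISE_RELATIONS:               # pairwise terms
        if small in dims and large in dims:
            lp -= margin_scale * max(0.0, dims[small] - dims[large])
    return lp

if __name__ == "__main__":
    hypothesis = {"sedan_length": 4.6, "bus_length": 10.5, "person_height": 1.75}
    print(log_prior(hypothesis))
```

A score of this kind could serve as one factor inside a larger probabilistic model explored by a Monte Carlo sampler; the excerpt's hierarchical-graph formulation is richer than this.

The stability assumption in the excerpt's second part can likewise be illustrated. A common static-stability test, assumed here for illustration (the thesis's actual energy model may differ), checks whether the centre of mass of a recovered volumetric primitive projects inside the support polygon formed by its ground contacts:

```python
import numpy as np

def convex_hull_2d(points):
    """Andrew's monotone-chain convex hull of 2-D points, returned counter-clockwise."""
    pts = sorted(map(tuple, points))
    if len(pts) <= 2:
        return pts
    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def point_in_convex_polygon(p, hull):
    """True if p lies inside or on a counter-clockwise convex polygon."""
    for i in range(len(hull)):
        a, b = hull[i], hull[(i + 1) % len(hull)]
        if (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0]) < 0:
            return False
    return True

def is_stable(primitive_vertices, contact_points):
    """Static stability under gravity: the centre of mass (crudely, the vertex mean)
    must project onto the ground plane inside the hull of the contact points."""
    com = np.mean(np.asarray(primitive_vertices, dtype=float), axis=0)
    hull = convex_hull_2d(np.asarray(contact_points, dtype=float)[:, :2])
    return point_in_convex_polygon(com[:2], hull)

if __name__ == "__main__":
    cube = [(x, y, z) for x in (0, 1) for y in (0, 1) for z in (0, 1)]  # unit cube on the ground
    contacts = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (1, 1, 0)]
    print(is_stable(cube, contacts))  # True: the cube rests on its full bottom face
```

Primitives that fail such a test in isolation are candidates for grouping with their neighbours, which is the role the excerpt assigns to the physical-reasoning step.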
Detecting potential dangers in the environment is a fundamental ability of living beings. To endow a robot with such an ability, my thesis presents an algorithm for detecting potentially falling objects, i.e., physically unsafe objects, given 3D point clouds captured by range sensors. We formulate the falling risk as a probability, or a potential, that an object may fall given a human action or a natural disturbance such as an earthquake or wind. Our approach differs from the traditional object-detection paradigm: it first infers the hidden and situated "causes" (disturbances) in the scene, and then applies intuitive physical mechanics to predict the possible "effects" (falls) as consequences of those causes. In particular, we infer a disturbance field by using motion-capture data as a rich source of common human movements. We show that, by applying various disturbance fields, our model achieves a human-level recognition rate for potentially falling objects on a dataset of challenging and realistic indoor scenes.
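The falling-risk idea in the excerpt's last part can also be sketched. The snippet below is an assumed illustration, not the thesis's model: it estimates the potential-energy barrier required to tip an object over the edge of its support and converts the gap between that barrier and the energy a disturbance could deliver into a pseudo-probability of falling.

```python
import math

G = 9.81  # gravitational acceleration, m/s^2

def tipping_energy_barrier(mass, com, pivot_x):
    """Energy (J) needed to rotate the centre of mass up and over the vertical
    line x = pivot_x (an edge of the support polygon, seen in a 2-D side view)."""
    dx = abs(com[0] - pivot_x)        # horizontal lever arm to the pivot
    h = com[1]                        # current centre-of-mass height
    raised = math.hypot(dx, h)        # COM height when directly above the pivot
    return mass * G * max(0.0, raised - h)

def falling_risk(mass, com, pivot_x, disturbance_energy, temperature=1.0):
    """Boltzmann-style sigmoid mapping (barrier - disturbance energy) to a risk in (0, 1)."""
    barrier = tipping_energy_barrier(mass, com, pivot_x)
    return 1.0 / (1.0 + math.exp((barrier - disturbance_energy) / temperature))

if __name__ == "__main__":
    # Tall, narrow object: COM 1.0 m high and only 5 cm inside its support edge.
    print(falling_risk(5.0, (0.05, 1.0), 0.0, disturbance_energy=1.0))  # roughly 0.7: risky
    # Squat object: COM 0.2 m high and 25 cm inside its support edge.
    print(falling_risk(5.0, (0.25, 0.2), 0.0, disturbance_energy=1.0))  # near 0: safe
```

In a fuller treatment, the disturbance energy would itself vary over space, e.g. larger along walking paths inferred from motion-capture data, which is what the excerpt's disturbance field provides.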


Single View 3D Reconstruction and Parsing Using Geometric Commonsense for Scene Understanding Related Books

Computer Vision – ECCV 2022
Language: en
Pages: 806
Authors: Shai Avidan
Categories: Computers
Type: BOOK - Published: 2022-10-20 - Publisher: Springer Nature

The 39-volume set, comprising the LNCS books 13661 until 13699, constitutes the refereed proceedings of the 17th European Conference on Computer Vision, ECCV 2022 …
Representations and Techniques for 3D Object Recognition and Scene Interpretation
Language: en
Pages: 172
Authors: Derek Hoiem
Categories: Computers
Type: BOOK - Published: 2011 - Publisher: Morgan & Claypool Publishers

One of the grand challenges of artificial intelligence is to enable computers to interpret 3D scenes and objects from imagery. This book organizes and introduces …
Learning Single-view 3D Reconstruction of Objects and Scenes
Language: en
Pages: 122
Authors: Shubham Tulsiani
Type: BOOK - Published: 2018

We address the task of inferring the 3D structure underlying an image, in particular focusing on two questions -- how we can plausibly obtain supervisory signal …
Combining Geometry and Learning for Scene Understanding
Language: en
Pages: 344
Authors: Arun Kumar Chockalingam Santha Kumar
Type: BOOK - Published: 2018

When an image is captured, the 3D Euclidean space describing its world is projected onto a 2D plane, effectively losing most pertinent underlying 3D Euclidean geometry …