Vision And Learning Lab

Our research team focuses on computer vision, deep learning, and computer graphics. Research areas include 3D reconstruction, image-based rendering, image and video synthesis, object localization using millimeter-wave radar, unsupervised deep learning, and video forgery detection.

We are looking for students, RAs/interns, postdocs, and collaborators. If you’re interested, please contact supasorn@gmail.com directly.

Our Members

Faculty Member

Lab Administrator

  • Saralit Eiamcharoen (สราลิต เอี่ยมเจริญ)

Research Assistants

  • Jiraphon Yenphraphai (จิรพนธ์ เย็นพระพาย)

PhD students

  • Nakorn Kumchaiseemak (ณกรณ์ ขำชัยสีเมฆ)
  • Nontawat Tritrong (นนทวัชร ตริตรอง)
  • Norawit Urailertprasert (นรวิชญ์ อุไรเลิศประเสริฐ)
  • Pakkapon Phongthawee (ภัคพล พงษ์ทวี)
  • Pattaramanee Arsomngern (ภัทรมณี อาศรมเงิน)
  • Pantawat Ponglertnapakorn (พันธวัตถ์ พงศ์เลิศนภากร)
  • Pitchaporn Rewatbowornwong (พิชชาพร เรวัตบวรวงศ์)
  • Sasikarn Khwanmuang (ศศิกาญจน์ ขวัญเมือง)
  • Suttisak Wizadwongsa (สุทธิศักดิ์ วิเศษวงษา)

Research topics

3D Light-field Reconstruction

We explore the problem of view synthesis, which generates new views of objects or scenes from a sparse set of input images. We focus on objects with complex appearances, such as glass or fluffy objects with thin structures, that foil state-of-the-art techniques for 3D reconstruction. Our ultimate goal is to build a tool that can visualize any object in a Virtual Reality (VR) or Augmented Reality (AR) headset with true 3D parallax, as well as on any desktop or mobile phone, in real time. Main applications include displaying products for online e-commerce and digitizing artifacts.

Solving this problem requires understanding the 3D scene from 2D images and devising a new efficient, compact representation that supports real-time rendering. Challenges include handling occlusions, depth uncertainty, and view-dependent effects such as those from a glossy surface. We’re addressing these with a practical and robust deep learning solution.

Improving the Quality of Time-Lapse Reconstruction from Internet Photos

Recently, Ricardo Martin et al. showed how one can build a time-lapse from internet photos of a landmark taken by different people at different times and locations. The results were awe-inspiring and even revealed some surprising changes that would otherwise go unnoticed. However, real-world scenes are dynamic, filled with constantly changing building structures, moving trees, and energetic crowds of visiting tourists. Unfortunately, these non-stationary parts of the scene lead to unrealistic and blurry time-lapse videos. We aim to improve upon the resolution, sharpness, temporal smoothness, and realism of the current state of the art, in direct collaboration with Dr. Ricardo Martin.

Unsupervised Landmark Detection via Spatial Reasoning

Detecting visual landmarks or keypoints is a crucial component in many vision applications such as robot localization, object detection and retrieval, and 3D reconstruction. However, building state-of-the-art detectors often requires collecting and manually annotating a dataset, which is expensive and time-consuming. We ask: Is it possible to leverage some form of spatial reasoning to automatically build a detector that is not only cheaper to build but also outperforms those that require human annotations for downstream tasks? Here we explore the idea of collaborative dual network training that learns to select and detect landmarks in an unsupervised fashion.

Tracing the Source of DeepFake

DeepFake can be used in many harmful ways, for example, delivering misleading political speeches, cyberbullying, etc. What is DeepFake? It is a video of a person generated by an “AI” (Face2Face, FaceSwap, DeepFake) in which the person can be made to say new things or act arbitrarily. To generate a new DeepFake video, most AI methods require a real video of the person as source content from which to synthesize new speech. The ability to reliably retrieve the original source of a DeepFake could provide evidence for the inauthenticity of a video in question, particularly for public figures whose video clips are commonly accessible. Our goal is to develop a system that can index online videos and perform video retrieval to help combat the misuse of DeepFake videos.

Set Representation Learning for Retrieval Task

A set is a type of data that naturally arises in the real world and describes a collection of things, such as a set of objects in an image or a set of 3D points (a "point cloud") captured by a laser scanner. Using sets with modern techniques such as deep learning turns out to be challenging because a set is permutation invariant (its elements can be reordered without changing what the set represents), whereas traditional neural networks treat each input position differently. In this project, we aim to design a network that handles set input correctly, allowing sets to be used more widely for tasks such as classification, retrieval, and segmentation.
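
To illustrate permutation invariance, here is a minimal sketch in plain Python. It is not the network developed in this project: it assumes a simple sum-pooling design (in the style of Deep Sets), and all function names, layer sizes, and random weights are hypothetical choices made for illustration. Each point is encoded independently, the encodings are summed, and the sum is passed through a second layer, so reordering the points should not change the output beyond floating-point rounding. A position-sensitive baseline that flattens the points in order is included for contrast.

```python
import random


def make_weights(rows, cols, rng):
    # Random dense-layer weights (hypothetical; a real model would learn these).
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def linear(weights, vector):
    # Plain matrix-vector product.
    return [sum(w * x for w, x in zip(row, vector)) for row in weights]


def relu(vector):
    return [max(0.0, v) for v in vector]


def encode_set(points, phi, rho):
    # 1) Encode every element independently with the same weights.
    features = [relu(linear(phi, p)) for p in points]
    # 2) Sum over elements: addition does not depend on element order.
    pooled = [sum(column) for column in zip(*features)]
    # 3) Transform the pooled vector into the final set representation.
    return relu(linear(rho, pooled))


def encode_flattened(points, weights):
    # Baseline: concatenate elements in order, so each position gets
    # its own weights and reordering generally changes the output.
    flat = [x for p in points for x in p]
    return linear(weights, flat)


rng = random.Random(0)
phi = make_weights(8, 3, rng)      # per-point encoder: 3 -> 8
rho = make_weights(4, 8, rng)      # post-pooling layer: 8 -> 4
flat_w = make_weights(4, 12, rng)  # baseline layer: 4 points * 3 coords -> 4

cloud = [[rng.uniform(-1.0, 1.0) for _ in range(3)] for _ in range(4)]
shuffled = cloud[::-1]  # same set of points, different order

set_a, set_b = encode_set(cloud, phi, rho), encode_set(shuffled, phi, rho)
flat_a, flat_b = encode_flattened(cloud, flat_w), encode_flattened(shuffled, flat_w)

set_invariant = all(abs(a - b) < 1e-9 for a, b in zip(set_a, set_b))
flat_invariant = all(abs(a - b) < 1e-9 for a, b in zip(flat_a, flat_b))
print("sum-pooled encoder invariant to order:", set_invariant)
print("flattened baseline invariant to order:", flat_invariant)
```

Sum pooling is only one way to obtain order independence; mean or max pooling would serve the same purpose in this sketch.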

Publications

“Discovery of Latent 3D Keypoints via End-to-end Geometric Reasoning”, S. Suwajanakorn, N. Snavely, J. Tompson, M. Norouzi, NIPS 2018 (oral).