Volumetric video technology overlaps with the modern CG industry in terms of equipment, and many of its keywords are unfamiliar to people from the video industry. So let's take a look at the main keywords used in the volumetric video field.
Free viewpoint video
EyeVision, developed by Professor Takeo Kanade of Carnegie Mellon University, was first used for the 2001 Super Bowl broadcast by CBS.
3D data generated by volumetric capture is sometimes called free-viewpoint video because, in post-processing after shooting, it can be rendered as 2D video from any viewpoint around the subject: above, below, left, right, front, or behind. An early example is EyeVision, a technology developed by Professor Takeo Kanade of Carnegie Mellon University and others, which drew attention when it was used in the broadcast of Super Bowl XXXV in January 2001.
3D Scanning
3D scanner “Matterport Pro3” featuring a next-generation LiDAR device
Volumetric video moves in the time direction, whereas 3D scanning is a technology that captures the shape and texture of stationary subjects, such as the interiors and exteriors of buildings or objects; the generated data is not assumed to move. 3D scanning combines an RGB camera, which photographs the surface texture of the subject, with a scanner (laser, LiDAR, infrared, etc.) that measures the subject's 3D shape.
Photogrammetry
Photogrammetry, also called photographic surveying, is a method of generating a 3D model from a large number of photographs (still images). The photographs are taken with an ordinary digital camera, and 3D data is obtained by processing them with dedicated photogrammetry software. Since the hardware required is just a commercially available camera and a personal computer, it is also an easy way to create a 3D model. The subject is assumed to be stationary.
Motion Capture
While volumetric capture records the subject's movement and texture (appearance) at the same time, the technique of capturing only the movement of a subject, especially a human, is called motion capture. A VR head-mounted display (HMD) is a familiar example: the position and tilt of the HMD and the hand-held controllers are tracked, so the avatar's head and hands shown on screen move according to the user's physical movements.
Light Field Camera
A light field camera is, literally, a camera that can record a light field. A normal camera sensor records the intensity and color of the light that reaches it, but not the angle at which the light arrives. By recording that angle as well, it becomes possible to change the focus of a photograph in post-processing, and even to change its lighting and texture.
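The refocusing trick can be sketched with a toy "shift-and-add" example: each sub-aperture view is shifted in proportion to its aperture coordinate, then the views are averaged. The one-dimensional light field, the aperture coordinates, and the parallax slope below are all invented for illustration, not taken from any real camera:

```python
import numpy as np

# Synthetic light field: 5 sub-aperture views of a 32-pixel scene holding
# one bright point whose image position shifts with the aperture
# coordinate (parallax proportional to depth).
views = np.arange(-2, 3)        # aperture coordinates u (1-D for brevity)
true_slope = 2                  # pixels of parallax per aperture step
lf = np.zeros((len(views), 32))
for i, u in enumerate(views):
    lf[i, 16 + true_slope * u] = 1.0   # point at x=16, displaced by u

def refocus(lf, views, slope):
    """Shift-and-add refocusing: shift each view by slope*u, then average."""
    out = np.zeros(lf.shape[1])
    for i, u in enumerate(views):
        out += np.roll(lf[i], -slope * u)
    return out / len(views)

sharp = refocus(lf, views, true_slope)   # focused at the point's depth
blurry = refocus(lf, views, 0)           # focused at a different depth
print(sharp.max(), blurry.max())         # 1.0 0.2
```

With the matching slope the shifted copies of the point line up and reinforce each other; with the wrong slope its energy is smeared across five pixels, which is exactly the post-capture focus change described above.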
Depth Camera
A device that can measure the distance from the camera to the subject is called a depth camera. The list below shows examples of depth cameras on the market, and you can see that there are various methods for measuring depth. In the 3D scanning described above, these depth cameras are sometimes used in addition to dedicated 3D scanners.
■ Examples of depth cameras (depth sensors) used for 3D scanning
Microsoft Azure Kinect DK
Image sensor technology: ToF
Depth FOV: 120° x 120°
Resolution: 1024 x 1024
Depth framerate: 30 fps
Minimum depth distance: 0.25 m

Intel RealSense D457
Image sensor technology: Active IR Stereo
Depth FOV: 87° x 58°
Resolution: 1280 x 720
Depth framerate: up to 90 fps
Minimum depth distance: 0.52 m
Usage environment: Indoor/Outdoor

Intel RealSense L515
Image sensor technology: LiDAR (ToF)
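Given a depth map from such a camera, the 3D position of every pixel can be recovered by inverting the pinhole camera model, which is one common way depth-camera output becomes scan data. A minimal sketch, with made-up intrinsics (`fx`, `fy`, `cx`, `cy`) rather than any particular camera's calibration:

```python
import numpy as np

# Back-project a depth map into a 3D point cloud with the pinhole model.
# The intrinsics below are illustrative, not from any specific camera.
fx = fy = 600.0          # focal length in pixels (assumed)
cx, cy = 640.0, 360.0    # principal point for a 1280x720 image (assumed)

def depth_to_points(depth):
    """depth: (H, W) array in metres; returns (N, 3) points, skipping zeros."""
    h, w = depth.shape
    v, u = np.mgrid[0:h, 0:w]          # pixel coordinates of every sample
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    pts = np.stack([x, y, depth], axis=-1).reshape(-1, 3)
    return pts[pts[:, 2] > 0]          # drop invalid (zero-depth) pixels

depth = np.zeros((720, 1280))
depth[360, 640] = 2.0                  # one pixel on the optical axis, 2 m away
print(depth_to_points(depth))          # [[0. 0. 2.]]
```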
Visual Volume Intersection Method
One of the basic data-processing algorithms used in volumetric capture is the visual volume intersection method, also called Shape from Silhouette. Multiple 2D silhouettes of the subject are photographed by synchronized cameras; for each camera, a visual volume (a cone with the camera position as its apex and the silhouette as its cross section) is projected back into 3D space, and the intersection of these volumes restores the subject's three-dimensional shape.
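The intersection step can be sketched with voxel carving: a voxel survives only if it projects inside the silhouette in every view. The example below substitutes orthographic projections along the three axes and a circular silhouette for real calibrated cameras, purely for illustration:

```python
# Minimal visual-hull (Shape from Silhouette) sketch: keep a voxel only
# if its projection falls inside every camera's 2-D silhouette.
N = 21   # voxel grid is N x N x N
R = 8    # silhouette radius in voxels

def inside_silhouette(a, b):
    # circular silhouette of a sphere of radius R, centred in the image
    c = N // 2
    return (a - c) ** 2 + (b - c) ** 2 <= R ** 2

hull = set()
for x in range(N):
    for y in range(N):
        for z in range(N):
            # intersect the three viewing volumes: the voxel survives
            # only if it lies inside the silhouette in every view
            if (inside_silhouette(x, y) and
                    inside_silhouette(y, z) and
                    inside_silhouette(x, z)):
                hull.add((x, y, z))

print((10, 10, 10) in hull, (0, 0, 0) in hull)  # True False
```

The surviving voxels approximate the subject's shape; real systems do the same intersection with perspective cameras and silhouettes segmented from the synchronized 2D footage.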
Point Cloud
One of the data formats of volumetric video is the point cloud. A point, the minimum unit of data acquired by a camera or sensor, gathers with many others to form a cloud. In volumetric video, this 3D point cloud data continues in the time direction, like the frames of a movie.
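As a data format, one frame of point cloud data is just a collection of point samples, and a volumetric clip is a sequence of such frames over time. A minimal sketch (the frame structure and 30 fps timing below are illustrative, not a standard format):

```python
from dataclasses import dataclass

@dataclass
class PointCloudFrame:
    timestamp: float   # seconds from the start of the clip
    points: list       # [(x, y, z, r, g, b), ...] position plus colour

# A two-frame, 30 fps clip of a single red point drifting along the x axis:
# each frame is a complete point cloud, and the sequence carries the motion.
clip = [
    PointCloudFrame(0.0,      [(0.00, 0.0, 1.0, 255, 0, 0)]),
    PointCloudFrame(1 / 30.0, [(0.01, 0.0, 1.0, 255, 0, 0)]),
]
print(len(clip))   # 2
```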
Mesh
Another data format for volumetric video is the mesh. With point cloud data, gaps appear between the points as you zoom in; with mesh data, each point becomes a vertex and the vertices are connected by edges and faces. There are several methods for converting point cloud data to mesh data, and the process may include noise removal, smoothing, interpolation, and super-resolution, improving the quality and precision of the data.
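The relationship between points and a mesh can be illustrated by stitching a regular grid of points into triangles: each point becomes a vertex, and neighbouring vertices are joined into faces. Real conversion pipelines run algorithms such as Poisson reconstruction or ball pivoting on unorganised clouds; this grid version only shows the shape of the vertex/face output:

```python
# Stitch a W x H grid of points into a triangle mesh: every grid cell
# becomes two triangles whose corners index into the vertex list.
W, H = 3, 3
vertices = [(x * 0.1, y * 0.1, 0.0) for y in range(H) for x in range(W)]

faces = []
for y in range(H - 1):
    for x in range(W - 1):
        i = y * W + x                            # top-left corner of the cell
        faces.append((i, i + 1, i + W))          # upper triangle
        faces.append((i + 1, i + W + 1, i + W))  # lower triangle

print(len(vertices), len(faces))  # 9 8
```

Unlike the bare point list, the faces define a continuous surface, so there are no gaps between the samples when you zoom in.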
Relighting
The texture of volumetric video usually reflects the lighting conditions at the time of shooting, so if the 3D data is used in a lighting environment completely different from that of the shoot, the brightness, color, and shading of the texture will not be reproduced correctly. Relighting technology solves this problem: in 3D data that can be relit, reflection coefficients are stored alongside the texture data, resulting in volumetric video data with higher reproducibility.
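The idea behind relightable data can be sketched with the simplest diffuse (Lambertian) model: if each texel stores a reflection coefficient (albedo) and a surface normal, its colour can be recomputed for any new light direction. The albedo and normal values below are illustrative:

```python
import math

def normalize(v):
    n = math.sqrt(sum(c * c for c in v))
    return tuple(c / n for c in v)

def relight(albedo, normal, light_dir, light_color=(1.0, 1.0, 1.0)):
    """Diffuse shading: colour = albedo * light_color * max(0, n . l)."""
    n = normalize(normal)
    l = normalize(light_dir)
    ndotl = max(0.0, sum(a * b for a, b in zip(n, l)))
    return tuple(a * c * ndotl for a, c in zip(albedo, light_color))

albedo = (0.8, 0.6, 0.4)   # per-channel reflection coefficients of one texel
normal = (0.0, 0.0, 1.0)   # texel facing the viewer
print(relight(albedo, normal, (0.0, 0.0, 1.0)))   # lit head-on
print(relight(albedo, normal, (0.0, 0.0, -1.0)))  # lit from behind: all zero
```

Because the stored albedo is independent of any one shoot's lighting, the same texel can be re-shaded consistently in a completely new lighting environment, which is exactly what baked-in texture colours cannot do.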
Originally written in Japanese by Takayuki Aoki
CEO of Kadinche Corporation. Received a Ph.D. (Policy and Media) from Keio University in 2009. After working for Sony Corporation, he founded Kadinche Corporation, where he is engaged in XR-related software development. In 2018, he established Miecle Co., Ltd., a joint venture with Shochiku Co., Ltd., and in January 2022 opened the Daikanyama Metaverse Studio to work on content production using virtual production methods.