Depthkit

Accessible Volumetric Video

2017 - Present

CTO, Head of Engineering, Software Architect, Graphics Engineer


Scatter's first product, Depthkit, is an accessible creative software tool for volumetric filmmaking. Depthkit enables filmmakers and creators to capture full-motion volumetric video for immersive media, visual effects, and virtual production using just a laptop and a depth sensor. With thousands of users, Depthkit is the most widely used volumetric video solution.

I joined Scatter in 2017 and began working on a ground-up rewrite of Depthkit, with the aim of bringing accessible volumetric capture to market. In 2019 we launched our full-body, multi-sensor volumetric capture solution, Depthkit Studio, and have been iterating on it since. This project has presented me with countless technical challenges to overcome, and has helped me grow as an engineer, manager, and human.

High Performance & Efficiency


We wanted to keep Depthkit as affordable and accessible as possible, so our goal was to enable full-body volumetric capture using just a single consumer-grade PC as the host machine. Even with a sparse set of up to 10 sensors (some solutions use hundreds of cameras), volumetric capture involves a massive amount of data: each sensor serves both a high-resolution color stream and a depth image, streaming at 30 frames per second. To process and save all of this data in real-time, I developed a highly flexible and scalable heterogeneous data-flow pipeline, which allows Depthkit to take advantage of all the available CPU and GPU computing power of a modern PC. Reusable pipeline stages allow us to scale the system depending on the number of connected sensors, as well as the number of CPU cores; this allows Depthkit to keep up with the demanding tasks of capturing, processing, and exporting volumetric video data, all without the need to offload heavy tasks to the cloud. Indeed, some export formats can be processed several times faster than real-time, or even streamed live.
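To make the idea concrete, here is a minimal sketch of the staged data-flow pattern described above: producer stages feed worker pools through queues, and the pool is sized to the available CPU cores. All names, the frame layout, and the stand-in computation are illustrative assumptions, not Depthkit's actual API.

```python
# Illustrative sketch of a staged data-flow pipeline (NOT Depthkit's real code):
# capture stage -> worker pool sized to CPU cores -> export/drain stage.
import os
import queue
import threading

SENTINEL = None  # signals a stage that no more work is coming


def capture_stage(sensor_ids, frames_per_sensor, out_q):
    """Producer: emit (sensor_id, frame_index, depth) tuples for each sensor."""
    for sensor in sensor_ids:
        for frame in range(frames_per_sensor):
            out_q.put((sensor, frame, [1.0] * 16))  # stand-in for a depth image
    out_q.put(SENTINEL)


def process_stage(in_q, out_q):
    """Worker: transform each frame (a trivial stand-in computation here)."""
    while True:
        item = in_q.get()
        if item is SENTINEL:
            in_q.put(SENTINEL)  # re-post so sibling workers also shut down
            break
        sensor, frame, depth = item
        out_q.put((sensor, frame, [d * 2.0 for d in depth]))


def run_pipeline(sensor_ids, frames_per_sensor):
    captured, processed = queue.Queue(), queue.Queue()
    producer = threading.Thread(
        target=capture_stage, args=(sensor_ids, frames_per_sensor, captured))
    # Scale the worker pool to the number of available CPU cores.
    workers = [threading.Thread(target=process_stage, args=(captured, processed))
               for _ in range(os.cpu_count() or 1)]
    producer.start()
    for w in workers:
        w.start()
    producer.join()
    for w in workers:
        w.join()
    results = []
    while not processed.empty():
        results.append(processed.get())
    return results


results = run_pipeline(sensor_ids=[0, 1, 2], frames_per_sensor=5)
assert len(results) == 15  # every frame from every sensor made it through
```

Because each stage only communicates through queues, stages can be reused, rearranged, and scaled independently, which is the property that lets a pipeline like this grow with the sensor count and core count.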

The design philosophy of Depthkit has been to bias toward real-time user feedback wherever possible, even where users have come to expect long waits. For example, the desired result of volumetric capture is to reconstruct a 3D representation of the captured subject at video frame rates. When these hard technical problems are no longer in the user's way, they can work with the tool rather than against it, making creative decisions faster and with fewer headaches. When things just work, the user does not even notice, and ideally never knows how technically hard it was to make the magic happen.

Other solutions often rely on off-the-shelf 3D reconstruction algorithms, which are rarely optimized for speed (let alone real-time) and can take seconds or minutes to process a single frame. By comparison, the 3D reconstruction method I developed for Depthkit can extract and texture a highly detailed surface from a raw point cloud in milliseconds, allowing for instantaneous user feedback. For the curious (and the graphics nerds out there), this involves constructing, optimizing, and querying an octree, as well as triangulating, rasterizing and texturing the results, all in real-time.
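For readers unfamiliar with octrees, the sketch below shows the core idea behind the spatial-query step: a cube of space is recursively subdivided into eight children, so that neighborhood queries over a point cloud can prune whole regions instead of testing every point. This is a generic textbook illustration under assumed names, not Depthkit's GPU implementation.

```python
# Generic point-cloud octree with a radius query (illustrative only; Depthkit's
# real-time implementation runs on the GPU and differs substantially).
import random


class Octree:
    def __init__(self, center, half_size, capacity=8):
        self.center, self.half = center, half_size
        self.capacity = capacity   # max points per leaf before subdividing
        self.points = []
        self.children = None       # list of 8 child octants once subdivided

    def insert(self, p):
        if self.children is None:
            self.points.append(p)
            if len(self.points) > self.capacity and self.half > 1e-6:
                self._subdivide()
            return
        self._child_for(p).insert(p)

    def _subdivide(self):
        cx, cy, cz = self.center
        h = self.half / 2.0
        # dx outermost, then dy, then dz -- matching the index in _child_for.
        self.children = [
            Octree((cx + dx * h, cy + dy * h, cz + dz * h), h, self.capacity)
            for dx in (-1, 1) for dy in (-1, 1) for dz in (-1, 1)
        ]
        for p in self.points:
            self._child_for(p).insert(p)
        self.points = []

    def _child_for(self, p):
        cx, cy, cz = self.center
        return self.children[(4 if p[0] >= cx else 0)
                             + (2 if p[1] >= cy else 0)
                             + (1 if p[2] >= cz else 0)]

    def query_radius(self, q, r):
        """Return all stored points within distance r of q."""
        # Prune: skip nodes whose bounding cube cannot intersect the sphere.
        if any(abs(q[i] - self.center[i]) > self.half + r for i in range(3)):
            return []
        if self.children is None:
            return [p for p in self.points
                    if sum((p[i] - q[i]) ** 2 for i in range(3)) <= r * r]
        out = []
        for c in self.children:
            out.extend(c.query_radius(q, r))
        return out


# Demo: the octree query must agree with a brute-force scan of the cloud.
random.seed(1)
pts = [tuple(random.uniform(-1, 1) for _ in range(3)) for _ in range(500)]
tree = Octree(center=(0.0, 0.0, 0.0), half_size=1.0)
for p in pts:
    tree.insert(p)
q, r = (0.0, 0.0, 0.0), 0.3
brute = {p for p in pts if sum((p[i] - q[i]) ** 2 for i in range(3)) <= r * r}
assert set(tree.query_radius(q, r)) == brute
```

The pruning test in query_radius is what makes the structure fast: any octant whose cube lies entirely outside the query sphere is skipped along with every point it contains, so the work done scales with the local density around the query rather than the size of the whole cloud.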