Rendering Comparisons 
Taking multiview videos as input, our method learns the spatiotemporal density, color, and velocity fields.

Volumetric Density (front, side, top; rendered with uniform ambient lighting)
Warping Errors (left: warp frame i to i+1; right: warp frame i and i+1 to i+0.5)
NeuralVolumes, alpha2den: Because of the "ghost density", the alpha (as defined in NV) fails to model the actual smoke density.
Velocity (middle slices of front, side, top; intensity reduced outside the visual hull)
Vorticity (middle slices of front, side, top; within the visual hull)
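The vorticity shown in these panels is the curl of the velocity field. As a rough illustration only, the sketch below computes it with finite differences on a regular grid. The grid layout (`[z, y, x]` indexing, channels ordered `(u, v, w)`), the uniform spacing, and the name `curl_3d` are assumptions made for this example and are not taken from the authors' code.

```python
import numpy as np

def curl_3d(vel, spacing=1.0):
    """Finite-difference curl of a velocity grid.

    vel: array of shape (D, H, W, 3), indexed [z, y, x], with channels
         (u, v, w) along x, y, z. This layout is an assumption of the sketch.
    """
    u, v, w = vel[..., 0], vel[..., 1], vel[..., 2]
    # Axis 0 is z, axis 1 is y, axis 2 is x.
    du_dz = np.gradient(u, spacing, axis=0)
    du_dy = np.gradient(u, spacing, axis=1)
    dv_dz = np.gradient(v, spacing, axis=0)
    dv_dx = np.gradient(v, spacing, axis=2)
    dw_dy = np.gradient(w, spacing, axis=1)
    dw_dx = np.gradient(w, spacing, axis=2)
    return np.stack([dw_dy - dv_dz, du_dz - dw_dx, dv_dx - du_dy], axis=-1)

# Sanity check: rigid rotation about the z axis, (u, v, w) = (-y, x, 0),
# has constant vorticity (0, 0, 2).
c = np.linspace(-1.0, 1.0, 9)
z, y, x = np.meshgrid(c, c, c, indexing="ij")
vel = np.stack([-y, x, np.zeros_like(x)], axis=-1)
omega = curl_3d(vel, spacing=c[1] - c[0])
middle_slice = omega[len(c) // 2]  # one "middle slice" along z
```

A visualization like the one above would then map the magnitude or one component of `middle_slice` to color.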
Rendering Comparisons 
Volumetric Density (front, side, top)
Velocity (left) and Vorticity (right) (middle slices; front, side, top)
Warping Error (left: warp frame i to i+1; right: warp frame i and i+1 to i+0.5)
The conclusions from the synthetic scene evaluation are consistent with the real case: NeuralVolumes contains "ghost density" and GlobalTrans is noisy. The noise is more visible when looking at the density alone in "Volumetric Density" (top right), as well as in the velocity and vorticity fields. "Warping Error" (bottom right) shows that our results fulfill the transport equation better than GlobalTrans on the real case.
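To make the "warp frame i to i+1" error concrete, here is a minimal 2D sketch: the density of frame i is advected with the velocity by a backward (semi-Lagrangian) lookup and compared with frame i+1. This is not the authors' evaluation code. The 2D setting, velocities measured in grid cells per frame, bilinear sampling with clamped borders, and all function names are assumptions of the example, and the second variant (warping both frames to i+0.5) is not covered.

```python
import numpy as np

def bilinear_sample(field, ys, xs):
    """Sample a 2D field at fractional (ys, xs), clamping to the border."""
    h, w = field.shape
    ys = np.clip(ys, 0, h - 1)
    xs = np.clip(xs, 0, w - 1)
    y0 = np.floor(ys).astype(int)
    x0 = np.floor(xs).astype(int)
    y1 = np.minimum(y0 + 1, h - 1)
    x1 = np.minimum(x0 + 1, w - 1)
    fy, fx = ys - y0, xs - x0
    top = field[y0, x0] * (1 - fx) + field[y0, x1] * fx
    bottom = field[y1, x0] * (1 - fx) + field[y1, x1] * fx
    return top * (1 - fy) + bottom * fy

def warp_to_next_frame(density, vel_y, vel_x, dt=1.0):
    """Advect density by dt: each cell looks up where its content came from."""
    h, w = density.shape
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    return bilinear_sample(density, ys - dt * vel_y, xs - dt * vel_x)

def warping_error(d_i, d_next, vel_y, vel_x, dt=1.0):
    """Per-cell absolute difference between warped frame i and frame i+1."""
    return np.abs(warp_to_next_frame(d_i, vel_y, vel_x, dt) - d_next)

# Toy data: a block of density that moves two cells in +x per frame.
d_i = np.zeros((16, 16))
d_i[6:10, 5:9] = 1.0
d_next = np.roll(d_i, 2, axis=1)
zeros = np.zeros_like(d_i)

err_good = warping_error(d_i, d_next, zeros, np.full_like(d_i, 2.0))
err_bad = warping_error(d_i, d_next, zeros, zeros)  # wrong (zero) velocity
```

With the correct velocity the warped frame matches frame i+1 and `err_good` vanishes, while a wrong velocity leaves a visible residual, which is what the error maps visualize.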
Rendering Comparisons 
NeuralVolumes: With a large amount of "ghost density", NeuralVolumes can easily render more details, since its occlusion makes view consistency unnecessary.
Volumetric Comparisons 
The color-bleeding artifact is more visible in the density visualization. Our velocity is closer to the reference, with enhanced vorticity.
Sphere Scene: Ref | Ours | Ours w/o d2v | NeRF+T | NeuralVolumes

"Ghost density" is visible everywhere for NeRF+T. NeuralVolumes has some "ghost density" in white and some in the color of the sphere. Ours reconstructs the smoke nicely, while Ours w/o d2v is slightly blurry.
Unsupervised Separation of Static and Dynamic parts 

Static and dynamic components are nicely separated.
Estimated Density and Velocity 

With the model-based supervision, our full model produces more accurate velocity with clearer vorticity.
Car  Game 

| Scenes | Image Resolution | Total Training Iterations | Total Training Time | Hyperparameters for Radiance Supervision | Hyperparameters for Velocity Supervision |
| --- | --- | --- | --- | --- | --- |
| ScalarFlow, Synthetic | 360x640 | 200k | 30h | $\mathcal{L}_{\widetilde{\mathit{img}}} + 0.025\mathcal{L}_{VGG} + 0.1\mathcal{L}_{ghost}$ | $2\mathcal{L}_{\frac{D\sigma}{Dt}} + 0.0005\mathcal{L}_{NSE} + 6\mathcal{L}_{d2v}$ |
| ScalarFlow, Real | 540x960 | 500k | 74h | $\mathcal{L}_{\widetilde{\mathit{img}}} + 0.025\mathcal{L}_{VGG} + 0.1\mathcal{L}_{ghost}$ | $2\mathcal{L}_{\frac{D\sigma}{Dt}} + 0.0005\mathcal{L}_{NSE} + 6\mathcal{L}_{d2v}$ |
| Complex Lighting, Plume | 400x400 | 200k | 31h | $\mathcal{L}_{\widetilde{\mathit{img}}} + 0.025\mathcal{L}_{VGG} + 0.05\mathcal{L}_{ghost}$ | $2\mathcal{L}_{\frac{D\sigma}{Dt}} + 0.0005\mathcal{L}_{NSE} + 6\mathcal{L}_{d2v}$ |
| Complex Lighting, Sphere | 400x400 | 150k | 37h | $\mathcal{L}_{\widetilde{\mathit{img}}} + 0.025\mathcal{L}_{VGG} + 0.05\mathcal{L}_{ghost} + 0.05\mathcal{L}_{overlay}$ | $2\mathcal{L}_{\frac{D\sigma}{Dt}} + 0.0005\mathcal{L}_{NSE} + 6\mathcal{L}_{d2v}$ |
| Complex Obstacles, Car | 960x500 | 200k | 51h | $\mathcal{L}_{\widetilde{\mathit{img}}} + 0.025\mathcal{L}_{VGG} + 0.01\mathcal{L}_{ghost} + 0.05\mathcal{L}_{overlay}$ | $2\mathcal{L}_{\frac{D\sigma}{Dt}} + 0.0005\mathcal{L}_{NSE} + 6\mathcal{L}_{d2v}$ |
| Complex Obstacles, Game | 800x800 | 250k | 64h | $\mathcal{L}_{\widetilde{\mathit{img}}} + 0.025\mathcal{L}_{VGG} + 0.01\mathcal{L}_{ghost} + 0.05\mathcal{L}_{overlay}$ | $2\mathcal{L}_{\frac{D\sigma}{Dt}} + 0.0005\mathcal{L}_{NSE} + 6\mathcal{L}_{d2v}$ |
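The loss weights listed above can be expressed as a small per-scene configuration. The sketch below is only an illustration of how such weighted sums might be assembled. The dictionary keys (`img`, `vgg`, `ghost`, `overlay`, `transport`, `nse`, `d2v`), the scene labels, and the helper `weighted_loss` are hypothetical names chosen for this example; only the numeric weights come from the table.

```python
# Radiance-supervision weights per scene (numbers from the table above;
# key names are placeholders for the individual loss terms).
RADIANCE_WEIGHTS = {
    "ScalarFlow/Synthetic": {"img": 1.0, "vgg": 0.025, "ghost": 0.1},
    "ScalarFlow/Real": {"img": 1.0, "vgg": 0.025, "ghost": 0.1},
    "Complex Lighting/Plume": {"img": 1.0, "vgg": 0.025, "ghost": 0.05},
    "Complex Lighting/Sphere": {"img": 1.0, "vgg": 0.025, "ghost": 0.05, "overlay": 0.05},
    "Complex Obstacles/Car": {"img": 1.0, "vgg": 0.025, "ghost": 0.01, "overlay": 0.05},
    "Complex Obstacles/Game": {"img": 1.0, "vgg": 0.025, "ghost": 0.01, "overlay": 0.05},
}

# Velocity-supervision weights are identical for all scenes in the table:
# transport term (D sigma / Dt), Navier-Stokes term, and d2v term.
VELOCITY_WEIGHTS = {"transport": 2.0, "nse": 0.0005, "d2v": 6.0}

def weighted_loss(terms, weights):
    """Sum the loss terms named in `weights`, each scaled by its weight."""
    return sum(weight * terms[name] for name, weight in weights.items())

# Usage with dummy loss values of 1.0 for every term.
dummy = {name: 1.0 for name in ("img", "vgg", "ghost", "overlay", "transport", "nse", "d2v")}
sphere_radiance = weighted_loss(dummy, RADIANCE_WEIGHTS["Complex Lighting/Sphere"])
velocity_total = weighted_loss(dummy, VELOCITY_WEIGHTS)
```

Keeping the weights in one table-like structure makes the per-scene differences easy to see: only the ghost-density weight and the presence of the overlay term change across scenes.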