Using 3D information to generate real-world corruptions. The top row shows 2D corruptions applied uniformly over the image (e.g. as in Common Corruptions), disregarding 3D information. This leads to corruptions that are unlikely to occur in the real world, e.g. the same motion blur over the entire image irrespective of distance from the camera (top left). The middle row shows their 3D counterparts from 3D Common Corruptions (3DCC). The circled regions highlight the effect of incorporating 3D information. More specifically, in 3DCC, 1. motion blur has a motion parallax effect, where objects further from the camera appear to move less; 2. defocus blur has a depth-of-field effect, akin to a large aperture in real cameras, where selected regions of the image remain in focus; 3. lighting takes the scene geometry into account when illuminating the scene and casts shadows on objects; 4. fog gets denser further away from the camera; 5. occlusions of a target object, e.g. the fridge (blue mask), are created by changing the camera’s viewpoint so that its view is naturally obscured by another object, e.g. the plant (red mask). This is in contrast to the 2D counterpart, which randomly discards patches.
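As an illustration of how a depth-aware corruption such as fog densening with distance can be computed, here is a minimal sketch based on the standard atmospheric scattering model, out = image · t + fog_color · (1 − t) with transmittance t = exp(−β · depth). The function name and parameters are illustrative assumptions, not the paper's exact implementation:

```python
import numpy as np

def apply_3d_fog(image, depth, beta=1.0, fog_color=1.0):
    """Distance-dependent fog via atmospheric scattering (illustrative sketch).

    out = image * t + fog_color * (1 - t), with transmittance
    t = exp(-beta * depth), so fog gets denser further from the camera.

    image: HxWxC float array in [0, 1]; depth: HxW per-pixel distance;
    beta: scattering coefficient (fog density); fog_color: atmospheric light.
    """
    t = np.exp(-beta * depth)[..., None]  # HxW -> HxWx1, broadcasts over channels
    return image * t + fog_color * (1.0 - t)
```

With beta = 0 the image is returned unchanged; as beta grows, distant pixels converge to the fog color while nearby pixels are barely affected.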
The new corruptions. We propose a diverse set of new corruption operations, ranging from defocus (near/far focus) to lighting changes and 3D-semantic corruptions, e.g. object occlusion. These corruptions are all automatically generated, efficient to compute, and applicable to most datasets. We show that they expose vulnerabilities in models and are a good approximation of realistic corruptions.
Existing robustness mechanisms are found to be insufficient for addressing real-world corruptions approximated by 3DCC. The performance of models with different robustness mechanisms under 3DCC is shown for the surface normals (left) and depth (right) estimation tasks. Each bar shows the l1 error averaged over all 3DCC corruptions (lower is better). The red line denotes the performance of the baseline model on clean (uncorrupted) data. The results show that existing robustness mechanisms, including those trained with diverse augmentations, perform poorly under 3DCC. Please see the paper for details.
3DCC can be applied to most datasets, even those that do not come with 3D information. Several query images from the ImageNet and COCO datasets are shown above with the near focus, far focus, and fog 3D corruptions applied. Notice how the objects in the circled regions go from sharp to blurry depending on the focus region and scene geometry. The depth information needed to create these corruptions comes from predictions of the MiDaS model, which provide a good enough approximation to generate realistic corruptions (as also analyzed in the paper).
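A minimal sketch of how a (possibly predicted) depth map can drive such a refocusing corruption: split the scene into depth layers, blur each layer with a strength proportional to its distance from the focus plane, and composite the result per pixel. This layered-blur approximation is an assumption for illustration (the paper's actual pipeline may differ), and all names and parameters are hypothetical:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def refocus(image, depth, focus_depth, max_sigma=5.0, n_layers=8):
    """Depth-of-field approximation (illustrative sketch).

    Quantizes the depth map into n_layers bins, Gaussian-blurs the image
    once per bin with a sigma that grows with the bin's distance from the
    focus plane, and composites the blurred layers back per pixel.

    image: HxWxC float array; depth: HxW float array (e.g. MiDaS output);
    focus_depth: depth value that should remain sharp.
    """
    d_min, d_max = float(depth.min()), float(depth.max())
    edges = np.linspace(d_min, d_max, n_layers + 1)
    centers = 0.5 * (edges[:-1] + edges[1:])
    out = np.zeros_like(image)
    for lo, hi, center in zip(edges[:-1], edges[1:], centers):
        mask = (depth >= lo) & (depth <= hi)
        # blur strength proportional to distance from the focus plane
        sigma = max_sigma * abs(center - focus_depth) / max(d_max - d_min, 1e-8)
        blurred = gaussian_filter(image, sigma=(sigma, sigma, 0.0))
        out[mask] = blurred[mask]
    return out
```

Setting focus_depth near the minimum of the depth map mimics the near-focus corruption (background blurred), while setting it near the maximum mimics far focus.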
Qualitative results of learning with 3D data augmentation for surface normal estimation, on random queries from OASIS, data generated with Adobe After Effects, manually collected DSLR data, and in-the-wild YouTube videos. The ground truth is shown gray when it is not available, e.g. for YouTube videos. Our predictions in the last row are noticeably sharper and more accurate than the baselines. Please see the paper for more details.