Understanding and reconstructing 3D scenes and structures from 2D images has long been one of the fundamental goals of computer vision. Traditional (geometry-based) vision methods require a sequence of images over time to perform 3D reconstruction, but when such data is unavailable, recovering 3D structure from a single image is both necessary and valuable. To establish image-to-3D correspondences, earlier approaches relied mainly on matching losses based on 2D keypoints (silhouettes) [3, 6, 8, 9] or shapes (appearance) [2, 5, 7, 10, 13]. However, these methods are either limited to task-specific domains or, owing to the sparsity of 2D features, provide only weak supervision. In contrast, by simulating the physical process of image formation, rendering relates every pixel to the 3D parameters; through backpropagation it can then provide dense, pixel-level supervision for general 3D reasoning tasks, which traditional methods cannot achieve. Paper [1] combines the SoftRas "soft" rasterizer with a simple mesh generator and trains the network using only multi-view images, without any 3D supervision. At test time, a 3D mesh and its texture can be reconstructed from a single RGB image.
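The core idea that makes soft rasterization differentiable can be sketched in a few lines. In SoftRas, each triangle's influence on a pixel is a sigmoid of the signed squared distance between the pixel and the triangle, so pixels outside a triangle still receive small nonzero gradients. The helper names and the simple edge-distance computation below are illustrative assumptions, not the authors' implementation; `sigma` controls sharpness, and hard rasterization is recovered as `sigma` approaches 0.

```python
import numpy as np

def _dist_point_segment(p, a, b):
    """Euclidean distance from 2D point p to segment ab."""
    ab = b - a
    t = np.clip(np.dot(p - a, ab) / np.dot(ab, ab), 0.0, 1.0)
    return np.linalg.norm(p - (a + t * ab))

def _inside(p, tri):
    """Point-in-triangle test via signs of the edge cross products."""
    signs = []
    for i in range(3):
        a, b = tri[i], tri[(i + 1) % 3]
        e, v = b - a, p - a
        signs.append(e[0] * v[1] - e[1] * v[0])
    signs = np.array(signs)
    return bool(np.all(signs >= 0) or np.all(signs <= 0))

def soft_coverage(p, tri, sigma=1e-2):
    """SoftRas-style probability map D(p) = sigmoid(delta * d^2 / sigma),
    where delta = +1 if p lies inside the projected triangle, -1 outside."""
    d2 = min(_dist_point_segment(p, tri[i], tri[(i + 1) % 3]) ** 2
             for i in range(3))
    delta = 1.0 if _inside(p, tri) else -1.0
    return 1.0 / (1.0 + np.exp(-delta * d2 / sigma))

tri = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
print(soft_coverage(np.array([0.25, 0.25]), tri))  # well inside: close to 1
print(soft_coverage(np.array([2.0, 2.0]), tri))    # far outside: close to 0
```

Because the coverage is a smooth function of vertex positions, its gradient with respect to the triangle's 2D coordinates is well defined everywhere, which is what lets supervision flow from pixels back to mesh geometry.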
The model is trained on ShapeNet data under this reconstruction framework for 250,000 iterations. Test results for single-view mesh reconstruction are shown below.
(1) Shape reconstruction results are shown in Fig. 7: (a) input: 2D image; (b) output: 3D object.
Fig. 7. Shape reconstruction results.
(2) Color reconstruction results are shown in Fig. 8: (a) input: 2D image; (b) output: texture.
Fig. 8. Color reconstruction results.
(3) The overall mesh reconstruction results are shown in Fig. 9: (a) input: 2D image; (b) output: texture and 3D object.
Fig. 9. Overall reconstruction results.
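The multi-view supervision driving this training is dominated by a silhouette term and a color term. A minimal numpy sketch of the IoU-style silhouette loss used in SoftRas (1 minus the intersection-over-union of the soft rendered silhouette and the target mask) and the L1 color loss could look as follows; the function names are illustrative:

```python
import numpy as np

def silhouette_iou_loss(rendered, target, eps=1e-6):
    """1 - IoU between a soft rendered silhouette and the target mask.
    Both arrays take values in [0, 1]; the elementwise product acts as
    the intersection and (a + b - a*b) as the union."""
    inter = np.sum(rendered * target)
    union = np.sum(rendered + target - rendered * target)
    return 1.0 - inter / (union + eps)

def color_l1_loss(rendered_rgb, target_rgb):
    """L1 color loss between the rendered image and the input image."""
    return np.mean(np.abs(rendered_rgb - target_rgb))

mask = (np.random.default_rng(0).random((64, 64)) > 0.5).astype(float)
print(silhouette_iou_loss(mask, mask))        # identical masks -> ~0
print(silhouette_iou_loss(mask, 1.0 - mask))  # disjoint masks -> ~1
```

In the full framework these terms are typically combined with geometry regularizers (e.g. a Laplacian smoothness term) and averaged over the available views of each object.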
4. 2D-to-3D Style Transfer
2D-to-3D style transfer is performed by optimizing the shape and texture of the mesh, so that information flows from the 2D image space into 3D space. Through SoftRas color reconstruction, a texture map for the 3D shape is obtained, with the color loss measured as the L1 norm between the rendered image and the input image. The differentiable renderer then renders the textured model into different views; a style loss is imposed on each rendered view and backpropagated through the renderer to update the texture, producing a new texture style. For 2D-to-3D style transfer of an object, no canonical texture correspondence is required: a style image can also be used directly in place of the texture obtained from color reconstruction. The pipeline is shown in Fig. 10.
Fig. 10. The 2D-to-3D stylization pipeline.
Further 2D-to-3D style transfer results on other images are shown in Fig. 11.
Fig. 11. Examples of 2D-to-3D stylization results.
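The per-view style loss in the pipeline above is, in the usual neural style transfer formulation, a Gram-matrix loss: feature maps are extracted from the rendered view and from the style image, and the distance between their Gram matrices is minimized by gradient descent on the texture. The sketch below shows only this loss in numpy; the feature extractor (in practice a pretrained CNN) and the differentiable renderer are left out as assumptions:

```python
import numpy as np

def gram_matrix(features):
    """Gram matrix of a (C, H, W) feature map, normalized by its size."""
    c, h, w = features.shape
    f = features.reshape(c, h * w)
    return f @ f.T / (c * h * w)

def style_loss(rendered_feats, style_feats):
    """Squared Frobenius distance between the two Gram matrices."""
    g_r = gram_matrix(rendered_feats)
    g_s = gram_matrix(style_feats)
    return np.sum((g_r - g_s) ** 2)

rng = np.random.default_rng(0)
feats = rng.standard_normal((8, 16, 16))
print(style_loss(feats, feats))  # 0.0
print(style_loss(feats, rng.standard_normal((8, 16, 16))) > 0)  # True
```

In the full pipeline this loss would be evaluated on several rendered views per optimization step and backpropagated through the differentiable renderer to the texture (and optionally the vertex positions), e.g. with an Adam optimizer.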
5. Summary
Based on paper [1], this article has introduced a truly differentiable rendering framework (SoftRas) that renders a given mesh directly in a fully differentiable way. It handles extrinsic and intrinsic variables within a unified rendering framework and generates effective gradients from pixels to mesh vertices and their attributes (color, normal, etc.). This is achieved by redefining the traditional discrete operations and the view-dependent depth z as differentiable probabilistic processes. The framework thus provides more effective supervision signals, lets gradients flow to occluded vertices, and optimizes the z coordinates of mesh triangles, yielding significant improvements on single-view mesh reconstruction and image-based shape fitting. This article focused on the single-view mesh reconstruction task, presented the overall pipeline for 2D-to-3D object stylization, and showed example results. 2D-to-3D style transfer is performed by optimizing the texture so that information flows from the 2D image space into 3D space.

References:
[1] Shichen Liu, Tianye Li, Weikai Chen, and Hao Li. Soft Rasterizer: A Differentiable Renderer for Image-based 3D Reasoning. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2019.
[2] V. Blanz and T. Vetter. A morphable model for the synthesis of 3D faces. In Proceedings of the 26th Annual Conference on Computer Graphics and Interactive Techniques, pages 187–194. ACM Press/Addison-Wesley Publishing Co., 1999.
[3] F. Bogo, A. Kanazawa, C. Lassner, P. Gehler, J. Romero, and M. J. Black. Keep it SMPL: Automatic estimation of 3D human pose and shape from a single image. In European Conference on Computer Vision, pages 561–578. Springer, 2016.
[4] H. Kato, Y. Ushiku, and T. Harada. Neural 3D mesh renderer. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 3907–3916, 2018.
[5] H. Lensch, J. Kautz, M. Goesele, W. Heidrich, and H.-P. Seidel. Image-based reconstruction of spatial appearance and geometric detail. ACM Transactions on Graphics (TOG), 22(2):234–257, 2003.
[6] F. Liu, D. Zeng, Q. Zhao, and X. Liu. Joint face alignment and 3D face reconstruction. In European Conference on Computer Vision, pages 545–560. Springer, 2016.
[7] M. Loper, N. Mahmood, J. Romero, G. Pons-Moll, and M. J. Black. SMPL: A skinned multi-person linear model. ACM Transactions on Graphics (TOG), 34(6):248, 2015.
[8] W. Matusik, C. Buehler, R. Raskar, S. J. Gortler, and L. McMillan. Image-based visual hulls. In Proceedings of the 27th Annual Conference on Computer Graphics and Interactive Techniques, pages 369–374. ACM Press/Addison-Wesley Publishing Co., 2000.
[9] G. Pavlakos, X. Zhou, K. G. Derpanis, and K. Daniilidis. Coarse-to-fine volumetric prediction for single-image 3D human pose. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 7025–7034, 2017.
[10] R. Zhang, P.-S. Tsai, J. E. Cryer, and M. Shah. Shape-from-shading: a survey. IEEE Transactions on Pattern Analysis and Machine Intelligence, 21(8):690–706, 1999.
[11] X. Yan, J. Yang, E. Yumer, Y. Guo, and H. Lee. Perspective transformer nets: Learning single-view 3D object reconstruction without 3D supervision. In Advances in Neural Information Processing Systems, pages 1696–1704, 2016.
[12] N. Wang, Y. Zhang, Z. Li, Y. Fu, W. Liu, and Y.-G. Jiang. Pixel2Mesh: Generating 3D mesh models from single RGB images. In ECCV, 2018.
[13] T. F. Cootes, G. J. Edwards, and C. J. Taylor. Active appearance models. IEEE Transactions on Pattern Analysis and Machine Intelligence, (6):681–685, 2001.

This article comes from the Yidian Zixun (一点资讯) AI Image and Graphics Lab (AIIG) team.