kaolin.render.mesh.dibr_rasterization(height, width, face_vertices_z, face_vertices_image, face_features, face_normals_z, sigmainv=7000, boxlen=0.02, knum=30, multiplier=1000)

Fully differentiable DIB-R renderer implementation that renders 3D triangle meshes with per-vertex, per-face features to generalized feature “images”, soft foreground masks, depth and face index maps.

See for usage with textures and lighting.

Originally proposed by Chen, Wenzheng, et al. in “Learning to Predict 3D Objects with an Interpolation-based Differentiable Renderer”, NeurIPS 2019.
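The feature “images” come from interpolating per-vertex features across each face. The following minimal NumPy sketch shows that step for a pixel covered by a single face: the pixel's feature is a barycentric-weighted blend of the face's three vertex features, as described in the DIB-R paper. It is an illustration only, not Kaolin's implementation; the helper names barycentric_weights and interpolate_feature are hypothetical, and depth ordering between overlapping faces and batching are omitted.

```python
import numpy as np


def barycentric_weights(p, tri):
    """Barycentric weights of 2D point p (2,) w.r.t. triangle tri (3, 2)."""
    a, b, c = tri
    v0, v1, v2 = b - a, c - a, p - a
    # Solve p - a = w1 * v0 + w2 * v1 with Cramer's rule.
    den = v0[0] * v1[1] - v1[0] * v0[1]
    w1 = (v2[0] * v1[1] - v1[0] * v2[1]) / den
    w2 = (v0[0] * v2[1] - v2[0] * v0[1]) / den
    return np.array([1.0 - w1 - w2, w1, w2])


def interpolate_feature(p, tri, vertex_features):
    """Blend the per-vertex features (3, feature_dim) of one face at pixel p."""
    w = barycentric_weights(p, tri)
    return w @ vertex_features  # shape (feature_dim,)
```

For example, with tri set to one face of face_vertices_image and vertex_features set to its three RGB colors, a pixel at the face centroid receives the average of the three colors.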

Parameters

  • height (int) – the height of the rendered images, in pixels.

  • width (int) – the width of the rendered images, in pixels.

  • face_vertices_z (torch.FloatTensor) – depth (z) values of the face vertices in camera coordinates, of shape \((\text{batch_size}, \text{num_faces}, 3)\).

  • face_vertices_image (torch.FloatTensor) – 2D positions of the face vertices on the image plane, of shape \((\text{batch_size}, \text{num_faces}, 3, 2)\). They are obtained by projecting face_vertices_camera onto the image plane (z=-1); their coordinates lie in [-1, 1], which corresponds to normalized image pixel coordinates.

  • face_features (torch.FloatTensor or list of torch.FloatTensor) – per-vertex, per-face features to be drawn, of shape \((\text{batch_size}, \text{num_faces}, 3, \text{feature_dim})\), where feature_dim is the feature dimension: for instance, feature_dim=3 for vertex colors (R, G, B) and feature_dim=2 for texture coordinates (X, Y). Alternatively, a list of such tensors, of shapes \((\text{batch_size}, \text{num_faces}, 3, \text{feature_dim[i]})\).

  • face_normals_z (torch.FloatTensor) – z component of the face normals, of shape \((\text{batch_size}, \text{num_faces})\). Only faces with normal z >= 0 are drawn.

  • sigmainv (int) – smoothness term for the soft mask; the higher the value, the sharper the mask edges. The suggested range is [1/3e-4, 1/3e-5], i.e. roughly [3333, 33333]. Default: 7000.

  • boxlen (float) – a pixel is assumed to be influenced only by nearby faces, and boxlen controls the size of that neighborhood. The suggested range is [0.05, 0.2]. Default: 0.02.

  • knum (int) – maximum number of faces that can influence one pixel. The suggested range is [20, 100]. Default: 30. A larger boxlen generally requires a larger knum.

  • multiplier (int) – scale factor applied to the coordinates to avoid numerical precision issues. Default: 1000.

Returns

  • The rendered features, of shape \((\text{batch_size}, \text{height}, \text{width}, \text{feature_dim})\). If face_features is a list of torch.FloatTensor, a list of torch.FloatTensor is returned instead, of shapes \((\text{batch_size}, \text{height}, \text{width}, \text{feature_dim[i]})\).

  • The rendered soft mask, of shape \((\text{batch_size}, \text{height}, \text{width})\). It is generally used in an IoU loss to deform the shape.

  • The rendered face index map, of shape \((\text{batch_size}, \text{height}, \text{width})\). 0 denotes the background (no face), and face indices start from 1.

Return type

(torch.FloatTensor, torch.FloatTensor, torch.LongTensor)
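The roles of sigmainv, boxlen and knum can be illustrated with a minimal single-pixel NumPy sketch of the soft-mask aggregation from the DIB-R paper: each nearby face contributes exp(-sigmainv * d²), and the contributions are combined as 1 - ∏(1 - exp(-sigmainv * d²)). Assumptions made here: d is the 2D distance from the pixel to the face (0 when the pixel lies inside it), boxlen is treated as a plain distance cutoff rather than a box, and multiplier is ignored. The helper names are hypothetical, and this is not Kaolin's CUDA kernel.

```python
import numpy as np


def _cross2(o, a, b):
    """z component of (a - o) x (b - o) for 2D points."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def _inside(p, tri):
    """True if 2D point p lies inside (or on the border of) triangle tri (3, 2)."""
    s = [_cross2(tri[i], tri[(i + 1) % 3], p) for i in range(3)]
    return not (min(s) < 0 < max(s))


def _dist2_to_segment(p, a, b):
    """Squared distance from point p to the segment [a, b]."""
    ab = b - a
    t = np.clip(np.dot(p - a, ab) / np.dot(ab, ab), 0.0, 1.0)
    return float(np.sum((p - (a + t * ab)) ** 2))


def soft_mask_pixel(p, triangles, sigmainv=7000.0, boxlen=0.02, knum=30):
    """Soft coverage of pixel p (2,) by a list of 2D triangles, each (3, 2)."""
    d2 = []
    for tri in triangles:
        if _inside(p, tri):
            d2.append(0.0)  # pixel is covered by the face
        else:
            d2.append(min(_dist2_to_segment(p, tri[i], tri[(i + 1) % 3])
                          for i in range(3)))
    # Keep only faces within boxlen, then at most the knum closest ones.
    near = sorted(d for d in d2 if d <= boxlen ** 2)[:knum]
    prod = 1.0
    for d in near:
        prod *= 1.0 - np.exp(-sigmainv * d)  # 1 minus this face's influence
    return float(1.0 - prod)
```

Raising sigmainv makes each face's influence decay faster with distance, which is why higher values give sharper masks; shrinking boxlen or knum drops distant faces from the product entirely.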

kaolin.render.mesh.prepare_vertices(vertices, faces, camera_proj, camera_rot=None, camera_trans=None, camera_transform=None)

Wrapper function that transforms vertices into camera coordinates, projects them onto the image plane, then indexes them with faces.

Parameters

  • vertices (torch.Tensor) – the mesh vertices, of shape \((\text{batch_size}, \text{num_vertices}, 3)\).

  • faces (torch.LongTensor) – the mesh faces, of shape \((\text{num_faces}, \text{face_size})\).

  • camera_proj (torch.Tensor) – the camera projection vector, of shape \((3, 1)\).

  • camera_rot (torch.Tensor, optional) – the camera rotation matrices, of shape \((\text{batch_size}, 3, 3)\).

  • camera_trans (torch.Tensor, optional) – the camera translation vectors, of shape \((\text{batch_size}, 3)\).

  • camera_transform (torch.Tensor, optional) – the camera transformation matrices, of shape \((\text{batch_size}, 4, 3)\). If given, it is used in place of camera_rot and camera_trans.

Returns

  • The vertices in camera coordinates, indexed by faces, of shape \((\text{batch_size}, \text{num_faces}, \text{face_size}, 3)\).

  • The vertices in image plane coordinates, indexed by faces, of shape \((\text{batch_size}, \text{num_faces}, \text{face_size}, 2)\).

  • The face normals, of shape \((\text{batch_size}, \text{num_faces})\).

Return type

(torch.Tensor, torch.Tensor, torch.Tensor)
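A rough NumPy sketch of what such a wrapper does, under assumed conventions: a world-to-camera transform x_cam = R (x - t), a projection that scales camera coordinates by camera_proj and divides x and y by the scaled z, and face normals from the cross product of two triangle edges, keeping only their z component (matching the shape of face_normals_z in dibr_rasterization). These conventions and the name prepare_vertices_sketch are assumptions for illustration; Kaolin's exact conventions may differ, and the camera_transform path is omitted.

```python
import numpy as np


def prepare_vertices_sketch(vertices, faces, camera_proj, camera_rot, camera_trans):
    """Hypothetical helper.

    vertices: (B, V, 3), faces: (F, 3) ints, camera_proj: (3, 1),
    camera_rot: (B, 3, 3), camera_trans: (B, 3).
    """
    # Assumed world-to-camera convention: x_cam = R (x - t).
    centered = vertices - camera_trans[:, None, :]
    vertices_camera = np.einsum('bij,bvj->bvi', camera_rot, centered)
    # Assumed projection: scale by camera_proj, then divide x, y by scaled z.
    scaled = vertices_camera * camera_proj.reshape(1, 1, 3)
    vertices_image = scaled[..., :2] / scaled[..., 2:3]
    # Index per-vertex arrays by faces: (B, F, 3, 3) and (B, F, 3, 2).
    face_vertices_camera = vertices_camera[:, faces]
    face_vertices_image = vertices_image[:, faces]
    # Unit face normals from two edges; keep only the z component.
    e1 = face_vertices_camera[:, :, 1] - face_vertices_camera[:, :, 0]
    e2 = face_vertices_camera[:, :, 2] - face_vertices_camera[:, :, 0]
    normals = np.cross(e1, e2)
    normals = normals / np.linalg.norm(normals, axis=-1, keepdims=True)
    return face_vertices_camera, face_vertices_image, normals[..., 2]
```

With camera_proj set to (f, f, -1), points in front of the camera (negative z) end up with positive scaled depth, so the division places them on the z=-1 image plane described for face_vertices_image.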

kaolin.render.mesh.spherical_harmonic_lighting(imnormal, lights)

Computes per-pixel lighting effects from spherical harmonic (SH) lighting parameters.

Follows the convention set by Wojciech Jarosz in Efficient Monte Carlo Methods for Light Transport in Scattering Media.

Parameters

  • imnormal (torch.FloatTensor) – per-pixel normals, of shape \((\text{batch_size}, \text{height}, \text{width}, 3)\)

  • lights (torch.FloatTensor) – spherical harmonic lighting parameters, of shape \((\text{batch_size}, 9)\)

Returns

The lighting effect, of shape \((\text{batch_size}, \text{height}, \text{width})\)

Return type



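A minimal NumPy sketch of 9-coefficient (bands 0 to 2) spherical harmonic lighting: each pixel's lighting is the dot product of its lights row with the real SH basis evaluated at the pixel normal. The basis constants are the standard real SH normalization constants; the sign convention and the coefficient ordering are assumptions and may not match Kaolin's exact layout. The name sh_lighting_sketch is hypothetical.

```python
import numpy as np


def sh_lighting_sketch(imnormal, lights):
    """imnormal: (B, H, W, 3) unit normals; lights: (B, 9). Returns (B, H, W)."""
    x, y, z = imnormal[..., 0], imnormal[..., 1], imnormal[..., 2]
    bands = np.stack([
        0.2820948 * np.ones_like(x),   # l=0
        -0.488603 * y,                 # l=1, m=-1
        0.488603 * z,                  # l=1, m=0
        -0.488603 * x,                 # l=1, m=1
        1.092548 * x * y,              # l=2, m=-2
        -1.092548 * y * z,             # l=2, m=-1
        0.946176 * z * z - 0.315392,   # l=2, m=0, i.e. 0.315392 * (3z^2 - 1)
        -1.092548 * x * z,             # l=2, m=1
        0.546274 * (x * x - y * y),    # l=2, m=2
    ], axis=-1)                        # (B, H, W, 9)
    # Weight each basis function by its per-batch coefficient and sum.
    return np.sum(bands * lights[:, None, None, :], axis=-1)
```

Because the result is linear in lights, the coefficients can be optimized directly with gradient descent when the same computation is written in PyTorch.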
kaolin.render.mesh.texture_mapping(texture_coordinates, texture_maps, mode='nearest')

Interpolates texture_maps by texture_coordinates.

Note that OpenGL texture coordinates differ from PyTorch’s. OpenGL coordinates range from 0 to 1, the y axis points from bottom to top, and circular (wrap-around) mode is supported (-0.1 is the same as 0.9). PyTorch coordinates range from -1 to 1, the y axis points from top to bottom, and circular mode is not supported. The mode parameter selects the filtering and has the same meaning as the mode parameter of torch.nn.functional.grid_sample.

Parameters

  • texture_coordinates (torch.FloatTensor) – image texture coordinate, of shape \((\text{batch_size}, h, w, 2)\)

  • texture_maps (torch.FloatTensor) – textures, of shape \((\text{batch_size}, 3, h', w')\). Here, h' and w' are the height and width of the texture maps, while h and w are the height and width of the rendered image. For each pixel in the rendered image, the coordinates in texture_coordinates are used to query the corresponding RGB color in the texture maps; h' and w' may differ from h and w.

Returns

The interpolated texture, of shape \((\text{batch_size}, h, w, 3)\)

Return type
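The coordinate conventions described above can be reconciled with a small conversion: wrap OpenGL coordinates into [0, 1), flip the y axis, then map to [-1, 1]. The NumPy sketch below does that and then performs a nearest-texel lookup, which is what grid_sample with mode='nearest' and align_corners=False computes up to tie-breaking. It drops the batch dimension for brevity, and the names opengl_to_grid_sample and nearest_texture_lookup are hypothetical; in PyTorch one would instead pass the converted grid to torch.nn.functional.grid_sample.

```python
import numpy as np


def opengl_to_grid_sample(uv):
    """Map OpenGL-style UVs ([0, 1], y up, wrapping) to [-1, 1], y down."""
    uv = np.mod(uv, 1.0)                  # circular wrap: -0.1 -> 0.9
    x = uv[..., 0] * 2.0 - 1.0
    y = (1.0 - uv[..., 1]) * 2.0 - 1.0    # flip the y axis
    return np.stack([x, y], axis=-1)


def nearest_texture_lookup(uv, texture):
    """uv: (h, w, 2) OpenGL-style coordinates; texture: (3, h', w').

    Returns the sampled colors, of shape (h, w, 3).
    """
    _, th, tw = texture.shape
    grid = opengl_to_grid_sample(uv)
    # align_corners=False: continuous index is ((g + 1) * size - 1) / 2;
    # rounding it to the nearest texel gives floor((g + 1) * size / 2).
    col = np.clip(np.floor((grid[..., 0] + 1.0) * tw / 2.0), 0, tw - 1).astype(int)
    row = np.clip(np.floor((grid[..., 1] + 1.0) * th / 2.0), 0, th - 1).astype(int)
    return texture[:, row, col].transpose(1, 2, 0)
```

Because v is flipped, a coordinate near the top of the texture in OpenGL terms (v close to 1) reads from the first row of the texture array, which is the top row in PyTorch's layout.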