# kaolin.render.mesh

## API

kaolin.render.mesh.deftet_sparse_render(pixel_coords, render_ranges, face_vertices_z, face_vertices_image, face_features, knum=300, eps=1e-08)

Fully differentiable volumetric renderer devised by Gao et al. in Learning Deformable Tetrahedral Meshes for 3D Reconstruction NeurIPS 2020.

This rasterizes a mesh with respect to a list of pixel coordinates, but instead of rendering only the closest intersection, it renders all the intersections sorted in depth order, returning the interpolated features and the indices of the intersected faces for each intersection in padded arrays.

Note

The function is not differentiable w.r.t pixel_coords.

Note

If face_camera_vertices and face_camera_z are produced by camera functions in kaolin.render.camera, then the expected range of values for pixel_coords is [-1., 1.] and the expected range of values for render_ranges is [-inf, 0.].

Parameters
• pixel_coords (torch.Tensor) – Image coordinates to render, of shape $$(\text{batch_size}, \text{num_pixels}, 2)$$.

• render_ranges (torch.Tensor) – Depth ranges in which intersections get rendered, of shape $$(\text{batch_size}, \text{num_pixels}, 2)$$.

• face_vertices_z (torch.Tensor) – Depth (z) values of the face vertices in camera coordinates, of shape $$(\text{batch_size}, \text{num_faces}, 3)$$. Values in front of the camera are expected to be negative, with higher values being closer to the camera.

• face_vertices_image (torch.Tensor) – 2D positions of the face vertices on the image plane, of shape $$(\text{batch_size}, \text{num_faces}, 3, 2)$$. Note that the face vertices are projected onto the image plane (z=-1) to form face_vertices_image.

• face_features (torch.Tensor or list of torch.Tensor) – Features (per-vertex per-face) to be drawn, of shape $$(\text{batch_size}, \text{num_faces}, 3, \text{feature_dim})$$, where feature_dim is the feature dimension, for instance feature_dim=3 for vertex colors (R, G, B) and feature_dim=2 for texture coordinates (X, Y); or a list of num_features tensors, of shapes $$(\text{batch_size}, \text{num_faces}, 3, \text{feature_dim[i]})$$.

• knum (int) – Maximum number of faces that influence one pixel. Default: 300.

• eps (float) – Epsilon value used to normalize barycentric weights. Default: 1e-8.

Returns

• The rendered features, of shape $$(\text{batch_size}, \text{num_pixels}, \text{knum}, \text{feature_dim})$$. If face_features is a list of torch.Tensor, a list of torch.Tensor is returned instead, of shapes $$(\text{batch_size}, \text{num_pixels}, \text{knum}, \text{feature_dim[i]})$$.

• The rendered face index, -1 is void, of shape $$(\text{batch_size}, \text{num_pixels}, \text{knum})$$.

Return type

(torch.Tensor or list of torch.Tensor, torch.LongTensor)
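The depth ordering and padding described above can be illustrated for a single pixel with a small pure-Python sketch. It is not kaolin's implementation: `sort_and_pad_hits` is a hypothetical helper, and both the closest-first ordering and the inclusive treatment of the depth range are assumptions made for illustration.

```python
def sort_and_pad_hits(hits, knum, depth_range=(float("-inf"), 0.0)):
    """Order one pixel's intersections closest-first and pad them to knum entries.

    hits: list of (face_index, depth) pairs. Depths are negative in front of
    the camera, and higher values are closer to the camera.
    """
    lo, hi = depth_range
    # Keep only intersections inside the depth range (inclusive: an assumption).
    in_range = [h for h in hits if lo <= h[1] <= hi]
    # Higher depth value means closer to the camera, so sort in descending order.
    closest_first = sorted(in_range, key=lambda h: h[1], reverse=True)[:knum]
    face_idx = [face for face, _ in closest_first]
    # -1 marks "void" slots, mirroring the padded face index output.
    face_idx += [-1] * (knum - len(face_idx))
    return face_idx


hits = [(7, -2.5), (3, -0.8), (5, -1.2), (9, 0.4)]
padded = sort_and_pad_hits(hits, knum=5)
```

Here face 9 is dropped because its depth lies outside the range, and the two unused slots are filled with -1.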

kaolin.render.mesh.dibr_rasterization(height, width, face_vertices_z, face_vertices_image, face_features, face_normals_z, sigmainv=7000, boxlen=0.02, knum=30, multiplier=None, eps=None, rast_backend='cuda')

Fully differentiable DIB-R renderer implementation, that renders 3D triangle meshes with per-vertex per-face features to generalized feature “images”, soft foreground masks, and face index maps.

Parameters
• height (int) – the height of rendered images.

• width (int) – the width of rendered images.

• face_vertices_z (torch.FloatTensor) – 3D points depth (z) value of the face vertices in camera coordinate, of shape $$(\text{batch_size}, \text{num_faces}, 3)$$.

• face_vertices_image (torch.FloatTensor) – 2D positions of the face vertices on the image plane, of shape $$(\text{batch_size}, \text{num_faces}, 3, 2)$$. Note that face_vertices_camera is projected onto the image plane (z=-1) to form face_vertices_image. The coordinates of face_vertices_image lie in $$[-1, 1]$$, which corresponds to normalized image pixels.

• face_features (torch.FloatTensor or list of torch.FloatTensor) – Features (per-vertex per-face) to be drawn, of shape $$(\text{batch_size}, \text{num_faces}, 3, \text{feature_dim})$$, where feature_dim is the feature dimension, for instance feature_dim=3 for vertex colors (R, G, B) and feature_dim=2 for texture coordinates (X, Y); or a list of num_features tensors, of shapes $$(\text{batch_size}, \text{num_faces}, 3, \text{feature_dim[i]})$$.

• face_normals_z (torch.FloatTensor) – Normal directions in z axis, of shape $$(\text{batch_size}, \text{num_faces})$$, only faces with normal z >= 0 will be drawn.

• sigmainv (float) – Smoothness term for computing the soft mask; the higher the value, the sharper the mask. The recommended range is $$[1/3e-4, 1/3e-5]$$. Default: 7000.

• boxlen (float) – Margin over bounding box of faces which will threshold which pixels will be influenced by the face. The value should be adapted to sigmainv, to threshold values close to 0. The recommended range is [0.05, 0.2]. Default: 0.02.

• knum (int) – Maximum number of faces that can influence one pixel. The value should be adapted to boxlen, to avoid missing faces. The recommended range is [20, 100]. Default: 30.

• multiplier (float) – To avoid numerical issues, the 2D coordinates are internally enlarged by a multiplier. Default: 1000.

• eps (float) – Epsilon value used to normalize barycentric weights in rasterization. This matters especially with small triangles; increase or decrease it in case of exploding or vanishing gradients. Ignored if the backend is ‘nvdiffrast’. Default: 1e-8.

• rast_backend (string) – Backend used for the rasterization, one of [‘cuda’, ‘nvdiffrast’, ‘nvdiffrast_fwd’]. ‘nvdiffrast_fwd’ uses the nvdiffrast library for the forward pass only and kaolin’s custom op for the backward pass.

Returns

• The rendered features, of shape $$(\text{batch_size}, \text{height}, \text{width}, \text{feature_dim})$$. If face_features is a list of torch.FloatTensor, a list of torch.FloatTensor is returned instead, of shapes $$(\text{batch_size}, \text{height}, \text{width}, \text{feature_dim[i]})$$.

• The rendered soft mask, of shape $$(\text{batch_size}, \text{height}, \text{width})$$. It is generally used with kaolin.metrics.render.mask_iou() to compute the silhouette loss.

• The rendered face index, -1 is None, of shape $$(\text{batch_size}, \text{height}, \text{width})$$.

Return type

(torch.Tensor, torch.Tensor, torch.LongTensor)
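The rule that only faces with normal z >= 0 are drawn amounts to back-face culling. A minimal pure-Python sketch of that test follows; `face_normal_z` and `drawn_faces` are hypothetical helpers, the normal is left unnormalized (only its sign matters here), and a counter-clockwise winding for front-facing triangles is an assumption.

```python
def face_normal_z(v0, v1, v2):
    """z component of the unnormalized normal of triangle (v0, v1, v2).

    Each vertex is an (x, y, z) tuple in camera coordinates. Only x and y
    contribute to the z component of the cross product of the two edges.
    """
    e1x, e1y = v1[0] - v0[0], v1[1] - v0[1]
    e2x, e2y = v2[0] - v0[0], v2[1] - v0[1]
    return e1x * e2y - e1y * e2x


def drawn_faces(faces):
    """Indices of the faces kept by the normal z >= 0 rule."""
    return [i for i, (v0, v1, v2) in enumerate(faces)
            if face_normal_z(v0, v1, v2) >= 0]


# Same triangle with two opposite vertex orders.
front = ((0.0, 0.0, -2.0), (1.0, 0.0, -2.0), (0.0, 1.0, -2.0))  # counter-clockwise
back = ((0.0, 0.0, -2.0), (0.0, 1.0, -2.0), (1.0, 0.0, -2.0))   # clockwise
kept = drawn_faces([front, back])
```

Only the first triangle survives, since reversing the vertex order flips the sign of the normal.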

kaolin.render.mesh.dibr_soft_mask(face_vertices_image, selected_face_idx, sigmainv=7000, boxlen=0.02, knum=30, multiplier=1000.0)

Compute a soft mask, generally used with kaolin.metrics.render.mask_iou() to compute a silhouette loss, as defined by Chen, Wenzheng, et al. in Learning to Predict 3D Objects with an Interpolation-based Differentiable Renderer, NeurIPS 2019.

Parameters
• face_vertices_image (torch.Tensor) – 2D positions of the face vertices on the image plane, of shape $$(\text{batch_size}, \text{num_faces}, 3, 2)$$. Note that face_vertices_camera is projected onto the image plane (z=-1) to form face_vertices_image. The coordinates of face_vertices_image lie in $$[-1, 1]$$, which corresponds to normalized image pixels.

• selected_face_idx (torch.LongTensor) – Rendered face index, of shape $$(\text{batch_size}, \text{height}, \text{width})$$. See 2nd returned value from kaolin.render.mesh.rasterize().

• sigmainv (float) – Smoothness term for computing the soft mask; the higher the value, the sharper the mask. The recommended range is $$[1/3e-4, 1/3e-5]$$. Default: 7000.

• boxlen (float) – Margin over bounding box of faces which will threshold which pixels will be influenced by the face. The value should be adapted to sigmainv, to threshold values close to 0. The recommended range is [0.05, 0.2]. Default: 0.02.

• knum (int) – Maximum number of faces that can influence one pixel. The value should be adapted to boxlen, to avoid missing faces. The recommended range is [20, 100]. Default: 30.

• multiplier (float) – To avoid numerical issues, the 2D coordinates are internally enlarged by a multiplier. Default: 1000.

Returns

The soft mask, of shape $$(\text{batch_size}, \text{height}, \text{width})$$.

Return type

(torch.FloatTensor)
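To give an intuition for how sigmainv controls sharpness, here is a rough pure-Python sketch of a soft mask value for one pixel. It follows the general form used in the DIB-R paper (each nearby face contributes an exponentially decaying influence, and the influences are combined as a probabilistic union). The helper `soft_mask_value` is hypothetical, the distances are taken as given inputs, and the exact distance measure and scaling used inside kaolin are not specified here, so treat this as an assumption-laden illustration only.

```python
import math


def soft_mask_value(distances, sigmainv=7000.0):
    """Soft coverage of one pixel, given its distances to nearby faces.

    distances: non-negative distances from the pixel to each influencing face
    (which distance measure the library uses is an assumption left open here).
    """
    not_covered = 1.0
    for d in distances:
        # Influence is 1 at distance 0 and decays faster for larger sigmainv.
        influence = math.exp(-d * sigmainv)
        not_covered *= 1.0 - influence
    # Probabilistic union of the per-face influences.
    return 1.0 - not_covered


near = soft_mask_value([1e-5, 4e-4])                  # pixel close to two faces
far = soft_mask_value([1e-2])                         # pixel far from any face
sharper = soft_mask_value([1e-4], sigmainv=30000.0)
softer = soft_mask_value([1e-4], sigmainv=4000.0)
```

With the same distance, the larger sigmainv yields the smaller mask value, which is what "the higher the sharper" means in practice.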

kaolin.render.mesh.prepare_vertices(vertices, faces, camera_proj, camera_rot=None, camera_trans=None, camera_transform=None)

Wrapper function to move and project vertices to cameras then index them with faces.

Parameters
• vertices (torch.Tensor) – the meshes vertices, of shape $$(\text{batch_size}, \text{num_vertices}, 3)$$.

• faces (torch.LongTensor) – the meshes faces, of shape $$(\text{num_faces}, \text{face_size})$$.

• camera_proj (torch.Tensor) – the camera projection vector, of shape $$(3, 1)$$.

• camera_rot (torch.Tensor, optional) – the camera rotation matrices, of shape $$(\text{batch_size}, 3, 3)$$.

• camera_trans (torch.Tensor, optional) – the camera translation vectors, of shape $$(\text{batch_size}, 3)$$.

• camera_transform (torch.Tensor, optional) – the camera transformation matrices, of shape $$(\text{batch_size}, 4, 3)$$. Replace camera_trans and camera_rot.

Returns

• The vertices in camera coordinates indexed by faces, of shape $$(\text{batch_size}, \text{num_faces}, \text{face_size}, 3)$$.

• The vertices in camera plane coordinates indexed by faces, of shape $$(\text{batch_size}, \text{num_faces}, \text{face_size}, 2)$$.

• The face normals, of shape $$(\text{batch_size}, \text{num_faces})$$.
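The "index them with faces" step can be pictured with a small pure-Python sketch on nested lists. `index_vertices_by_faces` is a hypothetical helper written for illustration; it only shows the gather that turns per-vertex data into per-face data and leaves out the camera transform and projection.

```python
def index_vertices_by_faces(vertices, faces):
    """Gather per-face vertex positions.

    vertices: nested list of shape (batch_size, num_vertices, 3)
    faces:    nested list of shape (num_faces, face_size), shared by the batch
    returns:  nested list of shape (batch_size, num_faces, face_size, 3)
    """
    return [[[mesh[v] for v in face] for face in faces] for mesh in vertices]


# One mesh with four vertices and two triangles sharing an edge.
vertices = [[[0.0, 0.0, -1.0], [1.0, 0.0, -1.0], [0.0, 1.0, -1.0], [1.0, 1.0, -2.0]]]
faces = [[0, 1, 2], [1, 3, 2]]
face_vertices = index_vertices_by_faces(vertices, faces)
```

The result has one entry per face and per face corner, which is the layout the rasterization functions on this page expect for their face_* inputs.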

Return type

(torch.Tensor, torch.Tensor, torch.Tensor)

kaolin.render.mesh.rasterize(height, width, face_vertices_z, face_vertices_image, face_features, valid_faces=None, multiplier=None, eps=None, backend='cuda')

Fully differentiable rasterization implementation, that renders 3D triangle meshes with per-vertex per-face features to generalized feature “images”.

The backend can be either the nvdiffrast library, if available (see installation instructions), or custom CUDA ops improved from those originally proposed by Chen, Wenzheng, et al. in Learning to Predict 3D Objects with an Interpolation-based Differentiable Renderer, NeurIPS 2019.

Note

The nvdiffrast library relies on OpenGL and so can be faster, especially with larger meshes and resolutions.

Parameters
• height (int) – the height of rendered images.

• width (int) – the width of rendered images.

• face_vertices_z (torch.FloatTensor) – 3D points depth (z) value of the face vertices in camera coordinate, of shape $$(\text{batch_size}, \text{num_faces}, 3)$$.

• face_vertices_image (torch.FloatTensor) – 2D positions of the face vertices on the image plane, of shape $$(\text{batch_size}, \text{num_faces}, 3, 2)$$. Note that face_vertices_camera is projected onto the image plane (z=-1) to form face_vertices_image. The coordinates of face_vertices_image lie in $$[-1, 1]$$, which corresponds to normalized image pixels.

• face_features (torch.FloatTensor or list of torch.FloatTensor) – Features (per-vertex per-face) to be drawn, of shape $$(\text{batch_size}, \text{num_faces}, 3, \text{feature_dim})$$, where feature_dim is the feature dimension, for instance feature_dim=3 for vertex colors (R, G, B) and feature_dim=2 for texture coordinates (X, Y); or a list of num_features tensors, of shapes $$(\text{batch_size}, \text{num_faces}, 3, \text{feature_dim[i]})$$.

• valid_faces (torch.BoolTensor) – Mask of faces being rasterized, of shape $$(\text{batch_size}, \text{num_faces})$$. Default: All faces are valid.

• multiplier (int) – To avoid numerical issues, the coordinates are enlarged by a multiplier. Used only with backend ‘cuda’ in the forward pass. Default: 1000.

• eps (float) – Epsilon value used to normalize barycentric weights. This matters especially with small triangles; increase or decrease it in case of exploding or vanishing gradients. Ignored if the backend is ‘nvdiffrast’. Default: 1e-8.

• backend (string) – Backend used for the rasterization, one of [‘cuda’, ‘nvdiffrast’, ‘nvdiffrast_fwd’]. ‘nvdiffrast_fwd’ uses the nvdiffrast library for the forward pass only and kaolin’s custom op for the backward pass.

Returns

• The rendered features, of shape $$(\text{batch_size}, \text{height}, \text{width}, \text{feature_dim})$$. If face_features is a list of torch.FloatTensor, a list of torch.FloatTensor is returned instead, of shapes $$(\text{batch_size}, \text{height}, \text{width}, \text{feature_dim[i]})$$.

• The rendered face index, -1 is None, of shape $$(\text{batch_size}, \text{height}, \text{width})$$.

Return type

(torch.FloatTensor, torch.LongTensor)
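To show what interpolating per-vertex features with eps-normalized barycentric weights looks like, here is a pure-Python sketch for a single pixel and a single triangle. It is not kaolin's CUDA kernel: `edge` and `interpolate_features` are hypothetical helpers, the interpolation is done purely in 2D (no perspective correction, no depth test), and the exact place where eps enters the library's normalization is an assumption.

```python
def edge(a, b, p):
    """Signed area term of point p relative to the directed edge a -> b."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def interpolate_features(pixel, tri, feats, eps=1e-8):
    """Interpolate per-vertex features at a pixel, or return None if outside.

    pixel: (x, y) in normalized image coordinates
    tri:   three (x, y) vertex positions on the image plane
    feats: three feature tuples, one per vertex
    """
    v0, v1, v2 = tri
    w0 = edge(v1, v2, pixel)  # weight attached to v0
    w1 = edge(v2, v0, pixel)  # weight attached to v1
    w2 = edge(v0, v1, pixel)  # weight attached to v2
    # The pixel is inside when all edge terms share a sign (either winding).
    inside = (w0 >= 0 and w1 >= 0 and w2 >= 0) or (w0 <= 0 and w1 <= 0 and w2 <= 0)
    if not inside:
        return None
    w0, w1, w2 = abs(w0), abs(w1), abs(w2)
    total = w0 + w1 + w2 + eps  # eps avoids dividing by zero on tiny triangles
    weights = (w0 / total, w1 / total, w2 / total)
    dim = len(feats[0])
    return [sum(w * f[c] for w, f in zip(weights, feats)) for c in range(dim)]


tri = ((-1.0, -1.0), (1.0, -1.0), (-1.0, 1.0))
colors = ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0))
center = interpolate_features((-1.0 / 3.0, -1.0 / 3.0), tri, colors)
corner = interpolate_features((-1.0, -1.0), tri, colors)
outside = interpolate_features((0.9, 0.9), tri, colors)
```

At the centroid the three colors blend equally, at a vertex the result is that vertex's color, and a pixel outside the triangle gets no value, which corresponds to the -1 entries of the face index map.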

kaolin.render.mesh.spherical_harmonic_lighting(imnormal, lights)

Creates lighting effects.

Follows convention set by Wojciech Jarosz in Efficient Monte Carlo Methods for Light Transport in Scattering Media.

Parameters
• imnormal (torch.FloatTensor) – per pixel normal, of shape $$(\text{batch_size}, \text{height}, \text{width}, 3)$$

• lights (torch.FloatTensor) – spherical harmonic lighting parameters, of shape $$(\text{batch_size}, 9)$$

Returns

The lighting effect, of shape $$(\text{batch_size}, \text{height}, \text{width})$$.

Return type

(torch.FloatTensor)

https://cs.dartmouth.edu/~wjarosz/publications/dissertation/appendixB.pdf
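The 9 lighting parameters correspond to the first three bands of spherical harmonics (1 + 3 + 5 coefficients). The pure-Python sketch below evaluates them for a single unit normal using the standard real spherical harmonic basis. `sh_basis` and `sh_lighting` are hypothetical helpers, and the ordering and sign conventions of the coefficients inside kaolin may differ from the ones assumed here.

```python
def sh_basis(normal):
    """First 9 real spherical harmonic basis values for a unit normal (x, y, z).

    Uses the common real SH constants; the coefficient ordering is an
    assumption and may not match the library's internal ordering.
    """
    x, y, z = normal
    return [
        0.282095,                         # band 0
        0.488603 * y,                     # band 1
        0.488603 * z,
        0.488603 * x,
        1.092548 * x * y,                 # band 2
        1.092548 * y * z,
        0.315392 * (3.0 * z * z - 1.0),
        1.092548 * x * z,
        0.546274 * (x * x - y * y),
    ]


def sh_lighting(normal, lights):
    """Scalar lighting value: dot product of the 9 coefficients and the basis."""
    return sum(c * b for c, b in zip(lights, sh_basis(normal)))


ambient_only = [1.0] + [0.0] * 8
z_directional = [0.0, 0.0, 1.0] + [0.0] * 6
facing = sh_lighting((0.0, 0.0, 1.0), z_directional)
away = sh_lighting((0.0, 0.0, -1.0), z_directional)
```

A light with only the band 0 coefficient set gives the same value for every normal, while a band 1 coefficient brightens normals pointing along its axis and darkens the opposite ones.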

kaolin.render.mesh.texture_mapping(texture_coordinates, texture_maps, mode='nearest')

Interpolates texture_maps by dense or sparse texture_coordinates. This function supports sampling texture coordinates for (1) an entire 2D image, or (2) a sparse point cloud of texture coordinates.

Parameters
• texture_coordinates (torch.FloatTensor) – dense image texture coordinates, of shape $$(\text{batch_size}, h, w, 2)$$, or sparse texture coordinates for points, of shape $$(\text{batch_size}, \text{num_points}, 2)$$. Coordinates are expected to be normalized between [0, 1]. Note that OpenGL texture coordinates differ from PyTorch’s coordinates: OpenGL coordinates range from 0 to 1, the y axis goes from bottom to top, and circular mode is supported (-0.1 is the same as 0.9); PyTorch coordinates range from -1 to 1, the y axis goes from top to bottom, and circular mode is not supported. Filtering is the same as the mode parameter for torch.nn.functional.grid_sample.

• texture_maps (torch.FloatTensor) –

textures of shape $$(\text{batch_size}, \text{num_channels}, h', w')$$. Here, $$h'$$ & $$w'$$ are the height and width of texture maps.

If texture_coordinates are image texture coordinates - For each pixel in the rendered image of height $$h$$ and width $$w$$, we use the coordinates in texture_coordinates to query the corresponding value in the texture maps. Note that the height $$h$$ and width $$w$$ of the rendered image could be different from $$h'$$ & $$w'$$.

If texture_coordinates are sparse texture coordinates - For each point in texture_coordinates we query the corresponding value in texture_maps.

Returns

interpolated texture of shape $$(\text{batch_size}, h, w, \text{num_channels})$$ or interpolated texture of shape $$(\text{batch_size}, \text{num_points}, \text{num_channels})$$

Return type

(torch.FloatTensor)
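The difference between the two coordinate conventions described for texture_coordinates can be made concrete with a small pure-Python sketch. `opengl_uv_to_grid_sample` is a hypothetical helper that maps one OpenGL-style coordinate in [0, 1] (y from bottom to top) to the [-1, 1] convention of torch.nn.functional.grid_sample (y from top to bottom). How out-of-range coordinates are handled is deliberately left out, since that is not specified here.

```python
def opengl_uv_to_grid_sample(u, v):
    """Map an OpenGL-style (u, v) in [0, 1] to grid_sample-style (x, y) in [-1, 1].

    u grows left to right in both conventions; v grows bottom to top in
    OpenGL, whereas y grows top to bottom for grid_sample, so v is flipped.
    """
    x = 2.0 * u - 1.0
    y = 1.0 - 2.0 * v
    return x, y


bottom_left = opengl_uv_to_grid_sample(0.0, 0.0)
top_right = opengl_uv_to_grid_sample(1.0, 1.0)
middle = opengl_uv_to_grid_sample(0.5, 0.5)
```

The bottom-left texel at (0, 0) lands at (-1, 1), the top-right texel at (1, 1) lands at (1, -1), and the center stays at the origin.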