kaolin.render.camera.Camera

API

class kaolin.render.camera.Camera(extrinsics, intrinsics)

Bases: object

Camera is a one-stop class for all camera related differentiable / non-differentiable transformations.

Camera objects are represented by batched instances of 2 submodules:

CameraExtrinsics: The extrinsic properties of the camera (position, orientation). These are usually embedded in the view matrix, which is used to transform vertices from world space to camera space.

CameraIntrinsics: The intrinsic properties of the lens (such as field of view / focal length in the case of pinhole cameras). Intrinsic parameters vary between lens types, so multiple CameraIntrinsics subclasses exist to support different types of cameras: pinhole / perspective, orthographic, fisheye, and so forth. For pinhole and orthographic lenses, the intrinsics are embedded in a projection matrix. The intrinsics module can be used to transform vertices from camera space to Normalized Device Coordinates.

Note

To avoid tedious invocation of camera functions through camera.extrinsics.someop() and camera.intrinsics.someop(), kaolin overrides the __getattr__ function to forward any call of camera.someop() to the appropriate extrinsics / intrinsics submodule.

The entire pipeline of transformations can be summarized as (ignoring homogeneous coordinates):

World Space                                         Camera View Space
     V         ---CameraExtrinsics.transform()--->         V'          ---CameraIntrinsics.transform()---
Shape~(B, 3)            (view matrix)                  Shape~(B, 3)                                     |
                                                                                                        |
                                                                       (linear lens: projection matrix) |
                                                                              + homogeneous -> 3D       |
                                                                                                        V
                                                                             Normalized Device Coordinates (NDC)
                                                                                        Shape~(B, 3)

When using view / projection matrices, conversion to homogeneous coordinates is required.
Alternatively, the `transform()` function takes care of such projections under the hood when needed.
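To make "conversion to homogeneous coordinates" concrete, here is a minimal NumPy sketch of applying a 4x4 view or projection matrix by hand. `apply_homogeneous` is a hypothetical helper, not part of kaolin; it assumes column vectors (p' = M p), in line with the column-major convention described in the remarks on coordinate systems.

```python
import numpy as np


def apply_homogeneous(matrix, points):
    """Apply a (4, 4) transform to (N, 3) points using homogeneous coordinates.

    Hypothetical helper (not part of kaolin). Assumes column vectors, i.e. p' = M @ p.
    """
    ones = np.ones((points.shape[0], 1), dtype=points.dtype)
    # Lift to homogeneous coordinates by appending w = 1 to every point: (N, 4).
    homo = np.concatenate([points, ones], axis=1)
    # p' = M @ p for each row p; for row-stacked points this is homo @ M^T.
    out = homo @ matrix.T
    # Divide by w to return to 3D; for affine matrices w stays 1, while a
    # projection matrix yields w != 1 and this becomes the perspective divide.
    return out[:, :3] / out[:, 3:4]


# A pure translation by (1, 2, 3): w stays 1, so the divide is a no-op.
translate = np.eye(4)
translate[:3, 3] = [1.0, 2.0, 3.0]
pts = np.array([[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]])
moved = apply_homogeneous(translate, pts)  # rows (1, 2, 3) and (2, 3, 4)
```

With a real projection matrix, the final divide is what maps camera-space points into NDC.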

How to apply transformations with kaolin’s Camera:

Linear camera types, such as the commonly used pinhole camera, support the view_projection_matrix() method. The returned matrix can be used to transform vertices through pytorch’s matrix multiplication, or even be passed to shaders as a uniform.

All Cameras are guaranteed to support a general transform() function, which maps coordinates from world space to Normalized Device Coordinates space. For lens types that perform non-linear transformations, view_projection_matrix() is undefined, so the camera transformation must be applied through a dedicated function. For linear cameras, transform() may use matrices under the hood.

Camera parameters may also be queried directly. This is useful when implementing code that is aware of camera parameters, such as ray tracers.
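As an illustration of parameter-aware code, the sketch below generates a primary ray for one pixel of a pinhole camera from a field of view and a cam2world matrix. `pinhole_ray` is hypothetical and is not a kaolin API. It also assumes an OpenGL-style camera that looks down its local -Z axis with +Y up and pixel (0, 0) at the top-left. Check these conventions against your camera's coordinate system before reusing it.

```python
import numpy as np


def pinhole_ray(cam2world, fov_y, width, height, px, py):
    """World-space ray (origin, unit direction) through the center of pixel (px, py).

    Hypothetical helper, not a kaolin API. Assumes an OpenGL-style camera that
    looks down its local -Z axis with +Y up, pixel (0, 0) at the top-left corner,
    and a (4, 4) cam2world matrix whose first three columns are the camera axes
    and whose last column is the camera position.
    """
    tan_half = np.tan(fov_y / 2.0)
    aspect = width / height
    # Pixel center -> [-1, 1]; image rows grow downwards, so y is flipped.
    x_ndc = 2.0 * (px + 0.5) / width - 1.0
    y_ndc = 1.0 - 2.0 * (py + 0.5) / height
    # Direction in camera space, on the plane z = -1 in front of the camera.
    dir_cam = np.array([x_ndc * tan_half * aspect, y_ndc * tan_half, -1.0])
    # Rotate into world space; the ray starts at the camera position.
    dir_world = cam2world[:3, :3] @ dir_cam
    origin = cam2world[:3, 3]
    return origin, dir_world / np.linalg.norm(dir_world)


# The center pixel of a 3x3 image looks straight down -Z for an identity pose.
origin, direction = pinhole_ray(np.eye(4), np.pi / 2, 3, 3, 1, 1)
```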

How to control kaolin’s Camera:

CameraExtrinsics: is packed with useful methods for controlling the camera position and orientation: translate(), rotate(), move_forward(), move_up(), move_right(), cam_pos(), cam_right(), cam_up(), cam_forward().

CameraIntrinsics: exposes a lens zoom() operation. The exact functionality depends on the camera type.

How to optimize the Camera parameters:

Both CameraExtrinsics and CameraIntrinsics maintain torch.Tensor buffers of parameters that support PyTorch differentiable operations.

Setting camera.requires_grad_(True) will turn on the optimization mode.

The gradient_mask() function can be used to mask out gradients of specific Camera parameters.

Note

CameraExtrinsics supports multiple representations of camera parameters (see: switch_backend). Specific representations are better suited for optimization (e.g. they maintain an orthogonal view matrix). Kaolin will automatically switch to those representations when gradient flow is enabled. For non-differentiable uses, the default representation may provide better speed and numerical accuracy.

Other useful camera properties:

Cameras partially follow the PyTorch API, and support arbitrary dtypes and devices through the to(), cpu(), cuda(), half(), float() and double() methods and the dtype and device properties.

CameraExtrinsics and CameraIntrinsics individually support the requires_grad property.

Cameras implement torch.allclose() for comparing camera parameters under controlled numerical accuracy. The == operator is reserved for comparison by reference.

Cameras support batching, either through construction, or through the cat() method.

Note

Since kaolin’s cameras are batched, the view/projection matrices are of shape \((\text{num_cameras}, 4, 4)\), and some operations, such as transform(), may return values of shape \((\text{num_cameras}, \text{num_vectors}, 3)\).
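A rough NumPy illustration of these batch shapes follows. One 4x4 matrix per camera is applied to a shared set of points, which yields one projected copy of the points per camera. `batched_project` is hypothetical and shows the broadcasting only. It is not how kaolin implements transform().

```python
import numpy as np


def batched_project(view_proj, vectors):
    """Apply C per-camera (4, 4) matrices to N shared points: (C, 4, 4), (N, 3) -> (C, N, 3).

    Hypothetical helper illustrating the shapes only; not kaolin's implementation.
    """
    ones = np.ones((vectors.shape[0], 1), dtype=vectors.dtype)
    homo = np.concatenate([vectors, ones], axis=-1)  # (N, 4)
    # out[c, n] = view_proj[c] @ homo[n] for every camera c and point n.
    out = np.einsum("cij,nj->cni", view_proj, homo)  # (C, N, 4)
    return out[..., :3] / out[..., 3:4]


# Two cameras: an identity transform and a translation by +1 along X.
mats = np.stack([np.eye(4), np.eye(4)])
mats[1, 0, 3] = 1.0
points = np.zeros((5, 3))
projected = batched_project(mats, points)  # shape (2, 5, 3)
```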

Concluding remarks on coordinate systems and other confusing conventions:

kaolin’s Cameras assume column major matrices, for example, the inverse view matrix (cam2world) is defined as:

\[\begin{split}\begin{bmatrix} r1 & u1 & f1 & px \\ r2 & u2 & f2 & py \\ r3 & u3 & f3 & pz \\ 0 & 0 & 0 & 1 \end{bmatrix}\end{split}\]

This sometimes causes confusion, as the view matrix (world2cam) uses the transposed 3x3 rotation submatrix; despite this transposition, it is still column major, as can be observed from the translation t in the last column:

\[\begin{split}\begin{bmatrix} r1 & r2 & r3 & tx \\ u1 & u2 & u3 & ty \\ f1 & f2 & f3 & tz \\ 0 & 0 & 0 & 1 \end{bmatrix}\end{split}\]
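The two matrices above are inverses of each other. The following small NumPy sketch builds both from the axes r, u, f and the position p. It shows that the view matrix's translation column is t = -R^T p, where R is the 3x3 block of cam2world. The helpers and the sample pose are hypothetical and are not kaolin functions.

```python
import numpy as np


def cam2world_matrix(right, up, forward, pos):
    """cam2world with the camera axes r, u, f and the position p as columns."""
    m = np.eye(4)
    m[:3, :] = np.column_stack([right, up, forward, pos])
    return m


def world2cam_matrix(right, up, forward, pos):
    """View matrix: the rotation block is transposed (rows r, u, f) and t = -R^T p."""
    rot_t = np.stack([right, up, forward])  # (3, 3), rows are r, u, f
    m = np.eye(4)
    m[:3, :3] = rot_t
    m[:3, 3] = -rot_t @ pos  # (tx, ty, tz) = -(r . p, u . p, f . p)
    return m


# Hypothetical pose: camera rotated 90 degrees about Y, positioned at (1, 2, 3).
r = np.array([0.0, 0.0, -1.0])
u = np.array([0.0, 1.0, 0.0])
f = np.array([1.0, 0.0, 0.0])
p = np.array([1.0, 2.0, 3.0])
c2w = cam2world_matrix(r, u, f, p)
w2c = world2cam_matrix(r, u, f, p)
```

Because the axes are orthonormal, transposing the rotation block inverts it, so w2c @ c2w is the identity and w2c maps the camera position to the origin.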
kaolin’s cameras do not assume any specific coordinate system for the camera axes. By default, the right-handed Cartesian coordinate system is used. Other coordinate systems are supported through change_coordinate_system() and the coordinates.py module:

   Y
   ^
   |
   |---------> X
  /
Z

kaolin’s NDC space is assumed to be left-handed (depth increases into the screen). The default range of values is [-1, 1].
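To make the left-handed NDC convention concrete, the sketch below uses the textbook OpenGL perspective matrix. This is an assumption: kaolin's own pinhole projection matrix is not reproduced here, and it is only assumed to follow the same conventions (a camera looking down -Z in a right-handed view space, and NDC in [-1, 1]). Points on the near plane land at NDC depth -1 and points on the far plane at +1, so depth grows into the screen.

```python
import numpy as np


def opengl_perspective(fov_y, aspect, near, far):
    """Textbook OpenGL perspective matrix (camera looks down -Z in view space)."""
    f = 1.0 / np.tan(fov_y / 2.0)
    return np.array([
        [f / aspect, 0.0, 0.0, 0.0],
        [0.0, f, 0.0, 0.0],
        [0.0, 0.0, (far + near) / (near - far), 2.0 * far * near / (near - far)],
        [0.0, 0.0, -1.0, 0.0],
    ])


def ndc_depth(proj, z_view):
    """NDC z of a view-space point on the optical axis at depth z_view (< 0)."""
    clip = proj @ np.array([0.0, 0.0, z_view, 1.0])
    return clip[2] / clip[3]  # perspective divide


proj = opengl_perspective(np.pi / 3, 1.0, near=0.1, far=100.0)
near_ndc = ndc_depth(proj, -0.1)   # -1: near plane
far_ndc = ndc_depth(proj, -100.0)  # +1: far plane
```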

Parameters

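extrinsics (CameraExtrinsics) – the extrinsic parameters of the camera (position, orientation).
intrinsics (CameraIntrinsics) – the intrinsic parameters of the camera lens.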

classmethod cat(cameras)

Concatenate multiple Cameras.

Assumes all cameras use the same width, height, near and far planes.

Parameters: cameras (Sequence of Camera) – the cameras to concatenate.
Returns: The concatenated cameras as a single Camera.
Return type: (Camera)

cpu()

Return type: Camera

cuda()

Return type: Camera

property device: device

The torch device of the parameter tensors.

double()

Return type: Camera

property dtype: dtype

The torch dtype of the parameter tensors.

float()

Return type: Camera

classmethod from_args(**kwargs)

A convenience constructor for the camera class, which takes all extrinsics & intrinsics arguments at once, and disambiguates them to construct a complete camera object.

The correct way of using this constructor is by specifying the camera args as **kwargs, for example:

import numpy as np
import torch
from kaolin.render.camera import Camera

# Construct a pinhole camera with perspective projection
Camera.from_args(
    eye=torch.tensor([10.0, 0.0, 0.0]),
    at=torch.tensor([0.0, 0.0, 0.0]),
    up=torch.tensor([0.0, 1.0, 0.0]),
    fov=30 * np.pi / 180,   # alternatively focal_x, optionally specify: focal_y, x0, y0
    width=800, height=800,
    near=1e-2, far=1e2,
    dtype=torch.float64,
    device='cuda'
)
# Construct an orthographic camera
Camera.from_args(
    eye=np.array([10.0, 0.0, 4.0]),
    at=np.array([0.0, 0.0, 0.0]),
    up=np.array([0.0, 1.0, 0.0]),
    width=800, height=800,
    near=-800, far=800,
    fov_distance=1.0,
    dtype=torch.float32,
    device='cuda'
)
# Construct a pinhole camera from an explicit view matrix
Camera.from_args(
    view_matrix=torch.tensor([[1.0, 0.0, 0.0, 0.5],
                              [0.0, 1.0, 0.0, 0.5],
                              [0.0, 0.0, 1.0, 0.5],
                              [0.0, 0.0, 0.0, 1.0]]),
    focal_x=1000,
    width=1600, height=1600,
)

Parameters

**kwargs (dict of str, *) –

keywords specifying the parameters of the camera. Valid options are a combination of extrinsics, intrinsics and general properties:

Extrinsic params: eye, at, up / view_matrix / cam_pos, cam_dir

Perspective intrinsic params: fov / focal_x, optionally: x0, y0, focal_y, fov_direction

Orthographic intrinsic params: fov_distance, optionally: x0, y0

General intrinsic dimensions: width, height, optionally: near, far

Tensor params properties - optionally: device, dtype
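For pinhole cameras, fov and focal_x are alternative ways to specify the same lens. The following minimal sketch uses the standard pinhole relation between them. It assumes the focal length is expressed in pixels and the field of view spans one full image dimension; presumably fov_direction selects which one, so this sketch simply takes the relevant pixel size as an argument. `fov_to_focal` and `focal_to_fov` are hypothetical helpers, not kaolin functions.

```python
import math


def fov_to_focal(fov, size):
    """Focal length in pixels for a field of view `fov` (radians) spanning `size` pixels."""
    return (size / 2.0) / math.tan(fov / 2.0)


def focal_to_fov(focal, size):
    """Inverse of fov_to_focal: field of view in radians."""
    return 2.0 * math.atan((size / 2.0) / focal)


# The from_args example above: fov = 30 degrees on an 800-pixel-wide image.
focal = fov_to_focal(math.radians(30), 800)  # 400 / tan(15 deg) ~= 1492.82
```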

gradient_mask(*args)

Creates a gradient mask, which allows backpropagating only through params designated as trainable.

This function does not consider the requires_grad field when creating this mask.

Note

The three extrinsic camera axes are always masked as trainable together. This design choice ensures that these axes, as well as the view matrix, remain orthogonal.

Parameters: *args (Union[str, ExtrinsicsParamsDefEnum, IntEnum]) – A vararg list of the extrinsic and intrinsic params that should allow gradient flow. This function also supports conversion of params from their string names (e.g. ‘t’ will convert to ExtrinsicsParamsDefEnum.t).
Returns: the gradient masks, with the same shapes as self.extrinsics.parameters() and self.intrinsics.parameters().
Return type: (torch.BoolTensor, torch.BoolTensor)

Example

>>> camera.requires_grad_(True)  # register_hook requires tensors that require grad
>>> extrinsics_mask, intrinsics_mask = camera.gradient_mask('t', 'focal_x', 'focal_y')
>>> # equivalent to the args:
>>> # ExtrinsicsParamsDefEnum.t, PinholeParamsDefEnum.focal_x, PinholeParamsDefEnum.focal_y
>>> extrinsics_params, intrinsic_params = camera.parameters()
>>> extrinsics_params.register_hook(lambda grad: grad * extrinsics_mask.float())
>>> # extrinsics will now allow gradient flow only for the camera location
>>> intrinsic_params.register_hook(lambda grad: grad * intrinsics_mask.float())
>>> # intrinsics will now allow gradient flow only for the focal length

half()

Return type: Camera

property height: int

Camera image plane height (pixel resolution).

property lens_type: str

A textual description of the camera lens type (e.g. ‘pinhole’, ‘ortho’).

named_params()

Get a descriptive list of named parameters per camera.

Returns: The named parameters.
Return type: (list of dict)

parameters()

Returns the full parameter set of the camera, divided into extrinsic and intrinsic parameters.

Returns: the extrinsics and the intrinsics parameters.
Return type: (torch.Tensor, torch.Tensor)

requires_grad_(val)

Toggle gradient flow for both extrinsics and intrinsics params.

Note

To read the requires_grad attribute, access the extrinsics / intrinsics components explicitly, as their requires_grad status may differ.

Parameters: val (bool) – whether gradients should flow through the camera parameters.

to(*args, **kwargs)

Return type: Camera

transform(vectors)

Applies extrinsic and intrinsic projections consecutively, thereby projecting the vectors from world to NDC space.

Parameters: vectors (torch.Tensor) – the vectors to transform, of shape \((\text{batch_size}, 3)\) or \((\text{num_cameras}, \text{batch_size}, 3)\).
Returns: The vectors projected to NDC space, of the same shape as vectors (the transform can be broadcast across the batch of cameras).
Return type: (torch.Tensor)

view_projection_matrix()

Return the composed view projection matrix.

Note

Works only for cameras with linear projection transformations.

Returns: The view projection matrix, of shape \((\text{num_cameras}, 4, 4)\)
Return type: (torch.Tensor)

property width: int

Camera image plane width (pixel resolution).