kaolin.render.camera.CameraExtrinsics

API

class kaolin.render.camera.CameraExtrinsics(backend, shared_fields=None)

Bases: object

Holds the extrinsics parameters of a camera: position and orientation in space.

This class maintains the view matrix of the camera, used to transform points from world coordinates to camera / eye / view space coordinates.

The view matrix maintained by this class is column-major, and can be described by the 4x4 block matrix:

\[\begin{split}\begin{bmatrix} R & t \\ 0 & 1 \end{bmatrix}\end{split}\]

where R is a 3x3 rotation matrix and t is a 3x1 translation vector for the orientation and position respectively.

This class is batched and may hold information from multiple cameras.

CameraExtrinsics relies on a dynamic representation backend to manage the tradeoff between various choices such as speed, or support for differentiable rigid transformations. Parameters are stored as a single tensor of shape \((\text{num_cameras}, K)\), where K is a representation specific number of parameters. Transformations and matrices returned by this class support differentiable torch operations, which in turn may update the extrinsic parameters of the camera:

   Backend                      convert_to_mat          Extrinsics
   Representation R                ------>              View Matrix M
   Shape (num_cameras, K)          <------              Shape (num_cameras, 4, 4)
                                convert_from_mat

Note

Unless specified manually with switch_backend(), kaolin will choose the optimal representation backend depending on the status of requires_grad.

Note

Users should be aware of, but need not be concerned with, the conversion from internal representations to view matrices. kaolin performs these conversions where and when needed.

Supported backends:

  • “matrix_se3”: A flattened view matrix representation, containing the full information of special Euclidean transformations (translations and rotations). This representation is quickly converted to a view matrix, but differentiable ops may cause the view matrix to learn an incorrect, non-orthogonal transformation.

  • “matrix_6dof_rotation”: A compact representation with 6 degrees of freedom, ensuring the view matrix remains orthogonal under optimizations. The conversion to matrix requires a single Gram-Schmidt step.

Unless stated explicitly, the definition of the camera coordinate system used by this class is up to the choice of the user. Practitioners should be mindful of conventions when pairing the view matrix managed by this class with a projection matrix.
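
For illustration, a minimal construction sketch (batch size, dtype and device follow the constructor arguments; the parameter count K depends on the backend kaolin selects):

>>> import torch
>>> from kaolin.render.camera import CameraExtrinsics
>>> # a single camera at (0, 0, 3), looking at the world origin, with y up
>>> extrinsics = CameraExtrinsics.from_lookat(
...     eye=torch.tensor([0.0, 0.0, 3.0]),
...     at=torch.tensor([0.0, 0.0, 0.0]),
...     up=torch.tensor([0.0, 1.0, 0.0]))
>>> extrinsics.view_matrix().shape    # torch.Size([1, 4, 4])
>>> extrinsics.parameters().shape     # (num_cameras, K), backend specific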

DEFAULT_BACKEND = 'matrix_se3'
DEFAULT_DIFFERENTIABLE_BACKEND = 'matrix_6dof_rotation'
property R: Tensor

A tensor whose columns represent the directions of world-axes in camera coordinates, of shape \((\text{num_cameras}, 3, 3)\).

This is the R submatrix of the extrinsic matrix:

\[\begin{split}\begin{bmatrix} R & t \\ 0 & 1 \end{bmatrix}\end{split}\]

defined as:

\[\begin{split}R = \begin{bmatrix} r1 & r2 & r3 \\ u1 & u2 & u3 \\ f1 & f2 & f3 \end{bmatrix}\end{split}\]

with:

  • r: Right - world x axis, in camera coordinates, also the camera right axis, in world coordinates

  • u: Up - world y axis, in camera coordinates, also the camera up axis, in world coordinates

  • f: Forward - world z axis, in camera coordinates, also the camera forward axis, in world coordinates
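
For example, for a CameraExtrinsics instance extrinsics (such as the one constructed in the sketch above), R and t match the blocks of the view matrix:

>>> R = extrinsics.R                     # (num_cameras, 3, 3) rotation block
>>> t = extrinsics.t                     # (num_cameras, 3, 1) translation block
>>> view = extrinsics.view_matrix()      # (num_cameras, 4, 4)
>>> torch.allclose(view[:, :3, :3], R)   # R is the top-left 3x3 block
>>> torch.allclose(view[:, :3, 3:], t)   # t is the top of the last column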

classmethod available_backends()

Returns: (iterable of str):

list of available representation backends, to be used with switch_backend()

Return type

Iterable[str]

property backend_name: str

The unique name used to register the currently used representation backend.

Values available by default:

  • “matrix_se3”: A flattened view matrix representation, containing the full information of special Euclidean transformations (translations and rotations). This representation is quickly converted to a view matrix, but differentiable ops may cause the view matrix to learn an incorrect, non-orthogonal transformation.

  • “matrix_6dof_rotation”: A compact representation with 6 degrees of freedom, ensuring the view matrix remains orthogonal under optimizations. The conversion to matrix requires a single Gram-Schmidt step.

property basis_change_matrix

The transformation matrix (permutation + reflections) used to change the coordinate system of this camera from the default cartesian one to another.

This matrix is manipulated by: change_coordinate_system(), reset_coordinate_system()

cam_forward()

Returns the camera forward axis.

See: https://www.scratchapixel.com/lessons/mathematics-physics-for-computer-graphics/lookat-function/framing-lookat-function.html

Returns

the camera forward axis, in world coordinates.

Return type

(torch.Tensor)

cam_pos()

Returns: (torch.Tensor): the camera position, in world coordinates

Return type

Tensor

cam_right()

Returns: (torch.Tensor): the camera right axis, in world coordinates

Return type

Tensor

cam_up()

Returns: (torch.Tensor): the camera up axis, in world coordinates

Return type

Tensor
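
A short sketch querying the camera frame in world coordinates (extrinsics is any CameraExtrinsics instance):

>>> pos = extrinsics.cam_pos()          # camera center, in world coordinates
>>> forward = extrinsics.cam_forward()  # camera forward axis, in world coordinates
>>> right, up = extrinsics.cam_right(), extrinsics.cam_up()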

classmethod cat(cameras)

Concatenates multiple CameraExtrinsics objects.

Assumes all cameras use the same coordinate system (kaolin will not warn otherwise; the coordinate system of the first camera will be used).

Parameters

cameras (Sequence of CameraExtrinsics) – the cameras extrinsics to concatenate.

Returns

The concatenated cameras extrinsics as a single CameraExtrinsics

Return type

(CameraExtrinsics)
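
A minimal concatenation sketch, assuming both cameras were built with the same coordinate system:

>>> cam_a = CameraExtrinsics.from_lookat(eye=torch.tensor([0.0, 0.0, 3.0]),
...                                      at=torch.zeros(3),
...                                      up=torch.tensor([0.0, 1.0, 0.0]))
>>> cam_b = CameraExtrinsics.from_lookat(eye=torch.tensor([3.0, 0.0, 0.0]),
...                                      at=torch.zeros(3),
...                                      up=torch.tensor([0.0, 1.0, 0.0]))
>>> batched = CameraExtrinsics.cat([cam_a, cam_b])
>>> batched.view_matrix().shape          # torch.Size([2, 4, 4])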

change_coordinate_system(basis_change)

Applies a coordinate system change using the given 3x3 permutation & reflections matrix.

For instance:

  1. From a Y-up coordinate system (cartesian) to Z-up:

\[\begin{split}\text{basis_change} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & -1 \\ 0 & 1 & 0 \end{bmatrix}\end{split}\]
  2. From a right-handed coordinate system (Z pointing outwards) to a left-handed one (Z pointing inwards):

\[\begin{split}\text{basis_change} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & -1 \end{bmatrix}\end{split}\]

The basis_change is assumed to have a determinant of +1 or -1.

Parameters

basis_change (numpy.ndarray or torch.Tensor) – a composition of axes permutation and reflections, of shape \((3, 3)\)
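
For instance, switching to the Z-up system from example 1 above and back (a sketch; z_up_basis is just the matrix shown above):

>>> z_up_basis = torch.tensor([[1.0, 0.0,  0.0],
...                            [0.0, 0.0, -1.0],
...                            [0.0, 1.0,  0.0]])
>>> extrinsics.change_coordinate_system(z_up_basis)
>>> extrinsics.reset_coordinate_system()   # back to the kaolin default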

cpu()
Return type

CameraExtrinsics

cuda()
Return type

CameraExtrinsics

property device: device

the torch device of parameters tensor

double()
Return type

CameraExtrinsics

property dtype: dtype

the torch dtype of parameters tensor

float()
Return type

CameraExtrinsics

classmethod from_camera_pose(cam_pos, cam_dir, dtype=torch.float32, device=None, requires_grad=False, backend=None)

Constructs the extrinsics from the camera pose and orientation in world coordinates.

Parameters
  • cam_pos (numpy.ndarray or torch.Tensor) – the location of the camera center in world-coordinates, of shape \((3,)\), \((3, 1)\), \((\text{num_cameras}, 3)\) or \((\text{num_cameras}, 3, 1)\)

  • cam_dir (numpy.ndarray or torch.Tensor) – the camera’s orientation with respect to the world, of shape \((3, 3)\) or \((\text{num_cameras}, 3, 3)\)

  • dtype (optional, str) – the dtype used for the tensors managed by the CameraExtrinsics. If dtype is None, torch.get_default_dtype() will be used

  • device (optional, str) – the device on which the CameraExtrinsics object will manage its tensors. If device is None, the default torch device will be used

  • requires_grad (bool) – Sets the requires_grad field for the params tensor of the CameraExtrinsics

  • backend (str) – The backend used to manage the internal representation of the extrinsics, and how it is converted to a view matrix. Different representations are tuned to varied use cases: speed, differentiability w.r.t rigid transformations space, and so forth. Normally this should be left as None to let kaolin automatically select the optimal backend. Valid values: matrix_se3, matrix_6dof_rotation (see class description).

Returns

the camera extrinsics

Return type

(CameraExtrinsics)
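
A minimal sketch, using an identity orientation for simplicity:

>>> cam_pos = torch.tensor([0.0, 0.0, 3.0])   # camera center, in world coordinates
>>> cam_dir = torch.eye(3)                    # camera orientation w.r.t. the world
>>> extrinsics = CameraExtrinsics.from_camera_pose(cam_pos, cam_dir)
>>> extrinsics.view_matrix().shape            # torch.Size([1, 4, 4])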

classmethod from_lookat(eye, at, up, dtype=torch.float32, device=None, requires_grad=False, backend=None)

Constructs the extrinsics from the camera position, the camera up vector, and the point the camera is looking at.

This constructor is compatible with glm’s lookat function, which by default assumes a cartesian right-handed coordinate system (z axis positive direction points outwards from screen).

Parameters
  • eye (numpy.ndarray or torch.Tensor) – the location of the camera center in world-coordinates, of shape \((3,)\), \((3, 1)\), \((\text{num_cameras}, 3)\) or \((\text{num_cameras}, 3, 1)\)

  • up (numpy.ndarray or torch.Tensor) – the vector pointing up from the camera in world-coordinates, of shape \((3,)\), \((3, 1)\), \((\text{num_cameras}, 3)\) or \((\text{num_cameras}, 3, 1)\)

  • at (numpy.ndarray or torch.Tensor) – the direction the camera is looking at in world-coordinates, of shape \((3,)\), \((3, 1)\), \((\text{num_cameras}, 3)\) or \((\text{num_cameras}, 3, 1)\)

  • dtype (optional, str) – the dtype used for the tensors managed by the CameraExtrinsics. If dtype is None, the torch.get_default_dtype() will be used

  • device (optional, str) – the device on which the CameraExtrinsics object will manage its tensors. If device is None, the default torch device will be used

  • requires_grad (bool) – Sets the requires_grad field for the params tensor of the CameraExtrinsics

  • backend (str) – The backend used to manage the internal representation of the extrinsics, and how it is converted to a view matrix. Different representations are tuned to varied use cases: speed, differentiability w.r.t rigid transformations space, and so forth. Normally this should be left as None to let kaolin automatically select the optimal backend. Valid values: matrix_se3, matrix_6dof_rotation (see class description).

Returns

the camera extrinsics

Return type

(CameraExtrinsics)
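
A batched construction sketch (two cameras around the world origin; shapes follow the parameter list above):

>>> eyes = torch.tensor([[0.0, 0.0, 3.0],
...                      [3.0, 0.0, 0.0]])    # (num_cameras, 3)
>>> ups = torch.tensor([[0.0, 1.0, 0.0],
...                     [0.0, 1.0, 0.0]])
>>> extrinsics = CameraExtrinsics.from_lookat(eye=eyes, at=torch.zeros(2, 3), up=ups)
>>> extrinsics.view_matrix().shape            # torch.Size([2, 4, 4])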

classmethod from_view_matrix(view_matrix, dtype=torch.float32, device=None, requires_grad=False, backend=None)

Constructs the extrinsics from a given view matrix of shape \((\text{num_cameras}, 4, 4)\).

The matrix should be a column major view matrix, for converting vectors from world to camera coordinates (a.k.a: world2cam matrix):

\[\begin{split}\begin{bmatrix} r1 & r2 & r3 & tx \\ u1 & u2 & u3 & ty \\ f1 & f2 & f3 & tz \\ 0 & 0 & 0 & 1 \end{bmatrix}\end{split}\]

with:

  • r: Right - world x axis, in camera coordinates, also the camera right axis, in world coordinates

  • u: Up - world y axis, in camera coordinates, also the camera up axis, in world coordinates

  • f: Forward - world z axis, in camera coordinates, also the camera forward axis, in world coordinates

  • t: Position - the world origin in camera coordinates

If you’re using a different coordinate system, the axes may be permuted.

Parameters
  • view_matrix (numpy.ndarray or torch.Tensor) – view matrix, of shape \((\text{num_cameras}, 4, 4)\)

  • dtype (optional, str) – the dtype used for the tensors managed by the CameraExtrinsics. If dtype is None, the torch.get_default_dtype() will be used

  • device (optional, str) – the device on which the CameraExtrinsics object will manage its tensors. If device is None, the default torch device will be used

  • requires_grad (bool) – Sets the requires_grad field for the params tensor of the CameraExtrinsics

  • backend (str) – The backend used to manage the internal representation of the extrinsics, and how it is converted to a view matrix. Different representations are tuned to varied use cases: speed, differentiability w.r.t rigid transformations space, and so forth. Normally this should be left as None to let kaolin automatically select the optimal backend. Valid values: matrix_se3, matrix_6dof_rotation (see class description).

Returns

the camera extrinsics

Return type

(CameraExtrinsics)
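
A minimal round-trip sketch, using an identity view matrix (camera aligned with the world frame):

>>> view = torch.eye(4).unsqueeze(0)          # (1, 4, 4)
>>> extrinsics = CameraExtrinsics.from_view_matrix(view)
>>> torch.allclose(extrinsics.view_matrix(), view)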

gradient_mask(*args)

Creates a gradient mask, which allows backpropagation only through params designated as trainable.

This function does not consider the requires_grad field when creating this mask.

Note

The 3 camera axes are always masked as trainable together. This design choice ensures that these axes, as well as the view matrix, remain orthogonal.

Parameters

*args (Union[str, ExtrinsicsParamsDefEnum]) – A vararg list of the extrinsics params that should allow gradient flow. This function also supports conversion of params from their string names. (i.e: ‘t’ will convert to ExtrinsicsParamsDefEnum.t)

Return type

Tensor

Example

>>> # equivalent to:   mask = extrinsics.gradient_mask(ExtrinsicsParamsDefEnum.t)
>>> mask = extrinsics.gradient_mask('t')
>>> extrinsics.params.register_hook(lambda grad: grad * mask.float())
>>> # extrinsics will now allow gradient flow only for the camera location
Returns

the gradient mask, of the same shape as self.parameters()

Return type

(torch.BoolTensor)


half()
Return type

CameraExtrinsics

inv_transform_rays(ray_orig, ray_dir)

Transforms rays from camera space to world space (hence: “inverse transform”).

Applies the inverse of the camera extrinsics rigid transformation. The camera parameters are cast to the precision of the input rays.

Parameters
  • ray_orig (torch.Tensor) – the origins of rays, of shape \((\text{num_rays}, 3)\) or \((\text{num_cameras}, \text{num_rays}, 3)\)

  • ray_dir (torch.Tensor) – the directions of rays, of shape \((\text{num_rays}, 3)\) or \((\text{num_cameras}, \text{num_rays}, 3)\)

Returns

the transformed ray origins and directions, of the same shape as the inputs

Return type

(torch.Tensor, torch.Tensor)
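
A small sketch lifting camera-space rays to world space (the forward-direction convention used here is an assumption and depends on your chosen coordinate system):

>>> ray_orig = torch.zeros(4, 3)                     # 4 rays starting at the camera center
>>> ray_dir = torch.tensor([[0.0, 0.0, -1.0]] * 4)   # rays along -z in camera space
>>> world_orig, world_dir = extrinsics.inv_transform_rays(ray_orig, ray_dir)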

inv_view_matrix()

Returns the inverse of the view matrix used to convert vectors from camera to world coordinates (a.k.a: cam2world matrix). This matrix is column major:

\[\begin{split}\begin{bmatrix} r1 & u1 & f1 & px \\ r2 & u2 & f2 & py \\ r3 & u3 & f3 & pz \\ 0 & 0 & 0 & 1 \end{bmatrix}\end{split}\]

with:

  • r: Right - world x axis, in camera coordinates, also the camera right axis, in world coordinates

  • u: Up - world y axis, in camera coordinates, also the camera up axis, in world coordinates

  • f: Forward - world z axis, in camera coordinates, also the camera forward axis, in world coordinates

  • p: Position - the camera position, in world coordinates

If you’re using a different coordinate system, the axes may be permuted.

Returns

the inverse view matrix, of shape \((\text{num_cameras}, 4, 4)\)

Return type

(torch.Tensor)
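
A quick consistency check (a sketch; small numerical error is expected):

>>> view = extrinsics.view_matrix()
>>> inv_view = extrinsics.inv_view_matrix()
>>> identity = torch.eye(4).expand_as(view)
>>> torch.allclose(view @ inv_view, identity, atol=1e-5)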

move_forward(amount)

Translates the camera along the camera forward axis.

Parameters

amount (torch.Tensor or float) – Amount of translation, measured in world coordinates.

move_right(amount)

Translates the camera along the camera right axis.

Parameters

amount (torch.Tensor or float) – Amount of translation, measured in world coordinates

move_up(amount)

Translates the camera along the camera up axis.

Parameters

amount (torch.Tensor or float) – Amount of translation, measured in world coordinates.
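
A short sketch combining the three movement methods (amounts are arbitrary world units):

>>> extrinsics.move_forward(0.5)   # dolly half a unit along the camera forward axis
>>> extrinsics.move_right(-0.1)    # strafe slightly to the left
>>> extrinsics.move_up(1.0)        # raise the camera by one unit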

named_params()

Get a descriptive list of named parameters per camera.

Returns

The named parameters.

Return type

(list of dict)

parameters()

Returns: (torch.Tensor):

the extrinsics parameters buffer. This is essentially the underlying representation of the extrinsics, and is backend dependent.

Return type

Tensor

property requires_grad: bool

True if the current extrinsics object allows gradient flow.

Note

All extrinsics backends allow gradient flow, but some are not guaranteed to maintain a rigid transformation view matrix.

reset_coordinate_system()

Resets the coordinate system back to the default one used by kaolin (right-handed cartesian: x pointing right, y pointing up, z pointing outwards)

rotate(yaw=None, pitch=None, roll=None)

Executes an inplace rotation of the camera using the given yaw, pitch, and roll amounts.

Input can be a float or a torch.Tensor: a float applies the same rotation to all cameras, whereas a torch.Tensor allows a per-camera rotation. Rotation is applied in camera space.

Parameters
  • yaw (torch.Tensor or float) – Amount of rotation in radians around normal direction of right-up plane

  • pitch (torch.Tensor or float) – Amount of rotation in radians around normal direction of right-forward plane

  • roll (torch.Tensor or float) – Amount of rotation in radians around normal direction of up-forward plane
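
A short sketch (angles are in radians; a per-camera rotation would instead pass a tensor with one entry per camera):

>>> import math
>>> extrinsics.rotate(yaw=math.pi / 4)        # same 45 degree yaw for all cameras
>>> extrinsics.rotate(pitch=0.1, roll=-0.1)   # combined pitch and roll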

switch_backend(backend_name)

Switches the representation backend to a different implementation.

Note

Manually switching the representation backend hints to kaolin that automatic backend selection should be turned off. Users should normally use this manual feature only when testing a new type of representation. For most use cases, it is advised to let kaolin choose the representation backend automatically and avoid calling this function explicitly.

Warning

This function does not allow gradient flow, as it is error prone.

Parameters

backend_name (str) – the backend to switch to, must be a registered backend. Values supported by default: matrix_se3, matrix_6dof_rotation (see class description).
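
A short sketch listing the registered backends and switching explicitly (normally unnecessary, as noted above):

>>> list(CameraExtrinsics.available_backends())   # registered backend names
>>> extrinsics.switch_backend('matrix_6dof_rotation')
>>> extrinsics.backend_name                       # 'matrix_6dof_rotation'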

property t: Tensor

The position of world origin in camera coordinates, a torch.Tensor of shape \((\text{num_cameras}, 3, 1)\)

This is the t vector of the extrinsic matrix:

\[\begin{split}\begin{bmatrix} R & t \\ 0 & 1 \end{bmatrix}\end{split}\]

See also

cam_pos for the camera position in world coordinates.

to(*args, **kwargs)

Returns an instance of this object with the parameters tensor on the given device. If the specified device is the same as this object's, this object will be returned. Otherwise a new object with a copy of the parameters tensor on the requested device will be created.

See also

torch.Tensor.to()

Return type

CameraExtrinsics

transform(vectors)

Applies the rigid transformation of the camera extrinsics, transforming objects from world coordinates to camera space coordinates.

The camera coordinates are cast to the precision of the vectors argument.

Parameters

vectors (torch.Tensor) – the vectors, of shape \((\text{num_vectors}, 3)\) or \((\text{num_cameras}, \text{num_vectors}, 3)\)

Returns

the transformed vectors, of the same shape as vectors

Return type

(torch.Tensor)
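
A minimal sketch transforming world-space points into camera space:

>>> points = torch.rand(100, 3)                # points in world coordinates
>>> cam_points = extrinsics.transform(points)  # the same points, in camera coordinates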

translate(t)

Translates the camera in world coordinates. The camera orientation axes will not change.

Parameters

t (torch.Tensor) – Amount of translation in world space coordinates, of shape \((3,)\) or \((3, 1)\) broadcasting over all the cameras, or \((\text{num_cameras}, 3, 1)\) for applying unique translation per camera.

update(mat)

Updates extrinsics parameters to match the given view matrix.

Parameters

mat (torch.Tensor) – the new view matrix, of shape \((\text{num_cameras}, 4, 4)\)

view_matrix()

Returns a column major view matrix for converting vectors from world to camera coordinates (a.k.a: world2cam matrix):

\[\begin{split}\begin{bmatrix} r1 & r2 & r3 & tx \\ u1 & u2 & u3 & ty \\ f1 & f2 & f3 & tz \\ 0 & 0 & 0 & 1 \end{bmatrix}\end{split}\]

with:

  • r: Right - world x axis, in camera coordinates, also the camera right axis, in world coordinates

  • u: Up - world y axis, in camera coordinates, also the camera up axis, in world coordinates

  • f: Forward - world z axis, in camera coordinates, also the camera forward axis, in world coordinates

  • t: Position - the world origin in camera coordinates

If you’re using a different coordinate system, the axes may be permuted.

The matrix returned by this method supports differentiable pytorch operations.

Note

Practitioners are advised to choose a representation backend which supports differentiation of rigid transformations.

Note

In-place modifications of the returned tensor will also update the extrinsics parameters.

Returns

the view matrix, of shape \((\text{num_cameras}, 4, 4)\) (homogeneous coordinates)

Return type

(torch.Tensor)
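
A rough sketch of how the differentiable extrinsics can enter an optimization loop; world_points and target_cam_points are hypothetical placeholders for an actual objective, and the parameters tensor is assumed to be a leaf tensor that the optimizer can update when requires_grad=True:

>>> extrinsics = CameraExtrinsics.from_lookat(
...     eye=torch.tensor([0.0, 0.0, 3.0]), at=torch.zeros(3),
...     up=torch.tensor([0.0, 1.0, 0.0]), requires_grad=True)
>>> optimizer = torch.optim.Adam([extrinsics.parameters()], lr=1e-3)
>>> for _ in range(100):
...     optimizer.zero_grad()
...     cam_points = extrinsics.transform(world_points)        # world_points: your (N, 3) tensor
...     loss = ((cam_points - target_cam_points) ** 2).mean()  # target_cam_points: your target
...     loss.backward()
...     optimizer.step()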