kaolin.render.camera.CameraExtrinsics¶
API¶
- class kaolin.render.camera.CameraExtrinsics(backend, shared_fields=None)¶
Bases:
object
Holds the extrinsics parameters of a camera: position and orientation in space.
This class maintains the view matrix of the camera, used to transform points from world coordinates to camera / eye / view space coordinates.
The view matrix maintained by this class is column-major, and can be described by the 4x4 block matrix:
\[\begin{split}\begin{bmatrix} R & t \\ 0 & 1 \end{bmatrix}\end{split}\]
where R is a 3x3 rotation matrix and t is a 3x1 translation vector, for the orientation and position respectively.
This class is batched and may hold information from multiple cameras.
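As a rough illustration of the block matrix above, the following plain-Python sketch assembles a single 4x4 view matrix from R and t and applies it to one world-space point. The helper names `make_view_matrix` and `world_to_camera` and the sample values are assumptions made for this example; they are not part of kaolin, which operates on batched torch tensors.

```python
# Plain-Python sketch of the block matrix [R t; 0 1] acting on a world point.
# Helper names and sample values are illustrative, not kaolin's API.

def make_view_matrix(R, t):
    """Assemble a 4x4 view matrix from a 3x3 rotation R and a length-3 translation t."""
    return [R[i] + [t[i]] for i in range(3)] + [[0.0, 0.0, 0.0, 1.0]]

def world_to_camera(view, point):
    """Apply the view matrix to a 3D point using homogeneous coordinates."""
    homogeneous = list(point) + [1.0]
    out = [sum(view[i][j] * homogeneous[j] for j in range(4)) for i in range(4)]
    return out[:3]

# Identity orientation; the world origin sits 5 units along -z in camera space.
R = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
t = [0.0, 0.0, -5.0]
view = make_view_matrix(R, t)
print(world_to_camera(view, [1.0, 2.0, 3.0]))  # [1.0, 2.0, -2.0]
```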
CameraExtrinsics relies on a dynamic representation backend to manage the tradeoff between various choices such as speed, or support for differentiable rigid transformations. Parameters are stored as a single tensor of shape \((\text{num_cameras}, K)\), where K is a representation specific number of parameters. Transformations and matrices returned by this class support differentiable torch operations, which in turn may update the extrinsic parameters of the camera:
                              convert_to_mat
    Backend                       ---- >          Extrinsics
    Representation R                              View Matrix M
    Shape (num_cameras, K),                       Shape (num_cameras, 4, 4)
                                  < ----
                              convert_from_mat
Note
Unless specified manually with switch_backend(), kaolin will choose the optimal representation backend depending on the status of requires_grad.
Note
Users should be aware of, but not concerned about, the conversion from internal representations to view matrices. kaolin performs these conversions where and if needed.
Supported backends:
“matrix_se3”: A flattened view matrix representation, containing the full information of special Euclidean transformations (translations and rotations). This representation is quickly converted to a view matrix, but differentiable ops may cause the view matrix to learn an incorrect, non-orthogonal transformation.
“matrix_6dof_rotation”: A compact representation with 6 degrees of freedom, ensuring the view matrix remains orthogonal under optimizations. The conversion to matrix requires a single Gram-Schmidt step.
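To make the Gram-Schmidt remark concrete, here is a minimal plain-Python sketch of how two unconstrained 3-vectors (six numbers) can be turned into an orthonormal rotation. The function name `rotation_from_6d` and the choice to return the basis vectors as rows are assumptions for illustration; the exact parameter layout used by the kaolin backend is not stated here.

```python
import math

def normalize(v):
    """Scale a 3-vector to unit length."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def rotation_from_6d(a1, a2):
    """Build three orthonormal vectors from two arbitrary, non-parallel 3-vectors.

    Hypothetical layout: the six parameters are read as the vectors a1 and a2.
    """
    b1 = normalize(a1)
    proj = dot(b1, a2)
    # Single Gram-Schmidt step: remove the component of a2 along b1.
    b2 = normalize([x - proj * y for x, y in zip(a2, b1)])
    # The third axis follows from the first two, so it needs no parameters.
    b3 = cross(b1, b2)
    return [b1, b2, b3]

R = rotation_from_6d([2.0, 0.0, 0.0], [1.0, 3.0, 0.0])
print(R)  # [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
```

Because the output is orthonormal for any non-degenerate input, a gradient step on the six raw numbers cannot push the resulting matrix away from a valid rotation.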
Unless stated explicitly, the definition of the camera coordinate system used by this class is up to the choice of the user. Practitioners should be mindful of conventions when pairing the view matrix managed by this class with a projection matrix.
- Parameters
backend (ExtrinsicsRep) –
shared_fields (dict) –
- DEFAULT_BACKEND = 'matrix_se3'¶
- DEFAULT_DIFFERENTIABLE_BACKEND = 'matrix_6dof_rotation'¶
- property R: Tensor¶
A tensor whose columns represent the directions of world-axes in camera coordinates, of shape \((\text{num_cameras}, 3, 3)\).
This is the R submatrix of the extrinsic matrix:
\[\begin{split}\begin{bmatrix} R & t \\ 0 & 1 \end{bmatrix}\end{split}\]
defined as:
\[\begin{split}R = \begin{bmatrix} r1 & r2 & r3 \\ u1 & u2 & u3 \\ f1 & f2 & f3 \end{bmatrix}\end{split}\]
with:
r: Right - world x axis, in camera coordinates, also the camera right axis, in world coordinates
u: Up - world y axis, in camera coordinates, also the camera up axis, in world coordinates
f: Forward - world z axis, in camera coordinates, also the camera forward axis, in world coordinates
- classmethod available_backends()¶
Returns: (iterable of str): list of available representation backends, to be used with switch_backend()
- property backend_name: str¶
The unique name used to register the currently used representation backend.
Values available by default:
“matrix_se3”: A flattened view matrix representation, containing the full information of special Euclidean transformations (translations and rotations). This representation is quickly converted to a view matrix, but differentiable ops may cause the view matrix to learn an incorrect, non-orthogonal transformation.
“matrix_6dof_rotation”: A compact representation with 6 degrees of freedom, ensuring the view matrix remains orthogonal under optimizations. The conversion to matrix requires a single Gram-Schmidt step.
- property basis_change_matrix¶
The transformation matrix (permutation + reflections) used to change the coordinate system of this camera from the default cartesian one to another.
This matrix is manipulated by: change_coordinate_system(), reset_coordinate_system()
- cam_forward()¶
Returns the camera forward axis.
- Returns
the camera forward axis, in world coordinates.
- Return type
- cam_right()¶
Returns: (torch.Tensor): the camera right axis, in world coordinates
- Return type
- classmethod cat(cameras)¶
Concatenates multiple CameraExtrinsics objects.
Assumes all cameras use the same coordinate system. kaolin will not alert if they differ; the coordinate system of the first camera will be used.
- Parameters
cameras (Sequence of CameraExtrinsics) – the camera extrinsics to concatenate.
- Returns
The concatenated camera extrinsics as a single CameraExtrinsics
- Return type
CameraExtrinsics
- change_coordinate_system(basis_change)¶
Applies a coordinate system change using the given 3x3 permutation & reflections matrix.
For instance:
From a Y-up coordinate system (cartesian) to Z-up:
\[\begin{split}\text{basis_change} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & -1 \\ 0 & 1 & 0 \end{bmatrix}\end{split}\]
From a right handed coordinate system (Z pointing outwards) to a left handed one (Z pointing inwards):
\[\begin{split}\text{basis_change} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & -1 \end{bmatrix}\end{split}\]
The basis_change is assumed to have a determinant of +1 or -1.
See also
- Parameters
basis_change (numpy.ndarray or torch.Tensor) – a composition of axes permutation and reflections, of shape \((3, 3)\)
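The two example matrices can be checked by hand. The plain-Python sketch below applies each one to a vector and computes its determinant; `matvec` and `det3` are helper names invented for this illustration, and the sketch only shows how the matrices act on a vector, not how kaolin folds them into the view matrix.

```python
def matvec(M, v):
    """Multiply a 3x3 matrix (nested lists) by a length-3 vector."""
    return [sum(M[i][j] * v[j] for j in range(3)) for i in range(3)]

def det3(M):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = M
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

# The two basis_change matrices quoted in the text above.
y_up_to_z_up = [[1, 0, 0], [0, 0, -1], [0, 1, 0]]
flip_z = [[1, 0, 0], [0, 1, 0], [0, 0, -1]]

print(matvec(y_up_to_z_up, [0, 1, 0]))   # [0, 0, 1]: the old "up" axis becomes z
print(matvec(flip_z, [0, 0, 1]))         # [0, 0, -1]: z is mirrored
print(det3(y_up_to_z_up), det3(flip_z))  # 1 -1
```

A determinant of +1 means the change preserves handedness (a pure axis permutation with an even number of sign flips), while -1 means handedness is flipped.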
- cpu()¶
- Return type
- cuda()¶
- Return type
- double()¶
- Return type
- float()¶
- Return type
- classmethod from_camera_pose(cam_pos, cam_dir, dtype=torch.float32, device=None, requires_grad=False, backend=None)¶
Constructs the extrinsics from the camera pose and orientation in world coordinates.
- Parameters
cam_pos (numpy.ndarray or torch.Tensor) – the location of the camera center in world-coordinates, of shape \((3,)\), \((3, 1)\), \((\text{num_cameras}, 3)\) or \((\text{num_cameras}, 3, 1)\)
cam_dir (numpy.ndarray or torch.Tensor) – the camera’s orientation with respect to the world, of shape \((3, 3)\) or \((\text{num_cameras}, 3, 3)\)
dtype (optional, str) – the dtype used for the tensors managed by the CameraExtrinsics. If dtype is None, torch.get_default_dtype() will be used
device (optional, str) – the device on which the CameraExtrinsics object will manage its tensors. If device is None, the default torch device will be used
requires_grad (bool) – Sets the requires_grad field for the params tensor of the CameraExtrinsics
backend (str) – The backend used to manage the internal representation of the extrinsics, and how it is converted to a view matrix. Different representations are tuned to varied use cases: speed, differentiability w.r.t rigid transformations space, and so forth. Normally this should be left as None to let kaolin automatically select the optimal backend. Valid values: matrix_se3, matrix_6dof_rotation (see class description).
- Returns
the camera extrinsics
- Return type
CameraExtrinsics
- classmethod from_lookat(eye, at, up, dtype=torch.float32, device=None, requires_grad=False, backend=None)¶
Constructs the extrinsics from the camera position, the camera up vector, and the destination the camera is looking at.
This constructor is compatible with glm’s lookat function, which by default assumes a cartesian right-handed coordinate system (z axis positive direction points outwards from screen).
- Parameters
eye (numpy.ndarray or torch.Tensor) – the location of the camera center in world-coordinates, of shape \((3,)\), \((3, 1)\), \((\text{num_cameras}, 3)\) or \((\text{num_cameras}, 3, 1)\)
up (numpy.ndarray or torch.Tensor) – the vector pointing up from the camera in world-coordinates, of shape \((3,)\), \((3, 1)\), \((\text{num_cameras}, 3)\) or \((\text{num_cameras}, 3, 1)\)
at (numpy.ndarray or torch.Tensor) – the point the camera is looking at, in world-coordinates, of shape \((3,)\), \((3, 1)\), \((\text{num_cameras}, 3)\) or \((\text{num_cameras}, 3, 1)\)
dtype (optional, str) – the dtype used for the tensors managed by the CameraExtrinsics. If dtype is None, torch.get_default_dtype() will be used
device (optional, str) – the device on which the CameraExtrinsics object will manage its tensors. If device is None, the default torch device will be used
requires_grad (bool) – Sets the requires_grad field for the params tensor of the CameraExtrinsics
backend (str) – The backend used to manage the internal representation of the extrinsics, and how it is converted to a view matrix. Different representations are tuned to varied use cases: speed, differentiability w.r.t rigid transformations space, and so forth. Normally this should be left as None to let kaolin automatically select the optimal backend. Valid values: matrix_se3, matrix_6dof_rotation (see class description).
- Returns
the camera extrinsics
- Return type
CameraExtrinsics
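For readers unfamiliar with the glm convention mentioned above, the following plain-Python sketch builds a right-handed look-at view matrix for a single camera. It follows the standard glm-style construction and is only an approximation of what from_lookat computes; the helper names are invented for this example, and batching, dtype and device handling are omitted.

```python
import math

def sub(a, b):
    return [x - y for x, y in zip(a, b)]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def normalize(v):
    n = math.sqrt(dot(v, v))
    return [x / n for x in v]

def look_at(eye, at, up):
    """Return a 4x4 world-to-camera matrix (nested lists), right-handed glm style."""
    f = normalize(sub(at, eye))   # viewing direction, from the eye towards the target
    s = normalize(cross(f, up))   # camera right axis
    u = cross(s, f)               # camera up axis, re-orthogonalized
    return [
        [s[0], s[1], s[2], -dot(s, eye)],
        [u[0], u[1], u[2], -dot(u, eye)],
        [-f[0], -f[1], -f[2], dot(f, eye)],  # the camera looks down its own -z axis
        [0.0, 0.0, 0.0, 1.0],
    ]

# Camera 5 units up the world z axis, looking at the origin, with y as up.
view = look_at(eye=[0.0, 0.0, 5.0], at=[0.0, 0.0, 0.0], up=[0.0, 1.0, 0.0])
for row in view:
    print(row)
```

With these inputs the last column of the third row is -5, meaning the world origin lies 5 units in front of the camera along its negative z axis.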
- classmethod from_view_matrix(view_matrix, dtype=torch.float32, device=None, requires_grad=False, backend=None)¶
Constructs the extrinsics from a given view matrix of shape \((\text{num_cameras}, 4, 4)\).
The matrix should be a column major view matrix, for converting vectors from world to camera coordinates (a.k.a: world2cam matrix):
\[\begin{split}\begin{bmatrix} r1 & r2 & r3 & tx \\ u1 & u2 & u3 & ty \\ f1 & f2 & f3 & tz \\ 0 & 0 & 0 & 1 \end{bmatrix}\end{split}\]with:
r: Right - world x axis, in camera coordinates, also the camera right axis, in world coordinates
u: Up - world y axis, in camera coordinates, also the camera up axis, in world coordinates
f: Forward - world z axis, in camera coordinates, also the camera forward axis, in world coordinates
t: Position - the world origin in camera coordinates
If you’re using a different coordinate system, the axes may be permuted.
See also
- Parameters
view_matrix (numpy.ndarray or torch.Tensor) – view matrix, of shape \((\text{num_cameras}, 4, 4)\)
dtype (optional, str) – the dtype used for the tensors managed by the CameraExtrinsics. If dtype is None, torch.get_default_dtype() will be used
device (optional, str) – the device on which the CameraExtrinsics object will manage its tensors. If device is None, the default torch device will be used
requires_grad (bool) – Sets the requires_grad field for the params tensor of the CameraExtrinsics
backend (str) – The backend used to manage the internal representation of the extrinsics, and how it is converted to a view matrix. Different representations are tuned to varied use cases: speed, differentiability w.r.t rigid transformations space, and so forth. Normally this should be left as None to let kaolin automatically select the optimal backend. Valid values: matrix_se3, matrix_6dof_rotation (see class description).
- Returns
the camera extrinsics
- Return type
CameraExtrinsics
- gradient_mask(*args)¶
Creates a gradient mask, which allows backpropagation only through params designated as trainable.
This function does not consider the requires_grad field when creating this mask.
Note
The 3 camera axes are always masked as trainable together. This design choice ensures that these axes, as well as the view matrix, remain orthogonal.
- Parameters
*args (Union[str, ExtrinsicsParamsDefEnum]) – A vararg list of the extrinsics params that should allow gradient flow. This function also supports conversion of params from their string names (e.g., ‘t’ will convert to ExtrinsicsParamsDefEnum.t).
- Return type
Example
>>> # equivalent to: mask = extrinsics.gradient_mask(ExtrinsicsParamsDefEnum.t)
>>> mask = extrinsics.gradient_mask('t')
>>> extrinsics.params.register_hook(lambda grad: grad * mask.float())
>>> # extrinsics will now allow gradient flow only for the camera location
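The effect of multiplying a gradient by such a mask can be sketched without torch. The example below assumes, purely for illustration, that the parameters are a row-major flattened 4x4 view matrix so that the translation sits at flat indices 3, 7 and 11; the real layout depends on the backend and is not specified here, and `translation_mask` and `apply_mask` are hypothetical helpers.

```python
# Hypothetical layout: 16 parameters = a row-major flattened 4x4 view matrix,
# so the translation column (tx, ty, tz) sits at flat indices 3, 7 and 11.
TRANSLATION_INDICES = (3, 7, 11)

def translation_mask(num_params=16):
    """Boolean mask that is True only where gradients should flow."""
    return [i in TRANSLATION_INDICES for i in range(num_params)]

def apply_mask(grad, mask):
    """Mimic `grad * mask.float()`: zero the gradient of every masked-out entry."""
    return [g * float(m) for g, m in zip(grad, mask)]

grad = [0.5] * 16                        # stand-in for a gradient from backprop
masked = apply_mask(grad, translation_mask())
print([i for i, g in enumerate(masked) if g != 0.0])  # [3, 7, 11]
```

Entries outside the mask receive a zero gradient, so an optimizer step leaves them unchanged while the translation is still updated.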
- half()¶
- Return type
- inv_transform_rays(ray_orig, ray_dir)¶
Transforms rays from camera space to world space (hence: “inverse transform”).
Applies the inverse of the rigid transformation of the camera extrinsics. The camera coordinates are cast to the precision of the ray arguments.
- Parameters
ray_orig (torch.Tensor) – the origins of rays, of shape \((\text{num_rays}, 3)\) or \((\text{num_cameras}, \text{num_rays}, 3)\)
ray_dir (torch.Tensor) – the directions of rays, of shape \((\text{num_rays}, 3)\) or \((\text{num_cameras}, \text{num_rays}, 3)\)
- Returns
the transformed ray origins and directions, of the same shape as the inputs
- Return type
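As a sketch of the underlying math, assuming the view matrix has the form [R t; 0 1] described for this class: a ray origin is a point, so it is affected by both rotation and translation, while a ray direction is only rotated. The function name `inv_transform_ray` and the single-ray, plain-list interface are assumptions made for this example and do not mirror kaolin's batched tensor interface.

```python
def transpose(M):
    return [list(col) for col in zip(*M)]

def matvec(M, v):
    return [sum(M[i][j] * v[j] for j in range(3)) for i in range(3)]

def inv_transform_ray(R, t, ray_orig, ray_dir):
    """Map one ray from camera space to world space, given the view matrix parts R and t."""
    Rt = transpose(R)  # for a rotation matrix, the transpose is the inverse
    # Origins are points: undo the translation first, then undo the rotation.
    orig_world = matvec(Rt, [o - ti for o, ti in zip(ray_orig, t)])
    # Directions are free vectors: only the rotation applies.
    dir_world = matvec(Rt, ray_dir)
    return orig_world, dir_world

# Example view: a 90 degree rotation about z, with the world origin at z = -5 in camera space.
R = [[0.0, 1.0, 0.0], [-1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
t = [0.0, 0.0, -5.0]
orig_world, dir_world = inv_transform_ray(R, t, [0.0, 0.0, 0.0], [1.0, 0.0, 0.0])
print(orig_world, dir_world)
```

A ray starting at the camera-space origin maps to the camera position in world space, which is (0, 0, 5) for these values.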
- inv_view_matrix()¶
Returns the inverse of the view matrix, used to convert vectors from camera to world coordinates (a.k.a: cam2world matrix). This matrix is column major:
\[\begin{split}\begin{bmatrix} r1 & u1 & f1 & px \\ r2 & u2 & f2 & py \\ r3 & u3 & f3 & pz \\ 0 & 0 & 0 & 1 \end{bmatrix}\end{split}\]with:
r: Right - world x axis, in camera coordinates, also the camera right axis, in world coordinates
u: Up - world y axis, in camera coordinates, also the camera up axis, in world coordinates
f: Forward - world z axis, in camera coordinates, also the camera forward axis, in world coordinates
p: Position - the camera position in world coordinates
If you’re using a different coordinate system, the axes may be permuted.
See also
- Returns
the inverse view matrix, of shape \((\text{num_cameras}, 4, 4)\)
- Return type
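For a rigid view matrix [R t; 0 1], the inverse has the closed form [R^T, -R^T t; 0 1], so no general matrix inversion is needed. The plain-Python sketch below illustrates that identity for a single matrix; `invert_rigid` and `matmul4` are names made up for this example and do not describe kaolin's implementation.

```python
def invert_rigid(view):
    """Invert a 4x4 rigid transform [R t; 0 1] using R^T and -R^T t."""
    R = [row[:3] for row in view[:3]]
    t = [row[3] for row in view[:3]]
    Rt = [list(col) for col in zip(*R)]  # transpose of the rotation block
    # Last column of the inverse: the camera position in world coordinates.
    p = [-sum(Rt[i][j] * t[j] for j in range(3)) for i in range(3)]
    return [Rt[i] + [p[i]] for i in range(3)] + [[0.0, 0.0, 0.0, 1.0]]

def matmul4(A, B):
    """Multiply two 4x4 matrices given as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

# Example world2cam matrix: a 90 degree rotation about z plus a translation.
view = [
    [0.0, 1.0, 0.0, 2.0],
    [-1.0, 0.0, 0.0, 3.0],
    [0.0, 0.0, 1.0, -5.0],
    [0.0, 0.0, 0.0, 1.0],
]
cam2world = invert_rigid(view)
product = matmul4(view, cam2world)  # should be the 4x4 identity
print([row[3] for row in cam2world[:3]])  # camera position in world coordinates
```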
- move_forward(amount)¶
Translates the camera along the camera forward axis.
- Parameters
amount (torch.Tensor or float) – Amount of translation, measured in world coordinates.
- move_right(amount)¶
Translates the camera along the camera right axis.
- Parameters
amount (torch.Tensor or float) – Amount of translation, measured in world coordinates
- move_up(amount)¶
Translates the camera along the camera up axis.
- Parameters
amount (torch.Tensor or float) – Amount of translation, measured in world coordinates.
- named_params()¶
Get a descriptive list of named parameters per camera.
- Returns
The named parameters.
- Return type
(list of dict)
- parameters()¶
Returns: (torch.Tensor): the extrinsics parameters buffer. This is essentially the underlying representation of the extrinsics, and is backend dependent.
- Return type
- property requires_grad: bool¶
True if the current extrinsics object allows gradient flow.
Note
All extrinsics backends allow gradient flow, but some are not guaranteed to maintain a rigid transformation view matrix.
- reset_coordinate_system()¶
Resets the coordinate system back to the default one used by kaolin (right-handed cartesian: x pointing right, y pointing up, z pointing outwards)
- rotate(yaw=None, pitch=None, roll=None)¶
Executes an in-place rotation of the camera using the given yaw, pitch, and roll amounts.
Inputs can be floats or tensors. A float applies the same rotation to all cameras, whereas a torch.Tensor allows applying a per-camera rotation. Rotation is applied in camera space.
- Parameters
yaw (torch.Tensor or float) – Amount of rotation in radians around normal direction of right-up plane
pitch (torch.Tensor or float) – Amount of rotation in radians around normal direction of right-forward plane
roll (torch.Tensor or float) – Amount of rotation in radians around normal direction of up-forward plane
- switch_backend(backend_name)¶
Switches the representation backend to a different implementation.
Note
Manually switching the representation backend signals to kaolin that it should turn off automatic backend selection. Users should normally use this manual feature only if they’re testing a new type of representation. For most use cases, it is advised to let kaolin choose the representation backend automatically, and avoid using this function explicitly.
Warning
This function does not allow gradient flow, as it is error prone.
- Parameters
backend_name (str) – the backend to switch to, must be a registered backend. Values supported by default: matrix_se3, matrix_6dof_rotation (see class description).
- property t: Tensor¶
The position of world origin in camera coordinates, a torch.Tensor of shape \((\text{num_cameras}, 3, 1)\)
This is the t vector of the extrinsic matrix:
\[\begin{split}\begin{bmatrix} R & t \\ 0 & 1 \end{bmatrix}\end{split}\]
See also
cam_pos for the camera position in world coordinates.
- to(*args, **kwargs)¶
An instance of this object with the parameters tensor on the given device. If the specified device is the same as this object, this object will be returned. Otherwise a new object with a copy of the parameters tensor on the requested device will be created.
See also
torch.Tensor.to()
- Return type
- transform(vectors)¶
Applies the rigid transformation of the camera extrinsics, such that objects in world coordinates are transformed to camera space coordinates.
The camera coordinates are cast to the precision of the vectors argument.
- Parameters
vectors (torch.Tensor) – the vectors, of shape \((\text{num_vectors}, 3)\) or \((\text{num_cameras}, \text{num_vectors}, 3)\)
- Returns
the transformed vectors, of the same shape as vectors
- Return type
- translate(t)¶
Translates the camera in world coordinates. The camera orientation axes will not change.
- Parameters
t (torch.Tensor) – Amount of translation in world space coordinates, of shape \((3,)\) or \((3, 1)\) broadcasting over all the cameras, or \((\text{num_cameras}, 3, 1)\) for applying unique translation per camera.
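One way to see why the orientation axes stay fixed: under the [R t; 0 1] convention the camera position in world space is p = -R^T t, so moving the camera by a world-space offset d gives a new translation t' = -R (p + d) = t - R d while R is untouched. The sketch below works through that arithmetic in plain Python; it is derived from the matrix convention alone and is not taken from kaolin's source. `translate_camera` and `camera_position` are hypothetical helpers.

```python
def matvec(M, v):
    return [sum(M[i][j] * v[j] for j in range(3)) for i in range(3)]

def camera_position(R, t):
    """Camera center in world coordinates: p = -R^T t."""
    Rt = [list(col) for col in zip(*R)]
    return [-x for x in matvec(Rt, t)]

def translate_camera(R, t, d):
    """Return the new view-matrix translation after moving the camera by d in world space."""
    Rd = matvec(R, d)
    return [ti - rdi for ti, rdi in zip(t, Rd)]

R = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
t = [0.0, 0.0, -5.0]                              # camera sits at (0, 0, 5) in the world
new_t = translate_camera(R, t, [1.0, 0.0, 0.0])   # move one unit along world x
print(new_t, camera_position(R, new_t))
```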
- update(mat)¶
Updates extrinsics parameters to match the given view matrix.
- Parameters
mat (torch.Tensor) – the new view matrix, of shape \((\text{num_cameras}, 4, 4)\)
- view_matrix()¶
Returns a column major view matrix for converting vectors from world to camera coordinates (a.k.a: world2cam matrix):
\[\begin{split}\begin{bmatrix} r1 & r2 & r3 & tx \\ u1 & u2 & u3 & ty \\ f1 & f2 & f3 & tz \\ 0 & 0 & 0 & 1 \end{bmatrix}\end{split}\]with:
r: Right - world x axis, in camera coordinates, also the camera right axis, in world coordinates
u: Up - world y axis, in camera coordinates, also the camera up axis, in world coordinates
f: Forward - world z axis, in camera coordinates, also the camera forward axis, in world coordinates
t: Position - the world origin in camera coordinates
If you’re using a different coordinate system, the axes may be permuted.
See also
The matrix returned by this class supports PyTorch differentiable operations.
Note
Practitioners are advised to choose a representation backend which supports differentiation of rigid transformations.
Note
Changes modifying the returned tensor will also update the extrinsics parameters.
- Returns
the view matrix, of shape \((\text{num_cameras}, 4, 4)\) (homogeneous coordinates)
- Return type