kaolin.render.camera.PinholeIntrinsics

API

class kaolin.render.camera.PinholeIntrinsics(width, height, params, near=0.01, far=100.0)

Bases: CameraIntrinsics

Holds the intrinsics parameters of a pinhole camera: how it should project from camera space to normalized screen / clip space. The intrinsics parameters are used to define the lens attributes of the perspective projection matrix.

The pinhole camera explicitly exposes the projection transformation matrix. This is typically useful for rasterization-based rendering pipelines (e.g. OpenGL). See the documentation of CameraIntrinsics for the numerous ways this class can be used.

Kaolin assumes a left-handed NDC coordinate system: after applying the projection matrix, depth increases into the screen.

The complete perspective matrix can be described by the following factorization:

\[\begin{split}\text{FullProjectionMatrix} &= \text{Ortho} \times \text{Depth Scale} \times \text{Perspective} \\
&= \begin{bmatrix} 2/(r-l) & 0 & 0 & tx \\ 0 & 2/(t-b) & 0 & ty \\ 0 & 0 & -2/(f-n) & tz \\ 0 & 0 & 0 & 1 \end{bmatrix} \times \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & B & A \\ 0 & 0 & 0 & -1 \end{bmatrix} \times \begin{bmatrix} \text{focal_x} & 0 & -x0 & 0 \\ 0 & \text{focal_y} & -y0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \end{bmatrix} \\
&= \begin{bmatrix} 2*\text{focal_x}/(r-l) & 0 & -2x0/(r-l) - tx & 0 \\ 0 & 2*\text{focal_y}/(t-b) & -2y0/(t-b) - ty & 0 \\ 0 & 0 & V & U \\ 0 & 0 & -1 & 0 \end{bmatrix}\end{split}\]

where:

  • focal_x, focal_y, x0, y0: the intrinsic parameters of the camera. The focal length, together with the image plane width / height, determines the field of view (fov); this is the effective lens zoom of the scene. The principal point offsets x0, y0 add another DoF which translates the origin of the image plane. By default, kaolin assumes the NDC origin is at the canvas center (see projection_matrix()).

  • n, f: are the near and far clipping planes, which define the min / max depth of the view frustum. The near and far planes are also used to normalize the depth values to normalized device coordinates (see ndc_matrix() documentation).

  • r, l, t, b: are the right, left, top and bottom borders of the view frustum, and are defined by the perspective fov (derived from the focal length) and image plane dimensions.

  • tx, ty, tz: are defined as:

    \(tx = -(r + l) / (r - l)\)

    \(ty = -(t + b) / (t - b)\)

    \(tz = -(f + n) / (f - n)\)

  • U, V: are elements which define the NDC range, see ndc_matrix() for an elaboration on how these are defined.

  • A, B: can be reverse engineered from U, V and are uniquely defined by them (and in fact serve a similar function).

This matrix sometimes appears in the literature in a slightly simplified form, for example, when the principal point offsets are x0 = 0, y0 = 0 and the NDC coordinates are defined in the range \([-1, 1]\):

\[\begin{split}\begin{bmatrix} 2*\text{focal_x}/(r-l) & 0 & -tx & 0 \\ 0 & 2*\text{focal_y}/(t - b) & -ty & 0 \\ 0 & 0 & V & U \\ 0 & 0 & -1 & 0 \end{bmatrix}\end{split}\]

A vector multiplied by this matrix lands in homogeneous clip space, and requires division by the 4th coordinate (w) to obtain the final NDC coordinates.

Since the choice of NDC space is application dependent, kaolin keeps the Perspective matrix, which depends only on the choice of intrinsic parameters, separate from the Depth Scale and Ortho matrices, which are squashed together to define the view frustum and NDC range.

See also

perspective_matrix() and ndc_matrix() functions.

This class is batched and may hold information from multiple cameras. Parameters are stored as a single tensor of shape \((\text{num_cameras}, 4)\).

The matrix returned by this class supports differentiable torch operations, which in turn may update the intrinsic parameters of the camera.
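
For example, a minimal construction sketch (assuming kaolin and torch are installed and the import path matches this page's title; resolution and fov values are illustrative):

import math
from kaolin.render.camera import PinholeIntrinsics

# Single 800x600 pinhole camera with a 45-degree vertical fov (from_fov expects radians).
intrinsics = PinholeIntrinsics.from_fov(width=800, height=600, fov=math.radians(45.0))

proj = intrinsics.projection_matrix()   # shape (num_cameras, 4, 4), OpenGL-style by default
print(proj.shape)                       # torch.Size([1, 4, 4])
print(intrinsics.fov())                 # vertical fov, returned in degrees by default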

Parameters
  • width (int) – width of the camera resolution

  • height (int) – height of the camera resolution

  • params (torch.Tensor) – the intrinsics parameters, stored as a single tensor of shape \((\text{num_cameras}, 4)\)

  • near (optional, float) – near clipping plane, defines the min depth of the view frustum and is used to normalize the depth values. Default: 1e-2

  • far (optional, float) – far clipping plane, defines the max depth of the view frustum and is used to normalize the depth values. Default: 1e2
DEFAULT_FAR = 100.0
DEFAULT_NEAR = 0.01
property cx: FloatTensor

The principal point X coordinate. Note: by default, the principal point is at the canvas center (kaolin defines the NDC origin at the canvas center).

property cy: FloatTensor

The principal point Y coordinate. Note: by default, the principal point is at the canvas center (kaolin defines the NDC origin at the canvas center).

property focal_x: FloatTensor
property focal_y: FloatTensor
fov(camera_fov_direction=CameraFOV.VERTICAL, in_degrees=True)

The field-of-view

Parameters
  • camera_fov_direction (CameraFOV) – the leading direction of the fov. Default: vertical

  • in_degrees (bool) – if True return result in degrees, else in radians. Default: True

Returns

the field-of-view, of shape \((\text{num_cameras},)\)

Return type

(torch.Tensor)

property fov_x

The field-of-view along the horizontal leading direction

property fov_y

The field-of-view along the vertical leading direction

classmethod from_focal(width, height, focal_x, focal_y=None, x0=None, y0=None, near=0.01, far=100.0, num_cameras=1, device=None, dtype=torch.float32)

Constructs a new instance of PinholeIntrinsics from focal length

Parameters
  • width (int) – width of the camera resolution

  • height (int) – height of the camera resolution

  • focal_x (float) – focal length on x-axis

  • focal_y (optional, float) – focal length on y-axis. Default: same as focal_x

  • x0 (optional, float) – horizontal offset from origin of the image plane (by default the center). Default: 0.

  • y0 (optional, float) – vertical offset from origin of the image plane (by default the center). Default: 0.

  • near (optional, float) – near clipping plane, defines the min depth of the view frustum and is used to normalize the depth values. Default: 1e-2

  • far (optional, float) – far clipping plane, defines the max depth of the view frustum and is used to normalize the depth values. Default: 1e2

  • num_cameras (optional, int) – the number of cameras in this object. Default: 1

  • device (optional, str) – the device on which parameters will be allocated. Default: cpu

  • dtype (optional, str) – the dtype with which parameters will be allocated. Default: torch.float

Returns

the constructed pinhole camera intrinsics

Return type

(PinholeIntrinsics)
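
For example, a sketch of construction from known focal lengths (illustrative values):

from kaolin.render.camera import PinholeIntrinsics

# Intrinsics from focal lengths expressed in pixels; focal_y defaults to focal_x.
intrinsics = PinholeIntrinsics.from_focal(width=640, height=480, focal_x=500.0,
                                          near=0.1, far=50.0)
print(intrinsics.focal_x, intrinsics.focal_y)   # per-camera focal length tensors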

classmethod from_fov(width, height, fov, fov_direction=CameraFOV.VERTICAL, x0=0.0, y0=0.0, near=0.01, far=100.0, num_cameras=1, device=None, dtype=torch.float32)

Constructs a new instance of PinholeIntrinsics from field of view

Parameters
  • width (int) – width of the camera resolution

  • height (int) – height of the camera resolution

  • fov (float) – the field of view, in radians

  • fov_direction (optional, CameraFOV) – the leading direction of the field-of-view. Default: vertical

  • x0 (optional, float) – horizontal offset from origin of the image plane (by default the center). Default: 0.

  • y0 (optional, float) – vertical offset from origin of the image plane (by default the center). Default: 0.

  • near (optional, float) – near clipping plane, defines the min depth of the view frustum and is used to normalize the depth values. Default: 1e-2

  • far (optional, float) – far clipping plane, defines the max depth of the view frustum and is used to normalize the depth values. Default: 1e2

  • num_cameras (optional, int) – the number of cameras in this object. Default: 1

  • device (optional, str) – the device on which parameters will be allocated. Default: cpu

  • dtype (optional, str) – the dtype with which parameters will be allocated. Default: torch.float

Returns

the constructed pinhole camera intrinsics

Return type

(PinholeIntrinsics)
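
For example, a batched sketch with a horizontal fov (this assumes CameraFOV is importable from kaolin.render.camera and exposes a HORIZONTAL member, matching the fov_x property above; values are illustrative):

import math
from kaolin.render.camera import PinholeIntrinsics, CameraFOV

# A batch of 4 cameras sharing a 60-degree horizontal field of view.
intrinsics = PinholeIntrinsics.from_fov(
    width=1024, height=768,
    fov=math.radians(60.0),                 # radians, per the signature above
    fov_direction=CameraFOV.HORIZONTAL,     # assumed member of CameraFOV
    num_cameras=4,
)
print(intrinsics.fov_x)                     # horizontal fov, one value per camera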

property height: int
property lens_type: str
ndc_matrix(left, right, bottom, top, near, far)

Constructs a matrix which performs the required transformation to project the scene onto the view frustum. (that is: it normalizes a cuboid-shaped view-frustum to clip coordinates, which are SCALED normalized device coordinates).

When used in conjunction with a perspective_matrix(), a transformation from camera view space to clip space can be obtained.

See also

projection_matrix() which combines both operations.

Note

This matrix actually converts coordinates to clip space, and requires an extra division by the w coordinates to obtain the NDC coordinates. However, it is named ndc_matrix as the elements are chosen carefully according to the definitions of the NDC space.

Vectors transformed by this matrix will reside in the kaolin clip space, which is left handed (depth increases in the direction going into the screen):

Y      Z
^    /
|  /
|---------> X

The final NDC coordinates can be obtained by dividing each vector by its w coordinate (perspective division).

!! NDC matrices depend on the choice of NDC space, and should therefore be chosen accordingly !! The ndc matrix is a composition of 2 matrices which define the view frustum:

\[\begin{split}\text{ndc} &= \text{Ortho} \times \text{Depth Scale} \\
&= \begin{bmatrix} 2/(r-l) & 0 & 0 & tx \\ 0 & 2/(t-b) & 0 & ty \\ 0 & 0 & -2/(\text{far}-\text{near}) & tz \\ 0 & 0 & 0 & 1 \end{bmatrix} \times \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & B & A \\ 0 & 0 & 0 & -1 \end{bmatrix} \\
&= \begin{bmatrix} 2/(r-l) & 0 & 0 & -tx \\ 0 & 2/(t-b) & 0 & -ty \\ 0 & 0 & U & V \\ 0 & 0 & 0 & -1 \end{bmatrix}\end{split}\]
  • n, f: are the near and far clipping planes, which define the min / max depth of the view frustum. The near and far planes are also used to normalize the depth values to normalized device coordinates.

  • r, l, t, b: are the right, left, top and bottom borders of the view frustum, and are defined by the perspective fov (derived from the focal length) and image plane dimensions.

  • tx, ty, tz: are defined as:

    \(tx = -(r + l) / (r - l)\)

    \(ty = -(t + b) / (t - b)\)

    \(tz = -(f + n) / (f - n)\)

  • U, V: are elements which define the NDC range.

  • A, B: can be reverse engineered from U, V and are uniquely defined by them (and in fact serve a similar function).

Input values are determined by the screen dimensions and intrinsic coordinate conventions, for example:

  1. \((\text{left}=0, \text{right}=\text{width}, \text{bottom}=\text{height}, \text{top}=0)\) for origin at top-left of the screen, y axis pointing downwards.

  2. \((\text{left}=-\dfrac{\text{width}}{2}, \text{right}=\dfrac{\text{width}}{2}, \text{bottom}=-\dfrac{\text{height}}{2}, \text{top}=\dfrac{\text{height}}{2})\) for origin at center of the screen, and y axis pointing upwards.

Parameters
  • left (float) – location of the left face of the view-frustum.

  • right (float) – location of the right face of the view-frustum.

  • bottom (float) – location of the bottom face of the view-frustum.

  • top (float) – location of the top face of the view-frustum.

  • near (float) – location of the near face of the view-frustum. Should always be larger than zero and smaller than the far clipping plane. If used in conjunction with a perspective matrix, the near clipping plane should be identical for both.

  • far (float) – location of the far face of the view-frustum. Should always be larger than the near clipping plane. If used in conjunction with a perspective matrix, the far clipping plane should be identical for both.

Returns

the ndc matrix, of shape \((1, 4, 4)\).

Return type

(torch.Tensor)
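
For example, a sketch composing ndc_matrix() with perspective_matrix() under convention 2 above (origin at the canvas center; near / far here match the construction defaults; values are illustrative):

from kaolin.render.camera import PinholeIntrinsics

intrinsics = PinholeIntrinsics.from_focal(width=800, height=600, focal_x=700.0)

# Frustum borders for an 800x600 image with the origin at the canvas center, y axis up.
ndc = intrinsics.ndc_matrix(left=-400.0, right=400.0, bottom=-300.0, top=300.0,
                            near=0.01, far=100.0)

# Composing with the perspective matrix yields a full camera-space -> clip-space transform;
# (1, 4, 4) @ (num_cameras, 4, 4) broadcasts over the camera batch.
full = ndc @ intrinsics.perspective_matrix()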

normalize_depth(depth)

Normalizes depth values to the NDC space defined by the view frustum.

Parameters

depth (torch.Tensor) – the depths to be normalized, of shape \((\text{num_depths},)\) or \((\text{num_cameras}, \text{num_depths})\)

Returns

The depth values normalized to the NDC range defined by the projection matrix, of shape \((\text{num_cameras}, \text{num_depths})\)

Return type

(torch.Tensor)
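
For example (illustrative depth values):

import torch
from kaolin.render.camera import PinholeIntrinsics

intrinsics = PinholeIntrinsics.from_focal(width=640, height=480, focal_x=500.0,
                                          near=0.1, far=10.0)

# Map raw depth values into the NDC depth range defined by the near / far planes.
depths = torch.tensor([0.1, 1.0, 5.0, 10.0])
ndc_depth = intrinsics.normalize_depth(depths)   # shape (num_cameras, num_depths)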

classmethod param_types()
Returns

an enum describing each of the intrinsic parameters managed by the pinhole camera. This enum also defines the order in which values are kept within the params buffer.

Return type

(IntrinsicsParamsDefEnum)

perspective_matrix()

Constructs a matrix which performs perspective projection from camera space to homogeneous clip space.

The perspective matrix embeds the pinhole camera intrinsic parameters, which together with the near / far clipping planes specifies how the view-frustum should be transformed into a cuboid-shaped space. The projection does not affect visibility of objects, but rather specifies how the 3D world should be down-projected to a 2D image.

This matrix does not perform clipping and is not concerned with NDC coordinates, but merely describes the perspective transformation itself. This keeps it free from any API-specific conventions of the NDC space.

When coupled with ndc_matrix(), the combination of these two matrices produces a complete perspective transformation from camera space to NDC space, which by default is aligned to traditional OpenGL standards. See also projection_matrix(), which produces a squashed matrix of these two operations together.

The logic essentially builds a torch-autodiff-compatible equivalent of the following tensor:

\[\begin{split}\text{perspective_matrix} = \begin{bmatrix} \text{focal_x} & 0. & -x0 & 0. \\ 0. & \text{focal_y} & -y0 & 0. \\ 0. & 0. & 0. & 1. \\ 0. & 0. & 1. & 0. \end{bmatrix}\end{split}\]

which is a modified form of the intrinsic camera matrix:

\[\begin{split}\begin{bmatrix} \text{focal_x} & 0. & x0 \\ 0. & \text{focal_y} & y0 \\ 0. & 0. & 1. \end{bmatrix}\end{split}\]
Returns

The perspective matrix, of shape \((\text{num_cameras}, 4, 4)\)

Return type

(torch.Tensor)
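
For example, a sketch inspecting the returned matrix; the expected entries assume the factorization above holds (values are illustrative):

from kaolin.render.camera import PinholeIntrinsics

intrinsics = PinholeIntrinsics.from_focal(width=800, height=600, focal_x=700.0)
P = intrinsics.perspective_matrix()     # shape (num_cameras, 4, 4)

# Per the matrix above, the top-left entry holds focal_x and the last two rows
# perform the w <- z swap; no NDC conventions are baked in yet.
print(P[0, 0, 0])                       # ~700.0 (focal_x)
print(P[0, 3])                          # ~[0, 0, 1, 0]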

project(vectors)

Applies perspective projection to obtain Clip Coordinates (this function does not perform the perspective division needed to obtain the actual Normalized Device Coordinates).

Assumptions:

  • Camera is looking down the negative “z” axis (that is: the positive z axis points outwards from the screen, OpenGL compatible).

  • Practitioners are advised to keep near-far gap as narrow as possible, to avoid inherent depth precision errors.

Parameters

vectors (torch.Tensor) – the vectors to be transformed; can be homogeneous of shape \((\text{num_vectors}, 4)\) or \((\text{num_cameras}, \text{num_vectors}, 4)\), or non-homogeneous of shape \((\text{num_vectors}, 3)\) or \((\text{num_cameras}, \text{num_vectors}, 3)\)

Returns

the transformed vectors, of the same shape as vectors but with homogeneous coordinates, i.e. the last dim is 4

Return type

(torch.Tensor)
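
For example (illustrative points, with the perspective division applied manually):

import math
import torch
from kaolin.render.camera import PinholeIntrinsics

intrinsics = PinholeIntrinsics.from_fov(width=800, height=600, fov=math.radians(45.0))

# Camera-space points; the camera looks down the negative z axis.
points = torch.tensor([[0.0, 0.0, -1.0],
                       [0.5, 0.2, -3.0]])
clip = intrinsics.project(points)        # homogeneous clip coordinates, last dim is 4
ndc = clip[..., :3] / clip[..., 3:4]     # manual perspective division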

projection_matrix()

Creates an OpenGL-compatible perspective projection matrix from camera space to clip coordinates. This is the default perspective projection matrix used by kaolin: it assumes the NDC origin is at the center of the canvas (hence x0, y0 offsets are measured relative to the center).

Returns

the projection matrix, of shape \((\text{num_cameras}, 4, 4)\)

Return type

(torch.Tensor)
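
For example, applying the returned matrix by hand to homogeneous camera-space points, e.g. before handing the matrix off to an OpenGL-style rasterizer (illustrative values):

import torch
from kaolin.render.camera import PinholeIntrinsics

intrinsics = PinholeIntrinsics.from_focal(width=800, height=600, focal_x=700.0)
proj = intrinsics.projection_matrix()            # shape (num_cameras, 4, 4)

points_h = torch.tensor([[0.0, 0.0, -2.0, 1.0],
                         [1.0, 1.0, -5.0, 1.0]])
clip = points_h @ proj[0].T                      # (num_points, 4) clip coordinates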

tan_half_fov(camera_fov_direction=CameraFOV.VERTICAL)

tan(fov/2), where the field-of-view is measured in radians

Parameters

camera_fov_direction (optional, CameraFOV) – the leading direction of the fov. Default: vertical

Returns

tan(fov/2), of shape \((\text{num_cameras},)\)

Return type

(torch.Tensor)
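
For example, checking the returned value against the fov used at construction (illustrative):

import math
from kaolin.render.camera import PinholeIntrinsics

intrinsics = PinholeIntrinsics.from_fov(width=800, height=600, fov=math.radians(50.0))

# tan_half_fov() should agree with tan(fov / 2) for the vertical leading direction.
print(intrinsics.tan_half_fov())             # ~0.466 per camera
print(math.tan(math.radians(50.0) / 2.0))    # ~0.466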

transform(vectors)

Applies perspective projection to obtain Normalized Device Coordinates (this function also performs perspective division).

Assumptions:

  • Camera is looking down the negative z axis (that is: Z axis points outwards from screen, OpenGL compatible).

  • Practitioners are advised to keep near-far gap as narrow as possible, to avoid inherent depth precision errors.

Parameters

vectors (torch.Tensor) – the vectors to be transformed; can be homogeneous of shape \((\text{num_vectors}, 4)\) or \((\text{num_cameras}, \text{num_vectors}, 4)\), or non-homogeneous of shape \((\text{num_vectors}, 3)\) or \((\text{num_cameras}, \text{num_vectors}, 3)\)

Returns

the transformed vectors, of the same shape as vectors but with non-homogeneous coordinates, i.e. the last dim is 3

Return type

(torch.Tensor)
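
For example (illustrative points):

import math
import torch
from kaolin.render.camera import PinholeIntrinsics

intrinsics = PinholeIntrinsics.from_fov(width=800, height=600, fov=math.radians(45.0))

# Camera-space points, looking down the negative z axis; transform() returns NDC
# coordinates directly (perspective division included), so the last dim is 3.
points = torch.tensor([[0.0, 0.0, -1.0],
                       [0.3, -0.1, -4.0]])
ndc = intrinsics.transform(points)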

property width: int
property x0: FloatTensor

The horizontal offset from the NDC origin in image space. By default, kaolin defines the NDC origin at the canvas center.

property y0: FloatTensor

The vertical offset from the NDC origin in image space. By default, kaolin defines the NDC origin at the canvas center.

zoom(amount)

Applies a zoom on the camera by adjusting the lens.

Parameters

amount (torch.Tensor or float) – Amount of adjustment, measured in degrees. Mind the conventions: to zoom in, give a positive amount (decreases the fov by amount -> increases the focal length); to zoom out, give a negative amount (increases the fov by amount -> decreases the focal length).
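
For example (illustrative values):

import math
from kaolin.render.camera import PinholeIntrinsics

intrinsics = PinholeIntrinsics.from_fov(width=800, height=600, fov=math.radians(60.0))
print(intrinsics.fov())    # ~60 degrees

# Positive amount zooms in: the fov decreases by 10 degrees, the focal length increases.
intrinsics.zoom(10.0)
print(intrinsics.fov())    # ~50 degrees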