kaolin.render.camera.PinholeIntrinsics

API

class kaolin.render.camera.PinholeIntrinsics(width, height, params, near=0.01, far=100.0)

Bases: CameraIntrinsics

Holds the intrinsics parameters of a pinhole camera: how it should project from camera space to normalized screen / clip space. The intrinsics parameters are used to define the lens attributes of the perspective projection matrix.

The pinhole camera explicitly exposes the projection transformation matrix. This is typically useful for rasterization-based rendering pipelines (e.g. OpenGL). See the documentation of CameraIntrinsics for the numerous ways this class can be used.

Kaolin assumes a left-handed NDC coordinate system: after applying the projection matrix, depth increases into the screen.

The complete perspective matrix can be described by the following factorization:

\[\begin{split}\text{FullProjectionMatrix} &= \text{Ortho} \times \text{Depth Scale} \times \text{Perspective} \\
&= \begin{bmatrix} 2/(r-l) & 0 & 0 & tx \\ 0 & 2/(t-b) & 0 & ty \\ 0 & 0 & -2/(f-n) & tz \\ 0 & 0 & 0 & 1 \end{bmatrix} \times \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & B & A \\ 0 & 0 & 0 & -1 \end{bmatrix} \times \begin{bmatrix} \text{focal_x} & 0 & -x0 & 0 \\ 0 & \text{focal_y} & -y0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \end{bmatrix} \\
&= \begin{bmatrix} 2*\text{focal_x}/(r-l) & 0 & -2x0/(r-l) - tx & 0 \\ 0 & 2*\text{focal_y}/(t-b) & -2y0/(t-b) - ty & 0 \\ 0 & 0 & V & U \\ 0 & 0 & -1 & 0 \end{bmatrix}\end{split}\]

where:

  • focal_x, focal_y, x0, y0: the intrinsic parameters of the camera. The focal length, together with the image plane width / height, determines the field of view (fov); this is the effective lens zoom of the scene. The principal point offsets x0, y0 add another DoF which translates the origin of the image plane. By default, kaolin assumes the NDC origin is at the canvas center (see projection_matrix()).

  • n, f: are the near and far clipping planes, which define the min / max depth of the view frustum. The near and far planes are also used to normalize the depth values to normalized device coordinates (see ndc_matrix() documentation).

  • r, l, t, b: are the right, left, top and bottom borders of the view frustum, and are defined by the perspective fov (derived from the focal length) and image plane dimensions.

  • tx, ty, tz: are defined as:

    \(tx = -(r + l) / (r - l)\)

    \(ty = -(t + b) / (t - b)\)

    \(tz = -(f + n) / (f - n)\)

  • U, V: are elements which define the NDC range, see ndc_matrix() for an elaboration on how these are defined.

  • A, B: can be reverse engineered from U, V and are uniquely defined by them (and in fact serve a similar function).

This matrix sometimes appears in the literature in a slightly simplified form, for example, when the principal point offsets are x0 = 0, y0 = 0 and the NDC coordinates are defined in the range \([-1, 1]\):

\[\begin{split}\begin{bmatrix} 2*\text{focal_x}/(r-l) & 0 & -tx & 0 \\ 0 & 2*\text{focal_y}/(t - b) & -ty & 0 \\ 0 & 0 & V & U \\ 0 & 0 & -1 & 0 \end{bmatrix}\end{split}\]

A vector multiplied by this matrix lands in homogeneous clip space, and requires division by the 4th coordinate (w) to obtain the final NDC coordinates.

Since the choice of NDC space is application dependent, kaolin keeps the Perspective matrix, which depends only on the choice of intrinsic parameters, separate from the Depth Scale and Ortho matrices, which are squashed together to define the view frustum and NDC range.

See also

perspective_matrix() and ndc_matrix() functions.

This class is batched and may hold information from multiple cameras. Parameters are stored as a single tensor of shape \((\text{num_cameras}, 4)\).

The matrix returned by this class supports differentiable torch operations, which in turn may update the intrinsic parameters of the camera.
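
For example, a minimal construction sketch (assuming kaolin and torch are installed and the import path matches this page's title; resolution and fov values are illustrative):

import math
from kaolin.render.camera import PinholeIntrinsics

# Single 800x600 pinhole camera with a 45-degree vertical fov (from_fov expects radians).
intrinsics = PinholeIntrinsics.from_fov(width=800, height=600, fov=math.radians(45.0))

proj = intrinsics.projection_matrix()   # shape (num_cameras, 4, 4), OpenGL-style by default
print(proj.shape)                       # torch.Size([1, 4, 4])
print(intrinsics.fov())                 # vertical fov, returned in degrees by default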

Parameters
  • width (int) – width of the camera resolution

  • height (int) – height of the camera resolution

  • params (torch.Tensor) – the intrinsics parameters, stored as a single tensor of shape \((\text{num_cameras}, 4)\)

  • near (optional, float) – near clipping plane, defines the min depth of the view frustum and is used to normalize the depth values. Default: 1e-2

  • far (optional, float) – far clipping plane, defines the max depth of the view frustum and is used to normalize the depth values. Default: 1e2
DEFAULT_FAR = 100.0
DEFAULT_NEAR = 0.01
property cx: FloatTensor

The principal point X coordinate. Note: by default, the principal point is at the canvas center (kaolin defines the NDC origin at the canvas center).

property cy: FloatTensor

The principal point Y coordinate. Note: by default, the principal point is at the canvas center (kaolin defines the NDC origin at the canvas center).

property focal_x: FloatTensor
property focal_y: FloatTensor
fov(camera_fov_direction=CameraFOV.VERTICAL, in_degrees=True)

The field-of-view

Parameters
  • camera_fov_direction (CameraFOV) – the leading direction of the fov. Default: vertical

  • in_degrees (bool) – if True return result in degrees, else in radians. Default: True

Returns

the field-of-view, of shape \((\text{num_cameras},)\)

Return type

(torch.Tensor)

property fov_x

The field-of-view along the horizontal leading direction

property fov_y

The field-of-view along the vertical leading direction

classmethod from_focal(width, height, focal_x, focal_y=None, x0=None, y0=None, near=0.01, far=100.0, num_cameras=1, device=None, dtype=torch.float32)

Constructs a new instance of PinholeIntrinsics from focal length

Parameters
  • width (int) – width of the camera resolution

  • height (int) – height of the camera resolution

  • focal_x (float) – focal length on x-axis

  • focal_y (optional, float) – focal length on y-axis. Default: same as focal_x

  • x0 (optional, float) – horizontal offset from origin of the image plane (by default the center). Default: 0.

  • y0 (optional, float) – vertical offset from origin of the image plane (by default the center). Default: 0.

  • near (optional, float) – near clipping plane, defines the min depth of the view frustum and is used to normalize the depth values. Default: 1e-2

  • far (optional, float) – far clipping plane, defines the max depth of the view frustum and is used to normalize the depth values. Default: 1e2

  • num_cameras (optional, int) – the number of cameras in this object. Default: 1

  • device (optional, str) – the device on which parameters will be allocated. Default: cpu

  • dtype (optional, str) – the dtype with which parameters will be allocated. Default: torch.float

Returns

the constructed pinhole camera intrinsics

Return type

(PinholeIntrinsics)
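
For example, a sketch of construction from known focal lengths (illustrative values):

from kaolin.render.camera import PinholeIntrinsics

# Intrinsics from focal lengths expressed in pixels; focal_y defaults to focal_x.
intrinsics = PinholeIntrinsics.from_focal(width=640, height=480, focal_x=500.0,
                                          near=0.1, far=50.0)
print(intrinsics.focal_x, intrinsics.focal_y)   # per-camera focal length tensors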

classmethod from_fov(width, height, fov, fov_direction=CameraFOV.VERTICAL, x0=0.0, y0=0.0, near=0.01, far=100.0, num_cameras=1, device=None, dtype=torch.float32)

Constructs a new instance of PinholeIntrinsics from field of view

Parameters
  • width (int) – width of the camera resolution

  • height (int) – height of the camera resolution

  • fov (float) – the field of view, in radians

  • fov_direction (optional, CameraFOV) – the leading direction of the field-of-view. Default: vertical

  • x0 (optional, float) – horizontal offset from origin of the image plane (by default the center). Default: 0.

  • y0 (optional, float) – vertical offset from origin of the image plane (by default the center). Default: 0.

  • near (optional, float) – near clipping plane, defines the min depth of the view frustum and is used to normalize the depth values. Default: 1e-2

  • far (optional, float) – far clipping plane, defines the max depth of the view frustum and is used to normalize the depth values. Default: 1e2

  • num_cameras (optional, int) – the number of cameras in this object. Default: 1

  • device (optional, str) – the device on which parameters will be allocated. Default: cpu

  • dtype (optional, str) – the dtype with which parameters will be allocated. Default: torch.float

Returns

the constructed pinhole camera intrinsics

Return type

(PinholeIntrinsics)
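
For example, a batched sketch with a horizontal fov (this assumes CameraFOV is importable from kaolin.render.camera and exposes a HORIZONTAL member, matching the fov_x property above; values are illustrative):

import math
from kaolin.render.camera import PinholeIntrinsics, CameraFOV

# A batch of 4 cameras sharing a 60-degree horizontal field of view.
intrinsics = PinholeIntrinsics.from_fov(
    width=1024, height=768,
    fov=math.radians(60.0),                 # radians, per the signature above
    fov_direction=CameraFOV.HORIZONTAL,     # assumed member of CameraFOV
    num_cameras=4,
)
print(intrinsics.fov_x)                     # horizontal fov, one value per camera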

property height: int
property lens_type: str
ndc_matrix(left, right, bottom, top, near, far)

Constructs a matrix which performs the required transformation to project the scene onto the view frustum. (that is: it normalizes a cuboid-shaped view-frustum to clip coordinates, which are SCALED normalized device coordinates).

When used in conjunction with a perspective_matrix(), a transformation from camera view space to clip space can be obtained.

See also

projection_matrix() which combines both operations.

Note

This matrix actually converts coordinates to clip space, and requires an extra division by the w coordinates to obtain the NDC coordinates. However, it is named ndc_matrix as the elements are chosen carefully according to the definitions of the NDC space.

Vectors transformed by this matrix will reside in the kaolin clip space, which is left handed (depth increases in the direction going into the screen):

Y      Z
^    /
|  /
|---------> X

The final NDC coordinates can be obtained by dividing each vector by its w coordinate (perspective division).

!! NDC matrices depend on the choice of NDC space, and should therefore be chosen accordingly !! The ndc matrix is a composition of 2 matrices which define the view frustum:

\[\begin{split}\text{ndc} &= \text{Ortho} \times \text{Depth Scale} \\
&= \begin{bmatrix} 2/(r-l) & 0 & 0 & tx \\ 0 & 2/(t-b) & 0 & ty \\ 0 & 0 & -2/(\text{far}-\text{near}) & tz \\ 0 & 0 & 0 & 1 \end{bmatrix} \times \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & B & A \\ 0 & 0 & 0 & -1 \end{bmatrix} \\
&= \begin{bmatrix} 2/(r-l) & 0 & 0 & -tx \\ 0 & 2/(t-b) & 0 & -ty \\ 0 & 0 & U & V \\ 0 & 0 & 0 & -1 \end{bmatrix}\end{split}\]
  • n, f: are the near and far clipping planes, which define the min / max depth of the view frustum. The near and far planes are also used to normalize the depth values to normalized device coordinates.

  • r, l, t, b: are the right, left, top and bottom borders of the view frustum, and are defined by the perspective fov (derived from the focal length) and image plane dimensions.

  • tx, ty, tz: are defined as:

    \(tx = -(r + l) / (r - l)\)

    \(ty = -(t + b) / (t - b)\)

    \(tz = -(f + n) / (f - n)\)

  • U, V: are elements which define the NDC range.

  • A, B: can be reverse engineered from U, V and are uniquely defined by them (and in fact serve a similar function).

Input values are determined by the screen dimensions and intrinsic coordinate conventions, for example:

  1. \((\text{left}=0, \text{right}=\text{width}, \text{bottom}=\text{height}, \text{top}=0)\) for origin at top-left of the screen, y axis pointing downwards.

  2. \((\text{left}=-\dfrac{\text{width}}{2}, \text{right}=\dfrac{\text{width}}{2}, \text{bottom}=-\dfrac{\text{height}}{2}, \text{top}=\dfrac{\text{height}}{2})\) for origin at center of the screen, and y axis pointing upwards.

Parameters
  • left (float) – location of the left face of the view-frustum.

  • right (float) – location of the right face of the view-frustum.

  • bottom (float) – location of the bottom face of the view-frustum.

  • top (float) – location of the top face of the view-frustum.

  • near (float) – location of the near face of the view-frustum. Should always be larger than zero and smaller than the far clipping plane. If used in conjunction with a perspective matrix, the near clipping plane should be identical for both.

  • far (float) – location of the far face of the view-frustum. Should always be larger than the near clipping plane. If used in conjunction with a perspective matrix, the far clipping plane should be identical for both.

Returns

the ndc matrix, of shape \((1, 4, 4)\).

Return type

(torch.Tensor)
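
For example, a sketch composing ndc_matrix() with perspective_matrix() under convention 2 above (origin at the canvas center; near / far here match the construction defaults; values are illustrative):

from kaolin.render.camera import PinholeIntrinsics

intrinsics = PinholeIntrinsics.from_focal(width=800, height=600, focal_x=700.0)

# Frustum borders for an 800x600 image with the origin at the canvas center, y axis up.
ndc = intrinsics.ndc_matrix(left=-400.0, right=400.0, bottom=-300.0, top=300.0,
                            near=0.01, far=100.0)

# Composing with the perspective matrix yields a full camera-space -> clip-space transform;
# (1, 4, 4) @ (num_cameras, 4, 4) broadcasts over the camera batch.
full = ndc @ intrinsics.perspective_matrix()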

normalize_depth(depth)

Normalizes depth values to the NDC space defined by the view frustum.

Parameters

depth (torch.Tensor) – the depths to be normalized, of shape \((\text{num_depths},)\) or \((\text{num_cameras}, \text{num_depths})\)

Returns

The depth values normalized to the NDC range defined by the projection matrix, of shape \((\text{num_cameras}, \text{num_depths})\)

Return type

(torch.Tensor)
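
For example (illustrative depth values):

import torch
from kaolin.render.camera import PinholeIntrinsics

intrinsics = PinholeIntrinsics.from_focal(width=640, height=480, focal_x=500.0,
                                          near=0.1, far=10.0)

# Map raw depth values into the NDC depth range defined by the near / far planes.
depths = torch.tensor([0.1, 1.0, 5.0, 10.0])
ndc_depth = intrinsics.normalize_depth(depths)   # shape (num_cameras, num_depths)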

classmethod param_types()
Returns

an enum describing each of the intrinsic parameters managed by the pinhole camera. This enum also defines the order in which values are kept within the params buffer.

Return type

(IntrinsicsParamsDefEnum)

perspective_matrix()

Constructs a matrix which performs perspective projection from camera space to homogeneous clip space.

The perspective matrix embeds the pinhole camera intrinsic parameters, which together with the near / far clipping planes specifies how the view-frustum should be transformed into a cuboid-shaped space. The projection does not affect visibility of objects, but rather specifies how the 3D world should be down-projected to a 2D image.

This matrix does not perform clipping and is not concerned with NDC coordinates, but merely describes the perspective transformation itself. This keeps it free from any API-specific conventions of the NDC space.

When coupled with ndc_matrix(), the combination of these two matrices produces a complete perspective transformation from camera space to NDC space, which by default is aligned to traditional OpenGL standards. See also projection_matrix(), which produces a squashed matrix of these two operations together.

The logic essentially builds a torch-autodiff-compatible equivalent of the following tensor:

\[\begin{split}\text{perspective_matrix} = \begin{bmatrix} \text{focal_x} & 0. & -x0 & 0. \\ 0. & \text{focal_y} & -y0 & 0. \\ 0. & 0. & 0. & 1. \\ 0. & 0. & 1. & 0. \end{bmatrix}\end{split}\]

which is a modified form of the intrinsic camera matrix:

\[\begin{split}\begin{bmatrix} \text{focal_x} & 0. & x0 \\ 0. & \text{focal_y} & y0 \\ 0. & 0. & 1. \end{bmatrix}\end{split}\]
Returns

The perspective matrix, of shape \((\text{num_cameras}, 4, 4)\)

Return type

(torch.Tensor)
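
For example, a sketch inspecting the returned matrix; the expected entries assume the factorization above holds (values are illustrative):

from kaolin.render.camera import PinholeIntrinsics

intrinsics = PinholeIntrinsics.from_focal(width=800, height=600, focal_x=700.0)
P = intrinsics.perspective_matrix()     # shape (num_cameras, 4, 4)

# Per the matrix above, the top-left entry holds focal_x and the last two rows
# perform the w <- z swap; no NDC conventions are baked in yet.
print(P[0, 0, 0])                       # ~700.0 (focal_x)
print(P[0, 3])                          # ~[0, 0, 1, 0]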

project(vectors)

Applies perspective projection to obtain Clip Coordinates (this function does not perform the perspective division needed to obtain the actual Normalized Device Coordinates).

Assumptions:

  • Camera is looking down the negative “z” axis (that is: the positive z axis points outwards from the screen, OpenGL compatible).

  • Practitioners are advised to keep near-far gap as narrow as possible, to avoid inherent depth precision errors.

Parameters

vectors (torch.Tensor) – the vectors to be transformed; can be homogeneous of shape \((\text{num_vectors}, 4)\) or \((\text{num_cameras}, \text{num_vectors}, 4)\), or non-homogeneous of shape \((\text{num_vectors}, 3)\) or \((\text{num_cameras}, \text{num_vectors}, 3)\)

Returns

the transformed vectors, of the same shape as vectors but with homogeneous coordinates, i.e. the last dim is 4

Return type

(torch.Tensor)
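
For example (illustrative points, with the perspective division applied manually):

import math
import torch
from kaolin.render.camera import PinholeIntrinsics

intrinsics = PinholeIntrinsics.from_fov(width=800, height=600, fov=math.radians(45.0))

# Camera-space points; the camera looks down the negative z axis.
points = torch.tensor([[0.0, 0.0, -1.0],
                       [0.5, 0.2, -3.0]])
clip = intrinsics.project(points)        # homogeneous clip coordinates, last dim is 4
ndc = clip[..., :3] / clip[..., 3:4]     # manual perspective division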

projection_matrix()

Creates an OpenGL-compatible perspective projection matrix from camera space to clip coordinates. This is the default perspective projection matrix used by kaolin: it assumes the NDC origin is at the center of the canvas (hence x0, y0 offsets are measured relative to the center).

Returns

the projection matrix, of shape \((\text{num_cameras}, 4, 4)\)

Return type

(torch.Tensor)
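
For example, applying the returned matrix by hand to homogeneous camera-space points, e.g. before handing the matrix off to an OpenGL-style rasterizer (illustrative values):

import torch
from kaolin.render.camera import PinholeIntrinsics

intrinsics = PinholeIntrinsics.from_focal(width=800, height=600, focal_x=700.0)
proj = intrinsics.projection_matrix()            # shape (num_cameras, 4, 4)

points_h = torch.tensor([[0.0, 0.0, -2.0, 1.0],
                         [1.0, 1.0, -5.0, 1.0]])
clip = points_h @ proj[0].T                      # (num_points, 4) clip coordinates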

tan_half_fov(camera_fov_direction=CameraFOV.VERTICAL)

tan(fov/2), where the field-of-view is measured in radians

Parameters

camera_fov_direction (optional, CameraFOV) – the leading direction of the fov. Default: vertical

Returns

tan(fov/2), of shape \((\text{num_cameras},)\)

Return type

(torch.Tensor)
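
For example, checking the returned value against the fov used at construction (illustrative):

import math
from kaolin.render.camera import PinholeIntrinsics

intrinsics = PinholeIntrinsics.from_fov(width=800, height=600, fov=math.radians(50.0))

# tan_half_fov() should agree with tan(fov / 2) for the vertical leading direction.
print(intrinsics.tan_half_fov())             # ~0.466 per camera
print(math.tan(math.radians(50.0) / 2.0))    # ~0.466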

transform(vectors)

Applies perspective projection to obtain Normalized Device Coordinates (this function also performs perspective division).

Assumptions:

  • Camera is looking down the negative z axis (that is: Z axis points outwards from screen, OpenGL compatible).

  • Practitioners are advised to keep near-far gap as narrow as possible, to avoid inherent depth precision errors.

Parameters

vectors (torch.Tensor) – the vectors to be transformed; can be homogeneous of shape \((\text{num_vectors}, 4)\) or \((\text{num_cameras}, \text{num_vectors}, 4)\), or non-homogeneous of shape \((\text{num_vectors}, 3)\) or \((\text{num_cameras}, \text{num_vectors}, 3)\)

Returns

the transformed vectors, of the same shape as vectors but with non-homogeneous coordinates, i.e. the last dim is 3

Return type

(torch.Tensor)
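
For example (illustrative points):

import math
import torch
from kaolin.render.camera import PinholeIntrinsics

intrinsics = PinholeIntrinsics.from_fov(width=800, height=600, fov=math.radians(45.0))

# Camera-space points, looking down the negative z axis; transform() returns NDC
# coordinates directly (perspective division included), so the last dim is 3.
points = torch.tensor([[0.0, 0.0, -1.0],
                       [0.3, -0.1, -4.0]])
ndc = intrinsics.transform(points)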

property width: int
property x0: FloatTensor

The horizontal offset from the NDC origin in image space. By default, kaolin defines the NDC origin at the canvas center.

property y0: FloatTensor

The vertical offset from the NDC origin in image space. By default, kaolin defines the NDC origin at the canvas center.

zoom(amount)

Applies a zoom on the camera by adjusting the lens.

Parameters

amount (torch.Tensor or float) – Amount of adjustment, measured in degrees. Mind the conventions: to zoom in, give a positive amount (decreases the fov by amount -> increases the focal length); to zoom out, give a negative amount (increases the fov by amount -> decreases the focal length).
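
For example (illustrative values):

import math
from kaolin.render.camera import PinholeIntrinsics

intrinsics = PinholeIntrinsics.from_fov(width=800, height=600, fov=math.radians(60.0))
print(intrinsics.fov())    # ~60 degrees

# Positive amount zooms in: the fov decreases by 10 degrees, the focal length increases.
intrinsics.zoom(10.0)
print(intrinsics.fov())    # ~50 degrees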