Point Cloud

Scalabel currently supports labeling 3D bounding boxes on point cloud data. Data must be supplied in PLY format. Labels may be pre-loaded. See the examples folder for sample pre-loaded data.
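As a rough illustration of the PLY requirement, the sketch below writes and reads a minimal ASCII PLY file containing only vertex positions. The helper names `write_ascii_ply` and `read_ascii_ply` are hypothetical, and whether Scalabel expects ASCII or binary PLY, or additional per-point properties, is not stated here, so treat the header layout as an assumption to check against the sample data.

```python
def write_ascii_ply(path, points):
    """Write (x, y, z) tuples to an ASCII PLY file with a vertex-only header."""
    header = [
        "ply",
        "format ascii 1.0",
        f"element vertex {len(points)}",
        "property float x",
        "property float y",
        "property float z",
        "end_header",
    ]
    with open(path, "w") as f:
        f.write("\n".join(header) + "\n")
        for x, y, z in points:
            f.write(f"{x} {y} {z}\n")


def read_ascii_ply(path):
    """Read back the vertex positions from a file written by write_ascii_ply."""
    with open(path) as f:
        lines = [line.strip() for line in f]
    count = next(int(l.split()[2]) for l in lines if l.startswith("element vertex"))
    start = lines.index("end_header") + 1
    return [tuple(float(v) for v in l.split()[:3]) for l in lines[start:start + count]]
```

Reading the file back is a quick way to confirm that the vertex count in the header matches the number of points actually written.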

2D vs 3D

Labeling and working with 3D data is very different from working with 2D data. The biggest difference lies in how the data is visualized. Because computer screens are 2D, rendering 2D data is as simple as placing it in a rectangle on part of the screen. To view 3D data, however, we must simulate a camera that provides a 2D surface for viewing, and project the 3D data onto that surface just as light would hit the sensor of a real camera. The camera has a position and a rotation in 3D space, and can be moved or rotated to change the portion of the data that is shown.
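The projection step can be sketched with a simple pinhole camera model. This is an illustrative assumption, not a description of the renderer Scalabel actually uses; the function `project`, the `focal` parameter, and the choice of Z as the up direction are all introduced here for the example.

```python
import math


def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def normalize(v):
    length = math.sqrt(dot(v, v))
    return tuple(x / length for x in v)


def project(point, cam_pos, target, focal=1.0, up=(0.0, 0.0, 1.0)):
    """Project a 3D point onto the image plane of a camera looking at target.

    Returns (u, v) image coordinates, or None if the point is behind the
    camera. Assumes the viewing direction is not parallel to up.
    """
    forward = normalize(sub(target, cam_pos))   # viewing direction
    right = normalize(cross(forward, up))       # image x direction
    cam_up = cross(right, forward)              # image y direction
    d = sub(point, cam_pos)
    depth = dot(d, forward)
    if depth <= 0:
        return None
    return (focal * dot(d, right) / depth, focal * dot(d, cam_up) / depth)
```

Dividing by the depth is what makes distant points move toward the center of the image, which is why moving or rotating the camera changes which part of the data is visible.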

3D Rendering

Labeling in 3D is also very different. With 2D data, the point you click on the screen corresponds directly to a point in the image. That is not the case in 3D: when you click on the screen, we only know the X and Y coordinates of the click. We have to make some guesses to recover the third dimension, which can cause things to move in unexpected ways. However, the methods we have developed for interacting with 3D labels take care of most of these issues. More details can be found in the sections below.
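One common way to recover the missing third dimension is to cast a ray from the camera through the clicked pixel and intersect it with a chosen plane. The sketch below assumes this approach purely for illustration; the text does not say which method Scalabel uses, and `click_ray` and `intersect_plane` are hypothetical names.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def click_ray(u, v, right, cam_up, forward, focal=1.0):
    """Direction of the ray through image point (u, v), as a unit vector."""
    d = tuple(focal * f + u * r + v * c for f, r, c in zip(forward, right, cam_up))
    length = math.sqrt(dot(d, d))
    return tuple(x / length for x in d)


def intersect_plane(origin, direction, plane_point, plane_normal):
    """Point where the ray meets the plane, or None if it never does."""
    denom = dot(direction, plane_normal)
    if abs(denom) < 1e-12:
        return None  # ray is parallel to the plane
    offset = tuple(p - o for p, o in zip(plane_point, origin))
    t = dot(offset, plane_normal) / denom
    if t < 0:
        return None  # plane is behind the camera
    return tuple(o + t * d for o, d in zip(origin, direction))
```

The choice of plane is the "guess": the same click lands on a different 3D point depending on which plane is used, which is one reason objects can appear to move unexpectedly.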

Coordinate Systems and Axes

When working with 3D data, we often refer to an ‘axis’ or ‘axes’ of the data to describe movement. An axis is simply a standardized direction in the data, such as forward/backward or north/south. Usually we use the X, Y, and Z axes when labeling 3D data, with the Z axis pointing “up”, the Y axis “forward”, and the X axis “right”.

A coordinate system or a frame of reference is a set of axes and an “origin”. The origin is a point in 3 dimensions (x, y, z) from which all other positions in the world are referenced. For example, if we were standing on a street looking at a car in front of us, we could say that the origin is located at our position and that the car is 5 m along the y axis, giving it a coordinate of (0, 5, 0).

During labeling, there are typically three frames to keep in mind: the world frame, the camera frame, and the object frame. In the world frame, the axes are specified relative to the sensor, with the Z axis pointing towards the top of the sensor, the Y axis to the front, and the X axis to the right, and the origin is at the location of the sensor. In the camera frame, the axes are set relative to the camera, where forward is looking through the lens, and the origin is at the camera position. In the object frame, the axes follow the directions of the object being labeled and the origin is at the center of the object. Each object can have a different frame of reference.
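To make the relationship between frames concrete, the sketch below converts a point between the world frame and an object frame. It assumes, for simplicity, that the object is rotated only about the Z (up) axis by an angle `yaw`; the function names are hypothetical.

```python
import math


def rotate_z(p, angle):
    """Rotate point p about the Z (up) axis by angle radians."""
    c, s = math.cos(angle), math.sin(angle)
    return (c * p[0] - s * p[1], s * p[0] + c * p[1], p[2])


def world_to_object(point, center, yaw):
    """Express a world-frame point in the frame of an object at center."""
    shifted = tuple(a - b for a, b in zip(point, center))
    return rotate_z(shifted, -yaw)


def object_to_world(point, center, yaw):
    """Express an object-frame point in the world frame."""
    rotated = rotate_z(point, yaw)
    return tuple(a + b for a, b in zip(rotated, center))
```

For a car centered at (0, 5, 0) in the world frame, `world_to_object` maps that center to the origin of the car’s own frame, and a point 2 m ahead of the car ends up at (0, 2, 0) regardless of which way the car is facing.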


The camera can be moved around in the scene to see different parts of the data. It can either be rotated around a point in space called the target, whose location is shown as a set of three lines (seen on the right) in the directions of the world axes, or translated together with this point in a given direction. The table below has more details about how to move the camera. There are also videos providing visual demonstrations.

Shortcut Keys



Key                   Action
Click + Drag          Rotate camera around target
Right Click + Drag    Move camera and target in the direction of the drag
W, A, S, D            Move camera forward, left, back, right
(key not shown)       Move camera down
(key not shown)       Move camera up
Double click          Move the target to the clicked point, maintaining the camera’s distance from the target
Mouse scroll          Zoom camera in and out relative to the target
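The camera movements in the table can be sketched as operations on a camera position and a target position. This is a simplified model under stated assumptions: rotation is restricted to the vertical (Z) axis, and moving the target is assumed to shift the camera by the same offset. The function names are hypothetical and are not Scalabel APIs.

```python
import math


def orbit(camera, target, angle):
    """Rotate the camera about the vertical (Z) axis through the target."""
    dx, dy = camera[0] - target[0], camera[1] - target[1]
    c, s = math.cos(angle), math.sin(angle)
    return (target[0] + c * dx - s * dy, target[1] + s * dx + c * dy, camera[2])


def pan(camera, target, offset):
    """Translate camera and target together; returns (camera, target)."""
    new_camera = tuple(a + b for a, b in zip(camera, offset))
    new_target = tuple(a + b for a, b in zip(target, offset))
    return new_camera, new_target


def retarget(camera, target, new_target):
    """Move the target to new_target, shifting the camera by the same offset
    so that its distance from the target is unchanged."""
    offset = tuple(a - b for a, b in zip(new_target, target))
    return pan(camera, target, offset)


def zoom(camera, target, factor):
    """Scale the camera-to-target distance (factor < 1 zooms in)."""
    return tuple(t + factor * (c - t) for c, t in zip(camera, target))
```

Orbiting and zooming leave the target where it is, while panning and retargeting move it, which is the distinction the shortcut table draws between rotating around the target and moving with it.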

In addition to the shortcuts, there are a few buttons located on the top left of each pane which can be used to adjust the camera movement:





Lock camera rotation to camera’s z-axis


Lock camera rotation to camera’s x-axis


Move camera to be on x-axis looking at target. If locked to selection, lock camera to box’s x-axis


Move camera to be on y-axis looking at target. If locked to selection, lock camera to box’s y-axis


Move camera to be on z-axis looking at target. If locked to selection, lock camera to box’s z-axis


Flip camera direction when locking to selection or moving camera to axis


Sync the target locations of all cameras


Lock camera to selected box. Target will always be at center of the box. Camera will be on one of the axes


Reset camera to default position & orientation
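Placing the camera on an axis, optionally flipped and optionally using the selected box’s axes, might look like the sketch below. It assumes the box is rotated only about Z by `yaw` and that the camera sits on the positive side of the axis unless flipped; both are assumptions for illustration, and `camera_on_axis` is a hypothetical name.

```python
import math


def camera_on_axis(target, axis, distance, yaw=0.0, flip=False):
    """Place the camera on the given axis through target, looking at target.

    axis is "x", "y" or "z". With yaw=0 the world axes are used; passing a
    box's yaw (rotation about Z) uses that box's axes instead.
    """
    c, s = math.cos(yaw), math.sin(yaw)
    directions = {
        "x": (c, s, 0.0),     # box X axis expressed in the world frame
        "y": (-s, c, 0.0),    # box Y axis expressed in the world frame
        "z": (0.0, 0.0, 1.0),
    }
    sign = -1.0 if flip else 1.0
    return tuple(t + sign * distance * d for t, d in zip(target, directions[axis]))
```

When the camera is locked to a box, the target would be the box center, so the same function covers both the world-axis and box-axis buttons.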

Camera Demos


Camera Dragging - right click and drag


Camera Rotation - left click and drag


Camera Zoom - mouse scroll


Move Target to Point - double click on point


Camera Axis Lock


Align Camera to Axis

3D Bounding Box Labeling

To add a 3D box, first move the target to the location where you want the center of the box to be placed, then press the space key to add the box. You can then modify the box by dragging one of the four control spheres that appear on the corners of the face closest to the camera, or by using the transformation controller. Dragging a corner keeps the opposite corner in place and changes only two of the dimensions of the box. The transformation controller can be used to rotate, translate, or scale the box along one of the box’s own axes or along the world axes. The shortcut keys listed below can be used to change the mode.

When finished editing, press Escape to deselect the box. To edit a box that is not selected, either drag one of its corners directly or double click on the box to select it. A box is selected when the transformation controller appears.
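The corner-dragging behavior, where the opposite corner stays fixed and only two dimensions change, can be sketched in the 2D plane of the face being edited. The function `drag_corner` and its sign-pair convention for naming corners are hypothetical, introduced only for this example.

```python
def drag_corner(center, size, corner, new_corner):
    """Resize a face by dragging one corner while the opposite corner stays put.

    center and size describe the face in its own 2D plane. corner gives the
    dragged corner as a pair of signs, e.g. (+1, -1). new_corner is where the
    corner was dragged to. Returns the new (center, size) of the face.
    """
    opposite = tuple(c - s * d / 2 for c, s, d in zip(center, corner, size))
    new_center = tuple((o + n) / 2 for o, n in zip(opposite, new_corner))
    new_size = tuple(abs(n - o) for o, n in zip(opposite, new_corner))
    return new_center, new_size
```

Because the new center is the midpoint between the fixed corner and the dragged corner, the box both resizes and shifts, while its extent along the axis perpendicular to the face is untouched.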

Shortcut Keys




Key                   Action
Space                 Add a new box
(key not shown)       Change controller to scaling mode
(key not shown)       Change controller to rotation mode
(key not shown)       Change controller to translation mode
(key not shown)       Toggle transformation controller axes between object and world frame

Box Editing Demos


Box Rotation


Box Translation


Box Scaling


Dragging Control Points

Sample Box Labeling Procedure

It may help to add more than one pane and lock the camera to the currently selected box to make the labeling process easier. Below is a video demonstrating one way of setting up the interface to take advantage of these features. There are four panes: three are each aligned to one of the selection’s axes, and the fourth moves freely. To add a box, the target of the freely moving pane is moved near the object’s center, and a box is added by pressing space. The rotation and size of the box are then adjusted using the locked panes and verified using the freely moving pane.


Car Labeling

This section details some guidelines that may be helpful when labeling cars.


Determining object orientation in 3D space can be tricky. Luckily, cars are rigid (for the most part) and map neatly to 3D boxes. Generally, the forward direction of the car should be taken as the vector from the center of mass to the hood. This should be the same as the forward direction of the world frame, the y-axis in most lidar data. This generally determines the orientation of the car about the vertical axis, which is typically the z-axis in lidar data. To determine the orientation about the other two axes, rotate the box so that its bottom face (the one closest to the road) has the same normal as the road surface.
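As a small worked example of these guidelines, the sketch below derives a rotation about the vertical axis from a car’s forward vector, and measures how far the road normal tilts away from vertical. It assumes the Y-forward, Z-up convention described earlier; the function names are hypothetical.

```python
import math


def yaw_from_forward(forward):
    """Rotation about Z that turns the +Y axis onto the car's forward direction."""
    return math.atan2(-forward[0], forward[1])


def tilt_from_road_normal(normal):
    """Angle between the road normal and the world Z axis, in radians."""
    length = math.sqrt(sum(x * x for x in normal))
    return math.acos(normal[2] / length)
```

On a flat road the tilt is zero, so the yaw alone fixes the orientation of the box; the tilt only matters on slopes or banked roads.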


Cars in point clouds are usually incomplete, as they are either occluded by other objects or self-occluded. In these cases, it is easiest and most straightforward to label only the visible points rather than guess the full extent of the car. The sides of the box should fit as closely as possible to the roof, hood, and doors of the car.
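Fitting a box to only the visible points can be sketched as follows, assuming the car’s rotation about the vertical axis (`yaw`) has already been chosen and that the box is not tilted. The function `fit_box` is a hypothetical helper, not part of Scalabel.

```python
import math


def fit_box(points, yaw):
    """Tightest yaw-aligned box around points; returns (center, size)."""
    c, s = math.cos(yaw), math.sin(yaw)
    # Rotate every point by -yaw so the box axes line up with X, Y, Z.
    local = [(c * x + s * y, -s * x + c * y, z) for x, y, z in points]
    lows = [min(p[i] for p in local) for i in range(3)]
    highs = [max(p[i] for p in local) for i in range(3)]
    size = tuple(hi - lo for lo, hi in zip(lows, highs))
    mid = [(lo + hi) / 2 for lo, hi in zip(lows, highs)]
    # Rotate the box center back into the world frame.
    center = (c * mid[0] - s * mid[1], s * mid[0] + c * mid[1], mid[2])
    return center, size
```

Since only the visible points are passed in, the resulting box covers what the sensor actually observed and does not extrapolate to the occluded part of the car.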