concepts.hw_interface.george_vision.object_centric_vision.ObjectCentricVisionPipeline#

class ObjectCentricVisionPipeline[source]#

Bases: object

The pipeline for object-centric vision.

Methods

construct_foreground_by_cropping([x_range, ...])

Constructs the foreground segmentation by cropping the point cloud.

construct_object_completion_by_extrusion([...])

Constructs object completion by extruding the top surface of the table.

construct_object_meshes([alpha, verbose])

Construct meshes for all detected objects using alpha shape.

construct_object_segmentation([...])

Constructs the object segmentation.

construct_pointcloud([camera_transform])

Constructs the point cloud from the color and depth images.

construct_table_segmentation([method])

Constructs the table segmentation.

export_scene(output_dir[, verbose])

Export the scene to a directory.

run_default_pipeline(camera_transformation)

Runs the entire pipeline with the default parameters.

visualize_objects_2d([rect_th, text_size, ...])

Visualize the detected objects back-projected to the color image.

visualize_objects_3d([name, ...])

Visualize the detected objects in 3D.

visualize_objects_raw_detection()

Visualize the raw detection results in 2D.

visualize_pcd(name)

Visualize a point cloud.

Attributes

detected_objects

The result of object detection in the scene.

foreground_pcd

The point cloud of the foreground.

foreground_segmentation

The segmentation of the foreground, in the form of a boolean array of the same length as the point cloud.

image_detected_objects

The result of object detection in the image.

pcd

The point cloud of the scene.

table_model

The model of the table, in the form of a length-4 array representing the plane equation ax + by + cz + d = 0.

table_pcd

The point cloud of the table.

table_segmentation

The segmentation of the table, in the form of a boolean array of the same length as the point cloud.

tabletop_pcd

The point cloud of the tabletop.

tabletop_segmentation

The segmentation of the tabletop, in the form of a boolean array of the same length as the point cloud.

__init__(color_image, depth_image)[source]#

Initializes the pipeline.

Parameters:
  • color_image (ndarray) – The color image.

  • depth_image (ndarray) – The depth image.

__new__(**kwargs)#
construct_foreground_by_cropping(x_range=None, y_range=None, z_range=None, depth_range=None)[source]#

Constructs the foreground segmentation by cropping the point cloud.

Parameters:
  • x_range (Tuple[float, float] | None) – the range of x-axis in the world coordinate system.

  • y_range (Tuple[float, float] | None) – the range of y-axis in the world coordinate system.

  • z_range (Tuple[float, float] | None) – the range of z-axis in the world coordinate system.

  • depth_range (Tuple[float, float] | None) – the range of depth in the camera coordinate system.

Returns:

the foreground segmentation as a boolean array of shape (N,). N is the number of points in the point cloud.

Return type:

ndarray
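The cropping step amounts to intersecting per-axis range masks over the point cloud. Below is a minimal numpy-only sketch of that logic; the function name `crop_mask` is hypothetical and the real method additionally supports a `depth_range` filter in the camera frame.

```python
import numpy as np

def crop_mask(points, x_range=None, y_range=None, z_range=None):
    """Hypothetical sketch: boolean mask of points inside the given axis ranges.

    points: (N, 3) array in world coordinates; each range is (lo, hi) or None.
    """
    mask = np.ones(len(points), dtype=bool)
    for axis, rng in enumerate((x_range, y_range, z_range)):
        if rng is not None:
            lo, hi = rng
            # keep only points whose coordinate on this axis is inside [lo, hi]
            mask &= (points[:, axis] >= lo) & (points[:, axis] <= hi)
    return mask

pts = np.array([[0.5, 0.0, 0.1], [1.5, 0.0, 0.1], [0.5, 0.0, 0.9]])
print(crop_mask(pts, x_range=(0.2, 1.0), z_range=(-0.1, 0.5)))  # [ True False False]
```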

construct_object_completion_by_extrusion(min_height=0.05, verbose=False)[source]#

Constructs object completion by extruding the top surface of the table.

Parameters:
  • min_height (float | None) – the minimum height of an object. If specified as None, skip the thin object filtering step. Otherwise, if an object is too thin, it will be extruded to the minimum height.

  • verbose (bool) – if True, print out debug information.
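One way to picture the extrusion idea: the camera observes mostly the top surfaces of objects, so the unseen volume can be filled by sweeping those points down to the table. The sketch below is a hypothetical numpy-only illustration assuming a horizontal table at height `table_z` (the real pipeline works against the fitted plane model); `extrude_to_table` and its `layers` parameter are invented here.

```python
import numpy as np

def extrude_to_table(points, table_z, min_height=0.05, layers=5):
    """Hypothetical sketch of object completion by extrusion.

    Duplicates the observed (top-surface) points at evenly spaced heights
    between the table plane z = table_z and each point, filling in the
    unobserved volume underneath. If the object is thinner than min_height,
    its top surface is crudely lifted to min_height above the table first.
    """
    top = points.copy()
    height = top[:, 2].max() - table_z
    if min_height is not None and height < min_height:
        top[:, 2] = table_z + min_height  # extrude thin objects up to min_height
    filled = []
    for t in np.linspace(0.0, 1.0, layers):
        layer = top.copy()
        # interpolate each point between the table (t=0) and its own height (t=1)
        layer[:, 2] = table_z + t * (layer[:, 2] - table_z)
        filled.append(layer)
    return np.concatenate(filled, axis=0)
```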

construct_object_meshes(alpha=0.1, verbose=False)[source]#

Construct meshes for all detected objects using alpha shape.

Parameters:
  • alpha (float) – the alpha value for alpha shape.

  • verbose (bool) – if True, print out debug information.

construct_object_segmentation(image_segmentation_model=None, detection_min_points=50, enforce_detected_points=True, dbscan=True, dbscan_eps=0.01, dbscan_min_samples=500, dbscan_min_points=1500, dbscan_with_sam=True, sam_model=None, sam_max_points=100000, outlier_filter=True, outlier_filter_radius=0.005, outlier_filter_min_neighbors=50, min_height=0.015, verbose=False)[source]#

Constructs the object segmentation. The segmentation is performed in the following steps:

  1. If image_segmentation_model is not None, use the image segmentation model to segment the image.

  2. If dbscan is True, use DBSCAN to cluster the remaining points from the previous step.

  3. If outlier_filter is True, use radius outlier filter to remove outliers for each object obtained from the previous two steps.

  4. Use height filter to remove objects that are too thin.

Parameters:
  • image_segmentation_model (ImageBasedPCDSegmentationModel | None) – the image segmentation model. If None, skip the image segmentation step.

  • detection_min_points (int) – the minimum number of points for an object to be considered as a valid object.

  • enforce_detected_points (bool) – if True, keep all points that are detected by the image segmentation model. Otherwise, only keep the points that are detected by the image segmentation model and are also in the “tabletop” segmentation, which equals the foreground minus the table.

  • dbscan (bool) – if True, use DBSCAN to cluster the remaining points from the previous step.

  • dbscan_eps (float) – the maximum distance between two samples for one to be considered as in the neighborhood of the other.

  • dbscan_min_samples (int) – the number of samples in a neighborhood for a point to be considered as a core point.

  • dbscan_min_points (int) – the minimum number of points for an object to be considered as a valid object.

  • dbscan_with_sam (bool) – if True, use SAM to filter the points before DBSCAN.

  • sam_model (PointGuidedImageSegmentationModel | None) – the SAM model. If None, skip the SAM step.

  • sam_max_points (int) – the maximum number of points to use for SAM. This option is useful to avoid the table being detected as an object.

  • outlier_filter (bool) – if True, use radius outlier filter to remove outliers for each object obtained from the previous two steps.

  • outlier_filter_radius (float) – the radius of the sphere that will determine which points are neighbors.

  • outlier_filter_min_neighbors (int) – the minimum number of neighbors that a point must have to be considered as an inlier.

  • min_height (float) – the minimum height of an object.

  • verbose (bool) – if True, print out debug information.
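Step 2 above clusters the leftover tabletop points with DBSCAN and then drops clusters smaller than `dbscan_min_points`. As a rough illustration of how `dbscan_eps`, `dbscan_min_samples`, and `dbscan_min_points` interact, here is a hypothetical brute-force DBSCAN sketch in plain numpy (O(N²); a real implementation would use a KD-tree or a library clusterer):

```python
import numpy as np

def dbscan_clusters(points, eps=0.01, min_samples=500, min_points=1500):
    """Hypothetical sketch: cluster points with brute-force DBSCAN.

    Returns a list of index arrays, one per cluster with >= min_points
    members; noise and undersized clusters are dropped, mirroring the
    dbscan_min_points filter described above.
    """
    n = len(points)
    dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    neighbors = [np.flatnonzero(dist[i] <= eps) for i in range(n)]
    core = np.array([len(nb) >= min_samples for nb in neighbors])
    labels = np.full(n, -1)  # -1 marks unassigned / noise
    cluster = 0
    for i in range(n):
        if labels[i] != -1 or not core[i]:
            continue
        # grow a new cluster from this unvisited core point
        stack = [i]
        labels[i] = cluster
        while stack:
            j = stack.pop()
            if not core[j]:
                continue  # border points join a cluster but do not expand it
            for k in neighbors[j]:
                if labels[k] == -1:
                    labels[k] = cluster
                    stack.append(k)
        cluster += 1
    return [np.flatnonzero(labels == c) for c in range(cluster)
            if (labels == c).sum() >= min_points]
```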

construct_pointcloud(camera_transform=None)[source]#

Constructs the point cloud from the color and depth images.

Parameters:

camera_transform (CameraTransformation | None) – the camera transformation. If None, reconstruct the point cloud in the camera coordinate system.

Returns:

the point cloud.

Return type:

PointCloud
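The core of this step is standard pinhole back-projection of the depth image. The numpy-only sketch below shows that math in camera coordinates; the function name and explicit intrinsics `fx, fy, cx, cy` are assumptions for illustration, and the actual method additionally applies `camera_transform` to reach the world frame and attaches the per-pixel colors.

```python
import numpy as np

def depth_to_points(depth, fx, fy, cx, cy):
    """Hypothetical sketch: back-project a depth image into camera coordinates.

    depth: (H, W) array of metric depths; fx, fy, cx, cy are the pinhole
    intrinsics. Returns an (H*W, 3) array of 3D points.
    """
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth
    # invert the pinhole projection u = fx * x / z + cx (and likewise for v)
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=-1).reshape(-1, 3)
```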

construct_table_segmentation(method='ransac', **kwargs)[source]#

Constructs the table segmentation.

Parameters:

method (str) – the method to use for table segmentation. Available methods are {‘ransac’, ‘threshold’}.

Returns:

the table segmentation as a boolean array of shape (N,). N is the number of points in the point cloud.
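For the ‘ransac’ method, the usual approach is to repeatedly fit a plane to three random points and keep the plane with the most inliers. Below is a hypothetical numpy-only sketch of that loop; the parameter names echo `table_ransac_distance_threshold` / `table_ransac_num_iterations` from run_default_pipeline, and the returned model uses the same (a, b, c, d) plane convention as the table_model attribute.

```python
import numpy as np

def ransac_plane(points, distance_threshold=0.01, num_iterations=1000, seed=0):
    """Hypothetical sketch: fit a plane ax + by + cz + d = 0 with RANSAC.

    Returns (model, inlier_mask): model is a length-4 array (a, b, c, d)
    and inlier_mask is a boolean array of shape (N,), like the table
    segmentation described above.
    """
    rng = np.random.default_rng(seed)
    best_model, best_mask = None, np.zeros(len(points), dtype=bool)
    for _ in range(num_iterations):
        p0, p1, p2 = points[rng.choice(len(points), 3, replace=False)]
        normal = np.cross(p1 - p0, p2 - p0)
        norm = np.linalg.norm(normal)
        if norm < 1e-12:
            continue  # degenerate (collinear) sample
        normal /= norm
        d = -normal @ p0
        # inliers are points within distance_threshold of the candidate plane
        mask = np.abs(points @ normal + d) < distance_threshold
        if mask.sum() > best_mask.sum():
            best_model = np.append(normal, d)
            best_mask = mask
    return best_model, best_mask
```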

export_scene(output_dir, verbose=False)[source]#

Export the scene to a directory. All the metadata will be saved as “metainfo.json”.

Parameters:
  • output_dir (str) – the output directory.

  • verbose (bool)

run_default_pipeline(camera_transformation, x_range=(0.2, 1.0), y_range=(-0.5, 0.5), z_range=(-0.1, 0.5), depth_range=(0.05, 2.0), table_ransac_distance_threshold=0.01, table_ransac_n=3, table_ransac_num_iterations=1000, image_segmentation_model=None, detection_min_points=50, enforce_detected_points=True, dbscan=True, dbscan_eps=0.01, dbscan_min_samples=500, dbscan_min_points=1500, dbscan_with_sam=True, sam_model=None, sam_max_points=50000, outlier_filter=True, outlier_filter_radius=0.005, outlier_filter_min_neighbors=50, min_height=0.015, extrusion_min_height=0.03, mesh_alpha=0.1)[source]#

Runs the entire pipeline with the default parameters.

visualize_objects_2d(rect_th=3, text_size=1, text_th=3)[source]#

Visualize the detected objects back-projected to the color image.

Parameters:
  • rect_th (int) – the thickness of the rectangle.

  • text_size (int) – the size of the text.

  • text_th (int) – the thickness of the text.

visualize_objects_3d(name='mesh', overlay_background_pcd=False)[source]#

Visualize the detected objects in 3D.

Parameters:
  • name (str) – the name of the geometry to visualize. Can be one of the following:
      - ‘mesh’: the reconstructed mesh.
      - ‘pcd’: the reconstructed point cloud.
      - ‘raw_pcd’: the raw point cloud.

  • overlay_background_pcd (bool) – whether to overlay the background point cloud.

visualize_objects_raw_detection()[source]#

Visualize the raw detection results in 2D.

visualize_pcd(name)[source]#

Visualize a point cloud. It will also visualize two coordinate frames: the camera frame and the world frame.

Parameters:

name (str) – the name of the point cloud to visualize. Can be one of the following:
  - ‘full’: the full point cloud.
  - ‘foreground’: the foreground point cloud.
  - ‘table’: the table point cloud.
  - ‘tabletop’: the tabletop point cloud.

property detected_objects: List[ObjectDetectionResult]#

The result of object detection in the scene.

property foreground_pcd: PointCloud#

The point cloud of the foreground.

property foreground_segmentation: ndarray#

The segmentation of the foreground, in the form of a boolean array of the same length as the point cloud.

property image_detected_objects: InstanceSegmentationResult#

The result of object detection in the image.

property pcd: PointCloud#

The point cloud of the scene.

property table_model: ndarray#

The model of the table, in the form of a length-4 array representing the plane equation ax + by + cz + d = 0.

property table_pcd: PointCloud#

The point cloud of the table.

property table_segmentation: ndarray#

The segmentation of the table, in the form of a boolean array of the same length as the point cloud.

property tabletop_pcd: PointCloud#

The point cloud of the tabletop. Tabletop is defined as the foreground points that are not table points.

property tabletop_segmentation: ndarray#

The segmentation of the tabletop, in the form of a boolean array of the same length as the point cloud. Tabletop is defined as the foreground points that are not table points.
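Since the foreground, table, and tabletop segmentations are all boolean arrays over the same point cloud, the relation "tabletop = foreground minus table" is just a mask composition. A minimal sketch with toy masks (the arrays here are made up for illustration):

```python
import numpy as np

# Hypothetical toy masks over a 5-point cloud: the tabletop points are
# exactly the foreground points not classified as part of the table.
foreground_segmentation = np.array([True, True, True, False, False])
table_segmentation = np.array([True, False, False, False, False])
tabletop_segmentation = foreground_segmentation & ~table_segmentation
```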