geococo.utils

Functions

mask_label(input_raster, label)

Masks out a label from input_raster and flattens it to a 2D binary array. If it doesn’t overlap, the resulting mask will only consist of False bools.

window_intersect(input_raster, input_vector)

Generates a Rasterio Window from the intersecting extents of the input data. It also verifies whether the input data share the same CRS and whether they physically overlap.

reshape_image(img_array, shape[, padding_value])

Reshapes 3D numpy array to match given 3D shape, done through slicing or padding.

generate_window_polygon(datasource, window)

Turns the spatial bounds of a given window into a shapely.Polygon object in a given dataset’s CRS.

generate_window_offsets(window, schema)

Computes an array of window offsets bound by a given window.

window_factory(parent_window, schema[, boundless])

Generator that produces rasterio.Window objects in predetermined steps, within the given Window.

estimate_average_bounds(gdf[, quantile])

Estimates the average size of all features in a GeoDataFrame.

estimate_schema(gdf, src[, quantile, window_bounds])

Attempts to find a schema that is able to represent the average GeoDataFrame feature (i.e. sufficient overlap) but within the bounds given by window_bounds.

validate_labels(labels[, id_attribute, ...])

Validates all necessary attributes for a geococo-viable GeoDataFrame. It also checks for the presence of either category_id or category_name values and ensures valid geometry.

update_labels(labels, categories[, id_attribute, ...])

Updates labels with validated (super)category names and ids from given Category instances (i.e. source of truth created from current and previous labels).

get_date_created(raster_source)

Get the creation date of the input image, represented as a datetime object.

Module Contents

geococo.utils.mask_label(input_raster, label)

Masks out a label from input_raster and flattens it to a 2D binary array. If it doesn’t overlap, the resulting mask will only consist of False bools.

Parameters:
  • input_raster (rasterio.io.DatasetReader) – open rasterio DatasetReader for the input raster

  • label (Union[shapely.geometry.Polygon, shapely.geometry.MultiPolygon]) – Polygon object representing the area to be masked (i.e. label)

Returns:

A 2D binary array representing the label

Return type:

numpy.ndarray
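The masking idea can be illustrated without rasterio or shapely. The sketch below is hypothetical and dependency-free, not geococo's implementation. It assumes a north-up raster described only by its top-left corner and a square pixel size. It also assumes the label is a plain list of exterior-ring coordinates, with no holes and no MultiPolygons. A pixel is marked True when its centre falls inside the polygon, so a label that does not overlap the raster yields an all-False mask.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def point_in_polygon(x: float, y: float, ring: Sequence[Point]) -> bool:
    """Even-odd ray-casting test against a polygon's exterior ring."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can cross the ray.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def mask_label_sketch(
    ring: Sequence[Point],
    origin: Point,
    pixel_size: float,
    rows: int,
    cols: int,
) -> List[List[bool]]:
    """Rasterize a polygon ring into a rows x cols boolean mask.

    Assumes a north-up raster: `origin` is the top-left corner (x0, y0)
    and pixels are squares with side `pixel_size`.
    """
    x0, y0 = origin
    mask = []
    for r in range(rows):
        cy = y0 - (r + 0.5) * pixel_size  # y decreases going down the rows
        mask.append([
            point_in_polygon(x0 + (c + 0.5) * pixel_size, cy, ring)
            for c in range(cols)
        ])
    return mask
```

For real geometries and affine transforms, rasterio provides `rasterio.features.geometry_mask`, which performs this kind of rasterization. Its `invert` argument controls whether pixels inside the geometry come out as True.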

geococo.utils.window_intersect(input_raster, input_vector)

Generates a Rasterio Window from the intersecting extents of the input data. It also verifies whether the input data share the same CRS and whether they physically overlap.

Parameters:
  • input_raster (rasterio.io.DatasetReader) – rasterio dataset (i.e. input image)

  • input_vector (geopandas.GeoDataFrame) – geopandas geodataframe (i.e. input labels)

Returns:

rasterio window that represents the intersection between input data extents

Return type:

rasterio.windows.Window
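The same checks can be sketched with plain bounding boxes. This assumes a north-up raster (no rotation) whose extent and pixel size are known. `window_intersect_sketch` and `PixelWindow` are hypothetical names. CRSs are compared as plain strings, which simplifies comparing real CRS objects. The resulting window is expanded outward to whole pixels.

```python
import math
from typing import NamedTuple, Tuple

# (minx, miny, maxx, maxy) in CRS units
Bounds = Tuple[float, float, float, float]


class PixelWindow(NamedTuple):
    """Hypothetical stand-in for rasterio.windows.Window."""
    col_off: int
    row_off: int
    width: int
    height: int


def window_intersect_sketch(
    raster_bounds: Bounds,
    raster_crs: str,
    vector_bounds: Bounds,
    vector_crs: str,
    pixel_size: float,
) -> PixelWindow:
    # Both inputs must share a CRS before their extents can be compared.
    if raster_crs != vector_crs:
        raise ValueError(f"CRS mismatch: {raster_crs} != {vector_crs}")
    # Intersect the two bounding boxes.
    minx = max(raster_bounds[0], vector_bounds[0])
    miny = max(raster_bounds[1], vector_bounds[1])
    maxx = min(raster_bounds[2], vector_bounds[2])
    maxy = min(raster_bounds[3], vector_bounds[3])
    if minx >= maxx or miny >= maxy:
        raise ValueError("input raster and vector extents do not overlap")
    # North-up raster: columns count from the left edge (min x),
    # rows count downwards from the top edge (max y).
    left, top = raster_bounds[0], raster_bounds[3]
    col_off = math.floor((minx - left) / pixel_size)
    row_off = math.floor((top - maxy) / pixel_size)
    col_end = math.ceil((maxx - left) / pixel_size)
    row_end = math.ceil((top - miny) / pixel_size)
    return PixelWindow(col_off, row_off, col_end - col_off, row_end - row_off)
```

With real datasets, `rasterio.windows.from_bounds` converts intersected bounds to a Window, given the dataset's transform.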

geococo.utils.reshape_image(img_array, shape, padding_value=0)

Reshapes 3D numpy array to match given 3D shape, done through slicing or padding.

Parameters:
  • img_array (numpy.ndarray) – the numpy array to be reshaped

  • shape (Tuple[int, int, int]) – the desired shape (bands, rows, cols)

  • padding_value (int) – what value to pad img_array with (if too small)

Returns:

numpy array in desired shape

Return type:

numpy.ndarray
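A minimal NumPy sketch of the slice-then-pad behaviour follows. It assumes the array is in (bands, rows, cols) order. It also assumes padding is appended at the end of each axis; where geococo actually places the padding is not stated here.

```python
import numpy as np


def reshape_image_sketch(img_array: np.ndarray, shape, padding_value=0) -> np.ndarray:
    """Slice or pad a (bands, rows, cols) array so it has exactly `shape`."""
    # Slice first: drop anything beyond the target size on each axis.
    sliced = img_array[: shape[0], : shape[1], : shape[2]]
    # Then pad at the end of each axis up to the target size (never negative).
    pad_width = [(0, target - current) for target, current in zip(shape, sliced.shape)]
    return np.pad(sliced, pad_width, mode="constant", constant_values=padding_value)
```

Slicing before padding means an axis that is too long and another that is too short are handled in one call.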

geococo.utils.generate_window_polygon(datasource, window)

Turns the spatial bounds of a given window into a shapely.Polygon object in a given dataset’s CRS.

Parameters:
  • datasource (rasterio.io.DatasetReader) – a rasterio DatasetReader object that provides the affine transformation

  • window (rasterio.windows.Window) – bounds to represent as Polygon

Returns:

shapely Polygon representing the spatial bounds of a given window in a given CRS

Return type:

shapely.geometry.Polygon
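The conversion amounts to mapping the window's pixel corners through the dataset's affine transform. The sketch below assumes the six coefficients follow the order of the `affine` package used by rasterio: x = a·col + b·row + c and y = d·col + e·row + f. `window_corners` is a hypothetical helper. It returns a closed ring of coordinates, which could be passed to shapely.geometry.Polygon.

```python
from typing import List, Tuple

# (a, b, c, d, e, f) in the affine package's coefficient order
Affine6 = Tuple[float, float, float, float, float, float]


def window_corners(
    col_off: float, row_off: float, width: float, height: float, transform: Affine6
) -> List[Tuple[float, float]]:
    """Return the window's corners in CRS coordinates as a closed ring."""
    a, b, c, d, e, f = transform

    def to_xy(col: float, row: float) -> Tuple[float, float]:
        # Pixel (col, row) -> map (x, y) through the affine transform.
        return (a * col + b * row + c, d * col + e * row + f)

    corners = [
        to_xy(col_off, row_off),                   # top-left
        to_xy(col_off + width, row_off),           # top-right
        to_xy(col_off + width, row_off + height),  # bottom-right
        to_xy(col_off, row_off + height),          # bottom-left
    ]
    return corners + [corners[0]]  # repeat the first corner to close the ring
```

For real data, `rasterio.windows.bounds(window, transform)` returns the same extent as a (left, bottom, right, top) tuple.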

geococo.utils.generate_window_offsets(window, schema)

Computes an array of window offsets bound by a given window.

Parameters:
  • window (rasterio.windows.Window) – the bounding window (i.e. offsets will be within its bounds)

  • schema (geococo.window_schema.WindowSchema) – the parameters for the window generator

Returns:

an array of window offsets within the bounds of window

Return type:

numpy.ndarray
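One plausible way to compute such offsets is to step through the bounding window by the window size minus the requested overlap. The field names of WindowSchema are not listed here, so `SchemaSketch` and its width, height, width_overlap and height_overlap fields are hypothetical. This sketch also returns a list of (col_off, row_off) tuples rather than a numpy.ndarray.

```python
from dataclasses import dataclass
from itertools import product
from typing import List, Tuple


@dataclass(frozen=True)
class SchemaSketch:
    """Hypothetical stand-in for WindowSchema (field names are assumptions)."""
    width: int
    height: int
    width_overlap: int
    height_overlap: int


def window_offsets_sketch(
    col_off: int, row_off: int, width: int, height: int, schema: SchemaSketch
) -> List[Tuple[int, int]]:
    """Return (col_off, row_off) pairs, row by row, that start inside the window."""
    # Consecutive windows advance by their size minus the requested overlap.
    col_step = schema.width - schema.width_overlap
    row_step = schema.height - schema.height_overlap
    if col_step <= 0 or row_step <= 0:
        raise ValueError("overlap must be smaller than the window size")
    cols = range(col_off, col_off + width, col_step)
    rows = range(row_off, row_off + height, row_step)
    return [(col, row) for row, col in product(rows, cols)]
```

Offsets near the right or bottom edge may start windows that extend past the bounding window. This is where the boundless option of window_factory becomes relevant.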

geococo.utils.window_factory(parent_window, schema, boundless=True)

Generator that produces rasterio.Window objects in predetermined steps, within the given Window.

Parameters:
  • parent_window (rasterio.windows.Window) – the window that provides the bounds for all child_window objects

  • schema (geococo.window_schema.WindowSchema) – the parameters that determine the window steps

  • boundless (bool) – whether the child_window should be clipped by the parent_window or not

Yield:

a rasterio.Window used for windowed reading/writing

Return type:

Generator[rasterio.windows.Window, None, None]
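The generator can be sketched under two assumptions. With boundless=True, child windows may extend past the parent window, as in rasterio's boundless reads. With boundless=False, each child is clipped to the parent. `Win` is a hypothetical stand-in for rasterio.windows.Window. The fixed square `size` and `step` stand in for whatever WindowSchema provides.

```python
from typing import Iterator, NamedTuple


class Win(NamedTuple):
    """Hypothetical stand-in for rasterio.windows.Window."""
    col_off: int
    row_off: int
    width: int
    height: int


def clip(child: Win, parent: Win) -> Win:
    """Intersect a child window with its parent (assumes they overlap)."""
    col_start = max(child.col_off, parent.col_off)
    row_start = max(child.row_off, parent.row_off)
    col_stop = min(child.col_off + child.width, parent.col_off + parent.width)
    row_stop = min(child.row_off + child.height, parent.row_off + parent.height)
    return Win(col_start, row_start, col_stop - col_start, row_stop - row_start)


def window_factory_sketch(
    parent: Win, size: int, step: int, boundless: bool = True
) -> Iterator[Win]:
    """Yield size x size windows every `step` pixels, starting inside `parent`."""
    for row in range(parent.row_off, parent.row_off + parent.height, step):
        for col in range(parent.col_off, parent.col_off + parent.width, step):
            child = Win(col, row, size, size)
            # Boundless windows keep their full size; otherwise trim to the parent.
            yield child if boundless else clip(child, parent)
```

Because it is a generator, windows are produced lazily, one per windowed read or write.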

geococo.utils.estimate_average_bounds(gdf, quantile=0.9)

Estimates the average size of all features in a GeoDataFrame.

Parameters:
  • gdf (geopandas.GeoDataFrame) – GeoDataFrame that contains all features (i.e. shapely.Geometry objects)

  • quantile (float) – what quantile will represent the feature population

Returns:

a tuple of floats representing average width and height

Return type:

Tuple[float, float]
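The estimate can be sketched under two assumptions. First, a feature's size means the width and height of its bounding box. Second, the quantile is computed with linear interpolation, the default rule of numpy.quantile. The input is a plain list of (minx, miny, maxx, maxy) tuples, matching the columns that GeoDataFrame.bounds returns. Both helper names are hypothetical.

```python
import math
from typing import Sequence, Tuple

# (minx, miny, maxx, maxy) per feature
Bounds = Tuple[float, float, float, float]


def linear_quantile(values: Sequence[float], q: float) -> float:
    """Quantile with linear interpolation between the two nearest ranks."""
    if not values:
        raise ValueError("need at least one value")
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q
    lo = math.floor(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (pos - lo) * (ordered[hi] - ordered[lo])


def estimate_bounds_sketch(
    bounds: Sequence[Bounds], quantile: float = 0.9
) -> Tuple[float, float]:
    """Return the quantile width and height of the features' bounding boxes."""
    widths = [maxx - minx for minx, _, maxx, _ in bounds]
    heights = [maxy - miny for _, miny, _, maxy in bounds]
    return linear_quantile(widths, quantile), linear_quantile(heights, quantile)
```

With the default quantile=0.9, the result is the 90th-percentile size rather than the arithmetic mean. This keeps the estimate from being pulled down by many small features.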

geococo.utils.estimate_schema(gdf, src, quantile=0.9, window_bounds=[(256, 256), (512, 512)])

Attempts to find a schema that is able to represent the average GeoDataFrame feature (i.e. sufficient overlap) but within the bounds given by window_bounds.

Parameters:
  • gdf (geopandas.GeoDataFrame) – GeoDataFrame that contains features that determine the degree of overlap

  • src (rasterio.io.DatasetReader) – The rasterio DatasetReader associated with the resulting schema (i.e. bounds and pixel sizes)

  • quantile (float) – what quantile will represent the feature population

  • window_bounds (List[Tuple[int, int]]) – a list of possible limits for the window generators

Returns:

(if found) a viable WindowSchema with sufficient overlap within the window_bounds

Return type:

geococo.window_schema.WindowSchema
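The selection rule is not spelled out above, so the following is only one plausible heuristic, not geococo's method. It takes a representative feature size (for example, the quantile estimate from estimate_average_bounds) and converts it to pixels. That pixel size is used as the overlap, so any feature of that size fits entirely inside at least one window. It then returns the first candidate in window_bounds that is strictly larger than the overlap. `SchemaGuess` is a hypothetical stand-in for WindowSchema.

```python
import math
from typing import List, NamedTuple, Optional, Tuple


class SchemaGuess(NamedTuple):
    """Hypothetical stand-in for WindowSchema."""
    width: int
    height: int
    width_overlap: int
    height_overlap: int


def estimate_schema_sketch(
    feature_size: Tuple[float, float],
    pixel_size: Tuple[float, float],
    window_bounds: List[Tuple[int, int]],
) -> Optional[SchemaGuess]:
    # Convert the representative feature size from CRS units to whole pixels.
    feat_w = math.ceil(feature_size[0] / abs(pixel_size[0]))
    feat_h = math.ceil(feature_size[1] / abs(pixel_size[1]))
    for width, height in window_bounds:
        # The overlap must hold a whole feature yet leave a positive step.
        if feat_w < width and feat_h < height:
            return SchemaGuess(width, height, feat_w, feat_h)
    return None  # no candidate window is large enough
```

Returning None when no candidate fits mirrors the "(if found)" wording of the return value.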

geococo.utils.validate_labels(labels, id_attribute='category_id', name_attribute=None, super_attribute=None)

Validates all necessary attributes for a geococo-viable GeoDataFrame. It also checks for the presence of either category_id or category_name values and ensures valid geometry.

Parameters:
  • labels (geopandas.GeoDataFrame) – GeoDataFrame containing labels and category attributes

  • id_attribute (Optional[str]) – Column name that holds category_id values

  • name_attribute (Optional[str]) – Column name that holds category_name values

  • super_attribute (Optional[str]) – Column name that holds supercategory values

Returns:

Validated GeoDataFrame with coerced dtypes

Return type:

geopandas.GeoDataFrame
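The checks above can be sketched on plain records (dicts) instead of a GeoDataFrame. The function name and the exact coercion rules are hypothetical: ids become int and names become str. Geometry validity is reduced to a presence check; with shapely, one would test each geometry's `is_valid` property instead.

```python
from typing import Any, Dict, List, Optional

Record = Dict[str, Any]


def validate_labels_sketch(
    records: List[Record],
    id_attribute: Optional[str] = "category_id",
    name_attribute: Optional[str] = None,
    super_attribute: Optional[str] = None,
) -> List[Record]:
    # At least one way of identifying a category is required.
    if id_attribute is None and name_attribute is None:
        raise ValueError("provide id_attribute and/or name_attribute")
    validated = []
    for i, record in enumerate(records):
        # Placeholder for a real geometry validity check.
        if record.get("geometry") is None:
            raise ValueError(f"record {i} has no geometry")
        out = dict(record)
        # Coerce dtypes so later matching against categories is consistent.
        if id_attribute is not None:
            out[id_attribute] = int(record[id_attribute])
        if name_attribute is not None:
            out[name_attribute] = str(record[name_attribute])
        if super_attribute is not None:
            out[super_attribute] = str(record[super_attribute])
        validated.append(out)
    return validated
```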

geococo.utils.update_labels(labels, categories, id_attribute='category_id', name_attribute=None)

Updates labels with validated (super)category names and ids from given Category instances (i.e. source of truth created from current and previous labels). This ultimately just matches a given key (name or id) with keys in each Category instance and maps the associated (and validated) values to labels.

Parameters:
  • labels (geopandas.GeoDataFrame) – GeoDataFrame containing labels and category attributes (validated by validate_labels)

  • categories (List[geococo.coco_models.Category]) – list of Category instances created from current and previous labels

  • id_attribute (Optional[str]) – Column name that holds category_id values

  • name_attribute (Optional[str]) – Column name that holds category_name values

Returns:

labels with name, id and supercategory attributes from all given Category instances

Return type:

geopandas.GeoDataFrame
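The matching step can be sketched as a dictionary lookup keyed on either name or id. `CategorySketch` is a hypothetical stand-in for geococo.coco_models.Category with id, name and supercategory fields. Several choices are also assumptions: the output column names, and that a name column takes precedence over an id column when both are given.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional

Record = Dict[str, Any]


@dataclass(frozen=True)
class CategorySketch:
    """Hypothetical stand-in for geococo.coco_models.Category."""
    id: int
    name: str
    supercategory: str


def update_labels_sketch(
    records: List[Record],
    categories: List[CategorySketch],
    id_attribute: Optional[str] = "category_id",
    name_attribute: Optional[str] = None,
) -> List[Record]:
    # Assumption: a name column, when given, is the matching key; else the id.
    if name_attribute is not None:
        key, lookup = name_attribute, {c.name: c for c in categories}
    elif id_attribute is not None:
        key, lookup = id_attribute, {c.id: c for c in categories}
    else:
        raise ValueError("provide id_attribute and/or name_attribute")
    updated = []
    for record in records:
        category = lookup[record[key]]  # KeyError if no Category matches
        # Copy the validated values from the Category onto the label.
        updated.append({
            **record,
            "category_id": category.id,
            "category_name": category.name,
            "supercategory": category.supercategory,
        })
    return updated
```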

geococo.utils.get_date_created(raster_source)

Get the creation date of the input image, represented as a datetime object. If no such information is available in the image’s metadata, we return the date the file itself was last modified.

Parameters:
  • raster_source (rasterio.io.DatasetReader) – reader for input image

Returns:

datetime object representing date_created

Return type:

datetime.datetime
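The fallback logic can be sketched with the standard library. This assumes the creation date comes from the TIFF DateTime tag. GDAL, and therefore rasterio's `DatasetReader.tags()`, reports that tag as `TIFFTAG_DATETIME` in `YYYY:MM:DD HH:MM:SS` format. Which tag geococo actually inspects is an assumption here, and `date_created_sketch` is a hypothetical name.

```python
import os
from datetime import datetime
from typing import Mapping


def date_created_sketch(tags: Mapping[str, str], path: str) -> datetime:
    """Parse TIFFTAG_DATETIME if present, else use the file's modification time."""
    stamp = tags.get("TIFFTAG_DATETIME")
    if stamp:
        try:
            return datetime.strptime(stamp, "%Y:%m:%d %H:%M:%S")
        except ValueError:
            pass  # malformed tag: fall back to the file timestamp
    return datetime.fromtimestamp(os.path.getmtime(path))
```

For a file-backed dataset, this might be called as `date_created_sketch(src.tags(), src.name)`.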