Data Model

class src.data_model.data_model.LeafSequence(folder_path=None, filename_pattern=None, file_list=None, creation_mode=False)

A sequence of full-size Leaf images.

extract_changed_leaves(output_path, dif_len=1, overwrite=False, shift_256=False, combination_function=<function 'subtract_modulo'>)

Extracts and saves changed leaf images. This uses the filepath list created when the leaf sequence is instantiated.

Parameters
  • output_path (str) – where the differenced leaves should be saved

  • dif_len (int) – the step size between the leaves to be differenced

  • overwrite (bool) – whether images that exist at the same file path should be overwritten

  • shift_256 (bool) – whether images should be shifted by 256; this also means that images will be saved as uint16

  • combination_function – the combination function to be used; the default is to difference leaves

Return type

None

Returns

None
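The pairing and differencing described above can be sketched as follows. This is a minimal, dependency-free sketch under stated assumptions: images are plain lists of rows of 0 to 255 pixel values, leaf i is combined with leaf i - dif_len, and `subtract_modulo` computes (current - previous) mod 256. The package's real `subtract_modulo` and pairing order are not documented here, so `plan_pairs`, `shifted_difference` and this `subtract_modulo` are hypothetical stand-ins.

```python
from typing import List, Tuple

Image = List[List[int]]  # a greyscale image as rows of 0-255 pixel values


def subtract_modulo(current: Image, previous: Image) -> Image:
    """Hypothetical stand-in: pixel-wise (current - previous) mod 256."""
    return [[(c - p) % 256 for c, p in zip(row_c, row_p)]
            for row_c, row_p in zip(current, previous)]


def plan_pairs(file_list: List[str], dif_len: int = 1) -> List[Tuple[str, str]]:
    """Hypothetical: pair each leaf with the leaf dif_len steps earlier."""
    return [(file_list[i - dif_len], file_list[i])
            for i in range(dif_len, len(file_list))]


def shifted_difference(current: Image, previous: Image) -> Image:
    """Hypothetical: plain difference shifted by 256 so it is never negative."""
    return [[c - p + 256 for c, p in zip(row_c, row_p)]
            for row_c, row_p in zip(current, previous)]
```

The shifted variant shows why shift_256 implies uint16: a plain difference lies between -255 and 255, so adding 256 yields values from 1 up to 511, which no longer fit in uint8.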

load_extracted_images(load_image=False, disable_pb=False, shift_256=False, transform_uint8=False)

Instantiates Leaf objects using the file_list attribute and appends these objects to the image_objects attribute.

Parameters
  • load_image (bool) – whether to load the image array belonging to each Leaf being created

  • disable_pb (bool) – whether the progress bar should be disabled

  • shift_256 (bool) – whether images should be shifted by 256; applies if load_image is true

  • transform_uint8 (bool) – whether images should be transformed to uint8 format; applies if load_image is true

Return type

None

Returns

None
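The load_image flag amounts to a choice between eager and lazy loading: objects are always created from file_list, but pixel data is read only when requested. A minimal sketch of that pattern follows; `LazyImage` and `LazySequence` are hypothetical stand-ins for Leaf and LeafSequence, and raw file bytes stand in for a decoded image array.

```python
from pathlib import Path
from typing import List, Optional


class LazyImage:
    """Hypothetical stand-in for Leaf: holds a path, reads data on demand."""

    def __init__(self, path: str, load_image: bool = False) -> None:
        self.path = Path(path)
        self.image_array: Optional[bytes] = None
        if load_image:
            self.load_array()

    def load_array(self) -> None:
        # Raw bytes stand in for a decoded image array in this sketch.
        self.image_array = self.path.read_bytes()


class LazySequence:
    """Hypothetical stand-in for LeafSequence."""

    def __init__(self, file_list: List[str]) -> None:
        self.file_list = file_list
        self.image_objects: List[LazyImage] = []

    def load_extracted_images(self, load_image: bool = False) -> None:
        # One object per path; pixel data is read only if load_image is True.
        for path in self.file_list:
            self.image_objects.append(LazyImage(path, load_image=load_image))
```

Deferring the read keeps memory use proportional to the images actually needed, which matters for long sequences of full-size leaves.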

load_image_array(disable_pb=False, shift_256=False, transform_uint8=False)

Loads all image arrays belonging to the Leaf objects in the sequence.

Parameters
  • disable_pb (bool) – whether the progress bar should be disabled

  • shift_256 (bool) – whether images should be shifted by 256

  • transform_uint8 (bool) – whether images should be transformed to uint8 format

Return type

None

Returns

None

load_tile_sequence(load_image=False, folder_path=None, filename_pattern=None, shift_256=False, transform_uint8=False)

Loads all tile objects belonging to the Leaf objects in the sequence.

Parameters
  • load_image (bool) – whether the tile arrays should also be loaded

  • folder_path (Optional[str]) – the folder path of the tiles

  • filename_pattern (Optional[str]) – the filename pattern of the tiles

  • shift_256 (bool) – whether images should be shifted by 256; applies if load_image is true

  • transform_uint8 (bool) – whether images should be transformed to uint8 format; applies if load_image is true

Return type

None

Returns

None
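If filename_pattern is a glob-style pattern (an assumption; the package may interpret it differently), collecting the tiles in folder_path could look like this. `find_tiles` is a hypothetical helper; it sorts numerically so that `tile_10.png` comes after `tile_2.png` rather than straight after `tile_1.png`.

```python
import re
from pathlib import Path
from typing import List, Union


def _natural_key(path: Path) -> List[Union[int, str]]:
    # re.split with a capturing group puts digit runs at odd indices,
    # e.g. "tile_10.png" -> ["tile_", "10", ".png"]; convert those to int.
    parts = re.split(r"(\d+)", path.name)
    return [int(tok) if i % 2 else tok for i, tok in enumerate(parts)]


def find_tiles(folder_path: str, filename_pattern: str = "*.png") -> List[Path]:
    """Hypothetical: tiles in folder_path matching a glob, in natural order."""
    return sorted(Path(folder_path).glob(filename_pattern), key=_natural_key)
```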

predict_leaf_sequence(model, x_tile_length=None, y_tile_length=None, memory_saving=True, overwrite=False, save_prediction=True, shift_256=False, transform_uint8=False, threshold=0.5, **kwargs)

Predicts segmentation maps for the Leaves in the sequence. The model used should implement a predict tile method. If memory saving is set to false, a prediction array is assigned to each Leaf object in the sequence.

Parameters
  • model (Model) – a model which inherits Model and hence implements a predict tile method

  • x_tile_length (Optional[int]) – the x length of the tile used in the original training

  • y_tile_length (Optional[int]) – the y length of the tile used in the original training

  • memory_saving (bool) – if set to True, both the image array and prediction array are set to None; this should only be set to True if the predictions are being saved

  • overwrite (bool) – whether images that exist at the same file path should be overwritten

  • save_prediction (bool) – whether the prediction should be saved

  • shift_256 (bool) – whether images should be shifted by 256

  • transform_uint8 (bool) – whether images should be transformed to uint8 format

  • threshold (float) – the threshold to use when saving predictions; i.e. a pixel is saved as an embolism if p(embolism) > threshold

  • kwargs – kwargs for the predict tile function

Return type

None

Returns

None
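The threshold rule above (a pixel is saved as an embolism if p(embolism) > threshold) can be sketched as follows. `threshold_prediction` is a hypothetical helper that writes 1 for embolism and 0 otherwise; the pixel values the package actually writes to disk are not documented here.

```python
from typing import List

ProbMap = List[List[float]]  # per-pixel p(embolism) values in [0, 1]


def threshold_prediction(probabilities: ProbMap, threshold: float = 0.5) -> List[List[int]]:
    """Hypothetical: 1 (embolism) where p > threshold, else 0."""
    return [[1 if p > threshold else 0 for p in row] for row in probabilities]
```

Because the comparison is strict, a pixel whose probability equals the threshold exactly is saved as background.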

get_databunch_dataframe(embolism_only=False, csv_name=None)

Extracts a DataBunch DataFrame using the images in this sequence. The first field is the leaf path and the second field is the mask name. This format is useful for fastai. If a csv name is provided, the DataFrame is saved.

Parameters
  • embolism_only (bool) – whether only leaves with embolisms should be used

  • csv_name (Optional[str]) – the name of the csv, which can also be a path; if this is not provided, the DataFrame will not be saved

Return type

Tuple[DataFrame, str]

Returns

DataBunch DF and sequence root folder path
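A dependency-free sketch of the two-column layout described above, using the standard `csv` module in place of a pandas DataFrame. The helper `databunch_rows`, the column names `leaf_path` and `mask_name`, and the reading of "root folder path" as the common parent folder of the leaf images are all assumptions.

```python
import csv
import os
from typing import List, Optional, Tuple


def databunch_rows(leaf_paths: List[str], mask_paths: List[str],
                   csv_name: Optional[str] = None) -> Tuple[List[Tuple[str, str]], str]:
    """Hypothetical: pair each leaf path with its mask's file name."""
    rows = [(leaf, os.path.basename(mask)) for leaf, mask in zip(leaf_paths, mask_paths)]
    if csv_name is not None:
        with open(csv_name, "w", newline="") as f:
            writer = csv.writer(f)
            writer.writerow(["leaf_path", "mask_name"])  # hypothetical headers
            writer.writerows(rows)
    # Assumed meaning of "root folder path": common parent of the leaf folders.
    root = os.path.commonpath([os.path.dirname(p) for p in leaf_paths])
    return rows, root
```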

get_tile_databunch_df(mseq, tile_embolism_only=False, leaf_embolism_only=False, csv_name=None)

Extracts a combined DataBunch DataFrame using all tiles belonging to the Image objects in the sequence. The first field is the leaf tile path and the second field is the mask tile name. This format is useful for fastai. If a csv name is provided, the DataFrame is saved.

Parameters
  • mseq – a MaskSequence object

  • tile_embolism_only (bool) – whether only tiles with embolisms should be used

  • leaf_embolism_only (bool) – whether only leaves with embolisms should be used

  • csv_name (Optional[str]) – the name of the csv, which can also be a path; if this is not provided, the DataFrame will not be saved

Return type

Tuple[DataFrame, List[str]]

Returns

combined DataBunch DF and list of image root folder paths
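The two embolism filters can be sketched as follows, assuming a mask tile contains an embolism when any of its pixels is non-zero. `select_tile_pairs` and its input layout (leaf id mapped to `(leaf_tile_path, mask_tile_name, mask_pixels)` triples) are hypothetical. In this sketch a leaf counts as having an embolism if any of its tiles does; the package may instead decide this from the full mask.

```python
from typing import Dict, List, Sequence, Tuple

Tile = Sequence[Sequence[int]]


def has_embolism(mask: Tile) -> bool:
    """Assumes any non-zero mask pixel marks an embolism."""
    return any(any(px for px in row) for row in mask)


def select_tile_pairs(
    tiles: Dict[str, List[Tuple[str, str, Tile]]],
    tile_embolism_only: bool = False,
    leaf_embolism_only: bool = False,
) -> List[Tuple[str, str]]:
    """Hypothetical: return (leaf_tile_path, mask_tile_name) rows after filtering."""
    rows: List[Tuple[str, str]] = []
    for leaf_tiles in tiles.values():
        leaf_has_embolism = any(has_embolism(mask) for _, _, mask in leaf_tiles)
        if leaf_embolism_only and not leaf_has_embolism:
            continue  # drop every tile of a leaf without embolisms
        for leaf_tile, mask_name, mask in leaf_tiles:
            if tile_embolism_only and not has_embolism(mask):
                continue  # drop individual tiles whose mask is empty
            rows.append((leaf_tile, mask_name))
    return rows
```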

class src.data_model.data_model.MaskSequence(mpf_path=None, folder_path=None, filename_pattern=None, file_list=None, creation_mode=False)

A sequence of full-size Mask images.

extract_mask_from_multipage(output_path, overwrite=False, binarise=False)

Extracts and saves mask images from a multipage file.

Parameters
  • output_path (str) – where the masks should be saved

  • overwrite (bool) – whether images that exist at the same file path should be overwritten

  • binarise (bool) – whether the masks should be binarised; i.e. 0 indicates no embolism and 1 indicates embolism

Return type

None

Returns

None

load_extracted_images(load_image=False, disable_pb=False)

Instantiates Mask objects using the file_list attribute and appends these objects to the image_objects attribute.

Parameters
  • load_image (bool) – whether to load the image array belonging to each Mask being created

  • disable_pb (bool) – whether the progress bar should be disabled

Return type

None

Returns

None

load_image_array(disable_pb=False)

Loads all image arrays belonging to the Mask objects in the sequence.

Parameters
  • disable_pb (bool) – whether the progress bar should be disabled

Return type

None

Returns

None

load_tile_sequence(load_image=False, folder_path=None, filename_pattern=None)

Loads all tile objects belonging to the Mask objects in the sequence.

Parameters
  • load_image (bool) – whether the tile arrays should also be loaded

  • folder_path (Optional[str]) – the folder path of the tiles

  • filename_pattern (Optional[str]) – the filename pattern of the tiles

Return type

None

Returns

None

get_databunch_dataframe(embolism_only=False, csv_name=None)

Extracts a DataBunch DataFrame using the images in this sequence. The first field is the leaf path and the second field is the mask name. This format is useful for fastai. If a csv name is provided, the DataFrame is saved.

Parameters
  • embolism_only (bool) – whether only leaves with embolisms should be used

  • csv_name (Optional[str]) – the name of the csv, which can also be a path; if this is not provided, the DataFrame will not be saved

Return type

Tuple[DataFrame, str]

Returns

DataBunch DF and sequence root folder path

get_tile_databunch_df(lseq, tile_embolism_only=False, leaf_embolism_only=False, csv_name=None)

Extracts a combined DataBunch DataFrame using all tiles belonging to the Image objects in the sequence. The first field is the leaf tile path and the second field is the mask tile name. This format is useful for fastai. If a csv name is provided, the DataFrame is saved.

Parameters
  • lseq – a LeafSequence object

  • tile_embolism_only (bool) – whether only tiles with embolisms should be used

  • leaf_embolism_only (bool) – whether only leaves with embolisms should be used

  • csv_name (Optional[str]) – the name of the csv, which can also be a path; if this is not provided, the DataFrame will not be saved

Return type

Tuple[DataFrame, List[str]]

Returns

combined DataBunch DF and list of image root folder paths

binarise_sequence(disable_pb=False)

Binarises all masks in the sequence.

Parameters
  • disable_pb (bool) – whether the progress bar should be disabled

Return type

None

Returns

None

class src.data_model.data_model.Leaf(path=None, sequence_parent=None, parents=None, folder_path=None, filename_pattern=None, file_list=None)

A full Leaf Image

extract_me(filepath, combination_function=<function 'subtract_modulo'>, shift_256=False, overwrite=False)

Extracts and saves the changed leaf image. The extracted image and file path are stored in the image_array and path attributes, respectively.

Parameters
  • filepath – the file path at which to save the extracted image

  • combination_function – the combination function to apply to the image's parents

  • shift_256 (bool) – whether the extracted image should be shifted by 256

  • overwrite (bool) – whether an image that exists at the same file path should be overwritten

Return type

None

Returns

None

load_extracted_images(load_image=False, disable_pb=False, shift_256=False, transform_uint8=False)

Loads LeafTiles belonging to the Leaf.

Parameters
  • load_image (bool) – whether to load the image array belonging to each LeafTile being created

  • disable_pb (bool) – whether the progress bar should be disabled

  • shift_256 (bool) – whether images should be shifted by 256; applies if load_image is true

  • transform_uint8 (bool) – whether images should be transformed to uint8 format; applies if load_image is true

Return type

None

Returns

None

tile_me(length_x, stride_x, length_y, stride_y, output_path=None, overwrite=False)

Tiles an image and creates LeafTile objects. These are appended to the image_object attribute.

Parameters
  • length_x (int) – the x-length of the tile

  • stride_x (int) – the size of the x stride

  • length_y (int) – the y-length of the tile

  • stride_y (int) – the size of the y stride

  • output_path (Optional[str]) – output path of where the tiles should be saved; if no path is provided, tiles are saved in a default location

  • overwrite (bool) – whether tiles that exist at the same file path should be overwritten

Return type

None

Returns

None
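A sketch of how tile positions follow from the lengths and strides, assuming tiles start at offset 0 and advance by the stride along each axis. How the package handles a remainder that the stride does not reach is not documented; `tile_origins` adds one last tile flush with the border, which is one common choice (dropping or padding the remainder are others). Both helpers are hypothetical.

```python
from typing import List, Tuple


def tile_origins(size: int, length: int, stride: int) -> List[int]:
    """Hypothetical: start offsets of tiles of `length` along an axis of `size`."""
    if length > size:
        return []
    starts = list(range(0, size - length + 1, stride))
    # Assumed edge policy: add a final tile flush with the border so the last
    # pixels are covered when the stride does not divide the axis evenly.
    if starts[-1] + length < size:
        starts.append(size - length)
    return starts


def tile_boxes(width: int, height: int, length_x: int, stride_x: int,
               length_y: int, stride_y: int) -> List[Tuple[int, int, int, int]]:
    """Hypothetical: (x0, y0, x1, y1) boxes for every tile, row by row."""
    return [(x, y, x + length_x, y + length_y)
            for y in tile_origins(height, length_y, stride_y)
            for x in tile_origins(width, length_x, stride_x)]
```

A stride smaller than the tile length produces overlapping tiles; a stride equal to it produces a non-overlapping grid.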

predict_leaf(model, x_tile_length=None, y_tile_length=None, memory_saving=True, overwrite=False, save_prediction=True, shift_256=False, transform_uint8=False, threshold=0.5, **kwargs)

Predicts a segmentation map using the Leaf object's image_array. The model used should implement a predict tile method. If memory saving is set to false, a prediction array is assigned to the Leaf object.

Parameters
  • model – a model which inherits Model and hence implements a predict tile method

  • x_tile_length (Optional[int]) – the x length of the tile used in the original training

  • y_tile_length (Optional[int]) – the y length of the tile used in the original training

  • memory_saving (bool) – if set to True, both the image array and prediction array are set to None; this should only be set to True if the predictions are being saved

  • overwrite (bool) – whether images that exist at the same file path should be overwritten

  • save_prediction (bool) – whether the prediction should be saved

  • shift_256 (bool) – whether images should be shifted by 256

  • transform_uint8 (bool) – whether images should be transformed to uint8 format

  • threshold (float) – the threshold to use when saving predictions; i.e. a pixel is saved as an embolism if p(embolism) > threshold

  • kwargs – kwargs for the predict tile function

Return type

None

Returns

None
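The role of x_tile_length and y_tile_length can be sketched as splitting the leaf into training-sized tiles, predicting each one and stitching the results back together. This sketch assumes non-overlapping tiles and image dimensions that are exact multiples of the tile lengths; `predict_by_tiles` is hypothetical, and `predict_tile` is any callable that maps a tile to a same-shaped prediction.

```python
from typing import Callable, List

Grid = List[List[float]]


def predict_by_tiles(image: Grid, predict_tile: Callable[[Grid], Grid],
                     x_tile_length: int, y_tile_length: int) -> Grid:
    """Hypothetical: run predict_tile on non-overlapping tiles and stitch.

    Assumes the image dimensions are exact multiples of the tile lengths.
    """
    height, width = len(image), len(image[0])
    out: Grid = [[0.0] * width for _ in range(height)]
    for y0 in range(0, height, y_tile_length):
        for x0 in range(0, width, x_tile_length):
            # Cut one tile out of the full image.
            tile = [row[x0:x0 + x_tile_length] for row in image[y0:y0 + y_tile_length]]
            pred = predict_tile(tile)
            # Write the tile prediction back into the same position.
            for dy, pred_row in enumerate(pred):
                out[y0 + dy][x0:x0 + x_tile_length] = pred_row
    return out
```

Practical pipelines often add overlap or padding to avoid seams at tile borders; that is omitted here.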

get_databunch_dataframe(embolism_only=False, csv_name=None)

Extracts a DataBunch DataFrame using the tiles in this Leaf. The first field is the leaf tile path and the second field is the mask tile name. This format is useful for fastai. If a csv name is provided, the DataFrame is saved.

Parameters
  • embolism_only (bool) – whether only leaves with embolisms should be used

  • csv_name (Optional[str]) – the name of the csv, which can also be a path; if this is not provided, the DataFrame will not be saved

Return type

Tuple[DataFrame, str]

Returns

DataBunch DF and sequence root folder path

class src.data_model.data_model.Mask(path=None, sequence_parent=None, folder_path=None, filename_pattern=None, file_list=None)

A full Mask Image

create_mask(filepath, image, overwrite=False, binarise=False)

Saves the PIL image at the provided file path. The image and file path are stored in the image_array and path attributes respectively.

Parameters
  • filepath (Union[Path, str]) – the filepath to save the extracted image (as a Path, or string)

  • image – the mask image (as a PIL image)

  • overwrite (bool) – whether an image that exists at the same file path should be overwritten

  • binarise (bool) – whether the mask should be binarised; this assumes that embolisms are indicated by a pixel intensity of 255

Return type

None

Returns

None
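A minimal sketch of the binarisation rule, assuming (as stated above) that embolism pixels have intensity 255 and that the output uses 1 for embolism and 0 for background. `binarise` is a hypothetical helper, not the package's implementation.

```python
from typing import List


def binarise(mask: List[List[int]], embolism_value: int = 255) -> List[List[int]]:
    """Hypothetical: map pixels equal to embolism_value to 1, all others to 0."""
    return [[1 if px == embolism_value else 0 for px in row] for row in mask]
```

Because the comparison is an exact match, intermediate intensities (for example from anti-aliased edges) map to 0; a threshold such as `px > 127` would be the alternative if such values occur.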

load_extracted_images(load_image=False, disable_pb=False)

Loads MaskTiles belonging to the Mask.

Parameters
  • load_image (bool) – whether to load the image array belonging to each MaskTile being created

  • disable_pb (bool) – whether the progress bar should be disabled

Return type

None

Returns

None

tile_me(length_x, stride_x, length_y, stride_y, output_path=None, overwrite=False)

Tiles an image and creates MaskTile objects. These are appended to the image_object attribute.

Parameters
  • length_x (int) – the x-length of the tile

  • stride_x (int) – the size of the x stride

  • length_y (int) – the y-length of the tile

  • stride_y (int) – the size of the y stride

  • output_path (Optional[str]) – output path of where the tiles should be saved; if no path is provided, tiles are saved in a default location

  • overwrite (bool) – whether tiles that exist at the same file path should be overwritten

Return type

None

Returns

None

get_databunch_dataframe(embolism_only=False, csv_name=None)

Extracts a DataBunch DataFrame using the tiles in this Mask. The first field is the leaf tile path and the second field is the mask tile name. This format is useful for fastai. If a csv name is provided, the DataFrame is saved.

Parameters
  • embolism_only (bool) – whether only leaves with embolisms should be used

  • csv_name (Optional[str]) – the name of the csv, which can also be a path; if this is not provided, the DataFrame will not be saved

Return type

Tuple[DataFrame, str]

Returns

DataBunch DF and sequence root folder path

class src.data_model.data_model.MaskTile(path=None, sequence_parent=None)

A Mask tile

class src.data_model.data_model.LeafTile(path=None, sequence_parent=None)

A Leaf tile

predict_tile(model, memory_saving=True, **kwargs)

Predicts and returns a segmentation map using the tile image.

Parameters
  • model (Model) – a model which inherits Model and hence implements a predict tile method

  • memory_saving (bool) – if set to True, the prediction array is not saved

  • kwargs – kwargs for the predict tile function

Return type

array

Returns

the prediction
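The contract referred to throughout this page (a model that inherits Model and implements a predict tile method) can be sketched with an abstract base class. This `Model` is a hypothetical stand-in whose method name and signature are assumed, not taken from the package; `ConstantModel` is a toy subclass used only to show the contract.

```python
from abc import ABC, abstractmethod
from typing import Any, List

Grid = List[List[float]]


class Model(ABC):
    """Hypothetical stand-in for the package's Model base class."""

    @abstractmethod
    def predict_tile(self, tile: Grid, **kwargs: Any) -> Grid:
        """Return per-pixel embolism probabilities for one tile."""


class ConstantModel(Model):
    """Toy model that predicts the same probability for every pixel."""

    def __init__(self, p: float = 0.0) -> None:
        self.p = p

    def predict_tile(self, tile: Grid, **kwargs: Any) -> Grid:
        return [[self.p for _ in row] for row in tile]
```

Declaring predict_tile abstract means a subclass that forgets to implement it fails at instantiation time rather than midway through predicting a sequence.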