Dataset

src.actions.dataset.create_dataset_structure(base_dir)[source]

Creates a skeleton dataset structure. Train, val, and test folders, each with embolism and no-embolism folders, are created. A not-used folder for downsampled images is also created.

Parameters

base_dir (Union[Path, str]) – the directory where the dataset should be created, as either a pathlib Path or a str

Return type

None

Returns

None
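
A minimal sketch of the layout described above, using only pathlib. The folder names (train, val, test, embolism, no-embolism, not-used) are taken from the description, but their exact spelling and nesting in the real module are assumptions, and make_skeleton is a hypothetical stand-in, not the module's implementation.

```python
from pathlib import Path
from typing import Union

# Assumed folder names; the real module may spell or nest them differently.
SPLITS = ("train", "val", "test")
CLASSES = ("embolism", "no-embolism")


def make_skeleton(base_dir: Union[Path, str]) -> None:
    """Hypothetical stand-in for create_dataset_structure."""
    base = Path(base_dir)
    for split in SPLITS:
        for cls in CLASSES:
            # parents=True also creates base_dir itself if it is missing.
            (base / split / cls).mkdir(parents=True, exist_ok=True)
    # Extra folder for downsampled images, per the description.
    (base / "not-used").mkdir(parents=True, exist_ok=True)
```

Calling make_skeleton("my_dataset") with either a str or a Path should produce the same tree, since the argument is wrapped in Path first.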

src.actions.dataset.move_data(lseq_list, mseq_list, dest_root_path, dest_folder='train')[source]

Populates the train folder in the dataset folder, where the dataset folder and its constituents were created using the create_dataset_structure function of this module.

Parameters
  • lseq_list (List[LeafSequence]) – list of LeafSequence objects

  • mseq_list (List[MaskSequence]) – list of MaskSequence objects

  • dest_root_path (Union[Path, str]) – destination root path; this can either be a Path object or a string

  • dest_folder (str) – destination folder; this is a folder in the destination root path

Return type

List[str]

Returns

None

Note

This function requires both leaves and masks to be in the same root directory.

src.actions.dataset.downsample_dataset(dataset_root_path, filename_patterns, non_embolism_size=0.5)[source]

Downsamples a dataset, where the dataset was created using the create_dataset_structure and move_data functions.

Parameters
  • dataset_root_path (Union[Path, str]) – the root path of the dataset to downsample

  • filename_patterns (List[str]) – the filename patterns of both the leaves and the masks; this list has two elements

  • non_embolism_size (float) – the fraction of the no-embolism samples to keep (for example, 0.5 keeps half)

Return type

Tuple[List[List[str]], List[List[str]]]

Returns

two lists, the first has as elements a list of the embolism leaves and a list of the embolism masks, and the second as elements a list of the chosen no-embolism leaves and a list of the chosen no-embolism masks
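
The selection step can be pictured with a small sketch. It assumes that non_embolism_size is a fraction of the no-embolism pairs, that leaves and masks are sampled together as pairs, and that the count is rounded down; none of this is confirmed by the description. choose_no_embolism and its seed argument are hypothetical.

```python
import random
from typing import List, Tuple


def choose_no_embolism(
    leaves: List[str], masks: List[str], keep: float = 0.5, seed: int = 3141
) -> Tuple[List[str], List[str]]:
    """Hypothetical helper: keep a random fraction of leaf/mask pairs."""
    # Pair each leaf with its mask so the two lists stay aligned.
    pairs = list(zip(leaves, masks))
    n_keep = int(len(pairs) * keep)  # assumed rounding: truncate
    chosen = random.Random(seed).sample(pairs, n_keep)
    kept_leaves = [leaf for leaf, _ in chosen]
    kept_masks = [mask for _, mask in chosen]
    return kept_leaves, kept_masks
```

Sampling pairs rather than each list separately guarantees that every kept leaf still has its matching mask at the same position.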

src.actions.dataset.split_dataset(dataset_root_path, embolism_objects, non_embolism_objects, test_split=0.2, val_split=0.2)[source]

Splits a dataset into train, val, and test, by moving a portion of the train samples to val and test. The inputs for embolism objects and non-embolism objects are usually the outputs returned from the downsample_dataset function.

Parameters
  • dataset_root_path (Union[Path, str]) – the root path of the dataset to split

  • embolism_objects (List[List[str]]) – a list containing paths to embolism masks and leaves; list of leaves at item 0 and list of masks at item 1

  • non_embolism_objects (List[List[str]]) – list containing paths to non-embolism masks and leaves; list of leaves at item 0 and list of masks at item 1

  • test_split (float) – the fraction of the samples to use for the test set (for example, 0.2 for 20%)

  • val_split (float) – the fraction of the remaining samples, after the test set has been removed, to use for the validation set

Return type

None

Returns

None
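
Because val_split applies to what is left after the test set is removed, the two defaults do not give equal-sized sets. The arithmetic can be sketched as follows; split_sizes is a hypothetical helper, and rounding to the nearest integer is an assumption about how partial samples are handled.

```python
from typing import Tuple


def split_sizes(
    n: int, test_split: float = 0.2, val_split: float = 0.2
) -> Tuple[int, int, int]:
    """Hypothetical helper: return (n_train, n_val, n_test) for n samples."""
    n_test = round(n * test_split)
    # The validation fraction is taken from the samples left after testing.
    n_val = round((n - n_test) * val_split)
    n_train = n - n_test - n_val
    return n_train, n_val, n_test
```

With 100 samples and the default splits, this gives 64 train, 16 val, and 20 test samples.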

src.actions.dataset.extract_dataset(lseq_list, mseq_list, dataset_path, downsample_split, test_split, val_split, lolo=None)[source]

Creates a dataset using a list of LeafSequence and MaskSequence objects

Parameters
  • lseq_list (List[LeafSequence]) – a list of LeafSequence objects

  • mseq_list (List[MaskSequence]) – a list of MaskSequence objects

  • dataset_path (Union[Path, str]) – the root path of where the dataset should be created

  • downsample_split (float) – the percentage of no-embolism samples to keep

  • test_split (float) – the percentage of the sample to use for the test set

  • val_split (float) – the percentage of the remaining sample, after the test set has been removed, to use for the validation set

  • lolo (Optional[int]) – the index of the leaf to leave out and use for testing, if a complete leaf should be used for testing; the index corresponds to the leaf's position in the lseq_list and mseq_list

Return type

None

Returns

None
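
The lolo (leave-one-leaf-out) option can be illustrated on plain lists. The helper below is hypothetical and only shows how a single index might separate one leaf from the rest; how the real function then uses the held-out leaf is not described here.

```python
from typing import List, Optional, Sequence, Tuple, TypeVar

T = TypeVar("T")


def leave_one_out(
    items: Sequence[T], lolo: Optional[int] = None
) -> Tuple[List[T], List[T]]:
    """Hypothetical helper: split items into (kept, held_out) by index."""
    if lolo is None:
        # No leaf is held out; everything stays in the main pool.
        return list(items), []
    held_out = [items[lolo]]
    kept = [item for i, item in enumerate(items) if i != lolo]
    return kept, held_out
```

Applying the same index to both lseq_list and mseq_list keeps the held-out leaf paired with its mask, which is why the index must refer to the same position in both lists.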

src.actions.dataset.flip_flop(leaf_image_array, mask_segmap, orientation, seed=3141)[source]

Reflects a sample about either the x-axis or the y-axis

Parameters
  • leaf_image_array (array) – the input image

  • mask_segmap (SegmentationMapsOnImage) – the mask segmentation map

  • orientation (str) – whether to flip horizontally or vertically

  • seed (int) – the random seed

Return type

Tuple[array, SegmentationMapsOnImage]

Returns

updated leaf input and mask
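
The key property of this kind of augmentation is that the image and its mask receive the same transformation. The sketch below shows that idea on nested lists so that it runs without any image library; it is not the module's implementation, and the orientation strings "horizontal" and "vertical" are assumed values.

```python
from typing import List, Tuple

Grid = List[List[int]]


def flip_grid(grid: Grid, orientation: str) -> Grid:
    """Hypothetical helper: mirror a 2D grid of pixel values."""
    if orientation == "horizontal":
        # Mirror left-to-right by reversing each row.
        return [row[::-1] for row in grid]
    if orientation == "vertical":
        # Mirror top-to-bottom by reversing the row order.
        return [row[:] for row in grid[::-1]]
    raise ValueError(f"unknown orientation: {orientation!r}")


def flip_pair(image: Grid, mask: Grid, orientation: str) -> Tuple[Grid, Grid]:
    """Apply the same flip to the image and its mask."""
    return flip_grid(image, orientation), flip_grid(mask, orientation)
```

If the mask were flipped differently from the image, the labelled embolism pixels would no longer line up with the leaf, so both are always transformed together.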

src.actions.dataset.translate_img(leaf_image_array, mask_segmap, x, y, seed=3141)[source]

Translates an image. The padding pixels are black.

Parameters
  • leaf_image_array (array) – the input image

  • mask_segmap (SegmentationMapsOnImage) – the mask segmentation map

  • x (float) – percentage to shift on the x-axis (between -1 and 1)

  • y (float) – percentage to shift on the y-axis (between -1 and 1)

  • seed (int) – the random seed

Return type

Tuple[array, SegmentationMapsOnImage]

Returns

updated leaf input and mask
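
To make the percentage shift and black padding concrete, here is a sketch of a shift along the x-axis on a nested list. It assumes that a positive x moves content to the right and that the shift is a fraction of the image width; both are guesses about the convention, and shift_x is a hypothetical helper.

```python
from typing import List

Grid = List[List[int]]


def shift_x(grid: Grid, x: float) -> Grid:
    """Hypothetical helper: shift each row by a fraction x of the width."""
    if not -1 <= x <= 1:
        raise ValueError("x must be between -1 and 1")
    width = len(grid[0])
    dx = round(x * width)  # shift in whole pixels
    shifted = []
    for row in grid:
        if dx >= 0:
            # Move right: pad with black (0) on the left, drop the overflow.
            shifted.append([0] * dx + row[: width - dx])
        else:
            # Move left: drop the first -dx pixels, pad black on the right.
            shifted.append(row[-dx:] + [0] * (-dx))
    return shifted
```

A shift along the y-axis works the same way on rows instead of columns, and the mask would be shifted by the same amount as the image.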

src.actions.dataset.rotate_img(leaf_image_array, mask_segmap, l, r, seed=3141)[source]

Rotates an image by a random number of degrees in the range (l, r). The padding pixels are black.

Parameters
  • leaf_image_array (array) – the input image

  • mask_segmap (SegmentationMapsOnImage) – the mask segmentation map

  • l (float) – degrees to rotate to the left

  • r (float) – degrees to rotate to the right

  • seed (int) – the random seed

Return type

Tuple[array, SegmentationMapsOnImage]

Returns

updated leaf input and mask
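
The role of the seed can be shown with the standard library alone. The sketch assumes the angle is drawn uniformly from the range given by l and r; the real function may sample differently, and draw_angle is a hypothetical helper.

```python
import random


def draw_angle(l: float, r: float, seed: int = 3141) -> float:
    """Hypothetical helper: draw a rotation angle in degrees from [l, r]."""
    # A dedicated Random instance keeps the draw independent of global state.
    return random.Random(seed).uniform(l, r)
```

With a fixed seed, repeated calls with the same l and r return the same angle, so a caller who wants different rotations for different samples would need to vary the seed.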

src.actions.dataset.shear_img(leaf_image_array, mask_segmap, l, r, seed=3141)[source]

Shears an image by a random number of degrees in the range (l, r). The padding pixels are black.

Parameters
  • leaf_image_array (array) – the input image

  • mask_segmap (SegmentationMapsOnImage) – the mask segmentation map

  • l (float) – degrees to shear to the left

  • r (float) – degrees to shear to the right

  • seed (int) – the random seed

Return type

Tuple[array, SegmentationMapsOnImage]

Returns

updated leaf input and mask

src.actions.dataset.crop_img(leaf_image_array, mask_segmap, v, h, seed=3141)[source]

Crops an image. The padding pixels are black.

Parameters
  • leaf_image_array (array) – the input image

  • mask_segmap (SegmentationMapsOnImage) – the mask segmentation map

  • v (float) – the percent to crop vertically

  • h (float) – the percent to crop horizontally

  • seed (int) – the random seed

Return type

Tuple[array, SegmentationMapsOnImage]

Returns

updated leaf input and mask

src.actions.dataset.zoom_in_out(leaf_image_array, mask_segmap, x, y, seed=3141)[source]

Zooms in or out of an image. The padding pixels are black.

Parameters
  • leaf_image_array (array) – the input image

  • mask_segmap (SegmentationMapsOnImage) – the mask segmentation map

  • x (float) – % to zoom on the x-axis; 1 is 100%

  • y (float) – % to zoom on the y-axis; 1 is 100%

  • seed (int) – the random seed

Return type

Tuple[array, SegmentationMapsOnImage]

Returns

updated leaf input and mask

src.actions.dataset.save_image(leaf, mask, aug_type)[source]

Saves an augmented Leaf and Mask. The new filename includes the details of the augmentation.

Parameters
  • leaf (Leaf) – A Leaf object, with augmented image

  • mask (Mask) – A Mask object, with augmented image

  • aug_type (str) – the details of the augmentation to be added to the new filename

Return type

None

Returns

None
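
One way a filename could carry the augmentation details is sketched below. The naming scheme (original stem, an underscore, then aug_type) is purely an assumption for illustration, and augmented_name is a hypothetical helper.

```python
from pathlib import Path
from typing import Union


def augmented_name(path: Union[Path, str], aug_type: str) -> Path:
    """Hypothetical helper: insert aug_type before the file extension."""
    p = Path(path)
    # with_name keeps the parent folder and swaps only the final component.
    return p.with_name(f"{p.stem}_{aug_type}{p.suffix}")
```

For example, augmented_name("data/train/leaf_001.png", "flip_h") gives data/train/leaf_001_flip_h.png under this assumed scheme.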

src.actions.dataset.augment_image(leaf, mask, df, aug_type, index, counts, func, **kwargs)[source]

Applies an augmentation to a sample. The augmented sample is rejected if the augmentation removes all embolisms from the image. If the augmentation is accepted, it is saved and df is updated with the details of the augmentation. The updates to df are made in place, so df is mutated even though it is not returned.

Parameters
  • leaf (array) – the input leaf

  • mask (array) – the input mask

  • df (DataFrame) – the augmentation df

  • aug_type (str) – the type of augmentation

  • index (int) – the index of the sample in the input df

  • counts (List[int]) – the counts of augmentation acceptance and rejection; the list has two elements

  • func – the augmentation function

  • kwargs – the kwargs for the augmentation function

Return type

List[int]

Returns

updated counts
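
The accept/reject rule can be sketched on a nested-list mask. The sketch assumes that non-zero mask pixels mark embolisms and that counts is ordered [accepted, rejected]; a plain list stands in for the DataFrame to show the in-place update. All names below are hypothetical.

```python
from typing import Callable, List

Grid = List[List[int]]


def has_embolism(mask: Grid) -> bool:
    """Assumed convention: any non-zero pixel marks an embolism."""
    return any(pixel != 0 for row in mask for pixel in row)


def try_augmentation(
    mask: Grid,
    log: List[str],
    aug_type: str,
    counts: List[int],
    func: Callable[..., Grid],
    **kwargs,
) -> List[int]:
    """Hypothetical sketch: apply func and accept only if embolisms remain."""
    augmented = func(mask, **kwargs)
    if has_embolism(augmented):
        log.append(aug_type)  # mutated in place, like df in the description
        counts[0] += 1  # accepted
    else:
        counts[1] += 1  # rejected: the augmentation removed every embolism
    return counts
```

A crop or translation that pushes every embolism pixel out of the frame would leave an all-zero mask, which is the case this check is meant to catch.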

src.actions.dataset.augmentation_algorithm(leaf, mask, aug_df, i, counts)[source]

Passes the sample through a series of possible augmentations: flip_flop, translate, zoom, crop, rotate, and shear. Each augmentation is applied with probability 0.5. The augmented images are saved. The input DataFrame is updated with the augmentations that were applied to the image. The count of augmentations is also updated.

Parameters
  • leaf (array) – the leaf to augment

  • mask (array) – the mask to augment

  • aug_df (DataFrame) – the augmentation df

  • i (int) – the position in the dataframe corresponding to the sample

  • counts (List[int]) – a list of counts; the first number is the count of times an augmentation was accepted and the second is the count of times an augmentation was rejected

Return type

Tuple[DataFrame, List[int]]

Returns

None
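
The "each applied with probability 0.5" step amounts to one independent coin flip per augmentation. A minimal sketch, assuming the flips are independent and using hypothetical names (the labels in AUGMENTATIONS are taken from the description, not from the code):

```python
import random
from typing import List

# Labels taken from the description above, in the order listed there.
AUGMENTATIONS = ("flip_flop", "translate", "zoom", "crop", "rotate", "shear")


def pick_augmentations(rng: random.Random, p: float = 0.5) -> List[str]:
    """Hypothetical helper: select each augmentation with probability p."""
    # rng.random() is in [0, 1), so p=1.0 selects all and p=0.0 selects none.
    return [name for name in AUGMENTATIONS if rng.random() < p]
```

Under this reading, a sample receives three of the six augmentations on average, but any number from zero to six is possible for a given sample.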

src.actions.dataset.augment_dataset(lseq, mseq, **kwargs)[source]

Augments a dataset using the provided LeafSequence and MaskSequence. Both the LeafSequence and MaskSequence are usually created using the train folder from the dataset. The augmented files are saved in a folder called augmented at the common root folder of the leaf and mask sequence. A csv with the details of augmentation is also saved.

Parameters
Return type

None

Returns

None