Dataset
-
src.actions.dataset.create_dataset_structure(base_dir)
Creates a skeleton dataset structure: train, val, and test folders, each containing an embolism and a no-embolism folder. A not-used folder for downsampled images is also created.
- Parameters
base_dir (Union[Path,str]) – the directory where the dataset should be created, as either a pathlib Path or a str
- Return type
None
- Returns
None
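The layout described above can be sketched with pathlib. This is a minimal illustration and not the module's implementation: the exact folder names (`train`, `val`, `test`, `embolism`, `no-embolism`, `not-used`) are assumed from the description, and `sketch_dataset_structure` is a hypothetical name.

```python
from pathlib import Path
import tempfile

# Folder names are assumptions taken from the description above.
SPLITS = ("train", "val", "test")
CLASSES = ("embolism", "no-embolism")


def sketch_dataset_structure(base_dir):
    """Create split/class folders plus a not-used folder under base_dir."""
    base = Path(base_dir)  # accepts either a Path or a str
    for split in SPLITS:
        for cls in CLASSES:
            (base / split / cls).mkdir(parents=True, exist_ok=True)
    (base / "not-used").mkdir(parents=True, exist_ok=True)


# Build the skeleton in a temporary directory and list what was created.
with tempfile.TemporaryDirectory() as tmp:
    sketch_dataset_structure(tmp)
    created = sorted(
        p.relative_to(tmp).as_posix() for p in Path(tmp).rglob("*") if p.is_dir()
    )
```

Using `exist_ok=True` makes the sketch safe to call again on a directory that already holds part of the structure; whether the real function behaves this way is not stated in the source.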
-
src.actions.dataset.move_data(lseq_list, mseq_list, dest_root_path, dest_folder='train')
Populates a folder of the dataset, the train folder by default. The dataset folder and its subfolders must already have been created with the create_dataset_structure function of this module.
- Parameters
lseq_list (List[LeafSequence]) – list of LeafSequence objects
mseq_list (List[MaskSequence]) – list of MaskSequence objects
dest_root_path (Union[Path,str]) – destination root path; this can be either a Path object or a string
dest_folder (str) – destination folder; this is a folder in the destination root path
- Return type
List[str]
- Returns
None
Note
This function requires both leaves and masks to be in the same root directory.
-
src.actions.dataset.downsample_dataset(dataset_root_path, filename_patterns, non_embolism_size=0.5)
Downsamples a dataset that was created using the create_dataset_structure and move_data functions.
- Parameters
dataset_root_path (Union[Path,str]) – the root path of the dataset to downsample
filename_patterns (List[str]) – the filename patterns of the leaves and the masks; this list has two elements
non_embolism_size (float) – the proportion of the no-embolism samples to keep
- Return type
Tuple[List[List[str]],List[List[str]]]
- Returns
two lists: the first contains a list of the embolism leaves and a list of the embolism masks; the second contains a list of the chosen no-embolism leaves and a list of the chosen no-embolism masks
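As a rough illustration of the downsampling step, the sketch below keeps a random subset of no-embolism samples while keeping each leaf paired with its mask. It assumes that non_embolism_size is a fraction of the no-embolism samples and that the count is rounded down; the function name, the `seed` argument, and the file names are hypothetical.

```python
import random


def sketch_downsample(leaf_paths, mask_paths, keep_fraction=0.5, seed=0):
    """Keep a random fraction of paired no-embolism leaf and mask paths."""
    pairs = list(zip(leaf_paths, mask_paths))   # keep each leaf with its mask
    n_keep = int(len(pairs) * keep_fraction)    # rounding down is an assumption
    chosen = random.Random(seed).sample(pairs, n_keep)
    kept_leaves = [leaf for leaf, _ in chosen]
    kept_masks = [mask for _, mask in chosen]
    return [kept_leaves, kept_masks]            # leaves at item 0, masks at item 1


# Hypothetical file names standing in for real dataset paths.
leaves = [f"leaf_{i}.png" for i in range(10)]
masks = [f"mask_{i}.png" for i in range(10)]
kept = sketch_downsample(leaves, masks, keep_fraction=0.5)
```

Sampling the pairs rather than the two lists separately guarantees that a kept leaf never loses its mask.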
-
src.actions.dataset.split_dataset(dataset_root_path, embolism_objects, non_embolism_objects, test_split=0.2, val_split=0.2)
Splits a dataset into train, val, and test by moving a portion of the train samples to val and test. The embolism_objects and non_embolism_objects inputs are usually the outputs of the downsample_dataset function.
- Parameters
dataset_root_path (Union[Path,str]) – the root path of the dataset to split
embolism_objects (List[List[str]]) – a list containing paths to embolism leaves and masks; list of leaves at item 0 and list of masks at item 1
non_embolism_objects (List[List[str]]) – a list containing paths to non-embolism leaves and masks; list of leaves at item 0 and list of masks at item 1
test_split (float) – the fraction of the samples to use for the test set
val_split (float) – the fraction of the remaining samples, after the test set has been removed, to use for the validation set
- Return type
None
- Returns
None
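Because val_split applies to what is left after the test set has been removed, the two default values of 0.2 do not produce equally sized sets. The arithmetic can be sketched as follows; rounding down is an assumption, and `sketch_split_sizes` is a hypothetical helper, not part of the module.

```python
def sketch_split_sizes(n_samples, test_split=0.2, val_split=0.2):
    """Return (train, val, test) sizes; val_split applies after test removal."""
    n_test = int(n_samples * test_split)            # rounding down is an assumption
    n_val = int((n_samples - n_test) * val_split)   # fraction of the remainder
    n_train = n_samples - n_test - n_val
    return n_train, n_val, n_test


sizes = sketch_split_sizes(100)  # (64, 16, 20)
```

With 100 samples, 20 go to test, 16 of the remaining 80 go to val, and 64 stay in train.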
-
src.actions.dataset.extract_dataset(lseq_list, mseq_list, dataset_path, downsample_split, test_split, val_split, lolo=None)
Creates a dataset using a list of LeafSequence objects and a list of MaskSequence objects.
- Parameters
lseq_list (List[LeafSequence]) – a list of LeafSequence objects
mseq_list (List[MaskSequence]) – a list of MaskSequence objects
dataset_path (Union[Path,str]) – the root path where the dataset should be created
downsample_split (float) – the fraction of no-embolism samples to keep
test_split (float) – the fraction of the samples to use for the test set
val_split (float) – the fraction of the remaining samples, after the test set has been removed, to use for the validation set
lolo (Optional[int]) – the index of the leaf to leave out and use for testing, if a complete leaf should be used for testing; the index corresponds to the leaf's position in lseq_list and mseq_list
- Return type
None
- Returns
None
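The lolo (leave-one-leaf-out) option can be pictured as holding back the sequence pair at one index. The sketch below only illustrates that indexing; strings stand in for LeafSequence and MaskSequence objects, and `sketch_leave_one_leaf_out` is a hypothetical helper.

```python
def sketch_leave_one_leaf_out(lseq_list, mseq_list, lolo):
    """Split paired sequences into (training pairs, held-out test pair)."""
    pairs = list(zip(lseq_list, mseq_list))
    held_out = pairs[lolo]                       # same index in both lists
    remaining = pairs[:lolo] + pairs[lolo + 1:]
    return remaining, held_out


# Strings stand in for LeafSequence and MaskSequence objects.
train_pairs, test_pair = sketch_leave_one_leaf_out(
    ["leaf_a", "leaf_b", "leaf_c"], ["mask_a", "mask_b", "mask_c"], lolo=1
)
```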
-
src.actions.dataset.flip_flop(leaf_image_array, mask_segmap, orientation, seed=3141)
Reflects a sample about either the x-axis or the y-axis.
- Parameters
leaf_image_array (array) – the input image
mask_segmap (SegmentationMapsOnImage) – the mask segmentation map
orientation (str) – whether to flip horizontally or vertically
seed (int) – the random seed
- Return type
Tuple[array,SegmentationMapsOnImage]
- Returns
updated leaf input and mask
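The key property of this augmentation is that the leaf image and its mask receive the same reflection, so embolism pixels stay aligned. A library-free sketch of that idea follows; nested lists stand in for the image array and the segmentation map, and the orientation strings `"horizontal"` and `"vertical"` are assumed values, not confirmed by the source.

```python
def sketch_flip(image, mask, orientation):
    """Mirror image and mask together; each row is a list of pixel values."""
    if orientation == "horizontal":      # mirror left-right
        return [row[::-1] for row in image], [row[::-1] for row in mask]
    if orientation == "vertical":        # mirror top-bottom
        return image[::-1], mask[::-1]
    raise ValueError(f"unknown orientation: {orientation!r}")


image = [[1, 2, 3], [4, 5, 6]]
mask = [[0, 0, 1], [0, 0, 0]]
flipped_image, flipped_mask = sketch_flip(image, mask, "horizontal")
```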
-
src.actions.dataset.translate_img(leaf_image_array, mask_segmap, x, y, seed=3141)
Translates an image. The padding pixels are black.
- Parameters
leaf_image_array (array) – the input image
mask_segmap (SegmentationMapsOnImage) – the mask segmentation map
x (float) – fraction to shift along the x-axis (between -1 and 1)
y (float) – fraction to shift along the y-axis (between -1 and 1)
seed (int) – the random seed
- Return type
Tuple[array,SegmentationMapsOnImage]
- Returns
updated leaf input and mask
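To make the fractional shift and the black padding concrete, the sketch below translates a single row of pixels. It assumes that a positive x moves content to the right and that the shift is rounded to whole pixels; both are illustrative assumptions, and `sketch_translate_row` is a hypothetical helper.

```python
def sketch_translate_row(row, x):
    """Shift one row of pixels by fraction x of its width, padding with 0."""
    width = len(row)
    shift = int(round(x * width))    # positive x moves content right (assumed)
    if shift >= 0:
        pad = min(shift, width)
        return [0] * pad + row[: width - pad]   # black pixels enter on the left
    pad = min(-shift, width)
    return row[pad:] + [0] * pad                # black pixels enter on the right


shifted_right = sketch_translate_row([1, 2, 3, 4], 0.5)    # [0, 0, 1, 2]
shifted_left = sketch_translate_row([1, 2, 3, 4], -0.25)   # [2, 3, 4, 0]
```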
-
src.actions.dataset.rotate_img(leaf_image_array, mask_segmap, l, r, seed=3141)
Rotates an image by a random number of degrees in the range (l, r). The padding pixels are black.
- Parameters
leaf_image_array (array) – the input image
mask_segmap (SegmentationMapsOnImage) – the mask segmentation map
l (float) – degrees to rotate to the left
r (float) – degrees to rotate to the right
seed (int) – the random seed
- Return type
Tuple[array,SegmentationMapsOnImage]
- Returns
updated leaf input and mask
-
src.actions.dataset.shear_img(leaf_image_array, mask_segmap, l, r, seed=3141)
Shears an image by a random number of degrees in the range (l, r). The padding pixels are black.
- Parameters
leaf_image_array (array) – the input image
mask_segmap (SegmentationMapsOnImage) – the mask segmentation map
l (float) – degrees to shear to the left
r (float) – degrees to shear to the right
seed (int) – the random seed
- Return type
Tuple[array,SegmentationMapsOnImage]
- Returns
updated leaf input and mask
-
src.actions.dataset.crop_img(leaf_image_array, mask_segmap, v, h, seed=3141)
Crops an image. The padding pixels are black.
- Parameters
leaf_image_array (array) – the input image
mask_segmap (SegmentationMapsOnImage) – the mask segmentation map
v (float) – the percentage to crop vertically
h (float) – the percentage to crop horizontally
seed (int) – the random seed
- Return type
Tuple[array,SegmentationMapsOnImage]
- Returns
updated leaf input and mask
-
src.actions.dataset.zoom_in_out(leaf_image_array, mask_segmap, x, y, seed=3141)
Zooms in or out of an image. The padding pixels are black.
- Parameters
leaf_image_array (array) – the input image
mask_segmap (SegmentationMapsOnImage) – the mask segmentation map
x (float) – zoom factor on the x-axis; 1 is 100%
y (float) – zoom factor on the y-axis; 1 is 100%
seed (int) – the random seed
- Return type
Tuple[array,SegmentationMapsOnImage]
- Returns
updated leaf input and mask
-
src.actions.dataset.save_image(leaf, mask, aug_type)
Saves an augmented Leaf and Mask. The new filename includes the details of the augmentation.
-
src.actions.dataset.augment_image(leaf, mask, df, aug_type, index, counts, func, **kwargs)
Applies an augmentation to a sample. The augmented sample is rejected if the augmentation removes all embolisms from the image. If the augmentation is accepted, it is saved and the df is updated with the details of the augmentation. The updates to the df are made in place, so the df is mutated even though it is not returned.
- Parameters
leaf (array) – the input leaf
mask (array) – the input mask
df (DataFrame) – the augmentation df
aug_type (str) – the type of augmentation
index (int) – the index of the sample in the input df
counts (List[int]) – the counts of augmentation acceptance and rejection; the list has two elements
func – the augmentation function
kwargs – the kwargs for the augmentation function
- Return type
List[int]
- Returns
updated counts
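The accept/reject rule can be sketched without any image library. The version below assumes that non-zero mask pixels mark embolisms and that counts holds the accepted count first and the rejected count second; saving and the df update are left out. `sketch_augment` and `blank_left_column` are hypothetical names used only for illustration.

```python
def sketch_augment(leaf, mask, counts, func, **kwargs):
    """Apply func and accept the result only if the mask keeps an embolism."""
    new_leaf, new_mask = func(leaf, mask, **kwargs)
    # Assumption: any non-zero mask pixel marks an embolism.
    has_embolism = any(pixel for row in new_mask for pixel in row)
    if has_embolism:
        counts[0] += 1      # accepted
        return new_leaf, new_mask, counts
    counts[1] += 1          # rejected: the augmentation removed every embolism
    return None, None, counts


def blank_left_column(leaf, mask):
    """Hypothetical augmentation: set the first column to black."""
    return [[0] + row[1:] for row in leaf], [[0] + row[1:] for row in mask]


counts = [0, 0]
leaf = [[5, 6], [7, 8]]
# The embolism is in the right column, so it survives and the sample is accepted.
_, _, counts = sketch_augment(leaf, [[0, 1], [0, 0]], counts, blank_left_column)
# The embolism is in the left column, so it is removed and the sample is rejected.
_, _, counts = sketch_augment(leaf, [[1, 0], [0, 0]], counts, blank_left_column)
```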
-
src.actions.dataset.augmentation_algorithm(leaf, mask, aug_df, i, counts)
Passes the sample through a series of possible augmentations: flip_flop, translate, zoom, crop, rotate, and shear. Each augmentation is applied with a probability of 0.5. The augmented images are saved. The input DataFrame is updated with the augmentations that were applied to the image. The count of augmentations is also updated.
- Parameters
leaf (array) – the leaf to augment
mask (array) – the mask to augment
aug_df (DataFrame) – the augmentation df
i (int) – the position in the DataFrame corresponding to the sample
counts (List[int]) – a list of counts; the first number is the count of times an augmentation was accepted and the second is the count of times an augmentation was rejected
- Return type
Tuple[DataFrame,List[int]]
- Returns
None
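The selection step, where each augmentation is applied with a probability of 0.5, can be sketched as one independent coin flip per augmentation. Treating the flips as independent and drawing them from a single seeded generator are assumptions; `sketch_choose_augmentations` is a hypothetical helper.

```python
import random

# Augmentation names as listed in the description above.
AUGMENTATIONS = ("flip_flop", "translate", "zoom", "crop", "rotate", "shear")


def sketch_choose_augmentations(rng, probability=0.5):
    """Return the augmentation names selected by independent coin flips."""
    return [name for name in AUGMENTATIONS if rng.random() < probability]


rng = random.Random(3141)    # seeded so the selection is reproducible
chosen = sketch_choose_augmentations(rng)
# A fresh generator with the same seed reproduces the same selection.
repeat = sketch_choose_augmentations(random.Random(3141))
```

On average three of the six augmentations are selected per sample, and any one sample may receive none or all of them.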
-
src.actions.dataset.augment_dataset(lseq, mseq, **kwargs)
Augments a dataset using the provided LeafSequence and MaskSequence. Both are usually created from the train folder of the dataset. The augmented files are saved in a folder called augmented at the common root folder of the leaf and mask sequences. A csv with the details of the augmentation is also saved.
- Parameters
lseq (LeafSequence) – LeafSequence object of the dataset
mseq (MaskSequence) – MaskSequence object of the dataset
- Return type
None
- Returns
None