Description
🚀 Feature
When loading a dataset with ImageFolder, provide an optional argument to select a subset of classes.
Motivation
I work with large multi-class datasets (1000+ classes) on which I train image classifiers. However, I usually don't want to train on all the classes at the same time.
Pitch
I'd like to change the find_classes function in vision/torchvision/datasets/folder.py (line 61 at commit 20a771e) to

```python
classes = sorted(
    entry.name
    for entry in os.scandir(directory)
    if entry.is_dir() and (not allowed_classes or entry.name in allowed_classes)
)
```

where allowed_classes: Optional[List[str]] = None keeps every class by default (None or an empty list means "no filter"), but can be passed to ImageFolder at initialisation time. The argument has to be propagated through to DatasetFolder, where find_classes is used.
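A minimal standalone sketch of the proposed helper, assuming it keeps the (classes, class_to_idx) return value of torchvision's find_classes. The allowed_classes parameter is the hypothetical addition proposed in this issue, not an existing torchvision argument. Class indices are assigned after filtering, so the selected classes get contiguous labels 0..k-1, which is what a classifier with k outputs expects.

```python
import os
from typing import Dict, List, Optional, Tuple


def find_classes(
    directory: str, allowed_classes: Optional[List[str]] = None
) -> Tuple[List[str], Dict[str, int]]:
    """Return sorted class folder names and a class name -> index mapping.

    allowed_classes is the hypothetical filter proposed here:
    None (or an empty list) keeps every class folder.
    """
    allowed = set(allowed_classes) if allowed_classes else None
    classes = sorted(
        entry.name
        for entry in os.scandir(directory)
        # Only sub-directories count as classes; skip any not in the allow-list.
        if entry.is_dir() and (allowed is None or entry.name in allowed)
    )
    if not classes:
        raise FileNotFoundError(f"Couldn't find any class folder in {directory}.")
    # Indices are assigned after filtering, so they stay contiguous (0..k-1).
    class_to_idx = {cls_name: i for i, cls_name in enumerate(classes)}
    return classes, class_to_idx
```

For a root containing cat/, dog/ and fish/, calling find_classes(root, ["fish", "cat"]) returns (["cat", "fish"], {"cat": 0, "fish": 1}). Requested names that have no folder are silently ignored in this sketch; raising an error for them instead would be a reasonable alternative design.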
Alternatives
I tried to:
- manually create a new folder structure containing only the relevant classes. This quickly gets messy, as I end up with multiple versions of the same folders.
- resort to a custom data loader that filters the loaded samples after initialisation. This becomes very slow as the number of images grows, because every image of every class is still indexed before the filter is applied.
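For comparison, the post-initialisation filtering alternative can be sketched as a pure function over the samples list and classes list that ImageFolder exposes after construction. The helper name filter_samples is hypothetical, and the sketch assumes classes[i] is the class name for label i (as with ImageFolder, whose indices come from enumerating the sorted class names). Besides dropping samples, it must remap labels to a contiguous range, and it cannot avoid the cost of indexing every class folder during construction.

```python
from typing import Dict, List, Sequence, Tuple


def filter_samples(
    samples: Sequence[Tuple[str, int]],
    classes: Sequence[str],
    keep: Sequence[str],
) -> Tuple[List[Tuple[str, int]], List[str], Dict[str, int]]:
    """Keep only samples whose class is in `keep`, re-indexing labels to 0..k-1."""
    keep_set = set(keep)
    # Preserve the original (sorted) class order among the kept classes.
    kept_classes = [name for name in classes if name in keep_set]
    class_to_idx = {name: i for i, name in enumerate(kept_classes)}
    # Map each old label index to its new contiguous index.
    old_to_new = {
        old: class_to_idx[name] for old, name in enumerate(classes) if name in keep_set
    }
    kept_samples = [
        (path, old_to_new[label]) for path, label in samples if label in old_to_new
    ]
    return kept_samples, kept_classes, class_to_idx
```

With an ImageFolder instance, the inputs would be its samples and classes attributes; its targets attribute would also need to be updated to match the filtered samples.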
Additional context
There is a related discussion on the same topic on the PyTorch forums:
https://discuss.pytorch.org/t/how-to-sample-images-belonging-to-particular-classes/43776/8
cc @pmeier