
Select subset of classes to sample from #3627

Closed
@alemelis


🚀 Feature

When loading a dataset with ImageFolder, provide an optional argument to select a subset of classes.

Motivation

I deal with large (1000+) multi-class datasets upon which I train image classifiers. However, I usually don't want to train for all the classes at the same time.

Pitch

I'd like to change the find_classes function from

classes = sorted(entry.name for entry in os.scandir(directory) if entry.is_dir())

to

classes = sorted(entry.name for entry in os.scandir(directory) 
                 if entry.is_dir() and (entry.name in allowed_classes or not allowed_classes))

where allowed_classes: List[str] = [] is an empty list by default (meaning every class is kept), but it can be given to ImageFolder at initialisation time (it has to be passed through to DatasetFolder, where find_classes is used).
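As a rough illustration of the change above, here is a standalone sketch of the filtering logic using only the standard library. It is not torchvision's actual implementation: the function signature, the error message, and the use of None instead of an empty list as the default (to avoid a mutable default argument) are all assumptions made for the example.

```python
import os
from typing import Dict, List, Optional, Sequence, Tuple


def find_classes(
    directory: str, allowed_classes: Optional[Sequence[str]] = None
) -> Tuple[List[str], Dict[str, int]]:
    """Hypothetical variant of find_classes that keeps only allowed_classes.

    None or an empty sequence means "no filtering", matching the
    `or not allowed_classes` clause in the proposal.
    """
    allowed = set(allowed_classes or ())
    classes = sorted(
        entry.name
        for entry in os.scandir(directory)
        if entry.is_dir() and (not allowed or entry.name in allowed)
    )
    if not classes:
        raise FileNotFoundError(f"Couldn't find any class folder in {directory}.")
    # Indices are assigned after filtering, so they stay contiguous (0..n-1).
    class_to_idx = {cls_name: i for i, cls_name in enumerate(classes)}
    return classes, class_to_idx
```

Because the filter runs before any image files are listed, directories of excluded classes are never walked. In torchvision versions where find_classes is an overridable method of DatasetFolder, a similar effect might be achieved by overriding it in a subclass, though that depends on the installed version.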

Alternatives

I tried to

  1. manually create a new folder structure with only the relevant classes. This gets messy quite fast, as I now have multiple versions of the same folders
  2. resort to a custom dataloader that filters loaded samples after initialisation (obviously this becomes very slow as the number of images increases)
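For comparison, the second alternative amounts to something like the sketch below. It assumes the dataset exposes a list of (path, class_index) pairs and a class_to_idx mapping, as ImageFolder does; the helper name filter_samples is hypothetical. Note that the full sample list has to be built before it can be filtered, which is where the cost comes from.

```python
from typing import Dict, List, Sequence, Tuple


def filter_samples(
    samples: Sequence[Tuple[str, int]],
    class_to_idx: Dict[str, int],
    allowed_classes: Sequence[str],
) -> Tuple[List[Tuple[str, int]], Dict[str, int]]:
    """Hypothetical helper: drop samples of unwanted classes and re-index labels."""
    allowed = set(allowed_classes)
    kept_names = sorted(name for name in class_to_idx if name in allowed)
    new_class_to_idx = {name: i for i, name in enumerate(kept_names)}
    # Map each surviving old index to its new contiguous index.
    old_to_new = {class_to_idx[name]: new for name, new in new_class_to_idx.items()}
    filtered = [(path, old_to_new[idx]) for path, idx in samples if idx in old_to_new]
    return filtered, new_class_to_idx
```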

Additional context

Here I found a discussion on the same topic.

https://discuss.pytorch.org/t/how-to-sample-images-belonging-to-particular-classes/43776/8

cc @pmeier
