add prototype imagenet dataset #4640

pmeier · 2021-10-18T08:36:19Z

pmeier · 2021-10-18T08:39:11Z

torchvision/prototype/datasets/_builtin/imagenet.py

+        if config.split == "train":
+            images = HttpResource(
+                "ILSVRC2012_img_train.tar",
+                sha256="b08200a27a8e34218a0e58fde36b0fe8f73bc377f4acea2d91602057c3ca45bb",
+            )
+        else:  # config.split == "val"
+            images = HttpResource(
+                "ILSVRC2012_img_val.tar",
+                sha256="c7e06a6c0baccf06d8dbeb6577d71efff84673a5dbdd50633ab44f8ea0456ae0",
+            )
+
+        devkit = HttpResource(
+            "ILSVRC2012_devkit_t12.tar.gz",
+            sha256="b59243268c0d266621fd587d2018f69e906fb22875aca0e295b48cafaa927953",
+        )


Although these files are not publicly accessible anymore, we can get away with it for now, since our download functionality is a no-op. I'll refactor this to include manual download instructions as soon as the torchdata download API is stable-ish.
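The comment mentions manual download instructions. The sketch below shows what such a check could look like. `check_manual_download` and `sha256sum` are hypothetical helpers, not part of torchvision or torchdata. They look for an archive the user placed under an assumed `root` directory and verify it against the `sha256` value declared in the `HttpResource` entries.

```python
import hashlib
import pathlib


# Hypothetical helper: the PR only declares HttpResource(file_name, sha256=...).
def sha256sum(path: pathlib.Path, chunk_size: int = 1024 * 1024) -> str:
    """Hash the file in chunks so multi-GB archives never sit fully in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Hypothetical helper: returns the archive path or raises with instructions.
def check_manual_download(root: pathlib.Path, file_name: str, sha256: str) -> pathlib.Path:
    path = root / file_name
    if not path.exists():
        raise FileNotFoundError(
            f"{file_name} is not publicly downloadable. "
            f"Please obtain it manually and place it in {root}."
        )
    actual = sha256sum(path)
    if actual != sha256:
        raise RuntimeError(f"Checksum mismatch for {file_name}: expected {sha256}, got {actual}")
    return path
```

For example, you would check the devkit by passing `"ILSVRC2012_devkit_t12.tar.gz"` and its `sha256` from the diff, with `root` pointing to wherever the user stored the archives.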

torchvision/prototype/datasets/_builtin/imagenet.categories

fmassa

Thanks!

fmassa · 2021-10-18T12:37:13Z

torchvision/prototype/datasets/_builtin/imagenet.py

+        path, buffer = image_data
+
+        category = self.categories[label]
+        label = torch.tensor(label)


nit for the future: we might want to revisit whether we store numbers as 0d tensors or as raw numbers instead. Relying on raw Python numbers is generally much faster and smaller, which might be a good thing if we want to minimize transfer / storage.

IIRC, we wanted to return the labels as custom tensors that hold the category string alongside the numerical label. With that in mind, I've already wrapped the labels in torch.tensor. Other numbers are left as is.
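The PR wraps labels in `torch.tensor`. The sketch below illustrates the idea of a label that also carries its category without depending on torch. It uses a swapped-in technique: a hypothetical `Label` type built as an `int` subclass rather than a custom tensor. It is not the planned torchvision label type. It keeps the cheap raw-number representation fmassa suggests while still exposing `.category`. `make_label` is also hypothetical and mirrors `category = self.categories[label]` from the diff.

```python
from typing import Sequence


class Label(int):
    """Hypothetical label: behaves like a plain int but also remembers its
    human-readable category. Not torchvision's implementation."""

    category: str

    def __new__(cls, value: int, category: str = "") -> "Label":
        obj = super().__new__(cls, value)
        obj.category = category  # int subclasses have a __dict__, so this works
        return obj

    def __repr__(self) -> str:
        return f"Label({int(self)}, category={self.category!r})"


def make_label(index: int, categories: Sequence[str]) -> Label:
    # Look up the category by index, as the dataset does for `label`.
    return Label(index, categories[index])
```

Arithmetic and comparisons on a `Label` behave exactly like on an `int`, so downstream code that expects a number keeps working.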

pmeier · 2021-10-19T06:56:05Z

torchvision/prototype/datasets/_builtin/imagenet.py

+    @property
+    def category_to_wnid(self) -> Dict[str, str]:
+        return self.info.extra.category_to_wnid
+
+    @property
+    def wnid_to_category(self) -> Dict[str, str]:
+        return self.info.extra.wnid_to_category


We need to cache self.info, because otherwise we parse the categories file in every single step. I'll send a follow-up PR, because this also affects all the other datasets.
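The sketch below shows the caching pattern. It relies on `functools.cached_property` (Python 3.8+) and a hypothetical `Dataset` class. Here `info` is a plain dict instead of the PR's `info.extra` namespace, and the categories text uses an assumed one-`category,wnid`-per-line format. The `parse_calls` counter exists only to show that parsing happens once.

```python
import functools
from typing import Dict, List


class Dataset:
    """Hypothetical minimal dataset; only the caching pattern matters here."""

    def __init__(self, categories_text: str) -> None:
        self._categories_text = categories_text
        self.parse_calls = 0  # instrumentation for the example

    def _parse_categories(self) -> List[List[str]]:
        self.parse_calls += 1
        return [line.split(",") for line in self._categories_text.splitlines() if line]

    @functools.cached_property
    def info(self) -> Dict[str, Dict[str, str]]:
        # Computed on first access, then stored in the instance __dict__.
        rows = self._parse_categories()
        return {
            "category_to_wnid": {category: wnid for category, wnid in rows},
            "wnid_to_category": {wnid: category for category, wnid in rows},
        }

    @property
    def category_to_wnid(self) -> Dict[str, str]:
        return self.info["category_to_wnid"]

    @property
    def wnid_to_category(self) -> Dict[str, str]:
        return self.info["wnid_to_category"]
```

Because the cache lives on the instance, each dataset object parses the file once. The follow-up PR may choose a different mechanism.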

pmeier · 2021-10-19T06:57:40Z

torchvision/prototype/datasets/utils/_dataset.py

+        self.extra = FrozenBunch(extra or dict())
+


This is added to let each dataset provide static information beyond what the default DatasetInfo holds. By default, this is an empty namespace.
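For readers unfamiliar with the pattern, here is a read-only attribute namespace in the spirit of `FrozenBunch(extra or dict())`. `FrozenNamespace` is a hypothetical name, and this is not torchvision's `FrozenBunch` implementation. It is read-only only at the top level: nested dicts remain mutable.

```python
from types import MappingProxyType
from typing import Any, Mapping, Optional


class FrozenNamespace:
    """Hypothetical shallow, read-only namespace with attribute access."""

    __slots__ = ("_data",)

    def __init__(self, data: Optional[Mapping[str, Any]] = None) -> None:
        # Copy first so later changes to `data` do not leak in; defaults to empty.
        object.__setattr__(self, "_data", MappingProxyType(dict(data or {})))

    def __getattr__(self, name: str) -> Any:
        # Only called when normal lookup fails, i.e. for the stored keys.
        try:
            return self._data[name]
        except KeyError:
            raise AttributeError(name) from None

    def __setattr__(self, name: str, value: Any) -> None:
        raise AttributeError(f"{type(self).__name__} is read-only")

    def __delattr__(self, name: str) -> None:
        raise AttributeError(f"{type(self).__name__} is read-only")
```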

imagenet.categories

pmeier · 2021-10-20T13:10:44Z

Prototype test failures are real and will be fixed by #4668.


Summary:
* add prototype imagenet dataset
* add missing checksums
* fix mypy
* add human readable categories
* cleanup
* sort categories ascending based on wnid
* remove accidentally added file
* cleanup category file generation
* fix mypy

Reviewed By: NicolasHug

Differential Revision: D31916331

fbshipit-source-id: 38a598f951923342e488f0188f40c74d5b13108c
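One of the commits sorts the categories in ascending order by wnid. Here is a minimal sketch of that step. `build_categories` and `format_categories_file` are hypothetical names, and the `category,wnid` line format is an assumption, not necessarily the layout of `imagenet.categories`. Sorting by wnid makes each label index independent of the order in which the source metadata lists the classes. Because wnids such as `n01440764` share a fixed-width format, a plain string sort orders them numerically.

```python
from typing import Dict, List, Tuple


def build_categories(wnid_to_category: Dict[str, str]) -> List[Tuple[str, str]]:
    """Return (category, wnid) pairs in ascending wnid order; a category's
    label index is its position in this list."""
    return [(wnid_to_category[wnid], wnid) for wnid in sorted(wnid_to_category)]


def format_categories_file(rows: List[Tuple[str, str]]) -> str:
    # Hypothetical on-disk format: one "category,wnid" line per class.
    return "".join(f"{category},{wnid}\n" for category, wnid in rows)
```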


add prototype imagenet dataset

f7149c9

pmeier added the module: datasets, new feature, and prototype labels Oct 18, 2021

pytorch-probot bot added the ciflow/default label Oct 18, 2021

facebook-github-bot added the cla signed label Oct 18, 2021

pmeier added 2 commits October 18, 2021 10:36

add missing checksums

7c390f4

Merge branch 'main' into datasets/imagenet

21a2619

pmeier commented Oct 18, 2021


fix mypy

30e3bd5

datumbox mentioned this pull request Oct 18, 2021

Make the labels of imagenet unique #4641

Merged

fmassa approved these changes Oct 18, 2021


pmeier added 3 commits October 19, 2021 08:47

add human readable categories

fd28b9f

cleanup

d509098

Merge branch 'main' into datasets/imagenet

e8ed231

pmeier commented Oct 19, 2021


pmeier requested a review from datumbox October 19, 2021 07:00

sort categories ascending based on wnid

0cc08b6

datumbox reviewed Oct 19, 2021


imagenet.categories (outdated, resolved)

pmeier added 5 commits October 19, 2021 10:42

remove accidentally added file

2dabcc2

Merge branch 'main' into datasets/imagenet

4acf89a

cleanup category file generation

c6d2b78

fix mypy

c021ec4

Merge branch 'main' into datasets/imagenet

1e8fac8

pmeier added 2 commits October 21, 2021 11:57

Merge branch 'main' into datasets/imagenet

6df6c78

Merge branch 'datasets/imagenet' of https://github.com/pmeier/vision into datasets/imagenet

ce4c225

pmeier merged commit 58f313b into pytorch:main Oct 21, 2021

pmeier deleted the datasets/imagenet branch October 21, 2021 10:15
