add prototype imagenet dataset #4640
if config.split == "train":
    images = HttpResource(
        "ILSVRC2012_img_train.tar",
        sha256="b08200a27a8e34218a0e58fde36b0fe8f73bc377f4acea2d91602057c3ca45bb",
    )
else:  # config.split == "val"
    images = HttpResource(
        "ILSVRC2012_img_val.tar",
        sha256="c7e06a6c0baccf06d8dbeb6577d71efff84673a5dbdd50633ab44f8ea0456ae0",
    )

devkit = HttpResource(
    "ILSVRC2012_devkit_t12.tar.gz",
    sha256="b59243268c0d266621fd587d2018f69e906fb22875aca0e295b48cafaa927953",
)
Although these files are not publicly accessible anymore, we can get away with it for now, since our download functionality is a no-op. I'll refactor this to include manual download instructions as soon as the torchdata download API is stable-ish.
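Until manual download instructions exist, the checksums above can still be used to validate a locally placed archive. The following is only a minimal sketch of that idea using the standard library; `sha256_of` and `check_manual_download` are hypothetical helper names and not part of torchvision or torchdata.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as file:
        # Read fixed-size chunks so large archives do not have to fit in memory.
        for chunk in iter(lambda: file.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_manual_download(path: Path, expected_sha256: str) -> None:
    """Raise if the file is missing or its checksum does not match."""
    if not path.is_file():
        raise FileNotFoundError(f"{path} not found; please download it manually.")
    actual = sha256_of(path)
    if actual != expected_sha256:
        raise RuntimeError(f"Checksum mismatch for {path}: got {actual}")
```

A caller would pass the path of the manually downloaded archive together with the `sha256` string given to the corresponding `HttpResource`.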
Thanks!
path, buffer = image_data

category = self.categories[label]
label = torch.tensor(label)
Nit for the future: we might want to revisit whether to store numbers as a 0-d tensor or as a raw number. Relying on the raw Python number is generally much faster and smaller, which might be a good thing if we want to minimize transfer / storage.
IIRC, we wanted to return the labels as custom tensors, which hold the category string besides the numerical label. With that in mind, I've already wrapped the labels in torch.tensor. Other numbers are left as is.
@property
def category_to_wnid(self) -> Dict[str, str]:
    return self.info.extra.category_to_wnid

@property
def wnid_to_category(self) -> Dict[str, str]:
    return self.info.extra.wnid_to_category
We need to cache self.info, because otherwise we parse the categories file in every single step. I'll send a follow-up PR, because this also affects all the other datasets.
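One possible way to cache the info object is `functools.cached_property`. This is only a sketch under assumptions: `DatasetSketch`, `_make_info`, and `parse_count` are hypothetical names for illustration, and the follow-up PR may well solve this differently.

```python
from functools import cached_property
from typing import Dict


class DatasetSketch:
    """Hypothetical stand-in for a dataset class; not torchvision's API."""

    def __init__(self) -> None:
        self.parse_count = 0  # counts how often the expensive parse runs

    def _make_info(self) -> Dict[str, str]:
        # Stands in for parsing the categories file from disk.
        self.parse_count += 1
        return {"n01440764": "tench", "n01443537": "goldfish"}

    @cached_property
    def info(self) -> Dict[str, str]:
        # Computed on first access, then stored on the instance.
        return self._make_info()

    @property
    def wnid_to_category(self) -> Dict[str, str]:
        # Repeated accesses reuse the cached info instead of re-parsing.
        return self.info
```

With this shape, accessing `wnid_to_category` for every sample triggers the parse only once per dataset instance.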
self.extra = FrozenBunch(extra or dict())
This is added to enable each dataset to provide more static information beyond what the default DatasetInfo holds. By default this will be an empty namespace.
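To illustrate the idea of a read-only `extra` namespace that defaults to empty, here is a rough sketch. `FrozenNamespace` and `InfoSketch` are hypothetical stand-ins written for this example; they are not the actual FrozenBunch or DatasetInfo implementations and may behave differently.

```python
from typing import Any, Dict, Optional


class FrozenNamespace:
    """Hypothetical read-only namespace; only illustrates the idea."""

    def __init__(self, items: Optional[Dict[str, Any]] = None) -> None:
        # Bypass our own __setattr__ guard while initialising.
        object.__setattr__(self, "_items", dict(items or {}))

    def __getattr__(self, name: str) -> Any:
        # Only called when normal attribute lookup fails.
        if name == "_items":
            raise AttributeError(name)
        try:
            return self._items[name]
        except KeyError:
            raise AttributeError(name) from None

    def __setattr__(self, name: str, value: Any) -> None:
        raise RuntimeError(f"cannot set {name!r}: namespace is frozen")

    def __len__(self) -> int:
        return len(self._items)


class InfoSketch:
    """Hypothetical info object carrying optional static extras."""

    def __init__(self, name: str, extra: Optional[Dict[str, Any]] = None) -> None:
        self.name = name
        # An omitted `extra` yields an empty namespace.
        self.extra = FrozenNamespace(extra or dict())
```

A dataset could then pass something like `extra={"wnid_to_category": {...}}` and read it back as `info.extra.wnid_to_category`, while datasets without extras get an empty namespace.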
Prototype test failures are real and will be fixed by #4668.
Summary:
* add prototype imagenet dataset
* add missing checksums
* fix mypy
* add human readable categories
* cleanup
* sort categories ascending based on wnid
* remove accidentally added file
* cleanup category file generation
* fix mypy

Reviewed By: NicolasHug
Differential Revision: D31916331
fbshipit-source-id: 38a598f951923342e488f0188f40c74d5b13108c
cc @pmeier @bjuncek