[RFC] TorchVision with Batteries included - Phase 1

## 🚀 Feature

_Note: To track the progress of the project check out [this board](https://github.com/pytorch/vision/projects/1)._

Add to TorchVision the popular primitives (losses, schedulers, data augmentations, operators, etc.) that are often used to reproduce SOTA results, along with new, popular, highly accurate models with pre-trained weights.



## Motivation

Though TorchVision includes many of the common building blocks needed to train CV models, it lacks popular primitives that are often used to reproduce SOTA results. Some of these primitives live in our reference scripts (data utils, transforms, etc.) because we previously did not want to commit to a specific API. Others are part of libraries from the broader ecosystem. Additionally, TorchVision does not provide some of the newer, popular architectures that currently achieve good results on a variety of vision tasks.

Adding support for such primitives and models to TorchVision will give its users a “batteries included” experience. Researchers will be able to do SOTA research and reproduce papers using common building blocks rather than writing their own, while industry users will be able to adapt the models to their domains more easily using SOTA techniques.


## Pitch

The addition of primitives should be done in several phases, iterating between trying to reproduce SOTA recipes, identifying accuracy gaps and implementing the necessary methods to close them. The progress of this project is tracked on [this board](https://github.com/pytorch/vision/projects/1).

During phase 1, add the following primitives and models to TorchVision:

- Losses #2980:
  - [x] [LabelSmoothing Loss](https://github.com/pytorch/pytorch/issues/7455) https://github.com/pytorch/pytorch/pull/63122 
  - [x] [SoftTarget CrossEntropy](https://github.com/pytorch/pytorch/issues/11959) https://github.com/pytorch/pytorch/pull/61044
- Schedulers:
  - [x] [ChainedScheduler](https://github.com/pytorch/pytorch/pull/26423#discussion_r329976246) https://github.com/pytorch/pytorch/pull/63491 https://github.com/pytorch/pytorch/pull/63457 https://github.com/pytorch/pytorch/pull/65034
  - [x] ConstantLR and LinearLR for [warmup](https://github.com/pytorch/pytorch/pull/60836) https://github.com/pytorch/pytorch/pull/64395
  - [x] [SequentialLR](https://github.com/pytorch/vision/issues/4281) https://github.com/pytorch/pytorch/pull/64037 https://github.com/pytorch/pytorch/pull/65035
- Models #2707:
  - [x] [EfficientNet (B0 to B7)](https://github.com/pytorch/vision/issues/980) #4293
  - [x] [RegNet](https://github.com/pytorch/vision/issues/2655) #4403
  - [x] [ViT](https://github.com/pytorch/vision/issues/4593) #4594
- Data Augmentations #3817: 
  - [x] [RandAugment](https://arxiv.org/abs/1909.13719v2) #4348
  - [x] [TrivialAugment](https://arxiv.org/abs/2103.10158) #4221 
  - [x] [MixUp](https://arxiv.org/abs/1710.09412) #4379
  - [x] [CutMix](https://arxiv.org/abs/1905.04899) #4379
- Operators:
  - [x] [Stochastic Depth](https://github.com/pytorch/vision/issues/4298) #4301
  - [x] [masks_to_boxes operator](https://github.com/pytorch/vision/issues/3960) #4290
- Training Recipes:
  - [x] [EMA model support](https://github.com/pytorch/vision/issues/4346) #4381 #4406 #4408
  - [x] [Updated reference scripts](https://github.com/pytorch/vision/issues/4281) #4335 #4411 #4444 #4493
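
To make two of the items above concrete, here is a minimal, dependency-free sketch of a label-smoothing cross-entropy and of a linear warmup followed by cosine decay. It is an illustration only, not the code merged in the linked PRs: the function names (`label_smoothing_cross_entropy`, `warmup_then_cosine`) and the default values are hypothetical, and the sketch works on plain Python lists for a single sample rather than on tensors.

```python
import math


def label_smoothing_cross_entropy(logits, target, smoothing=0.1):
    """Cross-entropy of one sample against a smoothed one-hot target.

    Hypothetical helper for illustration; `logits` is a list of floats and
    `target` is the index of the true class.
    """
    num_classes = len(logits)
    # Numerically stable log-softmax: subtract the max before exponentiating.
    max_logit = max(logits)
    log_sum_exp = max_logit + math.log(sum(math.exp(x - max_logit) for x in logits))
    log_probs = [x - log_sum_exp for x in logits]
    # Smoothed target: (1 - smoothing) on the true class,
    # plus smoothing / num_classes spread uniformly over all classes.
    loss = 0.0
    for k, log_p in enumerate(log_probs):
        weight = smoothing / num_classes
        if k == target:
            weight += 1.0 - smoothing
        loss -= weight * log_p
    return loss


def warmup_then_cosine(epoch, base_lr, warmup_epochs, total_epochs, start_factor=0.1):
    """Learning rate for `epoch`: linear warmup, then cosine decay to zero.

    Hypothetical helper for illustration; it mimics chaining a warmup
    schedule with a main schedule at a fixed milestone.
    """
    if epoch < warmup_epochs:
        # Scale linearly from start_factor * base_lr up to base_lr.
        factor = start_factor + (1.0 - start_factor) * epoch / warmup_epochs
        return base_lr * factor
    # Cosine decay over the epochs that remain after warmup.
    progress = (epoch - warmup_epochs) / max(1, total_epochs - warmup_epochs)
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    print(label_smoothing_cross_entropy([2.0, 0.5, -1.0], target=0))
    print([round(warmup_then_cosine(e, 0.5, 5, 20), 4) for e in range(20)])
```

In PyTorch itself, the corresponding building blocks are the `label_smoothing` argument of `torch.nn.CrossEntropyLoss` and `torch.optim.lr_scheduler.SequentialLR` chaining `LinearLR` with a main scheduler such as `CosineAnnealingLR`; those should be preferred over hand-written versions like the one above.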

Other potential primitives to be considered during phase 2:
* [Barron loss](https://arxiv.org/pdf/1701.03077.pdf) (see the [ClassyVision implementation](https://github.com/facebookresearch/ClassyVision/blob/main/classy_vision/losses/barron_loss.py))
* [Augmix + JSD loss](https://arxiv.org/abs/1912.02781)
* [FastAutoAugment](https://arxiv.org/abs/1905.00397)
* [Large Scale Jitter](https://github.com/facebookresearch/detectron2/blob/main/configs/new_baselines/mask_rcnn_R_50_FPN_100ep_LSJ.py#L44)
* [AutoDropout Layer](https://arxiv.org/abs/2101.01761)
* [DropBlock Layer](https://arxiv.org/abs/1810.12890)
* [DropConnect Layer](http://yann.lecun.com/exdb/publis/pdf/wan-icml-13.pdf)
* [ShakeDrop Layer](https://arxiv.org/abs/1802.02375)
* [Random Noise LR Scheduler](https://arxiv.org/abs/1810.01322)

Note that any of the suggested primitives that are not vision-specific should be added to PyTorch, so that all domain libraries can benefit from them.

cc @vfdev-5 @fmassa @oke-aditya @jbschlosser @iramazanli
