🚀 Feature
We currently handle CPU / CUDA / autograd dispatch manually in our wrapper functions. We should instead use the dispatcher from PyTorch, which was built to do exactly that.
The work should closely follow the PR from @ezyang in #2366.
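For context, the hand-written dispatch in our wrappers looks roughly like the sketch below. This is illustrative only: `nms_cpu` / `nms_cuda` and the `WITH_CUDA` guard stand in for the actual per-backend kernels and build flags, and the autograd wiring is omitted.

```cpp
#include <ATen/ATen.h>

// Illustrative per-backend kernels (declarations only).
at::Tensor nms_cpu(const at::Tensor& dets, const at::Tensor& scores, double iou_threshold);
at::Tensor nms_cuda(const at::Tensor& dets, const at::Tensor& scores, double iou_threshold);

// Hand-written dispatch in the wrapper: inspect the device and pick a kernel.
at::Tensor nms(const at::Tensor& dets, const at::Tensor& scores, double iou_threshold) {
  if (dets.is_cuda()) {
#ifdef WITH_CUDA
    return nms_cuda(dets, scores, iou_threshold);
#else
    TORCH_CHECK(false, "nms is not compiled with GPU support");
#endif
  }
  return nms_cpu(dets, scores, iou_threshold);
}
```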
Motivation
The dispatcher is a new mechanism in PyTorch that selects which kernel to run depending on the properties of the input tensors that are passed. It is thus a centralized place where CPU / CUDA / autograd / autocast / quantized / XLA / etc. are handled.
One thing to keep an eye on: we currently need to duplicate the input checks for the CPU and CUDA kernels. This is something that @ezyang is working on in pytorch/pytorch#45277.
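With the dispatcher, the registration could look roughly like the following. This is a minimal sketch using `TORCH_LIBRARY` / `TORCH_LIBRARY_IMPL`; `nms_cpu`, `nms_cuda`, and `nms_autograd` are placeholder names for the existing per-backend implementations, not the actual torchvision code.

```cpp
#include <ATen/ATen.h>
#include <torch/library.h>

// Illustrative kernels; in practice these are the existing CPU / CUDA /
// autograd implementations.
at::Tensor nms_cpu(const at::Tensor& dets, const at::Tensor& scores, double iou_threshold);
at::Tensor nms_cuda(const at::Tensor& dets, const at::Tensor& scores, double iou_threshold);
at::Tensor nms_autograd(const at::Tensor& dets, const at::Tensor& scores, double iou_threshold);

// Define the operator schema once, in a single place.
TORCH_LIBRARY(torchvision, m) {
  m.def("nms(Tensor dets, Tensor scores, float iou_threshold) -> Tensor");
}

// Register one kernel per dispatch key; the dispatcher picks the right one
// based on the properties of the input tensors.
TORCH_LIBRARY_IMPL(torchvision, CPU, m) {
  m.impl("nms", nms_cpu);
}
TORCH_LIBRARY_IMPL(torchvision, CUDA, m) {
  m.impl("nms", nms_cuda);
}
TORCH_LIBRARY_IMPL(torchvision, Autograd, m) {
  m.impl("nms", nms_autograd);
}
```

The op is then reachable from Python as `torch.ops.torchvision.nms(dets, scores, iou_threshold)`, with a single entry point covering all backends.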
Current support:
- nms
- roi_align
- deform_conv2d
- roi_pool
- ps_roi_align
- ps_roi_pool
Question for @ezyang: following our discussion in https://github.com/pytorch/vision/pull/2366/files#r447547554, do you think we should provide a fallback in PyTorch for registering ops without double backwards?