Description
🐛 Describe the bug
The *box_iou() functions should return a matrix of results for every possible pair (box1, box2), where box1 is a box from boxes1 and box2 is a box from boxes2. box_iou() and generalized_box_iou() work this way: if boxes1 is an Nx4 matrix and boxes2 is an Mx4 matrix, the result is an NxM matrix. The recently added distance_box_iou() and complete_box_iou() fail whenever boxes1 and boxes2 contain different numbers of boxes.
import torch
from torchvision.ops import box_iou, generalized_box_iou, distance_box_iou, complete_box_iou
N = 5
M = 6
boxes1 = torch.rand((N, 4))
boxes2 = torch.rand((M, 4))
print(box_iou(boxes1, boxes2).shape) # torch.Size([5, 6])
print(generalized_box_iou(boxes1, boxes2).shape) # torch.Size([5, 6])
print(distance_box_iou(boxes1, boxes2).shape) # RuntimeError
print(complete_box_iou(boxes1, boxes2).shape) # RuntimeError
When running the above code, distance_box_iou() and complete_box_iou() both fail with a RuntimeError (execution stops at the first failure, so only the distance_box_iou() traceback appears). The output is below:
torch.Size([5, 6])
torch.Size([5, 6])
Traceback (most recent call last):
  File ".../test.py", line 10, in <module>
    print(distance_box_iou(boxes1, boxes2).shape) # RuntimeError
  File ".../lib/python3.9/site-packages/torchvision/ops/boxes.py", line 361, in distance_box_iou
    diou, _ = _box_diou_iou(boxes1, boxes2)
  File ".../lib/python3.9/site-packages/torchvision/ops/boxes.py", line 378, in _box_diou_iou
    centers_distance_squared = (_upcast(x_p - x_g) ** 2) + (_upcast(y_p - y_g) ** 2)
RuntimeError: The size of tensor a (5) must match the size of tensor b (6) at non-singleton dimension 0
The unit tests don't catch this, because none of them uses different numbers of boxes in the two sets.
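To see why equal-sized inputs hide the bug, consider only the x-coordinate part of centers_distance_squared. The following is a minimal sketch with three hypothetical boxes per set in (x1, y1, x2, y2) format: when N == M, the element-wise (N,) difference broadcasts against an (N, N) tensor without any error (each row becomes a copy of the vector), and the result agrees with the true pair-wise values only on the diagonal.

```python
import torch

# Hypothetical 3-box inputs in (x1, y1, x2, y2) format; only x matters here.
boxes1 = torch.tensor([[0.0, 0.0, 2.0, 2.0],
                       [1.0, 1.0, 3.0, 3.0],
                       [2.0, 0.0, 4.0, 2.0]])
boxes2 = torch.tensor([[0.0, 0.0, 1.0, 1.0],
                       [2.0, 2.0, 4.0, 4.0],
                       [0.0, 2.0, 2.0, 4.0]])

# x-coordinates of the box centers, one value per box: shape (3,) each.
x_p = (boxes1[:, 0] + boxes1[:, 2]) / 2
x_g = (boxes2[:, 0] + boxes2[:, 2]) / 2

# Element-wise difference, as in the failing line: shape (3,).
elementwise = (x_p - x_g) ** 2

# Stand-in for an (N, N) tensor such as diagonal_distance_squared.
# Broadcasting (3,) against (3, 3) raises no error.
pairwise_like = torch.ones(3, 3)
silently_broadcast = elementwise / pairwise_like

# Pair-wise difference, which an NxM result actually needs: shape (3, 3).
pairwise = (x_p[:, None] - x_g) ** 2

print(torch.equal(silently_broadcast.diagonal(), pairwise.diagonal()))  # True
print(torch.equal(silently_broadcast, pairwise))  # False
```

So a test with equal box counts gets a tensor of the expected NxN shape, but its off-diagonal entries use the wrong center distances.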
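The problem is in _box_diou_iou(). It looks like iou and diagonal_distance_squared are calculated for every possible pair (by adding an empty dimension so that the operands broadcast to NxM), but centers_distance_squared is not: x_p - x_g and y_p - y_g subtract a length-M tensor from a length-N tensor, which is exactly the line that fails in the traceback.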
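A minimal sketch of a pair-wise DIoU that adds the missing dimension to the center coordinates as well. The function name pairwise_distance_box_iou is hypothetical, not a torchvision API. It recomputes IoU inline instead of calling box_iou() so that it depends only on torch, and it skips the _upcast() step, so it assumes floating-point inputs.

```python
import torch
from torch import Tensor


def pairwise_distance_box_iou(boxes1: Tensor, boxes2: Tensor, eps: float = 1e-7) -> Tensor:
    """Hypothetical sketch: DIoU for every pair of boxes.

    boxes1 is (N, 4), boxes2 is (M, 4), both float and in (x1, y1, x2, y2)
    format. Returns an (N, M) tensor.
    """
    # Pair-wise IoU: boxes1[:, None, ...] broadcasts against boxes2 to (N, M, ...).
    area1 = (boxes1[:, 2] - boxes1[:, 0]) * (boxes1[:, 3] - boxes1[:, 1])  # (N,)
    area2 = (boxes2[:, 2] - boxes2[:, 0]) * (boxes2[:, 3] - boxes2[:, 1])  # (M,)
    lt = torch.max(boxes1[:, None, :2], boxes2[:, :2])  # (N, M, 2)
    rb = torch.min(boxes1[:, None, 2:], boxes2[:, 2:])  # (N, M, 2)
    wh = (rb - lt).clamp(min=0)
    inter = wh[..., 0] * wh[..., 1]  # (N, M)
    iou = inter / (area1[:, None] + area2 - inter)  # (N, M)

    # Squared diagonal of the smallest box enclosing each pair.
    lti = torch.min(boxes1[:, None, :2], boxes2[:, :2])  # (N, M, 2)
    rbi = torch.max(boxes1[:, None, 2:], boxes2[:, 2:])  # (N, M, 2)
    whi = (rbi - lti).clamp(min=0)
    diagonal_distance_squared = whi[..., 0] ** 2 + whi[..., 1] ** 2 + eps  # (N, M)

    # Box centers, one per box.
    x_p = (boxes1[:, 0] + boxes1[:, 2]) / 2  # (N,)
    y_p = (boxes1[:, 1] + boxes1[:, 3]) / 2  # (N,)
    x_g = (boxes2[:, 0] + boxes2[:, 2]) / 2  # (M,)
    y_g = (boxes2[:, 1] + boxes2[:, 3]) / 2  # (M,)
    # The fix: index x_p and y_p with [:, None] so the differences are (N, M),
    # matching iou and diagonal_distance_squared, instead of element-wise.
    centers_distance_squared = (x_p[:, None] - x_g) ** 2 + (y_p[:, None] - y_g) ** 2

    return iou - centers_distance_squared / diagonal_distance_squared


boxes1 = torch.tensor([[0.0, 0.0, 2.0, 2.0], [1.0, 1.0, 3.0, 3.0]])
boxes2 = torch.tensor([[0.0, 0.0, 2.0, 2.0], [2.0, 0.0, 4.0, 2.0], [5.0, 5.0, 6.0, 6.0]])
diou = pairwise_distance_box_iou(boxes1, boxes2)
print(diou.shape)  # torch.Size([2, 3])
```

With N = 2 and M = 3 this returns a 2x3 matrix instead of raising, and each entry equals what the function returns for that single pair of boxes.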
As a side note, I personally find it confusing that these functions produce the output for every possible pair. By convention, PyTorch functions produce element-wise results. For example, torch.add(boxes1, boxes2) only works if boxes1 and boxes2 contain the same number of boxes. If you want a pair-wise addition, you can easily call torch.add(boxes1[:, None, :], boxes2). The fact that the *box_iou() functions produce pair-wise results makes their implementation more complicated, and the only way to get element-wise results is to call box_iou(boxes1, boxes2).diagonal(), which is inefficient: it computes all NxN values only to keep the N on the diagonal.
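To illustrate the element-wise convention argued for above, here is a minimal sketch of a hypothetical elementwise_box_iou() (not part of torchvision). It indexes coordinates with [..., k], so for two (N, 4) inputs it returns one IoU per matched pair, and the same function yields the pair-wise NxM matrix when called with boxes1[:, None, :], just like the torch.add example.

```python
import torch
from torch import Tensor


def elementwise_box_iou(boxes1: Tensor, boxes2: Tensor) -> Tensor:
    """Hypothetical helper: IoU of broadcast-matched boxes in (x1, y1, x2, y2) format.

    With two (N, 4) inputs it returns (N,): the IoU of boxes1[i] and boxes2[i].
    """
    # [..., k] indexing works for any number of leading (batch) dimensions.
    area1 = (boxes1[..., 2] - boxes1[..., 0]) * (boxes1[..., 3] - boxes1[..., 1])
    area2 = (boxes2[..., 2] - boxes2[..., 0]) * (boxes2[..., 3] - boxes2[..., 1])
    lt = torch.max(boxes1[..., :2], boxes2[..., :2])  # top-left of the intersection
    rb = torch.min(boxes1[..., 2:], boxes2[..., 2:])  # bottom-right of the intersection
    wh = (rb - lt).clamp(min=0)
    inter = wh[..., 0] * wh[..., 1]
    return inter / (area1 + area2 - inter)


boxes1 = torch.tensor([[0.0, 0.0, 2.0, 2.0], [1.0, 1.0, 3.0, 3.0]])
boxes2 = torch.tensor([[0.0, 0.0, 2.0, 2.0], [0.0, 0.0, 1.0, 1.0]])

# Element-wise: N values, no NxN matrix is built.
same = elementwise_box_iou(boxes1, boxes2)
# Pair-wise on request, by broadcasting (N, 1, 4) against (M, 4).
pairs = elementwise_box_iou(boxes1[:, None, :], boxes2)
print(same.shape)   # torch.Size([2])
print(pairs.shape)  # torch.Size([2, 2])
```

The broadcast call still builds (N, M, 2) intermediates, as a pair-wise function must; the saving applies only when element-wise results are what you want.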
Versions
PyTorch version: 1.12.0+cu113
Is debug build: False
CUDA used to build PyTorch: 11.3
ROCM used to build PyTorch: N/A
OS: Ubuntu 20.04.4 LTS (x86_64)
GCC version: (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0
Clang version: Could not collect
CMake version: version 3.16.3
Libc version: glibc-2.31
Python version: 3.9.12 (main, Apr 5 2022, 06:56:58) [GCC 7.5.0] (64-bit runtime)
Python platform: Linux-5.10.102.1-microsoft-standard-WSL2-x86_64-with-glibc2.31
Is CUDA available: True
CUDA runtime version: Could not collect
GPU models and configuration: GPU 0: NVIDIA GeForce GTX 1650 Ti with Max-Q Design
Nvidia driver version: 516.59
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True
Versions of relevant libraries:
[pip3] mypy==0.950
[pip3] mypy-extensions==0.4.3
[pip3] numpy==1.22.3
[pip3] pytorch-lightning==1.6.5
[pip3] pytorch-lightning-bolts==0.2.5
[pip3] pytorch-quantization==2.1.2
[pip3] torch==1.12.0+cu113
[pip3] torchmetrics==0.6.0
[pip3] torchtext==0.12.0
[pip3] torchvision==0.13.0+cu113
[conda] Could not collect