Description
🐛 Describe the bug
I'm trying to generate a TensorRT engine of RetinaNet
that uses GeneralizedRCNNTransform
. By bypassing a couple of layers, it works fine for a static batch size. However, when using a dynamic batch size, I get an issue that I don't know how to fix.
In class GeneralizedRCNNTransform(nn.Module)
, there is this part in the forward
method:
for i in range(len(images)):
image = images[i]
target_index = targets[i] if targets is not None else None
if image.dim() != 3:
raise ValueError("images is expected to be a list of 3d tensors "
"of shape [C, H, W], got {}".format(image.shape))
image = self.normalize(image) # here are where the sub and div nodes comes from
image, target_index = self.resize(image, target_index) # I'm bypassing this
images[i] = image
if targets is not None and target_index is not None:
targets[i] = target_index
When converting my model with:
torch.onnx.export(
model,
example,
onnx_model_path.split('.')[0] + '_dynamic.onnx',
verbose=False,
opset_version=11,
input_names=['input'],
dynamic_axes={
'input': {0: 'batch_size'}
}
)
I get the following tree:
Which is prefectly fine for the given example
batch size (3 in this case), but this tree will not work for any other batch size once it is converted to a TensorRT engine. I get this kind of error:
[12/10/2021-13:29:40] [TRT] [E] 7: [shapeMachine.cpp::execute::565] Error Code 7: Internal Error (Split_0_0: ISliceLayer has out of bounds access on axis 0
condition '<' violated
Instruction: CHECK_LESS 1 1
)
[12/10/2021-13:29:40] [TRT] [E] 2: [executionContext.cpp::enqueueInternal::366] Error Code 2: Internal Error (Could not resolve slots: )
Is there any way I can bypass this for
loop which is creating the split
and concat
nodes? Do I need to redefine the forward method? I can't really bypass the normalisation, or else I will loose in accuracy. Can I maybe normalize it all at once without that for
loop?
Thanks 😄
Versions
Collecting environment information...
PyTorch version: 1.10.0
Is debug build: False
CUDA used to build PyTorch: 10.2
ROCM used to build PyTorch: N/A
OS: Ubuntu 18.04.6 LTS (aarch64)
GCC version: (Ubuntu/Linaro 7.5.0-3ubuntu1~18.04) 7.5.0
Clang version: 6.0.0-1ubuntu2 (tags/RELEASE_600/final)
CMake version: version 3.10.2
Libc version: glibc-2.25
Python version: 3.6.9 (default, Mar 15 2022, 13:55:28) [GCC 8.4.0] (64-bit runtime)
Python platform: Linux-4.9.253-tegra-aarch64-with-Ubuntu-18.04-bionic
Is CUDA available: True
CUDA runtime version: Could not collect
GPU models and configuration: Could not collect
Nvidia driver version: Could not collect
cuDNN version: Probably one of the following:
/usr/lib/aarch64-linux-gnu/libcudnn.so.8.2.1
/usr/lib/aarch64-linux-gnu/libcudnn_adv_infer.so.8.2.1
/usr/lib/aarch64-linux-gnu/libcudnn_adv_train.so.8.2.1
/usr/lib/aarch64-linux-gnu/libcudnn_cnn_infer.so.8.2.1
/usr/lib/aarch64-linux-gnu/libcudnn_cnn_train.so.8.2.1
/usr/lib/aarch64-linux-gnu/libcudnn_ops_infer.so.8.2.1
/usr/lib/aarch64-linux-gnu/libcudnn_ops_train.so.8.2.1
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True
Versions of relevant libraries:
[pip3] numpy==1.19.5
[pip3] torch==1.10.0
[pip3] torch2trt==0.3.0
[pip3] torchvision==0.11.1
[conda] Could not collect