Trying to implement YoloV4 in ML.NET but running into some trouble (I'm quite new to all of this)

### System information

- **OS version/distro**: Windows 10 20H2
- **.NET Version (eg., dotnet --info)**: 3.1.9

### Introduction

I'm currently trying to use the YoloV4 model inside ML.NET but whenever I try to put an image through the model I get very strange results from the model. (It mainly finds backpacks, handbags, some apples and some other unexpected results). The image I'm using:
![dog](https://user-images.githubusercontent.com/2350015/97497833-66185d80-196b-11eb-8c96-ce7ed676cbd6.jpg)

I created a repository to reproduce my issue as well:
https://github.com/devedse/DeveMLNetTest

Disclaimer: I'm quite new to all of this :smile:

### Steps I've taken

I first wanted to obtain a pre-trained model which I was sure worked so I did the following steps:
1. Cloned: https://github.com/Tianxiaomo/pytorch-YOLOv4
1. Installed requirements + pytorch + cuda etc.
1. Downloaded the Yolov4_epoch1.pth model and put it in `checkpoints\Yolov4_epoch1.pth`
1. Ran `demo_pytorch2onnx.py` with debug enabled.

When you do this you can step through the code where the output from the model is parsed (this shows the labels for all boxes with a confidence higher then 0.6):
![image](https://user-images.githubusercontent.com/2350015/97498111-e212a580-196b-11eb-83b3-013f905eec5e.png)

When we look up these labels, we can see that (all +1 since it's a 0-index array):
```
1: bicycle
8: truck
16: dog
```

So the results in Python seem to be correct. This python script first converts the model to an ONNX model and then uses that model to do the inference.

When I now start using the model inside C# though and try to parse through the results in exactly the same way I find completely different results:
![image](https://user-images.githubusercontent.com/2350015/97498763-0622b680-196d-11eb-91b9-bb02374e7e39.png)

Could it be that the model is somehow creating garbage data?

Anyway, here's the code that I'm 

MLContext:
```
var pipeline = mlContext.Transforms.LoadImages(outputColumnName: "image", imageFolder: "", inputColumnName: nameof(ImageNetData.ImagePath))
                .Append(mlContext.Transforms.ResizeImages(outputColumnName: "image", imageWidth: ImageNetSettings.imageWidth, imageHeight: ImageNetSettings.imageHeight, inputColumnName: "image"))
                .Append(mlContext.Transforms.ExtractPixels(outputColumnName: "input", inputColumnName: "image"))
                .Append(mlContext.Transforms.ApplyOnnxModel(
                    modelFile: modelLocation,
                    outputColumnNames: new[] { "boxes", "confs" },
                    inputColumnNames: new[] { "input" }
                    ));
```

Code to parse model output:
```
public IList<YoloBoundingBox> ParseOutputs(float[] boxes, float[] confs, float threshold = .6F)
{
	var boxesUnflattened = new List<BoundingBoxDimensions>();
	for (int i = 0; i < boxes.Length; i += 4)
	{
		boxesUnflattened.Add(new BoundingBoxDimensions()
		{
			X = boxes[i],
			Y = boxes[i + 1],
			Width = boxes[i + 2] - boxes[i],
			Height = boxes[i + 3] - boxes[i + 1],
			OriginalStuff = boxes.Skip(i).Take(4).ToArray()
		});
	}

	var confsUnflattened = new List<float[]>();
	for (int i = 0; i < confs.Length; i += 80)
	{
		confsUnflattened.Add(confs.Skip(i).Take(80).ToArray());
	}

	var maxConfidencePerBox = confsUnflattened.Select(t => t.Select((n, i) => (Number: n, Index: i)).Max()).ToList();
	var boxesNumbered = boxesUnflattened.Select((b, i) => (Box: b, Index: i)).ToList();

	var boxesIndexWhichHaveHighConfidence = maxConfidencePerBox.Where(t => t.Number > threshold).ToList();
	var allBoxesThemselvesWithHighConfidence = boxesIndexWhichHaveHighConfidence.Join(boxesNumbered, t => t.Index, t => t.Index, (l, r) => (Box: r, Conf: l)).ToList();


	Console.WriteLine("I would expect a bike, dog and car here");
	Console.WriteLine("Instead we got:");

	foreach (var b in allBoxesThemselvesWithHighConfidence)
	{
		var startString = $"{b.Conf.Number}: {labels[b.Conf.Index]}";
		Console.WriteLine($"{startString.PadRight(30, ' ')}({string.Join(",", b.Box.Box.OriginalStuff)})");
	}

	throw new InvalidOperationException("Everything below this doesn't work anyway");
```

Can someone give me some hints / ideas on why I could be running into this issue?

Also feel free to clone the repo and press F5. It should simply build / run.
https://github.com/devedse/DeveMLNetTest

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Trying to implement YoloV4 in ML.NET but running into some trouble (I'm quite new to all of this) #5466

System information

Introduction

Steps I've taken

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Trying to implement YoloV4 in ML.NET but running into some trouble (I'm quite new to all of this) #5466

Description

System information

Introduction

Steps I've taken

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions