Description
System information
- OS version/distro: Windows 10 20H2
- .NET Version (eg., dotnet --info): 3.1.9
Introduction
I'm currently trying to use the YoloV4 model inside ML.NET but whenever I try to put an image through the model I get very strange results from the model. (It mainly finds backpacks, handbags, some apples and some other unexpected results). The image I'm using:
I created a repository to reproduce my issue as well:
https://github.com/devedse/DeveMLNetTest
Disclaimer: I'm quite new to all of this 😄
Steps I've taken
I first wanted to obtain a pre-trained model which I was sure worked so I did the following steps:
- Cloned: https://github.com/Tianxiaomo/pytorch-YOLOv4
- Installed requirements + pytorch + cuda etc.
- Downloaded the Yolov4_epoch1.pth model and put it in
checkpoints\Yolov4_epoch1.pth
- Ran
demo_pytorch2onnx.py
with debug enabled.
When you do this you can step through the code where the output from the model is parsed (this shows the labels for all boxes with a confidence higher then 0.6):
When we look up these labels, we can see that (all +1 since it's a 0-index array):
1: bicycle
8: truck
16: dog
So the results in Python seem to be correct. This python script first converts the model to an ONNX model and then uses that model to do the inference.
When I now start using the model inside C# though and try to parse through the results in exactly the same way I find completely different results:
Could it be that the model is somehow creating garbage data?
Anyway, here's the code that I'm
MLContext:
var pipeline = mlContext.Transforms.LoadImages(outputColumnName: "image", imageFolder: "", inputColumnName: nameof(ImageNetData.ImagePath))
.Append(mlContext.Transforms.ResizeImages(outputColumnName: "image", imageWidth: ImageNetSettings.imageWidth, imageHeight: ImageNetSettings.imageHeight, inputColumnName: "image"))
.Append(mlContext.Transforms.ExtractPixels(outputColumnName: "input", inputColumnName: "image"))
.Append(mlContext.Transforms.ApplyOnnxModel(
modelFile: modelLocation,
outputColumnNames: new[] { "boxes", "confs" },
inputColumnNames: new[] { "input" }
));
Code to parse model output:
public IList<YoloBoundingBox> ParseOutputs(float[] boxes, float[] confs, float threshold = .6F)
{
var boxesUnflattened = new List<BoundingBoxDimensions>();
for (int i = 0; i < boxes.Length; i += 4)
{
boxesUnflattened.Add(new BoundingBoxDimensions()
{
X = boxes[i],
Y = boxes[i + 1],
Width = boxes[i + 2] - boxes[i],
Height = boxes[i + 3] - boxes[i + 1],
OriginalStuff = boxes.Skip(i).Take(4).ToArray()
});
}
var confsUnflattened = new List<float[]>();
for (int i = 0; i < confs.Length; i += 80)
{
confsUnflattened.Add(confs.Skip(i).Take(80).ToArray());
}
var maxConfidencePerBox = confsUnflattened.Select(t => t.Select((n, i) => (Number: n, Index: i)).Max()).ToList();
var boxesNumbered = boxesUnflattened.Select((b, i) => (Box: b, Index: i)).ToList();
var boxesIndexWhichHaveHighConfidence = maxConfidencePerBox.Where(t => t.Number > threshold).ToList();
var allBoxesThemselvesWithHighConfidence = boxesIndexWhichHaveHighConfidence.Join(boxesNumbered, t => t.Index, t => t.Index, (l, r) => (Box: r, Conf: l)).ToList();
Console.WriteLine("I would expect a bike, dog and car here");
Console.WriteLine("Instead we got:");
foreach (var b in allBoxesThemselvesWithHighConfidence)
{
var startString = $"{b.Conf.Number}: {labels[b.Conf.Index]}";
Console.WriteLine($"{startString.PadRight(30, ' ')}({string.Join(",", b.Box.Box.OriginalStuff)})");
}
throw new InvalidOperationException("Everything below this doesn't work anyway");
Can someone give me some hints / ideas on why I could be running into this issue?
Also feel free to clone the repo and press F5. It should simply build / run.
https://github.com/devedse/DeveMLNetTest