My homeland of Trinidad and Tobago is known as “The Land of the Hummingbird”. So I decided a while ago to try to build an image classifier for the 17 recorded species of hummingbirds found there.
The problem was (as usual) data. A collection this specialised would have to be curated manually, and even then the dataset would be nowhere near large enough to meet Deep Learning’s large dataset requirements for highly successful classifiers.
However, in Lesson 2 Jeremy introduced a downloads notebook with a very quick method for building specialised image datasets (legally) using Google Image Search. With this problem solved, I adapted the sample notebook, initially for just 3 species, and was getting about a 25% error rate. This was a little high, but it was about the same as my results in a notebook using the Birds Species dataset, so I still thought things were looking promising.
However, the error rate worsened to about 37% once I was up to 9 species. This notebook shows this run with unpruned data from Google for hummingbird species.
An exploration of the data showed that the errors came from the images Google was retrieving. These were still hummingbird images, but Google Images was returning the wrong species for the keyword searched.
The Lesson 2 notebook also introduced a helpful FileDelete tool for pruning unwanted images out of the Google-retrieved data, but it wasn’t well suited to pruning in this case, as it only showed the image and not how it was classified, so a manual pruning step was needed.
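For anyone wanting to do a similar manual pass, here is a minimal sketch of one way to review images alongside their labels. It is not the code I used: it assumes one sub-folder per species (the folder name is the label), and the function names `list_labelled_images` and `reject` are hypothetical helpers made up for illustration. It uses only the Python standard library.

```python
from pathlib import Path
import shutil

# Assumption: these are the only image types in the downloaded data.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def list_labelled_images(data_dir):
    """Return (label, path) pairs, using each sub-folder name as the label."""
    data_dir = Path(data_dir)
    pairs = []
    for class_dir in sorted(p for p in data_dir.iterdir() if p.is_dir()):
        for img in sorted(class_dir.iterdir()):
            if img.suffix.lower() in IMAGE_SUFFIXES:
                pairs.append((class_dir.name, img))
    return pairs


def reject(img_path, rejected_dir):
    """Move a mislabelled image out of the training set instead of deleting it.

    rejected_dir should live outside the data folder, otherwise it would be
    picked up as another class on the next listing.
    """
    img_path = Path(img_path)
    # Keep the original label as a sub-folder so the move can be undone later.
    target_dir = Path(rejected_dir) / img_path.parent.name
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / img_path.name
    shutil.move(str(img_path), str(target))
    return target
```

With something like this, you can loop over `list_labelled_images("data/hummingbirds")` (a made-up path), look at each image next to its label, and call `reject(path, "data/rejected")` on the ones that show the wrong species. Moving rather than deleting means a mistaken rejection is easy to reverse.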
Once the data was pruned, I also sub-divided some species into male and female where there were clear distinguishing characteristics between the two. I ended up with 14 categories of pruned data and got back to about a 25% error rate. This notebook shows pruned data with 14 categories of hummingbird by species and in some cases gender.
After seeing @simonw’s post and exploring his source code, I pushed my own model into a similar Docker image and deployed it to an Azure website at https://hummingbirds.azurewebsites.net/ if anyone wants to try it out for themselves.
Hopefully, as I explore more, I can pretty up the UI for results a bit and add the remaining species as I build the pruned training dataset over time.
Thanks to @sparalic for inspiring me to have enough confidence to share my work too, as she shared hers.