Using SSD-Lite and MobileNetV2 as a starting point, MobileNet-Tiny is an attempt to bring real-time object detection to non-GPU computers and edge devices such as the Raspberry Pi. Because a Raspberry Pi by itself does not have enough computing capability, it normally requires a more powerful base station or the cloud to process the captured image/video information and detect objects in real time. MobileNet-Tiny eliminates the need for a base station for real-time object detection.
The MobileNet-Tiny network takes an RGB image of size 224 X 224 X 3 as input and passes it through convolution layers and Bottleneck Residual Blocks (BRB) to produce a 7 X 7 X 320 feature map. This feature map, along with other feature maps from BRB4, BRB5, and BRB6, is then passed through SSDLite predictor layers to produce detections. These detections are filtered by a Non-Maximum Suppression layer to produce the final detections and bounding boxes.
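The final filtering step can be illustrated with a minimal sketch of greedy Non-Maximum Suppression in plain Python. This is a generic textbook version, not the project's own implementation; the function names, the `(x1, y1, x2, y2)` box format, and the 0.5 IoU threshold are assumptions made for illustration.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: return indices of kept boxes, highest score first.

    The threshold value is an assumed default, not one taken from the project.
    """
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap too much with any kept box.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


# Two heavily overlapping boxes and one separate box (made-up values).
boxes = [(10, 10, 50, 50), (12, 12, 52, 52), (100, 100, 140, 140)]
scores = [0.9, 0.8, 0.7]
kept = nms(boxes, scores)
print(kept)
```

In this example the second box overlaps the first with an IoU above the threshold, so only the higher-scoring of the two survives alongside the separate third box.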
MobileNet-Tiny demonstrates the power of small artificial neural networks with fast non-GPU object detection capabilities.
It suggests that, because batch normalization (BN) in small neural networks increases the total number of parameters and computations while contributing very little to accuracy, the batch normalization layer can be merged with the preceding convolution layer into a single layer instead of being applied as a separate step after the convolution. This can drastically increase the overall speed of the network without significant loss of accuracy.
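One common way to merge the two layers at inference time is to fold the BN statistics into the convolution's weights and bias. The sketch below shows that standard folding identity in plain Python; it is not taken from the MobileNet-Tiny code, and it assumes BN uses fixed running statistics. Each output channel's kernel is flattened into a row of weights, so the convolution at one spatial position becomes a dot product. All names and numeric values are hypothetical.

```python
import math


def fold_batchnorm(weights, bias, gamma, beta, mean, var, eps=1e-5):
    """Fold per-channel BN parameters into conv weights and bias.

    weights: one flattened kernel (list of floats) per output channel.
    Returns (folded_weights, folded_bias) such that
    conv_folded(x) == batchnorm(conv(x)) for fixed BN statistics.
    """
    folded_w, folded_b = [], []
    for c, row in enumerate(weights):
        scale = gamma[c] / math.sqrt(var[c] + eps)
        folded_w.append([w * scale for w in row])
        folded_b.append((bias[c] - mean[c]) * scale + beta[c])
    return folded_w, folded_b


def conv_at_position(weights, bias, x):
    """Convolution output at one spatial position (dot product per channel)."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def batchnorm(y, gamma, beta, mean, var, eps=1e-5):
    """Inference-time batch normalization applied per channel."""
    return [g * (v - m) / math.sqrt(s + eps) + b
            for v, g, b, m, s in zip(y, gamma, beta, mean, var)]


# Hypothetical values: 2 output channels, 3 inputs per kernel.
W = [[0.5, -1.0, 2.0], [1.5, 0.25, -0.75]]
b = [0.1, -0.2]
gamma, beta = [1.2, 0.8], [0.05, -0.3]
mean, var = [0.4, -0.1], [2.0, 0.5]
x = [1.0, 2.0, 3.0]

reference = batchnorm(conv_at_position(W, b, x), gamma, beta, mean, var)
W_f, b_f = fold_batchnorm(W, b, gamma, beta, mean, var)
folded = conv_at_position(W_f, b_f, x)
print(reference, folded)
```

The two printed results agree up to floating-point rounding, which is why the separate BN step can be dropped at inference without changing the outputs.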
It also suggests that carefully optimizing the number of predictor layers and the aspect ratios of the anchor boxes in SSD for small networks can result in a significant improvement in detection speed.
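To see why these two choices affect speed, it helps to count how many anchor boxes the predictors must score. The sketch below uses the usual SSD convention that an anchor with scale s and aspect ratio ar has width s*sqrt(ar) and height s/sqrt(ar), and ignores the extra square anchor that standard SSD adds. The feature map sizes and aspect-ratio sets are hypothetical values chosen for illustration, not the configuration used by MobileNet-Tiny.

```python
import math


def anchor_shapes(scale, aspect_ratios):
    """SSD-style anchor (width, height) pairs for one scale."""
    return [(scale * math.sqrt(ar), scale / math.sqrt(ar))
            for ar in aspect_ratios]


def count_anchors(feature_map_sizes, aspect_ratios):
    """Total anchors when every cell of every square map gets one anchor
    per aspect ratio."""
    per_cell = len(aspect_ratios)
    return sum(size * size * per_cell for size in feature_map_sizes)


# Hypothetical configurations: more predictor layers and ratios versus fewer.
full = count_anchors([14, 7, 4, 2, 1], [1.0, 2.0, 0.5, 3.0, 1.0 / 3.0])
trimmed = count_anchors([14, 7], [1.0, 2.0, 0.5])
print(full, trimmed)
print(anchor_shapes(0.2, [1.0, 2.0, 0.5]))
```

Every anchor adds class and box-offset outputs that must be computed and then filtered, so a smaller total directly reduces the work done in the predictor layers and in Non-Maximum Suppression.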
MobileNet-Tiny can achieve 19.4 FPS on a Dell XPS 13. This is 3x faster than the original MobileNetV2. Our mean average precision is 52.1% on the VOC 07+12 dataset and 19% on the COCO dataset. You can read more about our project here and find our code here.
MobileNet-Tiny can achieve 4.5 FPS on a Raspberry Pi. This is 7x faster than the original MobileNetV2 running on this device.
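The reported frame rates can be turned into per-frame latencies with simple arithmetic. The short sketch below does this and also derives the baseline frame rate implied by each speedup factor, assuming "Nx faster" means N times the baseline FPS. The helper names are hypothetical, and the derived baselines are inferences from the stated figures rather than separately reported measurements.

```python
def frame_time_ms(fps):
    """Average time per frame in milliseconds for a given frame rate."""
    return 1000.0 / fps


def implied_baseline_fps(fps, speedup):
    """Baseline frame rate if `fps` is `speedup` times the baseline."""
    return fps / speedup


# (device, reported FPS, reported speedup over the original MobileNetV2)
reported = [("Dell XPS 13", 19.4, 3), ("Raspberry Pi", 4.5, 7)]
for device, fps, speedup in reported:
    print(f"{device}: {frame_time_ms(fps):.1f} ms/frame, "
          f"implied baseline ~{implied_baseline_fps(fps, speedup):.2f} FPS")
```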
California State University San Marcos
333 S. Twin Oaks Valley Rd,
San Marcos CA, 92096