Abstract
The Internet of Things (IoT), with its smart sensors, collects and generates big data streams for a wide range of applications. One important application is video analytics, which includes object detection and has become an active research area, particularly since the development of deep neural networks. We demonstrate the applicability, effectiveness, and efficiency of two convolutional neural network algorithms, Faster-RCNN and Mask-RCNN, for facilitating video analytics in the IoT domain through overhead-view multiple-object detection and segmentation. We used Faster-RCNN and Mask-RCNN models trained on a frontal-view data set. To evaluate both algorithms, we used a newly recorded overhead-view data set containing images of different objects with variation in field of view, background, illumination conditions, poses, scales, sizes, angles, height, aspect ratio, and camera resolution. Although the overhead-view appearance of an object differs significantly from its frontal-view appearance, the experimental results are promising and show the potential of the deep learning models. For Faster-RCNN, we achieved a true-positive rate (TPR) of 94% with a false-positive rate (FPR) of 0.4% on overhead-view images of persons, while for other objects the maximum TPR obtained is 92%. The Mask-RCNN model produced a TPR of 93% with an FPR of 0.5% for person images and a maximum TPR of 92% for other objects. Furthermore, the output results are discussed in detail, highlighting the challenges and possible future directions.
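As a reading aid for the TPR and FPR figures quoted above, the following minimal sketch shows how the two rates are conventionally derived from detection counts. The function name `rates` and the counts passed to it are hypothetical and chosen only for illustration. They are not taken from the paper, whose exact evaluation protocol is not described in this abstract.

```python
def rates(tp, fp, tn, fn):
    """Return (TPR, FPR) computed from confusion-matrix counts.

    TPR = TP / (TP + FN): fraction of actual positives that were detected.
    FPR = FP / (FP + TN): fraction of actual negatives wrongly flagged.
    """
    positives = tp + fn
    negatives = fp + tn
    # Guard against empty classes to avoid division by zero.
    tpr = tp / positives if positives else 0.0
    fpr = fp / negatives if negatives else 0.0
    return tpr, fpr


# Hypothetical counts for illustration only (not results from the paper).
tpr, fpr = rates(tp=90, fp=5, tn=995, fn=10)
print(f"TPR = {tpr:.1%}, FPR = {fpr:.1%}")
```

Under this standard definition, a high TPR combined with a low FPR means that most objects present are detected while few detections are spurious.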
Original language | English |
---|---|
Article number | 8891768 |
Pages (from-to) | 5737-5744 |
Number of pages | 8 |
Journal | IEEE Internet of Things Journal |
Volume | 7 |
Issue number | 7 |
Publication status | Published - Jul 2020 |
Externally published | Yes |
Keywords
- Deep neural networks
- Faster-RCNN
- Mask-RCNN
- object detection
- overhead view