Deploying Multi Camera Multi Player Detection and Tracking Systems in Top View

Swetanjal Murati Dutta

Abstract

Object detection and tracking algorithms in computer vision have progressed remarkably in the past few years, owing to rapid advances in deep learning. These algorithms find numerous applications in surveillance, robotics, autonomous driving, sports analytics, and human-computer interaction. They achieve near-perfect performance on the benchmark datasets they have been evaluated on. However, deploying such algorithms in the real world poses a large number of challenges, and they do not work as well as their benchmark performance would suggest. In this thesis, we present the details of deploying one such Multi-Camera Multi-Object Detection and Tracking system to track players in the bird’s eye (top) view at a sports event, specifically in the game of cricket. Our system generates the top-view map of the fielders from cameras placed in the crowd stands of a stadium, making it extremely cost-effective compared to using spider cameras. We describe the challenges and hurdles we faced while deploying our system and the solutions we designed to overcome them. Ultimately, we built a carefully engineered end-to-end system that went live in the Asia Cup of 2022. Deploying such a multi-camera detection and tracking system poses many challenges. The first is camera placement. We had to devise strategies to place cameras so that every object of interest is captured by at least one camera, while ensuring that the objects do not appear so small that a detector struggles to localize them. Constructing the bird’s eye view of the detected players required camera calibration. In the real world, reliable point correspondences for calibrating a camera may not be available. Even when such correspondences exist, they can be too hard to see to calibrate a camera accurately.
Detecting players in such a setup proved very challenging because the objects of interest appeared very small in the camera views. Moreover, since the detections came from multiple cameras, algorithms had to be devised to associate them correctly across camera views. Tracking in this setting was even harder because we had to rely only on motion cues to track the objects of interest. Appearance features did not add any value because all the players being tracked wore jerseys of the same color. Each of these challenges became even harder to solve because of the real-time requirements of our use case. Finally, setting up the hardware architecture to receive the live feeds from each camera in a synchronized manner, and implementing a fast communication protocol for transmitting data between the various system components, required careful design choices, all of which are presented in detail in this thesis.

Year of completion:	November 2023
Advisor:	Vineet Gandhi