In the recent past I have been doing some work related to automatic video annotation. Videos that you and I take can be annotated with data about their contents. Contents here can mean: objects, their types, their shapes, the background scene (moving or static), the number of objects, which objects are static and which are in motion, the color of objects, etc. One would also like to keep track of objects as the video progresses. Tracking helps in knowing when an object appeared in the scene and when it disappeared. None of the prior work on automatic video annotation is completely automatic. The methods are semi-automatic at best, and manual input and control are still required when annotating with them.
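To make the idea of an annotation concrete, here is a minimal sketch of what a per-object record might look like and how appearance and disappearance times could be derived from tracking output. Everything in it is an assumption made for illustration: the names `ObjectTrack` and `build_tracks` are hypothetical, and it presumes that each detection already carries a stable track ID, which is exactly the part that a tracker (or a human) has to supply.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional, Tuple


@dataclass
class ObjectTrack:
    """One annotated object (hypothetical record layout)."""
    track_id: int
    label: str                    # object type, e.g. "car" or "person"
    first_frame: int              # frame where the object first appears
    last_frame: int               # frame where it was last seen
    color: Optional[str] = None   # optional attribute, left unset here


def build_tracks(
    detections: Iterable[Tuple[int, int, str]]
) -> Dict[int, ObjectTrack]:
    """Collapse per-frame detections into one record per object.

    Each detection is (frame_index, track_id, label). The track IDs are
    assumed to be correct; producing them reliably is the hard problem.
    """
    tracks: Dict[int, ObjectTrack] = {}
    for frame, track_id, label in detections:
        track = tracks.get(track_id)
        if track is None:
            tracks[track_id] = ObjectTrack(track_id, label, frame, frame)
        else:
            # Widen the interval; detections may arrive out of order.
            track.first_frame = min(track.first_frame, frame)
            track.last_frame = max(track.last_frame, frame)
    return tracks


if __name__ == "__main__":
    sample = [(0, 1, "car"), (1, 1, "car"), (1, 2, "person"),
              (5, 1, "car"), (3, 2, "person")]
    for t in build_tracks(sample).values():
        print(t.track_id, t.label, t.first_frame, t.last_frame)
```

The bookkeeping is trivial once the track IDs exist. The difficulty lies in producing those IDs and labels without a human in the loop.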
While doing this work, I developed a better understanding of some of the so-called “automatic object tracking for surveillance” solutions on the market. None of these solutions can ensure a completely hands-off scenario. Humans still need to be involved, and there are reasons for that. At the same time, it is possible to do everything in the cloud (including the human interaction) and claim it as “hands off for a user”. In that case, the client is simply paying someone else to provide the service. It is not a stand-alone, autopilot kind of system installed on a user’s premises. Real automatic video annotation is extremely hard, especially when the scene can change without any guarantees. If we add “video analytics”, i.e. the ability to analyse the video automatically to detect a certain set of activities, it again becomes very difficult to propose a general solution. So assumptions are made once more, and these can be based on user requirements or can be domain specific (say, tennis video analytics at Wimbledon). Some systems that may be of interest to you: IBM’s Digital Video Surveillance Service and a few others described in the paper titled “Automated visual surveillance in realistic scenarios”.
Most research work makes certain assumptions, either about the scenes or about the methods used. These assumptions often fail in real-world scenarios. The methods may work under a “restricted real world view” built from a set of assumptions, but when those assumptions fail, the methods become limited in applicability.
I believe this is a critical issue that many researchers who want to translate their work into usable products have to understand. This is where strong theoretical and practical foundations in a discipline are both needed: theory gives the methods and the tools, engineering tells you what can and cannot be done, and the two can interact back and forth.