Companies are now building products that apply real-time computer vision and machine learning to video from various systems and processes. This requires a full-stack system covering video ingestion and storage, live inferencing, post-processing of the generated data, and use of that data in the business context or the customer’s domain language. We present a preferred architecture and stack for building such a system, along with the challenges and the design choices involved. The talk should have elements useful to anyone building video analytics-based systems.
We’ll talk about some possible architectures for building a real-time video analytics inferencing system. We’ll cover:
- The problem and its use cases, i.e. object or action detection and recognition, and what makes this problem different and more challenging.
- Various high-level industry approaches to these problems, from 3D convolutions, two-stream spatio-temporal approaches, and proposal generation to recent transformer-based approaches.
- Video infrastructure: preferred protocol choices for video ingestion, storage formats, libraries for processing streaming video, and streaming architectures for running inference on it.
- Translating the model output into the customer's product context, modeling the customer domain with common frameworks, and converting the data into insights.
- Some of the scalability and implementation challenges of this stack: identifying bottlenecks, prioritizing them, and designing options to solve for scalability.
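To make the stages listed above concrete, here is a minimal sketch of how sampled frames might flow through batching, inference, and filtering into domain-level events. It is an illustration only, not the architecture presented in the talk: every name (`Detection`, `sample_frames`, `batched`, `run_pipeline`, `fake_model`) and every parameter value is hypothetical, frames are represented as plain integers, and the model is a stand-in function rather than a real detector.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, List, Tuple


@dataclass
class Detection:
    """A domain-level event derived from raw model output (hypothetical schema)."""
    frame_index: int
    label: str
    score: float


def sample_frames(frames: Iterable[int], stride: int) -> Iterator[Tuple[int, int]]:
    """Keep every `stride`-th frame to reduce inference load."""
    for index, frame in enumerate(frames):
        if index % stride == 0:
            yield index, frame


def batched(items: Iterable[Tuple[int, int]], batch_size: int) -> Iterator[List[Tuple[int, int]]]:
    """Group sampled frames into batches; the last batch may be smaller."""
    batch: List[Tuple[int, int]] = []
    for item in items:
        batch.append(item)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch


def run_pipeline(
    frames: Iterable[int],
    model: Callable[[List[int]], List[Tuple[str, float]]],
    stride: int = 2,
    batch_size: int = 4,
    min_score: float = 0.5,
) -> List[Detection]:
    """Sample, batch, infer, and keep only confident predictions as events."""
    events: List[Detection] = []
    for batch in batched(sample_frames(frames, stride), batch_size):
        indices = [index for index, _ in batch]
        outputs = model([frame for _, frame in batch])
        for index, (label, score) in zip(indices, outputs):
            if score >= min_score:
                events.append(Detection(index, label, score))
    return events


def fake_model(batch: List[int]) -> List[Tuple[str, float]]:
    """Stand-in for a real detector: flags frames whose value is divisible by 4."""
    return [("person", 0.9) if frame % 4 == 0 else ("background", 0.2) for frame in batch]


events = run_pipeline(range(10), fake_model)
```

In a real system the frame source would be a decoded video stream and the model a trained network; the sampling stride, batch size, and score threshold are the kind of knobs that trade latency against throughput and accuracy.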
The talk draws on some of the speaker’s prior experience from building scalable systems at Google, from building B2B applications, and from Drishti.