Enterprise Europe Network

German AI (Artificial Intelligence) startup offers next-generation video understanding AI to analyze videos at a new level and is searching for industry partners



Partner keyword: 
Artificial Intelligence (AI)
Databases, Database Management, Data Mining
Imaging, Image Processing, Pattern Recognition
Information Filtering, Semantics, Statistics
Description Image/Video Computing
Radio and TV broadcasting stations
Database and file management
Artificial intelligence related software
Movies, movie products and theatre operations
Machine vision software and systems
Computer programming activities
Computer consultancy activities
Other information technology and computer service activities
Data processing, hosting and related activities
Other professional, scientific and technical activities n.e.c.


A German Artificial Intelligence startup has developed an innovative AI for video understanding. The AI combines state-of-the-art Computer Vision and Natural Language Processing to analyze video on several levels and build a deep understanding of its content. The vision is to create a European standard General Purpose Technology for video understanding. The startup is looking for industry partners for commercial agreements with technical assistance, research cooperation agreements, or technical cooperation agreements.



In many industries, the main obstacle to creating value with AI is the lack of a suitable data foundation. Media companies, for example, produce large amounts of content in complex data formats (especially video) that is unstructured and not sufficiently documented with metadata. As a result, they often have no overview of their content and no transparency within their archives, and much of its potential remains unused. Current best practice is to have human archivists (up to 50) annotate videos, which is slow, expensive, and not scalable. Because of the overwhelming amount of content produced, they can annotate only a small fraction of their videos manually.

This is why the AI startup, based in the south of Germany, developed a novel AI-based analysis platform for video assets. It uses state-of-the-art video captioning techniques to create meaningful, semantically deep annotations that describe in full sentences what is actually happening in the videos. The platform follows a human-in-the-loop approach: the AI annotates the videos, and the human refines only those annotations where the AI's output was not good enough. Based on this human input, the AI continuously learns and improves, becoming specialized to each customer's individual needs.
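The human-in-the-loop idea described above can be illustrated with a minimal sketch. It is not the startup's implementation: the class names, the `confidence` field, and the 0.8 threshold are all hypothetical and chosen only for illustration. The sketch assumes the model reports a confidence score with each caption, accepts confident captions automatically, queues the rest for a human, and stores each human correction so it could later be used for retraining.

```python
from dataclasses import dataclass, field


@dataclass
class Annotation:
    video_id: str
    caption: str
    confidence: float  # hypothetical model score in [0, 1]


@dataclass
class ReviewQueue:
    threshold: float = 0.8  # hypothetical cut-off, to be tuned per customer
    accepted: list = field(default_factory=list)
    needs_review: list = field(default_factory=list)
    # (original annotation, human caption) pairs kept as future training data
    corrections: list = field(default_factory=list)

    def triage(self, annotation):
        """Accept confident captions; queue the rest for a human."""
        if annotation.confidence >= self.threshold:
            self.accepted.append(annotation)
        else:
            self.needs_review.append(annotation)

    def refine(self, video_id, human_caption):
        """Record a human correction and move the item to the accepted set."""
        for i, ann in enumerate(self.needs_review):
            if ann.video_id == video_id:
                self.needs_review.pop(i)
                self.corrections.append((ann, human_caption))
                self.accepted.append(Annotation(video_id, human_caption, 1.0))
                return True
        return False  # nothing queued under this video_id
```

In this sketch the human only ever sees the items in `needs_review`, and the `corrections` list is the signal a continual-learning step would consume. How the actual platform scores, routes, and retrains is not specified in this profile.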

Compared to other solutions, the Unique Selling Point (USP) is that the AI does not merely extract simple objects, faces, etc., but understands the multi-modal context of the video and summarizes it across multiple scenes. By taking multiple modalities (such as image and speech) into account, the AI can describe the actual gist of a video and, according to the startup, surpasses current state-of-the-art video captioning models. Furthermore, instead of trying to replace the human with an out-of-the-box AI, the platform integrates the human into an innovative, AI-supported workflow that makes annotation tasks more natural and lets the AI continuously learn and adapt toward human-level intelligence.

The startup is interested in research and technical cooperation agreements, as well as commercial agreements with technical assistance, with industry partners who want to apply, further develop, and leverage this technology for their industry-specific use cases. It is particularly important that the cooperation partner provides a relevant video dataset, ideally already annotated with relevant labels.

Advantages & innovations

Cooperation plus value: 
The solution has three innovative USPs:
1. Increased efficiency through semi-automation: The human-in-the-loop approach combines the best of both worlds: high-quality human assessment with scalable AI automation.
2. Improved quality through context understanding: The AI leverages both visual and audio information to understand the video content better than models that rely on a single modality.
3. Fast adaptivity through continual learning: The AI continuously adapts to individual needs, and the processes become more efficient over time.

This combination of USPs enables five distinct advantages:
1. Accelerated, scalable video tagging processes: The AI can accelerate the process to a ratio close to 1:1, resulting in a tagging process that is 10x faster. This frees up the human labelers' time so they can focus on more important work.
2. Transparency over the video archive: The AI can easily be scaled up and applied to the whole video archive, making use of the other 90% of created content and bringing a new level of transparency that allows any piece of content to be searched effectively.
3. Cost reduction for content creation: With a well-annotated video archive, content creators can easily find the right scene or clip to re-use and enhance content creation. They can also create new products and services, such as automatic video descriptions for visually impaired people.
4. Sustainable and increasing AI capabilities: The AI improves on a daily basis by continuously observing and learning from its users. This is crucial in the fast-paced world of content creation and keeps raising the number of videos that can be handled.
5. Network effect between organizations: The AI can share learnings between different users, departments, and even whole organizations to create synergies from everyone's combined knowledge, without sharing their actual content.

Stage of development

Cooperation stage dev stage: 
Available for demonstration

Partner sought

Cooperation area: 
The vision is to create a GPT (General Purpose Technology) and set the European standard for general video understanding, similar to what GPT-3 is doing for Natural Language Processing. To achieve this, the startup is currently looking for collaboration partners from industry who are interested in solving their video management, analytics, and tagging challenges through Artificial Intelligence. Industries and application fields can be diverse, for example:
- Film and television, for structured archiving
- Social media, for content moderation
- News, for accelerated post-production
- Sports, for statistical analysis
- Production, for quality assurance
- Market research, for video interviews
- Security, for surveillance cameras
- Any industry use case that handles a lot of video content

For a future collaboration, a research or technical cooperation agreement is foreseen. A commercial agreement with technical assistance is also possible.

Type and size

Cooperation task: 
>500 MNE,251-500