Seeking Advice: Training a TikTok Video Quality Assessment Model Inspired by DeepSeek-R1

Background

I’m a content creator who produces short videos. To improve my craft, I regularly study high-quality videos to learn techniques and gain inspiration. The short video industry is extremely competitive, and I’m looking to use AI technology to enhance my creative efficiency and video quality.

My Goals

  1. Quality Assessment: I need an AI model that can identify high-quality TikTok videos and explain what makes them excellent (innovative filming techniques, interesting narrative structures, high visual quality, etc.)

  2. Content Potential Recognition: I want the model to automatically identify high-potential content elements within videos, helping me quickly filter out the most valuable creative materials.

Current Resources

I have watched and manually annotated numerous videos, marking the reasons why certain videos are considered high quality.
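For reference, here is a minimal sketch of how such annotations could be stored as JSONL so they can later be streamed as training data. All field names and values are hypothetical placeholders, not a fixed schema:

```python
import json

# Hypothetical annotation record: field names are illustrative only.
annotation = {
    "video_id": "7301234567890123456",           # made-up TikTok video identifier
    "is_high_quality": True,                      # overall manual judgement
    "reasons": [
        "innovative match-cut transitions",
        "strong emotional hook in the first 3 seconds",
    ],
    "dimensions": {                               # optional per-dimension notes
        "narrative_structure": "three-act arc compressed into 30 s",
        "visual_quality": "stable, well-lit footage",
    },
}

# One JSON object per line (JSONL) so the file can be read incrementally.
with open("annotations.jsonl", "a", encoding="utf-8") as f:
    f.write(json.dumps(annotation, ensure_ascii=False) + "\n")
```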

Proposed Approach

I’m interested in developing a model inspired by DeepSeek-R1’s reasoning capabilities to evaluate TikTok videos. This model would need reflective and reasoning abilities, since video quality standards aren’t strictly quantifiable. It should provide multi-dimensional evaluations covering aspects such as the following (a sketch of one possible rubric appears after this list):

  • Content themes

  • Visual effects

  • Narrative structure

  • Audience engagement techniques

  • Emotional resonance

  • And more
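To make the idea concrete, here is a minimal sketch of a structured rubric the model could be asked to fill in. The dimension names mirror the list above; the 1–5 scoring scale and the dataclass layout are my own assumptions:

```python
from dataclasses import dataclass, field, asdict
from typing import Dict

# Dimensions taken from the list above; the 1-5 scale is an assumption.
DIMENSIONS = [
    "content_theme",
    "visual_effects",
    "narrative_structure",
    "audience_engagement",
    "emotional_resonance",
]

@dataclass
class VideoEvaluation:
    video_id: str
    scores: Dict[str, int] = field(default_factory=dict)      # dimension -> 1-5 score
    rationales: Dict[str, str] = field(default_factory=dict)  # dimension -> reasoning text
    overall_verdict: str = ""                                  # free-form summary

# Example of the structure an R1-style model could be prompted to return as JSON.
example = VideoEvaluation(
    video_id="example_001",
    scores={d: 3 for d in DIMENSIONS},
    rationales={d: "placeholder reasoning" for d in DIMENSIONS},
    overall_verdict="solid but not exceptional",
)
print(asdict(example))
```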

Questions

  1. What would be the most effective architecture for such a model?

  2. How should I structure my training data?

  3. What evaluation metrics would be appropriate?

  4. Are there existing models I could fine-tune rather than build from scratch?

  5. What technical challenges should I anticipate?

  6. How much labeled data would I need for reasonable performance?

I appreciate any insights, suggestions, or references to similar projects. Thank you!


What would be the most effective architecture for such a model?

You can use an LLM such as DeepSeek-R1 for the analysis itself, but you will first need to convert the video and audio information into text. For the audio track, you can use an ASR model such as Whisper. There are many examples that process YouTube videos this way, so those should be helpful references.
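As a rough sketch of that ASR step, the snippet below extracts the audio track with ffmpeg and transcribes it with a Whisper checkpoint via the transformers pipeline. The file names and the choice of `openai/whisper-small` are assumptions, and ffmpeg must be installed:

```python
# Sketch: transcribe a short clip's audio with Whisper via the transformers pipeline.
# Assumes ffmpeg is installed and the clip has already been downloaded to video.mp4.
import subprocess
from transformers import pipeline

# 1. Extract the audio track as 16 kHz mono WAV with ffmpeg.
subprocess.run(
    ["ffmpeg", "-y", "-i", "video.mp4", "-ac", "1", "-ar", "16000", "audio.wav"],
    check=True,
)

# 2. Run automatic speech recognition with a Whisper checkpoint.
asr = pipeline("automatic-speech-recognition", model="openai/whisper-small")
result = asr("audio.wav", return_timestamps=True)  # timestamps help align text with scenes
print(result["text"])
```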

For turning the visual footage into text, you can use a vision-language model (VLM) that supports video input. A VLM can generate descriptions on its own to a certain extent, but models that handle video directly are still relatively rare, so I’m not very familiar with them either.
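Because video-native VLMs are still uneven, one common workaround (not the only option) is to sample frames and caption each one with an image VLM, producing a rough textual storyboard. The sketch below uses OpenCV for frame sampling and the BLIP captioning checkpoint as an example; both choices are assumptions you could swap for a proper video VLM:

```python
# Sketch: describe a video by sampling frames and captioning each with an image VLM.
# BLIP is used only as an example checkpoint; a video-native VLM could replace this step.
import cv2
from PIL import Image
from transformers import pipeline

captioner = pipeline("image-to-text", model="Salesforce/blip-image-captioning-base")

def sample_frames(path: str, every_n_seconds: float = 2.0):
    """Yield PIL images sampled every few seconds from the video."""
    cap = cv2.VideoCapture(path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 30.0
    step = max(1, int(fps * every_n_seconds))
    index = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if index % step == 0:
            yield Image.fromarray(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        index += 1
    cap.release()

captions = [captioner(frame)[0]["generated_text"] for frame in sample_frames("video.mp4")]
print("\n".join(captions))  # a rough textual storyboard of the clip
```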

In any case, I recommend exploring Hugging Face Spaces to collect models you can reuse and to find ready-made components first, then build, adjust, and assemble the missing parts yourself.
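Once you have a transcript and frame descriptions, assembling them into a single evaluation prompt for a reasoning model is straightforward. The sketch below uses an OpenAI-compatible client; the endpoint URL and the `deepseek-reasoner` model name are assumptions you should verify against the provider’s current documentation:

```python
# Sketch: combine the transcript and frame captions into one evaluation prompt
# and send it to a reasoning model. Endpoint and model name are assumptions;
# check the provider's current documentation before using.
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",               # placeholder
    base_url="https://api.deepseek.com",  # assumed OpenAI-compatible endpoint
)

transcript = "..."        # output of the ASR step
frame_captions = ["..."]  # output of the frame-captioning step

prompt = (
    "You are a short-video quality reviewer. Evaluate the TikTok video described below "
    "on content theme, visual effects, narrative structure, audience engagement, and "
    "emotional resonance. Give a 1-5 score and a short rationale for each dimension.\n\n"
    f"Transcript:\n{transcript}\n\nFrame descriptions:\n" + "\n".join(frame_captions)
)

response = client.chat.completions.create(
    model="deepseek-reasoner",  # assumed model name for DeepSeek-R1
    messages=[{"role": "user", "content": prompt}],
)
print(response.choices[0].message.content)
```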