Seeking Advice: Training a Novel Review Assistant Model for Web Fiction

SmartBotCat · February 26, 2025, 7:05am

Hello Hugging-face community,

I work as an editor for web novels and I’m looking to developing an AI assistant to help with my daily review tasks. I hope to get some advice on model selection and training approaches.

My Use Case

As an editor, I need to review new chapter updates to ensure they:

Have no grammatical errors
Comply with regulations and laws
Don’t promote inappropriate values or content.

I have a substantial dataset of novel passages with my review annotations and suggested modifications from my previous work.

My Plan

I’m considering fine-tuning a base model through Supervised Fine-Tuning (SFT) to create a specialized model for web novel content review that can help improve my workflow efficiency.

Questions

Is this approach feasible given my use case?
Which base models would you recommend for Chinese text review tasks?
Any suggestions on training methodology or alternative approaches?
How much data would I likely need for effective fine-tuning?
Any particular challenges I should be aware of when training for content review tasks?

I’d greatly appreciate any insights or recommendations from the community!

Thank you!

John6666 · February 26, 2025, 3:40pm

I don’t know anything about the training itself, so I’ll leave it to someone else.

Is this approach feasible given my use case?

I think so. In cases where the performance of the execution environment is extremely limited, it may be necessary to save resources by inserting lightweight normal programs or text classification models, etc. However, if it is possible to do with just one adjusted LLM and a simple agent, I think that is more reliable and easier.

Which base models would you recommend for Chinese text review tasks?

You can get a relatively accurate idea of performance for various tasks by looking at benchmarks such as the leaderboard, but to put it crudely, I recommend the Qwen 2.5 series. The basic education is really good, and it listens to commands relatively well.
DeepSeek is also good, of course, but the Qwen series has been trained by users on Hugging Face for a long time, so there are various variations on the hub, and there must be educational know-how on the internet. Perhaps there is a lot of information, especially if you search in Chinese. I can’t read it without translation…
I think it would be helpful to look for the Qwen that has been trained by everyone on the hub.

Topic		Replies	Views
Guidance on getting started with fine tuned uncensored model Beginners	2	1066	March 8, 2025
Seeking Advice: Developing an Open-Source AI Model for Semantic Analysis and Grading of Textual Responses Beginners	0	452	November 14, 2023
Total beginner on how to use a model exactly Beginners	0	437	July 25, 2023
Help with autotrain/LLM finetuning please Beginners	3	2141	August 11, 2023
Fine-Tuning Help for Personal Project Beginners	1	64	March 28, 2025

Seeking Advice: Training a Novel Review Assistant Model for Web Fiction

My Use Case

My Plan

Questions

Related topics