Introducing: The History of China Auto-Translation Project



Context

The History of China (中国通史) Auto-Translation Project is an independent effort to translate the eponymous 100-part documentary on Chinese history.

The series is published by CCTV6, a Chinese movie channel, and covers the comprehensive history of Chinese civilization from the ancient era up until the 1911 Revolution. At 45 minutes per episode, it is currently the most ambitious Chinese historical documentary ever produced.

A screenshot from a History of China episode

Although the first episode includes English subtitles, the remaining 99 episodes are presented in Modern Standard Chinese only. As a result, only 45 of the series' 4,500 minutes are accessible to non-Chinese speakers.

The Project

Over the past several months, I built a machine learning-driven engine to automatically translate the History of China series. The goal is to make this series accessible to anyone who is interested in Chinese history.

Dashboard and video on https://hoc.yifanchen.io/

Behind the scenes, the translation system leverages computer vision to capture, parse, and translate Chinese subtitles to English. The underlying technical components include FFmpeg, OpenCV, and Tesseract.
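To give a feel for the "parse" step, here is a minimal sketch of one way per-frame OCR output could be collapsed into timed subtitle cues. It is an illustration only, not the project's actual code: the `Cue` class, the `frames_to_cues` function, and the fixed `frame_interval` sampling are all hypothetical, and the OCR text is assumed to have already been produced by the capture stage.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass
class Cue:
    """One subtitle line and the time span (in seconds) it stays on screen."""
    start: float
    end: float
    text: str


def frames_to_cues(
    frames: Iterable[Tuple[float, str]], frame_interval: float
) -> List[Cue]:
    """Merge consecutive sampled frames that show the same text into one cue.

    `frames` is an iterable of (timestamp_seconds, ocr_text) pairs in
    chronological order, sampled every `frame_interval` seconds.
    Both names are assumptions made for this sketch.
    """
    cues: List[Cue] = []
    tolerance = frame_interval / 2
    for timestamp, raw_text in frames:
        text = raw_text.strip()
        if not text:
            continue  # no subtitle detected in this frame
        last = cues[-1] if cues else None
        same_text = last is not None and last.text == text
        contiguous = last is not None and abs(timestamp - last.end) < tolerance
        if same_text and contiguous:
            # The same subtitle is still on screen: extend the current cue.
            last.end = timestamp + frame_interval
        else:
            # A new subtitle appeared (or the old one reappeared after a gap).
            cues.append(Cue(timestamp, timestamp + frame_interval, text))
    return cues


if __name__ == "__main__":
    sampled = [(0.0, "中国"), (0.5, "中国 "), (1.0, ""), (1.5, "历史")]
    for cue in frames_to_cues(sampled, frame_interval=0.5):
        print(cue)
```

Merging before translating means each distinct subtitle line would only need to be translated once, rather than once per sampled frame.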

The video platform features a catalogue of the series and a video player displaying the translated subtitles in real-time for each episode.
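As a rough illustration of how a player might pick the subtitle to show at a given playback time, the sketch below does a binary search over cue start times. The cue layout as `(start, end, text)` tuples and the `active_subtitle` function are assumptions made for this example, not a description of the real player.

```python
from bisect import bisect_right
from typing import List, Optional, Tuple

# Hypothetical cue layout: (start_seconds, end_seconds, translated_text),
# sorted by start time and non-overlapping.
CueTuple = Tuple[float, float, str]


def active_subtitle(cues: List[CueTuple], t: float) -> Optional[str]:
    """Return the text of the cue covering playback time `t`, or None."""
    starts = [cue[0] for cue in cues]
    # Index of the last cue whose start time is <= t.
    i = bisect_right(starts, t) - 1
    if i >= 0 and t < cues[i][1]:
        return cues[i][2]
    return None  # between cues, before the first, or after the last


if __name__ == "__main__":
    demo = [(0.0, 1.0, "first line"), (1.5, 2.0, "second line")]
    print(active_subtitle(demo, 0.5))
    print(active_subtitle(demo, 1.2))
```

In a real player the list of start times would be built once per episode rather than on every lookup; it is rebuilt here only to keep the sketch short.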

Today, the translations are comprehensible but far from perfect. There is still much work to be done in fine-tuning the auto-translation model.

Ongoing Efforts

Improving this project is an ongoing effort. From refining the auto-translation system to building new player features, there remain many ways to enhance the viewing experience.

Working on the project has been an exercise in problem solving, technical self-study, and user experience design. As the project continues, I plan to publish more posts that expand on the underlying technical details.

Website

History of China (中国通史) Auto-Translation Project: https://hoc.yifanchen.io/

Contact

If you have feedback or are interested in how it works, contact me at yifan@yifanchen.io.