Land of the Rising Subs

by Matt Johnson  (ech0plex88@protonmail.com)

One day in the late 1980s, when I was a child, my father brought home a couple of movies from our small Minnesota town's rental store.  This was a period in my youth when I was absolutely obsessed with dinosaurs, and the movies reflected that.  They were Gammera the Invincible (1966)¹ and Legend of the Dinosaurs & Monster Birds (1977).²

Along with the films, he unknowingly brought home a strong introduction to Japanese kaiju cinema, and I was immediately hooked.  Giant monsters, loud noises, explosions: it was everything I'd wanted.  So what if their words didn't exactly fit their mouths...  Oh yeah, what was that about?

It was a short leap from these to my first Godzilla movie, Godzilla vs. Megalon (1973)³ in 1989.  By then, I understood that they were dubbed in English, but that didn't affect my appreciation of them.  This fascination with Japanese science fiction has continued to the present day.  While I still enjoy dubbed films, I have a greater appreciation for subtitles, and how much they preserve of the original audio tracks.

Once YouTube went online and the video-sharing community went global, it was easier to find Japanese-only video clips that were previously unavailable.  Despite no English language option, they were still fun to watch, and were mainly behind-the-scenes footage and bonus scenes.  The technology for self-generated subtitles wasn't there for me, so I'd note or collect these clips, and save them for later.  Discovering many "fan sub" communities working to make obscure media available for everyone, I decided to try my hand at contributions.

The first subtitle project I attempted was an edit of an existing file series.  In 2014, while stationed in Japan, I discovered Future War 198X.⁴

This 1982 animated techno-thriller features detailed Cold War combat between the U.S. and U.S.S.R. and challenged the nuclear taboo by depicting graphic H-bomb destruction.  The film was only released in Australia and Europe, with the Australian version dubbed into English as a summarized narration 35 minutes shorter than the original.  On YouTube, I found someone who was translating the film in 10-minute segments.  Taking the SRT subtitle files, I attempted to merge them and clarify or correct the inaccurate military details.  The original uploader stopped translating without completing the film.  It would be several years before I saw the full English-subtitled version.⁵

Years later, I watched the Sakyo Komatsu SF film Sayonara Jupiter⁶ and noticed several errors in the subtitles.  Poor grammar, incorrect timing, and missing lines - I took it as an opportunity to practice editing further.  Incidentally, Komatsu's writing was adapted into other films such as Japan Sinks (1973)⁷ and Virus (1980);⁸ all are worth watching.

On September 21, 2022, the OpenAI research organization announced the Whisper machine learning model.  Described as "an automatic speech recognition system, trained on 680,000 hours of multilingual and multitask supervised data," it is essentially speech-to-text with a translation function.⁹

Since release, it's progressed through several upgrades.  Version 2 was released on December 8, 2022.  Version 3 was released on November 6, 2023.¹⁰

Whisper offers five model sizes: Tiny, Base, Small, Medium, Large.  Accuracy increases with model size, at the cost of slower transcription and translation.  However, this speed depends on hardware capabilities and on other tools to be discussed later.¹¹

Whisper's GitHub page describes an installation process via command line on Linux, Windows, and macOS.  This is how I did it on Debian:
$ sudo apt install python3 python3-full ffmpeg pipx
$ pipx install openai-whisper
$ pipx ensurepath
I did it this way to ensure I didn't miss any Python components, but YMMV. The FFmpeg package is required for handling various media formats, and pipx lets you install and run end-user applications written in Python.

The FFmpeg tool is also useful for splitting an audio track into segments for easier translation. I've found that five-minute segments are quickly processed with high accuracy. To split a single audio file into five-minute segments on Debian, I opened a terminal in the audio file folder and ran this command (the %03d in the output name gives each segment its own number):
$ ffmpeg -i SAMPLE.mp3 -f segment -segment_time 300 -c copy output_audio_file_%03d.mp3
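
Because each segment is translated separately, its subtitles start again at zero, so the timestamps need shifting before the pieces are merged. The following is a minimal sketch, not part of the original workflow, that prints the start time of a given segment as an SRT timestamp. The function name segment_offset is hypothetical, and it assumes every segment is exactly 300 seconds long; with -c copy the real cut points can drift slightly, so treat the result as a starting point.

```shell
#!/bin/sh
# Assumption: every segment is exactly this many seconds long (five minutes).
SEGMENT_SECONDS=300

# segment_offset N: print the start time of 0-based segment N in SRT format.
segment_offset() {
    # Strip leading zeros so "008" is not misread as an octal number.
    n=${1#"${1%%[!0]*}"}
    n=${n:-0}
    total=$(( n * SEGMENT_SECONDS ))
    printf '%02d:%02d:%02d,000\n' \
        $(( total / 3600 )) $(( total % 3600 / 60 )) $(( total % 60 ))
}

segment_offset 0     # prints 00:00:00,000
segment_offset 3     # prints 00:15:00,000
segment_offset 013   # prints 01:05:00,000
```

The printed value is the amount to add to every timestamp in that segment's subtitle file, for example with a subtitle editor's shift function.
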

With all that said, I believe there is an easier way to install and use Whisper: the Whisper-Faster tool on GitHub. Created by Purfview, it is described as "Whisper & Faster-Whisper standalone executables for those who don't want to bother with Python." In practice, it's excellent. I've processed audio samples in easily half the time of baseline Whisper, but again, YMMV.¹²

Simply download the latest release, extract it, and drop your audio file in the same folder. I run the following command in Debian to process Japanese audio:
$ ./whisper-faster --task translate --language ja --model large-v2 SAMPLE.mp3
This command will download the Large-v2 model, which is 2.9 GB.  Although Large-v3 is released, I've had accuracy issues with it, while enjoying great success with the prior version.
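
When the audio has been split into segments, the same command can be repeated for each file. This is a minimal sketch under stated assumptions: translate_segments and the DRY_RUN variable are hypothetical names, the flags are copied from the command above, and the whisper-faster executable is assumed to sit in the current folder.

```shell
#!/bin/sh
# translate_segments FILE...: run the translation command on each file.
# Set DRY_RUN=1 to print the commands instead of running them, which is
# useful for checking the file list before a long job.
translate_segments() {
    for f in "$@"; do
        # Skip names that do not exist (for example, an unmatched glob).
        [ -e "$f" ] || continue
        if [ "${DRY_RUN:-0}" = 1 ]; then
            echo ./whisper-faster --task translate --language ja --model large-v2 "$f"
        else
            ./whisper-faster --task translate --language ja --model large-v2 "$f"
        fi
    done
}
```

A possible call, assuming the segments were written with names beginning with output_audio_file, would be: translate_segments output_audio_file*.mp3
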

For all its usefulness, Whisper is imperfect.  Improved accuracy requires a larger model, and timestamps aren't always correct.  Aside from timing errors, the tool may "lose track" of what it hears and either repeat a line of dialogue several times or skip it entirely.  The language itself is challenging: slang and obscure cultural references all require a thorough review.  Several tools are available for editing, and they greatly increase subtitle quality.

VLC:  The famous media player is perfect for taking screenshots of on-screen text.  Documentaries have a lot of this.  After building a collection of screenshots, use Google Translate's Image function to complete the translation.¹³

Avidemux:  Great for simple video editing, but it excels in extracting audio tracks for processing.  While I review a video, I mark incorrect or missing translations for re-scanning.  Then, I can save a segment of audio and re-run it through Whisper quickly.¹⁴

Subtitle Composer:  The ultimate tool.  Before I found this, my early experiments involved estimating timestamps and manually merging and shifting disconnected files.  Definitely time-consuming and tedious.  Subtitle Composer opens the video and SRT file, giving you options to edit text and timing.  You can merge or separate lines, and watch a representation of the sound file to better match timing.  This tool has greatly improved both the time to complete a subtitle project, and my motivation in working through them.¹⁵
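
Whisper's habit of repeating a line, mentioned earlier, can be spotted with a quick check before opening the SRT file in an editor. The sketch below is an illustration rather than part of the original workflow: find_repeats is a hypothetical name, and it assumes a standard SRT layout (number, timing line, text, blank line) with Unix line endings.

```shell
#!/bin/sh
# find_repeats FILE.srt: print the timing line and text of every cue whose
# text is identical to the cue immediately before it.
find_repeats() {
    awk '
        # Paragraph mode: each blank-line-separated cue is one record,
        # and each line of the cue is one field.
        BEGIN { RS = ""; FS = "\n" }
        {
            # Field 1 is the cue number, field 2 the timing line,
            # and fields 3 onward are the subtitle text.
            text = $3
            for (i = 4; i <= NF; i++) text = text " " $i
            if (NR > 1 && text == prev) print $2 "  " text
            prev = text
        }
    ' "$1"
}
```

Running find_repeats on a generated subtitle file lists the cues worth a second look; a genuine repeated line of dialogue will show up too, so each hit still needs a human check.
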

Since March 2023, I have completed 27 projects, either creating subtitles for untranslated films, or cleaning up existing translations.

Most have been "making of" documentaries for Godzilla, while others are obscure films such as Tetsurō Tamba's Spirit World trilogy.  I am particularly proud of creating subtitles for the 1987 film Tokyo Blackout,¹⁶ also based on a Sakyo Komatsu novel.  It was a challenge for years, because I'd enjoyed many of his adaptations, but that one had never been translated.

I frequently post these projects on Reddit, and they are hosted on archive.org.¹⁷

Great projects are collaborative, so I always encourage suggestions and corrections to everything I post.  AI is useful but imperfect, and I don't speak Japanese, so at the very least I can give these projects a good head start and hand them off to the enthusiasm of other, more skilled fans.

¹ Gamera, the Giant Monster

² Legend of Dinosaurs & Monster Birds

³ Godzilla vs. Megalon

⁴ Future War 198X

⁵ Future War 198X - View Online

⁶ Bye-Bye Jupiter

⁷ Japan Sinks

⁸ Virus

⁹ Introducing Whisper

¹⁰ Whisper

¹¹ Whisper GitHub

¹² Whisper Standalone for Windows

¹³ VLC

¹⁴ Avidemux

¹⁵ Subtitle Composer

¹⁶ Tokyo Blackout - View Online

¹⁷ Obscure Archives: Matt Johnson's Japanese-to-English translation movie collection.
