seconv: VobSub OCR + --time-codes-only for image-based subtitles by niksedk · Pull Request #11629 · SubtitleEdit/subtitleedit

niksedk · 2026-06-15T02:53:43Z

Addresses the request in #10068: a way to produce a timing-only output file from image-based subtitles without a full OCR.

`--time-codes-only`

Extracts time codes from image-based sources into any text format, skipping OCR entirely — each entry keeps its timing with empty text, and no OCR engine is created, so it works without Tesseract/Paddle/nOCR/etc. installed.

```
seconv movie.sup subrip --time-codes-only
```

```
1
00:00:01,000 --> 00:00:03,500

2
00:00:04,000 --> 00:00:06,200
```

The empty-text output re-opens cleanly in Subtitle Edit itself. This was verified against the actual SubRip and AdvancedSubStationAlpha parsers: both detect the format and reload all cues with timing intact. A few stricter third-party players may drop empty cues; switching the placeholder to e.g. `-` would be a one-line change if that's ever wanted.
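As a rough illustration of what a timing-only SubRip file looks like when built from cue timings alone, here is a minimal Python sketch. It is not the seconv implementation; `format_srt_time` and `timing_only_srt` are hypothetical names, and the cue timings are assumed to arrive as (start, end) pairs in milliseconds.

```python
from typing import Iterable, Tuple


def format_srt_time(ms: int) -> str:
    """Format a millisecond offset as an SRT timestamp (HH:MM:SS,mmm)."""
    hours, rem = divmod(ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    seconds, millis = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


def timing_only_srt(cues: Iterable[Tuple[int, int]]) -> str:
    """Build SRT text where every cue keeps its timing and has no text.

    Hypothetical helper: each block is the cue index plus the timing line,
    and blocks are separated by a blank line.
    """
    blocks = []
    for index, (start_ms, end_ms) in enumerate(cues, start=1):
        timing = f"{format_srt_time(start_ms)} --> {format_srt_time(end_ms)}"
        blocks.append(f"{index}\n{timing}\n")
    return "\n".join(blocks)


if __name__ == "__main__":
    # Same two cues as the sample output above.
    print(timing_only_srt([(1000, 3500), (4000, 6200)]), end="")
```

No OCR step appears anywhere in the sketch, which is the point of the flag: only the timing of each image cue is needed.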

VobSub wired into the text/OCR pipeline

VobSub previously errored with "use the Subtitle Edit UI for now". It's now supported in seconv for both full OCR and --time-codes-only, reusing the existing VobSub bitmap decoder:

.sub + .idx pairs (text target)
VobSub-in-MKV (S_VOBSUB)
VobSub-in-MP4 (handler subp)

```
seconv movie.sub subrip --ocr-engine:tesseract --ocr-language:eng   # .idx auto-detected
seconv movie.mkv subrip --time-codes-only                          # PGS + VobSub tracks, no OCR
```

`.sub` routing fix

A binary VobSub .sub with no .idx companion is now detected via its MPEG pack header (00 00 01 BA) and read directly, with a note emitted, rather than failing or being misparsed as MicroDVD. This works because VobSubParser.OpenSubIdx already falls back to the stream's own PTS timing with a default palette. A genuine text MicroDVD .sub still routes to the text loader. Without the .idx, colors use a default palette, so OCR accuracy may be slightly lower; timing is accurate and --time-codes-only is unaffected.
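The header check described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the code in this PR: `looks_like_binary_vobsub` and `route_idxless_sub` are hypothetical names, and the sketch assumes the caller has already read the first few bytes of a .sub file that has no .idx companion.

```python
# MPEG program-stream pack start code: 00 00 01 BA
MPEG_PACK_START_CODE = bytes([0x00, 0x00, 0x01, 0xBA])


def looks_like_binary_vobsub(head: bytes) -> bool:
    """Return True if the leading bytes are an MPEG pack header."""
    return head[:4] == MPEG_PACK_START_CODE


def route_idxless_sub(head: bytes) -> str:
    """Pick a loader for a .sub file that has no .idx next to it.

    Hypothetical routing: binary VobSub goes to the image pipeline,
    anything else (e.g. MicroDVD text) goes to the text loader.
    """
    return "vobsub" if looks_like_binary_vobsub(head) else "text"


if __name__ == "__main__":
    print(route_idxless_sub(bytes.fromhex("000001ba4400")))  # binary stream
    print(route_idxless_sub(b"{0}{25}Hello"))                # MicroDVD text
```

Checking a fixed four-byte signature is cheap and does not depend on the file extension, which is what lets the two kinds of .sub be told apart.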

Tests

TimeCodesOnlyTest — .sup → SRT with timing and no recognised text, no OCR engine needed.
ContainerLoaderTest — replaced the old (CI-skipped) "OCRs PGS and skips VobSub" test with a deterministic --time-codes-only test proving both the PGS and VobSub tracks in container_image.mkv now convert.
VobSubRoutingTest — binary-vs-text .sub detection, and a MicroDVD .sub (no .idx) still converting as text.

163 tests pass, 0 skipped.

Note (out of scope here)

While testing I found a pre-existing latent bug: with --overwrite, two same-language tracks in one container resolve to the same output filename and the second silently clobbers the first (the track-number disambiguation only runs when !Overwrite). It affects text tracks too and is now easier to hit since VobSub tracks are no longer skipped. Happy to fix in a follow-up — the clean fix is to track output paths written within a single run and disambiguate even under --overwrite.
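One way the follow-up fix could look, sketched in Python under stated assumptions: the set of paths written during the current run is kept in memory, and a counter suffix is used for disambiguation. `claim_output_path` is a hypothetical name, and the real fix may well reuse the existing track-number disambiguation instead of a counter.

```python
import os
from typing import Set


def claim_output_path(path: str, written: Set[str]) -> str:
    """Return a path not yet written in this run and record it.

    Hypothetical helper: files from earlier runs may still be
    overwritten, but two tracks in the same run never share a name.
    """
    if path not in written:
        written.add(path)
        return path
    stem, ext = os.path.splitext(path)
    counter = 2
    while True:
        candidate = f"{stem}.{counter}{ext}"
        if candidate not in written:
            written.add(candidate)
            return candidate
        counter += 1


if __name__ == "__main__":
    written: Set[str] = set()
    # Two same-language tracks from one container resolve to the same name.
    print(claim_output_path("movie.eng.srt", written))
    print(claim_output_path("movie.eng.srt", written))
```

A real implementation on a case-insensitive file system would probably need to normalise paths before comparing them; the sketch compares strings exactly.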

🤖 Generated with Claude Code

Add a --time-codes-only flag to seconv that extracts time codes from image-based subtitles into a text format without OCR: each entry keeps its timing with empty text and no OCR engine is created, so it works without Tesseract/Paddle/etc. installed. Verified that SE re-opens the resulting empty-text SRT/ASSA files (timing preserved).

Wire VobSub into the text/OCR pipeline (previously "use the UI"):
- .sub + .idx pairs (text target)
- VobSub-in-MKV (S_VOBSUB)
- VobSub-in-MP4 (handler subp)

Both full OCR and --time-codes-only are supported for all of these, reusing the existing VobSub bitmap decoder.

Fix .sub routing: a binary VobSub .sub with no .idx companion is now detected (MPEG pack header) and read directly (stream PTS timing + default palette, with a note) instead of falling through to the MicroDVD text loader; a genuine text MicroDVD .sub still routes to the text loader.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

niksedk merged commit 770c662 into main Jun 15, 2026
1 of 3 checks passed

niksedk deleted the feature/seconv-vobsub-ocr-and-timecodes-only branch June 15, 2026 02:54

niksedk merged 1 commit into main from feature/seconv-vobsub-ocr-and-timecodes-only
