-rw-r--r-- | captioning.md | 141 |
1 files changed, 64 insertions, 77 deletions
diff --git a/captioning.md b/captioning.md
index 895e732b..9a85080c 100644
--- a/captioning.md
+++ b/captioning.md
@@ -7,12 +7,12 @@ easier to understand and search.
 If you see a talk that you'd like to caption, feel free to download it
 and start working on it with your favourite subtitle editor. Let me
 know what you pick by e-mailing me at <sacha@sachachua.com> so that I
-can update the index and try to avoid duplication of work. [Find talks that need captions here](https://emacsconf.org/help_with_main_captions). You can also help by [adding chapter markers to Q&A sessions](https://emacsconf.org/help_with_chapter_markers).
+can update the backstage index and try to avoid duplication of work. [Find talks that need captions here](https://emacsconf.org/help_with_main_captions). You can also help by [adding chapter markers to Q&A sessions](https://emacsconf.org/help_with_chapter_markers).
 
 You're welcome to work with captions using your favourite tool. We've
 been using <https://github.com/sachac/subed> to caption things as VTT
 or SRT in Emacs, often starting with autogenerated captions from
-OpenAI Whisper (the .vtt).
+OpenAI Whisper or WhisperX (the .vtt file backstage).
 
 We'll be posting VTT files so that they can be included by the HTML5
 video player (demo: <https://emacsconf.org/2021/talks/news/>), so if
@@ -26,59 +26,13 @@ You can e-mail me the subtitles when you're done, and then I can merge it
 into the video.
 
 You might find it easier to start with the autogenerated captions
-and then refer to any resources provided by the speaker in order to
+and then refer to the video or any resources provided by the speaker in order to
 figure out spelling. Sometimes speakers provide pretty complete
 scripts, which is great, but they also tend to add extra words.
 
-# Reflowing the text
-
-First, let's start with reflowing. We like to have one line of
-captions about 60 characters long so that they'll display nicely in
-the stream. If the captions haven't been reflowed yet, you can reflow
-the captions at natural pausing points (ex: phrases) so that they're
-displayed nicely. You don't have to worry too much about getting the
-timestamps precisely.
-
-For example, instead of:
-
-- so i'm going to talk today about a
-- fun rewrite i did of uh of the bindat
-- package
-
-you can edit it to be more like:
-
-- So I'm going to talk today
-- about a fun rewrite I did
-- of the bindat package.
-
-You probably don't need to do this step if you're working with the VTT
-files in the backstage area, since we try to reflow things before
-people edit them, but we thought we'd demonstrate it in case people
-are curious.
-
-We start with the text file that OpenAI Whisper generates. We set my
-`fill-column` to 50 and use `display-fill-column-indicator-mode` to
-give myself a goal column. A little over is fine too. Then we use
-`emacsconf-reflow` from the
-[emacsconf-el](git.emacsconf.org/emacsconf-el/) repository to quickly
-split up the text into captions by looking for where we want to add
-newlines and then typing the word or words. We type in ' to join lines.
-Sometimes, if it splits at the wrong one, we just undo it and edit it
-normally.
-
-It took about 4 minutes to reflow John Wiegley's 5-minute presentation.
-
-<video src="https://media.emacsconf.org/reflowing.webm" controls=""></video>
-
-The next step is to align it with
-[aeneas](https://github.com/readbeyond/aeneas) to get the timestamps
-for each line of text. `subed-align` from the subed package helps with that.
-
-<video src="https://media.emacsconf.org/alignment.webm" controls=""></video>
-
 # Edit the VTT to fix misrecognized words
 
-The next step is to edit these subtitles. VTT files are plain text, so
+The first step is to edit misrecognized words. VTT files are plain text, so
 you can edit them with regular `text-mode` if you want to. If you're
 editing subtitles within Emacs,
 [subed](https://github.com/sachac/subed) can conveniently synchronize
@@ -88,7 +42,7 @@ filename, but if it can't find it, you can use `C-c C-v`
 (`subed-mpv-find-media`) to play a file or `C-c C-u` to play a URL.
 
 Look for misrecognized words and edit them. We also like to change
-things to follow Emacs keybinding conventions. We sometimes spell out
+things to follow Emacs keybinding conventions (C-c instead of Control C). We sometimes spell out
 acronyms on first use or add extra information in brackets. The
 captions will be used in a transcript as well, so you can add
 punctuation, remove filler words, and try to make it read better.
@@ -98,23 +52,58 @@ use `M-j` (`subed-jump-to-current-subtitle`) to jump to the caption if
 I'm not already on it, listen for the right spot, and maybe use
 `M-SPC` to toggle playback. Use `M-.` (`subed-split-subtitle`) to
 split a caption at the current MPV playing position and `M-m`
-(`subed-merge-with-next`) to merge a subtitle with the next one. Times
-don't need to be very precise. If you don't understand a word or
-phrase, add two question marks (`[??]`) and move on. We'll ask the
-speakers to review the subtitles and can sort that out then.
+(`subed-merge-with-next`) to merge a subtitle with the next one.
+
+If you don't understand a word or phrase, add two
+question marks (`[??]`) and move on. We'll ask the
+speakers to review the subtitles and can sort that
+out then.
+
+If there are multiple speakers, you can indicate switches between speakers
+with a `[speaker-name]:` tag, or just leave it plain.
 
-If there are multiple speakers, indicate switches between speakers
-with a `[speaker-name]:` tag.
 
 <video src="https://media.emacsconf.org/editing.webm" controls=""></video>
 
 Once you've gotten the hang of things, it might take between 1x to 4x
 the video time to edit captions.
 
+# Subtitle timing
+
+Times don't need to be very precise. If you notice
+that the times are way out of whack and it's
+getting in the way of your subtitling, we can
+adjust the times using the [aeneas forced
+alignment tool](https://www.readbeyond.it/aeneas/
+and `subed-align`).
+
+## Splitting and merging subtitles
+
+If you want to split and merge subtitles, you can
+use `M-.` (`subed-split-subtitle`) and `M-m`
+(`subed-merge-dwim`). If the playback position is
+in the current subtitle, splitting will use the
+playback position. If it isn't, it will guess an
+appropriate time based on characters per second
+for the current subtitle.
+
+## Splitting with word-level timing data
+
+If there is a `.json` or `.srv2` file with
+word-level timing data, you can load it with
+`subed-word-data-load-from-file` from
+`subed-word-data.el` in the subed package. You can
+then split with the usual `M-.`
+(`subed-split-subtitle`), and it should use
+word-level timestamps when available.
+
 # Playing your subtitles together with the video
 
-To load a specific subtitle file in MPV, use the `--sub-file=` or
-`--sub-files=` command-line argument.
+MPV should automatically load subtitle files if
+they're in the same directory as the video. To
+load a specific subtitle file in MPV, you can use
+the `--sub-file=` or `--sub-files=` command-line
+argument.
 
 If you're using subed, the video should autoplay if it's named the
 same as your subtitle file. If not, you can use `C-c C-v`
@@ -125,14 +114,6 @@ point with `C-c ,` (`subed-toggle-sync-player-to-point`), and
 synchronizing point to player with `C-c .`
 (`subed-toggle-sync-point-to-player`).
 
-# Using word-level timing data
-
-If there is a `.srv2` file with word-level timing data, you can load
-it with `subed-word-data-load-from-file` from `subed-word-data.el` in
-the subed package. You can then split with the usual `M-.`
-(`subed-split-subtitle`), and it should use word-level timestamps when
-available.
-
 # Starting from a script
 
 Some talks don't have autogenerated captions, or you may prefer to
@@ -148,29 +129,35 @@ If the speaker provided a script, I usually put the script under this heading.
 ```
 
 If you're using subed, you can move to the point to a good stopping
-point for a phrase, toggle playing with `M-SPC`, and then `M-.`
+point for a phrase, use `M-SPC` to toggle pausing `M-.`
 (`subed-split-subtitle`) when the player reaches that point. If it's
 too fast, use `M-j` to repeat the current subtitle.
 
 # Starting from scratch
 
-You can send us a text file with just the text transcript in it and
-not worry about the timestamps. We can figure out the timing using
+One option is to send us a text file with just the text transcript in it
+and not worry about the timestamps. We can figure out the timing using
 [aeneas for forced alignment](https://www.readbeyond.it/aeneas/).
 
-If you want to try timing as you go, you might find it easier to start
-by making a VTT file with one subtitle spanning the whole video, like
-this:
+If you want to try timing as you go, you might
+find it easier to start by making a VTT file with
+one subtitle spanning the whole video (either
+using the video duration or a very large
+duration), like this:
 
 ```text
 WEBVTT
 
-00:00:00.000 -> 00:39:07.000
+00:00:00.000 -> 24:00:00.000
 ```
 
-Then start playback and type, using `M-.` (`subed-split-subtitle`) to
-split after a reasonable length for a subtitle. If it's too fast, use
-`M-j` to repeat the current subtitle.
+Use `C-c C-p` (`subed-toggle-pause-while-typing`)
+to automatically pause when typing. Then start
+playback with `M-SPC` and type, using `M-.`
+(`subed-split-subtitle`) to split after a
+reasonable length for a subtitle. If it's too
+fast, use `M-j` to repeat the current subtitle or
+adjust `subed-mpv-plackback-speed`.
 
 # Chapter markers
 