author | Sacha Chua <sacha@sachachua.com> | 2022-12-10 08:41:07 -0500 |
---|---|---|
committer | Sacha Chua <sacha@sachachua.com> | 2022-12-10 08:41:07 -0500 |
commit | d922ee9190c1a614f946c51145e312838855b372 (patch) | |
tree | 8855c5eb5db5275b2405cc83d655359cafc14bfe | |
parent | eab1593e06b8aeb0b0466b227890b4b30aa67816 (diff) | |
update captioning tips
-rw-r--r-- | captioning.md | 120 |
1 files changed, 81 insertions, 39 deletions
diff --git a/captioning.md b/captioning.md
index d7d0fdab..e4310816 100644
--- a/captioning.md
+++ b/captioning.md
@@ -25,26 +25,19 @@ again, if you prefer a more concise format.
 You can e-mail me the subtitles when you're done, and then I can merge
 it into the video.
 
-# Formatting tips
-
 You might find it easier to start with the autogenerated captions and
 then refer to any resources provided by the speaker in order to figure
 out spelling. Sometimes speakers provide pretty complete scripts,
 which is great, but they also tend to add extra words.
 
-Emacs being Emacs, you can use some code (
-[example subed configuration](https://sachachua.com/dotemacs/#subed), see
-`my-subed-fix-common-error` and `my-subed-common-edits`) to help with
-capitalization and commonly misrecognized words.
+# Reflowing the text
 
-Please keep captions to one line each so that they can be displayed
-without wrapping, as we plan to broadcast by resizing the video and
-displaying open captions below. Maybe 60 characters max, but target
-around 50 or so? Since the captions are also displayed as text on the
-talk pages, you can omit filler words. If the captions haven't been
-split yet, you can split the captions at natural pausing points (ex:
-phrases) so that they're displayed nicely. You don't have to worry too
-much about getting the timestamps precisely.
+First, let's start with reflowing. We like to have one line of
+captions about 60 characters long so that they'll display nicely in
+the stream. If the captions haven't been reflowed yet, you can reflow
+the captions at natural pausing points (ex: phrases) so that they're
+displayed nicely. You don't have to worry too much about getting the
+timestamps precisely.
 
 For example, instead of:
 
@@ -58,16 +51,65 @@ you can edit it to be more like:
 - about a fun rewrite I did
 - of the bindat package.
 
-If you don't understand a word or phrase, add two question marks (??)
-and move on. We'll ask the speakers to review the subtitles and can
-sort that out then.
+You probably don't need to do this step if you're working with the VTT
+files in the backstage area, since we try to reflow things before
+people edit them, but we thought we'd demonstrate it in case people
+are curious.
+
+We start with the text file that OpenAI Whisper generates. We set my
+`fill-column` to 50 and use `display-fill-column-indicator-mode` to
+give myself a goal column. A little over is fine too. Then we use
+`emacsconf-reflow` from the
+[emacsconf-el](git.emacsconf.org/emacsconf-el/) repository to quickly
+split up the text into captions by looking for where we want to add
+newlines and then typing the word or words. We type in ' to join lines.
+Sometimes, if it splits at the wrong one, we just undo it and edit it
+normally.
+
+It took about 4 minutes to reflow John Wiegley's 5-minute presentation.
+
+<video src="https://media.emacsconf.org/reflowing.webm" controls=""></video>
+
+The next step is to align it with
+[aeneas](https://github.com/readbeyond/aeneas) to get the timestamps
+for each line of text. `subed-align` from the subed package helps with that.
+
+<video src="https://media.emacsconf.org/alignment.webm" controls=""></video>
+
+# Edit the VTT to fix misrecognized words
+
+The next step is to edit these subtitles. VTT files are plain text, so
+you can edit them with regular `text-mode` if you want to. If you're
+editing subtitles within Emacs,
+[subed](https://github.com/sachac/subed) can conveniently synchronize
+video playback with subtitle editing, which makes it easier to figure
+out technical words. subed tries to load the video based on the
+filename, but if it can't find it, you can use `C-c C-v`
+(`subed-mpv-find-media`) to play a file or `C-c C-u` to play a URL.
+
+Look for misrecognized words and edit them. We also like to change
+things to follow Emacs keybinding conventions. We sometimes spell out
+acronyms on first use or add extra information in brackets. The
+captions will be used in a transcript as well, so you can add
+punctuation, remove filler words, and try to make it read better.
+
+Sometimes you may want to tweak how the captions are split. You can
+use `M-j` (`subed-jump-to-current-subtitle`) to jump to the caption if
+I'm not already on it, listen for the right spot, and maybe use
+`M-SPC` to toggle playback. Use `M-.` (`subed-split-subtitle`) to
+split a caption at the current MPV playing position and `M-m`
+(`subed-merge-with-next`) to merge a subtitle with the next one. Times
+don't need to be very precise. If you don't understand a word or
+phrase, add two question marks (`[??]`) and move on. We'll ask the
+speakers to review the subtitles and can sort that out then.
 
 If there are multiple speakers, indicate switches between speakers
 with a `[speaker-name]:` tag.
 
-During questions and answers, please introduce the question with a
-`[question]:` tag. When the speaker answers, use a `[speaker-name]:`
-tag to make clear who is talking.
+<video src="https://media.emacsconf.org/editing.webm" controls=""></video>
+
+Once you've gotten the hang of things, it might take between 1x to 4x
+the video time to edit captions.
 
 # Playing your subtitles together with the video
 
@@ -76,24 +118,27 @@ To load a specific subtitle file in MPV, use the `--sub-file=` or
 
 If you're using subed, the video should autoplay if it's named the
 same as your subtitle file. If not, you can use `C-c C-v`
-(`subed-mpv-play-from-file`) to load the video file. You can toggle looping over the current subtitle with `C-c C-l` (`subed-toggle-loop-over-current-subtitle`), synchronizing player to point with `C-c ,` (`subed-toggle-sync-player-to-point`), and synchronizing point to player with `C-c .` (`subed-toggle-sync-point-to-player`).
+(`subed-mpv-play-from-file`) to load the video file. You can toggle
+looping over the current subtitle with `C-c C-l`
+(`subed-toggle-loop-over-current-subtitle`), synchronizing player to
+point with `C-c ,` (`subed-toggle-sync-player-to-point`), and
+synchronizing point to player with `C-c .`
+(`subed-toggle-sync-point-to-player`).
 
-# Editing autogenerated captions
+# Using word-level timing data
 
-If you want to take advantage of the autogenerated captions and the
-word-level timing data from YouTube or Torchaudio, you can start with the VTT file
-for the video you want, then use `my-caption-load-word-data` from
-<https://sachachua.com/dotemacs/#word-level> to load the srv2 file
-(also attached), and then use `my-caption-split` to split using the
-word timing data if possible. You can bind this to a keystroke with
-something like `M-x local-set-key M-' my-caption-split`.
+If there is a `.srv2` file with word-level timing data, you can load
+it with `subed-word-data-load-from-file` from `subed-word-data.el` in
+the subed package. You can then split with the usual `M-.`
+(`subed-split-subtitle`), and it should use word-level timestamps when
+available.
 
 # Starting from a script
 
-Some talks don't have autogenerated captions because YouTube didn't
-produce any. Whenever the speaker has provided a script, you can use
-that as a starting point. One way is to start by making a VTT file with
-one subtitle spanning the whole video, like this:
+Some talks don't have autogenerated captions, or you may prefer to
+start from scratch. Whenever the speaker has provided a script, you
+can use that as a starting point. One way is to start by making a VTT
+file with one subtitle spanning the whole video, like this:
 
 ```text
 WEBVTT
@@ -109,9 +154,6 @@ too fast, use `M-j` to repeat the current subtitle.
 
 # Starting from scratch
 
-Sometimes there are no autogenerated captions and there's no script,
-so we have to start from scratch.
-
 You can send us a text file with just the text transcript in it and
 not worry about the timestamps. We can figure out the timing using
 [aeneas for forced alignment](https://www.readbeyond.it/aeneas/).
@@ -158,10 +200,10 @@ https://git.emacsconf.org/emacsconf-el/tree/emacsconf-subed.el
 to create the chapter file.
 
 Alternatively, you can make chapter markers by making a copy of your
-WebVTT file and then using ~subed-merge-dwim~ (bound to ~M-m~ by
+WebVTT file and then using `subed-merge-dwim` (bound to `M-m` by
 default) on a region including the subtitles that you want to merge.
-You can also use ~subed-set-subtitle-text~ or
-~subed-merge-region-and-set-text~ - if you can think of good
+You can also use `subed-set-subtitle-text` or
+`subed-merge-region-and-set-text` - if you can think of good
 keybindings for those, please suggest them!
 
 Please let us know if you need any help!
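
The "about 60 characters" target for a caption line in the updated text could be checked mechanically before sending subtitles in. The following is a minimal sketch in Python, not part of the commit or of subed: the function name `long_captions`, the `limit` parameter, and the sample cue text are all hypothetical, and it assumes a simple WebVTT file where each cue is a timing line followed by one or more text lines.

```python
import re

# Matches a WebVTT cue timing line such as "00:00:01.000 --> 00:00:04.000".
# The hours field is optional, so "00:01.000 --> 00:04.000" also matches.
TIMING = re.compile(r"^(?:\d+:)?\d{2}:\d{2}\.\d{3} --> (?:\d+:)?\d{2}:\d{2}\.\d{3}")


def long_captions(vtt_text, limit=60):
    """Return (timing, text) pairs for caption lines longer than `limit`.

    `limit` defaults to 60, following the target mentioned in the tips;
    the name and signature are illustrative only.
    """
    results = []
    timing = None  # timing line of the cue currently being read, if any
    for line in vtt_text.splitlines():
        line = line.strip()
        if TIMING.match(line):
            timing = line
        elif not line:
            timing = None  # a blank line ends the current cue
        elif timing is not None and len(line) > limit:
            results.append((timing, line))
    return results


# Hypothetical sample: the first cue has not been reflowed, the second has.
sample = """WEBVTT

00:00:00.000 --> 00:00:03.000
this is a deliberately long caption line that has not been reflowed yet and runs on

00:00:03.000 --> 00:00:05.000
about a fun rewrite I did
"""

for timing, text in long_captions(sample):
    print(f"{timing}: {len(text)} characters")
```

Lines flagged this way are candidates for splitting at natural pausing points; since the tips say a little over the goal column is fine, the limit is best treated as a hint, not a hard rule.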