[[!meta title="Captioning tips"]]
[[!meta copyright="Copyright &copy; 2021, 2022 Sacha Chua"]]

Captions are great for making videos (especially technical ones!)
easier to understand and search.

If you see a talk that you'd like to caption, feel free to download it
and start working on it with your favourite subtitle editor. Let me
know what you pick by e-mailing me at <sacha@sachachua.com> so that I
can update the index and try to avoid duplication of work. [Find talks that need captions here](https://emacsconf.org/help_with_main_captions).

We've been using <https://github.com/sachac/subed> to caption things
as VTT or SRT in Emacs, often starting with autogenerated captions
from OpenAI Whisper (the .vtt). You're welcome to work with captions
using your favourite tool.

We'll be posting VTT files so that they can be included by the HTML5
video player (demo: <https://emacsconf.org/2021/talks/news/>), so if
you use a different tool that produces another format, any format that
can be converted into that one (like SRT or ASS) is fine. The latest
version of `subed` has a `subed-convert` command that might be useful
for turning WebVTT files into tab-separated values (TSV) and back
again, if you prefer a more concise format.
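If you'd rather convert outside Emacs, here is a minimal sketch of an SRT-to-WebVTT conversion in Python. It is not how `subed-convert` works; the function name `srt_to_vtt` is made up for this example, and it assumes a plain SRT file with no styling tags.

```python
import re


def srt_to_vtt(srt: str) -> str:
    """Convert SRT text to WebVTT (hypothetical helper, not part of subed)."""
    # Normalize Windows line endings and trim surrounding blank lines.
    body = srt.replace("\r\n", "\n").strip()
    # SRT writes milliseconds after a comma (00:00:01,000);
    # WebVTT uses a period instead (00:00:01.000).
    body = re.sub(r"(\d{2}:\d{2}:\d{2}),(\d{3})", r"\1.\2", body)
    # A WebVTT file starts with the WEBVTT header followed by a blank line.
    # The SRT cue numbers can stay: WebVTT treats them as cue identifiers.
    return "WEBVTT\n\n" + body + "\n"


if __name__ == "__main__":
    sample = "1\n00:00:01,000 --> 00:00:03,500\nHello, world\n"
    print(srt_to_vtt(sample))
```

Formats with richer styling (like ASS) need a real converter, so treat this as an illustration of how close SRT and VTT are rather than a general tool.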

You can e-mail me the subtitles when you're done, and then I can merge
them into the video.

You might find it easier to start with the autogenerated captions
and then refer to any resources provided by the speaker in order to
figure out spelling. Sometimes speakers provide pretty complete
scripts, which is great, but they also tend to add extra words while speaking.

# Reflowing the text

First, let's start with reflowing. We like to keep each caption line
to about 60 characters so that it displays nicely in the stream. If
the captions haven't been reflowed yet, you can reflow them at natural
pausing points (e.g., phrases). You don't have to worry too much about
getting the timestamps exactly right.

For example, instead of:

- so i'm going to talk today about a
- fun rewrite i did of uh of the bindat
- package

you can edit it to be more like:

- So I'm going to talk today
- about a fun rewrite I did
- of the bindat package.

You probably don't need to do this step if you're working with the VTT
files in the backstage area, since we try to reflow things before
people edit them, but we thought we'd demonstrate it in case people
are curious.
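If you just want a rough first pass outside Emacs, a width-based wrap gets the line lengths into the right range. The sketch below uses Python's standard `textwrap` module; the `reflow` helper is a made-up name, and it only breaks on width, not at natural pauses, so you would still adjust the breaks and fix capitalization by hand.

```python
import textwrap


def reflow(text: str, width: int = 60) -> list[str]:
    """Wrap text into lines of at most `width` characters (hypothetical helper)."""
    # Collapse the existing line breaks and runs of spaces first.
    flat = " ".join(text.split())
    # Break only at spaces so hyphenated words stay together.
    return textwrap.wrap(flat, width=width, break_on_hyphens=False)


if __name__ == "__main__":
    raw = """so i'm going to talk today about a
fun rewrite i did of uh of the bindat
package"""
    for line in reflow(raw, width=30):
        print(line)
```

This doesn't remove filler words like "uh" or pick phrase boundaries; it only saves some typing before the manual pass.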

We start with the text file that OpenAI Whisper generates. We set
`fill-column` to 50 and use `display-fill-column-indicator-mode` to
give ourselves a goal column. A little over is fine too. Then we use
`emacsconf-reflow` from the
[emacsconf-el](https://git.emacsconf.org/emacsconf-el/) repository to quickly
split up the text into captions by looking for where we want to add
newlines and then typing the word or words. We type `'` to join lines.
Sometimes, if it splits at the wrong place, we just undo it and edit it
normally.

It took about 4 minutes to reflow John Wiegley's 5-minute presentation. 

<video src="https://media.emacsconf.org/reflowing.webm" controls=""></video>

The next step is to align it with
[aeneas](https://github.com/readbeyond/aeneas) to get the timestamps
for each line of text. `subed-align` from the subed package helps with that.

<video src="https://media.emacsconf.org/alignment.webm" controls=""></video>

# Edit the VTT to fix misrecognized words

The next step is to edit these subtitles. VTT files are plain text, so
you can edit them with regular `text-mode` if you want to. If you're
editing subtitles within Emacs,
[subed](https://github.com/sachac/subed) can conveniently synchronize
video playback with subtitle editing, which makes it easier to figure
out technical words. subed tries to load the video based on the
filename, but if it can't find it, you can use `C-c C-v`
(`subed-mpv-play-from-file`) to play a file or `C-c C-u` to play a URL.

Look for misrecognized words and edit them. We also like to change
things to follow Emacs keybinding conventions. We sometimes spell out
acronyms on first use or add extra information in brackets. The
captions will be used in a transcript as well, so you can add
punctuation, remove filler words, and try to make it read better.

Sometimes you may want to tweak how the captions are split. You can
use `M-j` (`subed-jump-to-current-subtitle`) to jump to the caption if
you're not already on it, listen for the right spot, and maybe use
`M-SPC` to toggle playback. Use `M-.` (`subed-split-subtitle`) to
split a caption at the current MPV playing position and `M-m`
(`subed-merge-with-next`) to merge a subtitle with the next one. Times
don't need to be very precise. If you don't understand a word or
phrase, add two question marks in brackets (`[??]`) and move on. We'll
ask the speakers to review the subtitles and can sort that out then.

If there are multiple speakers, indicate switches between speakers
with a `[speaker-name]:` tag.

<video src="https://media.emacsconf.org/editing.webm" controls=""></video>

Once you've gotten the hang of things, it might take between 1x and 4x
the video time to edit captions.

# Playing your subtitles together with the video

To load a specific subtitle file in MPV, use the `--sub-file=` or
`--sub-files=` command-line argument.

If you're using subed, the video should autoplay if it's named the
same as your subtitle file. If not, you can use `C-c C-v`
(`subed-mpv-play-from-file`) to load the video file. You can toggle
looping over the current subtitle with `C-c C-l`
(`subed-toggle-loop-over-current-subtitle`), synchronizing player to
point with `C-c ,` (`subed-toggle-sync-player-to-point`), and
synchronizing point to player with `C-c .`
(`subed-toggle-sync-point-to-player`).

# Using word-level timing data

If there is a `.srv2` file with word-level timing data, you can load
it with `subed-word-data-load-from-file` from `subed-word-data.el` in
the subed package. You can then split with the usual `M-.`
(`subed-split-subtitle`), and it should use word-level timestamps when
available.

# Starting from a script

Some talks don't have autogenerated captions, or you may prefer to
start from scratch. Whenever the speaker has provided a script, you
can use that as a starting point. One way is to start by making a VTT
file with one subtitle spanning the whole video, like this:

```text
WEBVTT

00:00:00.000 --> 00:39:07.000
If the speaker provided a script, I usually put the script under this heading.
```

If you're using subed, you can move point to a good stopping
point for a phrase, toggle playing with `M-SPC`, and then use `M-.`
(`subed-split-subtitle`) when the player reaches that point. If it's
too fast, use `M-j` to repeat the current subtitle.
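If you'd like to generate that starting file instead of typing it, here is a small Python sketch. The helper names (`format_timestamp`, `single_cue_vtt`) are made up for this example, and it assumes you already know the video's duration in seconds. Blank lines in the script are dropped because a blank line ends a cue in WebVTT.

```python
def format_timestamp(seconds: float) -> str:
    """Format seconds as a WebVTT timestamp, hh:mm:ss.ttt."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def single_cue_vtt(script: str, duration_seconds: float) -> str:
    """Build a VTT file with one cue that spans the whole video."""
    # A blank line would end the cue, so keep only non-empty lines.
    lines = [line.strip() for line in script.splitlines() if line.strip()]
    timing = f"00:00:00.000 --> {format_timestamp(duration_seconds)}"
    return "WEBVTT\n\n" + timing + "\n" + "\n".join(lines) + "\n"


if __name__ == "__main__":
    # 39 minutes and 7 seconds, matching the example above.
    print(single_cue_vtt("First line of the script.\n\nSecond line.", 39 * 60 + 7))
```

You can then open the result in subed and split it as described above.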

# Starting from scratch

You can send us a text file with just the text transcript in it and
not worry about the timestamps. We can figure out the timing using
[aeneas for forced alignment](https://www.readbeyond.it/aeneas/). 

If you want to try timing as you go, you might find it easier to start
by making a VTT file with one subtitle spanning the whole video, like
this:

```text
WEBVTT

00:00:00.000 --> 00:39:07.000
```

Then start playback and type, using `M-.` (`subed-split-subtitle`) to
split after a reasonable length for a subtitle. If it's too fast, use
`M-j` to repeat the current subtitle.

# Chapter markers

In addition to the captions, you may also want to add chapter markers.
An easy way to do that is to add a `NOTE Chapter heading` before the
subtitle that starts the chapter. For example: 

```text
...
00:05:13.880 --> 00:05:20.119
So yeah, like that's currently the problem.

NOTE Embeddings

00:05:20.120 --> 00:05:23.399
So I want to talk about embeddings.
...
```

We can then extract those with
`emacsconf-subed-make-chapter-file-based-on-comments`.
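To show the idea behind that extraction, here is a rough Python sketch that pairs each `NOTE` line with the start time of the next cue. It is not the implementation of the Emacs function above; `extract_chapters` is a made-up name, and it assumes single-line `NOTE` comments and timestamps written in full `hh:mm:ss.ttt` form.

```python
import re

# Matches a cue timing line such as "00:05:20.120 --> 00:05:23.399".
TIMING = re.compile(r"^(\d{2}:\d{2}:\d{2}\.\d{3}) --> (\d{2}:\d{2}:\d{2}\.\d{3})")


def extract_chapters(vtt: str) -> list[tuple[str, str]]:
    """Return (start_time, title) pairs, one per NOTE comment (hypothetical helper)."""
    chapters = []
    pending_title = None
    for line in vtt.splitlines():
        if line.startswith("NOTE "):
            # Remember the heading until we see the cue it belongs to.
            pending_title = line[len("NOTE "):].strip()
            continue
        match = TIMING.match(line)
        if match and pending_title is not None:
            chapters.append((match.group(1), pending_title))
            pending_title = None
    return chapters


if __name__ == "__main__":
    sample = """WEBVTT

00:05:13.880 --> 00:05:20.119
So yeah, like that's currently the problem.

NOTE Embeddings

00:05:20.120 --> 00:05:23.399
So I want to talk about embeddings.
"""
    for start, title in extract_chapters(sample):
        print(start, title)
```

A caption whose text happens to start with `NOTE ` would be mistaken for a chapter heading, so this is only suitable for a quick check.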

For an example of how chapter markers allow people to quickly navigate
videos, see <https://emacsconf.org/2021/talks/bindat/>.

Please let us know if you need any help!

Sacha <sacha@sachachua.com>