[[!meta title="Captioning tips"]]
[[!meta copyright="Copyright &copy; 2021, 2022 Sacha Chua"]]

Captions are great for making videos (especially technical ones!)
easier to understand and search.

If you see a talk that you'd like to caption, feel free to download it
and start working on it with your favourite subtitle editor. Let me
know what you pick by e-mailing me at <sacha@sachachua.com> so that I
can update the index and try to avoid duplication of work. [Find talks that need captions here](https://emacsconf.org/help_with_main_captions).

We've been using <https://github.com/sachac/subed> to caption things
as VTT or SRT in Emacs, often starting with autogenerated captions
from OpenAI Whisper (the .vtt file). You're welcome to work with
captions using your favourite tool.

We'll be posting VTT files so that they can be included by the HTML5
video player (demo: <https://emacsconf.org/2021/talks/news/>), so if
you use a different tool that produces another format, any format that
can be converted into that one (like SRT or ASS) is fine. The latest
version of `subed` has a `subed-convert` command that might be useful
for turning WebVTT files into tab-separated values (TSV) and back
again, if you prefer a more concise format.
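
If your tool only exports SRT, the conversion to WebVTT is mostly mechanical. Here is a minimal Python sketch (not something we use in our own workflow; the function name `srt_to_vtt` is made up for illustration). It assumes plain SRT cues without styling tags, drops the numeric cue indices, and switches the millisecond separator from a comma to a period:

```python
import re

# Matches an SRT timing line such as "00:00:01,360 --> 00:00:04,000".
SRT_TIMING = re.compile(
    r"(\d{2}:\d{2}:\d{2}),(\d{3}) --> (\d{2}:\d{2}:\d{2}),(\d{3})"
)


def srt_to_vtt(srt_text: str) -> str:
    """Convert SRT text to WebVTT text (plain cues only, no styling)."""
    cues = []
    # SRT cues are separated by blank lines.
    for block in re.split(r"\n\s*\n", srt_text.replace("\r\n", "\n").strip()):
        lines = block.split("\n")
        if not any(line.strip() for line in lines):
            continue
        # Drop the numeric cue index that SRT puts before the timing line.
        if lines[0].strip().isdigit():
            lines = lines[1:]
        if not lines:
            continue
        # WebVTT uses "." rather than "," before the milliseconds.
        lines[0] = SRT_TIMING.sub(r"\1.\2 --> \3.\4", lines[0])
        cues.append("\n".join(lines))
    return "WEBVTT\n\n" + "\n\n".join(cues) + "\n"
```

You could call it with the contents of your `.srt` file and write the result to a `.vtt` file with the same base name.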

You can e-mail me the subtitles when you're done, and then I can merge
them into the video.

# Formatting tips

You might find it easier to start with the autogenerated captions
and then refer to any resources provided by the speaker in order to
figure out spelling. Sometimes speakers provide pretty complete
scripts, which is great, but they also tend to say extra words that
aren't in the script.

Emacs being Emacs, you can use some code
([example subed configuration](https://sachachua.com/dotemacs/#subed), see
`my-subed-fix-common-error` and `my-subed-common-edits`) to help with
capitalization and commonly misrecognized words.

Please keep captions to one line each so that they can be displayed
without wrapping, as we plan to broadcast by resizing the video and
displaying open captions below. Aim for around 50 characters per
caption, with a maximum of about 60. Since the captions are also
displayed as text on the talk pages, you can omit filler words. If the
captions haven't been split yet, you can split them at natural pausing
points (for example, between phrases) so that they're displayed
nicely. You don't have to worry too much about getting the timestamps
exactly right.

For example, instead of:

- so i'm going to talk today about a
- fun rewrite i did of uh of the bindat
- package

you can edit it to be more like:

- So I'm going to talk today
- about a fun rewrite I did
- of the bindat package.
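
If you'd like to check your file against the one-line and length guidelines before sending it in, something like the following Python sketch might help. It is only an illustration: `find_caption_problems` and `MAX_CHARS` are hypothetical names, and it assumes a simple WebVTT file where cues are separated by blank lines.

```python
MAX_CHARS = 60  # upper limit suggested in the guidelines above


def find_caption_problems(vtt_text: str, max_chars: int = MAX_CHARS):
    """Return (timing_line, problem) pairs for cues that need attention."""
    problems = []
    blocks = vtt_text.replace("\r\n", "\n").strip().split("\n\n")
    for block in blocks:
        lines = block.split("\n")
        # Find the timing line; blocks without one (the WEBVTT header,
        # NOTE comments) are skipped.
        timing_index = next(
            (i for i, line in enumerate(lines) if "-->" in line), None
        )
        if timing_index is None:
            continue
        timing = lines[timing_index]
        text_lines = [l for l in lines[timing_index + 1:] if l.strip()]
        if len(text_lines) > 1:
            problems.append((timing, "more than one line"))
        for text in text_lines:
            if len(text) > max_chars:
                problems.append((timing, f"{len(text)} characters"))
    return problems
```

Each returned pair gives the timing line of the cue and a short description, so you can jump to that timestamp in your editor and split or shorten the caption.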

If you don't understand a word or phrase, add two question marks (??)
and move on. We'll ask the speakers to review the subtitles and can
sort that out then.

If there are multiple speakers, indicate switches between speakers
with a `[speaker-name]:` tag.

During questions and answers, please introduce the question with a
`[question]:` tag. When the speaker answers, use a `[speaker-name]:`
tag to make clear who is talking.

# Playing your subtitles together with the video

To load a specific subtitle file in MPV, use the `--sub-file=` or
`--sub-files=` command-line argument.

If you're using subed, the video should autoplay if it's named the
same as your subtitle file. If not, you can use `C-c C-v`
(`subed-mpv-play-from-file`) to load the video file. You can toggle:

- looping over the current subtitle with `C-c C-l` (`subed-toggle-loop-over-current-subtitle`),
- synchronizing player to point with `C-c ,` (`subed-toggle-sync-player-to-point`), and
- synchronizing point to player with `C-c .` (`subed-toggle-sync-point-to-player`).

# Editing autogenerated captions

If you want to take advantage of the autogenerated captions and the
word-level timing data from YouTube or Torchaudio, you can start with the VTT file
for the video you want, then use `my-caption-load-word-data` from
<https://sachachua.com/dotemacs/#word-level> to load the srv2 file
(also attached), and then use `my-caption-split` to split using the
word timing data if possible. You can bind this to a keystroke with
something like `M-x local-set-key M-' my-caption-split`.

# Starting from a script

Some talks don't have autogenerated captions because YouTube didn't
produce any. Whenever the speaker has provided a script, you can use
that as a starting point. One way is to start by making a VTT file with
one subtitle spanning the whole video, like this:

```text
WEBVTT

00:00:00.000 --> 00:39:07.000
If the speaker provided a script, I usually put the script under this heading.
```

If you're using subed, you can move point to a good stopping
place for a phrase, toggle playing with `M-SPC`, and then press `M-.`
(`subed-split-subtitle`) when the player reaches that point. If it's
too fast, use `M-j` to repeat the current subtitle.
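
If you'd rather generate the single-subtitle starting file than type it by hand, here is a rough Python sketch. The function names are hypothetical, and it assumes you already know the video's duration in seconds. It collapses the script onto one line because a blank line inside the text would end the cue in WebVTT.

```python
def vtt_timestamp(seconds: float) -> str:
    """Format a number of seconds as a WebVTT hh:mm:ss.mmm timestamp."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def single_cue_vtt(duration_seconds: float, script: str = "") -> str:
    """Build a WebVTT file with one cue spanning the whole video."""
    # Collapse all whitespace (including blank lines) to single spaces.
    text = " ".join(script.split())
    return (
        "WEBVTT\n\n"
        f"00:00:00.000 --> {vtt_timestamp(duration_seconds)}\n"
        f"{text}\n"
    )
```

For a video that is 39 minutes and 7 seconds long, `single_cue_vtt(39 * 60 + 7, script_text)` gives a file you can then split phrase by phrase.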

# Starting from scratch

Sometimes there are no autogenerated captions and there's no script,
so we have to start from scratch.

You can send us a text file with just the text transcript in it and
not worry about the timestamps. We can figure out the timing using
[aeneas for forced alignment](https://www.readbeyond.it/aeneas/). 

If you want to try timing as you go, you might find it easier to start
by making a VTT file with one subtitle spanning the whole video, like
this:

```text
WEBVTT

00:00:00.000 --> 00:39:07.000
```

Then start playback and type, using `M-.` (`subed-split-subtitle`) to
split after a reasonable length for a subtitle. If it's too fast, use
`M-j` to repeat the current subtitle.

# Chapter markers

In addition to the captions, you may also want to create a separate
file noting chapter marks for use in the video player. You can send
chapter markers as timestamps and text (hh:mm:ss note, one per line)
or as WebVTT files that look something like this:

```text
WEBVTT

00:00:01.360 --> 00:02:06.006
Introduction

00:02:06.007 --> 00:05:27.537
What is BinDat?
```

For an example of how chapter markers allow people to quickly navigate
videos, see <https://emacsconf.org/2021/talks/bindat/>.
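
To show how the two chapter formats relate, here is a small Python sketch that turns "hh:mm:ss note" lines into a WebVTT chapter file. It isn't part of our tooling and the names are made up. It assumes whole-second timestamps, that each chapter ends where the next one starts, and that you supply the end of the video yourself (the `video_end` parameter).

```python
def to_seconds(stamp: str) -> int:
    """Parse hh:mm:ss (or mm:ss) into a number of seconds."""
    seconds = 0
    for part in stamp.split(":"):
        seconds = seconds * 60 + int(part)
    return seconds


def to_vtt_time(seconds: int) -> str:
    """Format whole seconds as a WebVTT hh:mm:ss.000 timestamp."""
    hours, rest = divmod(seconds, 3600)
    minutes, secs = divmod(rest, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.000"


def chapters_to_vtt(notes: str, video_end: str) -> str:
    """Convert 'hh:mm:ss title' lines into WebVTT chapter cues."""
    chapters = []
    for line in notes.splitlines():
        if line.strip():
            stamp, title = line.strip().split(maxsplit=1)
            chapters.append((to_seconds(stamp), title))
    # Each chapter ends where the next begins; the last ends with the video.
    ends = [start for start, _ in chapters[1:]] + [to_seconds(video_end)]
    cues = [
        f"{to_vtt_time(start)} --> {to_vtt_time(end)}\n{title}"
        for (start, title), end in zip(chapters, ends)
    ]
    return "WEBVTT\n\n" + "\n\n".join(cues) + "\n"
```

For example, `chapters_to_vtt("00:00:01 Introduction\n00:02:06 What is BinDat?", "00:05:27")` produces two cues shaped like the ones in the WebVTT example above, rounded to whole seconds.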

If you're using subed, you can make chapter markers by adding NOTE
comments with the chapter headings before the subtitles in the
chapter. Then we can use
`emacsconf-subed-make-chapter-file-based-on-comments` from
<https://git.emacsconf.org/emacsconf-el/tree/emacsconf-subed.el> to
create the chapter file.

Alternatively, you can make chapter markers by making a copy of your
WebVTT file and then using `subed-merge-dwim` (bound to `M-m` by
default) on a region including the subtitles that you want to merge.
You can also use `subed-set-subtitle-text` or
`subed-merge-region-and-set-text`. If you can think of good
keybindings for those, please suggest them!

Please let us know if you need any help!

Sacha <sacha@sachachua.com>