WEBVTT

00:00.880 --> 00:00:02.446
Hi, I'm Blaine Mooers.

00:00:02.446 --> 00:00:04.160
I'm going to be talking about

00:00:04.160 --> 00:00:07.919
the use of molecular graphics in Org

00:07.919 --> 00:00:08.880
for the purpose of doing 

00:00:08.880 --> 00:00:11.840
reproducible research in structural biology.

00:00:11.840 --> 00:00:13.722
I'm an associate professor of biochemistry

00:00:13.722 --> 00:00:15.768
and microbiology at the University of Oklahoma 

00:00:15.768 --> 00:00:17.760
Health Sciences Center in Oklahoma City.

00:00:17.760 --> 00:00:19.600
My laboratory uses X-ray crystallography

00:00:19.600 --> 00:00:21.920
to determine the atomic structures

00:00:21.920 --> 00:00:23.439
of proteins like this one

00:00:23.439 --> 00:00:26.080
in the lower left, and of nucleic acids

00:26.080 --> 00:27.840
important in human health.

00:27.840 --> 00:00:29.591
This is a crystal of an RNA,

00:00:29.591 --> 00:00:31.359
which we have placed in this

00:00:31.359 --> 00:00:33.200
X-ray diffraction instrument.

00:00:33.200 --> 00:00:35.600
And after rotating the crystal

00:00:35.600 --> 00:00:38.000
in the X-ray beam for two degrees,

00:00:38.000 --> 00:00:40.480
we obtain the following diffraction pattern,

00:00:40.480 --> 00:00:43.280
which has thousands of spots on it.

00:43.280 --> 00:00:47.840
We rotate the crystal through 180 degrees,

00:47.840 --> 00:00:51.760
collecting 90 images to obtain all the data.

00:00:51.760 --> 00:00:56.000
We then process those images

00:56.000 --> 00:00:57.752
and do an inverse Fourier transform

00:00:57.752 --> 00:00:59.920
to obtain the electron density.

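NOTE
A minimal sketch of the inverse Fourier transform step mentioned above, reduced to a toy one-dimensional cell. It is not part of the talk. It assumes the complex structure factors F(h) are fully known; in a real experiment only their amplitudes are measured and the phases have to be estimated. The function names and the sample density values are hypothetical.
```python
import cmath
def structure_factors(density):
    """Forward transform: F(h) = sum_k rho_k * exp(+2*pi*i*h*k/n)."""
    n = len(density)
    return [sum(rho * cmath.exp(2j * cmath.pi * h * k / n) for k, rho in enumerate(density)) for h in range(n)]
def electron_density(factors):
    """Inverse transform: rho_k = (1/n) * sum_h F(h) * exp(-2*pi*i*h*k/n)."""
    n = len(factors)
    return [(sum(f * cmath.exp(-2j * cmath.pi * h * k / n) for h, f in enumerate(factors)) / n).real for k in range(n)]
# Hypothetical density samples along one unit-cell edge (arbitrary units).
toy_density = [0.0, 1.0, 4.0, 1.0, 0.0, 0.0]
# Round trip: the inverse transform of the structure factors recovers the density.
recovered = electron_density(structure_factors(toy_density))
```
The round trip only shows that the two sums are inverses of each other; production software uses three-dimensional fast Fourier transforms over the measured reflections.
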
00:00:59.920 --> 00:01:01.888
This electron density map has been

00:01:01.888 --> 00:01:04.344
contoured at the one-sigma level.

00:01:04.344 --> 00:01:06.116
That level's being shown by

00:01:06.116 --> 00:01:08.640
this blue chicken wire mesh.

00:01:08.640 --> 00:01:10.152
Atomic models have been fitted

00:01:10.152 --> 00:01:11.119
to this chicken wire.

00:01:11.119 --> 00:01:14.240
These lines represent bonds between atoms,

00:01:14.240 --> 00:01:16.240
atoms are being represented by points.

00:01:16.240 --> 00:01:18.640
And atoms are colored by atom type,

00:01:18.640 --> 00:01:21.280
red for oxygen, blue for nitrogen,

00:01:21.280 --> 00:01:23.040
and then in this case,

01:23.040 --> 00:01:24.720
carbon is colored cyan.

00:01:24.720 --> 00:01:27.203
We have fitted a drug molecule

00:01:27.203 --> 00:01:29.360
to the central blob of electron density

00:01:29.360 --> 00:01:32.400
which corresponds to the active site

01:32.400 --> 00:01:35.759
of this protein, which is RET Kinase.

00:01:35.759 --> 00:01:37.439
It's important in lung cancer.

00:01:37.439 --> 00:01:40.079
When we're finished with model building,

00:01:40.079 --> 00:01:41.339
we will then examine 

00:01:41.339 --> 00:01:43.006
the result of the final structure

00:01:43.006 --> 00:01:45.200
to prepare images for publication

00:01:45.200 --> 00:01:47.439
using a molecular graphics program.

01:47.439 --> 00:01:48.108
In this case,

00:01:48.108 --> 00:01:50.000
we've overlaid a number of structures,

00:01:50.000 --> 00:01:53.600
and we're examining the distance between

01:53.600 --> 00:01:55.680
the side chain of an alanine

00:01:55.680 --> 00:01:58.880
and one or two drug molecules.

00:01:58.880 --> 00:02:00.719
This alanine sidechain actually blocks

00:02:00.719 --> 00:02:02.159
the binding of one of these drugs.

00:02:02.159 --> 00:02:03.439
The most popular program

02:03.439 --> 02:06.320
for doing this kind of analysis

02:06.320 --> 00:02:07.280
and for preparing images

00:02:07.280 --> 00:02:09.520
for publication is PyMOL.

02:09.520 --> 02:11.440
PyMOL was used to prepare these images

02:11.440 --> 02:14.720
on the covers of these featured journals.

02:14.720 --> 00:02:17.520
PyMOL is favored because

00:02:17.520 --> 00:02:19.520
it has 500 commands

00:02:19.520 --> 00:02:22.128
and 600 parameter settings

00:02:22.128 --> 00:02:23.360
that provide exquisite control

00:02:23.360 --> 00:02:24.959
over the appearance of the output.

00:02:24.959 --> 00:02:28.480
PyMOL has over 100,000 users,

02:28.480 --> 00:02:30.000
reflecting its popularity.

00:02:30.000 --> 00:02:31.599
This is the GUI for PyMOL.

00:02:31.599 --> 00:02:35.120
It shows in white the viewport area

00:02:35.120 --> 00:02:36.080
where one interacts 

00:02:36.080 --> 00:02:37.840
with the loaded molecular object.

00:02:37.840 --> 00:02:41.920
We have rendered the same RET kinase

02:41.920 --> 00:02:49.788
with a set of preset parameters

00:02:49.788 --> 00:02:51.200
that have been named "publication".

00:02:51.200 --> 00:02:52.720
The other way of applying

02:52.720 --> 00:02:54.319
parameter settings and commands

00:02:54.319 --> 00:02:56.720
is to enter them at the PyMOL prompt.

00:02:56.720 --> 00:03:00.159
Then the third way is to load and run scripts.

00:03:00.159 --> 00:03:03.120
PyMOL is actually written in C for speed,

00:03:03.120 --> 00:03:06.159
but it is wrapped in Python for extensibility.

03:06.159 --> 03:09.680
In fact, there are over 100 articles

03:09.680 --> 00:03:11.599
about various plugins and scripts 

00:03:11.599 --> 00:03:12.400
that people have developed

00:03:12.400 --> 00:03:15.120
to extend PyMOL over the years.

03:15.120 --> 00:03:16.480
Here's some examples 

00:03:16.480 --> 00:03:18.959
from the snippet library that I developed.

03:18.959 --> 03:21.280
On the left is a default

03:21.280 --> 03:24.640
cartoon representation of an RNA hairpin.

03:24.640 --> 03:27.040
I find this reduced representation

03:27.040 --> 00:03:30.799
of the RNA hairpin to be too stark.

03:30.799 --> 00:03:32.319
I prefer these alternate ones

00:03:32.319 --> 00:03:33.840
that I developed.

03:33.840 --> 03:37.519
So, these three to the right of this one

03:37.519 --> 00:03:39.519
are not available through

00:03:39.519 --> 00:03:40.720
pull downs in PyMOL.

00:03:40.720 --> 00:03:42.748
So why develop a PyMOL

00:03:42.748 --> 00:03:44.879
snippet library for Org?

03:44.879 --> 00:03:47.040
Well, Org provides great support 

00:03:47.040 --> 00:03:48.560
for literate programming,

00:03:48.560 --> 00:03:49.840
where you have code blocks

00:03:49.840 --> 00:03:52.000
that contain code that's executable,

00:03:52.000 --> 00:03:53.040
and the output is shown

00:03:53.040 --> 00:03:54.959
below that code block.

03:54.959 --> 00:03:56.720
And then you can fill 

00:03:56.720 --> 00:03:58.959
the surrounding area in the document

03:58.959 --> 00:04:00.799
with the explanatory prose.

00:04:00.799 --> 00:04:02.000
Org has great support 

00:04:02.000 --> 00:04:04.480
for editing that explanatory prose.

00:04:04.480 --> 00:04:08.080
Org can run PyMOL through PyMOL's Python API.

04:08.080 --> 00:04:11.280
One of the uses of such an Org document

00:04:11.280 --> 00:04:14.487
is to assemble a gallery of draft images.

00:04:14.487 --> 00:04:16.563
We often have to look at

00:04:16.563 --> 00:04:19.840
dozens of candidate images 

00:04:19.840 --> 00:04:22.000
with the molecule in different orientations,

00:04:22.000 --> 00:04:23.520
different zoom settings,

04:23.520 --> 00:04:25.032
different representations,

00:04:25.032 --> 00:04:27.280
different colors, and so on.

00:04:27.280 --> 00:04:30.639
And to have those images along with…,

00:04:30.639 --> 00:04:31.840
adjacent to the code

04:31.840 --> 00:04:33.680
that was used to generate them,

00:04:33.680 --> 00:04:37.199
can be very effective for

04:37.199 --> 00:04:39.680
further editing the code

00:04:39.680 --> 00:04:40.880
and improving the images.

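NOTE
A minimal sketch of how a gallery of draft images in different orientations might be produced through PyMOL's Python API. It is not part of the talk. The function name, file prefix, image size, and angles are hypothetical; cmd.turn and cmd.png are the PyMOL calls assumed here, and the cmd object is passed in so the function can be tried without PyMOL installed.
```python
def render_gallery(cmd, angles, prefix="draft"):
    """Rotate the view about the y axis and save one PNG per requested angle."""
    filenames = []
    previous = 0
    for angle in angles:
        # cmd.turn rotates relative to the current view, so apply only the difference.
        cmd.turn("y", angle - previous)
        previous = angle
        filename = f"{prefix}_y{angle:03d}.png"
        # ray=1 asks PyMOL to ray-trace the scene before writing the file.
        cmd.png(filename, width=900, height=900, dpi=300, ray=1)
        filenames.append(filename)
    return filenames
```
Inside a session that has PyMOL available, one would run "from pymol import cmd" and then call render_gallery(cmd, [0, 90, 180, 270]) after loading a structure.
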
00:04:40.880 --> 00:04:44.080
Once the final images have been selected,

04:44.080 --> 00:04:46.320
one can submit the code

00:04:46.320 --> 00:04:48.479
as part of the supplemental material.

00:04:48.479 --> 00:04:52.400
Finally, one can use the journal package

04:52.400 --> 00:04:54.608
to use the Org files as

00:04:54.608 --> 00:04:57.120
an electronic laboratory notebook,

00:04:57.120 --> 00:04:59.600
which is illustrated with molecular images.

00:04:59.600 --> 00:05:01.039
This can be very useful

00:05:01.039 --> 00:05:04.080
when assembling manuscripts

05:04.080 --> 00:05:05.440
months or years later.

00:05:05.440 --> 00:05:08.320
This shows the YASnippet pull down

05:08.320 --> 00:05:12.720
after my library has been installed.

00:05:12.720 --> 00:05:15.360
I have an Org file open,

00:05:15.360 --> 00:05:17.120
so I'm in Org mode.

05:17.120 --> 00:05:20.880
We have the Org mode submenu,

05:20.880 --> 00:05:23.919
and under it, all my snippets

00:05:23.919 --> 00:05:26.880
are located in these sub-sub-menus

05:26.880 --> 00:05:30.880
that are prefixed with pymolpy.

00:05:30.880 --> 00:05:33.840
Under the molecular representations menu,

00:05:33.840 --> 00:05:36.479
there is a listing of snippets.

00:05:36.479 --> 00:05:38.563
The top one is for the ambient occlusion effect,

00:05:38.563 --> 00:05:39.840
which we're going to apply

00:05:39.840 --> 00:05:41.039
in this Org file.

00:05:41.039 --> 00:05:44.240
So these lines of code,

00:05:44.240 --> 00:05:48.479
as well as these flanking lines

05:48.479 --> 00:05:50.240
that define the source block,

00:05:50.240 --> 00:05:53.280
were inserted by clicking on that line.

05:53.280 --> 00:05:55.120
Then I've added some additional code.

00:05:55.120 --> 00:05:56.880
So, the first line defines

00:05:56.880 --> 00:05:59.039
the language that we're using.

00:05:59.039 --> 00:05:59.768
We're going to use

00:05:59.768 --> 00:06:02.639
the jupyter-python language.

06:02.639 --> 00:06:04.560
Then you can define the session,

00:06:04.560 --> 00:06:06.400
and the name of this is arbitrary.

00:06:06.400 --> 00:06:09.680
Then the kernel is our means

00:06:09.680 --> 00:06:11.360
by which we gain access

00:06:11.360 --> 00:06:14.880
to the Python API of PyMOL.

06:14.880 --> 00:06:17.039
The remaining settings apply to the output.

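NOTE
A minimal sketch of the kind of Python that could sit inside the jupyter-python source block described above to get an ambient occlusion effect. It is not the snippet shown in the talk. The setting names are PyMOL settings; the values follow a commonly circulated ambient-occlusion recipe and are illustrative assumptions. The function name is hypothetical, and the cmd object is passed in so the function can be tried without PyMOL installed.
```python
# Illustrative lighting values; tune them for your own structure.
AMBIENT_OCCLUSION_SETTINGS = {
    "light_count": 10,
    "spec_count": 1,
    "shininess": 10,
    "specular": 0.25,
    "ambient": 0,
    "direct": 0,
    "reflect": 1.5,
    "ray_shadow_decay_factor": 0.1,
    "ray_shadow_decay_range": 2,
    "depth_cue": 0,
}
def apply_ambient_occlusion(cmd, selection="all"):
    """Show the selection as spheres and apply each lighting setting via cmd.set."""
    cmd.show_as("spheres", selection)
    for name, value in AMBIENT_OCCLUSION_SETTINGS.items():
        cmd.set(name, value)
```
With PyMOL available in the kernel, one would run "from pymol import cmd", load a structure, call apply_ambient_occlusion(cmd), and then ray-trace and save the image.
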
00:06:17.039 --> 00:06:18.319
To execute this code 

00:06:18.319 --> 00:06:21.199
and to get the resulting image,

00:06:21.199 --> 00:06:25.120
you put the cursor inside this code block,

00:06:25.120 --> 00:06:26.560
or on the top line,

00:06:26.560 --> 00:06:29.840
and enter Control c Control c (C-c C-c).

06:29.840 --> 00:06:32.240
This shows the resulting image

00:06:32.240 --> 00:06:33.600
has been loaded up.

00:06:33.600 --> 00:06:37.280
It takes about 10 seconds for this to appear.

06:37.280 --> 00:06:38.479
So the downside of this is

00:06:38.479 --> 00:06:40.729
if you have a large number of these,

00:06:40.729 --> 00:06:43.919
the Org file can lag quite a bit

00:06:43.919 --> 00:06:45.120
when you try to scroll through it,

00:06:45.120 --> 00:06:48.319
so you need to close up these result drawers,

00:06:48.319 --> 00:06:50.960
and only open up the ones 

00:06:50.960 --> 00:06:53.199
that you're currently examining.

00:06:53.199 --> 00:06:54.319
These are features I think

06:54.319 --> 06:56.240
are important in practical work.

06:56.240 --> 00:06:59.840
So, a plus means a feature is present,

00:06:59.840 --> 00:07:01.120
and a minus means it is absent.

00:07:01.120 --> 00:07:03.199
I think tab stops and tab triggers

00:07:03.199 --> 00:07:04.800
are really important.

07:04.800 --> 00:07:05.680
Triggers are important for

00:07:05.680 --> 00:07:06.720
the fast insertion of code,

00:07:06.720 --> 00:07:08.639
tab stops are important for

07:08.639 --> 00:07:10.560
complete, accurate editing of code.

00:07:10.560 --> 00:07:12.735
I already addressed the rendering speed

00:07:12.735 --> 00:07:14.560
and scrolling issue.

00:07:14.560 --> 00:07:15.759
I think the way around this

00:07:15.759 --> 00:07:19.199
is just to export the Org document to a PDF file

00:07:19.199 --> 00:07:23.360
and do your evaluation of different images 

00:07:23.360 --> 00:07:25.199
by examining them in the PDF 

00:07:25.199 --> 00:07:26.560
rather than the Org file.

00:07:26.560 --> 00:07:30.400
The path to PDF is lightning fast in Emacs

00:07:30.400 --> 00:07:32.240
compared to Jupyter,

00:07:32.240 --> 00:07:35.280
where it's cumbersome in comparison.

00:07:35.280 --> 00:07:38.400
This is a snapshot of my initialization file.

00:07:38.400 --> 00:07:41.840
These parts are relevant to doing this work.

00:07:41.840 --> 00:07:43.039
A full description of them

00:07:43.039 --> 00:07:46.319
can be found in the README file

07:46.319 --> 00:07:48.639
of this repository on GitHub.

00:07:48.639 --> 00:07:49.456
I'd like to thank the

00:07:49.456 --> 00:07:51.840
Nathan Shock Data Science Workshop

00:07:51.840 --> 00:07:54.319
for feedback during presentations

00:07:54.319 --> 00:07:56.160
I've made about this work.

00:07:56.160 --> 00:07:57.628
And I would also like to thank

00:07:57.628 --> 00:08:00.240
the following funding sources for support.

00:08:00.240 --> 00:08:03.879
I will now take questions. Thank you.

00:08:03.879 --> 00:08:03.986
[captions by Blaine Mooers and Bhavin Gandhi]