1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
848
849
850
851
852
853
854
855
856
857
858
859
860
861
862
863
864
865
866
867
868
869
870
871
872
873
874
875
876
877
878
879
880
881
882
883
884
885
886
887
888
889
890
891
892
893
894
895
896
897
898
899
900
901
902
903
904
905
906
907
908
909
910
911
912
913
914
915
916
917
918
919
920
921
922
923
924
925
926
927
928
929
930
931
932
933
934
935
936
937
938
939
940
941
942
943
944
945
946
947
948
949
950
951
952
953
954
955
956
957
958
959
960
961
962
963
964
965
966
967
968
969
970
971
972
973
974
975
976
977
978
979
980
981
982
983
984
985
986
987
988
989
990
991
992
993
994
995
996
997
998
999
1000
1001
1002
1003
1004
1005
1006
1007
1008
1009
1010
1011
1012
1013
1014
1015
1016
1017
1018
1019
1020
1021
1022
1023
1024
1025
1026
1027
1028
1029
1030
1031
1032
1033
1034
1035
1036
1037
1038
1039
1040
1041
1042
1043
1044
1045
1046
1047
1048
1049
1050
1051
1052
1053
1054
1055
1056
1057
1058
1059
1060
1061
1062
1063
1064
1065
1066
1067
1068
1069
1070
1071
1072
1073
1074
1075
1076
1077
1078
1079
1080
1081
1082
1083
1084
1085
1086
1087
1088
1089
1090
1091
1092
1093
1094
1095
1096
1097
1098
1099
1100
1101
1102
1103
1104
1105
1106
1107
1108
1109
1110
1111
1112
1113
1114
1115
1116
1117
1118
1119
1120
1121
1122
1123
1124
1125
1126
1127
1128
1129
1130
1131
1132
1133
1134
1135
1136
1137
1138
1139
1140
1141
|
WEBVTT captioned by sachac
NOTE Intro
00:00:00.000 --> 00:00:10.799
Hi everyone, this is EmacsConf 2024. I'm Colin, and today
00:00:10.800 --> 00:00:17.319
I'll be talking about transducers.
00:00:17.320 --> 00:00:21.879
After introducing them, I'll share a bit of history about
00:00:21.880 --> 00:00:25.359
transducers and the problems that they solve, some basics
00:00:25.360 --> 00:00:28.879
about how we can use them, how they work, like how they're
00:00:28.880 --> 00:00:32.399
implemented, some demonstrations of how we can actually
00:00:32.400 --> 00:00:36.959
use them in the wild, and then some other discussions about
00:00:36.960 --> 00:00:41.519
issues that they have.
NOTE What are transducers?
00:00:41.520 --> 00:00:46.399
Okay, let's get right in. What are transducers?
00:00:46.400 --> 00:00:49.679
Transducers are a way to do streaming iteration with a
00:00:49.680 --> 00:00:55.679
modern API.
00:00:55.680 --> 00:01:00.359
Who are transducers for, and thereby, who is
00:01:00.360 --> 00:01:05.599
this talk for? Well, it's for people who want to do streamed
00:01:05.600 --> 00:01:10.519
data processing in Emacs. It's for people who perhaps
00:01:10.520 --> 00:01:14.199
aren't satisfied with the existing APIs, for example, the
00:01:14.200 --> 00:01:19.359
seq API, or some other common libraries that provide
00:01:19.360 --> 00:01:23.719
similar functionality. Maybe you're not a fan of the loop
00:01:23.720 --> 00:01:29.079
macro. Some people find it difficult to understand. Or
00:01:29.080 --> 00:01:32.719
maybe you've done a bunch of Clojure before, and you'd like
00:01:32.720 --> 00:01:36.879
more aspects of Clojure in your Emacs Lisp. Or maybe you're
00:01:36.880 --> 00:01:40.239
just interested in transducers in general, because the
00:01:40.240 --> 00:01:48.839
pattern has now been ported to multiple different Lisps.
00:01:48.840 --> 00:01:55.039
So I'm Colin. I'm fosskers on everything online, and I do
00:01:55.040 --> 00:01:58.519
mainly back-end programming work and a lot of open source
00:01:58.520 --> 00:02:05.159
software. I wrote Haskell for a long time, both as a hobbyist
00:02:05.160 --> 00:02:09.079
and professionally. Since the COVID years, I've been
00:02:09.080 --> 00:02:13.439
writing Rust, both open source and professionally. But now
00:02:13.440 --> 00:02:19.719
I find that in my spare time, I'm mostly writing Common Lisp.
00:02:19.720 --> 00:02:22.719
Some things I learned from my years of Haskell was that a lot
00:02:22.720 --> 00:02:27.519
of programming is just altering the shape of data. You know,
00:02:27.520 --> 00:02:31.359
sometimes we work through our algorithm line by line. We're
00:02:31.360 --> 00:02:36.239
trying to just tell the computer exactly what to do. But if we
00:02:36.240 --> 00:02:39.639
step back, a lot of the time we're just getting in data of some
00:02:39.640 --> 00:02:44.119
shape, changing it, and then passing it along. A lot of
00:02:44.120 --> 00:02:49.279
these patterns are common, identified
00:02:49.280 --> 00:02:53.639
decades ago. For instance, we have some collection, and we
00:02:53.640 --> 00:02:56.999
want to transform every element of that collection and then
00:02:57.000 --> 00:03:01.199
pass it on. Or maybe we're trying to filter out bad elements
00:03:01.200 --> 00:03:04.799
in that collection. Or maybe we're looking for a specific
00:03:04.800 --> 00:03:07.759
element in that collection. Yes, you could write all that
00:03:07.760 --> 00:03:11.839
with for loops, but these kind of common patterns were
00:03:11.840 --> 00:03:18.559
identified and given names decades ago. So why not use them?
00:03:18.560 --> 00:03:21.879
They say that there are two major problems in computer
00:03:21.880 --> 00:03:25.759
science, one being cache validation and the other being
00:03:25.760 --> 00:03:27.589
naming things.
NOTE Common issues
00:03:27.590 --> 00:03:29.799
I've identified five other problems that
00:03:29.800 --> 00:03:33.199
come up when we're trying to deal with collections of data,
00:03:33.200 --> 00:03:40.599
or big streams of data. One is that if we were trying to
00:03:40.600 --> 00:03:45.279
load a file all into memory all at once and process the whole
00:03:45.280 --> 00:03:48.279
thing, sometimes we can have memory problems. You've
00:03:48.280 --> 00:03:54.999
probably seen out-of-memory errors or such things.
00:03:55.000 --> 00:03:58.199
A second issue that comes up is that if we were looking at a
00:03:58.200 --> 00:04:01.799
giant for loop, in particular a nested for loop or such
00:04:01.800 --> 00:04:06.079
things, it can be hard to tell just by looking at the code what
00:04:06.080 --> 00:04:11.039
it's trying to do, what it intends. If we don't go character
00:04:11.040 --> 00:04:16.439
by character or line by line, it can be hard to understand it.
00:04:16.440 --> 00:04:20.039
Furthermore, and this is particularly an issue with Emacs
00:04:20.040 --> 00:04:26.399
Lisp, is that if one call, for instance, to seq-map, then
00:04:26.400 --> 00:04:29.319
piped into seq-filter, for instance, will have an
00:04:29.320 --> 00:04:33.599
intermediate allocation, the map will take the source
00:04:33.600 --> 00:04:37.639
container, allocate a new one, and then the filter will
00:04:37.640 --> 00:04:40.319
operate over the second one. This is wasteful.
00:04:40.320 --> 00:04:48.879
Furthermore, it can often be difficult to abort a stream.
00:04:48.880 --> 00:04:53.199
For instance, if we were filtering through our collection,
00:04:53.200 --> 00:04:57.319
but we knew we only wanted to go halfway, for instance, for
00:04:57.320 --> 00:05:01.759
some reason, we have no way to stop it halfway through. We
00:05:01.760 --> 00:05:05.479
just have to process the whole thing, even if we know we don't
00:05:05.480 --> 00:05:11.919
need to. Another issue is that for languages that have
00:05:11.920 --> 00:05:18.039
traits, or in Haskell they're called type classes, if you
00:05:18.040 --> 00:05:22.399
are defining what it means to map over something, you often
00:05:22.400 --> 00:05:27.039
have to redefine that for every kind of container or thing
00:05:27.040 --> 00:05:31.239
that you're iterating over. Wouldn't it be nice if we could
00:05:31.240 --> 00:05:34.719
define things like map just once and then reuse them
00:05:34.720 --> 00:05:39.839
everywhere? Now, transducers solve all five of these,
00:05:39.840 --> 00:05:44.039
without the addition of new language features, and with
00:05:44.040 --> 00:05:47.279
little more than plain old function composition.
NOTE Transducers
00:05:47.280 --> 00:05:53.119
If this is your first time hearing of transducers, yeah,
00:05:53.120 --> 00:05:57.439
no problem. They were originally invented in Clojure by
00:05:57.440 --> 00:06:01.039
Rich Hickey, and this is a quote from him. He thinks
00:06:01.040 --> 00:06:05.439
transducers are a fundamental primitive that decouple
00:06:05.440 --> 00:06:10.079
critical logic from list or sequence processing, and if he
00:06:10.080 --> 00:06:13.999
had to do Clojure all over, he'd put them at the bottom, at the
00:06:14.000 --> 00:06:19.279
very bottom of all the fundamental primitives. Now, that's
00:06:19.280 --> 00:06:24.599
Rich speaking quite highly of them. And I think he has a point
00:06:24.600 --> 00:06:25.159
here.
00:06:25.160 --> 00:06:32.399
They were invented originally in Clojure. In more
00:06:32.400 --> 00:06:34.772
recent years, they were brought over to Scheme
00:06:34.773 --> 00:06:38.774
via SRFI 171. That's where I found them
00:06:38.775 --> 00:06:41.521
when I was learning the Guile language.
00:06:41.522 --> 00:06:43.919
In the process of submitting a patch, I realized
00:06:43.920 --> 00:06:48.199
that there were other things to be improved. So I ported the
00:06:48.200 --> 00:06:51.399
pattern to Common Lisp, then Fennel, and then more
00:06:51.400 --> 00:06:56.639
recently, Emacs Lisp. The Common Lisp and Emacs Lisp APIs
00:06:56.640 --> 00:07:01.199
are identical. And the Fennel one is not identical, but
00:07:01.200 --> 00:07:05.799
fairly similar. Overall, everywhere you find
00:07:05.800 --> 00:07:10.279
transducers, they should basically be fairly uniform.
00:07:10.280 --> 00:07:15.759
When I originally made the Common Lisp variant first, I
00:07:15.760 --> 00:07:18.799
sampled the APIs from a number of different languages and
00:07:18.800 --> 00:07:23.439
came up with what I believed to be a representative sample of
00:07:23.440 --> 00:07:27.959
what most people would want out of such a library. I gave
00:07:27.960 --> 00:07:32.439
functions their common modern names. For instance, map
00:07:32.440 --> 00:07:35.279
is map and filter is filter and so on.
NOTE Using transducers
00:07:35.280 --> 00:07:42.599
What does the usage of transducers look like? Well,
00:07:42.600 --> 00:07:48.959
these examples will all be the Emacs Lisp variant, but the
00:07:48.960 --> 00:07:52.359
Common Lisp will look basically exactly the same, minus
00:07:52.360 --> 00:07:54.079
this little t- prefix.
00:07:54.080 --> 00:08:00.919
Running transducers requires three things. It requires a
00:08:00.920 --> 00:08:06.439
source. This could be an obvious thing like a list or a
00:08:06.440 --> 00:08:11.479
vector, but it could be other things like a file, or in Emacs
00:08:11.480 --> 00:08:16.348
list in particular, a buffer.
00:08:16.349 --> 00:08:20.112
A reducer is a function. It's something like
00:08:20.113 --> 00:08:22.639
the + operator or the * operator,
00:08:22.640 --> 00:08:26.785
or certain constructors of various containers.
00:08:26.786 --> 00:08:32.125
It takes values and collates them into some final version.
00:08:32.126 --> 00:08:33.946
Now, finally, we have what we're calling here
00:08:33.947 --> 00:08:37.567
a transducer chain. This could be one transducer function
00:08:37.568 --> 00:08:43.479
or it could be multiple composed together. These are the
00:08:43.480 --> 00:08:47.079
functions that actually take data and transform them
00:08:47.080 --> 00:08:55.279
somehow. For instance, this. We have a list of three
00:08:55.280 --> 00:09:04.199
elements. We want to reduce it into a vector. How we are
00:09:04.200 --> 00:09:07.519
going to transform the elements along the way: we are doing
00:09:07.520 --> 00:09:13.359
plus one to each of them. If this syntax is new to you, just
00:09:13.360 --> 00:09:18.039
know that this #' just means that this thing that
00:09:18.040 --> 00:09:22.079
comes after it is the name of the function. In Common Lisp and
00:09:22.080 --> 00:09:26.079
Emacs Lisp, this is necessary, but for Clojure and Scheme,
00:09:26.080 --> 00:09:32.719
it is not. So we can see here that just this example is not much
00:09:32.720 --> 00:09:36.119
different than any other normal map call you might see made,
00:09:36.120 --> 00:09:40.239
but if nothing else, it's a handy way to convert a list to a
00:09:40.240 --> 00:09:44.999
vector or anything else. There are many, many reducers
00:09:45.000 --> 00:09:48.239
available and many different forms that we can
00:09:48.240 --> 00:09:52.624
collate the final value into.
NOTE A more involved example with comp
00:09:52.625 --> 00:09:55.086
Let's see a more involved example.
00:09:55.087 --> 00:09:58.049
Okay, now we've got some more meat here.
00:09:58.050 --> 00:10:01.772
Here we can see usage of the comp function
00:10:01.773 --> 00:10:05.255
and a custom source, ints.
00:10:05.256 --> 00:10:11.079
Ints is an infinite generator of integer values. That's not
00:10:11.080 --> 00:10:14.783
like a list or a file. It will generate infinitely.
00:10:14.784 --> 00:10:19.439
Comp is letting us compose multiple transducer functions
00:10:19.440 --> 00:10:23.759
together. Notice that this is the opposite order of what
00:10:23.760 --> 00:10:28.079
we'd usually be used to from a function like comp. The order
00:10:28.080 --> 00:10:32.679
here is top to bottom, basically, so that the map goes first,
00:10:32.680 --> 00:10:37.839
then the filter, and then the take. So effectively is what
00:10:37.840 --> 00:10:40.919
we're doing is taking all the integers that exist,
00:10:40.920 --> 00:10:45.399
positive, adding one to them, filtering out only the even
00:10:45.400 --> 00:10:50.039
ones, but then just taking 10. Cons here is a function that
00:10:50.040 --> 00:10:57.039
just produces the ending result as a list. So what happens
00:10:57.040 --> 00:11:00.479
here specifically is how we are avoiding intermediate
00:11:00.480 --> 00:11:04.238
allocations. First, the number 0 will come through.
00:11:04.239 --> 00:11:07.879
It will be pulled out of this source internally by transduce.
00:11:07.880 --> 00:11:10.919
It will make its way into the map. The map will add it. Then it
00:11:10.920 --> 00:11:15.799
will immediately go into this filter step. So it's not like
00:11:15.800 --> 00:11:19.119
all the maps occur, and then all the filters occur. We do
00:11:19.120 --> 00:11:24.039
everything for each element. So the 0 comes in, now it's 1.
00:11:24.040 --> 00:11:27.559
The filter would occur. Well, it's going to fail that
00:11:27.560 --> 00:11:31.119
because it's not even, so it will just bail there. Now we'll
00:11:31.120 --> 00:11:35.239
go to the next one. Now 1 will come, it will become 2, then
00:11:35.240 --> 00:11:39.119
it will be saved by this evenp call, and then the take will
00:11:39.120 --> 00:11:42.599
capture it, because we only want 10 values here. You can
00:11:42.600 --> 00:11:45.239
see 2, 4, 6, 8, and so on is the result that we
00:11:45.240 --> 00:11:49.332
expect. So let's play around a little bit.
NOTE In Emacs
00:11:49.333 --> 00:11:53.336
Let's jump into Emacs and see what we can do.
00:11:53.337 --> 00:11:58.500
Alright, you should see my Emacs screen here.
00:11:58.501 --> 00:12:04.359
These are the actual notes for the actual
00:12:04.360 --> 00:12:08.959
presentation done in Org Mode. I'll boost that up in size for
00:12:08.960 --> 00:12:12.639
a little bit. That should be more than big enough for you.
00:12:12.640 --> 00:12:17.719
Just by changing the reducer, we can change the result.
00:12:17.720 --> 00:12:21.079
Okay, now it's a vector. Well, what else can we do to it? Well,
00:12:21.080 --> 00:12:25.959
let's just add up the results. Maybe we just want to count the
00:12:25.960 --> 00:12:30.919
results. Oh, indeed, there were 10. What if we want to find
00:12:30.920 --> 00:12:36.959
the average of the results? What if we want to find the median
00:12:36.960 --> 00:12:40.959
of the results? And so on. Here's some more interesting
00:12:40.960 --> 00:12:45.839
things that we could do. We could add different steps. So
00:12:45.840 --> 00:12:51.239
here we have all the integers. Let's add, hmm, okay, we'll
00:12:51.240 --> 00:12:57.399
keep that. We're going to add t-enumerate. What enumerate does
00:12:57.400 --> 00:13:00.879
is for each item that comes through, it is
00:13:00.880 --> 00:13:06.039
going to add a sort of index to it and make it a pair. In this
00:13:06.040 --> 00:13:08.719
case, it's going to be equal to what came in here. Well, we can
00:13:08.720 --> 00:13:12.399
change it. If we start this at 1, now it will be different.
00:13:12.400 --> 00:13:15.519
1 will be paired with 0, and then 2 would be paired
00:13:15.520 --> 00:13:19.559
with 1, and so on. We'll accept that the even call will change
00:13:19.560 --> 00:13:24.039
that a little bit. Why we're doing this is because we want
00:13:24.040 --> 00:13:27.279
to form a hash table. Let's move that down to 3, maybe
00:13:27.280 --> 00:13:31.439
we'll get a better result. What do we see? Okay, here now the
00:13:31.440 --> 00:13:37.359
result is a hash table. What are its values? Well, 0 seems
00:13:37.360 --> 00:13:40.479
to have... The key of 0 seems to be paired with 2, the key of
00:13:40.480 --> 00:13:42.909
1 seems to be paired with 4,
00:13:42.910 --> 00:13:47.411
and 2 seems to be paired with 6.
00:13:47.412 --> 00:13:51.293
Maybe let's jazz that up even a little bit more.
00:13:51.294 --> 00:13:52.973
We're going to start from a string
00:13:52.974 --> 00:13:57.943
and we'll call it hello.
00:13:57.944 --> 00:13:59.564
That's not going to work anymore
00:13:59.565 --> 00:14:02.585
and neither is that, but what we could do is
00:14:02.586 --> 00:14:05.498
we could say t-map #'string.
00:14:05.499 --> 00:14:08.627
I believe we'll do that.
00:14:08.628 --> 00:14:08.959
Let's see if that works. It did. So that's
00:14:08.960 --> 00:14:13.589
going to convert a character into a string.
00:14:13.590 --> 00:14:14.679
Let's just go two
00:14:14.680 --> 00:14:18.399
just to make it a little easier. Now you can see that we've
00:14:18.400 --> 00:14:21.919
constructed a hash table here. The key of 0 is mapped to the
00:14:21.920 --> 00:14:27.079
string of h and 1 is mapped to e. Now, I really like having
00:14:27.080 --> 00:14:29.468
this reducer in particular.
NOTE Hash tables
00:14:29.469 --> 00:14:30.639
Know that hash tables are
00:14:30.640 --> 00:14:34.199
also legal sources. I find that both in Emacs Lisp and in
00:14:34.200 --> 00:14:37.119
Common Lisp, dealing with hash tables--like creating them
00:14:37.120 --> 00:14:41.599
and altering them--can be a bit of a pain. Having them
00:14:41.600 --> 00:14:45.679
immediately available like this with transducers is very
00:14:45.680 --> 00:14:49.079
handy, I find. We can work with something that wasn't a hash
00:14:49.080 --> 00:14:53.279
table. We can construct it in a way that makes it amenable to
00:14:53.280 --> 00:14:56.199
that, and then reduce it down into a hash table, and here you
00:14:56.200 --> 00:14:58.039
go. Very handy.
NOTE Clarity
00:14:58.040 --> 00:15:06.399
One last point is that you can see very clearly what
00:15:06.400 --> 00:15:10.479
this is attempting to do, as opposed to, say, a for loop. It's
00:15:10.480 --> 00:15:12.719
very clear what that step is doing, and then you can see what
00:15:12.720 --> 00:15:15.119
that is doing, and you know that the result is going to be two.
00:15:15.120 --> 00:15:18.559
Each line is kind of its own declarative step, and it should
00:15:18.560 --> 00:15:22.159
be clear, just by staring at this, basically what you're
00:15:22.160 --> 00:15:25.399
going to get out. This is one main difference from other
00:15:25.400 --> 00:15:29.599
languages that have things--say, for instance, Rust's
00:15:29.600 --> 00:15:35.439
iterator API--is the difference between the transducers
00:15:35.440 --> 00:15:41.639
and the reducers. If we go up here, for example, the
00:15:41.640 --> 00:15:44.679
difference between the transducers and the reducers and
00:15:44.680 --> 00:15:48.119
the sources is not explicitly laid out, whereas with
00:15:48.120 --> 00:15:53.119
transducers, it is. You have to be aware of how these things
00:15:53.120 --> 00:15:55.799
are different. I think that that helps clarity.
NOTE How do transducers work?
00:15:55.800 --> 00:16:01.999
Moving on. How do transducers work? Well,
00:16:02.000 --> 00:16:09.857
we want to go see the README.
00:16:09.858 --> 00:16:11.399
So, what we're going to do is
00:16:11.400 --> 00:16:19.102
we're going to go to here.
00:16:19.103 --> 00:16:21.959
You should still be able to see this.
00:16:21.960 --> 00:16:28.583
This is the CL example, actually.
00:16:28.584 --> 00:16:32.279
Let's go to transducers.el.
00:16:32.280 --> 00:16:37.744
Their APIs and READMEs are the same,
00:16:37.745 --> 00:16:39.919
but just for the sake of it, we will go see
00:16:39.920 --> 00:16:45.726
how this looks on the Emacs side,
00:16:45.727 --> 00:16:48.046
just so that nothing is a surprise.
00:16:48.047 --> 00:16:50.239
But recall that the APIs are essentially the same
00:16:50.240 --> 00:16:53.679
between the two. If you go to this section, writing your
00:16:53.680 --> 00:16:56.839
own primitives, you can read about how transducers are
00:16:56.840 --> 00:17:00.999
actually formed, whether or not you want to write them
00:17:01.000 --> 00:17:06.799
yourself or not. We can see here t-map. We accept the
00:17:06.800 --> 00:17:10.239
function that you want to operate with. Then you've got
00:17:10.240 --> 00:17:13.319
this extra little lambda here that's coming in, and it's
00:17:13.320 --> 00:17:17.079
receiving a thing that is named reducer. Now, while here
00:17:17.080 --> 00:17:20.439
we're calling it reducer, it's actually the chain of all the
00:17:20.440 --> 00:17:25.159
composed functions together. It's all those main
00:17:25.160 --> 00:17:28.479
transducer steps. Finally, it's the reducer all
00:17:28.480 --> 00:17:31.879
composed together with normal function composition.
00:17:31.880 --> 00:17:35.877
That will matter very soon. Now here's the actual meat.
00:17:35.878 --> 00:17:40.519
We can see the accumulative result that's coming in with the
00:17:40.520 --> 00:17:45.739
current element. Now we need to operate on this.
00:17:45.740 --> 00:17:47.840
Were it normally mapped, we would see us
00:17:47.841 --> 00:17:49.919
applying the F to the input.
00:17:49.920 --> 00:17:53.519
But here, you can see us applying the F to the input and then
00:17:53.520 --> 00:17:58.679
continuing on. So us calling the rest of the composed chain
00:17:58.680 --> 00:18:03.159
here is the effect of, in the previous slide, moving to the
00:18:03.160 --> 00:18:07.156
next step. We could ignore this line for now.
00:18:07.157 --> 00:18:13.819
If you're curious, please read the README in detail.
00:18:13.820 --> 00:18:15.579
Now, what about reducers?
00:18:15.580 --> 00:18:18.879
What do those look like? Well, let's just scroll
00:18:18.880 --> 00:18:22.439
down here. Recall that a reducer is a function that's
00:18:22.440 --> 00:18:26.959
consuming a stream, right? Zoom that up for you a little bit.
00:18:26.960 --> 00:18:33.919
Now, in the case of count, recall that this is how it's
00:18:33.920 --> 00:18:37.679
working, how we saw a moment ago. So clearly this list of five
00:18:37.680 --> 00:18:42.199
elements only has five things in it. Well, a reducer by
00:18:42.200 --> 00:18:47.599
structure is a function of two, one, or zero arguments. So we
00:18:47.600 --> 00:18:50.639
can see here in the case of two, this is the normal iterative
00:18:50.640 --> 00:18:54.519
case. We don't care about the input for count, we just care
00:18:54.520 --> 00:18:58.559
about the current accumulated count that we're doing, and
00:18:58.560 --> 00:19:02.879
we add one to it, and that's it. This then goes back to
00:19:02.880 --> 00:19:06.359
the loop and the whole process starts again with the next
00:19:06.360 --> 00:19:10.879
element. In this kind of done case, this is used internal to
00:19:10.880 --> 00:19:16.879
that sort of the supervising function transduce. It's just
00:19:16.880 --> 00:19:19.639
confirming the final result. Sometimes some
00:19:19.640 --> 00:19:21.839
post-processing is necessary here, but in the case of
00:19:21.840 --> 00:19:26.039
count, as it is so simple, that is not necessary. And now
00:19:26.040 --> 00:19:29.359
here's the base case. This is also used within that
00:19:29.360 --> 00:19:34.319
supervising transduce function at the very top. Well, if
00:19:34.320 --> 00:19:36.679
you're counting, you have to start from somewhere, right?
00:19:36.680 --> 00:19:37.349
In this case, well, what you're starting with is zero.
00:19:37.350 --> 00:19:40.251
In the case of cons, you'd be starting with an empty list.
00:19:40.252 --> 00:19:44.434
In the case of vector, you'd be starting
00:19:44.435 --> 00:19:53.999
with an empty vector and so on.
00:19:54.000 --> 00:19:56.799
Once again, if you are more curious, please take a look at
00:19:56.800 --> 00:19:57.679
the README.
NOTE Transducers in the wild - CSV
00:20:00.520 --> 00:20:06.039
Okay, transducers in the wild. Well, let's go take a look at
00:20:06.040 --> 00:20:07.639
processing some CSV data.
00:20:07.640 --> 00:20:21.319
We're going to open up a new Emacs Lisp bracket here. So I have
00:20:21.320 --> 00:20:28.839
a file. And in this file, let's just go look at C-x b right
00:20:28.840 --> 00:20:34.839
there, you will see that we've got some bank transaction
00:20:34.840 --> 00:20:37.879
information. It's got these transactions from a whole
00:20:37.880 --> 00:20:40.199
bunch of different people into different accounts,
00:20:40.200 --> 00:20:43.879
whether it's money coming in, money going out, and then a
00:20:43.880 --> 00:20:47.839
basic description. How's your Latin? But for this little
00:20:47.840 --> 00:20:53.679
test, what we want to do is we want to find Bob's final bank
00:20:53.680 --> 00:20:59.679
balance. Let's get on to it. First of all, let's
00:20:59.680 --> 00:21:04.444
just confirm, let's do some basic stuff.
00:21:04.445 --> 00:21:10.844
with-current-buffer, find-file-noselect.
00:21:10.845 --> 00:21:15.542
What's the name of that file?
00:21:15.543 --> 00:21:17.439
This is pre-organized, so you
00:21:17.440 --> 00:21:20.879
will just see it right here.
00:21:20.880 --> 00:21:26.999
t-transduce and t-comp. We don't know what we're going to comp
00:21:27.000 --> 00:21:33.039
yet. Actually, I'll just pass to show you. And then we will
00:21:33.040 --> 00:21:36.999
see, let's just do a little t-count just to confirm. What's
00:21:37.000 --> 00:21:45.112
our source? Well, our source is a buffer, t-buffer-read.
00:21:45.113 --> 00:21:50.153
And note that because we're using with-current-buffer,
00:21:50.154 --> 00:21:55.079
if we go like this, if we go current-buffer, this will just work. So
00:21:55.080 --> 00:21:59.919
now let's... Well, that was odd. I should have done it like
00:21:59.920 --> 00:22:02.159
that. There we go. So now we should make that a little smaller
00:22:02.160 --> 00:22:04.799
so you can see what it is. Now if we hit RET, we should get the
00:22:04.800 --> 00:22:09.559
right result. Okay, so there are 50,001 lines in this file,
00:22:09.560 --> 00:22:13.516
but the one extra one is the name of the headers, right?
00:22:13.517 --> 00:22:18.079
We want to process this file in more detail. So how can we do
00:22:18.080 --> 00:22:22.079
that? Well, let's start by just automatically
00:22:22.080 --> 00:22:28.799
interpreting the results as CSV. If we do that, okay, well
00:22:28.800 --> 00:22:31.559
now we only have 50,000 entries as we expected, right?
00:22:31.560 --> 00:22:36.759
Because it's going to pull out the header line. If we now say
00:22:36.760 --> 00:22:42.679
we want to just filter out, you know, We only want Bob, right?
00:22:42.680 --> 00:22:53.679
So if... gethash, it was in the row of name. Each line here is
00:22:53.680 --> 00:22:57.079
made into, at least by default, is made into a hash map. So if
00:22:57.080 --> 00:23:02.759
we go like this, we should see that. Okay, so 12,000 of these
00:23:02.760 --> 00:23:05.639
lines or thereabout belong to Bob.
00:23:05.640 --> 00:23:13.839
Let's just move that over a little bit. Actually, I suppose we don't even
00:23:13.840 --> 00:23:17.799
need that anymore. I'll just keep that full size for you.
00:23:17.800 --> 00:23:24.399
Okay, so all right, there's about 12,000 results for Bob of
00:23:24.400 --> 00:23:32.479
the 50,000. What's next? Well, we want to confirm,
00:23:32.480 --> 00:23:40.039
we want to pull out everything,
00:23:40.040 --> 00:23:43.079
all of the in and the out entries.
00:23:43.080 --> 00:23:56.279
Thank you. So, string to number, because we know that
00:23:56.280 --> 00:24:01.239
everything came in as strings. Unfortunately, the from-csv
00:24:01.240 --> 00:24:03.799
doesn't try to be smart at all, it's just pulling everything
00:24:03.800 --> 00:24:09.479
in as string values. If you want actual things to be
00:24:09.480 --> 00:24:13.399
numbers or whatever, that is up to you to do the parsing
00:24:13.400 --> 00:24:20.679
yourself. Okay, so we have those two values now. We know
00:24:20.680 --> 00:24:23.879
that we saw from the data just a moment ago that you're only
00:24:23.880 --> 00:24:26.999
going to have a value in one column or the other. It's either
00:24:27.000 --> 00:24:29.119
going to be 0 in the empty one, or you're going to have some
00:24:29.120 --> 00:24:32.159
number in the other. So we know that we can just naively add
00:24:32.160 --> 00:24:35.479
them. If it was in, it would always be positive. So we'll just
00:24:35.480 --> 00:24:41.519
add that. But in the negative case, we want to just make it
00:24:41.520 --> 00:24:45.279
negative really briefly before we add them all together.
00:24:45.280 --> 00:24:50.519
let's now just prove to ourselves that we are sane here. What
00:24:50.520 --> 00:24:52.479
we're going to do is we're going to quickly go say take
00:24:52.480 --> 00:24:57.039
5 just to convince ourselves, and we'll go cons, and let's
00:24:57.040 --> 00:24:59.839
see if we get kind of results that make sense. Okay, these
00:24:59.840 --> 00:25:02.799
sort of make sense. It looks like you know Bob's got some big
00:25:02.800 --> 00:25:07.679
expenses here. If we take say 15, does it look any better?
00:25:07.680 --> 00:25:10.319
Okay, looks like he had a payday. All right, good job Bob.
00:25:10.320 --> 00:25:15.439
Let's get back in there. Now we only really care about
00:25:15.440 --> 00:25:20.119
adding the final result, right? So there we go. Add that all
00:25:20.120 --> 00:25:24.559
together and we'll see what we get in a moment. Okay, wow,
00:25:24.560 --> 00:25:27.519
Bob's rich. Okay, so it looks like in his 12,000
00:25:27.520 --> 00:25:32.279
transaction, Bob has an overall net worth of $8.5 million.
00:25:32.280 --> 00:25:34.439
Looking pretty good.
00:25:34.440 --> 00:25:38.999
So here's an example of how you can, particularly in Emacs
00:25:39.000 --> 00:25:42.959
Lisp, how you can very easily just get a file, consider it the
00:25:42.960 --> 00:25:45.879
current buffer, and then just do whatever you want to it.
00:25:45.880 --> 00:25:50.359
Note that there is sort of first-class support for both CSV
00:25:50.360 --> 00:25:54.359
and JSON, and then you have, and both of those bring in their
00:25:54.360 --> 00:25:57.719
values as hash maps, and then you're just free to do whatever
00:25:57.720 --> 00:26:00.439
you want and process them, potentially both writing them
00:26:00.440 --> 00:26:03.239
back out as CSV or JSON once again.
NOTE Issues and next steps
00:26:03.240 --> 00:26:10.719
Some issues with transducers that can come up is
00:26:10.720 --> 00:26:14.919
that one, a zip operator is missing, but I'm working on it.
00:26:14.920 --> 00:26:19.399
Two is that performance, particularly in Emacs Lisp, isn't
00:26:19.400 --> 00:26:24.119
that great. It could be due to the sort of nested lambda calls
00:26:24.120 --> 00:26:27.759
that have to occur internally, but the common Lisp
00:26:27.760 --> 00:26:32.239
implementation is quite good. and there's yet no support
00:26:32.240 --> 00:26:35.399
for parallelism. You can imagine that a lot of those steps
00:26:35.400 --> 00:26:38.559
you could potentially perform in parallel depending on the
00:26:38.560 --> 00:26:44.399
platform, but research has not yet gotten that far. Okay,
00:26:44.400 --> 00:26:47.639
that's all. Thank you very much. If you have any questions,
00:26:47.640 --> 00:26:51.240
please contact me.
|