summaryrefslogtreecommitdiffstats
diff options
context:
space:
mode:
authorEmacsConf <emacsconf-org@gnu.org>2024-12-08 16:43:55 -0500
committerEmacsConf <emacsconf-org@gnu.org>2024-12-08 16:43:55 -0500
commit750b25423ff3a55ef05a7e4611bcab288a2436e4 (patch)
tree6ade1fcc868de0d475c9590bacbba6e9377914e4
parent8c171541ee0e6e1ba39015648cd00d1bcfc5da34 (diff)
downloademacsconf-wiki-750b25423ff3a55ef05a7e4611bcab288a2436e4.tar.xz
emacsconf-wiki-750b25423ff3a55ef05a7e4611bcab288a2436e4.zip
update
Diffstat (limited to '')
-rw-r--r--2024/captions/emacsconf-2024-transducers--transducers-finally-ergonomic-data-processing-for-emacs--colin-woodbury--main--chapters.vtt38
-rw-r--r--2024/captions/emacsconf-2024-transducers--transducers-finally-ergonomic-data-processing-for-emacs--colin-woodbury--main.vtt1141
2 files changed, 1179 insertions, 0 deletions
diff --git a/2024/captions/emacsconf-2024-transducers--transducers-finally-ergonomic-data-processing-for-emacs--colin-woodbury--main--chapters.vtt b/2024/captions/emacsconf-2024-transducers--transducers-finally-ergonomic-data-processing-for-emacs--colin-woodbury--main--chapters.vtt
new file mode 100644
index 00000000..cf0a6206
--- /dev/null
+++ b/2024/captions/emacsconf-2024-transducers--transducers-finally-ergonomic-data-processing-for-emacs--colin-woodbury--main--chapters.vtt
@@ -0,0 +1,38 @@
+WEBVTT
+
+
+00:00:00.000 --> 00:00:41.519
+Intro
+
+00:00:41.520 --> 00:03:27.589
+What are transducers?
+
+00:03:27.590 --> 00:05:47.279
+Common issues
+
+00:05:47.280 --> 00:07:35.279
+Transducers
+
+00:07:35.280 --> 00:09:52.624
+Using transducers
+
+00:09:52.625 --> 00:11:49.332
+A more involved example with comp
+
+00:11:49.333 --> 00:14:29.468
+In Emacs
+
+00:14:29.469 --> 00:14:58.039
+Hash tables
+
+00:14:58.040 --> 00:15:55.799
+Clarity
+
+00:15:55.800 --> 00:19:57.679
+How do transducers work?
+
+00:20:00.520 --> 00:26:03.239
+Transducers in the wild - CSV
+
+00:26:03.240 --> 00:26:51.240
+Issues and next steps
diff --git a/2024/captions/emacsconf-2024-transducers--transducers-finally-ergonomic-data-processing-for-emacs--colin-woodbury--main.vtt b/2024/captions/emacsconf-2024-transducers--transducers-finally-ergonomic-data-processing-for-emacs--colin-woodbury--main.vtt
new file mode 100644
index 00000000..b0083b86
--- /dev/null
+++ b/2024/captions/emacsconf-2024-transducers--transducers-finally-ergonomic-data-processing-for-emacs--colin-woodbury--main.vtt
@@ -0,0 +1,1141 @@
+WEBVTT captioned by sachac
+
+NOTE Intro
+
+00:00:00.000 --> 00:00:10.799
+Hi everyone, this is EmacsConf 2024. I'm Colin, and today
+
+00:00:10.800 --> 00:00:17.319
+I'll be talking about transducers.
+
+00:00:17.320 --> 00:00:21.879
+After introducing them, I'll share a bit of history about
+
+00:00:21.880 --> 00:00:25.359
+transducers and the problems that they solve, some basics
+
+00:00:25.360 --> 00:00:28.879
+about how we can use them, how they work, like how they're
+
+00:00:28.880 --> 00:00:32.399
+implemented, some demonstrations of how we can actually
+
+00:00:32.400 --> 00:00:36.959
+use them in the wild, and then some other discussions about
+
+00:00:36.960 --> 00:00:41.519
+issues that they have.
+
+NOTE What are transducers?
+
+00:00:41.520 --> 00:00:46.399
+Okay, let's get right in. What are transducers?
+
+00:00:46.400 --> 00:00:49.679
+Transducers are a way to do streaming iteration with a
+
+00:00:49.680 --> 00:00:55.679
+modern API.
+
+00:00:55.680 --> 00:01:00.359
+Who are transducers for, and thereby, who is
+
+00:01:00.360 --> 00:01:05.599
+this talk for? Well, it's for people who want to do streamed
+
+00:01:05.600 --> 00:01:10.519
+data processing in Emacs. It's for people who perhaps
+
+00:01:10.520 --> 00:01:14.199
+aren't satisfied with the existing APIs, for example, the
+
+00:01:14.200 --> 00:01:19.359
+seq API, or some other common libraries that provide
+
+00:01:19.360 --> 00:01:23.719
+similar functionality. Maybe you're not a fan of the loop
+
+00:01:23.720 --> 00:01:29.079
+macro. Some people find it difficult to understand. Or
+
+00:01:29.080 --> 00:01:32.719
+maybe you've done a bunch of Clojure before, and you'd like
+
+00:01:32.720 --> 00:01:36.879
+more aspects of Clojure in your Emacs Lisp. Or maybe you're
+
+00:01:36.880 --> 00:01:40.239
+just interested in transducers in general, because the
+
+00:01:40.240 --> 00:01:48.839
+pattern has now been ported to multiple different Lisps.
+
+00:01:48.840 --> 00:01:55.039
+So I'm Colin. I'm fosskers on everything online, and I do
+
+00:01:55.040 --> 00:01:58.519
+mainly back-end programming work and a lot of open source
+
+00:01:58.520 --> 00:02:05.159
+software. I wrote Haskell for a long time, both as a hobbyist
+
+00:02:05.160 --> 00:02:09.079
+and professionally. Since the COVID years, I've been
+
+00:02:09.080 --> 00:02:13.439
+writing Rust, both open source and professionally. But now
+
+00:02:13.440 --> 00:02:19.719
+I find that in my spare time, I'm mostly writing Common Lisp.
+
+00:02:19.720 --> 00:02:22.719
+Some things I learned from my years of Haskell was that a lot
+
+00:02:22.720 --> 00:02:27.519
+of programming is just altering the shape of data. You know,
+
+00:02:27.520 --> 00:02:31.359
+sometimes we work through our algorithm line by line. We're
+
+00:02:31.360 --> 00:02:36.239
+trying to just tell the computer exactly what to do. But if we
+
+00:02:36.240 --> 00:02:39.639
+step back, a lot of the time we're just getting in data of some
+
+00:02:39.640 --> 00:02:44.119
+shape, changing it, and then passing it along. A lot of
+
+00:02:44.120 --> 00:02:49.279
+these patterns are common, identified
+
+00:02:49.280 --> 00:02:53.639
+decades ago. For instance, we have some collection, and we
+
+00:02:53.640 --> 00:02:56.999
+want to transform every element of that collection and then
+
+00:02:57.000 --> 00:03:01.199
+pass it on. Or maybe we're trying to filter out bad elements
+
+00:03:01.200 --> 00:03:04.799
+in that collection. Or maybe we're looking for a specific
+
+00:03:04.800 --> 00:03:07.759
+element in that collection. Yes, you could write all that
+
+00:03:07.760 --> 00:03:11.839
+with for loops, but these kind of common patterns were
+
+00:03:11.840 --> 00:03:18.559
+identified and given names decades ago. So why not use them?
+
+00:03:18.560 --> 00:03:21.879
+They say that there are two major problems in computer
+
+00:03:21.880 --> 00:03:25.759
+science, one being cache validation and the other being
+
+00:03:25.760 --> 00:03:27.589
+naming things.
+
+NOTE Common issues
+
+00:03:27.590 --> 00:03:29.799
+I've identified five other problems that
+
+00:03:29.800 --> 00:03:33.199
+come up when we're trying to deal with collections of data,
+
+00:03:33.200 --> 00:03:40.599
+or big streams of data. One is that if we were trying to
+
+00:03:40.600 --> 00:03:45.279
+load a file all into memory all at once and process the whole
+
+00:03:45.280 --> 00:03:48.279
+thing, sometimes we can have memory problems. You've
+
+00:03:48.280 --> 00:03:54.999
+probably seen out-of-memory errors or such things.
+
+00:03:55.000 --> 00:03:58.199
+A second issue that comes up is that if we were looking at a
+
+00:03:58.200 --> 00:04:01.799
+giant for loop, in particular a nested for loop or such
+
+00:04:01.800 --> 00:04:06.079
+things, it can be hard to tell just by looking at the code what
+
+00:04:06.080 --> 00:04:11.039
+it's trying to do, what it intends. If we don't go character
+
+00:04:11.040 --> 00:04:16.439
+by character or line by line, it can be hard to understand it.
+
+00:04:16.440 --> 00:04:20.039
+Furthermore, and this is particularly an issue with Emacs
+
+00:04:20.040 --> 00:04:26.399
+Lisp, is that if one call, for instance, to seq-map, then
+
+00:04:26.400 --> 00:04:29.319
+piped into seq-filter, for instance, will have an
+
+00:04:29.320 --> 00:04:33.599
+intermediate allocation, the map will take the source
+
+00:04:33.600 --> 00:04:37.639
+container, allocate a new one, and then the filter will
+
+00:04:37.640 --> 00:04:40.319
+operate over the second one. This is wasteful.
+
+00:04:40.320 --> 00:04:48.879
+Furthermore, it can often be difficult to abort a stream.
+
+00:04:48.880 --> 00:04:53.199
+For instance, if we were filtering through our collection,
+
+00:04:53.200 --> 00:04:57.319
+but we knew we only wanted to go halfway, for instance, for
+
+00:04:57.320 --> 00:05:01.759
+some reason, we have no way to stop it halfway through. We
+
+00:05:01.760 --> 00:05:05.479
+just have to process the whole thing, even if we know we don't
+
+00:05:05.480 --> 00:05:11.919
+need to. Another issue is that for languages that have
+
+00:05:11.920 --> 00:05:18.039
+traits, or in Haskell they're called type classes, if you
+
+00:05:18.040 --> 00:05:22.399
+are defining what it means to map over something, you often
+
+00:05:22.400 --> 00:05:27.039
+have to redefine that for every kind of container or thing
+
+00:05:27.040 --> 00:05:31.239
+that you're iterating over. Wouldn't it be nice if we could
+
+00:05:31.240 --> 00:05:34.719
+define things like map just once and then reuse them
+
+00:05:34.720 --> 00:05:39.839
+everywhere? Now, transducers solve all five of these,
+
+00:05:39.840 --> 00:05:44.039
+without the addition of new language features, and with
+
+00:05:44.040 --> 00:05:47.279
+little more than plain old function composition.
+
+NOTE Transducers
+
+00:05:47.280 --> 00:05:53.119
+If this is your first time hearing of transducers, yeah,
+
+00:05:53.120 --> 00:05:57.439
+no problem. They were originally invented in Clojure by
+
+00:05:57.440 --> 00:06:01.039
+Rich Hickey, and this is a quote from him. He thinks
+
+00:06:01.040 --> 00:06:05.439
+transducers are a fundamental primitive that decouple
+
+00:06:05.440 --> 00:06:10.079
+critical logic from list or sequence processing, and if he
+
+00:06:10.080 --> 00:06:13.999
+had to do Clojure all over, he'd put them at the bottom, at the
+
+00:06:14.000 --> 00:06:19.279
+very bottom of all the fundamental primitives. Now, that's
+
+00:06:19.280 --> 00:06:24.599
+Rich speaking quite highly of them. And I think he has a point
+
+00:06:24.600 --> 00:06:25.159
+here.
+
+00:06:25.160 --> 00:06:32.399
+They were invented originally in Clojure. In more
+
+00:06:32.400 --> 00:06:34.772
+recent years, they were brought over to Scheme
+
+00:06:34.773 --> 00:06:38.774
+via SRFI 171. That's where I found them
+
+00:06:38.775 --> 00:06:41.521
+when I was learning the Guile language.
+
+00:06:41.522 --> 00:06:43.919
+In the process of submitting a patch, I realized
+
+00:06:43.920 --> 00:06:48.199
+that there were other things to be improved. So I ported the
+
+00:06:48.200 --> 00:06:51.399
+pattern to Common Lisp, then Fennel, and then more
+
+00:06:51.400 --> 00:06:56.639
+recently, Emacs Lisp. The Common Lisp and Emacs Lisp APIs
+
+00:06:56.640 --> 00:07:01.199
+are identical. And the Fennel one is not identical, but
+
+00:07:01.200 --> 00:07:05.799
+fairly similar. Overall, everywhere you find
+
+00:07:05.800 --> 00:07:10.279
+transducers, they should basically be fairly uniform.
+
+00:07:10.280 --> 00:07:15.759
+When I originally made the Common Lisp variant first, I
+
+00:07:15.760 --> 00:07:18.799
+sampled the APIs from a number of different languages and
+
+00:07:18.800 --> 00:07:23.439
+came up with what I believed to be a representative sample of
+
+00:07:23.440 --> 00:07:27.959
+what most people would want out of such a library. I gave
+
+00:07:27.960 --> 00:07:32.439
+functions their common modern names. For instance, map
+
+00:07:32.440 --> 00:07:35.279
+is map and filter is filter and so on.
+
+NOTE Using transducers
+
+00:07:35.280 --> 00:07:42.599
+What does the usage of transducers look like? Well,
+
+00:07:42.600 --> 00:07:48.959
+these examples will all be the Emacs Lisp variant, but the
+
+00:07:48.960 --> 00:07:52.359
+Common Lisp will look basically exactly the same, minus
+
+00:07:52.360 --> 00:07:54.079
+this little t- prefix.
+
+00:07:54.080 --> 00:08:00.919
+Running transducers requires three things. It requires a
+
+00:08:00.920 --> 00:08:06.439
+source. This could be an obvious thing like a list or a
+
+00:08:06.440 --> 00:08:11.479
+vector, but it could be other things like a file, or in Emacs
+
+00:08:11.480 --> 00:08:16.348
+list in particular, a buffer.
+
+00:08:16.349 --> 00:08:20.112
+A reducer is a function. It's something like
+
+00:08:20.113 --> 00:08:22.639
+the + operator or the * operator,
+
+00:08:22.640 --> 00:08:26.785
+or certain constructors of various containers.
+
+00:08:26.786 --> 00:08:32.125
+It takes values and collates them into some final version.
+
+00:08:32.126 --> 00:08:33.946
+Now, finally, we have what we're calling here
+
+00:08:33.947 --> 00:08:37.567
+a transducer chain. This could be one transducer function
+
+00:08:37.568 --> 00:08:43.479
+or it could be multiple composed together. These are the
+
+00:08:43.480 --> 00:08:47.079
+functions that actually take data and transform them
+
+00:08:47.080 --> 00:08:55.279
+somehow. For instance, this. We have a list of three
+
+00:08:55.280 --> 00:09:04.199
+elements. We want to reduce it into a vector. How we are
+
+00:09:04.200 --> 00:09:07.519
+going to transform the elements along the way: we are doing
+
+00:09:07.520 --> 00:09:13.359
+plus one to each of them. If this syntax is new to you, just
+
+00:09:13.360 --> 00:09:18.039
+know that this #' just means that this thing that
+
+00:09:18.040 --> 00:09:22.079
+comes after it is the name of the function. In Common Lisp and
+
+00:09:22.080 --> 00:09:26.079
+Emacs Lisp, this is necessary, but for Clojure and Scheme,
+
+00:09:26.080 --> 00:09:32.719
+it is not. So we can see here that just this example is not much
+
+00:09:32.720 --> 00:09:36.119
+different than any other normal map call you might see made,
+
+00:09:36.120 --> 00:09:40.239
+but if nothing else, it's a handy way to convert a list to a
+
+00:09:40.240 --> 00:09:44.999
+vector or anything else. There are many, many reducers
+
+00:09:45.000 --> 00:09:48.239
+available and many different forms that we can
+
+00:09:48.240 --> 00:09:52.624
+collate the final value into.
+
+NOTE A more involved example with comp
+
+00:09:52.625 --> 00:09:55.086
+Let's see a more involved example.
+
+00:09:55.087 --> 00:09:58.049
+Okay, now we've got some more meat here.
+
+00:09:58.050 --> 00:10:01.772
+Here we can see usage of the comp function
+
+00:10:01.773 --> 00:10:05.255
+and a custom source, ints.
+
+00:10:05.256 --> 00:10:11.079
+Ints is an infinite generator of integer values. That's not
+
+00:10:11.080 --> 00:10:14.783
+like a list or a file. It will generate infinitely.
+
+00:10:14.784 --> 00:10:19.439
+Comp is letting us compose multiple transducer functions
+
+00:10:19.440 --> 00:10:23.759
+together. Notice that this is the opposite order of what
+
+00:10:23.760 --> 00:10:28.079
+we'd usually be used to from a function like comp. The order
+
+00:10:28.080 --> 00:10:32.679
+here is top to bottom, basically, so that the map goes first,
+
+00:10:32.680 --> 00:10:37.839
+then the filter, and then the take. So effectively is what
+
+00:10:37.840 --> 00:10:40.919
+we're doing is taking all the integers that exist,
+
+00:10:40.920 --> 00:10:45.399
+positive, adding one to them, filtering out only the even
+
+00:10:45.400 --> 00:10:50.039
+ones, but then just taking 10. Cons here is a function that
+
+00:10:50.040 --> 00:10:57.039
+just produces the ending result as a list. So what happens
+
+00:10:57.040 --> 00:11:00.479
+here specifically is how we are avoiding intermediate
+
+00:11:00.480 --> 00:11:04.238
+allocations. First, the number 0 will come through.
+
+00:11:04.239 --> 00:11:07.879
+It will be pulled out of this source internally by transduce.
+
+00:11:07.880 --> 00:11:10.919
+It will make its way into the map. The map will add it. Then it
+
+00:11:10.920 --> 00:11:15.799
+will immediately go into this filter step. So it's not like
+
+00:11:15.800 --> 00:11:19.119
+all the maps occur, and then all the filters occur. We do
+
+00:11:19.120 --> 00:11:24.039
+everything for each element. So the 0 comes in, now it's 1.
+
+00:11:24.040 --> 00:11:27.559
+The filter would occur. Well, it's going to fail that
+
+00:11:27.560 --> 00:11:31.119
+because it's not even, so it will just bail there. Now we'll
+
+00:11:31.120 --> 00:11:35.239
+go to the next one. Now 1 will come, it will become 2, then
+
+00:11:35.240 --> 00:11:39.119
+it will be saved by this evenp call, and then the take will
+
+00:11:39.120 --> 00:11:42.599
+capture it, because we only want 10 values here. You can
+
+00:11:42.600 --> 00:11:45.239
+see 2, 4, 6, 8, and so on is the result that we
+
+00:11:45.240 --> 00:11:49.332
+expect. So let's play around a little bit.
+
+NOTE In Emacs
+
+00:11:49.333 --> 00:11:53.336
+Let's jump into Emacs and see what we can do.
+
+00:11:53.337 --> 00:11:58.500
+Alright, you should see my Emacs screen here.
+
+00:11:58.501 --> 00:12:04.359
+These are the actual notes for the actual
+
+00:12:04.360 --> 00:12:08.959
+presentation done in Org Mode. I'll boost that up in size for
+
+00:12:08.960 --> 00:12:12.639
+a little bit. That should be more than big enough for you.
+
+00:12:12.640 --> 00:12:17.719
+Just by changing the reducer, we can change the result.
+
+00:12:17.720 --> 00:12:21.079
+Okay, now it's a vector. Well, what else can we do to it? Well,
+
+00:12:21.080 --> 00:12:25.959
+let's just add up the results. Maybe we just want to count the
+
+00:12:25.960 --> 00:12:30.919
+results. Oh, indeed, there were 10. What if we want to find
+
+00:12:30.920 --> 00:12:36.959
+the average of the results? What if we want to find the median
+
+00:12:36.960 --> 00:12:40.959
+of the results? And so on. Here's some more interesting
+
+00:12:40.960 --> 00:12:45.839
+things that we could do. We could add different steps. So
+
+00:12:45.840 --> 00:12:51.239
+here we have all the integers. Let's add, hmm, okay, we'll
+
+00:12:51.240 --> 00:12:57.399
+keep that. We're going to add t-enumerate. What enumerate does
+
+00:12:57.400 --> 00:13:00.879
+is for each item that comes through, it is
+
+00:13:00.880 --> 00:13:06.039
+going to add a sort of index to it and make it a pair. In this
+
+00:13:06.040 --> 00:13:08.719
+case, it's going to be equal to what came in here. Well, we can
+
+00:13:08.720 --> 00:13:12.399
+change it. If we start this at 1, now it will be different.
+
+00:13:12.400 --> 00:13:15.519
+1 will be paired with 0, and then 2 would be paired
+
+00:13:15.520 --> 00:13:19.559
+with 1, and so on. We'll accept that the even call will change
+
+00:13:19.560 --> 00:13:24.039
+that a little bit. Why we're doing this is because we want
+
+00:13:24.040 --> 00:13:27.279
+to form a hash table. Let's move that down to 3, maybe
+
+00:13:27.280 --> 00:13:31.439
+we'll get a better result. What do we see? Okay, here now the
+
+00:13:31.440 --> 00:13:37.359
+result is a hash table. What are its values? Well, 0 seems
+
+00:13:37.360 --> 00:13:40.479
+to have... The key of 0 seems to be paired with 2, the key of
+
+00:13:40.480 --> 00:13:42.909
+1 seems to be paired with 4,
+
+00:13:42.910 --> 00:13:47.411
+and 2 seems to be paired with 6.
+
+00:13:47.412 --> 00:13:51.293
+Maybe let's jazz that up even a little bit more.
+
+00:13:51.294 --> 00:13:52.973
+We're going to start from a string
+
+00:13:52.974 --> 00:13:57.943
+and we'll call it hello.
+
+00:13:57.944 --> 00:13:59.564
+That's not going to work anymore
+
+00:13:59.565 --> 00:14:02.585
+and neither is that, but what we could do is
+
+00:14:02.586 --> 00:14:05.498
+we could say t-map #'string.
+
+00:14:05.499 --> 00:14:08.627
+I believe we'll do that.
+
+00:14:08.628 --> 00:14:08.959
+Let's see if that works. It did. So that's
+
+00:14:08.960 --> 00:14:13.589
+going to convert a character into a string.
+
+00:14:13.590 --> 00:14:14.679
+Let's just go two
+
+00:14:14.680 --> 00:14:18.399
+just to make it a little easier. Now you can see that we've
+
+00:14:18.400 --> 00:14:21.919
+constructed a hash table here. The key of 0 is mapped to the
+
+00:14:21.920 --> 00:14:27.079
+string of h and 1 is mapped to e. Now, I really like having
+
+00:14:27.080 --> 00:14:29.468
+this reducer in particular.
+
+NOTE Hash tables
+
+00:14:29.469 --> 00:14:30.639
+Know that hash tables are
+
+00:14:30.640 --> 00:14:34.199
+also legal sources. I find that both in Emacs Lisp and in
+
+00:14:34.200 --> 00:14:37.119
+Common Lisp, dealing with hash tables--like creating them
+
+00:14:37.120 --> 00:14:41.599
+and altering them--can be a bit of a pain. Having them
+
+00:14:41.600 --> 00:14:45.679
+immediately available like this with transducers is very
+
+00:14:45.680 --> 00:14:49.079
+handy, I find. We can work with something that wasn't a hash
+
+00:14:49.080 --> 00:14:53.279
+table. We can construct it in a way that makes it amenable to
+
+00:14:53.280 --> 00:14:56.199
+that, and then reduce it down into a hash table, and here you
+
+00:14:56.200 --> 00:14:58.039
+go. Very handy.
+
+NOTE Clarity
+
+00:14:58.040 --> 00:15:06.399
+One last point is that you can see very clearly what
+
+00:15:06.400 --> 00:15:10.479
+this is attempting to do, as opposed to, say, a for loop. It's
+
+00:15:10.480 --> 00:15:12.719
+very clear what that step is doing, and then you can see what
+
+00:15:12.720 --> 00:15:15.119
+that is doing, and you know that the result is going to be two.
+
+00:15:15.120 --> 00:15:18.559
+Each line is kind of its own declarative step, and it should
+
+00:15:18.560 --> 00:15:22.159
+be clear, just by staring at this, basically what you're
+
+00:15:22.160 --> 00:15:25.399
+going to get out. This is one main difference from other
+
+00:15:25.400 --> 00:15:29.599
+languages that have things--say, for instance, Rust's
+
+00:15:29.600 --> 00:15:35.439
+iterator API--is the difference between the transducers
+
+00:15:35.440 --> 00:15:41.639
+and the reducers. If we go up here, for example, the
+
+00:15:41.640 --> 00:15:44.679
+difference between the transducers and the reducers and
+
+00:15:44.680 --> 00:15:48.119
+the sources is not explicitly laid out, whereas with
+
+00:15:48.120 --> 00:15:53.119
+transducers, it is. You have to be aware of how these things
+
+00:15:53.120 --> 00:15:55.799
+are different. I think that that helps clarity.
+
+NOTE How do transducers work?
+
+00:15:55.800 --> 00:16:01.999
+Moving on. How do transducers work? Well,
+
+00:16:02.000 --> 00:16:09.857
+we want to go see the README.
+
+00:16:09.858 --> 00:16:11.399
+So, what we're going to do is
+
+00:16:11.400 --> 00:16:19.102
+we're going to go to here.
+
+00:16:19.103 --> 00:16:21.959
+You should still be able to see this.
+
+00:16:21.960 --> 00:16:28.583
+This is the CL example, actually.
+
+00:16:28.584 --> 00:16:32.279
+Let's go to transducers.el.
+
+00:16:32.280 --> 00:16:37.744
+Their APIs and READMEs are the same,
+
+00:16:37.745 --> 00:16:39.919
+but just for the sake of it, we will go see
+
+00:16:39.920 --> 00:16:45.726
+how this looks on the Emacs side,
+
+00:16:45.727 --> 00:16:48.046
+just so that nothing is a surprise.
+
+00:16:48.047 --> 00:16:50.239
+But recall that the APIs are essentially the same
+
+00:16:50.240 --> 00:16:53.679
+between the two. If you go to this section, writing your
+
+00:16:53.680 --> 00:16:56.839
+own primitives, you can read about how transducers are
+
+00:16:56.840 --> 00:17:00.999
+actually formed, whether or not you want to write them
+
+00:17:01.000 --> 00:17:06.799
+yourself or not. We can see here t-map. We accept the
+
+00:17:06.800 --> 00:17:10.239
+function that you want to operate with. Then you've got
+
+00:17:10.240 --> 00:17:13.319
+this extra little lambda here that's coming in, and it's
+
+00:17:13.320 --> 00:17:17.079
+receiving a thing that is named reducer. Now, while here
+
+00:17:17.080 --> 00:17:20.439
+we're calling it reducer, it's actually the chain of all the
+
+00:17:20.440 --> 00:17:25.159
+composed functions together. It's all those main
+
+00:17:25.160 --> 00:17:28.479
+transducer steps. Finally, it's the reducer all
+
+00:17:28.480 --> 00:17:31.879
+composed together with normal function composition.
+
+00:17:31.880 --> 00:17:35.877
+That will matter very soon. Now here's the actual meat.
+
+00:17:35.878 --> 00:17:40.519
+We can see the accumulative result that's coming in with the
+
+00:17:40.520 --> 00:17:45.739
+current element. Now we need to operate on this.
+
+00:17:45.740 --> 00:17:47.840
+Were it normally mapped, we would see us
+
+00:17:47.841 --> 00:17:49.919
+applying the F to the input.
+
+00:17:49.920 --> 00:17:53.519
+But here, you can see us applying the F to the input and then
+
+00:17:53.520 --> 00:17:58.679
+continuing on. So us calling the rest of the composed chain
+
+00:17:58.680 --> 00:18:03.159
+here is the effect of, in the previous slide, moving to the
+
+00:18:03.160 --> 00:18:07.156
+next step. We could ignore this line for now.
+
+00:18:07.157 --> 00:18:13.819
+If you're curious, please read the README in detail.
+
+00:18:13.820 --> 00:18:15.579
+Now, what about reducers?
+
+00:18:15.580 --> 00:18:18.879
+What do those look like? Well, let's just scroll
+
+00:18:18.880 --> 00:18:22.439
+down here. Recall that a reducer is a function that's
+
+00:18:22.440 --> 00:18:26.959
+consuming a stream, right? Zoom that up for you a little bit.
+
+00:18:26.960 --> 00:18:33.919
+Now, in the case of count, recall that this is how it's
+
+00:18:33.920 --> 00:18:37.679
+working, how we saw a moment ago. So clearly this list of five
+
+00:18:37.680 --> 00:18:42.199
+elements only has five things in it. Well, a reducer by
+
+00:18:42.200 --> 00:18:47.599
+structure is a function of two, one, or zero arguments. So we
+
+00:18:47.600 --> 00:18:50.639
+can see here in the case of two, this is the normal iterative
+
+00:18:50.640 --> 00:18:54.519
+case. We don't care about the input for count, we just care
+
+00:18:54.520 --> 00:18:58.559
+about the current accumulated count that we're doing, and
+
+00:18:58.560 --> 00:19:02.879
+we add one to it, and that's it. This then goes back to
+
+00:19:02.880 --> 00:19:06.359
+the loop and the whole process starts again with the next
+
+00:19:06.360 --> 00:19:10.879
+element. In this kind of done case, this is used internal to
+
+00:19:10.880 --> 00:19:16.879
+that sort of the supervising function transduce. It's just
+
+00:19:16.880 --> 00:19:19.639
+confirming the final result. Sometimes some
+
+00:19:19.640 --> 00:19:21.839
+post-processing is necessary here, but in the case of
+
+00:19:21.840 --> 00:19:26.039
+count, as it is so simple, that is not necessary. And now
+
+00:19:26.040 --> 00:19:29.359
+here's the base case. This is also used within that
+
+00:19:29.360 --> 00:19:34.319
+supervising transduce function at the very top. Well, if
+
+00:19:34.320 --> 00:19:36.679
+you're counting, you have to start from somewhere, right?
+
+00:19:36.680 --> 00:19:37.349
+In this case, well, what you're starting with is zero.
+
+00:19:37.350 --> 00:19:40.251
+In the case of cons, you'd be starting with an empty list.
+
+00:19:40.252 --> 00:19:44.434
+In the case of vector, you'd be starting
+
+00:19:44.435 --> 00:19:53.999
+with an empty vector and so on.
+
+00:19:54.000 --> 00:19:56.799
+Once again, if you are more curious, please take a look at
+
+00:19:56.800 --> 00:19:57.679
+the README.
+
+NOTE Transducers in the wild - CSV
+
+00:20:00.520 --> 00:20:06.039
+Okay, transducers in the wild. Well, let's go take a look at
+
+00:20:06.040 --> 00:20:07.639
+processing some CSV data.
+
+00:20:07.640 --> 00:20:21.319
+We're going to open up a new Emacs Lisp bracket here. So I have
+
+00:20:21.320 --> 00:20:28.839
+a file. And in this file, let's just go look at C-x b right
+
+00:20:28.840 --> 00:20:34.839
+there, you will see that we've got some bank transaction
+
+00:20:34.840 --> 00:20:37.879
+information. It's got these transactions from a whole
+
+00:20:37.880 --> 00:20:40.199
+bunch of different people into different accounts,
+
+00:20:40.200 --> 00:20:43.879
+whether it's money coming in, money going out, and then a
+
+00:20:43.880 --> 00:20:47.839
+basic description. How's your Latin? But for this little
+
+00:20:47.840 --> 00:20:53.679
+test, what we want to do is we want to find Bob's final bank
+
+00:20:53.680 --> 00:20:59.679
+balance. Let's get on to it. First of all, let's
+
+00:20:59.680 --> 00:21:04.444
+just confirm, let's do some basic stuff.
+
+00:21:04.445 --> 00:21:10.844
+with-current-buffer, find-file-noselect.
+
+00:21:10.845 --> 00:21:15.542
+What's the name of that file?
+
+00:21:15.543 --> 00:21:17.439
+This is pre-organized, so you
+
+00:21:17.440 --> 00:21:20.879
+will just see it right here.
+
+00:21:20.880 --> 00:21:26.999
+t-transduce and t-comp. We don't know what we're going to comp
+
+00:21:27.000 --> 00:21:33.039
+yet. Actually, I'll just pass to show you. And then we will
+
+00:21:33.040 --> 00:21:36.999
+see, let's just do a little t-count just to confirm. What's
+
+00:21:37.000 --> 00:21:45.112
+our source? Well, our source is a buffer, t-buffer-read.
+
+00:21:45.113 --> 00:21:50.153
+And note that because we're using with-current-buffer,
+
+00:21:50.154 --> 00:21:55.079
+if we go like this, if we go current-buffer, this will just work. So
+
+00:21:55.080 --> 00:21:59.919
+now let's... Well, that was odd. I should have done it like
+
+00:21:59.920 --> 00:22:02.159
+that. There we go. So now we should make that a little smaller
+
+00:22:02.160 --> 00:22:04.799
+so you can see what it is. Now if we hit RET, we should get the
+
+00:22:04.800 --> 00:22:09.559
+right result. Okay, so there are 50,001 lines in this file,
+
+00:22:09.560 --> 00:22:13.516
+but the one extra one is the name of the headers, right?
+
+00:22:13.517 --> 00:22:18.079
+We want to process this file in more detail. So how can we do
+
+00:22:18.080 --> 00:22:22.079
+that? Well, let's start by just automatically
+
+00:22:22.080 --> 00:22:28.799
+interpreting the results as CSV. If we do that, okay, well
+
+00:22:28.800 --> 00:22:31.559
+now we only have 50,000 entries as we expected, right?
+
+00:22:31.560 --> 00:22:36.759
+Because it's going to pull out the header line. If we now say
+
+00:22:36.760 --> 00:22:42.679
+we want to just filter out, you know, We only want Bob, right?
+
+00:22:42.680 --> 00:22:53.679
+So if... gethash, it was in the row of name. Each line here is
+
+00:22:53.680 --> 00:22:57.079
+made into, at least by default, is made into a hash map. So if
+
+00:22:57.080 --> 00:23:02.759
+we go like this, we should see that. Okay, so 12,000 of these
+
+00:23:02.760 --> 00:23:05.639
+lines or thereabout belong to Bob.
+
+00:23:05.640 --> 00:23:13.839
+Let's just move that over a little bit. Actually, I suppose we don't even
+
+00:23:13.840 --> 00:23:17.799
+need that anymore. I'll just keep that full size for you.
+
+00:23:17.800 --> 00:23:24.399
+Okay, so all right, there's about 12,000 results for Bob of
+
+00:23:24.400 --> 00:23:32.479
+the 50,000. What's next? Well, we want to confirm,
+
+00:23:32.480 --> 00:23:40.039
+we want to pull out everything,
+
+00:23:40.040 --> 00:23:43.079
+all of the in and the out entries.
+
+00:23:43.080 --> 00:23:56.279
+Thank you. So, string to number, because we know that
+
+00:23:56.280 --> 00:24:01.239
+everything came in as strings. Unfortunately, the from-csv
+
+00:24:01.240 --> 00:24:03.799
+doesn't try to be smart at all, it's just pulling everything
+
+00:24:03.800 --> 00:24:09.479
+in as string values. If you want actual things to be
+
+00:24:09.480 --> 00:24:13.399
+numbers or whatever, that is up to you to do the parsing
+
+00:24:13.400 --> 00:24:20.679
+yourself. Okay, so we have those two values now. We know
+
+00:24:20.680 --> 00:24:23.879
+that we saw from the data just a moment ago that you're only
+
+00:24:23.880 --> 00:24:26.999
+going to have a value in one column or the other. It's either
+
+00:24:27.000 --> 00:24:29.119
+going to be 0 in the empty one, or you're going to have some
+
+00:24:29.120 --> 00:24:32.159
+number in the other. So we know that we can just naively add
+
+00:24:32.160 --> 00:24:35.479
+them. If it was in, it would always be positive. So we'll just
+
+00:24:35.480 --> 00:24:41.519
+add that. But in the negative case, we want to just make it
+
+00:24:41.520 --> 00:24:45.279
+negative really briefly before we add them all together.
+
+00:24:45.280 --> 00:24:50.519
+let's now just prove to ourselves that we are sane here. What
+
+00:24:50.520 --> 00:24:52.479
+we're going to do is we're going to quickly go say take
+
+00:24:52.480 --> 00:24:57.039
+5 just to convince ourselves, and we'll go cons, and let's
+
+00:24:57.040 --> 00:24:59.839
+see if we get kind of results that make sense. Okay, these
+
+00:24:59.840 --> 00:25:02.799
+sort of make sense. It looks like you know Bob's got some big
+
+00:25:02.800 --> 00:25:07.679
+expenses here. If we take say 15, does it look any better?
+
+00:25:07.680 --> 00:25:10.319
+Okay, looks like he had a payday. All right, good job Bob.
+
+00:25:10.320 --> 00:25:15.439
+Let's get back in there. Now we only really care about
+
+00:25:15.440 --> 00:25:20.119
+adding the final result, right? So there we go. Add that all
+
+00:25:20.120 --> 00:25:24.559
+together and we'll see what we get in a moment. Okay, wow,
+
+00:25:24.560 --> 00:25:27.519
+Bob's rich. Okay, so it looks like in his 12,000
+
+00:25:27.520 --> 00:25:32.279
+transaction, Bob has an overall net worth of $8.5 million.
+
+00:25:32.280 --> 00:25:34.439
+Looking pretty good.
+
+00:25:34.440 --> 00:25:38.999
+So here's an example of how you can, particularly in Emacs
+
+00:25:39.000 --> 00:25:42.959
+Lisp, how you can very easily just get a file, consider it the
+
+00:25:42.960 --> 00:25:45.879
+current buffer, and then just do whatever you want to it.
+
+00:25:45.880 --> 00:25:50.359
+Note that there is sort of first-class support for both CSV
+
+00:25:50.360 --> 00:25:54.359
+and JSON, and then you have, and both of those bring in their
+
+00:25:54.360 --> 00:25:57.719
+values as hash maps, and then you're just free to do whatever
+
+00:25:57.720 --> 00:26:00.439
+you want and process them, potentially both writing them
+
+00:26:00.440 --> 00:26:03.239
+back out as CSV or JSON once again.
+
+NOTE Issues and next steps
+
+00:26:03.240 --> 00:26:10.719
+Some issues with transducers that can come up is
+
+00:26:10.720 --> 00:26:14.919
+that one, a zip operator is missing, but I'm working on it.
+
+00:26:14.920 --> 00:26:19.399
+Two is that performance, particularly in Emacs Lisp, isn't
+
+00:26:19.400 --> 00:26:24.119
+that great. It could be due to the sort of nested lambda calls
+
+00:26:24.120 --> 00:26:27.759
+that have to occur internally, but the common Lisp
+
+00:26:27.760 --> 00:26:32.239
+implementation is quite good. and there's yet no support
+
+00:26:32.240 --> 00:26:35.399
+for parallelism. You can imagine that a lot of those steps
+
+00:26:35.400 --> 00:26:38.559
+you could potentially perform in parallel depending on the
+
+00:26:38.560 --> 00:26:44.399
+platform, but research has not yet gotten that far. Okay,
+
+00:26:44.400 --> 00:26:47.639
+that's all. Thank you very much. If you have any questions,
+
+00:26:47.640 --> 00:26:51.240
+please contact me.