author | EmacsConf <emacsconf-org@gnu.org> | 2024-12-08 16:43:55 -0500 |
---|---|---|
committer | EmacsConf <emacsconf-org@gnu.org> | 2024-12-08 16:43:55 -0500 |
commit | 750b25423ff3a55ef05a7e4611bcab288a2436e4 | |
tree | 6ade1fcc868de0d475c9590bacbba6e9377914e4 | |
parent | 8c171541ee0e6e1ba39015648cd00d1bcfc5da34 | |
update
Diffstat
2 files changed, 1179 insertions, 0 deletions
diff --git a/2024/captions/emacsconf-2024-transducers--transducers-finally-ergonomic-data-processing-for-emacs--colin-woodbury--main--chapters.vtt b/2024/captions/emacsconf-2024-transducers--transducers-finally-ergonomic-data-processing-for-emacs--colin-woodbury--main--chapters.vtt
new file mode 100644
index 00000000..cf0a6206
--- /dev/null
+++ b/2024/captions/emacsconf-2024-transducers--transducers-finally-ergonomic-data-processing-for-emacs--colin-woodbury--main--chapters.vtt
@@ -0,0 +1,38 @@
+WEBVTT
+
+
+00:00:00.000 --> 00:00:41.519
+Intro
+
+00:00:41.520 --> 00:03:27.589
+What are transducers?
+
+00:03:27.590 --> 00:05:47.279
+Common issues
+
+00:05:47.280 --> 00:07:35.279
+Transducers
+
+00:07:35.280 --> 00:09:52.624
+Using transducers
+
+00:09:52.625 --> 00:11:49.332
+A more involved example with comp
+
+00:11:49.333 --> 00:14:29.468
+In Emacs
+
+00:14:29.469 --> 00:14:58.039
+Hash tables
+
+00:14:58.040 --> 00:15:55.799
+Clarity
+
+00:15:55.800 --> 00:19:57.679
+How do transducers work?
+
+00:20:00.520 --> 00:26:03.239
+Transducers in the wild - CSV
+
+00:26:03.240 --> 00:26:51.240
+Issues and next steps
diff --git a/2024/captions/emacsconf-2024-transducers--transducers-finally-ergonomic-data-processing-for-emacs--colin-woodbury--main.vtt b/2024/captions/emacsconf-2024-transducers--transducers-finally-ergonomic-data-processing-for-emacs--colin-woodbury--main.vtt
new file mode 100644
index 00000000..b0083b86
--- /dev/null
+++ b/2024/captions/emacsconf-2024-transducers--transducers-finally-ergonomic-data-processing-for-emacs--colin-woodbury--main.vtt
@@ -0,0 +1,1141 @@
+WEBVTT captioned by sachac
+
+NOTE Intro
+
+00:00:00.000 --> 00:00:10.799
+Hi everyone, this is EmacsConf 2024. I'm Colin, and today
+
+00:00:10.800 --> 00:00:17.319
+I'll be talking about transducers.
+
+00:00:17.320 --> 00:00:21.879
+After introducing them, I'll share a bit of history about
+
+00:00:21.880 --> 00:00:25.359
+transducers and the problems that they solve, some basics
+
+00:00:25.360 --> 00:00:28.879
+about how we can use them, how they work, like how they're
+
+00:00:28.880 --> 00:00:32.399
+implemented, some demonstrations of how we can actually
+
+00:00:32.400 --> 00:00:36.959
+use them in the wild, and then some other discussions about
+
+00:00:36.960 --> 00:00:41.519
+issues that they have.
+
+NOTE What are transducers?
+
+00:00:41.520 --> 00:00:46.399
+Okay, let's get right in. What are transducers?
+
+00:00:46.400 --> 00:00:49.679
+Transducers are a way to do streaming iteration with a
+
+00:00:49.680 --> 00:00:55.679
+modern API.
+
+00:00:55.680 --> 00:01:00.359
+Who are transducers for, and thereby, who is
+
+00:01:00.360 --> 00:01:05.599
+this talk for? Well, it's for people who want to do streamed
+
+00:01:05.600 --> 00:01:10.519
+data processing in Emacs. It's for people who perhaps
+
+00:01:10.520 --> 00:01:14.199
+aren't satisfied with the existing APIs, for example, the
+
+00:01:14.200 --> 00:01:19.359
+seq API, or some other common libraries that provide
+
+00:01:19.360 --> 00:01:23.719
+similar functionality. Maybe you're not a fan of the loop
+
+00:01:23.720 --> 00:01:29.079
+macro. Some people find it difficult to understand. Or
+
+00:01:29.080 --> 00:01:32.719
+maybe you've done a bunch of Clojure before, and you'd like
+
+00:01:32.720 --> 00:01:36.879
+more aspects of Clojure in your Emacs Lisp. Or maybe you're
+
+00:01:36.880 --> 00:01:40.239
+just interested in transducers in general, because the
+
+00:01:40.240 --> 00:01:48.839
+pattern has now been ported to multiple different Lisps.
+
+00:01:48.840 --> 00:01:55.039
+So I'm Colin. I'm fosskers on everything online, and I do
+
+00:01:55.040 --> 00:01:58.519
+mainly back-end programming work and a lot of open source
+
+00:01:58.520 --> 00:02:05.159
+software. I wrote Haskell for a long time, both as a hobbyist
+
+00:02:05.160 --> 00:02:09.079
+and professionally. Since the COVID years, I've been
+
+00:02:09.080 --> 00:02:13.439
+writing Rust, both open source and professionally. But now
+
+00:02:13.440 --> 00:02:19.719
+I find that in my spare time, I'm mostly writing Common Lisp.
+
+00:02:19.720 --> 00:02:22.719
+Some things I learned from my years of Haskell was that a lot
+
+00:02:22.720 --> 00:02:27.519
+of programming is just altering the shape of data. You know,
+
+00:02:27.520 --> 00:02:31.359
+sometimes we work through our algorithm line by line. We're
+
+00:02:31.360 --> 00:02:36.239
+trying to just tell the computer exactly what to do. But if we
+
+00:02:36.240 --> 00:02:39.639
+step back, a lot of the time we're just getting in data of some
+
+00:02:39.640 --> 00:02:44.119
+shape, changing it, and then passing it along. A lot of
+
+00:02:44.120 --> 00:02:49.279
+these patterns are common, identified
+
+00:02:49.280 --> 00:02:53.639
+decades ago. For instance, we have some collection, and we
+
+00:02:53.640 --> 00:02:56.999
+want to transform every element of that collection and then
+
+00:02:57.000 --> 00:03:01.199
+pass it on. Or maybe we're trying to filter out bad elements
+
+00:03:01.200 --> 00:03:04.799
+in that collection. Or maybe we're looking for a specific
+
+00:03:04.800 --> 00:03:07.759
+element in that collection. Yes, you could write all that
+
+00:03:07.760 --> 00:03:11.839
+with for loops, but these kind of common patterns were
+
+00:03:11.840 --> 00:03:18.559
+identified and given names decades ago. So why not use them?
+
+00:03:18.560 --> 00:03:21.879
+They say that there are two major problems in computer
+
+00:03:21.880 --> 00:03:25.759
+science, one being cache invalidation and the other being
+
+00:03:25.760 --> 00:03:27.589
+naming things.
+
+NOTE Common issues
+
+00:03:27.590 --> 00:03:29.799
+I've identified five other problems that
+
+00:03:29.800 --> 00:03:33.199
+come up when we're trying to deal with collections of data,
+
+00:03:33.200 --> 00:03:40.599
+or big streams of data. One is that if we were trying to
+
+00:03:40.600 --> 00:03:45.279
+load a file all into memory all at once and process the whole
+
+00:03:45.280 --> 00:03:48.279
+thing, sometimes we can have memory problems. You've
+
+00:03:48.280 --> 00:03:54.999
+probably seen out-of-memory errors or such things.
+
+00:03:55.000 --> 00:03:58.199
+A second issue that comes up is that if we were looking at a
+
+00:03:58.200 --> 00:04:01.799
+giant for loop, in particular a nested for loop or such
+
+00:04:01.800 --> 00:04:06.079
+things, it can be hard to tell just by looking at the code what
+
+00:04:06.080 --> 00:04:11.039
+it's trying to do, what it intends. If we don't go character
+
+00:04:11.040 --> 00:04:16.439
+by character or line by line, it can be hard to understand it.
+
+00:04:16.440 --> 00:04:20.039
+Furthermore, and this is particularly an issue with Emacs
+
+00:04:20.040 --> 00:04:26.399
+Lisp, is that if one call, for instance, to seq-map, then
+
+00:04:26.400 --> 00:04:29.319
+piped into seq-filter, for instance, will have an
+
+00:04:29.320 --> 00:04:33.599
+intermediate allocation, the map will take the source
+
+00:04:33.600 --> 00:04:37.639
+container, allocate a new one, and then the filter will
+
+00:04:37.640 --> 00:04:40.319
+operate over the second one. This is wasteful.
+
+00:04:40.320 --> 00:04:48.879
+Furthermore, it can often be difficult to abort a stream.
+
+00:04:48.880 --> 00:04:53.199
+For instance, if we were filtering through our collection,
+
+00:04:53.200 --> 00:04:57.319
+but we knew we only wanted to go halfway, for instance, for
+
+00:04:57.320 --> 00:05:01.759
+some reason, we have no way to stop it halfway through. We
+
+00:05:01.760 --> 00:05:05.479
+just have to process the whole thing, even if we know we don't
+
+00:05:05.480 --> 00:05:11.919
+need to. Another issue is that for languages that have
+
+00:05:11.920 --> 00:05:18.039
+traits, or in Haskell they're called type classes, if you
+
+00:05:18.040 --> 00:05:22.399
+are defining what it means to map over something, you often
+
+00:05:22.400 --> 00:05:27.039
+have to redefine that for every kind of container or thing
+
+00:05:27.040 --> 00:05:31.239
+that you're iterating over. Wouldn't it be nice if we could
+
+00:05:31.240 --> 00:05:34.719
+define things like map just once and then reuse them
+
+00:05:34.720 --> 00:05:39.839
+everywhere? Now, transducers solve all five of these,
+
+00:05:39.840 --> 00:05:44.039
+without the addition of new language features, and with
+
+00:05:44.040 --> 00:05:47.279
+little more than plain old function composition.
+
+NOTE Transducers
+
+00:05:47.280 --> 00:05:53.119
+If this is your first time hearing of transducers, yeah,
+
+00:05:53.120 --> 00:05:57.439
+no problem. They were originally invented in Clojure by
+
+00:05:57.440 --> 00:06:01.039
+Rich Hickey, and this is a quote from him. He thinks
+
+00:06:01.040 --> 00:06:05.439
+transducers are a fundamental primitive that decouple
+
+00:06:05.440 --> 00:06:10.079
+critical logic from list or sequence processing, and if he
+
+00:06:10.080 --> 00:06:13.999
+had to do Clojure all over, he'd put them at the bottom, at the
+
+00:06:14.000 --> 00:06:19.279
+very bottom of all the fundamental primitives. Now, that's
+
+00:06:19.280 --> 00:06:24.599
+Rich speaking quite highly of them. And I think he has a point
+
+00:06:24.600 --> 00:06:25.159
+here.
+
+00:06:25.160 --> 00:06:32.399
+They were invented originally in Clojure. In more
+
+00:06:32.400 --> 00:06:34.772
+recent years, they were brought over to Scheme
+
+00:06:34.773 --> 00:06:38.774
+via SRFI 171. That's where I found them
+
+00:06:38.775 --> 00:06:41.521
+when I was learning the Guile language.
+
+00:06:41.522 --> 00:06:43.919
+In the process of submitting a patch, I realized
+
+00:06:43.920 --> 00:06:48.199
+that there were other things to be improved. So I ported the
+
+00:06:48.200 --> 00:06:51.399
+pattern to Common Lisp, then Fennel, and then more
+
+00:06:51.400 --> 00:06:56.639
+recently, Emacs Lisp. The Common Lisp and Emacs Lisp APIs
+
+00:06:56.640 --> 00:07:01.199
+are identical. And the Fennel one is not identical, but
+
+00:07:01.200 --> 00:07:05.799
+fairly similar. Overall, everywhere you find
+
+00:07:05.800 --> 00:07:10.279
+transducers, they should basically be fairly uniform.
+
+00:07:10.280 --> 00:07:15.759
+When I originally made the Common Lisp variant first, I
+
+00:07:15.760 --> 00:07:18.799
+sampled the APIs from a number of different languages and
+
+00:07:18.800 --> 00:07:23.439
+came up with what I believed to be a representative sample of
+
+00:07:23.440 --> 00:07:27.959
+what most people would want out of such a library. I gave
+
+00:07:27.960 --> 00:07:32.439
+functions their common modern names. For instance, map
+
+00:07:32.440 --> 00:07:35.279
+is map and filter is filter and so on.
+
+NOTE Using transducers
+
+00:07:35.280 --> 00:07:42.599
+What does the usage of transducers look like? Well,
+
+00:07:42.600 --> 00:07:48.959
+these examples will all be the Emacs Lisp variant, but the
+
+00:07:48.960 --> 00:07:52.359
+Common Lisp will look basically exactly the same, minus
+
+00:07:52.360 --> 00:07:54.079
+this little t- prefix.
+
+00:07:54.080 --> 00:08:00.919
+Running transducers requires three things. It requires a
+
+00:08:00.920 --> 00:08:06.439
+source. This could be an obvious thing like a list or a
+
+00:08:06.440 --> 00:08:11.479
+vector, but it could be other things like a file, or in Emacs
+
+00:08:11.480 --> 00:08:16.348
+Lisp in particular, a buffer.
+
+00:08:16.349 --> 00:08:20.112
+A reducer is a function. It's something like
+
+00:08:20.113 --> 00:08:22.639
+the + operator or the * operator,
+
+00:08:22.640 --> 00:08:26.785
+or certain constructors of various containers.
+
+00:08:26.786 --> 00:08:32.125
+It takes values and collates them into some final version.
+
+00:08:32.126 --> 00:08:33.946
+Now, finally, we have what we're calling here
+
+00:08:33.947 --> 00:08:37.567
+a transducer chain. This could be one transducer function
+
+00:08:37.568 --> 00:08:43.479
+or it could be multiple composed together. These are the
+
+00:08:43.480 --> 00:08:47.079
+functions that actually take data and transform them
+
+00:08:47.080 --> 00:08:55.279
+somehow. For instance, this. We have a list of three
+
+00:08:55.280 --> 00:09:04.199
+elements. We want to reduce it into a vector. How we are
+
+00:09:04.200 --> 00:09:07.519
+going to transform the elements along the way: we are doing
+
+00:09:07.520 --> 00:09:13.359
+plus one to each of them. If this syntax is new to you, just
+
+00:09:13.360 --> 00:09:18.039
+know that this #' just means that this thing that
+
+00:09:18.040 --> 00:09:22.079
+comes after it is the name of the function. In Common Lisp and
+
+00:09:22.080 --> 00:09:26.079
+Emacs Lisp, this is necessary, but for Clojure and Scheme,
+
+00:09:26.080 --> 00:09:32.719
+it is not. So we can see here that just this example is not much
+
+00:09:32.720 --> 00:09:36.119
+different than any other normal map call you might see made,
+
+00:09:36.120 --> 00:09:40.239
+but if nothing else, it's a handy way to convert a list to a
+
+00:09:40.240 --> 00:09:44.999
+vector or anything else. There are many, many reducers
+
+00:09:45.000 --> 00:09:48.239
+available and many different forms that we can
+
+00:09:48.240 --> 00:09:52.624
+collate the final value into.
+
+NOTE A more involved example with comp
+
+00:09:52.625 --> 00:09:55.086
+Let's see a more involved example.
+
+00:09:55.087 --> 00:09:58.049
+Okay, now we've got some more meat here.
+
+00:09:58.050 --> 00:10:01.772
+Here we can see usage of the comp function
+
+00:10:01.773 --> 00:10:05.255
+and a custom source, ints.
+
+00:10:05.256 --> 00:10:11.079
+Ints is an infinite generator of integer values. That's not
+
+00:10:11.080 --> 00:10:14.783
+like a list or a file. It will generate infinitely.
+
+00:10:14.784 --> 00:10:19.439
+Comp is letting us compose multiple transducer functions
+
+00:10:19.440 --> 00:10:23.759
+together. Notice that this is the opposite order of what
+
+00:10:23.760 --> 00:10:28.079
+we'd usually be used to from a function like comp. The order
+
+00:10:28.080 --> 00:10:32.679
+here is top to bottom, basically, so that the map goes first,
+
+00:10:32.680 --> 00:10:37.839
+then the filter, and then the take. So effectively is what
+
+00:10:37.840 --> 00:10:40.919
+we're doing is taking all the integers that exist,
+
+00:10:40.920 --> 00:10:45.399
+positive, adding one to them, filtering out only the even
+
+00:10:45.400 --> 00:10:50.039
+ones, but then just taking 10. Cons here is a function that
+
+00:10:50.040 --> 00:10:57.039
+just produces the ending result as a list. So what happens
+
+00:10:57.040 --> 00:11:00.479
+here specifically is how we are avoiding intermediate
+
+00:11:00.480 --> 00:11:04.238
+allocations. First, the number 0 will come through.
+
+00:11:04.239 --> 00:11:07.879
+It will be pulled out of this source internally by transduce.
+
+00:11:07.880 --> 00:11:10.919
+It will make its way into the map. The map will add it. Then it
+
+00:11:10.920 --> 00:11:15.799
+will immediately go into this filter step. So it's not like
+
+00:11:15.800 --> 00:11:19.119
+all the maps occur, and then all the filters occur. We do
+
+00:11:19.120 --> 00:11:24.039
+everything for each element. So the 0 comes in, now it's 1.
+
+00:11:24.040 --> 00:11:27.559
+The filter would occur. Well, it's going to fail that
+
+00:11:27.560 --> 00:11:31.119
+because it's not even, so it will just bail there. Now we'll
+
+00:11:31.120 --> 00:11:35.239
+go to the next one. Now 1 will come, it will become 2, then
+
+00:11:35.240 --> 00:11:39.119
+it will be saved by this evenp call, and then the take will
+
+00:11:39.120 --> 00:11:42.599
+capture it, because we only want 10 values here. You can
+
+00:11:42.600 --> 00:11:45.239
+see 2, 4, 6, 8, and so on is the result that we
+
+00:11:45.240 --> 00:11:49.332
+expect. So let's play around a little bit.
+
+NOTE In Emacs
+
+00:11:49.333 --> 00:11:53.336
+Let's jump into Emacs and see what we can do.
+
+00:11:53.337 --> 00:11:58.500
+Alright, you should see my Emacs screen here.
+
+00:11:58.501 --> 00:12:04.359
+These are the actual notes for the actual
+
+00:12:04.360 --> 00:12:08.959
+presentation done in Org Mode. I'll boost that up in size for
+
+00:12:08.960 --> 00:12:12.639
+a little bit. That should be more than big enough for you.
+
+00:12:12.640 --> 00:12:17.719
+Just by changing the reducer, we can change the result.
+
+00:12:17.720 --> 00:12:21.079
+Okay, now it's a vector. Well, what else can we do to it? Well,
+
+00:12:21.080 --> 00:12:25.959
+let's just add up the results. Maybe we just want to count the
+
+00:12:25.960 --> 00:12:30.919
+results. Oh, indeed, there were 10. What if we want to find
+
+00:12:30.920 --> 00:12:36.959
+the average of the results? What if we want to find the median
+
+00:12:36.960 --> 00:12:40.959
+of the results? And so on. Here's some more interesting
+
+00:12:40.960 --> 00:12:45.839
+things that we could do. We could add different steps. So
+
+00:12:45.840 --> 00:12:51.239
+here we have all the integers. Let's add, hmm, okay, we'll
+
+00:12:51.240 --> 00:12:57.399
+keep that. We're going to add t-enumerate. What enumerate does
+
+00:12:57.400 --> 00:13:00.879
+is for each item that comes through, it is
+
+00:13:00.880 --> 00:13:06.039
+going to add a sort of index to it and make it a pair. In this
+
+00:13:06.040 --> 00:13:08.719
+case, it's going to be equal to what came in here. Well, we can
+
+00:13:08.720 --> 00:13:12.399
+change it. If we start this at 1, now it will be different.
+
+00:13:12.400 --> 00:13:15.519
+1 will be paired with 0, and then 2 would be paired
+
+00:13:15.520 --> 00:13:19.559
+with 1, and so on. We'll accept that the even call will change
+
+00:13:19.560 --> 00:13:24.039
+that a little bit. Why we're doing this is because we want
+
+00:13:24.040 --> 00:13:27.279
+to form a hash table. Let's move that down to 3, maybe
+
+00:13:27.280 --> 00:13:31.439
+we'll get a better result. What do we see? Okay, here now the
+
+00:13:31.440 --> 00:13:37.359
+result is a hash table. What are its values? Well, 0 seems
+
+00:13:37.360 --> 00:13:40.479
+to have... The key of 0 seems to be paired with 2, the key of
+
+00:13:40.480 --> 00:13:42.909
+1 seems to be paired with 4,
+
+00:13:42.910 --> 00:13:47.411
+and 2 seems to be paired with 6.
+
+00:13:47.412 --> 00:13:51.293
+Maybe let's jazz that up even a little bit more.
+
+00:13:51.294 --> 00:13:52.973
+We're going to start from a string
+
+00:13:52.974 --> 00:13:57.943
+and we'll call it hello.
+
+00:13:57.944 --> 00:13:59.564
+That's not going to work anymore
+
+00:13:59.565 --> 00:14:02.585
+and neither is that, but what we could do is
+
+00:14:02.586 --> 00:14:05.498
+we could say t-map #'string.
+
+00:14:05.499 --> 00:14:08.627
+I believe we'll do that.
+
+00:14:08.628 --> 00:14:08.959
+Let's see if that works. It did. So that's
+
+00:14:08.960 --> 00:14:13.589
+going to convert a character into a string.
+
+00:14:13.590 --> 00:14:14.679
+Let's just go two
+
+00:14:14.680 --> 00:14:18.399
+just to make it a little easier. Now you can see that we've
+
+00:14:18.400 --> 00:14:21.919
+constructed a hash table here. The key of 0 is mapped to the
+
+00:14:21.920 --> 00:14:27.079
+string of h and 1 is mapped to e. Now, I really like having
+
+00:14:27.080 --> 00:14:29.468
+this reducer in particular.
+
+NOTE Hash tables
+
+00:14:29.469 --> 00:14:30.639
+Know that hash tables are
+
+00:14:30.640 --> 00:14:34.199
+also legal sources. I find that both in Emacs Lisp and in
+
+00:14:34.200 --> 00:14:37.119
+Common Lisp, dealing with hash tables--like creating them
+
+00:14:37.120 --> 00:14:41.599
+and altering them--can be a bit of a pain. Having them
+
+00:14:41.600 --> 00:14:45.679
+immediately available like this with transducers is very
+
+00:14:45.680 --> 00:14:49.079
+handy, I find. We can work with something that wasn't a hash
+
+00:14:49.080 --> 00:14:53.279
+table. We can construct it in a way that makes it amenable to
+
+00:14:53.280 --> 00:14:56.199
+that, and then reduce it down into a hash table, and here you
+
+00:14:56.200 --> 00:14:58.039
+go. Very handy.
+
+NOTE Clarity
+
+00:14:58.040 --> 00:15:06.399
+One last point is that you can see very clearly what
+
+00:15:06.400 --> 00:15:10.479
+this is attempting to do, as opposed to, say, a for loop. It's
+
+00:15:10.480 --> 00:15:12.719
+very clear what that step is doing, and then you can see what
+
+00:15:12.720 --> 00:15:15.119
+that is doing, and you know that the result is going to be two.
+
+00:15:15.120 --> 00:15:18.559
+Each line is kind of its own declarative step, and it should
+
+00:15:18.560 --> 00:15:22.159
+be clear, just by staring at this, basically what you're
+
+00:15:22.160 --> 00:15:25.399
+going to get out. This is one main difference from other
+
+00:15:25.400 --> 00:15:29.599
+languages that have things--say, for instance, Rust's
+
+00:15:29.600 --> 00:15:35.439
+iterator API--is the difference between the transducers
+
+00:15:35.440 --> 00:15:41.639
+and the reducers. If we go up here, for example, the
+
+00:15:41.640 --> 00:15:44.679
+difference between the transducers and the reducers and
+
+00:15:44.680 --> 00:15:48.119
+the sources is not explicitly laid out, whereas with
+
+00:15:48.120 --> 00:15:53.119
+transducers, it is. You have to be aware of how these things
+
+00:15:53.120 --> 00:15:55.799
+are different. I think that that helps clarity.
+
+NOTE How do transducers work?
+
+00:15:55.800 --> 00:16:01.999
+Moving on. How do transducers work? Well,
+
+00:16:02.000 --> 00:16:09.857
+we want to go see the README.
+
+00:16:09.858 --> 00:16:11.399
+So, what we're going to do is
+
+00:16:11.400 --> 00:16:19.102
+we're going to go to here.
+
+00:16:19.103 --> 00:16:21.959
+You should still be able to see this.
+
+00:16:21.960 --> 00:16:28.583
+This is the CL example, actually.
+
+00:16:28.584 --> 00:16:32.279
+Let's go to transducers.el.
+
+00:16:32.280 --> 00:16:37.744
+Their APIs and READMEs are the same,
+
+00:16:37.745 --> 00:16:39.919
+but just for the sake of it, we will go see
+
+00:16:39.920 --> 00:16:45.726
+how this looks on the Emacs side,
+
+00:16:45.727 --> 00:16:48.046
+just so that nothing is a surprise.
+
+00:16:48.047 --> 00:16:50.239
+But recall that the APIs are essentially the same
+
+00:16:50.240 --> 00:16:53.679
+between the two. If you go to this section, writing your
+
+00:16:53.680 --> 00:16:56.839
+own primitives, you can read about how transducers are
+
+00:16:56.840 --> 00:17:00.999
+actually formed, whether or not you want to write them
+
+00:17:01.000 --> 00:17:06.799
+yourself or not. We can see here t-map. We accept the
+
+00:17:06.800 --> 00:17:10.239
+function that you want to operate with. Then you've got
+
+00:17:10.240 --> 00:17:13.319
+this extra little lambda here that's coming in, and it's
+
+00:17:13.320 --> 00:17:17.079
+receiving a thing that is named reducer. Now, while here
+
+00:17:17.080 --> 00:17:20.439
+we're calling it reducer, it's actually the chain of all the
+
+00:17:20.440 --> 00:17:25.159
+composed functions together. It's all those main
+
+00:17:25.160 --> 00:17:28.479
+transducer steps. Finally, it's the reducer all
+
+00:17:28.480 --> 00:17:31.879
+composed together with normal function composition.
+
+00:17:31.880 --> 00:17:35.877
+That will matter very soon. Now here's the actual meat.
+
+00:17:35.878 --> 00:17:40.519
+We can see the accumulative result that's coming in with the
+
+00:17:40.520 --> 00:17:45.739
+current element. Now we need to operate on this.
+
+00:17:45.740 --> 00:17:47.840
+Were it normally mapped, we would see us
+
+00:17:47.841 --> 00:17:49.919
+applying the F to the input.
+
+00:17:49.920 --> 00:17:53.519
+But here, you can see us applying the F to the input and then
+
+00:17:53.520 --> 00:17:58.679
+continuing on. So us calling the rest of the composed chain
+
+00:17:58.680 --> 00:18:03.159
+here is the effect of, in the previous slide, moving to the
+
+00:18:03.160 --> 00:18:07.156
+next step. We could ignore this line for now.
+
+00:18:07.157 --> 00:18:13.819
+If you're curious, please read the README in detail.
+
+00:18:13.820 --> 00:18:15.579
+Now, what about reducers?
+
+00:18:15.580 --> 00:18:18.879
+What do those look like? Well, let's just scroll
+
+00:18:18.880 --> 00:18:22.439
+down here. Recall that a reducer is a function that's
+
+00:18:22.440 --> 00:18:26.959
+consuming a stream, right? Zoom that up for you a little bit.
+
+00:18:26.960 --> 00:18:33.919
+Now, in the case of count, recall that this is how it's
+
+00:18:33.920 --> 00:18:37.679
+working, how we saw a moment ago. So clearly this list of five
+
+00:18:37.680 --> 00:18:42.199
+elements only has five things in it. Well, a reducer by
+
+00:18:42.200 --> 00:18:47.599
+structure is a function of two, one, or zero arguments. So we
+
+00:18:47.600 --> 00:18:50.639
+can see here in the case of two, this is the normal iterative
+
+00:18:50.640 --> 00:18:54.519
+case. We don't care about the input for count, we just care
+
+00:18:54.520 --> 00:18:58.559
+about the current accumulated count that we're doing, and
+
+00:18:58.560 --> 00:19:02.879
+we add one to it, and that's it. This then goes back to
+
+00:19:02.880 --> 00:19:06.359
+the loop and the whole process starts again with the next
+
+00:19:06.360 --> 00:19:10.879
+element. In this kind of done case, this is used internal to
+
+00:19:10.880 --> 00:19:16.879
+that sort of the supervising function transduce. It's just
+
+00:19:16.880 --> 00:19:19.639
+confirming the final result. Sometimes some
+
+00:19:19.640 --> 00:19:21.839
+post-processing is necessary here, but in the case of
+
+00:19:21.840 --> 00:19:26.039
+count, as it is so simple, that is not necessary. And now
+
+00:19:26.040 --> 00:19:29.359
+here's the base case. This is also used within that
+
+00:19:29.360 --> 00:19:34.319
+supervising transduce function at the very top. Well, if
+
+00:19:34.320 --> 00:19:36.679
+you're counting, you have to start from somewhere, right?
+
+00:19:36.680 --> 00:19:37.349
+In this case, well, what you're starting with is zero.
+
+00:19:37.350 --> 00:19:40.251
+In the case of cons, you'd be starting with an empty list.
+
+00:19:40.252 --> 00:19:44.434
+In the case of vector, you'd be starting
+
+00:19:44.435 --> 00:19:53.999
+with an empty vector and so on.
+
+00:19:54.000 --> 00:19:56.799
+Once again, if you are more curious, please take a look at
+
+00:19:56.800 --> 00:19:57.679
+the README.
+
+NOTE Transducers in the wild - CSV
+
+00:20:00.520 --> 00:20:06.039
+Okay, transducers in the wild. Well, let's go take a look at
+
+00:20:06.040 --> 00:20:07.639
+processing some CSV data.
+
+00:20:07.640 --> 00:20:21.319
+We're going to open up a new Emacs Lisp bracket here. So I have
+
+00:20:21.320 --> 00:20:28.839
+a file. And in this file, let's just go look at C-x b right
+
+00:20:28.840 --> 00:20:34.839
+there, you will see that we've got some bank transaction
+
+00:20:34.840 --> 00:20:37.879
+information. It's got these transactions from a whole
+
+00:20:37.880 --> 00:20:40.199
+bunch of different people into different accounts,
+
+00:20:40.200 --> 00:20:43.879
+whether it's money coming in, money going out, and then a
+
+00:20:43.880 --> 00:20:47.839
+basic description. How's your Latin? But for this little
+
+00:20:47.840 --> 00:20:53.679
+test, what we want to do is we want to find Bob's final bank
+
+00:20:53.680 --> 00:20:59.679
+balance. Let's get on to it. First of all, let's
+
+00:20:59.680 --> 00:21:04.444
+just confirm, let's do some basic stuff.
+
+00:21:04.445 --> 00:21:10.844
+with-current-buffer, find-file-noselect.
+
+00:21:10.845 --> 00:21:15.542
+What's the name of that file?
+
+00:21:15.543 --> 00:21:17.439
+This is pre-organized, so you
+
+00:21:17.440 --> 00:21:20.879
+will just see it right here.
+
+00:21:20.880 --> 00:21:26.999
+t-transduce and t-comp. We don't know what we're going to comp
+
+00:21:27.000 --> 00:21:33.039
+yet. Actually, I'll just pass to show you. And then we will
+
+00:21:33.040 --> 00:21:36.999
+see, let's just do a little t-count just to confirm. What's
+
+00:21:37.000 --> 00:21:45.112
+our source? Well, our source is a buffer, t-buffer-read.
+
+00:21:45.113 --> 00:21:50.153
+And note that because we're using with-current-buffer,
+
+00:21:50.154 --> 00:21:55.079
+if we go like this, if we go current-buffer, this will just work. So
+
+00:21:55.080 --> 00:21:59.919
+now let's... Well, that was odd. I should have done it like
+
+00:21:59.920 --> 00:22:02.159
+that. There we go. So now we should make that a little smaller
+
+00:22:02.160 --> 00:22:04.799
+so you can see what it is. Now if we hit RET, we should get the
+
+00:22:04.800 --> 00:22:09.559
+right result. Okay, so there are 50,001 lines in this file,
+
+00:22:09.560 --> 00:22:13.516
+but the one extra one is the name of the headers, right?
+
+00:22:13.517 --> 00:22:18.079
+We want to process this file in more detail. So how can we do
+
+00:22:18.080 --> 00:22:22.079
+that? Well, let's start by just automatically
+
+00:22:22.080 --> 00:22:28.799
+interpreting the results as CSV. If we do that, okay, well
+
+00:22:28.800 --> 00:22:31.559
+now we only have 50,000 entries as we expected, right?
+
+00:22:31.560 --> 00:22:36.759
+Because it's going to pull out the header line. If we now say
+
+00:22:36.760 --> 00:22:42.679
+we want to just filter out, you know, we only want Bob, right?
+
+00:22:42.680 --> 00:22:53.679
+So if... gethash, it was in the row of name. Each line here is
+
+00:22:53.680 --> 00:22:57.079
+made into, at least by default, is made into a hash map. So if
+
+00:22:57.080 --> 00:23:02.759
+we go like this, we should see that. Okay, so 12,000 of these
+
+00:23:02.760 --> 00:23:05.639
+lines or thereabout belong to Bob.
+
+00:23:05.640 --> 00:23:13.839
+Let's just move that over a little bit. Actually, I suppose we don't even
+
+00:23:13.840 --> 00:23:17.799
+need that anymore. I'll just keep that full size for you.
+
+00:23:17.800 --> 00:23:24.399
+Okay, so all right, there's about 12,000 results for Bob of
+
+00:23:24.400 --> 00:23:32.479
+the 50,000. What's next? Well, we want to confirm,
+
+00:23:32.480 --> 00:23:40.039
+we want to pull out everything,
+
+00:23:40.040 --> 00:23:43.079
+all of the in and the out entries.
+
+00:23:43.080 --> 00:23:56.279
+Thank you. So, string to number, because we know that
+
+00:23:56.280 --> 00:24:01.239
+everything came in as strings. Unfortunately, the from-csv
+
+00:24:01.240 --> 00:24:03.799
+doesn't try to be smart at all, it's just pulling everything
+
+00:24:03.800 --> 00:24:09.479
+in as string values. If you want actual things to be
+
+00:24:09.480 --> 00:24:13.399
+numbers or whatever, that is up to you to do the parsing
+
+00:24:13.400 --> 00:24:20.679
+yourself. Okay, so we have those two values now. We know
+
+00:24:20.680 --> 00:24:23.879
+that we saw from the data just a moment ago that you're only
+
+00:24:23.880 --> 00:24:26.999
+going to have a value in one column or the other. It's either
+
+00:24:27.000 --> 00:24:29.119
+going to be 0 in the empty one, or you're going to have some
+
+00:24:29.120 --> 00:24:32.159
+number in the other. So we know that we can just naively add
+
+00:24:32.160 --> 00:24:35.479
+them. If it was in, it would always be positive. So we'll just
+
+00:24:35.480 --> 00:24:41.519
+add that. But in the negative case, we want to just make it
+
+00:24:41.520 --> 00:24:45.279
+negative really briefly before we add them all together.
+
+00:24:45.280 --> 00:24:50.519
+Let's now just prove to ourselves that we are sane here. What
+
+00:24:50.520 --> 00:24:52.479
+we're going to do is we're going to quickly go say take
+
+00:24:52.480 --> 00:24:57.039
+5 just to convince ourselves, and we'll go cons, and let's
+
+00:24:57.040 --> 00:24:59.839
+see if we get kind of results that make sense. Okay, these
+
+00:24:59.840 --> 00:25:02.799
+sort of make sense. It looks like, you know, Bob's got some big
+
+00:25:02.800 --> 00:25:07.679
+expenses here. If we take say 15, does it look any better?
+
+00:25:07.680 --> 00:25:10.319
+Okay, looks like he had a payday. All right, good job, Bob.
+
+00:25:10.320 --> 00:25:15.439
+Let's get back in there. Now we only really care about
+
+00:25:15.440 --> 00:25:20.119
+adding the final result, right? So there we go. Add that all
+
+00:25:20.120 --> 00:25:24.559
+together and we'll see what we get in a moment. Okay, wow,
+
+00:25:24.560 --> 00:25:27.519
+Bob's rich. Okay, so it looks like in his 12,000
+
+00:25:27.520 --> 00:25:32.279
+transactions, Bob has an overall net worth of $8.5 million.
+
+00:25:32.280 --> 00:25:34.439
+Looking pretty good.
+
+00:25:34.440 --> 00:25:38.999
+So here's an example of how you can, particularly in Emacs
+
+00:25:39.000 --> 00:25:42.959
+Lisp, how you can very easily just get a file, consider it the
+
+00:25:42.960 --> 00:25:45.879
+current buffer, and then just do whatever you want to it.
+
+00:25:45.880 --> 00:25:50.359
+Note that there is sort of first-class support for both CSV
+
+00:25:50.360 --> 00:25:54.359
+and JSON, and then you have, and both of those bring in their
+
+00:25:54.360 --> 00:25:57.719
+values as hash maps, and then you're just free to do whatever
+
+00:25:57.720 --> 00:26:00.439
+you want and process them, potentially both writing them
+
+00:26:00.440 --> 00:26:03.239
+back out as CSV or JSON once again.
+
+NOTE Issues and next steps
+
+00:26:03.240 --> 00:26:10.719
+Some issues with transducers that can come up is
+
+00:26:10.720 --> 00:26:14.919
+that one, a zip operator is missing, but I'm working on it.
+
+00:26:14.920 --> 00:26:19.399
+Two is that performance, particularly in Emacs Lisp, isn't
+
+00:26:19.400 --> 00:26:24.119
+that great. It could be due to the sort of nested lambda calls
+
+00:26:24.120 --> 00:26:27.759
+that have to occur internally, but the Common Lisp
+
+00:26:27.760 --> 00:26:32.239
+implementation is quite good. And there's yet no support
+
+00:26:32.240 --> 00:26:35.399
+for parallelism. You can imagine that a lot of those steps
+
+00:26:35.400 --> 00:26:38.559
+you could potentially perform in parallel depending on the
+
+00:26:38.560 --> 00:26:44.399
+platform, but research has not yet gotten that far. Okay,
+
+00:26:44.400 --> 00:26:47.639
+that's all. Thank you very much. If you have any questions,
+
+00:26:47.640 --> 00:26:51.240
+please contact me.
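The captioned talk describes three moving parts: a source, a reducer that can be called with zero, one or two arguments, and a chain of transducers combined with comp and driven by transduce. The following is a minimal Python sketch of that pattern, offered only as a language-neutral illustration. It is not the API of transducers.el or of the Common Lisp library. Every name in it (t_map, t_filter, t_take, comp, transduce, count_items, to_list) is hypothetical. The Reduced wrapper used for early termination is an assumption modelled on Clojure's `reduced`, because the talk does not say how take stops the stream.

```python
# Hypothetical Python sketch of the transducer pattern; not the transducers.el API.
import itertools
from typing import Any, Callable, Iterable


class Reduced:
    """Assumed early-stop marker, modelled on Clojure's `reduced`."""

    def __init__(self, value: Any) -> None:
        self.value = value


# Hypothetical reducers: 0 args -> initial value, 1 arg -> finalise the
# result, 2 args -> fold one more input into the accumulator.
def count_items(*args: Any) -> Any:
    if not args:
        return 0
    if len(args) == 1:
        return args[0]
    acc, _item = args
    return acc + 1  # the input itself is ignored, only the count grows


def to_list(*args: Any) -> Any:
    if not args:
        return []
    if len(args) == 1:
        return args[0]
    acc, item = args
    acc.append(item)
    return acc


# A transducer takes the downstream reducer and returns a new reducer.
def t_map(f: Callable[[Any], Any]) -> Callable:
    def xform(reducer: Callable) -> Callable:
        def step(*args: Any) -> Any:
            if len(args) == 2:
                acc, item = args
                return reducer(acc, f(item))  # apply f, then continue down the chain
            return reducer(*args)  # init/completion calls pass straight through
        return step
    return xform


def t_filter(pred: Callable[[Any], bool]) -> Callable:
    def xform(reducer: Callable) -> Callable:
        def step(*args: Any) -> Any:
            if len(args) == 2:
                acc, item = args
                return reducer(acc, item) if pred(item) else acc  # drop failures
            return reducer(*args)
        return step
    return xform


def t_take(n: int) -> Callable:
    def xform(reducer: Callable) -> Callable:
        taken = 0  # fresh counter each time the chain is built

        def step(*args: Any) -> Any:
            nonlocal taken
            if len(args) != 2:
                return reducer(*args)
            acc, item = args
            if taken >= n:
                return Reduced(acc)
            taken += 1
            acc = reducer(acc, item)
            if isinstance(acc, Reduced) or taken < n:
                return acc
            return Reduced(acc)  # got n items: ask transduce to stop
        return step
    return xform


def comp(*xforms: Callable) -> Callable:
    """Compose so that items flow through the transducers in listed order."""
    def composed(reducer: Callable) -> Callable:
        for xform in reversed(xforms):
            reducer = xform(reducer)
        return reducer
    return composed


def transduce(xform: Callable, reducer: Callable, source: Iterable[Any]) -> Any:
    step = xform(reducer)
    acc = reducer()  # zero-argument case: the starting value
    for item in source:  # pull one element at a time through the whole chain
        acc = step(acc, item)
        if isinstance(acc, Reduced):
            acc = acc.value
            break
    return step(acc)  # one-argument case: finalise the result


evens = comp(t_map(lambda x: x + 1), t_filter(lambda x: x % 2 == 0), t_take(10))
print(transduce(evens, to_list, itertools.count()))      # [2, 4, 6, 8, 10, 12, 14, 16, 18, 20]
print(transduce(evens, count_items, itertools.count()))  # 10
```

Each element from the infinite itertools.count() source passes through map, filter and take before the next one is pulled. As a result, no intermediate collection is built, and take can end the loop after ten matches. Swapping to_list for count_items changes only the final result, which mirrors the reducer swaps shown in the demo.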