-rw-r--r-- | 2020/info/12.md | 488 |
1 files changed, 487 insertions, 1 deletions
diff --git a/2020/info/12.md b/2020/info/12.md index d8d077c4..1f60436b 100644 --- a/2020/info/12.md +++ b/2020/info/12.md @@ -2,7 +2,8 @@ Leo Vivier [[!template id=vid src="https://mirror.csclub.uwaterloo.ca/emacsconf/2020/emacsconf-2020--12-one-big-ass-org-file-or-multiple-tiny-ones-finally-the-end-of-the-debate--leo-vivier.webm" subtitles="/2020/subtitles/emacsconf-2020--12-one-big-ass-org-file-or-multiple-tiny-ones-finally-the-end-of-the-debate--leo-vivier.vtt"]] -[Download compressed .webm video (22.3M)](https://mirror.csclub.uwaterloo.ca/emacsconf/2020/smaller/emacsconf-2020--12-one-big-ass-org-file-or-multiple-tiny-ones-finally-the-end-of-the-debate--leo-vivier--vp9-q56-video-original-audio.webm) +[Download compressed .webm video (22.3M)](https://mirror.csclub.uwaterloo.ca/emacsconf/2020/smaller/emacsconf-2020--12-one-big-ass-org-file-or-multiple-tiny-ones-finally-the-end-of-the-debate--leo-vivier--vp9-q56-video-original-audio.webm) +[View transcript](#transcript) Many discussions have been had over the years on the debate between using few big files versus many small files. However, more often than @@ -105,3 +106,488 @@ in many more. - "the problem is to let org-element to make sense of the item (?) …". +<a name="transcript"></a> +# Transcript + +00:00:24.160 --> 00:00:58.434 +Hello again, everyone! I hope you had, +well, quite a lot of talks ever since +the last one I did, and all more +interesting one after the other. You +know, I'm a bit in a bit of a weird spot +right now, because I'm supposed to be +presenting to you (as you can see on my +screen) "One big-ass Org file or +multiple tiny ones: finally, the end of +the debate," and it sounds about as +clickbaity as you can possibly get with +those topics. By the way, credit where +credit is due, the title is not mine. +It's actually from Bastien Guerry, the +current Org maintainer. 
+ +00:00:58.434 --> 00:01:22.823 +Yeah, I wanted to talk to you a little +bit today about this question because if +you are used to going on +reddit.com/r/emacs , you know the +subreddit that we have, if you go on +Hacker News often, you know it's a +question that you see pop up every once +in a while. "Should I be using one big +file, or should I be using a lot of tiny +files?" + +00:01:22.823 --> 00:01:58.575 +I believe you know we've got defenders +on both sides. If I just show you one +example... We have Karl Voit. He's one +of the organizers for the conference. He +is the guy who probably has the biggest +Org Mode files right now in all the +people I know, and god knows I know +plenty of people use Org Mode. +But if you just look at this line--I hope +it's not too small; you just +make it a little larger--but +Karl basically has a file with +126,000 lines. + +00:01:58.575 --> 00:02:57.040 +I'm just going to pause and try to have +you imagine how large a file it actually +is. Just think about all of these lines +being tasks in your days. Think about +all those lines being about little +thoughts you know that you've had +throughout the day or project that you +were working on. It's massive. You know +one of the problems that Karl Voit +actually approaches on this topic is +that it takes him roughly 20 seconds to +get his Org agenda going, which is a +massive amount of time. I mean, we have +very fast computers now. You know, ever +since Emacs was created in 1976, +computers... I have no idea how much +faster they've gotten. And yet, you +know, for 100,000 lines, Emacs seems to +be choking. It's certainly not +reasonable, in a way, to have to wait 20 +seconds just for your entire file to be +parsed. So basically what I want to do-- + +00:02:57.040 --> 00:03:50.720 +By the way, I forgot to introduce the +presentation, but I'm Leo Vivier. I did +this before, for those who were around. 
+I help maintain a software which is +called org-roam, and that's the +expertise that I have on the topic. +Actually, if you go online, I do have a +Github page. I will make sure that you +have all the links available afterwards. +But I do publish my init files, and you +can see, if you scroll at the bottom, I +have a little demonstration which shows +you the fancy things that I can do with +my Org Mode setup. That might be even +interesting in light of the talk you've +just had about GTD stuff, because the +first one is about how I handle my +projects, the second one is about the +flow from a task as I work on it... So I +won't spend too much time on this, but +basically that's my expertise. I have +spent eight years working with Org Mode, +three of them actually thinking about +writing packages. + +00:03:50.720 --> 00:04:32.880 +The thing is, if I go into a little bit +of detail (and obviously it's only a +lightning talk, so I won't have time to +actually go really in depth about it), +but there is something in the Org Mode +library which is called org-element. You +have the name right there, +org-element.el, .el being for Elisp +file. As you can see, the page is on the +Worg wiki, so it's accessible by +everyone. It's basically the API that +Org Mode uses to parse Org Mode files. +For those who don't know, parsing means +basically checking a file, checking all +the contents of the file, and extracting +all the information that we need from +that file. + +00:04:32.880 --> 00:04:58.960 +As you can imagine, you all have Org +Mode files in your mind, well you know +they can be fairly complex. You can have +properties, you can have contextual +information, like if you write a line +which starts at column zero (which means +at the left), it doesn't have the same +meaning, whether or not it is before the +beginning of a headline or if it is +after the beginning of a headline. It's +going to be relatively different, +hierarchically speaking. 
+ +00:04:58.960 --> 00:05:39.280 +So the problem, when it comes to the +question of many files versus one big +file or few big files, is that we always +have to keep in mind what org-element +wants you to do. The thing is, there are +plenty of problems when it comes to +parsing files, the first one being +obviously that Emacs is a single-thread +process (or has some threading +capabilities; we're not going to go into +the details right now, that's not my +goal). It makes it incredibly hard to +parallelize parsing processes with the +current technology. + +00:05:39.280 --> 00:07:03.759 +So you'd have to imagine that if you +have a very large file--if you go back +to the example of Karl Voit from before: +100,000 lines--that means that you have +to scan through every single line, +basically. Because sometimes... Let's +just say that you have a property +drawer, for instance, which tells you, +oh okay, this tree has the tag :foo:. So +the problem is, there are multiple ways +for you to define a tag. You can use the +usual way, which is about wrapping in +colons the :tag: at the end of a +heading. For instance, if I... (I'm not +going to switch to Emacs, that's going +to waste too much time) That's one way +to say your tag. But say, you have tag +inheritance, which means that when you +have a parent with a tag, you also want +the child to inherit the tag. If you +have first heading with the tag :foo:, +you have the first subheading, and the +tag :foo: is implied. Now imagine having +to do that with a file that is +completely nested, a file that has maybe +9, 10, 11 levels of depth to it. It's +mind-bogglingly complicated for the +software to do that, knowing that... +I've told you about tags, but any +property can be inheritable. Anything +like priorities, even. Though why would +you do this? You can have groups. You +can have all this. + +00:07:03.759 --> 00:07:21.957 +And as someone who went through the +trouble of optimizing his Org agenda... 
+So basically, if we go back to the +GIFs--oh god we've already had this +discussion between the "git" and "magit" +and now I've started "gif" and "gif" and +I only have one more minute left to do +so, so let's just +say I'm going to say "gif" +just to spite people... + +00:07:21.957 --> 00:07:41.360 +So if you go on the way I organize my +agenda, what I did in order to keep my +agenda build time under two seconds, is +that I've rewritten a whole lot of codes +to be able to parse my Org agenda files. +So the thing is, I'm going to be talking +more about this later. + +00:07:41.360 --> 00:07:44.479 +I only have, let's say, one minute to +conclude. + +00:07:44.479 --> 00:08:15.199 +So as you've gathered, I'm not going to +be giving you the answer right now. I'm +going to be talking about org-roam a +little later, which is about following +the principle of having many small +files. But as someone who has been using +one large file to manage my life, you +know, I'm sitting on the fence. I do not +know which one is the best, but I hope +that my presentation has given you a +little idea of what goes on behind the +principles. + +00:08:15.520 --> 00:08:52.000 +You also need to think about the +philosophy behind the organization of +your notes. I hope to be approaching +this topic with you in about two hours +or so (maybe one hour actually). I'm +actually finished. I've decided to leave +you two minutes of questions. If someone +could feed me the questions, that might +be best, because I don't want... oh +actually I can just open the pad. I can +just open it. Give me a second, okay. +Just loading up. I might stop showing my +screen. That might make it easier. So I +mean if you can make myself big now on +the screen, that would be splendid. +([Amin]: yeah sure) + +00:08:52.000 --> 00:09:13.920 +Thank you. Where are we... Question 12. +Okay, so what's better, one big file +or...? 
Is it a jab to tell me that I +haven't answered the question because +someone just +asked me the question? Well, personally, if +I were to give you a quick answer in +20 seconds, personally, I think it's a +question that is contextually based. + +00:09:13.920 --> 00:09:45.890 +Do you want something that is efficient +as far as optimization is concerned? +Then you need to think about this. +Personally, for all the organization +that I do, all this stuff, all the TODOs +that I handle, I like to do this in one +simple big file because you benefit from +all the refiling capabilities of Org +Mode, so I would do that. But for +knowledge management, for note-taking +and all this, well I'd much rather +follow the org-roam way of doing things, +which is about having many small files. + +00:09:45.890 --> 00:09:57.040 +I'm not getting any more questions. I'm +not sure if there is one on IRC that +could be fed to me. Otherwise, I'm happy +to pass over to the next speaker. + +00:09:57.040 --> 00:10:06.520 +By the way, just before I finish, your +world is a lie. It's not a three-piece +suit. I'm wearing jeans below, so I hope +that satisfies your curiosity. + +00:10:10.640 --> 00:10:35.680 +Okay, there's one more question +appearing. "but otherwise one big file +to have everything..." So I'm putting +you on the spot, I believe. It was such +a short talk. You know the problem is, I +just wanted to give you a little answer. +A little, you know, path of thinking on +this topic. Obviously it's a topic I +could be spending 40 minutes on, but I'm +going to be drained, you're going to be +drained, nobody's going to be happy if I +do this. + +00:10:39.440 --> 00:11:08.240 +Someone asked me if I switch between +British and French accents. 
A little +secret for you: when I'm stressed, I +tend to revert to a French accent, so +you can measure the amount of stress +that I'm feeling during this talk with +the amount of h's that I drop and the +amount of sheer fright that you can see +sometimes in my eyes, when I'm thinking +about what to say next. + +00:11:08.240 --> 00:11:17.040 +All right sir. So, Amin, do you believe +we can leave it at that? I'll be... +People will see plenty more of me later +on, anyway. + +00:11:17.040 --> 00:11:27.120 +([Amin:] So, looking at the schedule, I +think your talk has until like 2:02, +meaning like five or six minutes from +now.) + +00:11:27.120 --> 00:11:28.000 +Oh, right. + +00:11:28.000 --> 00:11:33.920 +([Amin:] So if you do like to take one +or two questions, to add two more +questions, by all means.) + +00:11:33.920 --> 00:12:20.555 +So someone has asked me what is the +Emacs icon (sorry, see, another French +accent) here in my status bar... Oh +sorry, I'm not sharing any more. I might +just share again just so that everyone +can catch a glimpse of that. There we +go. Allow... So it should be... So if +you could make me small again, Amin, I'm +not sure if it's going to do it by +itself, but I do have a little icon here +in my status bar which is basically a +way to interact with org-protocol. I'm +not going to look for it right now, but +it's a browser extension that is +developed by one of my friends over at +Ranger whose name is Li Fong (??) and +it's very useful. I'm someone who uses a +lot of Org protocols. + +00:12:20.555 --> 00:12:53.600 +And by the way, I used to teach English +to high schoolers, and they were +supremely worried when I showed them my +status line and they saw "kill" and +"explore" in my status line. As fellow +Emacs users, you know that obviously +kill means to kill a selection of text +and keep it inside your clipboard, but +for my students, they were very worried +about what their professor was up to +during his nights. 
+ +00:12:53.600 --> 00:13:01.920 +So let's see if we've got more +questions. I'm showing you the questions +on the rainbow. Let's see if we've got +more. People are posting a lot of +questions now. + +00:13:01.920 --> 00:13:06.399 +So how do you feel about archiving files +in Org Mode and how can that work? + +00:13:06.399 --> 00:13:59.519 +So one of the things when we think about +optimization is: yes, archiving done +trees is a good idea because it means +that if we go back to the org-element, +the way it works (and we'll get into +technical details afterwards; I'm giving +a presentation about org-roam technical +aspects, sorry, so I'll have a chance to +expand a little more on this) but +basically, org-element needs to... Every +time it sees a TODO, it has to consider +it, even though it is a done TODO. Why? +Because let's say, for instance, that in +your agenda you want to activate log +mode, which is going to show the tasks +which are done... Now you could be +clever and say, oh okay, the Org agenda +does not need to show done items, so +it's not going to look for them, but the +problem is that org-element is always +called. It always needs to parse the +buffer. + +00:13:59.519 --> 00:14:22.079 +You know, Nicolas Goaziou, who is the +French developer who's worked a whole +lot on org-element has gone through a +lot of trouble to optimize org-element, +but the problem is there's just so much +that we can do with a concurrent +process. Right now it leaves somewhat +things to be desired, but we're working +on it. + +00:14:22.079 --> 00:14:32.639 +One more time... I feel like I spent +half of this talk teasing my next talks, +but I'll be talking more about this in +my future talks in about one to two +hours. + +00:14:32.639 --> 00:14:36.079 +So, continuing with questions, how big +are my Org files? + +00:14:36.079 --> 00:15:04.880 +So in the background, I'm just going to +check how many lines I have in my main +file. 
+In my own file, so the one I told you +about where I keep all +my TODO GTD stuff, I have +38,000 lines, which is... +It's sizable, definitely. +But I do archive a lot of stuff, +so that might be a slight difference +between myself and Karl Voit, +even though I don't remember if they +actually archive stuff. + +00:15:04.880 --> 00:15:12.560 +So does it not consume more resources +and time to load multiple files +than a large file or the same content +now? + +00:15:12.560 --> 00:16:00.560 +Theoretically, yes, having many files +open concurrently is slightly slower +than having one main file opened. Now +the problem is for those of you who have +large files, you may have noticed that +when you are scrolling in a very large +file, it starts taking quite a bit of +time. Why? It's because in Org Mode, you +have a lot of content that is hidden, so +when you have the view mode which hides +as much stuff as possible, meaning that +you only see the top heading--and I'm +checking the time, Amin, don't worry, +I'm finished on this one-- when you're +hiding a whole lot of stuff, Org Mode +needs to keep track, or I should say, +Emacs needs to keep track of which areas +of text to show and which areas of text +to hide. + +00:16:00.560 --> 00:16:21.199 +The problem is that when you're hiding +stuff-- let's say you're moving from the +first heading to the second heading, but +you've got like 10,000 lines between +those two headings-- well, Emacs needs +to compute the difference between the +two passages, and that takes quite a lot +of time. That's why you might realize +that it's a little choppy when you start +scrolling in large files. + +00:16:21.199 --> 00:16:30.719 +Anyway I could be answering questions +about Org Mode for literally two hours +straight, +so I'm gonna hand it over to the next +speakers. I'll be seeing +you guys a little later. + +00:16:30.719 --> 00:16:33.440 +([Amin]: Thank you very much, Leo.) + +00:16:33.440 --> 00:16:34.889 +Oh, thank you. 
+ +00:16:34.889 --> 00:16:36.959 +([Amin:] Yes. Bye.) + +00:16:36.959 --> 00:16:39.839 +Bye. |