Diffstat (limited to '')
-rw-r--r-- 2020/info/12.md 488
1 files changed, 487 insertions, 1 deletions
diff --git a/2020/info/12.md b/2020/info/12.md
index d8d077c4..1f60436b 100644
--- a/2020/info/12.md
+++ b/2020/info/12.md
@@ -2,7 +2,8 @@
Leo Vivier
[[!template id=vid src="https://mirror.csclub.uwaterloo.ca/emacsconf/2020/emacsconf-2020--12-one-big-ass-org-file-or-multiple-tiny-ones-finally-the-end-of-the-debate--leo-vivier.webm" subtitles="/2020/subtitles/emacsconf-2020--12-one-big-ass-org-file-or-multiple-tiny-ones-finally-the-end-of-the-debate--leo-vivier.vtt"]]
-[Download compressed .webm video (22.3M)](https://mirror.csclub.uwaterloo.ca/emacsconf/2020/smaller/emacsconf-2020--12-one-big-ass-org-file-or-multiple-tiny-ones-finally-the-end-of-the-debate--leo-vivier--vp9-q56-video-original-audio.webm)
+[Download compressed .webm video (22.3M)](https://mirror.csclub.uwaterloo.ca/emacsconf/2020/smaller/emacsconf-2020--12-one-big-ass-org-file-or-multiple-tiny-ones-finally-the-end-of-the-debate--leo-vivier--vp9-q56-video-original-audio.webm)
+[View transcript](#transcript)
Many discussions have been had over the years on the debate between
using few big files versus many small files. However, more often than
@@ -105,3 +106,488 @@ in many more.
- "the problem is to let org-element to make sense of the item (?)
…".
+<a name="transcript"></a>
+# Transcript
+
+00:00:24.160 --> 00:00:58.434
+Hello again, everyone! I hope you had,
+well, quite a lot of talks ever since
+the last one I did, and all more
+interesting one after the other. You
+know, I'm in a bit of a weird spot
+right now, because I'm supposed to be
+presenting to you (as you can see on my
+screen) "One big-ass Org file or
+multiple tiny ones: finally, the end of
+the debate," and it sounds about as
+clickbaity as you can possibly get with
+those topics. By the way, credit where
+credit is due, the title is not mine.
+It's actually from Bastien Guerry, the
+current Org maintainer.
+
+00:00:58.434 --> 00:01:22.823
+Yeah, I wanted to talk to you a little
+bit today about this question because if
+you are used to going on
+reddit.com/r/emacs, you know the
+subreddit that we have, if you go on
+Hacker News often, you know it's a
+question that you see pop up every once
+in a while. "Should I be using one big
+file, or should I be using a lot of tiny
+files?"
+
+00:01:22.823 --> 00:01:58.575
+I believe you know we've got defenders
+on both sides. If I just show you one
+example... We have Karl Voit. He's one
+of the organizers for the conference. He
+is the guy who probably has the biggest
+Org Mode files right now in all the
+people I know, and god knows I know
+plenty of people who use Org Mode.
+But if you just look at this line--I hope
+it's not too small; you just
+make it a little larger--but
+Karl basically has a file with
+126,000 lines.
+
+00:01:58.575 --> 00:02:57.040
+I'm just going to pause and try to have
+you imagine how large a file it actually
+is. Just think about all of these lines
+being tasks in your days. Think about
+all those lines being about little
+thoughts you know that you've had
+throughout the day or project that you
+were working on. It's massive. You know
+one of the problems that Karl Voit
+actually approaches on this topic is
+that it takes him roughly 20 seconds to
+get his Org agenda going, which is a
+massive amount of time. I mean, we have
+very fast computers now. You know, ever
+since Emacs was created in 1976,
+computers... I have no idea how much
+faster they've gotten. And yet, you
+know, for 100,000 lines, Emacs seems to
+be choking. It's certainly not
+reasonable, in a way, to have to wait 20
+seconds just for your entire file to be
+parsed. So basically what I want to do--
+
+00:02:57.040 --> 00:03:50.720
+By the way, I forgot to introduce the
+presentation, but I'm Leo Vivier. I did
+this before, for those who were around.
+I help maintain a piece of software
+called org-roam, and that's the
+expertise that I have on the topic.
+Actually, if you go online, I do have a
+GitHub page. I will make sure that you
+have all the links available afterwards.
+But I do publish my init files, and you
+can see, if you scroll to the bottom, I
+have a little demonstration which shows
+you the fancy things that I can do with
+my Org Mode setup. That might even be
+interesting in light of the talk you've
+just had about GTD stuff, because the
+first one is about how I handle my
+projects, the second one is about the
+flow from a task as I work on it... So I
+won't spend too much time on this, but
+basically that's my expertise. I have
+spent eight years working with Org Mode,
+three of them actually thinking about
+writing packages.
+
+00:03:50.720 --> 00:04:32.880
+The thing is, if I go into a little bit
+of detail (and obviously it's only a
+lightning talk, so I won't have time to
+actually go really in depth about it),
+but there is something in the Org Mode
+library which is called org-element. You
+have the name right there,
+org-element.el, .el being for Elisp
+file. As you can see, the page is on the
+Worg wiki, so it's accessible by
+everyone. It's basically the API that
+Org Mode uses to parse Org Mode files.
+For those who don't know, parsing means
+basically checking a file, checking all
+the contents of the file, and extracting
+all the information that we need from
+that file.
+
+00:04:32.880 --> 00:04:58.960
+As you can imagine, you all have Org
+Mode files in your mind, well you know
+they can be fairly complex. You can have
+properties, you can have contextual
+information, like if you write a line
+which starts at column zero (which means
+at the left), it doesn't have the same
+meaning, whether or not it is before the
+beginning of a headline or if it is
+after the beginning of a headline. It's
+going to be relatively different,
+hierarchically speaking.
+
+00:04:58.960 --> 00:05:39.280
+So the problem, when it comes to the
+question of many files versus one big
+file or few big files, is that we always
+have to keep in mind what org-element
+wants you to do. The thing is, there are
+plenty of problems when it comes to
+parsing files, the first one being
+obviously that Emacs is a single-threaded
+process (or has some threading
+capabilities; we're not going to go into
+the details right now, that's not my
+goal). It makes it incredibly hard to
+parallelize parsing processes with the
+current technology.
+
+00:05:39.280 --> 00:07:03.759
+So you'd have to imagine that if you
+have a very large file--if you go back
+to the example of Karl Voit from before:
+100,000 lines--that means that you have
+to scan through every single line,
+basically. Because sometimes... Let's
+just say that you have a property
+drawer, for instance, which tells you,
+oh okay, this tree has the tag :foo:. So
+the problem is, there are multiple ways
+for you to define a tag. You can use the
+usual way, which is about wrapping the
+:tag: in colons at the end of a
+heading. For instance, if I... (I'm not
+going to switch to Emacs, that's going
+to waste too much time) That's one way
+to say your tag. But say, you have tag
+inheritance, which means that when you
+have a parent with a tag, you also want
+the child to inherit the tag. If you
+have a first heading with the tag :foo:,
+you have the first subheading, and the
+tag :foo: is implied. Now imagine having
+to do that with a file that is
+completely nested, a file that has maybe
+9, 10, 11 levels of depth to it. It's
+mind-bogglingly complicated for the
+software to do that, knowing that...
+I've told you about tags, but any
+property can be inheritable. Anything
+like priorities, even. Though why would
+you do this? You can have groups. You
+can have all this.
+
+00:07:03.759 --> 00:07:21.957
+And as someone who went through the
+trouble of optimizing his Org agenda...
+So basically, if we go back to the
+GIFs--oh god we've already had this
+discussion between the "git" and "magit"
+and now I've started "gif" and "gif" and
+I only have one more minute left to do
+so, so let's just
+say I'm going to say "gif"
+just to spite people...
+
+00:07:21.957 --> 00:07:41.360
+So if you go on the way I organize my
+agenda, what I did in order to keep my
+agenda build time under two seconds, is
+that I've rewritten a whole lot of code
+to be able to parse my Org agenda files.
+So the thing is, I'm going to be talking
+more about this later.
+
+00:07:41.360 --> 00:07:44.479
+I only have, let's say, one minute to
+conclude.
+
+00:07:44.479 --> 00:08:15.199
+So as you've gathered, I'm not going to
+be giving you the answer right now. I'm
+going to be talking about org-roam a
+little later, which is about following
+the principle of having many small
+files. But as someone who has been using
+one large file to manage my life, you
+know, I'm sitting on the fence. I do not
+know which one is the best, but I hope
+that my presentation has given you a
+little idea of what goes on behind the
+principles.
+
+00:08:15.520 --> 00:08:52.000
+You also need to think about the
+philosophy behind the organization of
+your notes. I hope to be approaching
+this topic with you in about two hours
+or so (maybe one hour actually). I'm
+actually finished. I've decided to leave
+you two minutes of questions. If someone
+could feed me the questions, that might
+be best, because I don't want... oh
+actually I can just open the pad. I can
+just open it. Give me a second, okay.
+Just loading up. I might stop showing my
+screen. That might make it easier. So I
+mean if you can make myself big now on
+the screen, that would be splendid.
+([Amin:] Yeah, sure.)
+
+00:08:52.000 --> 00:09:13.920
+Thank you. Where are we... Question 12.
+Okay, so what's better, one big file
+or...? Is it a jab to tell me that I
+haven't answered the question because
+someone just
+asked me the question? Well, personally, if
+I were to give you a quick answer in
+20 seconds, personally, I think it's a
+question that is contextually based.
+
+00:09:13.920 --> 00:09:45.890
+Do you want something that is efficient
+as far as optimization is concerned?
+Then you need to think about this.
+Personally, for all the organization
+that I do, all this stuff, all the TODOs
+that I handle, I like to do this in one
+simple big file because you benefit from
+all the refiling capabilities of Org
+Mode, so I would do that. But for
+knowledge management, for note-taking
+and all this, well I'd much rather
+follow the org-roam way of doing things,
+which is about having many small files.
+
+00:09:45.890 --> 00:09:57.040
+I'm not getting any more questions. I'm
+not sure if there is one on IRC that
+could be fed to me. Otherwise, I'm happy
+to pass over to the next speaker.
+
+00:09:57.040 --> 00:10:06.520
+By the way, just before I finish, your
+world is a lie. It's not a three-piece
+suit. I'm wearing jeans below, so I hope
+that satisfies your curiosity.
+
+00:10:10.640 --> 00:10:35.680
+Okay, there's one more question
+appearing. "but otherwise one big file
+to have everything..." So I'm putting
+you on the spot, I believe. It was such
+a short talk. You know the problem is, I
+just wanted to give you a little answer.
+A little, you know, path of thinking on
+this topic. Obviously it's a topic I
+could be spending 40 minutes on, but I'm
+going to be drained, you're going to be
+drained, nobody's going to be happy if I
+do this.
+
+00:10:39.440 --> 00:11:08.240
+Someone asked me if I switch between
+British and French accents. A little
+secret for you: when I'm stressed, I
+tend to revert to a French accent, so
+you can measure the amount of stress
+that I'm feeling during this talk with
+the amount of h's that I drop and the
+amount of sheer fright that you can see
+sometimes in my eyes, when I'm thinking
+about what to say next.
+
+00:11:08.240 --> 00:11:17.040
+All right sir. So, Amin, do you believe
+we can leave it at that? I'll be...
+People will see plenty more of me later
+on, anyway.
+
+00:11:17.040 --> 00:11:27.120
+([Amin:] So, looking at the schedule, I
+think your talk has until like 2:02,
+meaning like five or six minutes from
+now.)
+
+00:11:27.120 --> 00:11:28.000
+Oh, right.
+
+00:11:28.000 --> 00:11:33.920
+([Amin:] So if you do like to take one
+or two questions, to add two more
+questions, by all means.)
+
+00:11:33.920 --> 00:12:20.555
+So someone has asked me what is the
+Emacs icon (sorry, see, another French
+accent) here in my status bar... Oh
+sorry, I'm not sharing any more. I might
+just share again just so that everyone
+can catch a glimpse of that. There we
+go. Allow... So it should be... So if
+you could make me small again, Amin, I'm
+not sure if it's going to do it by
+itself, but I do have a little icon here
+in my status bar which is basically a
+way to interact with org-protocol. I'm
+not going to look for it right now, but
+it's a browser extension that is
+developed by one of my friends over at
+Ranger whose name is Li Fong (??) and
+it's very useful. I'm someone who uses a
+lot of Org protocols.
+
+00:12:20.555 --> 00:12:53.600
+And by the way, I used to teach English
+to high schoolers, and they were
+supremely worried when I showed them my
+status line and they saw "kill" and
+"explore" in my status line. As fellow
+Emacs users, you know that obviously
+kill means to kill a selection of text
+and keep it inside your clipboard, but
+for my students, they were very worried
+about what their professor was up to
+during his nights.
+
+00:12:53.600 --> 00:13:01.920
+So let's see if we've got more
+questions. I'm showing you the questions
+on the rainbow. Let's see if we've got
+more. People are posting a lot of
+questions now.
+
+00:13:01.920 --> 00:13:06.399
+So how do you feel about archiving files
+in Org Mode and how can that work?
+
+00:13:06.399 --> 00:13:59.519
+So one of the things when we think about
+optimization is: yes, archiving done
+trees is a good idea because it means
+that if we go back to the org-element,
+the way it works (and we'll get into
+technical details afterwards; I'm giving
+a presentation about org-roam technical
+aspects, sorry, so I'll have a chance to
+expand a little more on this) but
+basically, org-element needs to... Every
+time it sees a TODO, it has to consider
+it, even though it is a done TODO. Why?
+Because let's say, for instance, that in
+your agenda you want to activate log
+mode, which is going to show the tasks
+which are done... Now you could be
+clever and say, oh okay, the Org agenda
+does not need to show done items, so
+it's not going to look for them, but the
+problem is that org-element is always
+called. It always needs to parse the
+buffer.
+
+00:13:59.519 --> 00:14:22.079
+You know, Nicolas Goaziou, who is the
+French developer who's worked a whole
+lot on org-element has gone through a
+lot of trouble to optimize org-element,
+but the problem is there's just so much
+that we can do with a concurrent
+process. Right now it leaves some
+things to be desired, but we're working
+on it.
+
+00:14:22.079 --> 00:14:32.639
+One more time... I feel like I spent
+half of this talk teasing my next talks,
+but I'll be talking more about this in
+my future talks in about one to two
+hours.
+
+00:14:32.639 --> 00:14:36.079
+So, continuing with questions, how big
+are my Org files?
+
+00:14:36.079 --> 00:15:04.880
+So in the background, I'm just going to
+check how many lines I have in my main
+file.
+In my own file, so the one I told you
+about where I keep all
+my TODO GTD stuff, I have
+38,000 lines, which is...
+It's sizable, definitely.
+But I do archive a lot of stuff,
+so that might be a slight difference
+between myself and Karl Voit,
+even though I don't remember if they
+actually archive stuff.
+
+00:15:04.880 --> 00:15:12.560
+So does it not consume more resources
+and time to load multiple files
+than a large file of the same content
+now?
+
+00:15:12.560 --> 00:16:00.560
+Theoretically, yes, having many files
+open concurrently is slightly slower
+than having one main file opened. Now
+the problem is for those of you who have
+large files, you may have noticed that
+when you are scrolling in a very large
+file, it starts taking quite a bit of
+time. Why? It's because in Org Mode, you
+have a lot of content that is hidden, so
+when you have the view mode which hides
+as much stuff as possible, meaning that
+you only see the top heading--and I'm
+checking the time, Amin, don't worry,
+I'm finished on this one-- when you're
+hiding a whole lot of stuff, Org Mode
+needs to keep track, or I should say,
+Emacs needs to keep track of which areas
+of text to show and which areas of text
+to hide.
+
+00:16:00.560 --> 00:16:21.199
+The problem is that when you're hiding
+stuff-- let's say you're moving from the
+first heading to the second heading, but
+you've got like 10,000 lines between
+those two headings-- well, Emacs needs
+to compute the difference between the
+two passages, and that takes quite a lot
+of time. That's why you might realize
+that it's a little choppy when you start
+scrolling in large files.
+
+00:16:21.199 --> 00:16:30.719
+Anyway I could be answering questions
+about Org Mode for literally two hours
+straight,
+so I'm gonna hand it over to the next
+speakers. I'll be seeing
+you guys a little later.
+
+00:16:30.719 --> 00:16:33.440
+([Amin:] Thank you very much, Leo.)
+
+00:16:33.440 --> 00:16:34.889
+Oh, thank you.
+
+00:16:34.889 --> 00:16:36.959
+([Amin:] Yes. Bye.)
+
+00:16:36.959 --> 00:16:39.839
+Bye.