WEBVTT captioned by brandelune and bhavin192 NOTE Introduction 00:00.000 --> 00:00:05.400 Hello everyone, I am Jean-Christophe Helary, 00:00:05.400 --> 00:00:09.680 I live in Japan, and I'm a translator. 00:09.680 --> 00:00:12.633 Here is my second presentation on this very 00:00:12.633 --> 00:00:15.300 prestigious stage that is the Emacs conference. 00:00:15.300 --> 00:00:18.367 Following my "Let's Translate the 2 million words 00:00:18.367 --> 00:00:21.767 in the Emacs manual" in 2021, my topic this year, 00:00:21.767 --> 00:00:25.167 always related to translation, is 00:00:25.167 --> 00:00:28.400 pre-localizing Emacs or much less pretentiously, 00:00:28.400 --> 00:00:31.933 "Just make sure that your strings don't mix up plurals". NOTE Usage of package.el 00:00:31.933 --> 00:00:36.133 So, for some reason I resumed Emacs use 00:00:36.133 --> 00:00:39.940 around 2016, and as I was rediscovering the thing 00:00:39.940 --> 00:00:42.800 I found really old outline-mode files here 00:00:42.800 --> 00:00:44.033 and there on my machine. 00:00:44.033 --> 00:00:45.140 And I started to experiment 00:00:45.140 --> 00:00:47.167 again and write again with Emacs. 00:00:47.167 --> 00:00:48.564 I think that at the time, 00:00:48.564 --> 00:00:50.433 I was coming from Aquamacs and because of 00:00:50.433 --> 00:00:53.400 an integration bug with macOS, I decided 00:00:53.400 --> 00:00:55.440 to check what was going on in the code. 00:55.440 --> 00:00:59.040 That was my first official contribution. NOTE The bug in strings 00:59.040 --> 00:01:02.233 So as I was happily installing and uninstalling 00:01:02.233 --> 00:01:05.267 things, I noticed something weird one day. 00:01:05.267 --> 00:01:09.080 Let me enlarge that picture. 01:09.080 --> 00:01:12.400 See? 
And even if I were not a translator, 00:01:12.400 --> 00:01:14.960 I would not like that string, and obviously 01:14.960 --> 00:01:16.833 the same bug bites you when the string 00:01:16.833 --> 00:01:20.520 tells you to erase the package. 01:20.520 --> 00:01:26.720 Boom, so we agree that we have a problem here. NOTE Natural language engineering 01:26.720 --> 00:01:29.067 So, I started to do some spelunking into the code, 00:01:29.067 --> 00:01:31.067 and at least that was my feeling 00:01:31.067 --> 00:01:33.100 because I really am not a programmer 00:01:33.100 --> 00:01:37.240 by any stretch of the imagination. 01:37.240 --> 00:01:39.467 And what I found was an amazing piece of 00:01:39.467 --> 00:01:41.840 natural language engineering that was mixing code 01:41.840 --> 00:01:44.267 with English suffixes and all that, 00:01:44.267 --> 00:01:46.267 and I could see that the people who had 00:01:46.267 --> 00:01:47.767 written that code were pretty smart, 00:01:47.767 --> 00:01:49.533 but had missed a number of edge cases 00:01:49.533 --> 00:01:51.280 that produced the above bugs. 01:51.280 --> 00:01:53.500 That was my first experience with 00:01:53.500 --> 00:01:55.033 all the message related functions, 00:01:55.033 --> 00:01:58.360 "format", "concat", "message", etc. 01:58.360 --> 00:02:00.433 But even with my beginner's eyes I could see that 00:02:00.433 --> 00:02:03.040 something was off because when you want 02:03.040 --> 00:02:06.000 to produce natural language strings you never ever 00:02:06.000 --> 00:02:08.600 should use "replace-regexp-in-string" to 02:08.600 --> 00:02:11.067 add an "ing" or an "ed" suffix 00:02:11.067 --> 00:02:12.980 to change the mode of a sentence. 02:12.980 --> 00:02:16.840 But that's what I was seeing was happening. NOTE More than a missed plural 02:16.840 --> 00:02:20.333 So, what we had to deal with here 00:02:20.333 --> 00:02:22.220 was way more than just a missed plural. 
02:22.220 --> 00:02:24.000 It was an attempt at engineering all 00:02:24.000 --> 00:02:26.400 the message strings destined to the user 00:02:26.400 --> 00:02:28.567 with the smart code that was making assumptions 00:02:28.567 --> 00:02:30.067 on the structure of words, 00:02:30.067 --> 00:02:33.220 and in the localization world that's a big no-no. 02:33.220 --> 00:02:36.667 I'm a translator, and such UI strings issues 00:02:36.667 --> 00:02:38.433 have been sorted out decades ago. 00:02:38.433 --> 00:02:41.320 So I was a bit shocked. NOTE The final patch 02:41.320 --> 00:02:43.533 The final patch took me about a year to write, 00:02:43.533 --> 00:02:45.380 because I'm slow, because I needed to verify 02:45.380 --> 00:02:47.167 and understand a lot, because there are 00:02:47.167 --> 00:02:49.100 plenty of rules and plenty of people who are 00:02:49.100 --> 00:02:51.433 explaining you very nicely what the rules are, 00:02:51.433 --> 00:02:53.733 because I have kids, and because the 00:02:53.733 --> 00:02:55.600 Emacs development list is such a cool place to be 00:02:55.600 --> 00:02:58.560 that you often forget why you're there sometimes. 02:58.560 --> 00:03:01.800 Anyway, for people who can't click on a video, 00:03:01.800 --> 00:03:03.640 and I can't either, here are the relevant 03:03.640 --> 00:03:05.840 parts with some short comments. 03:05.840 --> 00:03:07.800 I'll be talking with localization in mind, 00:03:07.800 --> 00:03:09.640 knowing full well that Emacs localization 03:09.640 --> 00:03:12.800 is not on the map at the moment. 03:12.800 --> 00:03:14.167 So first, there is this thing 00:03:14.167 --> 00:03:15.520 about "format" and "concat". 03:15.520 --> 00:03:17.800 And if I remember correctly, 00:03:17.800 --> 00:03:20.300 "format" is better for user-facing things, 00:03:20.300 --> 00:03:25.160 and "concat" is better for internal things. 03:25.160 --> 00:03:26.800 Here, there are two things. 
03:26.800 --> 00:03:28.800 First, a rule that we have when we prepare 00:03:28.800 --> 00:03:30.700 strings that need to be localized is 00:03:30.700 --> 00:03:33.333 never ever make assumptions on the way 00:03:33.333 --> 00:03:35.780 numbers are expressed in the language. 03:35.780 --> 00:03:37.067 Here, the assumption is that 00:03:37.067 --> 00:03:40.000 we have either a singular or plural form, 00:03:40.000 --> 00:03:42.040 and that's not always the case. 03:42.040 --> 00:03:44.067 That usually means that you should externalize 00:03:44.067 --> 00:03:48.280 numbers and find a generic way to express them. 03:48.280 --> 00:03:50.833 So it makes for slightly less natural 00:03:50.833 --> 00:03:54.400 language strings, but it's better anyway. 03:54.400 --> 00:03:56.667 Then we have that comma there that's trying 00:03:56.667 --> 00:03:58.167 to be externalized and that's weird, 00:03:58.167 --> 00:04:02.620 so I put it back into the sentence. 04:02.620 --> 00:04:04.967 Here we have another construct, or two rather, 00:04:04.967 --> 00:04:06.960 that really should not be used like this. 04:06.960 --> 00:04:10.033 It's "prin1" that uses quoting characters, 00:04:10.033 --> 00:04:12.480 just like "print", and "princ" that does not. 04:12.480 --> 00:04:15.400 And you see why they were combined together. 04:15.400 --> 00:04:17.133 And they were both trying to be really smart 00:04:17.133 --> 00:04:19.780 about which article to put in front of a vowel. 04:19.780 --> 00:04:20.960 And you just don't do that. 04:20.960 --> 00:04:25.000 You just keep things simple. 04:25.000 --> 00:04:26.633 Here again, the code is trying to be smart, 00:04:26.633 --> 00:04:28.480 but it's really not much more efficient than 04:28.480 --> 00:04:34.940 plainly stating what you want. 04:34.940 --> 00:04:36.500 And here again, we have "concat" things 00:04:36.500 --> 00:04:40.367 that we could just use to plainly state 00:04:40.367 --> 00:04:41.980 what we want to state. 
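NOTE The rule about externalizing numbers can be sketched as follows. This is only an illustration in Python, not the Emacs Lisp from the patch discussed in the talk. The function names and the wording of the neutral message are hypothetical; the plural-only string is the one quoted later in the talk.

```python
# Hypothetical helpers, written only to illustrate the rule from the talk.
# They are not taken from package.el.

def fragile_message(count: int) -> str:
    """Embed the number in the sentence and always use the plural form."""
    return f"{count} packages are no longer needed"


def neutral_message(count: int) -> str:
    """Keep the number outside the sentence, after a fixed label.

    The label never has to agree with the number, so the same template
    reads acceptably for 0, 1 or many, and a translator gets one
    complete string to work with.
    """
    return f"Packages that are no longer needed: {count}"


if __name__ == "__main__":
    for n in (0, 1, 2):
        print(fragile_message(n))
        print(neutral_message(n))
```

The neutral form is slightly less natural, as the talk says, but it makes no assumption about how many grammatical number forms a language has.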
04:41.980 --> 00:04:49.880 So, instead of "concat" I just put a "message". 04:49.880 --> 00:04:52.260 And here we have something that's very cute. 04:52.260 --> 00:04:54.540 It's a computerized plural. 04:54.540 --> 00:04:55.700 Here again, assuming that 00:04:55.700 --> 00:04:58.640 there are only plural or singular forms. 04:58.640 --> 00:05:00.867 But the end string is not that much more natural 00:05:00.867 --> 00:05:02.700 than the fix, the code is less efficient 00:05:02.700 --> 00:05:07.760 and is harder to understand. 05:07.760 --> 00:05:09.433 Here again, the code is trying to make 00:05:09.433 --> 00:05:13.520 smart things where it could be much simpler. 05:13.520 --> 00:05:14.667 That is the part where you get the 00:05:14.667 --> 00:05:19.480 number of packages and their names. 05:19.480 --> 00:05:22.067 Here the whole sentence with the semicolons 00:05:22.067 --> 00:05:26.333 and the question mark is split in parts, 00:05:26.333 --> 00:05:29.180 between which something will be inserted. 05:29.180 --> 00:05:34.240 That's really ugly and difficult to read. 05:34.240 --> 00:05:37.700 Here again, another "ing" waiting to be 00:05:37.700 --> 00:05:44.840 regex-inserted into the code. 05:44.840 --> 00:05:46.633 And here at last, we get to the point 00:05:46.633 --> 00:05:48.760 where everything started. 05:48.760 --> 00:05:50.833 And you can see that unlike in the other spots, 00:05:50.833 --> 00:05:52.400 there is no possibility for the expression 05:52.400 --> 00:05:54.680 to be singular. 05:54.680 --> 00:05:57.600 So, I guess that if it hadn't been for that bug, 00:05:57.600 --> 00:05:59.320 I would not have found the other items, 05:59.320 --> 00:06:01.033 and we would be left with code that works, 00:06:01.033 --> 00:06:02.033 of course, but that is 00:06:02.033 --> 00:06:06.020 harder to understand, and maintain. 06:06.020 --> 00:06:08.333 Last but not least, a last version of 00:06:08.333 --> 00:06:10.920 "just plainly state what you mean to state". 
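NOTE The "plainly state what you mean" idea can be sketched like this, again in Python rather than Emacs Lisp and under stated assumptions: the verbs, the message wording and the function names are hypothetical, and the regex only imitates the kind of suffix gluing the talk describes.

```python
import re

# Hypothetical table: one complete, plainly stated message per action.
STATUS = {
    "delete": "Deleting packages...",
    "install": "Installing packages...",
}


def suffixed_status(verb: str) -> str:
    """Build the message by gluing an English "ing" suffix onto a verb.

    The regex drops one trailing "e" if there is one and appends "ing".
    count=1 keeps the empty match at the end of the string from being
    replaced a second time.
    """
    gerund = re.sub(r"e?$", "ing", verb, count=1)
    return gerund.capitalize() + " packages..."


def plain_status(action: str) -> str:
    """Look up a whole sentence instead of computing one."""
    return STATUS[action]


if __name__ == "__main__":
    print(suffixed_status("delete"))  # the suffix rule happens to work here
    print(suffixed_status("run"))     # edge case: the consonant is not doubled
    print(plain_status("delete"))
```

The lookup version is longer to type, but each message is a complete string that can be read, searched for and eventually translated as a unit.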
06:10.920 --> 00:06:14.880 Keep it simple. NOTE "What did I learn, and how did I learn it?" 06:14.880 --> 00:06:19.267 So first, we have this wonderful CONTRIBUTE file 00:06:19.267 --> 00:06:21.267 that is very explicit about 00:06:21.267 --> 00:06:23.520 how we must proceed when contributing code. 06:23.520 --> 00:06:25.233 So, that's really the first place 00:06:25.233 --> 00:06:27.760 that we should all read. 06:27.760 --> 00:06:29.333 The README file is pretty cool too, 00:06:29.333 --> 00:06:30.967 especially at the beginning of the process, 00:06:30.967 --> 00:06:31.867 when you're not sure whether 00:06:31.867 --> 00:06:36.240 you want to fix that bug or just report it. NOTE Useful packages 06:36.240 --> 00:06:37.920 And then we've got packages. 06:37.920 --> 00:06:39.900 We've got a number of packages that are really 00:06:39.900 --> 00:06:42.600 helpful when it comes to reading 00:06:42.600 --> 00:06:45.880 the information and the manuals. 06:45.880 --> 00:06:48.000 I'm mentioning three of them here, 00:06:48.000 --> 00:06:53.720 and I think they are the most important for us. NOTE Package: helpful 06:53.720 --> 00:06:55.600 So "helpful" is on the right, 00:06:55.600 --> 00:06:58.667 and it's overflowing the window with 00:06:58.667 --> 00:07:01.900 all the contextualized information it provides, 00:07:01.900 --> 00:07:05.280 and the standard "help" is on the left. 07:05.280 --> 00:07:07.933 I mean, really there are like two or three 00:07:07.933 --> 00:07:11.567 screen-full of information in the "helpful" output, 00:07:11.567 --> 00:07:13.233 so you really only see a part, 00:07:13.233 --> 00:07:16.320 but I guess if you use it, you know what I'm saying. 07:16.320 --> 00:07:18.867 What I like the most here is the "view in manual" 00:07:18.867 --> 00:07:21.800 part, where you can actually click and even get 00:07:21.800 --> 00:07:23.667 more information that's sometimes 00:07:23.667 --> 00:07:28.400 easier to read and understand. 
NOTE Package: inform 07:28.400 --> 00:07:33.640 And then you've got the "info" versus "inform" formats. 07:33.640 --> 00:07:34.567 When you're in the manual, 00:07:34.567 --> 00:07:37.140 "inform" makes a huge difference. 07:37.140 --> 00:07:39.367 You can see here that you've got colorized items, 00:07:39.367 --> 00:07:42.000 and also in the middle you've got that 07:42.000 --> 00:07:45.000 'read' part that's green and bold. 07:45.000 --> 00:07:49.333 In "info" it's not a specific object, 00:07:49.333 --> 00:07:52.200 it's just a string. In 'inform' it's actually 00:07:52.200 --> 00:07:53.800 a link that you can click, 00:07:53.800 --> 00:07:58.320 and actually go to that 'read' manual page. NOTE Package: which-key 07:58.320 --> 00:08:01.300 Now, we've got "which-key". 08:01.300 --> 00:08:03.400 "which-key" is a savior for beginners too. 08:03.400 --> 00:08:04.867 Just wait half a second or something, 00:08:04.867 --> 00:08:06.500 and Emacs will show you all the keys 00:08:06.500 --> 00:08:08.433 that you can access from the prefix combination 00:08:08.433 --> 00:08:09.920 that you just typed. 08:09.920 --> 00:08:13.200 So, it's really helpful for discovering functions 00:08:13.200 --> 00:08:19.160 and learning new functions, getting used to them. NOTE It all started with this message… 08:19.160 --> 00:08:21.500 And so that whole process started…, 00:08:21.500 --> 00:08:26.533 it was May 23, 2017, 00:08:26.533 --> 00:08:30.440 with that thread when I found the bug. 08:30.440 --> 00:08:32.800 I just bumped into an English/code bug 00:08:32.800 --> 00:08:36.920 this morning. In package.el, when one package 08:36.920 --> 00:08:39.033 is not needed anymore, the message is: 00:08:39.033 --> 00:08:41.300 "Package menu: Operation finished. 00:08:41.300 --> 00:08:44.880 1 packages are no longer needed", etc. 
08:44.880 --> 00:08:49.633 So, I was asking whether we had best practices 00:08:49.633 --> 00:08:53.800 for using messages, and we had a whole thread 08:53.800 --> 00:08:57.867 about that. And while I was discussing on that 00:08:57.867 --> 00:09:01.240 thread, I started that new thread, which is: 09:01.240 --> 00:09:02.867 "package.el strings". 00:09:02.867 --> 00:09:09.900 The whole thing actually ended on June 27, 2018. 00:09:09.900 --> 00:09:15.400 So, a year after, with that message from Noam 00:09:15.400 --> 00:09:18.567 telling me that "Yes I can close the bug," 00:09:18.567 --> 00:09:22.040 and that was it. 09:22.040 --> 00:09:24.000 So, it took about a year to finish that. 00:09:24.000 --> 00:09:28.133 What I did learn basically is that 00:09:28.133 --> 00:09:32.160 helping with Emacs is not that difficult. 09:32.160 --> 00:09:36.100 It takes time when you're not fluent with the code, 00:09:36.100 --> 00:09:37.100 but that's okay because the reference 09:37.100 --> 00:09:39.300 is excellent, and there are lots of people 00:09:39.300 --> 00:09:41.520 who are here to help. NOTE Conclusion 09:41.520 --> 00:09:45.700 Basically, the solution to all our problems is 00:09:45.700 --> 00:09:47.733 "Keep It Simple and Straightforward". 00:09:47.733 --> 00:09:51.033 As you can see in that patch, 00:09:51.033 --> 00:09:53.233 even if it's a beginner's patch, 00:09:53.233 --> 00:09:57.733 what I did shows what can be done by Emacs Lisp 00:09:57.733 --> 00:09:59.533 beginners to help with "straightening" the strings 00:09:59.533 --> 00:10:02.267 to reduce the number of potential English bugs. 00:10:02.267 --> 00:10:04.533 And then to make Emacs strings easier 00:10:04.533 --> 00:10:07.233 to be handled by real localization processes one day. 00:10:07.233 --> 00:10:09.067 But it doesn't have to be about strings 00:10:09.067 --> 00:10:12.767 because strings can be an easy entry point to Emacs, 00:10:12.767 --> 00:10:16.720 but it can be any itch that you want to scratch. 
10:16.720 --> 00:10:18.267 And my real conclusion is that 00:10:18.267 --> 00:10:22.160 Emacs is free software, and what that means is mostly 10:22.160 --> 00:10:24.067 that it allows you to do things that you would 00:10:24.067 --> 00:10:27.920 never have thought of being able to do before. 10:27.920 --> 00:10:32.000 That's really the biggest lesson to be learned here. 10:32.000 --> 00:10:33.400 So, I want to thank all the people 00:10:33.400 --> 00:10:37.920 who allowed this to be happening, allowed me to 10:37.920 --> 00:10:41.267 learn a bit and contribute a bit to that wonderful 00:10:41.267 --> 00:10:42.800 piece of software that Emacs is. 00:10:42.800 --> 00:10:44.533 And thank you everyone for listening, 00:10:44.533 --> 00:10:46.700 and hopefully I'll see you next year 00:10:46.700 --> 00:10:51.520 with a different translation related presentation. 10:51.520 --> 11:13.640 Thank you very much.