From 57ed51229a2531f7307fa6fc0864ca0694cd1c39 Mon Sep 17 00:00:00 2001 From: Sacha Chua Date: Sun, 4 Dec 2022 16:00:30 -0500 Subject: Automated commit --- ...calizing-emacs--jeanchristophe-helary--main.vtt | 726 +++++++++++++++++++++ 1 file changed, 726 insertions(+) create mode 100644 2022/captions/emacsconf-2022-localizing--prelocalizing-emacs--jeanchristophe-helary--main.vtt (limited to '2022/captions/emacsconf-2022-localizing--prelocalizing-emacs--jeanchristophe-helary--main.vtt') diff --git a/2022/captions/emacsconf-2022-localizing--prelocalizing-emacs--jeanchristophe-helary--main.vtt b/2022/captions/emacsconf-2022-localizing--prelocalizing-emacs--jeanchristophe-helary--main.vtt new file mode 100644 index 00000000..a86af897 --- /dev/null +++ b/2022/captions/emacsconf-2022-localizing--prelocalizing-emacs--jeanchristophe-helary--main.vtt @@ -0,0 +1,726 @@ +WEBVTT captioned by brandelune and bhavin192 + +NOTE Introduction + +00:00.000 --> 00:00:05.400 +Hello everyone, I am Jean-Christophe Helary, + +00:00:05.400 --> 00:00:09.680 +I live in Japan, and I'm a translator. + +00:09.680 --> 00:00:12.633 +Here is my second presentation on this very + +00:00:12.633 --> 00:00:15.300 +prestigious stage that is the Emacs conference. + +00:00:15.300 --> 00:00:18.367 +Following my "Let's Translate the 2 million words + +00:00:18.367 --> 00:00:21.767 +in the Emacs manual" in 2021, my topic this year, + +00:00:21.767 --> 00:00:25.167 +always related to translation, is + +00:00:25.167 --> 00:00:28.400 +pre-localizing Emacs or much less pretentiously, + +00:00:28.400 --> 00:00:31.933 +"Just make sure that your strings don't mix up plurals". + +NOTE Usage of package.el + +00:00:31.933 --> 00:00:36.133 +So, for some reason I resumed Emacs use + +00:00:36.133 --> 00:00:39.940 +around 2016, and as I was rediscovering the thing + +00:00:39.940 --> 00:00:42.800 +I found really old outline-mode files here + +00:00:42.800 --> 00:00:44.033 +and there on my machine. 
+ +00:00:44.033 --> 00:00:45.140 +And I started to experiment + +00:00:45.140 --> 00:00:47.167 +again and write again with Emacs. + +00:00:47.167 --> 00:00:48.564 +I think that at the time, + +00:00:48.564 --> 00:00:50.433 +I was coming from Aquamacs and because of + +00:00:50.433 --> 00:00:53.400 +an integration bug with macOS, I decided + +00:00:53.400 --> 00:00:55.440 +to check what was going on in the code. + +00:55.440 --> 00:00:59.040 +That was my first official contribution. + +NOTE The bug in strings + +00:59.040 --> 00:01:02.233 +So as I was happily installing and uninstalling + +00:01:02.233 --> 00:01:05.267 +things, I noticed something weird one day. + +00:01:05.267 --> 00:01:09.080 +Let me enlarge that picture. + +01:09.080 --> 00:01:12.400 +See? And even if I were not a translator, + +00:01:12.400 --> 00:01:14.960 +I would not like that string, and obviously + +01:14.960 --> 00:01:16.833 +the same bug bites you when the string + +00:01:16.833 --> 00:01:20.520 +tells you to erase the package. + +01:20.520 --> 00:01:26.720 +Boom, so we agree that we have a problem here. + +NOTE Natural language engineering + +01:26.720 --> 00:01:29.067 +So, I started to do some spelunking into the code, + +00:01:29.067 --> 00:01:31.067 +and at least that was my feeling + +00:01:31.067 --> 00:01:33.100 +because I really am not a programmer + +00:01:33.100 --> 00:01:37.240 +by any stretch of the imagination. + +01:37.240 --> 00:01:39.467 +And what I found was an amazing piece of + +00:01:39.467 --> 00:01:41.840 +natural language engineering that was mixing code + +01:41.840 --> 00:01:44.267 +with English suffixes and all that, + +00:01:44.267 --> 00:01:46.267 +and I could see that the people who had + +00:01:46.267 --> 00:01:47.767 +written that code were pretty smart, + +00:01:47.767 --> 00:01:49.533 +but had missed a number of edge cases + +00:01:49.533 --> 00:01:51.280 +that produced the above bugs. 
+ +01:51.280 --> 00:01:53.500 +That was my first experience with + +00:01:53.500 --> 00:01:55.033 +all the message-related functions, + +00:01:55.033 --> 00:01:58.360 +"format", "concat", "message", etc. + +01:58.360 --> 00:02:00.433 +But even with my beginner's eyes I could see that + +00:02:00.433 --> 00:02:03.040 +something was off because when you want + +02:03.040 --> 00:02:06.000 +to produce natural language strings you never ever + +00:02:06.000 --> 00:02:08.600 +should use "replace-regexp-in-string" to + +02:08.600 --> 00:02:11.067 +add an "ing" or an "ed" suffix + +00:02:11.067 --> 00:02:12.980 +to change the mode of a sentence. + +02:12.980 --> 00:02:16.840 +But that's what I was seeing was happening. + +NOTE More than a missed plural + +02:16.840 --> 00:02:20.333 +So, what we had to deal with here + +00:02:20.333 --> 00:02:22.220 +was way more than just a missed plural. + +02:22.220 --> 00:02:24.000 +It was an attempt at engineering all + +00:02:24.000 --> 00:02:26.400 +the message strings destined to the user + +00:02:26.400 --> 00:02:28.567 +with the smart code that was making assumptions + +00:02:28.567 --> 00:02:30.067 +on the structure of words, + +00:02:30.067 --> 00:02:33.220 +and in the localization world that's a big no-no. + +02:33.220 --> 00:02:36.667 +I'm a translator, and such UI strings issues + +00:02:36.667 --> 00:02:38.433 +have been sorted out decades ago. + +00:02:38.433 --> 00:02:41.320 +So I was a bit shocked. 
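As a rough illustration of the pattern criticized in this part of the talk, the Emacs Lisp sketch below contrasts building a status string by attaching an English suffix with simply writing each string out in full. It is not the actual package.el code: the names `my-status-smart`, `my-status-strings`, and `my-status-plain`, and the message wording, are hypothetical and exist only for this example.

```elisp
;; Hypothetical sketch; none of these definitions come from package.el.

;; "Smart" version: derive the progressive form from a verb by dropping
;; a final "e" and appending "ing".  This bakes an English spelling rule
;; into the code and leaves no complete sentence for a translator.
(defun my-status-smart (verb)
  "Return a status string built by adding an \"ing\" suffix to VERB."
  (concat (replace-regexp-in-string "e\\'" "" verb) "ing..."))

;; Plain version: every user-facing string is stated in full.
(defconst my-status-strings
  '((install . "Installing...")
    (delete  . "Deleting...")
    (upgrade . "Upgrading..."))
  "Complete status strings, keyed by action symbol.")

(defun my-status-plain (action)
  "Return the complete status string for ACTION, a symbol."
  (alist-get action my-status-strings))
```

With these definitions, `(my-status-smart "Delete")` happens to produce a correct word, but `(my-status-smart "Run")` yields "Runing...", which is the kind of edge case the talk describes. The plain version is longer, yet each string can be read, checked, and eventually translated as a whole.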
+ +NOTE The final patch + +02:41.320 --> 00:02:43.533 +The final patch took me about a year to write, + +00:02:43.533 --> 00:02:45.380 +because I'm slow, because I needed to verify + +02:45.380 --> 00:02:47.167 +and understand a lot, because there are + +00:02:47.167 --> 00:02:49.100 +plenty of rules and plenty of people who are + +00:02:49.100 --> 00:02:51.433 +explaining you very nicely what the rules are, + +00:02:51.433 --> 00:02:53.733 +because I have kids, and because the + +00:02:53.733 --> 00:02:55.600 +Emacs development list is such a cool place to be + +00:02:55.600 --> 00:02:58.560 +that you often forget why you're there sometimes. + +02:58.560 --> 00:03:01.800 +Anyway, for people who can't click on a video, + +00:03:01.800 --> 00:03:03.640 +and I can't either, here are the relevant + +03:03.640 --> 00:03:05.840 +parts with some short comments. + +03:05.840 --> 00:03:07.800 +I'll be talking with localization in mind, + +00:03:07.800 --> 00:03:09.640 +knowing full well that Emacs localization + +03:09.640 --> 00:03:12.800 +is not on the map at the moment. + +03:12.800 --> 00:03:14.167 +So first, there is this thing + +00:03:14.167 --> 00:03:15.520 +about "format" and "concat". + +03:15.520 --> 00:03:17.800 +And if I remember correctly, + +00:03:17.800 --> 00:03:20.300 +"format" is better for user-facing things, + +00:03:20.300 --> 00:03:25.160 +and "concat" is better for internal things. + +03:25.160 --> 00:03:26.800 +Here, there are two things. + +03:26.800 --> 00:03:28.800 +First, a rule that we have when we prepare + +00:03:28.800 --> 00:03:30.700 +strings that need to be localized is + +00:03:30.700 --> 00:03:33.333 +never ever make assumptions on the way + +00:03:33.333 --> 00:03:35.780 +numbers are expressed in the language. + +03:35.780 --> 00:03:37.067 +Here, the assumption is that + +00:03:37.067 --> 00:03:40.000 +we have either a singular or plural form, + +00:03:40.000 --> 00:03:42.040 +and that's not always the case. 
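The singular/plural assumption mentioned here can be sketched in Emacs Lisp as follows. This is a minimal illustration under stated assumptions, not the text of the patch: the function names `my-unneeded-two-forms` and `my-unneeded-neutral` and the exact wording of the messages are hypothetical.

```elisp
;; Hypothetical sketch; the wording is illustrative only.

;; Assumes that a language has exactly one singular and one plural form
;; and that the choice depends only on whether COUNT equals 1.
(defun my-unneeded-two-forms (count)
  "Describe COUNT unneeded packages by choosing between two English forms."
  (format "%d package%s no longer needed"
          count
          (if (= count 1) " is" "s are")))

;; Keeps the sentence fixed and places the number outside the grammar,
;; so no word has to be inflected according to COUNT.
(defun my-unneeded-neutral (count)
  "Describe COUNT unneeded packages without inflecting any word."
  (format "Packages that are no longer needed: %d" count))
```

The second form reads slightly less naturally in English, but it stays correct for any count and does not force a two-form plural rule onto languages that have more forms, or none.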
+ +03:42.040 --> 00:03:44.067 +That usually means that you should externalize + +00:03:44.067 --> 00:03:48.280 +numbers and find a generic way to express them. + +03:48.280 --> 00:03:50.833 +So it makes for slightly less natural + +00:03:50.833 --> 00:03:54.400 +language strings, but it's better anyway. + +03:54.400 --> 00:03:56.667 +Then we have that comma there that's trying + +00:03:56.667 --> 00:03:58.167 +to be externalized and that's weird, + +00:03:58.167 --> 00:04:02.620 +so I put it back into the sentence. + +04:02.620 --> 00:04:04.967 +Here we have another construct, or two rather, + +00:04:04.967 --> 00:04:06.960 +that really should not be used like this. + +04:06.960 --> 00:04:10.033 +It's "prin1" that uses quoting characters, + +00:04:10.033 --> 00:04:12.480 +just like "print", and "princ" that does not. + +04:12.480 --> 00:04:15.400 +And you see why they were combined together. + +04:15.400 --> 00:04:17.133 +And they were both trying to be really smart + +00:04:17.133 --> 00:04:19.780 +about which article to put in front of a vowel. + +04:19.780 --> 00:04:20.960 +And you just don't do that. + +04:20.960 --> 00:04:25.000 +You just keep things simple. + +04:25.000 --> 00:04:26.633 +Here again, the code is trying to be smart, + +00:04:26.633 --> 00:04:28.480 +but it's really not much more efficient than + +04:28.480 --> 00:04:34.940 +plainly stating what you want. + +04:34.940 --> 00:04:36.500 +And here again, we have "concat" things + +00:04:36.500 --> 00:04:40.367 +that we could just use to plainly state + +00:04:40.367 --> 00:04:41.980 +what we want to state. + +04:41.980 --> 00:04:49.880 +So, instead of "concat" I just put a "message". + +04:49.880 --> 00:04:52.260 +And here we have something that's very cute. + +04:52.260 --> 00:04:54.540 +It's a computerized plural. + +04:54.540 --> 00:04:55.700 +Here again, assuming that + +00:04:55.700 --> 00:04:58.640 +there are only plural or singular forms. 
+ +04:58.640 --> 00:05:00.867 +But the end string is not that much more natural + +00:05:00.867 --> 00:05:02.700 +than the fix, the code is less efficient + +00:05:02.700 --> 00:05:07.760 +and is harder to understand. + +05:07.760 --> 00:05:09.433 +Here again, the code is trying to make + +00:05:09.433 --> 00:05:13.520 +smart things where it could be much simpler. + +05:13.520 --> 00:05:14.667 +That is the part where you get the + +00:05:14.667 --> 00:05:19.480 +number of packages and their names. + +05:19.480 --> 00:05:22.067 +Here the whole sentence with the semicolons + +00:05:22.067 --> 00:05:26.333 +and the question mark is split in parts, + +00:05:26.333 --> 00:05:29.180 +between which something will be inserted. + +05:29.180 --> 00:05:34.240 +That's really ugly and difficult to read. + +05:34.240 --> 00:05:37.700 +Here again, another "ing" waiting to be + +00:05:37.700 --> 00:05:44.840 +regex-inserted into the code. + +05:44.840 --> 00:05:46.633 +And here at last, we get to the point + +00:05:46.633 --> 00:05:48.760 +where everything started. + +05:48.760 --> 00:05:50.833 +And you can see that unlike in the other spots, + +00:05:50.833 --> 00:05:52.400 +there is no possibility for the expression + +05:52.400 --> 00:05:54.680 +to be singular. + +05:54.680 --> 00:05:57.600 +So, I guess that if it hadn't been for that bug, + +00:05:57.600 --> 00:05:59.320 +I would not have found the other items, + +05:59.320 --> 00:06:01.033 +and we would be left with code that works, + +00:06:01.033 --> 00:06:02.033 +of course, but that is + +00:06:02.033 --> 00:06:06.020 +harder to understand, and maintain. + +06:06.020 --> 00:06:08.333 +Last but not least, a last version of + +00:06:08.333 --> 00:06:10.920 +"just plainly state what you mean to state". + +06:10.920 --> 00:06:14.880 +Keep it simple. + +NOTE "What did I learn, and how did I learn it?" 
+ +06:14.880 --> 00:06:19.267 +So first, we have this wonderful CONTRIBUTE file + +00:06:19.267 --> 00:06:21.267 +that is very explicit about + +00:06:21.267 --> 00:06:23.520 +how we must proceed when contributing code. + +06:23.520 --> 00:06:25.233 +So, that's really the first place + +00:06:25.233 --> 00:06:27.760 +that we should all read. + +06:27.760 --> 00:06:29.333 +The README file is pretty cool too, + +00:06:29.333 --> 00:06:30.967 +especially at the beginning of the process, + +00:06:30.967 --> 00:06:31.867 +when you're not sure whether + +00:06:31.867 --> 00:06:36.240 +you want to fix that bug or just report it. + +NOTE Useful packages + +06:36.240 --> 00:06:37.920 +And then we've got packages. + +06:37.920 --> 00:06:39.900 +We've got a number of packages that are really + +00:06:39.900 --> 00:06:42.600 +helpful when it comes to reading + +00:06:42.600 --> 00:06:45.880 +the information and the manuals. + +06:45.880 --> 00:06:48.000 +I'm mentioning three of them here, + +00:06:48.000 --> 00:06:53.720 +and I think they are the most important for us. + +NOTE Package: helpful + +06:53.720 --> 00:06:55.600 +So "helpful" is on the right, + +00:06:55.600 --> 00:06:58.667 +and it's overflowing the window with + +00:06:58.667 --> 00:07:01.900 +all the contextualized information it provides, + +00:07:01.900 --> 00:07:05.280 +and the standard "help" is on the left. + +07:05.280 --> 00:07:07.933 +I mean, really there are like two or three + +00:07:07.933 --> 00:07:11.567 +screen-full of information in the "helpful" output, + +00:07:11.567 --> 00:07:13.233 +so you really only see a part, + +00:07:13.233 --> 00:07:16.320 +but I guess if you use it, you know what I'm saying. + +07:16.320 --> 00:07:18.867 +What I like the most here is the "view in manual" + +00:07:18.867 --> 00:07:21.800 +part, where you can actually click and even get + +00:07:21.800 --> 00:07:23.667 +more information that's sometimes + +00:07:23.667 --> 00:07:28.400 +easier to read and understand. 
+ +NOTE Package: inform + +07:28.400 --> 00:07:33.640 +And then you've got the "info" versus "inform" formats. + +07:33.640 --> 00:07:34.567 +When you're in the manual, + +00:07:34.567 --> 00:07:37.140 +"inform" makes a huge difference. + +07:37.140 --> 00:07:39.367 +You can see here that you've got colorized items, + +00:07:39.367 --> 00:07:42.000 +and also in the middle you've got that + +07:42.000 --> 00:07:45.000 +'read' part that's green and bold. + +07:45.000 --> 00:07:49.333 +In "info" it's not a specific object, + +00:07:49.333 --> 00:07:52.200 +it's just a string. In 'inform' it's actually + +00:07:52.200 --> 00:07:53.800 +a link that you can click, + +00:07:53.800 --> 00:07:58.320 +and actually go to that 'read' manual page. + +NOTE Package: which-key + +07:58.320 --> 00:08:01.300 +Now, we've got "which-key". + +08:01.300 --> 00:08:03.400 +"which-key" is a savior for beginners too. + +08:03.400 --> 00:08:04.867 +Just wait half a second or something, + +00:08:04.867 --> 00:08:06.500 +and Emacs will show you all the keys + +00:08:06.500 --> 00:08:08.433 +that you can access from the prefix combination + +00:08:08.433 --> 00:08:09.920 +that you just typed. + +08:09.920 --> 00:08:13.200 +So, it's really helpful for discovering functions + +00:08:13.200 --> 00:08:19.160 +and learning new functions, getting used to them. + +NOTE It all started with this message… + +08:19.160 --> 00:08:21.500 +And so that whole process started…, + +00:08:21.500 --> 00:08:26.533 +it was May 23, 2017, + +00:08:26.533 --> 00:08:30.440 +with that thread when I found the bug. + +08:30.440 --> 00:08:32.800 +I just bumped into an English/code bug + +00:08:32.800 --> 00:08:36.920 +this morning. In package.el, when one package + +08:36.920 --> 00:08:39.033 +is not needed anymore, the message is: + +00:08:39.033 --> 00:08:41.300 +"Package menu: Operation finished. + +00:08:41.300 --> 00:08:44.880 +1 packages are no longer needed", etc. 
+ +08:44.880 --> 00:08:49.633 +So, I was asking whether we had best practices + +00:08:49.633 --> 00:08:53.800 +for using messages, and we had a whole thread + +08:53.800 --> 00:08:57.867 +about that. And while I was discussing on that + +00:08:57.867 --> 00:09:01.240 +thread, I started that new thread, which is: + +09:01.240 --> 00:09:02.867 +"package.el strings". + +00:09:02.867 --> 00:09:09.900 +The whole thing actually ended on June 27, 2018. + +00:09:09.900 --> 00:09:15.400 +So, a year after, with that message from Noam + +00:09:15.400 --> 00:09:18.567 +telling me that "Yes I can close the bug," + +00:09:18.567 --> 00:09:22.040 +and that was it. + +09:22.040 --> 00:09:24.000 +So, it took about a year to finish that. + +00:09:24.000 --> 00:09:28.133 +What I did learn basically is that + +00:09:28.133 --> 00:09:32.160 +helping with Emacs is not that difficult. + +09:32.160 --> 00:09:36.100 +It takes time when you're not fluent with the code, + +00:09:36.100 --> 00:09:37.100 +but that's okay because the reference + +09:37.100 --> 00:09:39.300 +is excellent, and there are lots of people + +00:09:39.300 --> 00:09:41.520 +who are here to help. + +NOTE Conclusion + +09:41.520 --> 00:09:45.700 +Basically, the solution to all our problems is + +00:09:45.700 --> 00:09:47.733 +"Keep It Simple and Straightforward". + +00:09:47.733 --> 00:09:51.033 +As you can see in that patch, + +00:09:51.033 --> 00:09:53.233 +even if it's a beginner's patch, + +00:09:53.233 --> 00:09:57.733 +what I did shows what can be done by Emacs Lisp + +00:09:57.733 --> 00:09:59.533 +beginners to help with "straightening" the strings + +00:09:59.533 --> 00:10:02.267 +to reduce the number of potential English bugs. + +00:10:02.267 --> 00:10:04.533 +And then to make Emacs strings easier + +00:10:04.533 --> 00:10:07.233 +to be handled by real localization processes one day. 
+ +00:10:07.233 --> 00:10:09.067 +But it doesn't have to be about strings + +00:10:09.067 --> 00:10:12.767 +because strings can be an easy entry point to Emacs, + +00:10:12.767 --> 00:10:16.720 +but it can be any itch that you want to scratch. + +10:16.720 --> 00:10:18.267 +And my real conclusion is that + +00:10:18.267 --> 00:10:22.160 +Emacs is free software, and what that means is mostly + +10:22.160 --> 00:10:24.067 +that it allows you to do things that you would + +00:10:24.067 --> 00:10:27.920 +never have thought of being able to do before. + +10:27.920 --> 00:10:32.000 +That's really the biggest lesson to be learned here. + +10:32.000 --> 00:10:33.400 +So, I want to thank all the people + +00:10:33.400 --> 00:10:37.920 +who allowed this to be happening, allowed me to + +10:37.920 --> 00:10:41.267 +learn a bit and contribute a bit to that wonderful + +00:10:41.267 --> 00:10:42.800 +piece of software that Emacs is. + +00:10:42.800 --> 00:10:44.533 +And thank you everyone for listening, + +00:10:44.533 --> 00:10:46.700 +and hopefully I'll see you next year + +00:10:46.700 --> 00:10:51.520 +with a different translation related presentation. + +10:51.520 --> 11:13.640 +Thank you very much. -- cgit v1.2.3