From e83f377aba7079eca2ab774e7f27f2704f669f43 Mon Sep 17 00:00:00 2001 From: Sacha Chua Date: Tue, 20 Dec 2022 13:05:54 -0500 Subject: add answer captions, add rest of IRC comments --- ...izing-emacs--jeanchristophe-helary--answers.vtt | 497 +++++++++++++++++++++ 1 file changed, 497 insertions(+) create mode 100644 2022/captions/emacsconf-2022-localizing--prelocalizing-emacs--jeanchristophe-helary--answers.vtt (limited to '2022/captions/emacsconf-2022-localizing--prelocalizing-emacs--jeanchristophe-helary--answers.vtt') diff --git a/2022/captions/emacsconf-2022-localizing--prelocalizing-emacs--jeanchristophe-helary--answers.vtt b/2022/captions/emacsconf-2022-localizing--prelocalizing-emacs--jeanchristophe-helary--answers.vtt new file mode 100644 index 00000000..a6a4ba40 --- /dev/null +++ b/2022/captions/emacsconf-2022-localizing--prelocalizing-emacs--jeanchristophe-helary--answers.vtt @@ -0,0 +1,497 @@ +WEBVTT + +00:00.000 --> 00:09.260 +Excellent. Thank you for the great talk. As someone whose first language wasn't English + +00:09.260 --> 00:14.960 +and speaks other languages, I think localization and internationalization is a very important + +00:14.960 --> 00:20.920 +topic that's near and dear to my heart, and especially when it comes to Emacs. I think + +00:20.920 --> 00:26.700 +there's a lot that we could do better. So, yeah, thanks so much. Folks, if you have questions, + +00:26.700 --> 00:32.880 +you can post them on IRC or on the pad, and Jean-Christophe will answer them, and we will also open up + +00:32.880 --> 00:37.600 +this BigBlueButton for people who would like to join here and ask their questions + +00:37.600 --> 00:45.760 +directly. Jean-Christophe, please take it away. Okay, thank you. I'm not seeing much activity + +00:45.760 --> 00:55.920 +on IRC or the pad, so let me add a few things. 
First, that patch was really interesting in + +00:55.920 --> 01:03.680 +terms of actually getting into the code and understanding how really can a beginner join + +01:03.680 --> 01:11.080 +development, even if it's just a few lines. I mentioned in the first part of the presentation + +01:11.080 --> 01:17.600 +that there was this small integration bug with Mac, and that's the thing that actually + +01:17.600 --> 01:22.400 +got me started, and that was interesting because at the time I was trying to use Aquamacs because + +01:22.400 --> 01:28.280 +it looked simpler, and I thought, okay, if I need to fix that, rather than fixing it + +01:28.280 --> 01:34.400 +in Aquamacs, maybe I should just go to Emacs and fix it there. So, that was the first attempt + +01:34.400 --> 01:40.440 +for me to actually contribute something serious, and it was really nice to – I mean, this + +01:40.440 --> 01:47.160 +Emacs development list is really amazing. 99% of the discussion is just way above your + +01:47.160 --> 01:54.120 +head, but sometimes you grasp something, and the more you grasp it, the more you understand + +01:54.120 --> 02:00.600 +and the more you feel like you can actually do something, especially since – I mean, + +02:00.600 --> 02:06.640 +as for all the free software development projects, most of them, I guess, it's really just do + +02:06.640 --> 02:13.920 +it kind of thing. And if you try to do something, somebody's going to help you, and what I + +02:13.920 --> 02:21.200 +really enjoy when being there is that the people are always very nice. Sometimes you + +02:21.200 --> 02:28.080 +feel some tension when there are discussions about a specific topic, but it's – everybody + +02:28.080 --> 02:37.520 +is really polite, I mean, 99% of the time. 
And what I like the most is all the people + +02:37.520 --> 02:42.680 +are very strong opinionated, so they have a very good idea of what Emacs should be or + +02:42.680 --> 02:47.640 +should not be, and so it gives you a very good idea of in what direction you should + +02:47.640 --> 02:57.400 +go. So that experience – I mean, pretty much those 2017, 2018 years were until now + +02:57.400 --> 03:02.040 +the peak of my Emacs activity. I've had to craddle with that because I was busy with + +03:02.040 --> 03:07.160 +other things, but I'm really planning to go back to working on maybe not localization + +03:07.160 --> 03:13.480 +because it's really – it's too big for me right now. And what I was told is that + +03:13.480 --> 03:20.520 +it involved a bit of C programming and things like this, so I'm not really into that right + +03:20.520 --> 03:30.840 +now. But I think eventually one day – I just turned 53, so I guess in a few years + +03:30.840 --> 03:36.800 +from now when I have more time, I guess I'll just dive in and just work on those localization + +03:36.800 --> 03:43.800 +issues and really to bring Emacs to a different world because I think it's – if we were + +03:43.800 --> 03:49.920 +able to have – it's a big job. I mean, it's really – if you check the threads + +03:49.920 --> 03:55.400 +on dev, check my name, you will see that I mostly post on translation or localization + +03:55.400 --> 04:01.360 +issues at least at the time. And I did an estimate of the sheer volume of strings to + +04:01.360 --> 04:10.360 +translate. For example, the manuals were about 2 million words. That's big. That's big. + +04:10.360 --> 04:14.040 +But it's okay. I mean, it's not something that's impossible. And if you check the strings + +04:14.040 --> 04:20.160 +– that was a really rough estimate. If you check the strings for Emacs proper, not even + +04:20.160 --> 04:29.120 +talking about the packages and things, I think that would add probably like 500,000 words. 
+ +04:29.120 --> 04:34.360 +I mean, I have no idea, but my very rough estimate would be that. So it's not something + +04:34.360 --> 04:41.120 +that's impossible to do. And we'd have to ensure that we have a good process for people + +04:41.120 --> 04:46.200 +who review the strings and contribute new strings and things like this and also best + +04:46.200 --> 04:53.560 +practices like what I tried to show in this video. And I was really not trying to be dismissive + +04:53.560 --> 04:58.680 +about the people who worked on package.el because they did a wonderful job at actually helping + +04:58.680 --> 05:02.840 +people like me access all those packages. So it's – I mean, the point of the video + +05:02.840 --> 05:10.840 +is not really to dismiss the code. But I was kind of scared because I was like, if they + +05:10.840 --> 05:18.720 +write code like this for strings, then what about the rest of the code? Is it – so it + +05:18.720 --> 05:25.560 +was kind of – I mean, something that I really can't evaluate. But I'm like – I mean, + +05:25.560 --> 05:30.600 +those guys obviously are really smart and they're trying to make intelligent things + +05:30.600 --> 05:37.400 +about how they want to factor their code, et cetera. But if they do that for strings, + +05:37.400 --> 05:44.400 +which is quite simple actually – I mean, it's simple to mess up strings. So I was + +05:44.400 --> 05:50.320 +like, what about the rest of the code? Is it that complex or that difficult to understand? + +05:50.320 --> 05:56.000 +So that's kind of a put off for me. I'm like, I really don't want to try to envisage + +05:56.000 --> 06:01.760 +that more because – plus it's not – it's really not my area at all. So anyway, that's + +06:01.760 --> 06:04.400 +what I wanted to add. Yeah. + +06:04.400 --> 06:11.680 +Awesome. Yeah, I think I pretty much agree with all of what you said. + +06:11.680 --> 06:17.360 +Yeah, yeah, yeah. I have a question – I see a question on the pad. 
I use Emacs in + +06:17.360 --> 06:23.520 +English, but my mother language is – no, no, no. Okay. So the answer is that Emacs + +06:23.520 --> 06:33.760 +is not localized. And my understanding is that right now it's not localizable. And + +06:33.760 --> 06:40.840 +those discussions took place about four or five years ago. So check on the dev list and + +06:40.840 --> 06:46.280 +you'll see the state of the discussion because there is only a discussion at the moment. + +06:46.280 --> 06:57.480 +What I did for package.el, I think it was really just a one-time attempt at fixing one package. + +06:57.480 --> 07:05.640 +And I did check the other – a number of other packages in core Emacs. And not a lot + +07:05.640 --> 07:12.280 +of them had – I mean, as far as I checked. And I really did not check everything. But + +07:12.280 --> 07:20.840 +basically what you have to do is check all the functions that impact strings. And some + +07:20.840 --> 07:28.600 +are really not user-facing strings, so they're not really interesting for us. And actually, + +07:28.600 --> 07:34.640 +that's really interesting to do that. So if you just take one Lisp package, Lisp code + +07:34.640 --> 07:40.480 +and just go through the thing and just check all of prin1, princ, message, format, concat + +07:40.480 --> 07:43.520 +and stuff and just see how it goes. + +07:43.520 --> 07:50.240 +So basically right now there is no infrastructure to localize the thing. There is no process + +07:50.240 --> 07:56.720 +to extract the strings. And there is no way to actually import them back into the code. + +07:56.720 --> 08:02.800 +So what we can do right now is really just what I did, make sure that it's eventually + +08:02.800 --> 08:10.760 +possible one day. And as I just showed, it's really not such a big deal. If you're very + +08:10.760 --> 08:19.800 +careful about understanding the way that the strings are handled, it's just a few rewrites + +08:19.800 --> 08:24.560 +away. 
I mean, it's really not much. So there's – I mean, there's not a lot to be proud + +08:24.560 --> 08:31.140 +about in my patch. But it was really fun. And I think it's a very good entry point + +08:31.140 --> 08:39.480 +for people like us. I suppose – I mean, I suppose the first person question. I mean, + +08:39.480 --> 08:44.240 +I don't know. Maybe I'm just – I should not suppose that. But people who really enjoy + +08:44.240 --> 08:51.320 +working in Emacs and just sometimes would like to contribute something and are not programmers + +08:51.320 --> 08:56.320 +or anything or maybe even programmers. I mean, I'm not excluding them. But that's really + +08:56.320 --> 09:02.280 +a good way to just start doing something. And eventually from there, you can – I mean, + +09:02.280 --> 09:07.020 +you just use a package that you like and that you think is important and just check the + +09:07.020 --> 09:10.200 +strings and do things like this. And then eventually, you'll find other parts of the + +09:10.200 --> 09:18.840 +code that you want to improve or add functions. So yeah, actually, the patch that I did, this + +09:18.840 --> 09:26.840 +patch is actually in the process of the thing that I started with Aquamacs. So I did one + +09:26.840 --> 09:35.600 +little thing regarding those that were not fully integrated in macOS. And then I did + +09:35.600 --> 09:41.880 +something about a small function. I think I added the possibility to add an option. + +09:41.880 --> 09:48.960 +I did documentation improvement as well. So really just little things. And then the deeper + +09:48.960 --> 09:53.000 +you dive, the more interesting it gets. And then you find something that you really want + +09:53.000 --> 10:07.160 +to do. So just use that entry point as a way to have fun in Emacs. + +10:07.160 --> 10:15.240 +Well, so I mentioned Regex on strings. Well, it's not really a red flag for localization. 
+ +10:15.240 --> 10:28.080 +But the way it's used, I mean, I guess there are ways to properly use it. But I think really + +10:28.080 --> 10:38.400 +the basically using that means that you're making assumptions on the way language is + +10:38.400 --> 10:45.800 +structured. And I did exactly the same mistake on a different project that I'm working on. + +10:45.800 --> 10:51.280 +Actually, I'm in charge of rewriting a manual. And we were using Docbook. And I just thought + +10:51.280 --> 10:57.240 +it would be smart to have automated links to parts of the chapters, et cetera. And the + +10:57.240 --> 11:01.240 +thing is that depending on the language, you've got different ways to introduce chapters. + +11:01.240 --> 11:10.540 +So I should know that. I should know that. You should not automatically insert strings + +11:10.540 --> 11:20.720 +in code because it's going to produce something that can't be handled by the translator. So + +11:20.720 --> 11:28.840 +basically Regex on strings is something that probably you might use. But if you see, I + +11:28.840 --> 11:33.320 +mean, you can see the way it was used in the original code. So if you see something like + +11:33.320 --> 11:39.360 +that, I mean, just don't run and just fix the thing because there is no way these can + +11:39.360 --> 11:44.920 +be localized, I mean, extracted properly and then localized. And that's the reason too + +11:44.920 --> 11:50.480 +why numbers are a big problem because, for example, in English but in French too, we + +11:50.480 --> 11:56.920 +have only singular forms and plural forms. But some languages have zero forms. Some languages + +11:56.920 --> 12:03.720 +have two forms like pair forms. Some languages don't have a different form for anything. + +12:03.720 --> 12:09.920 +For example, I live in Japan. I work in Japanese. And in Japanese, you don't have a form. You + +12:09.920 --> 12:16.640 +don't have different inflections for words based on their number. 
So saying one whatever + +12:16.640 --> 12:23.400 +or two whatevers or an infinity of whatevers or even zero whatever, it's just the same + +12:23.400 --> 12:28.480 +form. So making assumptions on the number of things and the way it's expressed in the language + +12:28.480 --> 12:34.640 +is usually, and that's something that we already know in free software. I mean, if you check + +12:34.640 --> 12:40.060 +the gettext library, they've got everything sorted out. And that's something that was + +12:40.060 --> 12:46.880 +created in the 90s at Sun Microsystems. And then it was freed, et cetera. But when you + +12:46.880 --> 12:52.560 +see the work that it did at the time, you would kind of expect that people understand + +12:52.560 --> 12:58.920 +that. But no. And that's OK because developers develop and localizers localize. So we kind + +12:58.920 --> 13:04.820 +of split. But everything has been done already. So we just have to be aware of what's being + +13:04.820 --> 13:11.720 +done. And we have to be aware of the rules. And I think of one very good set of rules + +13:11.720 --> 13:19.880 +that's been online for a while. It's the World Wide Web Consortium. They have a really good internationalization + +13:19.880 --> 13:26.640 +page where everything is pretty much black on white on paper, on the web at least. And + +13:26.640 --> 13:31.960 +if you read that, you can see exactly what should be done for localization, what should + +13:31.960 --> 13:35.880 +not be done, what should be avoided at all costs, et cetera, et cetera. + +13:35.880 --> 13:44.440 +So there are plenty of references here and there. And in terms of software localization, + +13:44.440 --> 13:49.980 +it's the same. If you check the gettext page, you should be able to get an idea of what + +13:49.980 --> 13:59.240 +should be good. So is my project to localize all of Emacs? I wish it were. Eventually I'll + +13:59.240 --> 14:05.160 +be rich. Hopefully. I don't know. I'm working on that. It's not working well. 
But the day + +14:05.160 --> 14:11.540 +I can take just one year off totally and focus on that, I think that's something I would + +14:11.540 --> 14:18.760 +love to work on and just get up to speed with the process of programming all the things, + +14:18.760 --> 14:23.080 +checking all the things, and organizing the infrastructure. But seriously, I don't think + +14:23.080 --> 14:31.240 +that will ever happen because I'm a poor translator. And I still have, what, like 20 years to go + +14:31.240 --> 14:40.560 +before I can't work anymore. And we don't have savings or anything with the corona shit. + +14:40.560 --> 14:47.560 +So I don't think that's ever going to happen. But I would love to help. And yes, yes. How + +14:47.560 --> 14:53.480 +deep would useful localization go? Because the core of Emacs are docstrings and localization. + +14:53.480 --> 15:00.280 +Yes, yes, yes. I mean, all those discussions have been made. I mean, no conclusion reached. + +15:00.280 --> 15:07.880 +But we have addressed those things on the discussions. And so just, I mean, it's really + +15:07.880 --> 15:13.560 +pretentious to say, check my name on the Emacs devel list because I've talked about that. + +15:13.560 --> 15:18.680 +It's really pretentious. But that's not what I'm saying. I mean, there has been a lot of + +15:18.680 --> 15:24.400 +discussion on the development list. So if you check for localization, translation, stuff + +15:24.400 --> 15:30.800 +like that, you'll see keywords, and you'll see the discussion. And people are aware of + +15:30.800 --> 15:36.440 +the issues. So I mean, we just need to have a framework for that. + +15:36.440 --> 15:40.120 +Thank you. Just to quickly chime in to say, I think we have about two more minutes of + +15:40.120 --> 15:45.800 +on stream Q&A. And then you're welcome to either stay here, Jean-Christophe, or continue + +15:45.800 --> 15:48.800 +taking questions on the pad or IRC. 
+ +15:48.800 --> 15:57.120 +I think, well, I got to go to work. So I need to get ready. But I think, unless we have + +15:57.120 --> 16:08.760 +something on IRC, I think we're good. If you find something else that I've not addressed, + +16:08.760 --> 16:19.840 +I'm good. Otherwise, yes, yes, yeah, we need to take all the C code. But I mean, you can + +16:19.840 --> 16:29.160 +decide the level down to which you want to work. So you can go all the way to the C code. + +16:29.160 --> 16:32.920 +But actually, the C code is actually easier to extract because there is all these + +16:32.920 --> 16:40.280 +gettext things that work on the C code already. So the issue is pretty much the Emacs Lisp + +16:40.280 --> 16:47.760 +code, as far as I can understand. So that would be the process that we need to address. + +16:47.760 --> 16:56.800 +Doc strings, indeed. But then the doc strings and the manual, they are very close. And actually, + +16:56.800 --> 17:03.560 +yeah, my estimate of the 500,000 word, I think it was based on doc strings. So yeah, we need + +17:03.560 --> 17:09.760 +to take all that. And that's an ongoing project that's not going to go away anyway. So we'll + +17:09.760 --> 17:12.760 +be here 10 years from now, I'm sure. + +17:12.760 --> 17:17.680 +OK, cool. And yeah, I think that's about all the time that we have on the stream. I guess + +17:17.680 --> 17:21.720 +if folks have further questions, they could maybe reach out to you later on IRC or via + +17:21.720 --> 17:22.720 +email. + +17:22.720 --> 17:29.640 +And I'll be back on the development list shortly, maybe six months from now. So yeah, I can + +17:29.640 --> 17:30.640 +take it from there. + +17:30.640 --> 17:31.640 +Sounds great. + +17:31.640 --> 17:32.640 +Thank you very much. + +17:32.640 --> 17:33.640 +Thank you very much. + +17:33.640 --> 17:34.640 +Yeah, thanks again for your great talk. Cheers. + +17:34.640 --> 17:35.640 +Cheers. + +17:35.640 --> 17:56.640 +OK, bye. + -- cgit v1.2.3