From: Colin Leroy-Mira <colin@colino.net>
To: libc-alpha@sourceware.org
Subject: Re: [PATCH v2] localedata: Translit common emojis to smileys [BZ #30649]
Date: Tue, 8 Aug 2023 09:07:40 +0200
Message-ID: <20230808090740.1fc2bf9c.colin@colino.net>
In-Reply-To: <20230721141101.3337118-1-colin@colino.net>
References: <20230719161707.1558085-1-colin@colino.net> <20230721141101.3337118-1-colin@colino.net>

Hi folks,

Ping :) I'd love a review on this!

Thanks!

> Add common emojis to the translit-able characters (mostly
> faces and hearts), and translit them to old-fashioned
> smileys.
>
> Author: Colin Leroy-Mira
> Signed-off-by: Colin Leroy-Mira
> ---
> v2: Fix a wrong smiley, add unit test
>  localedata/Makefile                 |   3 +
>  localedata/locales/translit_emojis  |  91 ++++++++++++++++++++
>  localedata/locales/translit_neutral |   1 +
>  localedata/tst-iconv-emojis-trans.c | 124 ++++++++++++++++++++++++++++
>  4 files changed, 219 insertions(+)
>  create mode 100644 localedata/locales/translit_emojis
>  create mode 100644 localedata/tst-iconv-emojis-trans.c
>
> diff --git a/localedata/Makefile b/localedata/Makefile
> index 3619b6d47e..5b6d10e33f 100644
> --- a/localedata/Makefile
> +++ b/localedata/Makefile
> @@ -164,6 +164,7 @@ tests = \
>    bug-usesetlocale \
>    tst-c-utf8-consistency \
>    tst-digits \
> +  tst-iconv-emojis-trans \
>    tst-iconv-math-trans \
>    tst-leaks \
>    tst-mbswcs1 \
> @@ -320,6 +321,8 @@ LOCALES := \
>
>  include ../gen-locales.mk
>
> +$(objpfx)tst-iconv-emojis-trans.out: $(gen-locales)
> +
>  $(objpfx)tst-iconv-math-trans.out: $(gen-locales)
>  endif
>
> diff --git a/localedata/locales/translit_emojis b/localedata/locales/translit_emojis
> new file mode 100644
> index 0000000000..260aeedc35
> --- /dev/null
> +++ b/localedata/locales/translit_emojis
> @@ -0,0 +1,91 @@
> +escape_char /
> +comment_char %
> +
> +% This file is part of the GNU C Library and contains locale data.
> +% The Free Software Foundation does not claim any copyright interest
> +% in the locale data contained in this file. The foregoing does not
> +% affect the license of the GNU C Library as a whole. It does not
> +% exempt you from the conditions of the license if your use would
> +% otherwise be governed by that license.
> +
> +% Transliterations of emojis to ASCII smileys.
> +% Generated algorithmically.
> +
> +LC_CTYPE
> +
> +translit_start
> +
> +<U2661> "<U003C><U0033>" % WHITE HEART SUIT
> +<U2665> "<U003C><U0033>" % BLACK HEART SUIT
> +<U2764> "<U003C><U0033>" % HEAVY BLACK HEART
> +<U0001F499> "<U003C><U0033>" % BLUE HEART
> +<U0001F493> "<U003C><U0033>" % BEATING HEART
> +<U0001F494> "<U003C><U002F><U0033>" % BROKEN HEART
> +<U0001F496> "<U003C><U0033>" % SPARKLING HEART
> +<U0001F497> "<U003C><U0033>" % GROWING HEART
> +<U0001F49A> "<U003C><U0033>" % GREEN HEART
> +<U0001F49B> "<U003C><U0033>" % YELLOW HEART
> +<U0001F49C> "<U003C><U0033>" % PURPLE HEART
> +<U0001F5A4> "<U003C><U0033>" % BLACK HEART
> +<U0001F9E1> "<U003C><U0033>" % ORANGE HEART
> +<U0001F90D> "<U003C><U0033>" % WHITE HEART
> +<U0001F90E> "<U003C><U0033>" % BROWN HEART
> +<U0001F600> "<U003A><U002D><U0044>" % GRINNING FACE
> +<U0001F601> "<U003A><U002D><U0044>" % GRINNING FACE WITH SMILING EYES
> +<U0001F602> "<U003A><U0027><U0044>" % FACE WITH TEARS OF JOY
> +<U0001F603> "<U003A><U002D><U0044>" % SMILING FACE WITH OPEN MOUTH (C.F. ☺)
> +<U0001F604> "<U003A><U002D><U0044>" % SMILING FACE WITH OPEN MOUTH AND SMILING EYES
> +<U0001F605> "<U003A><U002D><U0044>" % SMILING FACE WITH OPEN MOUTH AND COLD SWEAT
> +<U0001F606> "<U003A><U002D><U0044>" % SMILING FACE WITH OPEN MOUTH AND TIGHTLY-CLOSED EYES
> +<U0001F607> "<U004F><U003A><U002D><U0029>" % SMILING FACE WITH HALO
> +<U0001F608> "<U003E><U003A><U0029>" % SMILING FACE WITH HORNS
> +<U0001F609> "<U003B><U002D><U0029>" % WINKING FACE
> +<U0001F60A> "<U003A><U002D><U0029>" % SMILING FACE WITH SMILING EYES
> +<U0001F60B> "<U003A><U002D><U0050>" % FACE SAVOURING DELICIOUS FOOD
> +<U0001F60C> "<U003A><U002D><U0029>" % RELIEVED FACE
> +<U0001F60D> "<U003A><U002D><U002A>" % SMILING FACE WITH HEART-SHAPED EYES
> +<U0001F60E> "<U0042><U002D><U0029>" % SMILING FACE WITH SUNGLASSES
> +<U0001F60F> "<U003B><U002D><U0029>" % SMIRKING FACE
> +<U0001F610> "<U003A><U002D><U007C>" % NEUTRAL FACE
> +<U0001F611> "<U003A><U002D><U007C>" % EXPRESSIONLESS FACE
> +<U0001F612> "<U003A><U002D><U007C>" % UNAMUSED FACE
> +<U0001F613> "<U003A><U0027><U002D><U007C>" % FACE WITH COLD SWEAT
> +<U0001F614> "<U003A><U002D><U007C>" % PENSIVE FACE
> +<U0001F615> "<U003A><U002D><U002F>" % CONFUSED FACE
> +<U0001F616> "<U003A><U002D><U0053>" % CONFOUNDED FACE
> +<U0001F617> "<U003A><U002D><U002A>" % KISSING FACE
> +<U0001F618> "<U003A><U002D><U002A>" % FACE THROWING A KISS
> +<U0001F619> "<U003A><U002D><U002A>" % KISSING FACE WITH SMILING EYES
> +<U0001F61A> "<U003A><U002D><U002A>" % KISSING FACE WITH CLOSED EYES
> +<U0001F61B> "<U003A><U002D><U0050>" % FACE WITH STUCK-OUT TONGUE
> +<U0001F61C> "<U003B><U002D><U0050>" % FACE WITH STUCK-OUT TONGUE AND WINKING EYE
> +<U0001F61D> "<U0058><U002D><U0050>" % FACE WITH STUCK-OUT TONGUE AND TIGHTLY-CLOSED EYES
> +<U0001F61E> "<U003A><U002D><U0028>" % DISAPPOINTED FACE
> +<U0001F61F> "<U003A><U002D><U0028>" % WORRIED FACE
> +<U0001F620> "<U003E><U003A><U002D><U0028>" % ANGRY FACE
> +<U0001F621> "<U003A><U002D><U0028>" % POUTING FACE
> +<U0001F622> "<U003A><U0027><U002D><U0028>" % CRYING FACE
> +<U0001F623> "<U0058><U002D><U0028>" % PERSEVERING FACE
> +<U0001F626> "<U003A><U002D><U004F>" % FROWNING FACE WITH OPEN MOUTH
> +<U0001F627> "<U003A><U002D><U004F>" % ANGUISHED FACE
> +<U0001F628> "<U003A><U002D><U004F>" % FEARFUL FACE
> +<U0001F629> "<U003A><U002D><U004F>" % WEARY FACE
> +<U0001F62D> "<U003A><U0022><U002D><U0028>" % LOUDLY CRYING FACE
> +<U0001F62E> "<U003A><U002D><U004F>" % FACE WITH OPEN MOUTH
> +<U0001F62F> "<U003A><U002D><U004F>" % HUSHED FACE
> +<U0001F630> "<U003A><U0027><U002D><U004F>" % FACE WITH OPEN MOUTH AND COLD SWEAT
> +<U0001F631> "<U003A><U002D><U004F>" % FACE SCREAMING IN FEAR
> +<U0001F632> "<U003A><U002D><U004F>" % ASTONISHED FACE
> +<U0001F638> "<U003A><U002D><U0033>" % GRINNING CAT FACE WITH SMILING EYES
> +<U0001F639> "<U003A><U0027><U002D><U0033>" % CAT FACE WITH TEARS OF JOY
> +<U0001F63A> "<U003A><U002D><U0033>" % SMILING CAT FACE WITH OPEN MOUTH
> +<U0001F63B> "<U003A><U002D><U0033>" % SMILING CAT FACE WITH HEART-SHAPE EYES
> +<U0001F63C> "<U003B><U002D><U0033>" % CAT FACE WITH WRY SMILE
> +<U0001F63D> "<U003A><U002D><U0033>" % KISSING CAT FACE WITH CLOSED EYES
> +<U0001F641> "<U003A><U002D><U0028>" % SLIGHTLY FROWNING FACE
> +<U0001F642> "<U003A><U002D><U0029>" % SLIGHTLY SMILING FACE
> +<U0001F643> "<U0028><U002D><U003A>" % UPSIDE-DOWN FACE
> +
> +translit_end
> +
> +END LC_CTYPE
> diff --git a/localedata/locales/translit_neutral b/localedata/locales/translit_neutral
> index 72f66220b7..57412ae565 100644
> --- a/localedata/locales/translit_neutral
> +++ b/localedata/locales/translit_neutral
> @@ -17,6 +17,7 @@ translit_start
>  include "translit_circle";""
>  include "translit_cjk_compat";""
>  include "translit_compat";""
> +include "translit_emojis";""
>  include "translit_font";""
>  include "translit_fraction";""
>  include "translit_narrow";""
> diff --git a/localedata/tst-iconv-emojis-trans.c b/localedata/tst-iconv-emojis-trans.c
> new file mode 100644
> index 0000000000..89a32074d5
> --- /dev/null
> +++ b/localedata/tst-iconv-emojis-trans.c
> @@ -0,0 +1,124 @@
> +/* Test some emoji transliterations
> +
> +   Copyright (C) 2019-2023 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#include <iconv.h>
> +#include <locale.h>
> +#include <stdio.h>
> +#include <string.h>
> +#include <support/check.h>
> +
> +static int
> +do_test (void)
> +{
> +  iconv_t cd;
> +
> +  const int num_emojis = 70;
> +
> +  const char str[] = "\u2661 \u2665 \u2764 \U0001F499 "
> +                     "\U0001F493 \U0001F494 \U0001F496 "
> +                     "\U0001F497 \U0001F49A \U0001F49B "
> +                     "\U0001F49C \U0001F5A4 \U0001F9E1 "
> +                     "\U0001F90D \U0001F90E \U0001F600 "
> +                     "\U0001F601 \U0001F602 \U0001F603 "
> +                     "\U0001F604 \U0001F605 \U0001F606 "
> +                     "\U0001F607 \U0001F608 \U0001F609 "
> +                     "\U0001F60A \U0001F60B \U0001F60C "
> +                     "\U0001F60D \U0001F60E \U0001F60F "
> +                     "\U0001F610 \U0001F611 \U0001F612 "
> +                     "\U0001F613 \U0001F614 \U0001F615 "
> +                     "\U0001F616 \U0001F617 \U0001F618 "
> +                     "\U0001F619 \U0001F61A \U0001F61B "
> +                     "\U0001F61C \U0001F61D \U0001F61E "
> +                     "\U0001F61F \U0001F620 \U0001F621 "
> +                     "\U0001F622 \U0001F623 \U0001F626 "
> +                     "\U0001F627 \U0001F628 \U0001F629 "
> +                     "\U0001F62D \U0001F62E \U0001F62F "
> +                     "\U0001F630 \U0001F631 \U0001F632 "
> +                     "\U0001F638 \U0001F639 \U0001F63A "
> +                     "\U0001F63B \U0001F63C \U0001F63D "
> +                     "\U0001F641 \U0001F642 \U0001F643";
> +
> +  const char expected[] = "<3 <3 <3 <3 <3 "
> +                          "</3 <3 <3 <3 <3 "
> +                          "<3 <3 <3 <3 <3 "
> +                          ":-D :-D :'D :-D :-D "
> +                          ":-D :-D O:-) >:) ;-) "
> +                          ":-) :-P :-) :-* B-) "
> +                          ";-) :-| :-| :-| :'-| "
> +                          ":-| :-/ :-S :-* :-* "
> +                          ":-* :-* :-P ;-P X-P "
> +                          ":-( :-( >:-( :-( :'-( "
> +                          "X-( :-O :-O :-O :-O "
> +                          ":\"-( :-O :-O :'-O :-O "
> +                          ":-O :-3 :'-3 :-3 :-3 "
> +                          ";-3 :-3 :-( :-) (-:";
> +
> +  char *inptr = (char *) str;
> +  size_t inlen = strlen (str) + 1;
> +  char outbuf[500];
> +  char *outptr = outbuf;
> +  size_t outlen = sizeof (outbuf);
> +  int result = 0;
> +  size_t n;
> +
> +  if (setlocale (LC_ALL, "en_US.UTF-8") == NULL)
> +    FAIL_EXIT1 ("setlocale failed");
> +
> +  cd = iconv_open ("ASCII//TRANSLIT", "UTF-8");
> +  if (cd == (iconv_t) -1)
> +    FAIL_EXIT1 ("iconv_open failed");
> +
> +  n = iconv (cd, &inptr, &inlen, &outptr, &outlen);
> +  if (n != num_emojis)
> +    {
> +      if (n == (size_t) -1)
> +        printf ("iconv() returned error: %m\n");
> +      else
> +        printf ("iconv() returned %zd, expected %d\n", n, num_emojis);
> +      result = 1;
> +    }
> +  if (inlen != 0)
> +    {
> +      puts ("not all input consumed");
> +      result = 1;
> +    }
> +  else if (inptr - str != strlen (str) + 1)
> +    {
> +      printf ("inptr wrong, advanced by %td\n", inptr - str);
> +      result = 1;
> +    }
> +  if (memcmp (outbuf, expected, sizeof (expected)) != 0)
> +    {
> +      printf ("result wrong: \"%.*s\", expected: \"%s\"\n",
> +              (int) (sizeof (outbuf) - outlen), outbuf, expected);
> +      result = 1;
> +    }
> +  else if (outlen != sizeof (outbuf) - sizeof (expected))
> +    {
> +      printf ("outlen wrong: %zd, expected %zd\n", outlen,
> +              sizeof (outbuf) - sizeof (expected));
> +      result = 1;
> +    }
> +  else
> +    printf ("output is \"%s\" which is OK\n", outbuf);
> +
> +  return result;
> +}
> +
> +#include <support/test-driver.c>

-- 
Colin
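
For anyone who wants to exercise the new table by hand rather than through the test suite, a minimal sketch follows. It uses the same iconv_open ("ASCII//TRANSLIT", "UTF-8") call as the test in the patch. The helper name translit_to_ascii is hypothetical and not part of the patch, and the sketch assumes the caller has already selected a UTF-8 locale with setlocale, since the transliteration data belongs to the locale's LC_CTYPE section.

```c
#include <assert.h>
#include <iconv.h>
#include <string.h>

/* Hypothetical helper, not part of the patch: convert the UTF-8 string
   UTF8 to ASCII with transliteration and store the NUL-terminated
   result in OUT (OUTSIZE bytes).  Returns 0 on success, -1 on failure.
   Assumption: the caller has already called setlocale with a UTF-8
   locale so that locale transliteration data is available.  */
static int
translit_to_ascii (const char *utf8, char *out, size_t outsize)
{
  assert (utf8 != NULL && out != NULL && outsize > 0);

  iconv_t cd = iconv_open ("ASCII//TRANSLIT", "UTF-8");
  if (cd == (iconv_t) -1)
    return -1;

  /* iconv takes a non-const input pointer but does not modify the
     input bytes.  */
  char *inptr = (char *) utf8;
  size_t inlen = strlen (utf8);
  char *outptr = out;
  size_t outlen = outsize - 1;	/* Keep one byte for the terminator.  */

  size_t n = iconv (cd, &inptr, &inlen, &outptr, &outlen);
  iconv_close (cd);
  if (n == (size_t) -1)
    return -1;			/* Invalid input or OUT too small.  */

  *outptr = '\0';
  return 0;
}
```

With the patch applied and setlocale (LC_ALL, "en_US.UTF-8") called first, passing the UTF-8 encoding of U+1F600 should yield ":-D", matching the table entry above. Without the patch the output depends on the installed locale data.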