From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-ej1-x62e.google.com (mail-ej1-x62e.google.com [IPv6:2a00:1450:4864:20::62e]) by sourceware.org (Postfix) with ESMTPS id 894F53858002 for ; Tue, 30 Mar 2021 17:33:51 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.3.2 sourceware.org 894F53858002 Received: by mail-ej1-x62e.google.com with SMTP id u21so25986779ejo.13 for ; Tue, 30 Mar 2021 10:33:51 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=A5dh7MoPInLmaiHx4+YQAftp8iDOm96JA58ulNUjlCM=; b=L4Pfn+pCKhxDf1vrpXoAHiqKwTiLNnfSHn4qoqI41Ud20rIhfMOmgYWPLfhRV10jxr JjY2aeAMWyG8ZHTyn2AIMTpSTF9IJ9q4XW49tdLJRoRMYxDcKJrseS/0n621zQDeKXS9 bS0GFpbGgGlM6c0SiNp8bNsts+c6SvHcdcywV71qh1TrvfIDd2dtoNS20U9YYx7gWPCi t8l7QI7UilGuMd3d8lrllcWn+64lpX4CsSZiciJgrUnWoD1he1lUHInT6uWZvoEwFh34 9Fnbr8xZ1S9jt8Cj0YxFDWTXcSWNvR3XF4NT01DpTIcw3lv/rmizfElxLQtpQq1Yl+vh ZIog== X-Gm-Message-State: AOAM532JVmOL6IDGUdahhmD4ZOkB9tjN68nWtKwWYp99DLzuNmqqASZ3 ksNu6aiy4Zs/q34jt/32UWEhtGGhY1l8ErasBULvLA== X-Google-Smtp-Source: ABdhPJxULrCQD4YPfvsgQke9wopvFtX44gXdgNQ36P3I5ym5Ovdx288+SqVqRNw8yJwmA68XmLlu/ru7b7k3pk7Lt+c= X-Received: by 2002:a17:906:a0d4:: with SMTP id bh20mr34295030ejb.348.1617125630348; Tue, 30 Mar 2021 10:33:50 -0700 (PDT) MIME-Version: 1.0 References: <20210315184211.4124573-1-yuanzi@google.com> <2003e08c-55e2-80fa-89a6-fb8d59cc0e77@redhat.com> <20210316142816.GC4427@arm.com> In-Reply-To: From: Lirong Yuan Date: Tue, 30 Mar 2021 10:33:39 -0700 Message-ID: Subject: Re: [PATCH] locale: align _nl_C_LC_CTYPE_class and _nl_C_LC_CTYPE_class32 arrays to uint16_t and uint32_t respectively To: Adhemerval Zanella Cc: "Carlos O'Donell" , libc-alpha@sourceware.org, Shu-Chun Weng , Szabolcs Nagy X-Spam-Status: No, score=-19.3 required=5.0 tests=BAYES_00, DKIMWL_WL_MED, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, ENV_AND_HDR_SPF_MATCH, HTML_MESSAGE, RCVD_IN_DNSWL_NONE, SPF_HELO_NONE, SPF_PASS, TXREP, USER_IN_DEF_DKIM_WL, USER_IN_DEF_SPF_WL autolearn=ham autolearn_force=no version=3.4.2 X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on server2.sourceware.org Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Content-Filtered-By: Mailman/MimeDel 2.1.29 X-BeenThere: libc-alpha@sourceware.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Libc-alpha mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 30 Mar 2021 17:33:55 -0000 Friendly Ping~ Also sent the v2 patch for using the expected types: http://patchwork.sourceware.org/project/glibc/patch/20210330172518.184058-1= -yuanzi@google.com/ Regards, Lirong On Fri, Mar 19, 2021 at 11:31 AM Lirong Yuan wrote: > On Tue, Mar 16, 2021 at 12:47 PM Adhemerval Zanella < > adhemerval.zanella@linaro.org> wrote: > >> Could you check if using the expected types yields any regression? >> All their usages are using explicit cast to the expected types, so >> I am can't see why they have declared as char at first place. > > > Took a while to get it right but checked that using the expected types di= d > not yield any regression. :) > > On Wed, Mar 17, 2021 at 4:34 AM Adhemerval Zanella < > adhemerval.zanella@linaro.org> wrote: > >> But wouldn't the array alignment change cause this very changes as well? >> The ABI of binary data exposed is already exported as 'unsigned short', >> so not sure if will change anything here. I am not sure about on-disk >> layout and I don't recall the ALTMON change specifically. > > > Layout seemed to change for the newly built libraries. Clean lib: 6982: > 0000000000025f65 768 OBJECT LOCAL HIDDEN 12 _nl_C_LC_CTYPE_class > 6983: 0000000000026265 1024 OBJECT LOCAL HIDDEN 12 > _nl_C_LC_CTYPE_class32 > 6984: 0000000000027588 76 OBJECT LOCAL HIDDEN 12 > _nl_C_LC_CTYPE_class_alnum > 6985: 00000000000272f8 72 OBJECT LOCAL HIDDEN 12 > _nl_C_LC_CTYPE_class_alpha > 6986: 00000000000274ac 68 OBJECT LOCAL HIDDEN 12 > _nl_C_LC_CTYPE_class_blank > 6987: 00000000000274f0 76 OBJECT LOCAL HIDDEN 12 > _nl_C_LC_CTYPE_class_cntrl > 6988: 0000000000027340 68 OBJECT LOCAL HIDDEN 12 > _nl_C_LC_CTYPE_class_digit > 6989: 0000000000027460 76 OBJECT LOCAL HIDDEN 12 > _nl_C_LC_CTYPE_class_graph > 6990: 00000000000272b0 72 OBJECT LOCAL HIDDEN 12 > _nl_C_LC_CTYPE_class_lower > 6991: 0000000000027414 76 OBJECT LOCAL HIDDEN 12 > _nl_C_LC_CTYPE_class_print > 6992: 000000000002753c 76 OBJECT LOCAL HIDDEN 12 > _nl_C_LC_CTYPE_class_punct > 6993: 00000000000273d0 68 OBJECT LOCAL HIDDEN 12 > _nl_C_LC_CTYPE_class_space > 6994: 0000000000027268 72 OBJECT LOCAL HIDDEN 12 > _nl_C_LC_CTYPE_class_upper > 6995: 0000000000027384 76 OBJECT LOCAL HIDDEN 12 > _nl_C_LC_CTYPE_class_xdigit > > New lib built using alignas: 5662: 000000000002ac5e 768 OBJECT LOCAL > HIDDEN 12 _nl_C_LC_CTYPE_class > 5663: 000000000002af60 1024 OBJECT LOCAL HIDDEN 12 > _nl_C_LC_CTYPE_class32 > 5666: 000000000002bf60 72 OBJECT LOCAL HIDDEN 12 > _nl_C_LC_CTYPE_class_upper > 5667: 000000000002bfa8 72 OBJECT LOCAL HIDDEN 12 > _nl_C_LC_CTYPE_class_lower > 5668: 000000000002bff0 72 OBJECT LOCAL HIDDEN 12 > _nl_C_LC_CTYPE_class_alpha > 5669: 000000000002c038 68 OBJECT LOCAL HIDDEN 12 > _nl_C_LC_CTYPE_class_digit > 5670: 000000000002c07c 76 OBJECT LOCAL HIDDEN 12 > _nl_C_LC_CTYPE_class_xdigit > 5671: 000000000002c0c8 68 OBJECT LOCAL HIDDEN 12 > _nl_C_LC_CTYPE_class_space > 5672: 000000000002c10c 76 OBJECT LOCAL HIDDEN 12 > _nl_C_LC_CTYPE_class_print > 5673: 000000000002c158 76 OBJECT LOCAL HIDDEN 12 > _nl_C_LC_CTYPE_class_graph > 5674: 000000000002c1a4 68 OBJECT LOCAL HIDDEN 12 > _nl_C_LC_CTYPE_class_blank > 5675: 000000000002c1e8 76 OBJECT LOCAL HIDDEN 12 > _nl_C_LC_CTYPE_class_cntrl > 5676: 000000000002c234 76 OBJECT LOCAL HIDDEN 12 > _nl_C_LC_CTYPE_class_punct > 5677: 000000000002c280 76 OBJECT LOCAL HIDDEN 12 > _nl_C_LC_CTYPE_class_alnum > > New lib built using expected types: 5662: 000000000002ac5e 768 OBJECT > LOCAL HIDDEN 12 _nl_C_LC_CTYPE_class > 5663: 000000000002af60 1024 OBJECT LOCAL HIDDEN 12 > _nl_C_LC_CTYPE_class32 > 5666: 000000000002bf60 72 OBJECT LOCAL HIDDEN 12 > _nl_C_LC_CTYPE_class_upper > 5667: 000000000002bfa8 72 OBJECT LOCAL HIDDEN 12 > _nl_C_LC_CTYPE_class_lower > 5668: 000000000002bff0 72 OBJECT LOCAL HIDDEN 12 > _nl_C_LC_CTYPE_class_alpha > 5669: 000000000002c038 68 OBJECT LOCAL HIDDEN 12 > _nl_C_LC_CTYPE_class_digit > 5670: 000000000002c07c 76 OBJECT LOCAL HIDDEN 12 > _nl_C_LC_CTYPE_class_xdigit > 5671: 000000000002c0c8 68 OBJECT LOCAL HIDDEN 12 > _nl_C_LC_CTYPE_class_space > 5672: 000000000002c10c 76 OBJECT LOCAL HIDDEN 12 > _nl_C_LC_CTYPE_class_print > 5673: 000000000002c158 76 OBJECT LOCAL HIDDEN 12 > _nl_C_LC_CTYPE_class_graph > 5674: 000000000002c1a4 68 OBJECT LOCAL HIDDEN 12 > _nl_C_LC_CTYPE_class_blank > 5675: 000000000002c1e8 76 OBJECT LOCAL HIDDEN 12 > _nl_C_LC_CTYPE_class_cntrl > 5676: 000000000002c234 76 OBJECT LOCAL HIDDEN 12 > _nl_C_LC_CTYPE_class_punct > 5677: 000000000002c280 76 OBJECT LOCAL HIDDEN 12 > _nl_C_LC_CTYPE_class_alnum > > So were it the case we got lucky that linker is laying out the data >> structures >> in the expected alignment? > > > Yeah, sort of like a race condition that can only be discovered under > certain circumstances. > In this case, the condition is running an arm program compiled with ubsan > under qemu user mode emulation on a x86 machine. > > Regards, > Lirong > > On Wed, Mar 17, 2021 at 4:34 AM Adhemerval Zanella < > adhemerval.zanella@linaro.org> wrote: > >> >> >> On 16/03/2021 18:05, Carlos O'Donell wrote: >> > On 3/16/21 3:47 PM, Adhemerval Zanella via Libc-alpha wrote: >> >> >> >> >> >> On 16/03/2021 16:05, Lirong Yuan via Libc-alpha wrote: >> >>> On Mon, Mar 15, 2021 at 6:45 PM Carlos O'Donell >> wrote: >> >>> >> >>>> My expectation is that normally aarch64 simply handles the unaligne= d >> load >> >>>> without any problems, >> >>>> but that it would be "better" if it were 16-bit aligned? >> >>>> Is this the *only* case of misaligned pointers? >> >>> >> >>> >> >>> Yes, this is the only case reported by UBSan. >> >>> >> >>>> Signed-off-by: Lirong Yuan >> >>>> We don't use DSOs in glibc, we assign copyright to the FSF, so this >> line >> >>>> would >> >>>> be normally removed, and you as the git author remains. >> >>> >> >>> >> >>> Thanks for the explanation! I will send an updated patch without >> >>> "Signed-off-by" if the current approach looks good. :) >> >>> >> >>> On Tue, Mar 16, 2021 at 7:28 AM Szabolcs Nagy >> wrote: >> >>> >> >>>> The 03/15/2021 21:44, Carlos O'Donell wrote: >> >>>>> On 3/15/21 2:42 PM, Lirong Yuan via Libc-alpha wrote: >> >>>>>> steps to reproduce the problem: compile a program that uses ctype >> >>>> functions such as =E2=80=9Cisspace=E2=80=9D for aarch64 with UBSan = flag >> >>>> =E2=80=9C-fsanitize=3Dundefined=E2=80=9D and run it on x86_64 machi= nes with qemu user >> mode >> >>>> emulation. >> >>>>> >> >>>>> Szabolcs, >> >>>>> >> >>>>> Do you have any input on this? >> >>>>> >> >>>>>> observed behavior: UndefinedBehaviorSanitizer reports >> >>>> misaligned-pointer-use in the program. >> >>>>> >> >>>>> Yes, the char array could be misaligned with respect to a 16-bit >> value, >> >>>>> and should be aligned to the type that is expected from the >> interface >> >>>> e.g. >> >>>> >> >>>> using char[] as uint16_t[] is aliasing violation. and in principle >> >>>> alignas on the definition does not fix this, but in practice that's >> >>>> the only abi visible aspect of the wrong type. >> >>>> >> >>> >> >>> Alternatively, we can define _nl_C_LC_CTYPE_class and >> >>> _nl_C_LC_CTYPE_class32 arrays directly as uint16_t and uint32_t >> arrays, >> >>> like _nl_C_LC_CTYPE_toupper array: >> >>> >> https://code.woboq.org/userspace/glibc/locale/C-ctype.c.html#_nl_C_LC_CT= YPE_toupper >> >>> Though the conversion may be error-prune and require more test >> cases... >> >>> It would seem that using alignas is an approach that's both >> technically >> >>> correct and less likely to cause havoc. >> >> >> >> Could you check if using the expected types yields any regression? >> >> All their usages are using explicit cast to the expected types, so >> >> I am can't see why they have declared as char at first place. >> > >> > It might work, but it: >> > >> > * changes the ABI of binary data exposed via nl_langinfo()? >> > - Pointers to the array are returned. >> > >> > * changes the on-disk layout and invalidates all system binary locales= ? >> > - Something similar happened for ALTMON and we just rebuilt >> everything. >> > - It isn't entirely clear to me what our guarantee of binary >> compatibility >> > is for compiled locales. >> > >> > Is this a concern? >> > >> >> But wouldn't the array alignment change cause this very changes as well? >> The ABI of binary data exposed is already exported as 'unsigned short', >> so not sure if will change anything here. I am not sure about on-disk >> layout and I don't recall the ALTMON change specifically. >> >> >>>> i'm not sure why ubsanitizer cares about alignment specifically on >> >>>> aarch64, unaligned load should work. >> >>>> >> >>> >> >>> Yes, the code works fine in practice on aarch64. The ubsan alignment >> is a >> >>> check for misaligned rather than unaligned. It's almost always worth >> fixing >> >>> since this can cause subtle and hard to track down failures that mor= e >> often >> >>> manifest on other architectures. >> >> >> >> I would expect that it this is really accessed in an unaligned manner >> it >> >> would blow in some architectures (sparc and some arm and mips >> environments). >> >> Not sure why we haven't see any issues on such architectures. >> > >> > AFAICT the structure is *not* misaligned on x86_64 and it has to do >> with the >> > vagaries of the compiler and linker you use and the alignment of globa= l >> > variables. On x86_64 the structures are all over-aligned. I expect the >> same >> > is true for all other arches. Perhaps Google used lld with an >> experimental >> > glibc patch and they have more tightly aligned object layout. >> >> So were it the case we got lucky that linker is laying out the data >> structures >> in the expected alignment? >> >> > >> > e.g. >> > readelf -a -W /lib64/libc.so.6 | grep _nl_C_LC_CTYPE_class >> > 23087: 0000000000176ec0 1024 OBJECT LOCAL DEFAULT 17 >> _nl_C_LC_CTYPE_class32 >> > 23125: 00000000001761a0 72 OBJECT LOCAL DEFAULT 17 >> _nl_C_LC_CTYPE_class_alpha >> > 23699: 0000000000176020 76 OBJECT LOCAL DEFAULT 17 >> _nl_C_LC_CTYPE_class_print >> > 23960: 0000000000175e40 76 OBJECT LOCAL DEFAULT 17 >> _nl_C_LC_CTYPE_class_alnum >> > 24152: 0000000000176200 72 OBJECT LOCAL DEFAULT 17 >> _nl_C_LC_CTYPE_class_lower >> > 24208: 00000000001760e0 76 OBJECT LOCAL DEFAULT 17 >> _nl_C_LC_CTYPE_class_xdigit >> > 24547: 0000000000175fc0 76 OBJECT LOCAL DEFAULT 17 >> _nl_C_LC_CTYPE_class_graph >> > 24569: 0000000000175f60 68 OBJECT LOCAL DEFAULT 17 >> _nl_C_LC_CTYPE_class_blank >> > 24582: 0000000000175ea0 76 OBJECT LOCAL DEFAULT 17 >> _nl_C_LC_CTYPE_class_punct >> > 24764: 0000000000176140 68 OBJECT LOCAL DEFAULT 17 >> _nl_C_LC_CTYPE_class_digit >> > 24883: 0000000000175f00 76 OBJECT LOCAL DEFAULT 17 >> _nl_C_LC_CTYPE_class_cntrl >> > 24991: 00000000001772c0 768 OBJECT LOCAL DEFAULT 17 >> _nl_C_LC_CTYPE_class >> > 25344: 0000000000176260 72 OBJECT LOCAL DEFAULT 17 >> _nl_C_LC_CTYPE_class_upper >> > 25416: 0000000000176080 68 OBJECT LOCAL DEFAULT 17 >> _nl_C_LC_CTYPE_class_space >> > >> > Aligned on 16-byte boundaries. >> > >> > On AArch64 I see this: >> > eu-readelf -a -W lib64/libc.so.6 | grep _nl_C_LC_CTYPE_class >> > 34906: 000000000011eea8 1024 OBJECT LOCAL DEFAULT 16 >> _nl_C_LC_CTYPE_class32 >> > 34945: 000000000011e1d0 72 OBJECT LOCAL DEFAULT 16 >> _nl_C_LC_CTYPE_class_alpha >> > 35470: 000000000011e0a0 76 OBJECT LOCAL DEFAULT 16 >> _nl_C_LC_CTYPE_class_print >> > 35712: 000000000011df18 76 OBJECT LOCAL DEFAULT 16 >> _nl_C_LC_CTYPE_class_alnum >> > 35889: 000000000011e218 72 OBJECT LOCAL DEFAULT 16 >> _nl_C_LC_CTYPE_class_lower >> > 35944: 000000000011e138 76 OBJECT LOCAL DEFAULT 16 >> _nl_C_LC_CTYPE_class_xdigit >> > 36246: 000000000011e050 76 OBJECT LOCAL DEFAULT 16 >> _nl_C_LC_CTYPE_class_graph >> > 36268: 000000000011e008 68 OBJECT LOCAL DEFAULT 16 >> _nl_C_LC_CTYPE_class_blank >> > 36284: 000000000011df68 76 OBJECT LOCAL DEFAULT 16 >> _nl_C_LC_CTYPE_class_punct >> > 36455: 000000000011e188 68 OBJECT LOCAL DEFAULT 16 >> _nl_C_LC_CTYPE_class_digit >> > 36571: 000000000011dfb8 76 OBJECT LOCAL DEFAULT 16 >> _nl_C_LC_CTYPE_class_cntrl >> > 36670: 000000000011f2a8 768 OBJECT LOCAL DEFAULT 16 >> _nl_C_LC_CTYPE_class >> > 36993: 000000000011e260 72 OBJECT LOCAL DEFAULT 16 >> _nl_C_LC_CTYPE_class_upper >> > 37061: 000000000011e0f0 68 OBJECT LOCAL DEFAULT 16 >> _nl_C_LC_CTYPE_class_space >> > >> > Aligned on 8-byte boundaries. >> > >> > But that alignment could go lower and still be valid and cause >> unaligned accesses. >> > >> >