From: Carlos O'Donell <carlos@redhat.com>
To: Adhemerval Zanella <adhemerval.zanella@linaro.org>,
libc-alpha@sourceware.org, Shu-Chun Weng <scw@google.com>,
Lirong Yuan <yuanzi@google.com>,
Szabolcs Nagy <szabolcs.nagy@arm.com>
Subject: Re: [PATCH] locale: align _nl_C_LC_CTYPE_class and _nl_C_LC_CTYPE_class32 arrays to uint16_t and uint32_t respectively
Date: Tue, 16 Mar 2021 17:05:02 -0400 [thread overview]
Message-ID: <e4dc1e11-4f62-609b-4f37-3d288b9c9ce7@redhat.com> (raw)
In-Reply-To: <cf35b067-5ecb-622e-79a5-8602b712dfc7@linaro.org>
On 3/16/21 3:47 PM, Adhemerval Zanella via Libc-alpha wrote:
>
>
> On 16/03/2021 16:05, Lirong Yuan via Libc-alpha wrote:
>> On Mon, Mar 15, 2021 at 6:45 PM Carlos O'Donell <carlos@redhat.com> wrote:
>>
>>> My expectation is that normally aarch64 simply handles the unaligned load
>>> without any problems,
>>> but that it would be "better" if it were 16-bit aligned?
>>> Is this the *only* case of misaligned pointers?
>>
>>
>> Yes, this is the only case reported by UBSan.
>>
>>> Signed-off-by: Lirong Yuan <yuanzi@google.com>
>>> We don't use DSOs in glibc, we assign copyright to the FSF, so this line
>>> would
>>> be normally removed, and you as the git author remains.
>>
>>
>> Thanks for the explanation! I will send an updated patch without
>> "Signed-off-by" if the current approach looks good. :)
>>
>> On Tue, Mar 16, 2021 at 7:28 AM Szabolcs Nagy <szabolcs.nagy@arm.com> wrote:
>>
>>> The 03/15/2021 21:44, Carlos O'Donell wrote:
>>>> On 3/15/21 2:42 PM, Lirong Yuan via Libc-alpha wrote:
>>>>> steps to reproduce the problem: compile a program that uses ctype
>>> functions such as “isspace” for aarch64 with UBSan flag
>>> “-fsanitize=undefined” and run it on x86_64 machines with qemu user mode
>>> emulation.
>>>>
>>>> Szabolcs,
>>>>
>>>> Do you have any input on this?
>>>>
>>>>> observed behavior: UndefinedBehaviorSanitizer reports
>>> misaligned-pointer-use in the program.
>>>>
>>>> Yes, the char array could be misaligned with respect to a 16-bit value,
>>>> and should be aligned to the type that is expected from the interface
>>> e.g.
>>>
>>> using char[] as uint16_t[] is aliasing violation. and in principle
>>> alignas on the definition does not fix this, but in practice that's
>>> the only abi visible aspect of the wrong type.
>>>
>>
>> Alternatively, we can define _nl_C_LC_CTYPE_class and
>> _nl_C_LC_CTYPE_class32 arrays directly as uint16_t and uint32_t arrays,
>> like _nl_C_LC_CTYPE_toupper array:
>> https://code.woboq.org/userspace/glibc/locale/C-ctype.c.html#_nl_C_LC_CTYPE_toupper
>> Though the conversion may be error-prune and require more test cases...
>> It would seem that using alignas is an approach that's both technically
>> correct and less likely to cause havoc.
>
> Could you check if using the expected types yields any regression?
> All their usages are using explicit cast to the expected types, so
> I am can't see why they have declared as char at first place.
It might work, but it:
* changes the ABI of binary data exposed via nl_langinfo()?
- Pointers to the array are returned.
* changes the on-disk layout and invalidates all system binary locales?
- Something similar happened for ALTMON and we just rebuilt everything.
- It isn't entirely clear to me what our guarantee of binary compatibility
is for compiled locales.
Is this a concern?
>>> i'm not sure why ubsanitizer cares about alignment specifically on
>>> aarch64, unaligned load should work.
>>>
>>
>> Yes, the code works fine in practice on aarch64. The ubsan alignment is a
>> check for misaligned rather than unaligned. It's almost always worth fixing
>> since this can cause subtle and hard to track down failures that more often
>> manifest on other architectures.
>
> I would expect that it this is really accessed in an unaligned manner it
> would blow in some architectures (sparc and some arm and mips environments).
> Not sure why we haven't see any issues on such architectures.
AFAICT the structure is *not* misaligned on x86_64 and it has to do with the
vagaries of the compiler and linker you use and the alignment of global
variables. On x86_64 the structures are all over-aligned. I expect the same
is true for all other arches. Perhaps Google used lld with an experimental
glibc patch and they have more tightly aligned object layout.
e.g.
readelf -a -W /lib64/libc.so.6 | grep _nl_C_LC_CTYPE_class
23087: 0000000000176ec0 1024 OBJECT LOCAL DEFAULT 17 _nl_C_LC_CTYPE_class32
23125: 00000000001761a0 72 OBJECT LOCAL DEFAULT 17 _nl_C_LC_CTYPE_class_alpha
23699: 0000000000176020 76 OBJECT LOCAL DEFAULT 17 _nl_C_LC_CTYPE_class_print
23960: 0000000000175e40 76 OBJECT LOCAL DEFAULT 17 _nl_C_LC_CTYPE_class_alnum
24152: 0000000000176200 72 OBJECT LOCAL DEFAULT 17 _nl_C_LC_CTYPE_class_lower
24208: 00000000001760e0 76 OBJECT LOCAL DEFAULT 17 _nl_C_LC_CTYPE_class_xdigit
24547: 0000000000175fc0 76 OBJECT LOCAL DEFAULT 17 _nl_C_LC_CTYPE_class_graph
24569: 0000000000175f60 68 OBJECT LOCAL DEFAULT 17 _nl_C_LC_CTYPE_class_blank
24582: 0000000000175ea0 76 OBJECT LOCAL DEFAULT 17 _nl_C_LC_CTYPE_class_punct
24764: 0000000000176140 68 OBJECT LOCAL DEFAULT 17 _nl_C_LC_CTYPE_class_digit
24883: 0000000000175f00 76 OBJECT LOCAL DEFAULT 17 _nl_C_LC_CTYPE_class_cntrl
24991: 00000000001772c0 768 OBJECT LOCAL DEFAULT 17 _nl_C_LC_CTYPE_class
25344: 0000000000176260 72 OBJECT LOCAL DEFAULT 17 _nl_C_LC_CTYPE_class_upper
25416: 0000000000176080 68 OBJECT LOCAL DEFAULT 17 _nl_C_LC_CTYPE_class_space
Aligned on 16-byte boundaries.
On AArch64 I see this:
eu-readelf -a -W lib64/libc.so.6 | grep _nl_C_LC_CTYPE_class
34906: 000000000011eea8 1024 OBJECT LOCAL DEFAULT 16 _nl_C_LC_CTYPE_class32
34945: 000000000011e1d0 72 OBJECT LOCAL DEFAULT 16 _nl_C_LC_CTYPE_class_alpha
35470: 000000000011e0a0 76 OBJECT LOCAL DEFAULT 16 _nl_C_LC_CTYPE_class_print
35712: 000000000011df18 76 OBJECT LOCAL DEFAULT 16 _nl_C_LC_CTYPE_class_alnum
35889: 000000000011e218 72 OBJECT LOCAL DEFAULT 16 _nl_C_LC_CTYPE_class_lower
35944: 000000000011e138 76 OBJECT LOCAL DEFAULT 16 _nl_C_LC_CTYPE_class_xdigit
36246: 000000000011e050 76 OBJECT LOCAL DEFAULT 16 _nl_C_LC_CTYPE_class_graph
36268: 000000000011e008 68 OBJECT LOCAL DEFAULT 16 _nl_C_LC_CTYPE_class_blank
36284: 000000000011df68 76 OBJECT LOCAL DEFAULT 16 _nl_C_LC_CTYPE_class_punct
36455: 000000000011e188 68 OBJECT LOCAL DEFAULT 16 _nl_C_LC_CTYPE_class_digit
36571: 000000000011dfb8 76 OBJECT LOCAL DEFAULT 16 _nl_C_LC_CTYPE_class_cntrl
36670: 000000000011f2a8 768 OBJECT LOCAL DEFAULT 16 _nl_C_LC_CTYPE_class
36993: 000000000011e260 72 OBJECT LOCAL DEFAULT 16 _nl_C_LC_CTYPE_class_upper
37061: 000000000011e0f0 68 OBJECT LOCAL DEFAULT 16 _nl_C_LC_CTYPE_class_space
Aligned on 8-byte boundaries.
But that alignment could go lower and still be valid and cause unaligned accesses.
--
Cheers,
Carlos.
next prev parent reply other threads:[~2021-03-16 21:05 UTC|newest]
Thread overview: 10+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-03-15 18:42 Lirong Yuan
2021-03-16 1:44 ` Carlos O'Donell
2021-03-16 14:28 ` Szabolcs Nagy
2021-03-16 19:05 ` Lirong Yuan
2021-03-16 19:47 ` Adhemerval Zanella
2021-03-16 20:49 ` Andreas Schwab
2021-03-16 21:05 ` Carlos O'Donell [this message]
2021-03-17 11:34 ` Adhemerval Zanella
2021-03-19 18:31 ` Lirong Yuan
2021-03-30 17:33 ` Lirong Yuan
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=e4dc1e11-4f62-609b-4f37-3d288b9c9ce7@redhat.com \
--to=carlos@redhat.com \
--cc=adhemerval.zanella@linaro.org \
--cc=libc-alpha@sourceware.org \
--cc=scw@google.com \
--cc=szabolcs.nagy@arm.com \
--cc=yuanzi@google.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).