public inbox for libc-alpha@sourceware.org
 help / color / mirror / Atom feed
From: Carlos O'Donell <carlos@redhat.com>
To: Adhemerval Zanella <adhemerval.zanella@linaro.org>,
	libc-alpha@sourceware.org, Shu-Chun Weng <scw@google.com>,
	Lirong Yuan <yuanzi@google.com>,
	Szabolcs Nagy <szabolcs.nagy@arm.com>
Subject: Re: [PATCH] locale: align _nl_C_LC_CTYPE_class and _nl_C_LC_CTYPE_class32 arrays to uint16_t and uint32_t respectively
Date: Tue, 16 Mar 2021 17:05:02 -0400	[thread overview]
Message-ID: <e4dc1e11-4f62-609b-4f37-3d288b9c9ce7@redhat.com> (raw)
In-Reply-To: <cf35b067-5ecb-622e-79a5-8602b712dfc7@linaro.org>

On 3/16/21 3:47 PM, Adhemerval Zanella via Libc-alpha wrote:
> 
> 
> On 16/03/2021 16:05, Lirong Yuan via Libc-alpha wrote:
>> On Mon, Mar 15, 2021 at 6:45 PM Carlos O'Donell <carlos@redhat.com> wrote:
>>
>>> My expectation is that normally aarch64 simply handles the unaligned load
>>> without any problems,
>>> but that it would be "better" if it were 16-bit aligned?
>>> Is this the *only* case of misaligned pointers?
>>
>>
>> Yes, this is the only case reported by UBSan.
>>
>>> Signed-off-by: Lirong Yuan <yuanzi@google.com>
>>> We don't use DSOs in glibc, we assign copyright to the FSF, so this line
>>> would
>>> be normally removed, and you as the git author remains.
>>
>>
>> Thanks for the explanation! I will send an updated patch without
>> "Signed-off-by" if the current approach looks good. :)
>>
>> On Tue, Mar 16, 2021 at 7:28 AM Szabolcs Nagy <szabolcs.nagy@arm.com> wrote:
>>
>>> The 03/15/2021 21:44, Carlos O'Donell wrote:
>>>> On 3/15/21 2:42 PM, Lirong Yuan via Libc-alpha wrote:
>>>>> steps to reproduce the problem: compile a program that uses ctype
>>> functions such as “isspace” for aarch64 with UBSan flag
>>> “-fsanitize=undefined” and run it on x86_64 machines with qemu user mode
>>> emulation.
>>>>
>>>> Szabolcs,
>>>>
>>>> Do you have any input on this?
>>>>
>>>>> observed behavior: UndefinedBehaviorSanitizer reports
>>> misaligned-pointer-use in the program.
>>>>
>>>> Yes, the char array could be misaligned with respect to a 16-bit value,
>>>> and should be aligned to the type that is expected from the interface
>>> e.g.
>>>
>>> using char[] as uint16_t[] is aliasing violation. and in principle
>>> alignas on the definition does not fix this, but in practice that's
>>> the only abi visible aspect of the wrong type.
>>>
>>
>> Alternatively, we can define _nl_C_LC_CTYPE_class and
>> _nl_C_LC_CTYPE_class32 arrays directly as uint16_t and uint32_t arrays,
>> like _nl_C_LC_CTYPE_toupper array:
>> https://code.woboq.org/userspace/glibc/locale/C-ctype.c.html#_nl_C_LC_CTYPE_toupper
>> Though the conversion may be error-prune and require more test cases...
>> It would seem that using alignas is an approach that's both technically
>> correct and less likely to cause havoc.
> 
> Could you check if using the expected types yields any regression?
> All their usages are using explicit cast to the expected types, so
> I am can't see why they have declared as char at first place.

It might work, but it:

* changes the ABI of binary data exposed via nl_langinfo()?
  - Pointers to the array are returned.

* changes the on-disk layout and invalidates all system binary locales?
  - Something similar happened for ALTMON and we just rebuilt everything.
  - It isn't entirely clear to me what our guarantee of binary compatibility
    is for compiled locales.

Is this a concern?

>>> i'm not sure why ubsanitizer cares about alignment specifically on
>>> aarch64, unaligned load should work.
>>>
>>
>> Yes, the code works fine in practice on aarch64. The ubsan alignment is a
>> check for misaligned rather than unaligned. It's almost always worth fixing
>> since this can cause subtle and hard to track down failures that more often
>> manifest on other architectures.
> 
> I would expect that it this is really accessed in an unaligned manner it
> would blow in some architectures (sparc and some arm and mips environments).
> Not sure why we haven't see any issues on such architectures.
 
AFAICT the structure is *not* misaligned on x86_64 and it has to do with the
vagaries of the compiler and linker you use and the alignment of global
variables. On x86_64 the structures are all over-aligned. I expect the same
is true for all other arches. Perhaps Google used lld with an experimental
glibc patch and they have more tightly aligned object layout.

e.g.
readelf -a -W /lib64/libc.so.6 | grep _nl_C_LC_CTYPE_class
 23087: 0000000000176ec0  1024 OBJECT  LOCAL  DEFAULT   17 _nl_C_LC_CTYPE_class32
 23125: 00000000001761a0    72 OBJECT  LOCAL  DEFAULT   17 _nl_C_LC_CTYPE_class_alpha
 23699: 0000000000176020    76 OBJECT  LOCAL  DEFAULT   17 _nl_C_LC_CTYPE_class_print
 23960: 0000000000175e40    76 OBJECT  LOCAL  DEFAULT   17 _nl_C_LC_CTYPE_class_alnum
 24152: 0000000000176200    72 OBJECT  LOCAL  DEFAULT   17 _nl_C_LC_CTYPE_class_lower
 24208: 00000000001760e0    76 OBJECT  LOCAL  DEFAULT   17 _nl_C_LC_CTYPE_class_xdigit
 24547: 0000000000175fc0    76 OBJECT  LOCAL  DEFAULT   17 _nl_C_LC_CTYPE_class_graph
 24569: 0000000000175f60    68 OBJECT  LOCAL  DEFAULT   17 _nl_C_LC_CTYPE_class_blank
 24582: 0000000000175ea0    76 OBJECT  LOCAL  DEFAULT   17 _nl_C_LC_CTYPE_class_punct
 24764: 0000000000176140    68 OBJECT  LOCAL  DEFAULT   17 _nl_C_LC_CTYPE_class_digit
 24883: 0000000000175f00    76 OBJECT  LOCAL  DEFAULT   17 _nl_C_LC_CTYPE_class_cntrl
 24991: 00000000001772c0   768 OBJECT  LOCAL  DEFAULT   17 _nl_C_LC_CTYPE_class
 25344: 0000000000176260    72 OBJECT  LOCAL  DEFAULT   17 _nl_C_LC_CTYPE_class_upper
 25416: 0000000000176080    68 OBJECT  LOCAL  DEFAULT   17 _nl_C_LC_CTYPE_class_space

Aligned on 16-byte boundaries.

On AArch64 I see this:
eu-readelf -a -W lib64/libc.so.6 | grep _nl_C_LC_CTYPE_class
34906: 000000000011eea8   1024 OBJECT  LOCAL  DEFAULT       16 _nl_C_LC_CTYPE_class32
34945: 000000000011e1d0     72 OBJECT  LOCAL  DEFAULT       16 _nl_C_LC_CTYPE_class_alpha
35470: 000000000011e0a0     76 OBJECT  LOCAL  DEFAULT       16 _nl_C_LC_CTYPE_class_print
35712: 000000000011df18     76 OBJECT  LOCAL  DEFAULT       16 _nl_C_LC_CTYPE_class_alnum
35889: 000000000011e218     72 OBJECT  LOCAL  DEFAULT       16 _nl_C_LC_CTYPE_class_lower
35944: 000000000011e138     76 OBJECT  LOCAL  DEFAULT       16 _nl_C_LC_CTYPE_class_xdigit
36246: 000000000011e050     76 OBJECT  LOCAL  DEFAULT       16 _nl_C_LC_CTYPE_class_graph
36268: 000000000011e008     68 OBJECT  LOCAL  DEFAULT       16 _nl_C_LC_CTYPE_class_blank
36284: 000000000011df68     76 OBJECT  LOCAL  DEFAULT       16 _nl_C_LC_CTYPE_class_punct
36455: 000000000011e188     68 OBJECT  LOCAL  DEFAULT       16 _nl_C_LC_CTYPE_class_digit
36571: 000000000011dfb8     76 OBJECT  LOCAL  DEFAULT       16 _nl_C_LC_CTYPE_class_cntrl
36670: 000000000011f2a8    768 OBJECT  LOCAL  DEFAULT       16 _nl_C_LC_CTYPE_class
36993: 000000000011e260     72 OBJECT  LOCAL  DEFAULT       16 _nl_C_LC_CTYPE_class_upper
37061: 000000000011e0f0     68 OBJECT  LOCAL  DEFAULT       16 _nl_C_LC_CTYPE_class_space

Aligned on 8-byte boundaries.

But that alignment could go lower and still be valid and cause unaligned accesses.

-- 
Cheers,
Carlos.


  parent reply	other threads:[~2021-03-16 21:05 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-03-15 18:42 Lirong Yuan
2021-03-16  1:44 ` Carlos O'Donell
2021-03-16 14:28   ` Szabolcs Nagy
2021-03-16 19:05     ` Lirong Yuan
2021-03-16 19:47       ` Adhemerval Zanella
2021-03-16 20:49         ` Andreas Schwab
2021-03-16 21:05         ` Carlos O'Donell [this message]
2021-03-17 11:34           ` Adhemerval Zanella
2021-03-19 18:31             ` Lirong Yuan
2021-03-30 17:33               ` Lirong Yuan

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=e4dc1e11-4f62-609b-4f37-3d288b9c9ce7@redhat.com \
    --to=carlos@redhat.com \
    --cc=adhemerval.zanella@linaro.org \
    --cc=libc-alpha@sourceware.org \
    --cc=scw@google.com \
    --cc=szabolcs.nagy@arm.com \
    --cc=yuanzi@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).