* [PATCH] locale: align _nl_C_LC_CTYPE_class and _nl_C_LC_CTYPE_class32 arrays to uint16_t and uint32_t respectively
@ 2021-03-15 18:42 Lirong Yuan
2021-03-16 1:44 ` Carlos O'Donell
0 siblings, 1 reply; 10+ messages in thread
From: Lirong Yuan @ 2021-03-15 18:42 UTC (permalink / raw)
To: libc-alpha; +Cc: stanshebs, scw, Lirong Yuan
steps to reproduce the problem: compile a program that uses ctype functions such as “isspace” for aarch64 with UBSan flag “-fsanitize=undefined” and run it on x86_64 machines with qemu user mode emulation.
observed behavior: UndefinedBehaviorSanitizer reports misaligned-pointer-use in the program.
solution: align the arrays defined in locale/C-ctype.c with correct data types as defined in ctype/ctype.h.
test suite regressions: none.
Signed-off-by: Lirong Yuan <yuanzi@google.com>
---
locale/C-ctype.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/locale/C-ctype.c b/locale/C-ctype.c
index bffdbedad0..da2c8cc33c 100644
--- a/locale/C-ctype.c
+++ b/locale/C-ctype.c
@@ -18,6 +18,7 @@
#include "localeinfo.h"
#include <endian.h>
+#include <stdalign.h>
#include <stdint.h>
#include "C-translit.h"
@@ -30,7 +31,7 @@
In the `_nl_C_LC_CTYPE_class' array the value for EOF (== -1)
is set to always return 0 and the conversion arrays return EOF. */
-const char _nl_C_LC_CTYPE_class[768] attribute_hidden =
+alignas(uint16_t) const char _nl_C_LC_CTYPE_class[768] attribute_hidden =
/* 0x80 */ "\000\000" "\000\000" "\000\000" "\000\000" "\000\000" "\000\000"
/* 0x86 */ "\000\000" "\000\000" "\000\000" "\000\000" "\000\000" "\000\000"
/* 0x8c */ "\000\000" "\000\000" "\000\000" "\000\000" "\000\000" "\000\000"
@@ -96,7 +97,7 @@ const char _nl_C_LC_CTYPE_class[768] attribute_hidden =
/* 0xf4 */ "\000\000" "\000\000" "\000\000" "\000\000" "\000\000" "\000\000"
/* 0xfa */ "\000\000" "\000\000" "\000\000" "\000\000" "\000\000" "\000\000"
;
-const char _nl_C_LC_CTYPE_class32[1024] attribute_hidden =
+alignas(uint32_t) const char _nl_C_LC_CTYPE_class32[1024] attribute_hidden =
/* 0x00 */ "\000\000\002\000" "\000\000\002\000" "\000\000\002\000"
/* 0x03 */ "\000\000\002\000" "\000\000\002\000" "\000\000\002\000"
/* 0x06 */ "\000\000\002\000" "\000\000\002\000" "\000\000\002\000"
--
2.31.0.rc2.261.g7f71774620-goog
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] locale: align _nl_C_LC_CTYPE_class and _nl_C_LC_CTYPE_class32 arrays to uint16_t and uint32_t respectively
2021-03-15 18:42 [PATCH] locale: align _nl_C_LC_CTYPE_class and _nl_C_LC_CTYPE_class32 arrays to uint16_t and uint32_t respectively Lirong Yuan
@ 2021-03-16 1:44 ` Carlos O'Donell
2021-03-16 14:28 ` Szabolcs Nagy
0 siblings, 1 reply; 10+ messages in thread
From: Carlos O'Donell @ 2021-03-16 1:44 UTC (permalink / raw)
To: Lirong Yuan, libc-alpha, Szabolcs Nagy; +Cc: scw
On 3/15/21 2:42 PM, Lirong Yuan via Libc-alpha wrote:
> steps to reproduce the problem: compile a program that uses ctype functions such as “isspace” for aarch64 with UBSan flag “-fsanitize=undefined” and run it on x86_64 machines with qemu user mode emulation.
Szabolcs,
Do you have any input on this?
> observed behavior: UndefinedBehaviorSanitizer reports misaligned-pointer-use in the program.
Yes, the char array could be misaligned with respect to a 16-bit value,
and should be aligned to the type that is expected from the interface e.g.
ctype/ctype.h:
91 # define __isctype_f(type) \
92 __extern_inline int \
93 is##type (int __c) __THROW \
94 { \
95 return (*__ctype_b_loc ())[(int) (__c)] & (unsigned short int) _IS##type; \
96 }
97 #endif
include/ctype.h:
38 CTYPE_EXTERN_INLINE const uint16_t ** __attribute__ ((const))
39 __ctype_b_loc (void)
40 {
41 return __libc_tsd_address (const uint16_t *, CTYPE_B);
42 }
So we expect a uint16_t type and the respective alignment.
My expectation is that normally aarch64 simply handles the unaligned load without any problems,
but that it would be "better" if it were 16-bit aligned?
Is this the *only* case of misaligned pointers?
> solution: align the arrays defined in locale/C-ctype.c with correct data types as defined in ctype/ctype.h.
>
> test suite regressions: none.
This looks technically correct, and the C locale is builtin so the layout
itself should be able to change without any problems.
I'd like to hear comments from Arm about this before accepting.
> Signed-off-by: Lirong Yuan <yuanzi@google.com>
We don't use DSOs in glibc, we assign copyright to the FSF, so this line would
be normally removed, and you as the git author remains.
See:
https://sourceware.org/glibc/wiki/Contribution%20checklist
You are covered by the Google copyright assignment so everything is accepted.
> ---
> locale/C-ctype.c | 5 +++--
> 1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/locale/C-ctype.c b/locale/C-ctype.c
> index bffdbedad0..da2c8cc33c 100644
> --- a/locale/C-ctype.c
> +++ b/locale/C-ctype.c
> @@ -18,6 +18,7 @@
>
> #include "localeinfo.h"
> #include <endian.h>
> +#include <stdalign.h>
OK.
> #include <stdint.h>
>
> #include "C-translit.h"
> @@ -30,7 +31,7 @@
> In the `_nl_C_LC_CTYPE_class' array the value for EOF (== -1)
> is set to always return 0 and the conversion arrays return EOF. */
>
> -const char _nl_C_LC_CTYPE_class[768] attribute_hidden =
> +alignas(uint16_t) const char _nl_C_LC_CTYPE_class[768] attribute_hidden =
OK. Used directly by __ctype_b_loc.
> /* 0x80 */ "\000\000" "\000\000" "\000\000" "\000\000" "\000\000" "\000\000"
> /* 0x86 */ "\000\000" "\000\000" "\000\000" "\000\000" "\000\000" "\000\000"
> /* 0x8c */ "\000\000" "\000\000" "\000\000" "\000\000" "\000\000" "\000\000"
> @@ -96,7 +97,7 @@ const char _nl_C_LC_CTYPE_class[768] attribute_hidden =
> /* 0xf4 */ "\000\000" "\000\000" "\000\000" "\000\000" "\000\000" "\000\000"
> /* 0xfa */ "\000\000" "\000\000" "\000\000" "\000\000" "\000\000" "\000\000"
> ;
> -const char _nl_C_LC_CTYPE_class32[1024] attribute_hidden =
> +alignas(uint32_t) const char _nl_C_LC_CTYPE_class32[1024] attribute_hidden =
OK. Might be exposed via nl_langinfo and is internally uint32_t (thought directly
exposed __ctype32_b should not exist for aarch64 (verified not on abilist)).
> /* 0x00 */ "\000\000\002\000" "\000\000\002\000" "\000\000\002\000"
> /* 0x03 */ "\000\000\002\000" "\000\000\002\000" "\000\000\002\000"
> /* 0x06 */ "\000\000\002\000" "\000\000\002\000" "\000\000\002\000"
>
--
Cheers,
Carlos.
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] locale: align _nl_C_LC_CTYPE_class and _nl_C_LC_CTYPE_class32 arrays to uint16_t and uint32_t respectively
2021-03-16 1:44 ` Carlos O'Donell
@ 2021-03-16 14:28 ` Szabolcs Nagy
2021-03-16 19:05 ` Lirong Yuan
0 siblings, 1 reply; 10+ messages in thread
From: Szabolcs Nagy @ 2021-03-16 14:28 UTC (permalink / raw)
To: Carlos O'Donell; +Cc: Lirong Yuan, libc-alpha, scw
The 03/15/2021 21:44, Carlos O'Donell wrote:
> On 3/15/21 2:42 PM, Lirong Yuan via Libc-alpha wrote:
> > steps to reproduce the problem: compile a program that uses ctype functions such as “isspace” for aarch64 with UBSan flag “-fsanitize=undefined” and run it on x86_64 machines with qemu user mode emulation.
>
> Szabolcs,
>
> Do you have any input on this?
>
> > observed behavior: UndefinedBehaviorSanitizer reports misaligned-pointer-use in the program.
>
> Yes, the char array could be misaligned with respect to a 16-bit value,
> and should be aligned to the type that is expected from the interface e.g.
using char[] as uint16_t[] is aliasing violation. and in principle
alignas on the definition does not fix this, but in practice that's
the only abi visible aspect of the wrong type.
i'm not sure why ubsanitizer cares about alignment specifically on
aarch64, unaligned load should work.
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] locale: align _nl_C_LC_CTYPE_class and _nl_C_LC_CTYPE_class32 arrays to uint16_t and uint32_t respectively
2021-03-16 14:28 ` Szabolcs Nagy
@ 2021-03-16 19:05 ` Lirong Yuan
2021-03-16 19:47 ` Adhemerval Zanella
0 siblings, 1 reply; 10+ messages in thread
From: Lirong Yuan @ 2021-03-16 19:05 UTC (permalink / raw)
To: Szabolcs Nagy; +Cc: Carlos O'Donell, libc-alpha, Shu-Chun Weng
On Mon, Mar 15, 2021 at 6:45 PM Carlos O'Donell <carlos@redhat.com> wrote:
> My expectation is that normally aarch64 simply handles the unaligned load
> without any problems,
> but that it would be "better" if it were 16-bit aligned?
> Is this the *only* case of misaligned pointers?
Yes, this is the only case reported by UBSan.
> Signed-off-by: Lirong Yuan <yuanzi@google.com>
> We don't use DSOs in glibc, we assign copyright to the FSF, so this line
> would
> be normally removed, and you as the git author remains.
Thanks for the explanation! I will send an updated patch without
"Signed-off-by" if the current approach looks good. :)
On Tue, Mar 16, 2021 at 7:28 AM Szabolcs Nagy <szabolcs.nagy@arm.com> wrote:
> The 03/15/2021 21:44, Carlos O'Donell wrote:
> > On 3/15/21 2:42 PM, Lirong Yuan via Libc-alpha wrote:
> > > steps to reproduce the problem: compile a program that uses ctype
> functions such as “isspace” for aarch64 with UBSan flag
> “-fsanitize=undefined” and run it on x86_64 machines with qemu user mode
> emulation.
> >
> > Szabolcs,
> >
> > Do you have any input on this?
> >
> > > observed behavior: UndefinedBehaviorSanitizer reports
> misaligned-pointer-use in the program.
> >
> > Yes, the char array could be misaligned with respect to a 16-bit value,
> > and should be aligned to the type that is expected from the interface
> e.g.
>
> using char[] as uint16_t[] is aliasing violation. and in principle
> alignas on the definition does not fix this, but in practice that's
> the only abi visible aspect of the wrong type.
>
Alternatively, we can define _nl_C_LC_CTYPE_class and
_nl_C_LC_CTYPE_class32 arrays directly as uint16_t and uint32_t arrays,
like _nl_C_LC_CTYPE_toupper array:
https://code.woboq.org/userspace/glibc/locale/C-ctype.c.html#_nl_C_LC_CTYPE_toupper
Though the conversion may be error-prune and require more test cases...
It would seem that using alignas is an approach that's both technically
correct and less likely to cause havoc.
> i'm not sure why ubsanitizer cares about alignment specifically on
> aarch64, unaligned load should work.
>
Yes, the code works fine in practice on aarch64. The ubsan alignment is a
check for misaligned rather than unaligned. It's almost always worth fixing
since this can cause subtle and hard to track down failures that more often
manifest on other architectures.
Thanks,
Lirong
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] locale: align _nl_C_LC_CTYPE_class and _nl_C_LC_CTYPE_class32 arrays to uint16_t and uint32_t respectively
2021-03-16 19:05 ` Lirong Yuan
@ 2021-03-16 19:47 ` Adhemerval Zanella
2021-03-16 20:49 ` Andreas Schwab
2021-03-16 21:05 ` Carlos O'Donell
0 siblings, 2 replies; 10+ messages in thread
From: Adhemerval Zanella @ 2021-03-16 19:47 UTC (permalink / raw)
To: libc-alpha, Shu-Chun Weng, Lirong Yuan, Szabolcs Nagy
On 16/03/2021 16:05, Lirong Yuan via Libc-alpha wrote:
> On Mon, Mar 15, 2021 at 6:45 PM Carlos O'Donell <carlos@redhat.com> wrote:
>
>> My expectation is that normally aarch64 simply handles the unaligned load
>> without any problems,
>> but that it would be "better" if it were 16-bit aligned?
>> Is this the *only* case of misaligned pointers?
>
>
> Yes, this is the only case reported by UBSan.
>
>> Signed-off-by: Lirong Yuan <yuanzi@google.com>
>> We don't use DSOs in glibc, we assign copyright to the FSF, so this line
>> would
>> be normally removed, and you as the git author remains.
>
>
> Thanks for the explanation! I will send an updated patch without
> "Signed-off-by" if the current approach looks good. :)
>
> On Tue, Mar 16, 2021 at 7:28 AM Szabolcs Nagy <szabolcs.nagy@arm.com> wrote:
>
>> The 03/15/2021 21:44, Carlos O'Donell wrote:
>>> On 3/15/21 2:42 PM, Lirong Yuan via Libc-alpha wrote:
>>>> steps to reproduce the problem: compile a program that uses ctype
>> functions such as “isspace” for aarch64 with UBSan flag
>> “-fsanitize=undefined” and run it on x86_64 machines with qemu user mode
>> emulation.
>>>
>>> Szabolcs,
>>>
>>> Do you have any input on this?
>>>
>>>> observed behavior: UndefinedBehaviorSanitizer reports
>> misaligned-pointer-use in the program.
>>>
>>> Yes, the char array could be misaligned with respect to a 16-bit value,
>>> and should be aligned to the type that is expected from the interface
>> e.g.
>>
>> using char[] as uint16_t[] is aliasing violation. and in principle
>> alignas on the definition does not fix this, but in practice that's
>> the only abi visible aspect of the wrong type.
>>
>
> Alternatively, we can define _nl_C_LC_CTYPE_class and
> _nl_C_LC_CTYPE_class32 arrays directly as uint16_t and uint32_t arrays,
> like _nl_C_LC_CTYPE_toupper array:
> https://code.woboq.org/userspace/glibc/locale/C-ctype.c.html#_nl_C_LC_CTYPE_toupper
> Though the conversion may be error-prune and require more test cases...
> It would seem that using alignas is an approach that's both technically
> correct and less likely to cause havoc.
Could you check if using the expected types yields any regression?
All their usages are using explicit cast to the expected types, so
I am can't see why they have declared as char at first place.
>
>
>> i'm not sure why ubsanitizer cares about alignment specifically on
>> aarch64, unaligned load should work.
>>
>
> Yes, the code works fine in practice on aarch64. The ubsan alignment is a
> check for misaligned rather than unaligned. It's almost always worth fixing
> since this can cause subtle and hard to track down failures that more often
> manifest on other architectures.
I would expect that it this is really accessed in an unaligned manner it
would blow in some architectures (sparc and some arm and mips environments).
Not sure why we haven't see any issues on such architectures.
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] locale: align _nl_C_LC_CTYPE_class and _nl_C_LC_CTYPE_class32 arrays to uint16_t and uint32_t respectively
2021-03-16 19:47 ` Adhemerval Zanella
@ 2021-03-16 20:49 ` Andreas Schwab
2021-03-16 21:05 ` Carlos O'Donell
1 sibling, 0 replies; 10+ messages in thread
From: Andreas Schwab @ 2021-03-16 20:49 UTC (permalink / raw)
To: Adhemerval Zanella via Libc-alpha
Cc: Shu-Chun Weng, Lirong Yuan, Szabolcs Nagy, Adhemerval Zanella
On Mär 16 2021, Adhemerval Zanella via Libc-alpha wrote:
> I would expect that it this is really accessed in an unaligned manner it
> would blow in some architectures (sparc and some arm and mips environments).
> Not sure why we haven't see any issues on such architectures.
It is very unlikely that the arrays are placed at odd addresses, since
they are bundled together with aligned data.
Andreas.
--
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510 2552 DF73 E780 A9DA AEC1
"And now for something completely different."
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] locale: align _nl_C_LC_CTYPE_class and _nl_C_LC_CTYPE_class32 arrays to uint16_t and uint32_t respectively
2021-03-16 19:47 ` Adhemerval Zanella
2021-03-16 20:49 ` Andreas Schwab
@ 2021-03-16 21:05 ` Carlos O'Donell
2021-03-17 11:34 ` Adhemerval Zanella
1 sibling, 1 reply; 10+ messages in thread
From: Carlos O'Donell @ 2021-03-16 21:05 UTC (permalink / raw)
To: Adhemerval Zanella, libc-alpha, Shu-Chun Weng, Lirong Yuan,
Szabolcs Nagy
On 3/16/21 3:47 PM, Adhemerval Zanella via Libc-alpha wrote:
>
>
> On 16/03/2021 16:05, Lirong Yuan via Libc-alpha wrote:
>> On Mon, Mar 15, 2021 at 6:45 PM Carlos O'Donell <carlos@redhat.com> wrote:
>>
>>> My expectation is that normally aarch64 simply handles the unaligned load
>>> without any problems,
>>> but that it would be "better" if it were 16-bit aligned?
>>> Is this the *only* case of misaligned pointers?
>>
>>
>> Yes, this is the only case reported by UBSan.
>>
>>> Signed-off-by: Lirong Yuan <yuanzi@google.com>
>>> We don't use DSOs in glibc, we assign copyright to the FSF, so this line
>>> would
>>> be normally removed, and you as the git author remains.
>>
>>
>> Thanks for the explanation! I will send an updated patch without
>> "Signed-off-by" if the current approach looks good. :)
>>
>> On Tue, Mar 16, 2021 at 7:28 AM Szabolcs Nagy <szabolcs.nagy@arm.com> wrote:
>>
>>> The 03/15/2021 21:44, Carlos O'Donell wrote:
>>>> On 3/15/21 2:42 PM, Lirong Yuan via Libc-alpha wrote:
>>>>> steps to reproduce the problem: compile a program that uses ctype
>>> functions such as “isspace” for aarch64 with UBSan flag
>>> “-fsanitize=undefined” and run it on x86_64 machines with qemu user mode
>>> emulation.
>>>>
>>>> Szabolcs,
>>>>
>>>> Do you have any input on this?
>>>>
>>>>> observed behavior: UndefinedBehaviorSanitizer reports
>>> misaligned-pointer-use in the program.
>>>>
>>>> Yes, the char array could be misaligned with respect to a 16-bit value,
>>>> and should be aligned to the type that is expected from the interface
>>> e.g.
>>>
>>> using char[] as uint16_t[] is aliasing violation. and in principle
>>> alignas on the definition does not fix this, but in practice that's
>>> the only abi visible aspect of the wrong type.
>>>
>>
>> Alternatively, we can define _nl_C_LC_CTYPE_class and
>> _nl_C_LC_CTYPE_class32 arrays directly as uint16_t and uint32_t arrays,
>> like _nl_C_LC_CTYPE_toupper array:
>> https://code.woboq.org/userspace/glibc/locale/C-ctype.c.html#_nl_C_LC_CTYPE_toupper
>> Though the conversion may be error-prune and require more test cases...
>> It would seem that using alignas is an approach that's both technically
>> correct and less likely to cause havoc.
>
> Could you check if using the expected types yields any regression?
> All their usages are using explicit cast to the expected types, so
> I am can't see why they have declared as char at first place.
It might work, but it:
* changes the ABI of binary data exposed via nl_langinfo()?
- Pointers to the array are returned.
* changes the on-disk layout and invalidates all system binary locales?
- Something similar happened for ALTMON and we just rebuilt everything.
- It isn't entirely clear to me what our guarantee of binary compatibility
is for compiled locales.
Is this a concern?
>>> i'm not sure why ubsanitizer cares about alignment specifically on
>>> aarch64, unaligned load should work.
>>>
>>
>> Yes, the code works fine in practice on aarch64. The ubsan alignment is a
>> check for misaligned rather than unaligned. It's almost always worth fixing
>> since this can cause subtle and hard to track down failures that more often
>> manifest on other architectures.
>
> I would expect that it this is really accessed in an unaligned manner it
> would blow in some architectures (sparc and some arm and mips environments).
> Not sure why we haven't see any issues on such architectures.
AFAICT the structure is *not* misaligned on x86_64 and it has to do with the
vagaries of the compiler and linker you use and the alignment of global
variables. On x86_64 the structures are all over-aligned. I expect the same
is true for all other arches. Perhaps Google used lld with an experimental
glibc patch and they have more tightly aligned object layout.
e.g.
readelf -a -W /lib64/libc.so.6 | grep _nl_C_LC_CTYPE_class
23087: 0000000000176ec0 1024 OBJECT LOCAL DEFAULT 17 _nl_C_LC_CTYPE_class32
23125: 00000000001761a0 72 OBJECT LOCAL DEFAULT 17 _nl_C_LC_CTYPE_class_alpha
23699: 0000000000176020 76 OBJECT LOCAL DEFAULT 17 _nl_C_LC_CTYPE_class_print
23960: 0000000000175e40 76 OBJECT LOCAL DEFAULT 17 _nl_C_LC_CTYPE_class_alnum
24152: 0000000000176200 72 OBJECT LOCAL DEFAULT 17 _nl_C_LC_CTYPE_class_lower
24208: 00000000001760e0 76 OBJECT LOCAL DEFAULT 17 _nl_C_LC_CTYPE_class_xdigit
24547: 0000000000175fc0 76 OBJECT LOCAL DEFAULT 17 _nl_C_LC_CTYPE_class_graph
24569: 0000000000175f60 68 OBJECT LOCAL DEFAULT 17 _nl_C_LC_CTYPE_class_blank
24582: 0000000000175ea0 76 OBJECT LOCAL DEFAULT 17 _nl_C_LC_CTYPE_class_punct
24764: 0000000000176140 68 OBJECT LOCAL DEFAULT 17 _nl_C_LC_CTYPE_class_digit
24883: 0000000000175f00 76 OBJECT LOCAL DEFAULT 17 _nl_C_LC_CTYPE_class_cntrl
24991: 00000000001772c0 768 OBJECT LOCAL DEFAULT 17 _nl_C_LC_CTYPE_class
25344: 0000000000176260 72 OBJECT LOCAL DEFAULT 17 _nl_C_LC_CTYPE_class_upper
25416: 0000000000176080 68 OBJECT LOCAL DEFAULT 17 _nl_C_LC_CTYPE_class_space
Aligned on 16-byte boundaries.
On AArch64 I see this:
eu-readelf -a -W lib64/libc.so.6 | grep _nl_C_LC_CTYPE_class
34906: 000000000011eea8 1024 OBJECT LOCAL DEFAULT 16 _nl_C_LC_CTYPE_class32
34945: 000000000011e1d0 72 OBJECT LOCAL DEFAULT 16 _nl_C_LC_CTYPE_class_alpha
35470: 000000000011e0a0 76 OBJECT LOCAL DEFAULT 16 _nl_C_LC_CTYPE_class_print
35712: 000000000011df18 76 OBJECT LOCAL DEFAULT 16 _nl_C_LC_CTYPE_class_alnum
35889: 000000000011e218 72 OBJECT LOCAL DEFAULT 16 _nl_C_LC_CTYPE_class_lower
35944: 000000000011e138 76 OBJECT LOCAL DEFAULT 16 _nl_C_LC_CTYPE_class_xdigit
36246: 000000000011e050 76 OBJECT LOCAL DEFAULT 16 _nl_C_LC_CTYPE_class_graph
36268: 000000000011e008 68 OBJECT LOCAL DEFAULT 16 _nl_C_LC_CTYPE_class_blank
36284: 000000000011df68 76 OBJECT LOCAL DEFAULT 16 _nl_C_LC_CTYPE_class_punct
36455: 000000000011e188 68 OBJECT LOCAL DEFAULT 16 _nl_C_LC_CTYPE_class_digit
36571: 000000000011dfb8 76 OBJECT LOCAL DEFAULT 16 _nl_C_LC_CTYPE_class_cntrl
36670: 000000000011f2a8 768 OBJECT LOCAL DEFAULT 16 _nl_C_LC_CTYPE_class
36993: 000000000011e260 72 OBJECT LOCAL DEFAULT 16 _nl_C_LC_CTYPE_class_upper
37061: 000000000011e0f0 68 OBJECT LOCAL DEFAULT 16 _nl_C_LC_CTYPE_class_space
Aligned on 8-byte boundaries.
But that alignment could go lower and still be valid and cause unaligned accesses.
--
Cheers,
Carlos.
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] locale: align _nl_C_LC_CTYPE_class and _nl_C_LC_CTYPE_class32 arrays to uint16_t and uint32_t respectively
2021-03-16 21:05 ` Carlos O'Donell
@ 2021-03-17 11:34 ` Adhemerval Zanella
2021-03-19 18:31 ` Lirong Yuan
0 siblings, 1 reply; 10+ messages in thread
From: Adhemerval Zanella @ 2021-03-17 11:34 UTC (permalink / raw)
To: Carlos O'Donell, libc-alpha, Shu-Chun Weng, Lirong Yuan,
Szabolcs Nagy
On 16/03/2021 18:05, Carlos O'Donell wrote:
> On 3/16/21 3:47 PM, Adhemerval Zanella via Libc-alpha wrote:
>>
>>
>> On 16/03/2021 16:05, Lirong Yuan via Libc-alpha wrote:
>>> On Mon, Mar 15, 2021 at 6:45 PM Carlos O'Donell <carlos@redhat.com> wrote:
>>>
>>>> My expectation is that normally aarch64 simply handles the unaligned load
>>>> without any problems,
>>>> but that it would be "better" if it were 16-bit aligned?
>>>> Is this the *only* case of misaligned pointers?
>>>
>>>
>>> Yes, this is the only case reported by UBSan.
>>>
>>>> Signed-off-by: Lirong Yuan <yuanzi@google.com>
>>>> We don't use DSOs in glibc, we assign copyright to the FSF, so this line
>>>> would
>>>> be normally removed, and you as the git author remains.
>>>
>>>
>>> Thanks for the explanation! I will send an updated patch without
>>> "Signed-off-by" if the current approach looks good. :)
>>>
>>> On Tue, Mar 16, 2021 at 7:28 AM Szabolcs Nagy <szabolcs.nagy@arm.com> wrote:
>>>
>>>> The 03/15/2021 21:44, Carlos O'Donell wrote:
>>>>> On 3/15/21 2:42 PM, Lirong Yuan via Libc-alpha wrote:
>>>>>> steps to reproduce the problem: compile a program that uses ctype
>>>> functions such as “isspace” for aarch64 with UBSan flag
>>>> “-fsanitize=undefined” and run it on x86_64 machines with qemu user mode
>>>> emulation.
>>>>>
>>>>> Szabolcs,
>>>>>
>>>>> Do you have any input on this?
>>>>>
>>>>>> observed behavior: UndefinedBehaviorSanitizer reports
>>>> misaligned-pointer-use in the program.
>>>>>
>>>>> Yes, the char array could be misaligned with respect to a 16-bit value,
>>>>> and should be aligned to the type that is expected from the interface
>>>> e.g.
>>>>
>>>> using char[] as uint16_t[] is aliasing violation. and in principle
>>>> alignas on the definition does not fix this, but in practice that's
>>>> the only abi visible aspect of the wrong type.
>>>>
>>>
>>> Alternatively, we can define _nl_C_LC_CTYPE_class and
>>> _nl_C_LC_CTYPE_class32 arrays directly as uint16_t and uint32_t arrays,
>>> like _nl_C_LC_CTYPE_toupper array:
>>> https://code.woboq.org/userspace/glibc/locale/C-ctype.c.html#_nl_C_LC_CTYPE_toupper
>>> Though the conversion may be error-prune and require more test cases...
>>> It would seem that using alignas is an approach that's both technically
>>> correct and less likely to cause havoc.
>>
>> Could you check if using the expected types yields any regression?
>> All their usages are using explicit cast to the expected types, so
>> I am can't see why they have declared as char at first place.
>
> It might work, but it:
>
> * changes the ABI of binary data exposed via nl_langinfo()?
> - Pointers to the array are returned.
>
> * changes the on-disk layout and invalidates all system binary locales?
> - Something similar happened for ALTMON and we just rebuilt everything.
> - It isn't entirely clear to me what our guarantee of binary compatibility
> is for compiled locales.
>
> Is this a concern?
>
But wouldn't the array alignment change cause this very changes as well?
The ABI of binary data exposed is already exported as 'unsigned short',
so not sure if will change anything here. I am not sure about on-disk
layout and I don't recall the ALTMON change specifically.
>>>> i'm not sure why ubsanitizer cares about alignment specifically on
>>>> aarch64, unaligned load should work.
>>>>
>>>
>>> Yes, the code works fine in practice on aarch64. The ubsan alignment is a
>>> check for misaligned rather than unaligned. It's almost always worth fixing
>>> since this can cause subtle and hard to track down failures that more often
>>> manifest on other architectures.
>>
>> I would expect that it this is really accessed in an unaligned manner it
>> would blow in some architectures (sparc and some arm and mips environments).
>> Not sure why we haven't see any issues on such architectures.
>
> AFAICT the structure is *not* misaligned on x86_64 and it has to do with the
> vagaries of the compiler and linker you use and the alignment of global
> variables. On x86_64 the structures are all over-aligned. I expect the same
> is true for all other arches. Perhaps Google used lld with an experimental
> glibc patch and they have more tightly aligned object layout.
So were it the case we got lucky that linker is laying out the data structures
in the expected alignment?
>
> e.g.
> readelf -a -W /lib64/libc.so.6 | grep _nl_C_LC_CTYPE_class
> 23087: 0000000000176ec0 1024 OBJECT LOCAL DEFAULT 17 _nl_C_LC_CTYPE_class32
> 23125: 00000000001761a0 72 OBJECT LOCAL DEFAULT 17 _nl_C_LC_CTYPE_class_alpha
> 23699: 0000000000176020 76 OBJECT LOCAL DEFAULT 17 _nl_C_LC_CTYPE_class_print
> 23960: 0000000000175e40 76 OBJECT LOCAL DEFAULT 17 _nl_C_LC_CTYPE_class_alnum
> 24152: 0000000000176200 72 OBJECT LOCAL DEFAULT 17 _nl_C_LC_CTYPE_class_lower
> 24208: 00000000001760e0 76 OBJECT LOCAL DEFAULT 17 _nl_C_LC_CTYPE_class_xdigit
> 24547: 0000000000175fc0 76 OBJECT LOCAL DEFAULT 17 _nl_C_LC_CTYPE_class_graph
> 24569: 0000000000175f60 68 OBJECT LOCAL DEFAULT 17 _nl_C_LC_CTYPE_class_blank
> 24582: 0000000000175ea0 76 OBJECT LOCAL DEFAULT 17 _nl_C_LC_CTYPE_class_punct
> 24764: 0000000000176140 68 OBJECT LOCAL DEFAULT 17 _nl_C_LC_CTYPE_class_digit
> 24883: 0000000000175f00 76 OBJECT LOCAL DEFAULT 17 _nl_C_LC_CTYPE_class_cntrl
> 24991: 00000000001772c0 768 OBJECT LOCAL DEFAULT 17 _nl_C_LC_CTYPE_class
> 25344: 0000000000176260 72 OBJECT LOCAL DEFAULT 17 _nl_C_LC_CTYPE_class_upper
> 25416: 0000000000176080 68 OBJECT LOCAL DEFAULT 17 _nl_C_LC_CTYPE_class_space
>
> Aligned on 16-byte boundaries.
>
> On AArch64 I see this:
> eu-readelf -a -W lib64/libc.so.6 | grep _nl_C_LC_CTYPE_class
> 34906: 000000000011eea8 1024 OBJECT LOCAL DEFAULT 16 _nl_C_LC_CTYPE_class32
> 34945: 000000000011e1d0 72 OBJECT LOCAL DEFAULT 16 _nl_C_LC_CTYPE_class_alpha
> 35470: 000000000011e0a0 76 OBJECT LOCAL DEFAULT 16 _nl_C_LC_CTYPE_class_print
> 35712: 000000000011df18 76 OBJECT LOCAL DEFAULT 16 _nl_C_LC_CTYPE_class_alnum
> 35889: 000000000011e218 72 OBJECT LOCAL DEFAULT 16 _nl_C_LC_CTYPE_class_lower
> 35944: 000000000011e138 76 OBJECT LOCAL DEFAULT 16 _nl_C_LC_CTYPE_class_xdigit
> 36246: 000000000011e050 76 OBJECT LOCAL DEFAULT 16 _nl_C_LC_CTYPE_class_graph
> 36268: 000000000011e008 68 OBJECT LOCAL DEFAULT 16 _nl_C_LC_CTYPE_class_blank
> 36284: 000000000011df68 76 OBJECT LOCAL DEFAULT 16 _nl_C_LC_CTYPE_class_punct
> 36455: 000000000011e188 68 OBJECT LOCAL DEFAULT 16 _nl_C_LC_CTYPE_class_digit
> 36571: 000000000011dfb8 76 OBJECT LOCAL DEFAULT 16 _nl_C_LC_CTYPE_class_cntrl
> 36670: 000000000011f2a8 768 OBJECT LOCAL DEFAULT 16 _nl_C_LC_CTYPE_class
> 36993: 000000000011e260 72 OBJECT LOCAL DEFAULT 16 _nl_C_LC_CTYPE_class_upper
> 37061: 000000000011e0f0 68 OBJECT LOCAL DEFAULT 16 _nl_C_LC_CTYPE_class_space
>
> Aligned on 8-byte boundaries.
>
> But that alignment could go lower and still be valid and cause unaligned accesses.
>
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] locale: align _nl_C_LC_CTYPE_class and _nl_C_LC_CTYPE_class32 arrays to uint16_t and uint32_t respectively
2021-03-17 11:34 ` Adhemerval Zanella
@ 2021-03-19 18:31 ` Lirong Yuan
2021-03-30 17:33 ` Lirong Yuan
0 siblings, 1 reply; 10+ messages in thread
From: Lirong Yuan @ 2021-03-19 18:31 UTC (permalink / raw)
To: Adhemerval Zanella
Cc: Carlos O'Donell, libc-alpha, Shu-Chun Weng, Szabolcs Nagy
On Tue, Mar 16, 2021 at 12:47 PM Adhemerval Zanella <
adhemerval.zanella@linaro.org> wrote:
> Could you check if using the expected types yields any regression?
> All their usages are using explicit cast to the expected types, so
> I am can't see why they have declared as char at first place.
Took a while to get it right but checked that using the expected types did
not yield any regression. :)
On Wed, Mar 17, 2021 at 4:34 AM Adhemerval Zanella <
adhemerval.zanella@linaro.org> wrote:
> But wouldn't the array alignment change cause this very changes as well?
> The ABI of binary data exposed is already exported as 'unsigned short',
> so not sure if will change anything here. I am not sure about on-disk
> layout and I don't recall the ALTMON change specifically.
Layout seemed to change for the newly built libraries. Clean lib: 6982:
0000000000025f65 768 OBJECT LOCAL HIDDEN 12 _nl_C_LC_CTYPE_class
6983: 0000000000026265 1024 OBJECT LOCAL HIDDEN 12
_nl_C_LC_CTYPE_class32
6984: 0000000000027588 76 OBJECT LOCAL HIDDEN 12
_nl_C_LC_CTYPE_class_alnum
6985: 00000000000272f8 72 OBJECT LOCAL HIDDEN 12
_nl_C_LC_CTYPE_class_alpha
6986: 00000000000274ac 68 OBJECT LOCAL HIDDEN 12
_nl_C_LC_CTYPE_class_blank
6987: 00000000000274f0 76 OBJECT LOCAL HIDDEN 12
_nl_C_LC_CTYPE_class_cntrl
6988: 0000000000027340 68 OBJECT LOCAL HIDDEN 12
_nl_C_LC_CTYPE_class_digit
6989: 0000000000027460 76 OBJECT LOCAL HIDDEN 12
_nl_C_LC_CTYPE_class_graph
6990: 00000000000272b0 72 OBJECT LOCAL HIDDEN 12
_nl_C_LC_CTYPE_class_lower
6991: 0000000000027414 76 OBJECT LOCAL HIDDEN 12
_nl_C_LC_CTYPE_class_print
6992: 000000000002753c 76 OBJECT LOCAL HIDDEN 12
_nl_C_LC_CTYPE_class_punct
6993: 00000000000273d0 68 OBJECT LOCAL HIDDEN 12
_nl_C_LC_CTYPE_class_space
6994: 0000000000027268 72 OBJECT LOCAL HIDDEN 12
_nl_C_LC_CTYPE_class_upper
6995: 0000000000027384 76 OBJECT LOCAL HIDDEN 12
_nl_C_LC_CTYPE_class_xdigit
New lib built using alignas: 5662: 000000000002ac5e 768 OBJECT LOCAL
HIDDEN 12 _nl_C_LC_CTYPE_class
5663: 000000000002af60 1024 OBJECT LOCAL HIDDEN 12
_nl_C_LC_CTYPE_class32
5666: 000000000002bf60 72 OBJECT LOCAL HIDDEN 12
_nl_C_LC_CTYPE_class_upper
5667: 000000000002bfa8 72 OBJECT LOCAL HIDDEN 12
_nl_C_LC_CTYPE_class_lower
5668: 000000000002bff0 72 OBJECT LOCAL HIDDEN 12
_nl_C_LC_CTYPE_class_alpha
5669: 000000000002c038 68 OBJECT LOCAL HIDDEN 12
_nl_C_LC_CTYPE_class_digit
5670: 000000000002c07c 76 OBJECT LOCAL HIDDEN 12
_nl_C_LC_CTYPE_class_xdigit
5671: 000000000002c0c8 68 OBJECT LOCAL HIDDEN 12
_nl_C_LC_CTYPE_class_space
5672: 000000000002c10c 76 OBJECT LOCAL HIDDEN 12
_nl_C_LC_CTYPE_class_print
5673: 000000000002c158 76 OBJECT LOCAL HIDDEN 12
_nl_C_LC_CTYPE_class_graph
5674: 000000000002c1a4 68 OBJECT LOCAL HIDDEN 12
_nl_C_LC_CTYPE_class_blank
5675: 000000000002c1e8 76 OBJECT LOCAL HIDDEN 12
_nl_C_LC_CTYPE_class_cntrl
5676: 000000000002c234 76 OBJECT LOCAL HIDDEN 12
_nl_C_LC_CTYPE_class_punct
5677: 000000000002c280 76 OBJECT LOCAL HIDDEN 12
_nl_C_LC_CTYPE_class_alnum
New lib built using expected types: 5662: 000000000002ac5e 768 OBJECT
LOCAL HIDDEN 12 _nl_C_LC_CTYPE_class
5663: 000000000002af60 1024 OBJECT LOCAL HIDDEN 12
_nl_C_LC_CTYPE_class32
5666: 000000000002bf60 72 OBJECT LOCAL HIDDEN 12
_nl_C_LC_CTYPE_class_upper
5667: 000000000002bfa8 72 OBJECT LOCAL HIDDEN 12
_nl_C_LC_CTYPE_class_lower
5668: 000000000002bff0 72 OBJECT LOCAL HIDDEN 12
_nl_C_LC_CTYPE_class_alpha
5669: 000000000002c038 68 OBJECT LOCAL HIDDEN 12
_nl_C_LC_CTYPE_class_digit
5670: 000000000002c07c 76 OBJECT LOCAL HIDDEN 12
_nl_C_LC_CTYPE_class_xdigit
5671: 000000000002c0c8 68 OBJECT LOCAL HIDDEN 12
_nl_C_LC_CTYPE_class_space
5672: 000000000002c10c 76 OBJECT LOCAL HIDDEN 12
_nl_C_LC_CTYPE_class_print
5673: 000000000002c158 76 OBJECT LOCAL HIDDEN 12
_nl_C_LC_CTYPE_class_graph
5674: 000000000002c1a4 68 OBJECT LOCAL HIDDEN 12
_nl_C_LC_CTYPE_class_blank
5675: 000000000002c1e8 76 OBJECT LOCAL HIDDEN 12
_nl_C_LC_CTYPE_class_cntrl
5676: 000000000002c234 76 OBJECT LOCAL HIDDEN 12
_nl_C_LC_CTYPE_class_punct
5677: 000000000002c280 76 OBJECT LOCAL HIDDEN 12
_nl_C_LC_CTYPE_class_alnum
So were it the case we got lucky that linker is laying out the data
> structures
> in the expected alignment?
Yeah, sort of like a race condition that can only be discovered under
certain circumstances.
In this case, the condition is running an arm program compiled with ubsan
under qemu user mode emulation on a x86 machine.
Regards,
Lirong
On Wed, Mar 17, 2021 at 4:34 AM Adhemerval Zanella <
adhemerval.zanella@linaro.org> wrote:
>
>
> On 16/03/2021 18:05, Carlos O'Donell wrote:
> > On 3/16/21 3:47 PM, Adhemerval Zanella via Libc-alpha wrote:
> >>
> >>
> >> On 16/03/2021 16:05, Lirong Yuan via Libc-alpha wrote:
> >>> On Mon, Mar 15, 2021 at 6:45 PM Carlos O'Donell <carlos@redhat.com>
> wrote:
> >>>
> >>>> My expectation is that normally aarch64 simply handles the unaligned
> load
> >>>> without any problems,
> >>>> but that it would be "better" if it were 16-bit aligned?
> >>>> Is this the *only* case of misaligned pointers?
> >>>
> >>>
> >>> Yes, this is the only case reported by UBSan.
> >>>
> >>>> Signed-off-by: Lirong Yuan <yuanzi@google.com>
> >>>> We don't use DSOs in glibc, we assign copyright to the FSF, so this
> line
> >>>> would
> >>>> be normally removed, and you as the git author remains.
> >>>
> >>>
> >>> Thanks for the explanation! I will send an updated patch without
> >>> "Signed-off-by" if the current approach looks good. :)
> >>>
> >>> On Tue, Mar 16, 2021 at 7:28 AM Szabolcs Nagy <szabolcs.nagy@arm.com>
> wrote:
> >>>
> >>>> The 03/15/2021 21:44, Carlos O'Donell wrote:
> >>>>> On 3/15/21 2:42 PM, Lirong Yuan via Libc-alpha wrote:
> >>>>>> steps to reproduce the problem: compile a program that uses ctype
> >>>> functions such as “isspace” for aarch64 with UBSan flag
> >>>> “-fsanitize=undefined” and run it on x86_64 machines with qemu user
> mode
> >>>> emulation.
> >>>>>
> >>>>> Szabolcs,
> >>>>>
> >>>>> Do you have any input on this?
> >>>>>
> >>>>>> observed behavior: UndefinedBehaviorSanitizer reports
> >>>> misaligned-pointer-use in the program.
> >>>>>
> >>>>> Yes, the char array could be misaligned with respect to a 16-bit
> value,
> >>>>> and should be aligned to the type that is expected from the interface
> >>>> e.g.
> >>>>
> >>>> using char[] as uint16_t[] is aliasing violation. and in principle
> >>>> alignas on the definition does not fix this, but in practice that's
> >>>> the only abi visible aspect of the wrong type.
> >>>>
> >>>
> >>> Alternatively, we can define _nl_C_LC_CTYPE_class and
> >>> _nl_C_LC_CTYPE_class32 arrays directly as uint16_t and uint32_t arrays,
> >>> like _nl_C_LC_CTYPE_toupper array:
> >>>
> https://code.woboq.org/userspace/glibc/locale/C-ctype.c.html#_nl_C_LC_CTYPE_toupper
> >>> Though the conversion may be error-prune and require more test cases...
> >>> It would seem that using alignas is an approach that's both technically
> >>> correct and less likely to cause havoc.
> >>
> >> Could you check if using the expected types yields any regression?
> >> All their usages are using explicit cast to the expected types, so
> >> I am can't see why they have declared as char at first place.
> >
> > It might work, but it:
> >
> > * changes the ABI of binary data exposed via nl_langinfo()?
> > - Pointers to the array are returned.
> >
> > * changes the on-disk layout and invalidates all system binary locales?
> > - Something similar happened for ALTMON and we just rebuilt everything.
> > - It isn't entirely clear to me what our guarantee of binary
> compatibility
> > is for compiled locales.
> >
> > Is this a concern?
> >
>
> But wouldn't the array alignment change cause this very changes as well?
> The ABI of binary data exposed is already exported as 'unsigned short',
> so not sure if will change anything here. I am not sure about on-disk
> layout and I don't recall the ALTMON change specifically.
>
> >>>> i'm not sure why ubsanitizer cares about alignment specifically on
> >>>> aarch64, unaligned load should work.
> >>>>
> >>>
> >>> Yes, the code works fine in practice on aarch64. The ubsan alignment
> is a
> >>> check for misaligned rather than unaligned. It's almost always worth
> fixing
> >>> since this can cause subtle and hard to track down failures that more
> often
> >>> manifest on other architectures.
> >>
> >> I would expect that it this is really accessed in an unaligned manner it
> >> would blow in some architectures (sparc and some arm and mips
> environments).
> >> Not sure why we haven't see any issues on such architectures.
> >
> > AFAICT the structure is *not* misaligned on x86_64 and it has to do with
> the
> > vagaries of the compiler and linker you use and the alignment of global
> > variables. On x86_64 the structures are all over-aligned. I expect the
> same
> > is true for all other arches. Perhaps Google used lld with an
> experimental
> > glibc patch and they have more tightly aligned object layout.
>
> So were it the case we got lucky that linker is laying out the data
> structures
> in the expected alignment?
>
> >
> > e.g.
> > readelf -a -W /lib64/libc.so.6 | grep _nl_C_LC_CTYPE_class
> > 23087: 0000000000176ec0 1024 OBJECT LOCAL DEFAULT 17
> _nl_C_LC_CTYPE_class32
> > 23125: 00000000001761a0 72 OBJECT LOCAL DEFAULT 17
> _nl_C_LC_CTYPE_class_alpha
> > 23699: 0000000000176020 76 OBJECT LOCAL DEFAULT 17
> _nl_C_LC_CTYPE_class_print
> > 23960: 0000000000175e40 76 OBJECT LOCAL DEFAULT 17
> _nl_C_LC_CTYPE_class_alnum
> > 24152: 0000000000176200 72 OBJECT LOCAL DEFAULT 17
> _nl_C_LC_CTYPE_class_lower
> > 24208: 00000000001760e0 76 OBJECT LOCAL DEFAULT 17
> _nl_C_LC_CTYPE_class_xdigit
> > 24547: 0000000000175fc0 76 OBJECT LOCAL DEFAULT 17
> _nl_C_LC_CTYPE_class_graph
> > 24569: 0000000000175f60 68 OBJECT LOCAL DEFAULT 17
> _nl_C_LC_CTYPE_class_blank
> > 24582: 0000000000175ea0 76 OBJECT LOCAL DEFAULT 17
> _nl_C_LC_CTYPE_class_punct
> > 24764: 0000000000176140 68 OBJECT LOCAL DEFAULT 17
> _nl_C_LC_CTYPE_class_digit
> > 24883: 0000000000175f00 76 OBJECT LOCAL DEFAULT 17
> _nl_C_LC_CTYPE_class_cntrl
> > 24991: 00000000001772c0 768 OBJECT LOCAL DEFAULT 17
> _nl_C_LC_CTYPE_class
> > 25344: 0000000000176260 72 OBJECT LOCAL DEFAULT 17
> _nl_C_LC_CTYPE_class_upper
> > 25416: 0000000000176080 68 OBJECT LOCAL DEFAULT 17
> _nl_C_LC_CTYPE_class_space
> >
> > Aligned on 16-byte boundaries.
> >
> > On AArch64 I see this:
> > eu-readelf -a -W lib64/libc.so.6 | grep _nl_C_LC_CTYPE_class
> > 34906: 000000000011eea8 1024 OBJECT LOCAL DEFAULT 16
> _nl_C_LC_CTYPE_class32
> > 34945: 000000000011e1d0 72 OBJECT LOCAL DEFAULT 16
> _nl_C_LC_CTYPE_class_alpha
> > 35470: 000000000011e0a0 76 OBJECT LOCAL DEFAULT 16
> _nl_C_LC_CTYPE_class_print
> > 35712: 000000000011df18 76 OBJECT LOCAL DEFAULT 16
> _nl_C_LC_CTYPE_class_alnum
> > 35889: 000000000011e218 72 OBJECT LOCAL DEFAULT 16
> _nl_C_LC_CTYPE_class_lower
> > 35944: 000000000011e138 76 OBJECT LOCAL DEFAULT 16
> _nl_C_LC_CTYPE_class_xdigit
> > 36246: 000000000011e050 76 OBJECT LOCAL DEFAULT 16
> _nl_C_LC_CTYPE_class_graph
> > 36268: 000000000011e008 68 OBJECT LOCAL DEFAULT 16
> _nl_C_LC_CTYPE_class_blank
> > 36284: 000000000011df68 76 OBJECT LOCAL DEFAULT 16
> _nl_C_LC_CTYPE_class_punct
> > 36455: 000000000011e188 68 OBJECT LOCAL DEFAULT 16
> _nl_C_LC_CTYPE_class_digit
> > 36571: 000000000011dfb8 76 OBJECT LOCAL DEFAULT 16
> _nl_C_LC_CTYPE_class_cntrl
> > 36670: 000000000011f2a8 768 OBJECT LOCAL DEFAULT 16
> _nl_C_LC_CTYPE_class
> > 36993: 000000000011e260 72 OBJECT LOCAL DEFAULT 16
> _nl_C_LC_CTYPE_class_upper
> > 37061: 000000000011e0f0 68 OBJECT LOCAL DEFAULT 16
> _nl_C_LC_CTYPE_class_space
> >
> > Aligned on 8-byte boundaries.
> >
> > But that alignment could go lower and still be valid and cause unaligned
> accesses.
> >
>
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] locale: align _nl_C_LC_CTYPE_class and _nl_C_LC_CTYPE_class32 arrays to uint16_t and uint32_t respectively
2021-03-19 18:31 ` Lirong Yuan
@ 2021-03-30 17:33 ` Lirong Yuan
0 siblings, 0 replies; 10+ messages in thread
From: Lirong Yuan @ 2021-03-30 17:33 UTC (permalink / raw)
To: Adhemerval Zanella
Cc: Carlos O'Donell, libc-alpha, Shu-Chun Weng, Szabolcs Nagy
Friendly Ping~
Also sent the v2 patch for using the expected types:
http://patchwork.sourceware.org/project/glibc/patch/20210330172518.184058-1-yuanzi@google.com/
Regards,
Lirong
On Fri, Mar 19, 2021 at 11:31 AM Lirong Yuan <yuanzi@google.com> wrote:
> On Tue, Mar 16, 2021 at 12:47 PM Adhemerval Zanella <
> adhemerval.zanella@linaro.org> wrote:
>
>> Could you check if using the expected types yields any regression?
>> All their usages are using explicit cast to the expected types, so
>> I am can't see why they have declared as char at first place.
>
>
> Took a while to get it right but checked that using the expected types did
> not yield any regression. :)
>
> On Wed, Mar 17, 2021 at 4:34 AM Adhemerval Zanella <
> adhemerval.zanella@linaro.org> wrote:
>
>> But wouldn't the array alignment change cause this very changes as well?
>> The ABI of binary data exposed is already exported as 'unsigned short',
>> so not sure if will change anything here. I am not sure about on-disk
>> layout and I don't recall the ALTMON change specifically.
>
>
> Layout seemed to change for the newly built libraries. Clean lib: 6982:
> 0000000000025f65 768 OBJECT LOCAL HIDDEN 12 _nl_C_LC_CTYPE_class
> 6983: 0000000000026265 1024 OBJECT LOCAL HIDDEN 12
> _nl_C_LC_CTYPE_class32
> 6984: 0000000000027588 76 OBJECT LOCAL HIDDEN 12
> _nl_C_LC_CTYPE_class_alnum
> 6985: 00000000000272f8 72 OBJECT LOCAL HIDDEN 12
> _nl_C_LC_CTYPE_class_alpha
> 6986: 00000000000274ac 68 OBJECT LOCAL HIDDEN 12
> _nl_C_LC_CTYPE_class_blank
> 6987: 00000000000274f0 76 OBJECT LOCAL HIDDEN 12
> _nl_C_LC_CTYPE_class_cntrl
> 6988: 0000000000027340 68 OBJECT LOCAL HIDDEN 12
> _nl_C_LC_CTYPE_class_digit
> 6989: 0000000000027460 76 OBJECT LOCAL HIDDEN 12
> _nl_C_LC_CTYPE_class_graph
> 6990: 00000000000272b0 72 OBJECT LOCAL HIDDEN 12
> _nl_C_LC_CTYPE_class_lower
> 6991: 0000000000027414 76 OBJECT LOCAL HIDDEN 12
> _nl_C_LC_CTYPE_class_print
> 6992: 000000000002753c 76 OBJECT LOCAL HIDDEN 12
> _nl_C_LC_CTYPE_class_punct
> 6993: 00000000000273d0 68 OBJECT LOCAL HIDDEN 12
> _nl_C_LC_CTYPE_class_space
> 6994: 0000000000027268 72 OBJECT LOCAL HIDDEN 12
> _nl_C_LC_CTYPE_class_upper
> 6995: 0000000000027384 76 OBJECT LOCAL HIDDEN 12
> _nl_C_LC_CTYPE_class_xdigit
>
> New lib built using alignas: 5662: 000000000002ac5e 768 OBJECT LOCAL
> HIDDEN 12 _nl_C_LC_CTYPE_class
> 5663: 000000000002af60 1024 OBJECT LOCAL HIDDEN 12
> _nl_C_LC_CTYPE_class32
> 5666: 000000000002bf60 72 OBJECT LOCAL HIDDEN 12
> _nl_C_LC_CTYPE_class_upper
> 5667: 000000000002bfa8 72 OBJECT LOCAL HIDDEN 12
> _nl_C_LC_CTYPE_class_lower
> 5668: 000000000002bff0 72 OBJECT LOCAL HIDDEN 12
> _nl_C_LC_CTYPE_class_alpha
> 5669: 000000000002c038 68 OBJECT LOCAL HIDDEN 12
> _nl_C_LC_CTYPE_class_digit
> 5670: 000000000002c07c 76 OBJECT LOCAL HIDDEN 12
> _nl_C_LC_CTYPE_class_xdigit
> 5671: 000000000002c0c8 68 OBJECT LOCAL HIDDEN 12
> _nl_C_LC_CTYPE_class_space
> 5672: 000000000002c10c 76 OBJECT LOCAL HIDDEN 12
> _nl_C_LC_CTYPE_class_print
> 5673: 000000000002c158 76 OBJECT LOCAL HIDDEN 12
> _nl_C_LC_CTYPE_class_graph
> 5674: 000000000002c1a4 68 OBJECT LOCAL HIDDEN 12
> _nl_C_LC_CTYPE_class_blank
> 5675: 000000000002c1e8 76 OBJECT LOCAL HIDDEN 12
> _nl_C_LC_CTYPE_class_cntrl
> 5676: 000000000002c234 76 OBJECT LOCAL HIDDEN 12
> _nl_C_LC_CTYPE_class_punct
> 5677: 000000000002c280 76 OBJECT LOCAL HIDDEN 12
> _nl_C_LC_CTYPE_class_alnum
>
> New lib built using expected types: 5662: 000000000002ac5e 768 OBJECT
> LOCAL HIDDEN 12 _nl_C_LC_CTYPE_class
> 5663: 000000000002af60 1024 OBJECT LOCAL HIDDEN 12
> _nl_C_LC_CTYPE_class32
> 5666: 000000000002bf60 72 OBJECT LOCAL HIDDEN 12
> _nl_C_LC_CTYPE_class_upper
> 5667: 000000000002bfa8 72 OBJECT LOCAL HIDDEN 12
> _nl_C_LC_CTYPE_class_lower
> 5668: 000000000002bff0 72 OBJECT LOCAL HIDDEN 12
> _nl_C_LC_CTYPE_class_alpha
> 5669: 000000000002c038 68 OBJECT LOCAL HIDDEN 12
> _nl_C_LC_CTYPE_class_digit
> 5670: 000000000002c07c 76 OBJECT LOCAL HIDDEN 12
> _nl_C_LC_CTYPE_class_xdigit
> 5671: 000000000002c0c8 68 OBJECT LOCAL HIDDEN 12
> _nl_C_LC_CTYPE_class_space
> 5672: 000000000002c10c 76 OBJECT LOCAL HIDDEN 12
> _nl_C_LC_CTYPE_class_print
> 5673: 000000000002c158 76 OBJECT LOCAL HIDDEN 12
> _nl_C_LC_CTYPE_class_graph
> 5674: 000000000002c1a4 68 OBJECT LOCAL HIDDEN 12
> _nl_C_LC_CTYPE_class_blank
> 5675: 000000000002c1e8 76 OBJECT LOCAL HIDDEN 12
> _nl_C_LC_CTYPE_class_cntrl
> 5676: 000000000002c234 76 OBJECT LOCAL HIDDEN 12
> _nl_C_LC_CTYPE_class_punct
> 5677: 000000000002c280 76 OBJECT LOCAL HIDDEN 12
> _nl_C_LC_CTYPE_class_alnum
>
> So were it the case we got lucky that linker is laying out the data
>> structures
>> in the expected alignment?
>
>
> Yeah, sort of like a race condition that can only be discovered under
> certain circumstances.
> In this case, the condition is running an arm program compiled with ubsan
> under qemu user mode emulation on a x86 machine.
>
> Regards,
> Lirong
>
> On Wed, Mar 17, 2021 at 4:34 AM Adhemerval Zanella <
> adhemerval.zanella@linaro.org> wrote:
>
>>
>>
>> On 16/03/2021 18:05, Carlos O'Donell wrote:
>> > On 3/16/21 3:47 PM, Adhemerval Zanella via Libc-alpha wrote:
>> >>
>> >>
>> >> On 16/03/2021 16:05, Lirong Yuan via Libc-alpha wrote:
>> >>> On Mon, Mar 15, 2021 at 6:45 PM Carlos O'Donell <carlos@redhat.com>
>> wrote:
>> >>>
>> >>>> My expectation is that normally aarch64 simply handles the unaligned
>> load
>> >>>> without any problems,
>> >>>> but that it would be "better" if it were 16-bit aligned?
>> >>>> Is this the *only* case of misaligned pointers?
>> >>>
>> >>>
>> >>> Yes, this is the only case reported by UBSan.
>> >>>
>> >>>> Signed-off-by: Lirong Yuan <yuanzi@google.com>
>> >>>> We don't use DSOs in glibc, we assign copyright to the FSF, so this
>> line
>> >>>> would
>> >>>> be normally removed, and you as the git author remains.
>> >>>
>> >>>
>> >>> Thanks for the explanation! I will send an updated patch without
>> >>> "Signed-off-by" if the current approach looks good. :)
>> >>>
>> >>> On Tue, Mar 16, 2021 at 7:28 AM Szabolcs Nagy <szabolcs.nagy@arm.com>
>> wrote:
>> >>>
>> >>>> The 03/15/2021 21:44, Carlos O'Donell wrote:
>> >>>>> On 3/15/21 2:42 PM, Lirong Yuan via Libc-alpha wrote:
>> >>>>>> steps to reproduce the problem: compile a program that uses ctype
>> >>>> functions such as “isspace” for aarch64 with UBSan flag
>> >>>> “-fsanitize=undefined” and run it on x86_64 machines with qemu user
>> mode
>> >>>> emulation.
>> >>>>>
>> >>>>> Szabolcs,
>> >>>>>
>> >>>>> Do you have any input on this?
>> >>>>>
>> >>>>>> observed behavior: UndefinedBehaviorSanitizer reports
>> >>>> misaligned-pointer-use in the program.
>> >>>>>
>> >>>>> Yes, the char array could be misaligned with respect to a 16-bit
>> value,
>> >>>>> and should be aligned to the type that is expected from the
>> interface
>> >>>> e.g.
>> >>>>
>> >>>> using char[] as uint16_t[] is aliasing violation. and in principle
>> >>>> alignas on the definition does not fix this, but in practice that's
>> >>>> the only abi visible aspect of the wrong type.
>> >>>>
>> >>>
>> >>> Alternatively, we can define _nl_C_LC_CTYPE_class and
>> >>> _nl_C_LC_CTYPE_class32 arrays directly as uint16_t and uint32_t
>> arrays,
>> >>> like _nl_C_LC_CTYPE_toupper array:
>> >>>
>> https://code.woboq.org/userspace/glibc/locale/C-ctype.c.html#_nl_C_LC_CTYPE_toupper
>> >>> Though the conversion may be error-prune and require more test
>> cases...
>> >>> It would seem that using alignas is an approach that's both
>> technically
>> >>> correct and less likely to cause havoc.
>> >>
>> >> Could you check if using the expected types yields any regression?
>> >> All their usages are using explicit cast to the expected types, so
>> >> I am can't see why they have declared as char at first place.
>> >
>> > It might work, but it:
>> >
>> > * changes the ABI of binary data exposed via nl_langinfo()?
>> > - Pointers to the array are returned.
>> >
>> > * changes the on-disk layout and invalidates all system binary locales?
>> > - Something similar happened for ALTMON and we just rebuilt
>> everything.
>> > - It isn't entirely clear to me what our guarantee of binary
>> compatibility
>> > is for compiled locales.
>> >
>> > Is this a concern?
>> >
>>
>> But wouldn't the array alignment change cause this very changes as well?
>> The ABI of binary data exposed is already exported as 'unsigned short',
>> so not sure if will change anything here. I am not sure about on-disk
>> layout and I don't recall the ALTMON change specifically.
>>
>> >>>> i'm not sure why ubsanitizer cares about alignment specifically on
>> >>>> aarch64, unaligned load should work.
>> >>>>
>> >>>
>> >>> Yes, the code works fine in practice on aarch64. The ubsan alignment
>> is a
>> >>> check for misaligned rather than unaligned. It's almost always worth
>> fixing
>> >>> since this can cause subtle and hard to track down failures that more
>> often
>> >>> manifest on other architectures.
>> >>
>> >> I would expect that it this is really accessed in an unaligned manner
>> it
>> >> would blow in some architectures (sparc and some arm and mips
>> environments).
>> >> Not sure why we haven't see any issues on such architectures.
>> >
>> > AFAICT the structure is *not* misaligned on x86_64 and it has to do
>> with the
>> > vagaries of the compiler and linker you use and the alignment of global
>> > variables. On x86_64 the structures are all over-aligned. I expect the
>> same
>> > is true for all other arches. Perhaps Google used lld with an
>> experimental
>> > glibc patch and they have more tightly aligned object layout.
>>
>> So were it the case we got lucky that linker is laying out the data
>> structures
>> in the expected alignment?
>>
>> >
>> > e.g.
>> > readelf -a -W /lib64/libc.so.6 | grep _nl_C_LC_CTYPE_class
>> > 23087: 0000000000176ec0 1024 OBJECT LOCAL DEFAULT 17
>> _nl_C_LC_CTYPE_class32
>> > 23125: 00000000001761a0 72 OBJECT LOCAL DEFAULT 17
>> _nl_C_LC_CTYPE_class_alpha
>> > 23699: 0000000000176020 76 OBJECT LOCAL DEFAULT 17
>> _nl_C_LC_CTYPE_class_print
>> > 23960: 0000000000175e40 76 OBJECT LOCAL DEFAULT 17
>> _nl_C_LC_CTYPE_class_alnum
>> > 24152: 0000000000176200 72 OBJECT LOCAL DEFAULT 17
>> _nl_C_LC_CTYPE_class_lower
>> > 24208: 00000000001760e0 76 OBJECT LOCAL DEFAULT 17
>> _nl_C_LC_CTYPE_class_xdigit
>> > 24547: 0000000000175fc0 76 OBJECT LOCAL DEFAULT 17
>> _nl_C_LC_CTYPE_class_graph
>> > 24569: 0000000000175f60 68 OBJECT LOCAL DEFAULT 17
>> _nl_C_LC_CTYPE_class_blank
>> > 24582: 0000000000175ea0 76 OBJECT LOCAL DEFAULT 17
>> _nl_C_LC_CTYPE_class_punct
>> > 24764: 0000000000176140 68 OBJECT LOCAL DEFAULT 17
>> _nl_C_LC_CTYPE_class_digit
>> > 24883: 0000000000175f00 76 OBJECT LOCAL DEFAULT 17
>> _nl_C_LC_CTYPE_class_cntrl
>> > 24991: 00000000001772c0 768 OBJECT LOCAL DEFAULT 17
>> _nl_C_LC_CTYPE_class
>> > 25344: 0000000000176260 72 OBJECT LOCAL DEFAULT 17
>> _nl_C_LC_CTYPE_class_upper
>> > 25416: 0000000000176080 68 OBJECT LOCAL DEFAULT 17
>> _nl_C_LC_CTYPE_class_space
>> >
>> > Aligned on 16-byte boundaries.
>> >
>> > On AArch64 I see this:
>> > eu-readelf -a -W lib64/libc.so.6 | grep _nl_C_LC_CTYPE_class
>> > 34906: 000000000011eea8 1024 OBJECT LOCAL DEFAULT 16
>> _nl_C_LC_CTYPE_class32
>> > 34945: 000000000011e1d0 72 OBJECT LOCAL DEFAULT 16
>> _nl_C_LC_CTYPE_class_alpha
>> > 35470: 000000000011e0a0 76 OBJECT LOCAL DEFAULT 16
>> _nl_C_LC_CTYPE_class_print
>> > 35712: 000000000011df18 76 OBJECT LOCAL DEFAULT 16
>> _nl_C_LC_CTYPE_class_alnum
>> > 35889: 000000000011e218 72 OBJECT LOCAL DEFAULT 16
>> _nl_C_LC_CTYPE_class_lower
>> > 35944: 000000000011e138 76 OBJECT LOCAL DEFAULT 16
>> _nl_C_LC_CTYPE_class_xdigit
>> > 36246: 000000000011e050 76 OBJECT LOCAL DEFAULT 16
>> _nl_C_LC_CTYPE_class_graph
>> > 36268: 000000000011e008 68 OBJECT LOCAL DEFAULT 16
>> _nl_C_LC_CTYPE_class_blank
>> > 36284: 000000000011df68 76 OBJECT LOCAL DEFAULT 16
>> _nl_C_LC_CTYPE_class_punct
>> > 36455: 000000000011e188 68 OBJECT LOCAL DEFAULT 16
>> _nl_C_LC_CTYPE_class_digit
>> > 36571: 000000000011dfb8 76 OBJECT LOCAL DEFAULT 16
>> _nl_C_LC_CTYPE_class_cntrl
>> > 36670: 000000000011f2a8 768 OBJECT LOCAL DEFAULT 16
>> _nl_C_LC_CTYPE_class
>> > 36993: 000000000011e260 72 OBJECT LOCAL DEFAULT 16
>> _nl_C_LC_CTYPE_class_upper
>> > 37061: 000000000011e0f0 68 OBJECT LOCAL DEFAULT 16
>> _nl_C_LC_CTYPE_class_space
>> >
>> > Aligned on 8-byte boundaries.
>> >
>> > But that alignment could go lower and still be valid and cause
>> unaligned accesses.
>> >
>>
>
^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2021-03-30 17:33 UTC | newest]
Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-03-15 18:42 [PATCH] locale: align _nl_C_LC_CTYPE_class and _nl_C_LC_CTYPE_class32 arrays to uint16_t and uint32_t respectively Lirong Yuan
2021-03-16 1:44 ` Carlos O'Donell
2021-03-16 14:28 ` Szabolcs Nagy
2021-03-16 19:05 ` Lirong Yuan
2021-03-16 19:47 ` Adhemerval Zanella
2021-03-16 20:49 ` Andreas Schwab
2021-03-16 21:05 ` Carlos O'Donell
2021-03-17 11:34 ` Adhemerval Zanella
2021-03-19 18:31 ` Lirong Yuan
2021-03-30 17:33 ` Lirong Yuan
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).