From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [216.205.24.124]) by sourceware.org (Postfix) with ESMTP id 4F4D03857C74 for ; Tue, 16 Mar 2021 01:45:05 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.3.2 sourceware.org 4F4D03857C74 Received: from mail-qk1-f198.google.com (mail-qk1-f198.google.com [209.85.222.198]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-385-GGNEpOcpOWuG-9gq3uOiWg-1; Mon, 15 Mar 2021 21:45:02 -0400 X-MC-Unique: GGNEpOcpOWuG-9gq3uOiWg-1 Received: by mail-qk1-f198.google.com with SMTP id c1so25837231qke.8 for ; Mon, 15 Mar 2021 18:45:02 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:subject:to:cc:references:from:organization :message-id:date:user-agent:mime-version:in-reply-to :content-language:content-transfer-encoding; bh=0ME7tG+K3Uj4GiUGBkZ2NtLRwHKsPkaBwrg33NX8REE=; b=FDbYPGTYbkp6+THjEVPqXCmiee6QWMRiVQpgAkrvv3WWqL1n5f2QWo4xQ9xIgadQNn 9LPxKtK+iA7UVZBM8Celx8xYfuLiLaE2rBKOyFnALR8n7gOWudj51RnkCLHF3dy97i47 io18g83s6vCMSavz4o6KVcuWe2SVgGxdX6Nat44vo1mbCP/61pk7D8HByNKdE859HQhC q8QFPoyvXdxBd0bgeZgsrWuQ1/YZGKIF9iFZPrOqij4cUpX+jUxmBG5MTI44ieDLaQtT DwYTEipY5Qb3I6MCDFOgyIMceeRDW1J8s/Kz6YCTao8v9s26xAW53WgELLa2D9soKlkk wJiA== X-Gm-Message-State: AOAM531TdgToEfzCjvyPNqRFfM1olgnhED0f07l6HDjFkNuwQsC3xIQh FwJlG8srJus7YOXBquAN2o/0UmvjGWM367zen2ZvCApDf/XgTfq3bEZpjk2WVoaNnC3UG9k7YTX 4H4/PFbRh1LEAWBk0REKe X-Received: by 2002:ad4:542b:: with SMTP id g11mr13836468qvt.47.1615859101626; Mon, 15 Mar 2021 18:45:01 -0700 (PDT) X-Google-Smtp-Source: ABdhPJwFc7OZuuWgemx68OLas+uHKczYFfHp/54ASxtSMUGX23y388UsVtzXZDpYK/RyqB4iuRuw8g== X-Received: by 2002:ad4:542b:: with SMTP id g11mr13836458qvt.47.1615859101420; Mon, 15 Mar 2021 18:45:01 -0700 (PDT) Received: from [192.168.1.16] (198-84-214-74.cpe.teksavvy.com. [198.84.214.74]) by smtp.gmail.com with ESMTPSA id e15sm12757882qtp.58.2021.03.15.18.45.00 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Mon, 15 Mar 2021 18:45:00 -0700 (PDT) Subject: Re: [PATCH] locale: align _nl_C_LC_CTYPE_class and _nl_C_LC_CTYPE_class32 arrays to uint16_t and uint32_t respectively To: Lirong Yuan , libc-alpha@sourceware.org, Szabolcs Nagy Cc: scw@google.com References: <20210315184211.4124573-1-yuanzi@google.com> From: Carlos O'Donell Organization: Red Hat Message-ID: <2003e08c-55e2-80fa-89a6-fb8d59cc0e77@redhat.com> Date: Mon, 15 Mar 2021 21:44:59 -0400 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Thunderbird/78.6.0 MIME-Version: 1.0 In-Reply-To: <20210315184211.4124573-1-yuanzi@google.com> X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Content-Type: text/plain; charset=utf-8 Content-Language: en-US Content-Transfer-Encoding: 8bit X-Spam-Status: No, score=-11.6 required=5.0 tests=BAYES_00, DKIMWL_WL_HIGH, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, NICE_REPLY_A, RCVD_IN_DNSWL_LOW, RCVD_IN_MSPIKE_H3, RCVD_IN_MSPIKE_WL, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.2 X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on server2.sourceware.org X-BeenThere: libc-alpha@sourceware.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Libc-alpha mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 16 Mar 2021 01:45:06 -0000 On 3/15/21 2:42 PM, Lirong Yuan via Libc-alpha wrote: > steps to reproduce the problem: compile a program that uses ctype functions such as “isspace” for aarch64 with UBSan flag “-fsanitize=undefined” and run it on x86_64 machines with qemu user mode emulation. Szabolcs, Do you have any input on this? > observed behavior: UndefinedBehaviorSanitizer reports misaligned-pointer-use in the program. Yes, the char array could be misaligned with respect to a 16-bit value, and should be aligned to the type that is expected from the interface e.g. ctype/ctype.h: 91 # define __isctype_f(type) \ 92 __extern_inline int \ 93 is##type (int __c) __THROW \ 94 { \ 95 return (*__ctype_b_loc ())[(int) (__c)] & (unsigned short int) _IS##type; \ 96 } 97 #endif include/ctype.h: 38 CTYPE_EXTERN_INLINE const uint16_t ** __attribute__ ((const)) 39 __ctype_b_loc (void) 40 { 41 return __libc_tsd_address (const uint16_t *, CTYPE_B); 42 } So we expect a uint16_t type and the respective alignment. My expectation is that normally aarch64 simply handles the unaligned load without any problems, but that it would be "better" if it were 16-bit aligned? Is this the *only* case of misaligned pointers? > solution: align the arrays defined in locale/C-ctype.c with correct data types as defined in ctype/ctype.h. > > test suite regressions: none. This looks technically correct, and the C locale is builtin so the layout itself should be able to change without any problems. I'd like to hear comments from Arm about this before accepting. > Signed-off-by: Lirong Yuan We don't use DSOs in glibc, we assign copyright to the FSF, so this line would be normally removed, and you as the git author remains. See: https://sourceware.org/glibc/wiki/Contribution%20checklist You are covered by the Google copyright assignment so everything is accepted. > --- > locale/C-ctype.c | 5 +++-- > 1 file changed, 3 insertions(+), 2 deletions(-) > > diff --git a/locale/C-ctype.c b/locale/C-ctype.c > index bffdbedad0..da2c8cc33c 100644 > --- a/locale/C-ctype.c > +++ b/locale/C-ctype.c > @@ -18,6 +18,7 @@ > > #include "localeinfo.h" > #include > +#include OK. > #include > > #include "C-translit.h" > @@ -30,7 +31,7 @@ > In the `_nl_C_LC_CTYPE_class' array the value for EOF (== -1) > is set to always return 0 and the conversion arrays return EOF. */ > > -const char _nl_C_LC_CTYPE_class[768] attribute_hidden = > +alignas(uint16_t) const char _nl_C_LC_CTYPE_class[768] attribute_hidden = OK. Used directly by __ctype_b_loc. > /* 0x80 */ "\000\000" "\000\000" "\000\000" "\000\000" "\000\000" "\000\000" > /* 0x86 */ "\000\000" "\000\000" "\000\000" "\000\000" "\000\000" "\000\000" > /* 0x8c */ "\000\000" "\000\000" "\000\000" "\000\000" "\000\000" "\000\000" > @@ -96,7 +97,7 @@ const char _nl_C_LC_CTYPE_class[768] attribute_hidden = > /* 0xf4 */ "\000\000" "\000\000" "\000\000" "\000\000" "\000\000" "\000\000" > /* 0xfa */ "\000\000" "\000\000" "\000\000" "\000\000" "\000\000" "\000\000" > ; > -const char _nl_C_LC_CTYPE_class32[1024] attribute_hidden = > +alignas(uint32_t) const char _nl_C_LC_CTYPE_class32[1024] attribute_hidden = OK. Might be exposed via nl_langinfo and is internally uint32_t (thought directly exposed __ctype32_b should not exist for aarch64 (verified not on abilist)). > /* 0x00 */ "\000\000\002\000" "\000\000\002\000" "\000\000\002\000" > /* 0x03 */ "\000\000\002\000" "\000\000\002\000" "\000\000\002\000" > /* 0x06 */ "\000\000\002\000" "\000\000\002\000" "\000\000\002\000" > -- Cheers, Carlos.