From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-qk1-x72d.google.com (mail-qk1-x72d.google.com [IPv6:2607:f8b0:4864:20::72d]) by sourceware.org (Postfix) with ESMTPS id AD4ED385BF9E for ; Wed, 17 Mar 2021 11:34:03 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.3.2 sourceware.org AD4ED385BF9E Received: by mail-qk1-x72d.google.com with SMTP id t4so38484462qkp.1 for ; Wed, 17 Mar 2021 04:34:03 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:subject:to:references:from:message-id:date :user-agent:mime-version:in-reply-to:content-language :content-transfer-encoding; bh=SUvHj64zfVLQbJbZh+KFNTC843cmNb56d9xUMf0o42s=; b=mt7yYg6XmIMQxoxIDGdHpaXNyPX7Kl+5+NLU9OhBLJbaAGXTZBaoByW3nll9sBgC7Z VsJhpBdgR2TLWFx3EI8xuQHkuJoMqQTZlaVMOKqllCaupQE44js2qN52KcsJEIk8aXUt DNuUIJJ03kawcQxwLiiLIr3PLVdal0u3qGCv6wypPkMvneW793cteB25yE7DfBP1Jx6+ eJlrqlMWykXddf8MwqdQqPoHmUwDmQW7kj6//cfV2HWs52vDY5dmaVNrctSpgTlcS/FW D/X/lNEeosCyYXLJ4miMt/A0k5SxL16/ZJ4GHdp0GUYB9BkcOgQ4gcrNlpkOtN49DYaE 3xEQ== X-Gm-Message-State: AOAM532vNqtosXF1A6tpzf12vf4aPeb3dCfc985zSVhuI7VRnCPsF+es bYmzJYNDAtJn4/01IMBvLgVqgA== X-Google-Smtp-Source: ABdhPJxT7NO7VlqD168i+oiCnTyE0Kd0eLkGfBjZGAE0YypZ92RuP/LaEIiZP+fMkKNQmx/PI9ZTYQ== X-Received: by 2002:a05:620a:941:: with SMTP id w1mr4194173qkw.484.1615980843216; Wed, 17 Mar 2021 04:34:03 -0700 (PDT) Received: from [192.168.1.4] ([177.194.48.209]) by smtp.googlemail.com with ESMTPSA id s19sm17128643qks.130.2021.03.17.04.34.01 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Wed, 17 Mar 2021 04:34:02 -0700 (PDT) Subject: Re: [PATCH] locale: align _nl_C_LC_CTYPE_class and _nl_C_LC_CTYPE_class32 arrays to uint16_t and uint32_t respectively To: Carlos O'Donell , libc-alpha@sourceware.org, Shu-Chun Weng , Lirong Yuan , Szabolcs Nagy References: <20210315184211.4124573-1-yuanzi@google.com> <2003e08c-55e2-80fa-89a6-fb8d59cc0e77@redhat.com> <20210316142816.GC4427@arm.com> From: Adhemerval Zanella Message-ID: Date: Wed, 17 Mar 2021 08:34:00 -0300 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Thunderbird/78.7.1 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset=utf-8 Content-Language: en-US Content-Transfer-Encoding: 8bit X-Spam-Status: No, score=-7.4 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, NICE_REPLY_A, RCVD_IN_DNSWL_NONE, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.2 X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on server2.sourceware.org X-BeenThere: libc-alpha@sourceware.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Libc-alpha mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 17 Mar 2021 11:34:14 -0000 On 16/03/2021 18:05, Carlos O'Donell wrote: > On 3/16/21 3:47 PM, Adhemerval Zanella via Libc-alpha wrote: >> >> >> On 16/03/2021 16:05, Lirong Yuan via Libc-alpha wrote: >>> On Mon, Mar 15, 2021 at 6:45 PM Carlos O'Donell wrote: >>> >>>> My expectation is that normally aarch64 simply handles the unaligned load >>>> without any problems, >>>> but that it would be "better" if it were 16-bit aligned? >>>> Is this the *only* case of misaligned pointers? >>> >>> >>> Yes, this is the only case reported by UBSan. >>> >>>> Signed-off-by: Lirong Yuan >>>> We don't use DSOs in glibc, we assign copyright to the FSF, so this line >>>> would >>>> be normally removed, and you as the git author remains. >>> >>> >>> Thanks for the explanation! I will send an updated patch without >>> "Signed-off-by" if the current approach looks good. :) >>> >>> On Tue, Mar 16, 2021 at 7:28 AM Szabolcs Nagy wrote: >>> >>>> The 03/15/2021 21:44, Carlos O'Donell wrote: >>>>> On 3/15/21 2:42 PM, Lirong Yuan via Libc-alpha wrote: >>>>>> steps to reproduce the problem: compile a program that uses ctype >>>> functions such as “isspace” for aarch64 with UBSan flag >>>> “-fsanitize=undefined” and run it on x86_64 machines with qemu user mode >>>> emulation. >>>>> >>>>> Szabolcs, >>>>> >>>>> Do you have any input on this? >>>>> >>>>>> observed behavior: UndefinedBehaviorSanitizer reports >>>> misaligned-pointer-use in the program. >>>>> >>>>> Yes, the char array could be misaligned with respect to a 16-bit value, >>>>> and should be aligned to the type that is expected from the interface >>>> e.g. >>>> >>>> using char[] as uint16_t[] is aliasing violation. and in principle >>>> alignas on the definition does not fix this, but in practice that's >>>> the only abi visible aspect of the wrong type. >>>> >>> >>> Alternatively, we can define _nl_C_LC_CTYPE_class and >>> _nl_C_LC_CTYPE_class32 arrays directly as uint16_t and uint32_t arrays, >>> like _nl_C_LC_CTYPE_toupper array: >>> https://code.woboq.org/userspace/glibc/locale/C-ctype.c.html#_nl_C_LC_CTYPE_toupper >>> Though the conversion may be error-prune and require more test cases... >>> It would seem that using alignas is an approach that's both technically >>> correct and less likely to cause havoc. >> >> Could you check if using the expected types yields any regression? >> All their usages are using explicit cast to the expected types, so >> I am can't see why they have declared as char at first place. > > It might work, but it: > > * changes the ABI of binary data exposed via nl_langinfo()? > - Pointers to the array are returned. > > * changes the on-disk layout and invalidates all system binary locales? > - Something similar happened for ALTMON and we just rebuilt everything. > - It isn't entirely clear to me what our guarantee of binary compatibility > is for compiled locales. > > Is this a concern? > But wouldn't the array alignment change cause this very changes as well? The ABI of binary data exposed is already exported as 'unsigned short', so not sure if will change anything here. I am not sure about on-disk layout and I don't recall the ALTMON change specifically. >>>> i'm not sure why ubsanitizer cares about alignment specifically on >>>> aarch64, unaligned load should work. >>>> >>> >>> Yes, the code works fine in practice on aarch64. The ubsan alignment is a >>> check for misaligned rather than unaligned. It's almost always worth fixing >>> since this can cause subtle and hard to track down failures that more often >>> manifest on other architectures. >> >> I would expect that it this is really accessed in an unaligned manner it >> would blow in some architectures (sparc and some arm and mips environments). >> Not sure why we haven't see any issues on such architectures. > > AFAICT the structure is *not* misaligned on x86_64 and it has to do with the > vagaries of the compiler and linker you use and the alignment of global > variables. On x86_64 the structures are all over-aligned. I expect the same > is true for all other arches. Perhaps Google used lld with an experimental > glibc patch and they have more tightly aligned object layout. So were it the case we got lucky that linker is laying out the data structures in the expected alignment? > > e.g. > readelf -a -W /lib64/libc.so.6 | grep _nl_C_LC_CTYPE_class > 23087: 0000000000176ec0 1024 OBJECT LOCAL DEFAULT 17 _nl_C_LC_CTYPE_class32 > 23125: 00000000001761a0 72 OBJECT LOCAL DEFAULT 17 _nl_C_LC_CTYPE_class_alpha > 23699: 0000000000176020 76 OBJECT LOCAL DEFAULT 17 _nl_C_LC_CTYPE_class_print > 23960: 0000000000175e40 76 OBJECT LOCAL DEFAULT 17 _nl_C_LC_CTYPE_class_alnum > 24152: 0000000000176200 72 OBJECT LOCAL DEFAULT 17 _nl_C_LC_CTYPE_class_lower > 24208: 00000000001760e0 76 OBJECT LOCAL DEFAULT 17 _nl_C_LC_CTYPE_class_xdigit > 24547: 0000000000175fc0 76 OBJECT LOCAL DEFAULT 17 _nl_C_LC_CTYPE_class_graph > 24569: 0000000000175f60 68 OBJECT LOCAL DEFAULT 17 _nl_C_LC_CTYPE_class_blank > 24582: 0000000000175ea0 76 OBJECT LOCAL DEFAULT 17 _nl_C_LC_CTYPE_class_punct > 24764: 0000000000176140 68 OBJECT LOCAL DEFAULT 17 _nl_C_LC_CTYPE_class_digit > 24883: 0000000000175f00 76 OBJECT LOCAL DEFAULT 17 _nl_C_LC_CTYPE_class_cntrl > 24991: 00000000001772c0 768 OBJECT LOCAL DEFAULT 17 _nl_C_LC_CTYPE_class > 25344: 0000000000176260 72 OBJECT LOCAL DEFAULT 17 _nl_C_LC_CTYPE_class_upper > 25416: 0000000000176080 68 OBJECT LOCAL DEFAULT 17 _nl_C_LC_CTYPE_class_space > > Aligned on 16-byte boundaries. > > On AArch64 I see this: > eu-readelf -a -W lib64/libc.so.6 | grep _nl_C_LC_CTYPE_class > 34906: 000000000011eea8 1024 OBJECT LOCAL DEFAULT 16 _nl_C_LC_CTYPE_class32 > 34945: 000000000011e1d0 72 OBJECT LOCAL DEFAULT 16 _nl_C_LC_CTYPE_class_alpha > 35470: 000000000011e0a0 76 OBJECT LOCAL DEFAULT 16 _nl_C_LC_CTYPE_class_print > 35712: 000000000011df18 76 OBJECT LOCAL DEFAULT 16 _nl_C_LC_CTYPE_class_alnum > 35889: 000000000011e218 72 OBJECT LOCAL DEFAULT 16 _nl_C_LC_CTYPE_class_lower > 35944: 000000000011e138 76 OBJECT LOCAL DEFAULT 16 _nl_C_LC_CTYPE_class_xdigit > 36246: 000000000011e050 76 OBJECT LOCAL DEFAULT 16 _nl_C_LC_CTYPE_class_graph > 36268: 000000000011e008 68 OBJECT LOCAL DEFAULT 16 _nl_C_LC_CTYPE_class_blank > 36284: 000000000011df68 76 OBJECT LOCAL DEFAULT 16 _nl_C_LC_CTYPE_class_punct > 36455: 000000000011e188 68 OBJECT LOCAL DEFAULT 16 _nl_C_LC_CTYPE_class_digit > 36571: 000000000011dfb8 76 OBJECT LOCAL DEFAULT 16 _nl_C_LC_CTYPE_class_cntrl > 36670: 000000000011f2a8 768 OBJECT LOCAL DEFAULT 16 _nl_C_LC_CTYPE_class > 36993: 000000000011e260 72 OBJECT LOCAL DEFAULT 16 _nl_C_LC_CTYPE_class_upper > 37061: 000000000011e0f0 68 OBJECT LOCAL DEFAULT 16 _nl_C_LC_CTYPE_class_space > > Aligned on 8-byte boundaries. > > But that alignment could go lower and still be valid and cause unaligned accesses. >