To: libc-alpha@sourceware.org, Shu-Chun Weng, Lirong Yuan, Szabolcs Nagy
References: <20210315184211.4124573-1-yuanzi@google.com> <2003e08c-55e2-80fa-89a6-fb8d59cc0e77@redhat.com> <20210316142816.GC4427@arm.com>
From: Adhemerval Zanella
Subject: Re: [PATCH] locale: align _nl_C_LC_CTYPE_class and _nl_C_LC_CTYPE_class32 arrays to uint16_t and uint32_t respectively
Date:
Tue, 16 Mar 2021 16:47:05 -0300

On 16/03/2021 16:05, Lirong Yuan via Libc-alpha wrote:
> On Mon, Mar 15, 2021 at 6:45 PM Carlos O'Donell wrote:
>
>> My expectation is that normally aarch64 simply handles the unaligned
>> load without any problems, but that it would be "better" if it were
>> 16-bit aligned?
>> Is this the *only* case of misaligned pointers?
>
> Yes, this is the only case reported by UBSan.
>
>> Signed-off-by: Lirong Yuan
>> We don't use DSOs in glibc, we assign copyright to the FSF, so this
>> line would be normally removed, and you as the git author remains.
>
> Thanks for the explanation! I will send an updated patch without
> "Signed-off-by" if the current approach looks good. :)
>
> On Tue, Mar 16, 2021 at 7:28 AM Szabolcs Nagy wrote:
>
>> The 03/15/2021 21:44, Carlos O'Donell wrote:
>>> On 3/15/21 2:42 PM, Lirong Yuan via Libc-alpha wrote:
>>>> steps to reproduce the problem: compile a program that uses ctype
>>>> functions such as “isspace” for aarch64 with UBSan flag
>>>> “-fsanitize=undefined” and run it on x86_64 machines with qemu user
>>>> mode emulation.
>>>
>>> Szabolcs,
>>>
>>> Do you have any input on this?
>>>
>>>> observed behavior: UndefinedBehaviorSanitizer reports
>>>> misaligned-pointer-use in the program.
>>>
>>> Yes, the char array could be misaligned with respect to a 16-bit value,
>>> and should be aligned to the type that is expected from the interface
>>> e.g.
>>
>> using char[] as uint16_t[] is aliasing violation. and in principle
>> alignas on the definition does not fix this, but in practice that's
>> the only abi visible aspect of the wrong type.
>>
>
> Alternatively, we can define _nl_C_LC_CTYPE_class and
> _nl_C_LC_CTYPE_class32 arrays directly as uint16_t and uint32_t arrays,
> like _nl_C_LC_CTYPE_toupper array:
> https://code.woboq.org/userspace/glibc/locale/C-ctype.c.html#_nl_C_LC_CTYPE_toupper
> Though the conversion may be error-prone and require more test cases...
> It would seem that using alignas is an approach that's both technically
> correct and less likely to cause havoc.

Could you check if using the expected types yields any regression? All
of their uses cast explicitly to the expected types, so I can't see why
they were declared as char in the first place.

>> i'm not sure why ubsanitizer cares about alignment specifically on
>> aarch64, unaligned load should work.
>>
>
> Yes, the code works fine in practice on aarch64. The ubsan alignment is a
> check for misaligned rather than unaligned. It's almost always worth fixing
> since this can cause subtle and hard to track down failures that more often
> manifest on other architectures.

I would expect that if this were really accessed in an unaligned manner,
it would blow up on some architectures (sparc and some arm and mips
environments). I'm not sure why we haven't seen any issues on such
architectures.