From: Florian Weimer
To: Szabolcs Nagy
Cc: Wilco Dijkstra, Wilco Dijkstra via Libc-alpha
Subject: Re: [PATCH 0/5] Added optimized memcpy/memmove/memset for A64FX
References: <20210430150127.GV9028@arm.com>
	<87eeer4woe.fsf@oldenburg.str.redhat.com>
	<20210504075643.GX9028@arm.com>
	<87a6pan6pr.fsf@oldenburg.str.redhat.com>
	<20210504104243.GY9028@arm.com>
Date: Tue, 04 May 2021 13:07:37 +0200
In-Reply-To: <20210504104243.GY9028@arm.com> (Szabolcs Nagy's message of
	"Tue, 4 May 2021 11:42:44 +0100")
Message-ID: <87o8dqlpty.fsf@oldenburg.str.redhat.com>
List-Id: Libc-alpha mailing list

* Szabolcs Nagy:

> The 05/04/2021 12:17, Florian Weimer wrote:
>> * Szabolcs Nagy:
>>
>> > The 04/30/2021 16:40, Wilco Dijkstra wrote:
>> >> >> Well it doesn't seem to behave like a NOP. So to avoid slowing down
>> >> >> all string functions, bti c must be removed completely, not just from
>> >> >> A64FX memcpy. Using a real NOP is fine in all cases as long as
>> >> >> HAVE_AARCH64_BTI is not defined.
>> >> >
>> >> > I'm probably confused, but: If BTI is active, many more glibc functions
>> >> > will have BTI markers. What makes the string functions special?
>> >>
>> >> Exactly. And at that point trying to remove it from memcpy is just pointless.
>> >>
>> >> The case we are discussing is where BTI is not turned on in GLIBC but we still
>> >> emit a BTI at the start of assembler functions for simplicity. By using a NOP
>> >> instead, A64FX will not execute BTI anywhere in GLIBC.
>> >
>> > the asm ENTRY was written with the assumption that bti c
>> > behaves like a nop when bti is disabled, so we don't have
>> > to make the asm conditional based on cflags.
>> >
>> > if that's not the case i agree with the patch, however we
>> > will have to review some other code (e.g. libgcc outline
>> > atomics asm) where we made the same assumption.
>>
>> I find this discussion extremely worrisome. If bti c does not behave
>> like a nop, then we need a new AArch64 ABI variant to enable BTI.
>>
>> That being said, a distribution with lots of bti c instructions in
>> binaries seems to run on A64FX CPUs, so I'm not sure what is going on.
>
> this does not have correctness impact, only performance impact.
>
> hint space instructions are seem slower than expected on a64fx.
>
> which means unconditionally adding bti c to asm entry code is not
> ideal if somebody tries to build a system without branch-protection.
> distros that build all binaries with branch protection will just
> take a performance hit on a64fx, we cant fix that easily.

I think I see it now.  It's not critically slow, but there appears to be
observable impact.  I'm still worried.

Thanks,
Florian