From: Xi Ruoyao
To: Satish Vasudeva, gcc-help@gcc.gnu.org
Subject: Re: Libatomic 16B
Date: Fri, 25 Feb 2022 03:09:34 +0800
Message-ID: <6349834d9ea31f579b04ba9215b6449ce13f008e.camel@mengyan1223.wang>

On Wed, 2022-02-23 at 08:42 -0800, Satish Vasudeva via Gcc-help wrote:
> Hi Team,
>
> I was looking at the hotspots in our software stack and interestingly I see
> libat_load_16_i1 seems to be one of the top in the list.
>
> I am trying to understand why that is the case. My suspicion is some kind
> of lock usage for 16B atomic accesses.
>
> I came across this discussion but frankly I am still confused.
> https://gcc.gnu.org/legacy-ml/gcc-patches/2017-01/msg02344.html
>
> Do you think the overhead of libat_load_16_i1 is due to spinlock usage?
> Also reading some other Intel CPU docs, it seems like the CPU does support
> loading 16B in single access. In that case can we optimize this for
> performance?

Open an issue at https://gcc.gnu.org/bugzilla and include a reference to the
Intel CPU documentation that proves specific models support atomic 128-bit
loads. Don't say "it seems like": nobody wants to write some nasty SSE code
and then find it doesn't work on any CPU.

-- 
Xi Ruoyao
School of Aerospace Science and Technology, Xidian University