From mboxrd@z Thu Jan  1 00:00:00 1970
Date: Tue, 26 Jun 2018 16:01:00 -0000
From: Segher Boessenkool
To: Aaron Sawdey
Cc: GCC Patches
Subject: Re: [PATCH, rs6000] don't use unaligned vsx for memset of less than 32 bytes
Message-ID: <20180626160116.GL16221@gate.crashing.org>
References: <979a1eeceb7c4c3f7b2068e9b924970760d695ff.camel@linux.ibm.com>
In-Reply-To: <979a1eeceb7c4c3f7b2068e9b924970760d695ff.camel@linux.ibm.com>

Hi!

On Mon, Jun 25, 2018 at 10:41:32AM -0500, Aaron Sawdey wrote:
> In gcc 8 I added support for unaligned vsx in the builtin expansion of
> memset(x,0,y). Turns out that for memset of less than 32 bytes, this
> doesn't really help much, and it also runs into an egregious
> load-hit-store case in CPU2006 components gcc and hmmer.
>
> This patch reverts to the previous (gcc 7) behavior for memset of 16-31
> bytes, which is to use vsx stores only if the target is 16 byte
> aligned. For 32 bytes or more, unaligned vsx stores will still be used.
> Performance testing of the memset expansion shows that not much is
> given up by using scalar stores for 16-31 bytes, and CPU2006 runs show
> the performance regression is fixed.
>
> Regstrap passes on powerpc64le, ok for trunk and backport to 8?

Yes, okay for both. Thanks!


Segher


> 2018-06-25 Aaron Sawdey
>
> 	* config/rs6000/rs6000-string.c (expand_block_clear): Don't use
> 	unaligned vsx for 16B memset.
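
For readers skimming the thread, the size/alignment rule described in the
quoted text can be sketched roughly as below. This is only an illustration,
not the code in expand_block_clear: the helper name use_vector_store and its
parameters are made up here, and it assumes a target where unaligned vsx
stores are otherwise available.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper (not part of rs6000-string.c): decide whether a
   16-byte vector store would be used when clearing BYTES bytes at a
   destination known to be aligned to ALIGN bytes.  Assumes the target
   supports unaligned vsx stores in general.  */
static bool
use_vector_store (size_t bytes, size_t align)
{
  if (bytes < 16)
    return false;        /* Too small for a 16-byte store; use scalar.  */
  if (bytes < 32)
    return align >= 16;  /* 16-31 bytes: vsx only if 16-byte aligned.  */
  return true;           /* 32 bytes or more: unaligned vsx still used.  */
}
```

Under this sketch, a 24-byte clear at an 8-byte-aligned address falls back to
scalar stores, while a 32-byte clear still uses vsx whatever the alignment.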