Date: Fri, 19 Nov 2021 12:09:32 -0600
From: Segher Boessenkool
To: "Paul A. Clarke"
Cc: gcc-patches@gcc.gnu.org, wschmidt@linux.ibm.com
Subject: Re: [PATCH] rs6000: Add optimizations for _mm_sad_epu8
Message-ID: <20211119180932.GG614@gate.crashing.org>
References: <20211022172849.499625-1-pc@us.ibm.com>
In-Reply-To: <20211022172849.499625-1-pc@us.ibm.com>
List-Id: Gcc-patches mailing list

Hi!

On Fri, Oct 22, 2021 at 12:28:49PM -0500, Paul A.
Clarke wrote:
> Power9 ISA added `vabsdub` instruction which is realized in the
> `vec_absd` intrinsic.
>
> Use `vec_absd` for `_mm_sad_epu8` compatibility intrinsic, when
> `_ARCH_PWR9`.
>
> Also, the realization of `vec_sum2s` on little-endian includes
> two shifts in order to position the input and output to match
> the semantics of `vec_sum2s`:
> - Shift the second input vector left 12 bytes. In the current usage,
>   that vector is `{0}`, so this shift is unnecessary, but is currently
>   not eliminated under optimization.

The vsum2sws implementation uses an unspec, so there is almost no chance
of anything with it being optimised :-(

It rotates it right by 4 bytes btw, it's not a shift.

> - Shift the vector produced by the `vsum2sws` instruction left 4 bytes.
>   The two words within each doubleword of this (shifted) result must then
>   be explicitly swapped to match the semantics of `_mm_sad_epu8`,
>   effectively reversing this shift. So, this shift (and a subsequent swap)
>   are unnecessary, but not currently removed under optimization.

Rotate left by 4 -- same thing once you consider that words 0 and 2 are
set to zero by the vsum2sws. Not sure why it is not optimised; what do
the dump files say? Use -dap, and I'd start looking at the combine dump.

> Using `__builtin_altivec_vsum2sws` retains both shifts, so is not an
> option for removing the shifts.
>
> For little-endian, use the `vsum2sws` instruction directly, and
> eliminate the explicit shift (swap).
>
> 2021-10-22  Paul A. Clarke
>
> gcc
> 	* config/rs6000/emmintrin.h (_mm_sad_epu8): Use vec_absd
> 	when _ARCH_PWR9, optimize vec_sum2s when LE.

Please don't break changelog lines early.
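For what it's worth, a scalar sketch of the vsum2sws word placement
discussed above (my reading only, element indices in big-endian order,
and saturation is deliberately omitted): the sums land in the odd words,
and words 0 and 2 come out zero, which is why a shift and a rotate by 4
bytes are interchangeable here.

```c
#include <stdint.h>

/* Scalar model of vsum2sws word placement (hypothetical helper, not
   GCC code): for each doubleword, the odd word holds the sum of the
   two corresponding words of A plus the odd word of B; the even words
   (0 and 2) are set to zero.  Saturation is not modelled. */
static void model_vsum2sws(const int32_t a[4], const int32_t b[4],
                           int32_t out[4])
{
    out[0] = 0;
    out[1] = a[0] + a[1] + b[1];
    out[2] = 0;
    out[3] = a[2] + a[3] + b[3];
}
```

With the second input being `{0}`, as in the `_mm_sad_epu8` usage, the
`b` terms vanish, which is the quoted patch's point about the 12-byte
shift of that operand being dead weight.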
> -  vmin = vec_min (a, b);
> -  vmax = vec_max (a, b);
> +#ifndef _ARCH_PWR9
> +  __v16qu vmin = vec_min (a, b);
> +  __v16qu vmax = vec_max (a, b);
>    vabsdiff = vec_sub (vmax, vmin);
> +#else
> +  vabsdiff = vec_absd (a, b);
> +#endif

So hrm, maybe we should have the vec_absd macro (or the builtin)
available always, just expanding to three insns if necessary.

Okay for trunk with appropriate changelog and commit message changes.
Thanks!


Segher
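The two branches of that #ifdef compute the same thing: for unsigned
values, max(a,b) - min(a,b) equals |a - b|, so the pre-Power9 three-insn
sequence and the single vabsdub agree lane by lane. A scalar model of
one byte lane (illustration only, not the vector code):

```c
#include <stdint.h>

/* One byte lane of the absolute-difference step: on unsigned operands,
   max - min never wraps and equals the absolute difference, matching
   what vec_absd computes in a single instruction on Power9. */
static uint8_t absd_u8(uint8_t a, uint8_t b)
{
    uint8_t vmin = a < b ? a : b;   /* vec_min  */
    uint8_t vmax = a < b ? b : a;   /* vec_max  */
    return (uint8_t)(vmax - vmin);  /* vec_sub  */
}
```

That equivalence is exactly why exposing vec_absd everywhere, expanding
to the three-insn sequence where the hardware lacks it, would be safe.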