Date: Mon, 24 Oct 2022 16:40:46 -0500
From: Segher Boessenkool
To: HAO CHEN GUI
Cc: gcc-patches, David, "Kewen.Lin", Peter Bergner, xionghuluo@tencent.com
Subject: Re: [PATCH-2, rs6000] Reverse V8HI on Power8 by vector rotation [PR100866]
Message-ID: <20221024214046.GC25951@gate.crashing.org>

Hi!

On Mon, Oct 24, 2022 at 11:14:20AM +0800, HAO CHEN GUI wrote:
> This patch implements V8HI byte reverse on Power8 by vector rotation.

Please put *byte* reverse as the commit subject as well?

> It should be effecient than orignial vector permute. The patch comes from
> Xionghu's comments in PR. I just added a test case for it.

Yeah, on all existing CPUs such a rotate is as fast or faster than a
permute insn.
And for bigger modes, we need more insns: two dependent rotates for
V4SI, and that is unlikely to be faster than a single permutation,
certainly not if code can be unrolled.

Okay for trunk. Thanks!


Segher
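
For illustration only, here is a scalar C sketch of the idea discussed above. It is not the code from the patch: the function names are made up for this example, and plain arrays and integers stand in for the V8HI and V4SI vector modes. It shows that rotating a 16-bit element by 8 bits swaps its two bytes (one rotate per element), while a 32-bit element needs a halfword rotate by 8 followed by a word rotate by 16, and the second rotate depends on the result of the first.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: rotate a 16-bit value by 8 bits.
   This exchanges its two bytes, i.e. it is a 16-bit byte reverse. */
static uint16_t rotl16_by_8(uint16_t x)
{
    return (uint16_t)((x << 8) | (x >> 8));
}

/* Hypothetical helper: rotate a 32-bit value by 16 bits.
   This exchanges its two halfwords. */
static uint32_t rotl32_by_16(uint32_t x)
{
    return (x << 16) | (x >> 16);
}

/* Stand-in for the V8HI case: eight halfwords, one rotate each. */
static void bswap_v8hi(uint16_t v[8])
{
    for (int i = 0; i < 8; i++)
        v[i] = rotl16_by_8(v[i]);
}

/* Stand-in for one V4SI element: two dependent rotates.
   Step 1 rotates each halfword by 8, step 2 rotates the word by 16. */
static uint32_t bswap_si(uint32_t x)
{
    uint16_t lo = rotl16_by_8((uint16_t)(x & 0xffffu));
    uint16_t hi = rotl16_by_8((uint16_t)(x >> 16));
    uint32_t step1 = ((uint32_t)hi << 16) | lo;
    return rotl32_by_16(step1);
}

/* Small self-check of the two identities. */
static void self_check(void)
{
    assert(rotl16_by_8(0x1234) == 0x3412);
    assert(bswap_si(0x11223344u) == 0x44332211u);
}
```

Calling self_check() exercises both cases. The dependency in bswap_si is the point made above about bigger modes: the second rotate cannot start until the first has finished, whereas a permute does the whole job in a single insn.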