From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by sourceware.org (Postfix) with ESMTP id DAAC43858C5E for ; Tue, 1 Nov 2022 14:25:23 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org DAAC43858C5E Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=arm.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=arm.com Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 892081FB; Tue, 1 Nov 2022 07:25:29 -0700 (PDT) Received: from localhost (e121540-lin.manchester.arm.com [10.32.98.62]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 896EE3F703; Tue, 1 Nov 2022 07:25:22 -0700 (PDT) From: Richard Sandiford To: Tamar Christina via Gcc-patches Mail-Followup-To: Tamar Christina via Gcc-patches ,Tamar Christina , nd@arm.com, rguenther@suse.de, richard.sandiford@arm.com Cc: Tamar Christina , nd@arm.com, rguenther@suse.de Subject: Re: [PATCH 3/8]middle-end: Support extractions of subvectors from arbitrary element position inside a vector References: Date: Tue, 01 Nov 2022 14:25:21 +0000 In-Reply-To: (Tamar Christina via Gcc-patches's message of "Mon, 31 Oct 2022 11:57:42 +0000") Message-ID: User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/26.3 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain X-Spam-Status: No, score=-42.8 required=5.0 tests=BAYES_00,GIT_PATCH_0,KAM_DMARC_NONE,KAM_DMARC_STATUS,KAM_LAZY_DOMAIN_SECURITY,KAM_SHORT,SPF_HELO_NONE,SPF_NONE,TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: Tamar Christina via Gcc-patches writes: > Hi All, > > The current vector extract pattern can only extract from a vector when the > position to extract is a multiple of the vector bitsize as a whole. > > That means extract something like a V2SI from a V4SI vector from position 32 > isn't possible as 32 is not a multiple of 64. Ideally this optab should have > worked on multiple of the element size, but too many targets rely on this > semantic now. > > So instead add a new case which allows any extraction as long as the bit pos > is a multiple of the element size. We use a VEC_PERM to shuffle the elements > into the bottom parts of the vector and then use a subreg to extract the values > out. This now allows various vector operations that before were being > decomposed into very inefficient scalar operations. > > NOTE: I added 3 testcases, I only fixed the 3rd one. > > The 1st one missed because we don't optimize VEC_PERM expressions into > bitfields. The 2nd one is missed because extract_bit_field only works on > vector modes. In this case the intermediate extract is DImode. > > On targets where the scalar mode is tiable to vector modes the extract should > work fine. > > However I ran out of time to fix the first two and so will do so in GCC 14. > For now this catches the case that my pattern now introduces more easily. > > Bootstrapped Regtested on aarch64-none-linux-gnu, x86_64-pc-linux-gnu > and no issues. > > Ok for master? > > Thanks, > Tamar > > gcc/ChangeLog: > > * expmed.cc (extract_bit_field_1): Add support for vector element > extracts. > > gcc/testsuite/ChangeLog: > > * gcc.target/aarch64/ext_1.c: New. > > --- inline copy of patch -- > diff --git a/gcc/expmed.cc b/gcc/expmed.cc > index bab020c07222afa38305ef8d7333f271b1965b78..ffdf65210d17580a216477cfe4ac1598941ac9e4 100644 > --- a/gcc/expmed.cc > +++ b/gcc/expmed.cc > @@ -1718,6 +1718,45 @@ extract_bit_field_1 (rtx str_rtx, poly_uint64 bitsize, poly_uint64 bitnum, > return target; > } > } > + else if (!known_eq (bitnum, 0U) > + && multiple_p (GET_MODE_UNIT_BITSIZE (tmode), bitnum, &pos)) > + { > + /* The encoding has a single stepped pattern. */ > + poly_uint64 nunits = GET_MODE_NUNITS (new_mode); > + int nelts = nunits.to_constant (); > + vec_perm_builder sel (nunits, nelts, 1); > + int delta = -pos.to_constant (); > + for (int i = 0; i < nelts; ++i) > + sel.quick_push ((i - delta) % nelts); > + vec_perm_indices indices (sel, 1, nunits); Thanks for doing this, looks good. But I don't think the to_constant calls are safe. new_mode and pos could in principle be nonconstant. To build a stepped pattern, we just need: vec_perm_builder sel (nunits, 1, 3); and then push pos, pos + 1, and pos + 2 to it. There's no need to clamp the position to nelts, it happens automatically. > + > + if (can_vec_perm_const_p (new_mode, new_mode, indices, false)) > + { > + class expand_operand ops[4]; > + machine_mode outermode = new_mode; > + machine_mode innermode = tmode; > + enum insn_code icode > + = direct_optab_handler (vec_perm_optab, outermode); > + target = gen_reg_rtx (outermode); > + if (icode != CODE_FOR_nothing) > + { > + rtx sel = vec_perm_indices_to_rtx (outermode, indices); > + create_output_operand (&ops[0], target, outermode); > + ops[0].target = 1; > + create_input_operand (&ops[1], op0, outermode); > + create_input_operand (&ops[2], op0, outermode); > + create_input_operand (&ops[3], sel, outermode); I think this should be GET_MODE (sel). Looks like the current version would ICE for float vectors. That said... > + if (maybe_expand_insn (icode, 4, ops)) > + return simplify_gen_subreg (innermode, target, outermode, 0); > + } > + else if (targetm.vectorize.vec_perm_const != NULL) > + { > + if (targetm.vectorize.vec_perm_const (outermode, outermode, > + target, op0, op0, indices)) > + return simplify_gen_subreg (innermode, target, outermode, 0); > + } ...can we use expand_vec_perm_const here? It will try the constant expansion first, which is the preferred order. It also has a few variations up its sleeve. Thanks, Richard > + } > + } > } > > /* See if we can get a better vector mode before extracting. */ > diff --git a/gcc/testsuite/gcc.target/aarch64/ext_1.c b/gcc/testsuite/gcc.target/aarch64/ext_1.c > new file mode 100644 > index 0000000000000000000000000000000000000000..18a10a14f1161584267a8472e571b3bc2ddf887a > --- /dev/null > +++ b/gcc/testsuite/gcc.target/aarch64/ext_1.c > @@ -0,0 +1,54 @@ > +/* { dg-do compile } */ > +/* { dg-additional-options "-O" } */ > +/* { dg-final { check-function-bodies "**" "" "" } } */ > + > +#include > + > +typedef unsigned int v4si __attribute__((vector_size (16))); > +typedef unsigned int v2si __attribute__((vector_size (8))); > + > +/* > +** extract: { xfail *-*-* } > +** ext v0.16b, v0.16b, v0.16b, #4 > +** ret > +*/ > +v2si extract (v4si x) > +{ > + v2si res = {x[1], x[2]}; > + return res; > +} > + > +/* > +** extract1: { xfail *-*-* } > +** ext v0.16b, v0.16b, v0.16b, #4 > +** ret > +*/ > +v2si extract1 (v4si x) > +{ > + v2si res; > + memcpy (&res, ((int*)&x)+1, sizeof(res)); > + return res; > +} > + > +typedef struct cast { > + int a; > + v2si b __attribute__((packed)); > +} cast_t; > + > +typedef union Data { > + v4si x; > + cast_t y; > +} data; > + > +/* > +** extract2: > +** ext v0.16b, v0.16b, v0.16b, #4 > +** ret > +*/ > +v2si extract2 (v4si x) > +{ > + data d; > + d.x = x; > + return d.y.b; > +} > +