From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by sourceware.org (Postfix) with ESMTP id 4BE903852C4A for ; Wed, 23 Nov 2022 16:17:57 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 4BE903852C4A Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=arm.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=arm.com Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 324431FB; Wed, 23 Nov 2022 08:18:03 -0800 (PST) Received: from localhost (e121540-lin.manchester.arm.com [10.32.98.62]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id C2EE13F73B; Wed, 23 Nov 2022 08:17:55 -0800 (PST) From: Richard Sandiford To: Tamar Christina Mail-Followup-To: Tamar Christina ,gcc-patches@gcc.gnu.org, nd@arm.com, Richard.Earnshaw@arm.com, Marcus.Shawcroft@arm.com, Kyrylo.Tkachov@arm.com, richard.sandiford@arm.com Cc: gcc-patches@gcc.gnu.org, nd@arm.com, Richard.Earnshaw@arm.com, Marcus.Shawcroft@arm.com, Kyrylo.Tkachov@arm.com Subject: Re: [PATCH]AArch64 sve2: Fix expansion of division [PR107830] References: Date: Wed, 23 Nov 2022 16:17:54 +0000 In-Reply-To: (Tamar Christina's message of "Wed, 23 Nov 2022 14:24:44 +0000") Message-ID: User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/26.3 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain X-Spam-Status: No, score=-39.8 required=5.0 tests=BAYES_00,GIT_PATCH_0,KAM_DMARC_NONE,KAM_DMARC_STATUS,KAM_LAZY_DOMAIN_SECURITY,KAM_SHORT,SPF_HELO_NONE,SPF_NONE,TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: Tamar Christina writes: > Hi All, > > SVE has an actual division optab, and when using -Os we don't > optimize the division away. This means that we need to distinguish > between a div which we can optimize and one we cannot even during > expansion. > > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues. > > Ok for master? > > Thanks, > Tamar > > gcc/ChangeLog: > > PR target/107830 > * config/aarch64/aarch64.cc > (aarch64_vectorize_can_special_div_by_constant): Check validity during > codegen phase as well. > > gcc/testsuite/ChangeLog: > > PR target/107830 > * gcc.target/aarch64/sve2/pr107830.c: New test. > > --- inline copy of patch -- > diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc > index 4176d7b046a126664360596b6db79a43e77ff76a..bee23625807af95d5ec15ad45702961b2d7ab55d 100644 > --- a/gcc/config/aarch64/aarch64.cc > +++ b/gcc/config/aarch64/aarch64.cc > @@ -24322,12 +24322,15 @@ aarch64_vectorize_can_special_div_by_constant (enum tree_code code, > if ((flags & VEC_ANY_SVE) && !TARGET_SVE2) > return false; > > + wide_int val = wi::add (cst, 1); > + int pow = wi::exact_log2 (val); > + bool valid_p = pow == (int)(element_precision (vectype) / 2); > + /* SVE actually has a div operator, we we may have gotten here through > + that route. */ > if (in0 == NULL_RTX && in1 == NULL_RTX) > - { > - wide_int val = wi::add (cst, 1); > - int pow = wi::exact_log2 (val); > - return pow == (int)(element_precision (vectype) / 2); > - } > + return valid_p; > + else if (!valid_p) > + return false; Is this equivalent to: int pow = wi::exact_log2 (cst + 1); if (pow != (int) (element_precision (vectype) / 2)) return false; /* We can use the optimized pattern. */ if (in0 == NULL_RTX && in1 == NULL_RTX) return true; ? If so, I'd find that slightly easier to follow, but I realise it's personal taste. OK with that change if it works and you agree. While looking at this, I noticed that we ICE for: void f(unsigned short *restrict p1, unsigned int *restrict p2) { for (int i = 0; i < 16; ++i) { p1[i] /= 0xff; p2[i] += 1; } } for -march=armv8-a+sve2 -msve-vector-bits=512. I guess we need to filter out partial modes or (better) add support for them. Adding support for them probably requires changes to the underlying ADDHNB pattern. Thanks, Richard > if (!VECTOR_TYPE_P (vectype)) > return false; > diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/pr107830.c b/gcc/testsuite/gcc.target/aarch64/sve2/pr107830.c > new file mode 100644 > index 0000000000000000000000000000000000000000..6d8ee3615fdb0083dbde1e45a2826fb681726139 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/aarch64/sve2/pr107830.c > @@ -0,0 +1,13 @@ > +/* { dg-do compile } */ > +/* { dg-require-effective-target fopenmp } */ > +/* { dg-additional-options "-Os -fopenmp" } */ > + > +void > +f2 (int *a) > +{ > + unsigned int i; > + > +#pragma omp simd > + for (i = 0; i < 4; ++i) > + a[i / 3] -= 4; > +}