From: Richard Sandiford
To: Richard Biener via Gcc-patches
Cc: Richard Biener
Subject: Re: [PATCH] tree-optimization/108724 - vectorized code getting piecewise expanded
References: <20230210110251.62A5B385B52C@sourceware.org>
Date: Fri, 10 Feb 2023 11:18:52 +0000
In-Reply-To: <20230210110251.62A5B385B52C@sourceware.org> (Richard Biener via Gcc-patches's message of "Fri, 10 Feb 2023 11:02:32 +0000 (UTC)")

Richard Biener via Gcc-patches writes:
> This fixes an oversight to when removing the hard limits on using
> generic vectors for the vectorizer to enable both SLP and BB
> vectorization to use those.  The vectorizer relies on vector lowering
> to expand plus, minus and negate to bit operations but vector
> lowering has a hard limit on the minimum number of elements per
> work item.  Vectorizer costs for the testcase at hand work out
> to vectorize a loop with just two work items per vector and that
> causes element wise expansion and spilling.
>
> The fix for now is to re-instantiate the hard limit, matching what
> vector lowering does.  For the future the way to go is to emit the
> lowered sequence directly from the vectorizer instead.
>
> Bootstrapped and tested on x86_64-unknown-linux-gnu, OK?

LGTM after reading the vector lowering stuff in the PR trail.  TBH I don't
remember when the hard limit was removed though.

Thanks,
Richard

>
> Thanks,
> Richard.
>
>       PR tree-optimization/108724
>       * tree-vect-stmts.cc (vectorizable_operation): Avoid
>       using word_mode vectors when vector lowering will
>       decompose them to elementwise operations.
>
>       * gcc.target/i386/pr108724.c: New testcase.
> ---
>  gcc/testsuite/gcc.target/i386/pr108724.c | 15 +++++++++++++++
>  gcc/tree-vect-stmts.cc                   | 14 ++++++++++++++
>  2 files changed, 29 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr108724.c
>
> diff --git a/gcc/testsuite/gcc.target/i386/pr108724.c b/gcc/testsuite/gcc.target/i386/pr108724.c
> new file mode 100644
> index 00000000000..c4e0e918610
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr108724.c
> @@ -0,0 +1,15 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O3 -mno-sse" } */
> +
> +int a[16], b[16], c[16];
> +void foo()
> +{
> +  for (int i = 0; i < 16; i++) {
> +    a[i] = b[i] + c[i];
> +  }
> +}
> +
> +/* When this is vectorized this shouldn't be expanded piecewise again
> +   which will result in spilling for the upper half access.  */
> +
> +/* { dg-final { scan-assembler-not "\\\[er\\\]sp" } } */
> diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
> index c86249adcc3..09b5af603d2 100644
> --- a/gcc/tree-vect-stmts.cc
> +++ b/gcc/tree-vect-stmts.cc
> @@ -6315,6 +6315,20 @@ vectorizable_operation (vec_info *vinfo,
>        return false;
>      }
>
> +  /* ???  We should instead expand the operations here, instead of
> +     relying on vector lowering which has this hard cap on the number
> +     of vector elements below it performs elementwise operations.  */
> +  if (using_emulated_vectors_p
> +      && (code == PLUS_EXPR || code == MINUS_EXPR || code == NEGATE_EXPR)
> +      && ((BITS_PER_WORD / vector_element_bits (vectype)) < 4
> +          || maybe_lt (nunits_out, 4U)))
> +    {
> +      if (dump_enabled_p ())
> +        dump_printf (MSG_NOTE, "not using word mode for +- and less than "
> +                     "four vector elements\n");
> +      return false;
> +    }
> +
>    int reduc_idx = STMT_VINFO_REDUC_IDX (stmt_info);
>    vec_loop_masks *masks = (loop_vinfo ? &LOOP_VINFO_MASKS (loop_vinfo) : NULL);
>    internal_fn cond_fn = get_conditional_internal_fn (code);
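
For anyone reading the thread without the PR trail to hand, here is a
rough sketch of what "expand plus to bit operations" in a word can look
like.  It is not GCC's code: the function names are made up for
illustration, and a 64-bit word with 8-bit lanes is an assumption chosen
so that eight lanes fit in a word.  The idea is to add the low bits of
every lane in one word-sized addition and then patch up the top bit of
each lane with XOR, so that no carry crosses a lane boundary.

```c
#include <stdint.h>

/* Hypothetical helper: how many lanes of LANE_BITS bits fit in one
   word of WORD_BITS bits.  The patch compares the equivalent quotient
   (BITS_PER_WORD / vector_element_bits) against 4.  */
static unsigned
lanes_per_word (unsigned word_bits, unsigned lane_bits)
{
  return word_bits / lane_bits;
}

/* Add eight 8-bit lanes packed into a 64-bit word using only
   word-sized bit operations and a single addition.  */
static uint64_t
swar_add_u8x8 (uint64_t a, uint64_t b)
{
  const uint64_t high = 0x8080808080808080ULL; /* top bit of each lane */

  /* Add the low 7 bits of every lane; a carry can reach the lane's
     own top bit but cannot spill into the neighbouring lane.  */
  uint64_t low_sum = (a & ~high) + (b & ~high);

  /* The top bit of each lane is carry ^ a_top ^ b_top.  */
  return low_sum ^ ((a ^ b) & high);
}

/* Reference implementation: add lane by lane with wrap-around.  */
static uint64_t
lanewise_add_u8x8 (uint64_t a, uint64_t b)
{
  uint64_t result = 0;
  for (unsigned i = 0; i < 8; i++)
    {
      uint8_t x = (uint8_t) (a >> (8 * i));
      uint8_t y = (uint8_t) (b >> (8 * i));
      result |= (uint64_t) (uint8_t) (x + y) << (8 * i);
    }
  return result;
}
```

With the testcase's 32-bit int elements and a 64-bit word,
lanes_per_word (64, 32) is 2, which is below the limit of four that the
patch checks, so the word-mode form is rejected rather than left for
vector lowering to expand element by element.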