From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <rguenth@sourceware.org>
Received: by sourceware.org (Postfix, from userid 1666)
	id DCFCE385B52C; Fri, 10 Feb 2023 11:22:01 +0000 (GMT)
DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org DCFCE385B52C
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org;
	s=default; t=1676028121;
	bh=QpcKVEQn15zYCftSdERNkIO/DKGzVlcqLVXrhs7aCN4=;
	h=From:To:Subject:Date:From;
	b=Mh6Ct3GMhnQB9IRxRwqOJuNFPRgL97QMnvTEgkJ1RrkGn72fgeVy9cAwzakEH3LDm
	 HqqUziRz5bNe7SaCTcDfpt7GfEqmG4y7E8pfXwA7INASeuPZ54XRWYLm8gNI3ZyONU
	 36Tpfr/+4D+rFfoVsfW+2WwWJybnIc4TrKQoDJaU=
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
Content-Type: text/plain; charset="utf-8"
From: Richard Biener <rguenth@gcc.gnu.org>
To: gcc-cvs@gcc.gnu.org
Subject: [gcc r13-5771] tree-optimization/108724 - vectorized code getting
 piecewise expanded
X-Act-Checkin: gcc
X-Git-Author: Richard Biener <rguenther@suse.de>
X-Git-Refname: refs/heads/master
X-Git-Oldrev: 2a37a4a3cbfaecb6c7666109353bb4d5c97b0702
X-Git-Newrev: dc87e1391c55c666c7ff39d4f0dea87666f25468
Message-Id: <20230210112201.DCFCE385B52C@sourceware.org>
Date: Fri, 10 Feb 2023 11:22:01 +0000 (GMT)
List-Id: <gcc-cvs.sourceware.org>

https://gcc.gnu.org/g:dc87e1391c55c666c7ff39d4f0dea87666f25468

commit r13-5771-gdc87e1391c55c666c7ff39d4f0dea87666f25468
Author: Richard Biener <rguenther@suse.de>
Date:   Fri Feb 10 11:07:30 2023 +0100

    tree-optimization/108724 - vectorized code getting piecewise expanded
    
    This fixes an oversight made when removing the hard limits on using
    generic vectors in the vectorizer to enable both SLP and BB
    vectorization to use those.  The vectorizer relies on vector lowering
    to expand plus, minus and negate to bit operations, but vector
    lowering has a hard limit on the minimum number of elements per
    work item.  Vectorizer costs for the testcase at hand work out
    to vectorize a loop with just two work items per vector, and that
    causes elementwise expansion and spilling.
    
    The fix for now is to reinstate the hard limit, matching what
    vector lowering does.  In the future, the way to go is to emit the
    lowered sequence directly from the vectorizer instead.
    
            PR tree-optimization/108724
            * tree-vect-stmts.cc (vectorizable_operation): Avoid
            using word_mode vectors when vector lowering will
            decompose them to elementwise operations.
    
            * gcc.target/i386/pr108724.c: New testcase.
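
For readers unfamiliar with what "expand plus to bit operations" means, the
following is a minimal sketch of the general word-mode technique, not the
code vector lowering actually emits.  It assumes four 8-bit lanes packed
into a 32-bit word; the function names swar_add_u8x4 and lanewise_add_u8x4
are hypothetical and exist only for this illustration.

```c
#include <stdint.h>

/* Add four 8-bit lanes packed into one 32-bit word using only
   word-sized operations.  The top bit of every lane is masked off
   before the addition so that no carry can cross a lane boundary,
   and the top bits are then recombined with XOR.  */
static uint32_t swar_add_u8x4(uint32_t a, uint32_t b)
{
    const uint32_t high = 0x80808080u; /* top bit of each lane */
    const uint32_t low = ~high;        /* remaining seven bits of each lane */
    uint32_t sum_low = (a & low) + (b & low);
    return sum_low ^ ((a ^ b) & high);
}

/* Reference: the same addition done one lane at a time, which is the
   shape of the elementwise expansion the commit wants to avoid.  */
static uint32_t lanewise_add_u8x4(uint32_t a, uint32_t b)
{
    uint32_t r = 0;
    for (int i = 0; i < 4; i++) {
        uint8_t x = (uint8_t)(a >> (8 * i));
        uint8_t y = (uint8_t)(b >> (8 * i));
        r |= (uint32_t)(uint8_t)(x + y) << (8 * i);
    }
    return r;
}
```

Both functions return the same word for any inputs.  The masking trick only
pays off when a word holds several lanes, which is presumably why a lower
bound on the number of elements exists in the first place.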

Diff:
---
 gcc/testsuite/gcc.target/i386/pr108724.c | 15 +++++++++++++++
 gcc/tree-vect-stmts.cc                   | 14 ++++++++++++++
 2 files changed, 29 insertions(+)

diff --git a/gcc/testsuite/gcc.target/i386/pr108724.c b/gcc/testsuite/gcc.target/i386/pr108724.c
new file mode 100644
index 00000000000..c4e0e918610
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr108724.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -mno-sse" } */
+
+int a[16], b[16], c[16];
+void foo()
+{
+  for (int i = 0; i < 16; i++) {
+    a[i] = b[i] + c[i];
+  }
+}
+
+/* When this is vectorized this shouldn't be expanded piecewise again
+   which will result in spilling for the upper half access.  */
+
+/* { dg-final { scan-assembler-not "\\\[er\\\]sp" } } */
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index c86249adcc3..09b5af603d2 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -6315,6 +6315,20 @@ vectorizable_operation (vec_info *vinfo,
       return false;
     }
 
+  /* ???  We should instead expand the operations here, rather than
+     relying on vector lowering, which has this hard cap on the number
+     of vector elements below which it performs elementwise operations.  */
+  if (using_emulated_vectors_p
+      && (code == PLUS_EXPR || code == MINUS_EXPR || code == NEGATE_EXPR)
+      && ((BITS_PER_WORD / vector_element_bits (vectype)) < 4
+	  || maybe_lt (nunits_out, 4U)))
+    {
+      if (dump_enabled_p ())
+	dump_printf (MSG_NOTE, "not using word mode for +- and less than "
+		     "four vector elements\n");
+      return false;
+    }
+
   int reduc_idx = STMT_VINFO_REDUC_IDX (stmt_info);
   vec_loop_masks *masks = (loop_vinfo ? &LOOP_VINFO_MASKS (loop_vinfo) : NULL);
   internal_fn cond_fn = get_conditional_internal_fn (code);
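
The condition added to vectorizable_operation can be read as plain integer
arithmetic.  The sketch below is a simplified standalone model, assuming
constant (non-poly) element counts; the function name
rejects_emulated_plus_minus and its parameters are hypothetical stand-ins
for BITS_PER_WORD, vector_element_bits (vectype) and nunits_out.

```c
#include <stdbool.h>

/* Simplified model of the new check: word-mode emulation of plus,
   minus and negate is refused when fewer than four elements fit in a
   word, or when the vector has fewer than four elements.
   word_bits stands in for BITS_PER_WORD, elem_bits for
   vector_element_bits (vectype), nunits for nunits_out.  */
static bool rejects_emulated_plus_minus(unsigned word_bits,
                                        unsigned elem_bits,
                                        unsigned nunits)
{
    return (word_bits / elem_bits) < 4 || nunits < 4;
}
```

Assuming a 64-bit word, the int arrays of the testcase give 64 / 32 = 2
elements per word, so rejects_emulated_plus_minus (64, 32, 2) is true and
the operation is not vectorized with emulated vectors.  With 8-bit elements
in a 64-bit word, rejects_emulated_plus_minus (64, 8, 8) is false.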