From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Fri, 10 Feb 2023 11:02:32 +0000 (UTC)
From: Richard Biener
To: gcc-patches@gcc.gnu.org
cc: richard.sandiford@arm.com
Subject: [PATCH] tree-optimization/108724 - vectorized code getting piecewise expanded
Message-ID: <20230210110232.hFjz6z1SaWJSlyr3Jf6hy5HDJQYZMh0cdId0Meppww0@z>

This fixes an oversight made when removing the hard limits on using
generic vectors in the vectorizer, a change that enabled both SLP and
BB vectorization to use them.  The vectorizer relies on vector lowering
to expand plus, minus and negate to bit operations, but vector lowering
has a hard limit on the minimum number of elements per work item.
Vectorizer costs for the testcase at hand work out to vectorize a loop
with just two work items per vector, and that causes elementwise
expansion and spilling.

The fix for now is to reinstate the hard limit, matching what vector
lowering does.  For the future, the way to go is to emit the lowered
sequence directly from the vectorizer instead.

Bootstrapped and tested on x86_64-unknown-linux-gnu, OK?

Thanks,
Richard.

	PR tree-optimization/108724
	* tree-vect-stmts.cc (vectorizable_operation): Avoid
	using word_mode vectors when vector lowering will
	decompose them to elementwise operations.

	* gcc.target/i386/pr108724.c: New testcase.
---
 gcc/testsuite/gcc.target/i386/pr108724.c | 15 +++++++++++++++
 gcc/tree-vect-stmts.cc                   | 14 ++++++++++++++
 2 files changed, 29 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr108724.c

diff --git a/gcc/testsuite/gcc.target/i386/pr108724.c b/gcc/testsuite/gcc.target/i386/pr108724.c
new file mode 100644
index 00000000000..c4e0e918610
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr108724.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -mno-sse" } */
+
+int a[16], b[16], c[16];
+void foo()
+{
+  for (int i = 0; i < 16; i++) {
+    a[i] = b[i] + c[i];
+  }
+}
+
+/* When this is vectorized this shouldn't be expanded piecewise again
+   which will result in spilling for the upper half access. */
+
+/* { dg-final { scan-assembler-not "\\\[er\\\]sp" } } */
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index c86249adcc3..09b5af603d2 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -6315,6 +6315,20 @@ vectorizable_operation (vec_info *vinfo,
       return false;
     }
 
+  /* ??? We should instead expand the operations here, instead of
+     relying on vector lowering which has this hard cap on the number
+     of vector elements below it performs elementwise operations. */
+  if (using_emulated_vectors_p
+      && (code == PLUS_EXPR || code == MINUS_EXPR || code == NEGATE_EXPR)
+      && ((BITS_PER_WORD / vector_element_bits (vectype)) < 4
+          || maybe_lt (nunits_out, 4U)))
+    {
+      if (dump_enabled_p ())
+        dump_printf (MSG_NOTE, "not using word mode for +- and less than "
+                     "four vector elements\n");
+      return false;
+    }
+
   int reduc_idx = STMT_VINFO_REDUC_IDX (stmt_info);
   vec_loop_masks *masks = (loop_vinfo ? &LOOP_VINFO_MASKS (loop_vinfo) : NULL);
   internal_fn cond_fn = get_conditional_internal_fn (code);
-- 
2.35.3
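
For readers unfamiliar with what "expand plus to bit operations" means for a
word_mode vector, here is a minimal standalone sketch of the general idea.  It
is an illustration only, not the code GCC's vector lowering emits.  The names
swar_add_u16x4, lanewise_add_u16x4 and check_swar_add are hypothetical, and the
layout (four 16-bit lanes in a 64-bit word) is an assumption chosen so that the
four-elements-per-word condition from the patch is met.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: add four 16-bit lanes packed into one 64-bit word
   using only whole-word operations.  The top bit of each lane is masked
   off before the add so that no carry can cross a lane boundary, then
   the top bits are patched back in with an XOR.  */
static uint64_t swar_add_u16x4(uint64_t a, uint64_t b)
{
    const uint64_t high = 0x8000800080008000ULL; /* top bit of every lane */
    const uint64_t low = ~high;                  /* remaining 15 bits */
    uint64_t sum_low = (a & low) + (b & low);    /* carries stay in-lane */
    uint64_t signs = (a ^ b) & high;             /* top-bit sum sans carry */
    return sum_low ^ signs;
}

/* Reference: the elementwise form, one lane at a time.  */
static uint64_t lanewise_add_u16x4(uint64_t a, uint64_t b)
{
    uint64_t r = 0;
    for (int i = 0; i < 4; i++) {
        uint16_t x = (uint16_t)(a >> (16 * i));
        uint16_t y = (uint16_t)(b >> (16 * i));
        r |= (uint64_t)(uint16_t)(x + y) << (16 * i);
    }
    return r;
}

/* Hypothetical self-check comparing both forms on a few inputs.  */
static void check_swar_add(void)
{
    const uint64_t v[] = { 0, 1, 0x7FFF7FFF7FFF7FFFULL,
                           0xFFFF000180007FFFULL, 0x0001FFFF80000001ULL,
                           0xFFFFFFFFFFFFFFFFULL };
    for (unsigned i = 0; i < sizeof v / sizeof v[0]; i++)
        for (unsigned j = 0; j < sizeof v / sizeof v[0]; j++)
            assert(swar_add_u16x4(v[i], v[j])
                   == lanewise_add_u16x4(v[i], v[j]));
}
```

With 32-bit int elements and a 64-bit word, as in the testcase on x86_64,
BITS_PER_WORD / vector_element_bits (vectype) is 64 / 32 = 2, which is below
the limit of 4, so the new check rejects the emulated vector operation.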