From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Mon, 27 May 2024 14:45:34 +0200 (CEST)
From: Richard Biener
To: gcc-patches@gcc.gnu.org
cc: richard.sandiford@arm.com, tamar.christina@arm.com
Subject: Re: [PATCH 1/5] Do single-lane SLP discovery for reductions
Message-ID: <56o5906r-n418-sq7o-q444-5286551457n3@fhfr.qr>
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII

On Fri, 24 May 2024, Richard Biener wrote:

> This is the second merge proposed from the SLP vectorizer branch.
> I have again managed without adding and using --param vect-single-lane-slp
> but instead this provides always enabled functionality.
>
> This makes us use SLP reductions (a group of reductions) for the
> case where the group size is one. This basically means we try
> to use SLP for all reductions.
>
> I've kept the series close to changes how they are on the branch
> but in the end I'll squash it, having separate commits for review
> eventually helps identifying common issues we will run into. In
> particular we lack full SLP support for several reduction kinds
> and the branch has more enabling patches than in this series.
> For example 4/5 makes sure we use shifts and direct opcode
> reductions in the reduction epilog for SLP reductions but doesn't
> bother to try covering the general case but enables it only
> for the single-element group case to avoid regressions
> in gcc.dg/vect/reduc-{mul,or}_[12].c testcases.
>
> Bootstrapped and tested on x86_64-unknown-linux-gnu, I've also
> successfully built SPEC CPU 2017. This posting should trigger
> arm & riscv pre-checkin CI.
>
> There's one ICE in gcc.target/i386/pr51235.c I discovered late
> that I will investigate and address after the weekend.

I've fixed this now.
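
For readers less familiar with the terminology in the quoted cover
letter above, here is a small sketch of the two source shapes involved.
The function names are made up for illustration and are not taken from
the testsuite; whether either loop ends up vectorized with SLP depends
on the target and options.

```cpp
#include <cstddef>

// One reduction in the loop: a reduction "group" of size one.
// This is the case the series tries to cover with single-lane SLP.
int sum_one (const int *a, std::size_t n)
{
  int s = 0;
  for (std::size_t i = 0; i < n; ++i)
    s += a[i];
  return s;
}

// Two independent reductions using the same operation in one loop:
// a candidate for a two-lane SLP reduction group.
void sum_two (const int *a, std::size_t n, int *even, int *odd)
{
  int s0 = 0, s1 = 0;
  for (std::size_t i = 0; i < n; ++i)
    {
      s0 += a[2 * i];      // lane 0
      s1 += a[2 * i + 1];  // lane 1
    }
  *even = s0;
  *odd = s1;
}
```

In sum_two the array is assumed to hold 2 * n elements.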

On aarch64 and arm there's

FAIL: gcc.dg/vect/slp-reduc-3.c scan-tree-dump-times vect "VEC_PERM_EXPR" 0

which is a testism, I _think_ due to a bogus vect_load_lanes check
in that line. As expected, the code is not using a SLP reduction of
two lanes, due to the widen-sum pattern used. It might be that we
somehow fail to use load-lanes when vectorizing the load with SLP,
which means that for SLP reductions we fail to consider load-lanes
as an override. I think we should leave this FAIL; we need to work on
getting load-lanes vectorization from SLP anyway. Fixing this will
need the load-permutation followup I have in the works.

I also see

FAIL: gcc.target/aarch64/sve/dot_1.c scan-assembler-times \\twhilelo\\t 8
FAIL: gcc.target/aarch64/sve/reduc_4.c scan-assembler-not \\tfadd\\t
FAIL: gcc.target/aarch64/sve/sad_1.c scan-assembler-times \\tudot\\tz[0-9]+\\.s, z[0-9]+\\.b, z[0-9]+\\.b\\n 2

but scan-assemblers are not my favorite. For example dot_1.c has
twice as many whilelo, but I'm not sure what goes wrong.

There are quite a few regressions reported for RISC-V. I looked at the
ICEs and fixed them, but I did not investigate any of the assembly
scanning FAILs.

I'll re-spin the series with the fixes tomorrow.
If anybody wants to point out something I should investigate, please
speak up.

Thanks,
Richard.

> This change should be more straight-forward than the previous one,
> still comments are of course welcome. After pushed I will followup
> with changes to enable single-lane SLP reductions for various
> COND_EXPR reductions as well as double-reduction support and
> in-order reduction support (also all restricted to single-lane
> for the moment).
>
> Thanks,
> Richard.
>
> --
>
> The following performs single-lane SLP discovery for reductions.
> This exposes a latent issue with reduction SLP in outer loop
> vectorization and makes gcc.dg/vect/vect-outer-4[fgkl].c FAIL
> execution.
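
As a rough illustration of the discovery order the quoted description
and the patch below appear to implement, here is a toy model. It is
not GCC code: Reduction, Group, discover and the try_build callback
are hypothetical stand-ins (try_build plays the role of
vect_build_slp_instance), and the model only tracks which groupings
are attempted and accepted.

```cpp
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-in for a reduction statement; not a GCC type.
struct Reduction
{
  std::string name;
  bool lane_reducing;  // models DOT_PROD_EXPR / WIDEN_SUM_EXPR / SAD_EXPR
};

using Group = std::vector<std::string>;

// Returns the groups that try_build accepted, in discovery order.
std::vector<Group>
discover (const std::vector<Reduction> &reductions,
          const std::function<bool (const Group &)> &try_build)
{
  std::vector<Group> built;
  Group candidates;
  for (const Reduction &r : reductions)
    {
      if (!r.lane_reducing)
        candidates.push_back (r.name);
      // Lane-reducing ops are never combined; try them on their own.
      else if (try_build (Group{r.name}))
        built.push_back (Group{r.name});
    }
  // Try the multi-lane group first when there is more than one candidate.
  if (candidates.size () > 1 && try_build (candidates))
    {
      built.push_back (candidates);
      return built;
    }
  // Fallback: one single-lane attempt per remaining reduction.
  for (const std::string &name : candidates)
    if (try_build (Group{name}))
      built.push_back (Group{name});
  return built;
}
```

With a callback that accepts everything, two ordinary reductions end
up in one two-lane group; with a callback that rejects multi-lane
groups, each reduction gets its own single-lane attempt.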
>
>         * tree-vect-slp.cc (vect_build_slp_tree_2): Only multi-lane
>         discoveries are reduction chains and need special backedge
>         treatment.
>         (vect_analyze_slp): Fall back to single-lane SLP discovery
>         for reductions. Make sure to try single-lane SLP reduction
>         for all reductions as fallback.
> ---
>  gcc/tree-vect-slp.cc | 71 +++++++++++++++++++++++++++++++++-----------
>  1 file changed, 54 insertions(+), 17 deletions(-)
>
> diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
> index c7ed520b629..73cc69d85ce 100644
> --- a/gcc/tree-vect-slp.cc
> +++ b/gcc/tree-vect-slp.cc
> @@ -1907,7 +1907,8 @@ vect_build_slp_tree_2 (vec_info *vinfo, slp_tree node,
>                /* Reduction chain backedge defs are filled manually.
>                   ??? Need a better way to identify a SLP reduction chain PHI.
>                   Or a better overall way to SLP match those. */
> -              if (all_same && def_type == vect_reduction_def)
> +              if (stmts.length () > 1
> +                  && all_same && def_type == vect_reduction_def)
>                  skip_args[loop_latch_edge (loop)->dest_idx] = true;
>              }
>            else if (def_type != vect_internal_def)
> @@ -3905,9 +3906,10 @@ vect_analyze_slp (vec_info *vinfo, unsigned max_tree_size)
>          }
>
>        /* Find SLP sequences starting from groups of reductions. */
> -      if (loop_vinfo->reductions.length () > 1)
> +      if (loop_vinfo->reductions.length () > 0)
>          {
> -          /* Collect reduction statements. */
> +          /* Collect reduction statements we can combine into
> +             a SLP reduction. */
>            vec<stmt_vec_info> scalar_stmts;
>            scalar_stmts.create (loop_vinfo->reductions.length ());
>            for (auto next_info : loop_vinfo->reductions)
> @@ -3920,25 +3922,60 @@ vect_analyze_slp (vec_info *vinfo, unsigned max_tree_size)
>                       reduction path. In that case we'd have to reverse
>                       engineer that conversion stmt following the chain using
>                       reduc_idx and from the PHI using reduc_def. */
> -                  && STMT_VINFO_DEF_TYPE (next_info) == vect_reduction_def
> -                  /* Do not discover SLP reductions for lane-reducing ops, that
> -                     will fail later. */
> -                  && (!(g = dyn_cast <gassign *> (STMT_VINFO_STMT (next_info)))
> +                  && STMT_VINFO_DEF_TYPE (next_info) == vect_reduction_def)
> +                {
> +                  /* Do not discover SLP reductions combining lane-reducing
> +                     ops, that will fail later. */
> +                  if (!(g = dyn_cast <gassign *> (STMT_VINFO_STMT (next_info)))
>                        || (gimple_assign_rhs_code (g) != DOT_PROD_EXPR
>                            && gimple_assign_rhs_code (g) != WIDEN_SUM_EXPR
> -                          && gimple_assign_rhs_code (g) != SAD_EXPR)))
> -                scalar_stmts.quick_push (next_info);
> +                          && gimple_assign_rhs_code (g) != SAD_EXPR))
> +                    scalar_stmts.quick_push (next_info);
> +                  else
> +                    {
> +                      /* Do SLP discovery for single-lane reductions. */
> +                      vec<stmt_vec_info> stmts;
> +                      vec<stmt_vec_info> roots = vNULL;
> +                      vec<tree> remain = vNULL;
> +                      stmts.create (1);
> +                      stmts.quick_push (next_info);
> +                      vect_build_slp_instance (vinfo,
> +                                               slp_inst_kind_reduc_group,
> +                                               stmts, roots, remain,
> +                                               max_tree_size, &limit,
> +                                               bst_map, NULL);
> +                    }
> +                }
>              }
> -          if (scalar_stmts.length () > 1)
> +          /* Save for re-processing on failure. */
> +          vec<stmt_vec_info> saved_stmts = scalar_stmts.copy ();
> +          vec<stmt_vec_info> roots = vNULL;
> +          vec<tree> remain = vNULL;
> +          if (scalar_stmts.length () <= 1
> +              || !vect_build_slp_instance (loop_vinfo,
> +                                           slp_inst_kind_reduc_group,
> +                                           scalar_stmts, roots, remain,
> +                                           max_tree_size, &limit, bst_map,
> +                                           NULL))
>              {
> -              vec<stmt_vec_info> roots = vNULL;
> -              vec<tree> remain = vNULL;
> -              vect_build_slp_instance (loop_vinfo, slp_inst_kind_reduc_group,
> -                                       scalar_stmts, roots, remain,
> -                                       max_tree_size, &limit, bst_map, NULL);
> +              if (scalar_stmts.length () <= 1)
> +                scalar_stmts.release ();
> +              /* Do SLP discovery for single-lane reductions. */
> +              for (auto stmt_info : saved_stmts)
> +                {
> +                  vec<stmt_vec_info> stmts;
> +                  vec<stmt_vec_info> roots = vNULL;
> +                  vec<tree> remain = vNULL;
> +                  stmts.create (1);
> +                  stmts.quick_push (vect_stmt_to_vectorize (stmt_info));
> +                  vect_build_slp_instance (vinfo,
> +                                           slp_inst_kind_reduc_group,
> +                                           stmts, roots, remain,
> +                                           max_tree_size, &limit,
> +                                           bst_map, NULL);
> +                }
> +              saved_stmts.release ();
>              }
> -          else
> -            scalar_stmts.release ();
>          }
>      }
>
>

-- 
Richard Biener
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)