From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 89590 invoked by alias); 11 Dec 2017 13:11:45 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Received: (qmail 89573 invoked by uid 89); 11 Dec 2017 13:11:44 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-16.4 required=5.0 tests=BAYES_00,GIT_PATCH_1,GIT_PATCH_2,GIT_PATCH_3,KAM_NUMSUBJECT,SPF_HELO_PASS,T_RP_MATCHES_RCVD autolearn=ham version=3.3.2 spammy=k_8 X-HELO: mx1.redhat.com Received: from mx1.redhat.com (HELO mx1.redhat.com) (209.132.183.28) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Mon, 11 Dec 2017 13:11:42 +0000 Received: from smtp.corp.redhat.com (int-mx05.intmail.prod.int.phx2.redhat.com [10.5.11.15]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 35EB92D1EF9; Mon, 11 Dec 2017 13:11:41 +0000 (UTC) Received: from tucnak.zalov.cz (ovpn-116-34.ams2.redhat.com [10.36.116.34]) by smtp.corp.redhat.com (Postfix) with ESMTPS id 965B62D076; Mon, 11 Dec 2017 13:11:40 +0000 (UTC) Received: from tucnak.zalov.cz (localhost [127.0.0.1]) by tucnak.zalov.cz (8.15.2/8.15.2) with ESMTP id vBBDBauk007078; Mon, 11 Dec 2017 14:11:37 +0100 Received: (from jakub@localhost) by tucnak.zalov.cz (8.15.2/8.15.2/Submit) id vBBDBYPl007077; Mon, 11 Dec 2017 14:11:34 +0100 Date: Mon, 11 Dec 2017 13:11:00 -0000 From: Jakub Jelinek To: Kilian Verhetsel Cc: Richard Biener , Alan Hayward , GCC Patches , nd Subject: Re: [PATCH] Fix result for conditional reductions matching at index 0 Message-ID: <20171211131134.GL2353@tucnak> Reply-To: Jakub Jelinek References: <87d14brhj6.fsf@uclouvain.be> <87zi7fbn07.fsf@uclouvain.be> <87wp2ib6aj.fsf@uclouvain.be> <20171208181501.GB2353@tucnak> <87po7lleh4.fsf@uclouvain.be> MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: <87po7lleh4.fsf@uclouvain.be> User-Agent: Mutt/1.7.1 (2016-10-04) X-IsSubscribed: yes X-SW-Source: 2017-12/txt/msg00610.txt.bz2 On Mon, Dec 11, 2017 at 11:56:55AM +0100, Kilian Verhetsel wrote: > Jakub Jelinek writes: > > As it doesn't apply, I can't easily check what the patch generates > > on the PR80631 testcases vs. my thoughts on that; though, if it emits > > something more complicated for the simple cases, perhaps we could improve > > those (not handle it like COND_REDUCTION, but instead as former > > INTEGER_INDUC_COND_REDUCTION and just use a different constant instead of 0 > > if 0 isn't usable for the condition never matched value. > > While you could use values different from 0, I'm not sure this can be > done as efficiently. 0 is convenient because a single bitwise-and > between the index vector and the condition will set lanes that do not > contain a match to 0. Of course it can be done efficiently, what we care most is that the body of the vectorized loop is efficient. Whether we choose -1, 0 or 124 as the COND_EXPR not ever meant value matters only before that loop (when we need to load that into a register holding vector of all those constants) and then a scalar comparison on the REDUC_* result. Load of -1 vector on some targets is as expensive as load of 0, for arbitrary value worst case it is one memory load compared to a specialized zero register (or set all bits) instruction. On the other side, by not using any offsetted iteration var, one can reuse the vector register that holds the IV, which can be used in some loops too and thus decrease register pressure. And while comparison against 0 is sometimes one scalar insn cheaper than comparison against other value, if the insn producing it already sets the flags, I doubt it is the case here, so it is exactly the same cost. Not to mention that in your patch you are instead subtracting one in the scalar code. > Jakub Jelinek writes: > > First of all, I fail to see why we don't handle negative step, > > that can be done with REDUC_MIN instead of REDUC_MAX. > > That would not work without first using values different from 0 to > indicate the absence of matches (except in cases where all indices are > negative), which I assume is why the test was there in the first place. > > Below is the patch with fixed formatting and changes to make it apply > cleanly to r255537 (as far as I can tell this was simply caused by some > variables and constants having been renamed). Thanks, it applies cleanly now > + else if ((STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info) == COND_REDUCTION > + || (STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info) > + == INTEGER_INDUC_COND_REDUCTION)) > + && reduc_fn == IFN_LAST)» contains a character at the end of line that makes it not to compile. Trying to understand your patch, here is the difference with your patch between additional: --- tree-vect-loop.c 2017-12-11 13:39:35.619122907 +0100 +++ tree-vect-loop.c 2017-12-11 13:35:27.000000000 +0100 @@ -6021,8 +6021,8 @@ vectorizable_reduction (gimple *stmt, gi dump_printf_loc (MSG_NOTE, vect_location, "condition expression based on " "integer induction.\n"); - STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info) - = INTEGER_INDUC_COND_REDUCTION; +/* STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info) + = INTEGER_INDUC_COND_REDUCTION; */ } so that COND_REDUCTION is used, and the case with INTEGER_INDUC_COND_REDUCTION with your patch on: int v[256] = { 77, 1, 79, 3, 4, 5, 6, 7 }; __attribute__((noipa)) void foo () { int k, r = -1; for (k = 0; k < 256; k++) if (v[k] == 77) r = k; if (r != 0) __builtin_abort (); } vect_cst__21 = { 8, 8, 8, 8, 8, 8, 8, 8 }; vect_cst__28 = { 77, 77, 77, 77, 77, 77, 77, 77 }; + vect_cst__30 = { -1, -1, -1, -1, -1, -1, -1, -1 }; [local count: 139586436]: # k_12 = PHI # r_13 = PHI # ivtmp_11 = PHI # vect_vec_iv_.0_22 = PHI - # vect_r_3.1_24 = PHI + # vect_r_3.1_24 = PHI # vectp_v.2_25 = PHI - # ivtmp_30 = PHI - # _32 = PHI <_33(9), { 0, 0, 0, 0, 0, 0, 0, 0 }(2)> - # ivtmp_43 = PHI + # ivtmp_31 = PHI + # _33 = PHI <_34(9), { 0, 0, 0, 0, 0, 0, 0, 0 }(2)> + # ivtmp_41 = PHI vect_vec_iv_.0_23 = vect_vec_iv_.0_22 + vect_cst__21; vect__1.4_27 = MEM[(int *)vectp_v.2_25]; _1 = v[k_12]; vect_r_3.6_29 = VEC_COND_EXPR ; r_3 = _1 == 77 ? k_12 : r_13; k_8 = k_12 + 1; ivtmp_2 = ivtmp_11 - 1; vectp_v.2_26 = vectp_v.2_25 + 32; - _33 = VEC_COND_EXPR ; - ivtmp_31 = ivtmp_30 + { 8, 8, 8, 8, 8, 8, 8, 8 }; - ivtmp_44 = ivtmp_43 + 1; - if (ivtmp_44 < 32) + _34 = VEC_COND_EXPR ; + ivtmp_32 = ivtmp_31 + { 8, 8, 8, 8, 8, 8, 8, 8 }; + ivtmp_42 = ivtmp_41 + 1; + if (ivtmp_42 < 32) goto ; [92.31%] else goto ; [7.69%] ... [local count: 10737418]: # r_19 = PHI - # vect_r_3.6_34 = PHI - # _45 = PHI <_33(3)> - _35 = REDUC_MAX (_45); - _36 = {_35, _35, _35, _35, _35, _35, _35, _35}; - _37 = { 0, 0, 0, 0, 0, 0, 0, 0 }; - _38 = _45 == _36; - _39 = VEC_COND_EXPR <_38, vect_r_3.6_34, _37>; - _40 = VIEW_CONVERT_EXPR(_39); - _41 = REDUC_MAX (_40); - _42 = (int) _41; - if (_42 != 0) + # vect_r_3.6_35 = PHI + # _43 = PHI <_34(3)> + _36 = REDUC_MAX (_43); + _37 = {_36, _36, _36, _36, _36, _36, _36, _36}; + _38 = _36 + 4294967295; + _39 = (int) _38; + stmp_r_3.7_40 = _36 == 0 ? -1 : _39; + if (stmp_r_3.7_40 != 0) Nothing uses vect_cst__30, the only real change is using 0 instead of -1 as the starting value for vect_r_3.1_24, and the REDUC_MAX stuff, where the most important is that vect_r_3.6_35 is completely ignored with INTEGER_INDUC_COND_EXPR and thus we rely on DCE to remove it. So, my preference would be your patch goes in without the » character and then we incrementally add SIMPLE_INTEGER_INDUC_COND_EXPR which will be used if vectorizable_reduction determines possibilities to use some value smaller than all iterations, and then tweak the cases that can be handled with your INTEGER_INDUC_COND_EXPR or SIMPLE_INTEGER_INDUC_COND_EXPR instead of COND_EXPR (negative step, variable base). Jakub