From: Richard Biener
Date: Mon, 6 Apr 2020 14:30:00 +0200
Subject: Re: [stage1][PATCH] Lower VEC_COND_EXPR into internal functions.
To: Martin Liška, GCC Patches, Richard Sandiford

On Mon, Apr 6, 2020 at 11:18 AM Richard Sandiford wrote:
>
> Martin Liška writes:
> > Hello.
> >
> > This is the second attempt to get rid of tcc_comparison GENERIC trees
> > being used as the first argument of VEC_COND_EXPR.
> >
> > The patch achieves that in the following steps:
> > 1) the veclower pass expands all tcc_comparison expressions into a SSA_NAME
> > 2) since then a tcc_comparison can't be used as the first argument of
> >    VEC_COND_EXPR (enforced in the GIMPLE verifier)
> > 3) I exposed new internal functions with:
> >    DEF_INTERNAL_OPTAB_FN (VCOND, 0, vcond, vec_cond)
> >    DEF_INTERNAL_OPTAB_FN (VCONDU, 0, vcondu, vec_condu)
> >    DEF_INTERNAL_OPTAB_FN (VCONDEQ, 0, vcondeq, vec_condeq)
> >    DEF_INTERNAL_OPTAB_FN (VCOND_MASK, 0, vcond_mask, vec_cond_mask)
> > 4) the logic of expand_vec_cond_expr is moved into the new pass_gimple_isel pass
> > 5) the pass expands VEC_COND_EXPR into one of the internal functions defined in 3)
> > 6) moreover, I've added new logic that prefers expand_vec_cmp_expr_p when
> >    an SSA_NAME is used in multiple (2+) VEC_COND_EXPR statements
> >
> > The patch bootstraps on x86_64-linux-gnu and survives regression tests.
> > Moreover, I ran the SPEC2006 and SPEC2017 benchmarks on znver1, znver2 and
> > skylake targets and I don't see any significant change.
> >
> > Achieved benefits of the patch:
> > - removal of a GENERIC expression being used in GIMPLE statements
> > - extraction into SSA_NAMEs can enable the proper tree optimizers (FRE, DOM, PRE)
> > - possibility to expand smarter based on the number of uses (expand_vec_cmp_expr_p)
> >
> > Future plans:
> > - tcc_comparison removal just during gimplification
> > - removal of the code where these expressions are handled for VEC_COND_EXPR
> > - do a similar thing for COND_EXPR?
> >
> > The task was guided by Richi (Biener) and I bet he can help with both
> > further questions and reasoning.
>
> Thanks for doing this.  It definitely seems more friendly than the
> four-operand version to targets where separate comparisons are the norm.
>
> Just a couple of comments about the implementation:
>
> > diff --git a/gcc/passes.def b/gcc/passes.def
> > index 2bf2cb78fc5..d654e5ee9fe 100644
> > --- a/gcc/passes.def
> > +++ b/gcc/passes.def
> > @@ -397,6 +397,7 @@ along with GCC; see the file COPYING3.  If not see
> >        NEXT_PASS (pass_cleanup_eh);
> >        NEXT_PASS (pass_lower_resx);
> >        NEXT_PASS (pass_nrv);
> > +      NEXT_PASS (pass_gimple_isel);
> >        NEXT_PASS (pass_cleanup_cfg_post_optimizing);
> >        NEXT_PASS (pass_warn_function_noreturn);
> >        NEXT_PASS (pass_gen_hsail);
>
> What was the reason for making this a separate pass, rather than doing
> it as part of veclower?  If we do them separately, then it's harder for
> veclower to know which VEC_COND_EXPRs it needs to open-code.  (OK, so
> that's a general problem between veclower and expand already, but it
> seems like the new approach could help to move away from that by
> doing the instruction selection directly in veclower.)

As the name of the pass suggests, it was supposed to be the starting point
for doing all the "complex" (multi-GIMPLE-stmt matching) RTL expansion tricks.
But most importantly, veclower is too early to catch CSE opportunities from
loop opts on the conditions, and if veclower lowers things then we also want
CSE to clean up its mess.  I guess we also do not want veclower to be done
before vectorization, since it should be easier to re-vectorize from unsupported
vector code than from what veclower makes out of it ... catch-22.  So I
consider pass placement a secondary issue for now.
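[Illustration, not part of the original mail or patch: a small C function of
the kind the series targets, with a sketch in the comment of the GIMPLE the
new gimple-isel pass is expected to produce.  The SSA names, the
constant-vector notation and the exact .VCOND/.VCOND_MASK dump syntax are
assumptions, keyed to the numbered steps quoted above.]

  /* Illustrative example (not taken from the patch or the testsuite).  */
  void
  select_by_compare (int *restrict out, const int *restrict a,
                     const int *restrict b, int n)
  {
    for (int i = 0; i < n; i++)
      out[i] = a[i] < b[i] ? 5 : 10;
  }

  /* With the series, the vectorized body is expected to contain roughly

       mask_1 = vect_a_2 < vect_b_3;        // step 1): the comparison is an SSA_NAME
       vect_out_4 = VEC_COND_EXPR <mask_1, { 5, ... }, { 10, ... }>;

     and the gimple-isel pass (steps 4) and 5)) then replaces the
     VEC_COND_EXPR with a call to one of the internal functions, e.g.

       vect_out_4 = .VCOND (vect_a_2, vect_b_3, { 5, ... }, { 10, ... }, LT_EXPR);

     or, when the comparison result is kept as a separate mask,

       vect_out_4 = .VCOND_MASK (mask_1, { 5, ... }, { 10, ... });

     The SSA names and dump syntax above are made up for illustration.  */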
> > +/* Expand all VEC_COND_EXPR gimple assignments into calls to internal
> > +   function based on type of selected expansion.  */
> > +
> > +static gimple *
> > +gimple_expand_vec_cond_expr (gimple_stmt_iterator *gsi)
> > +{
> > +  tree lhs, op0a = NULL_TREE, op0b = NULL_TREE;
> > +  enum tree_code code;
> > +  enum tree_code tcode;
> > +  machine_mode cmp_op_mode;
> > +  bool unsignedp;
> > +  enum insn_code icode;
> > +  imm_use_iterator imm_iter;
> > +
> > +  /* Only consider code == GIMPLE_ASSIGN.  */
> > +  gassign *stmt = dyn_cast <gassign *> (gsi_stmt (*gsi));
> > +  if (!stmt)
> > +    return NULL;
> > +
> > +  code = gimple_assign_rhs_code (stmt);
> > +  if (code != VEC_COND_EXPR)
> > +    return NULL;
> > +
> > +  tree op0 = gimple_assign_rhs1 (stmt);
> > +  tree op1 = gimple_assign_rhs2 (stmt);
> > +  tree op2 = gimple_assign_rhs3 (stmt);
> > +  lhs = gimple_assign_lhs (stmt);
> > +  machine_mode mode = TYPE_MODE (TREE_TYPE (lhs));
> > +
> > +  gcc_assert (!COMPARISON_CLASS_P (op0));
> > +  if (TREE_CODE (op0) == SSA_NAME)
> > +    {
> > +      unsigned int used_vec_cond_exprs = 0;
> > +      gimple *use_stmt;
> > +      FOR_EACH_IMM_USE_STMT (use_stmt, imm_iter, op0)
> > +        {
> > +          gassign *assign = dyn_cast <gassign *> (use_stmt);
> > +          if (assign != NULL
> > +              && gimple_assign_rhs_code (assign) == VEC_COND_EXPR
> > +              && gimple_assign_rhs1 (assign) == op0)
> > +            used_vec_cond_exprs++;
> > +        }
>
> This looks like it's quadratic in the worst case.  Could we check
> this in a different way?
>
> > +
> > +      gassign *def_stmt = dyn_cast <gassign *> (SSA_NAME_DEF_STMT (op0));
> > +      if (def_stmt)
> > +        {
> > +          tcode = gimple_assign_rhs_code (def_stmt);
> > +          op0a = gimple_assign_rhs1 (def_stmt);
> > +          op0b = gimple_assign_rhs2 (def_stmt);
> > +
> > +          tree op0a_type = TREE_TYPE (op0a);
> > +          if (used_vec_cond_exprs >= 2
>
> It would be good if targets were able to provide only vcond_mask.
> In that case I guess we should go this path if the later one would fail.
>
> > +              && (get_vcond_mask_icode (mode, TYPE_MODE (op0a_type))
> > +                  != CODE_FOR_nothing)
> > +              && expand_vec_cmp_expr_p (op0a_type, TREE_TYPE (lhs), tcode))
> > +            {
> > +              /* Keep the SSA name and use vcond_mask.  */
> > +              tcode = TREE_CODE (op0);
> > +            }
> > +        }
> > +      else
> > +        tcode = TREE_CODE (op0);
> > +    }
> > +  else
> > +    tcode = TREE_CODE (op0);
>
> Might be easier to follow if tcode is TREE_CODE (op0) by default and
> only gets changed when we want to fold in the comparison.
>
> Thanks,
> Richard
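[Illustration, not part of the original mail or patch: a minimal C example of
the multiple-use situation that the used_vec_cond_exprs >= 2 check above is
aimed at.  The same comparison feeds two selects, so keeping it as a separate
SSA mask lets FRE/PRE share it and lets the isel pass take the vcond_mask
path.  The function and variable names are made up.]

  /* Two selects driven by the same comparison (illustrative example, not
     from the testsuite).  After vectorization the comparison a[i] < b[i]
     can be a single vector mask shared by both VEC_COND_EXPRs, which is
     the case the heuristic prefers to expand via vcond_mask rather than
     folding the comparison back into each select.  */
  void
  two_selects (int *restrict x, int *restrict y,
               const int *restrict a, const int *restrict b, int n)
  {
    for (int i = 0; i < n; i++)
      {
        x[i] = a[i] < b[i] ? 1 : 2;
        y[i] = a[i] < b[i] ? 3 : 4;
      }
  }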