Date: Mon, 17 Jul 2023 09:30:48 +0000 (UTC)
From: Richard Biener
To: Andrew Pinski
cc: gcc-patches@gcc.gnu.org
Subject: Re: [PATCH][RFC] tree-optimization/88540 - FP x > y ? x : y if-conversion without -ffast-math
References: <20230713095401.6C7E53858C41@sourceware.org>

On Fri, 14 Jul 2023, Andrew Pinski wrote:

> On Thu, Jul 13, 2023 at 2:54 AM Richard Biener via Gcc-patches
> wrote:
> >
> > The following makes sure that FP x > y ? x : y style max/min operations
> > are if-converted at the GIMPLE level. While we can neither match
> > it to MAX_EXPR nor .FMAX as both have different semantics with IEEE
> > than the ternary ?: operation we can make sure to maintain this form
> > as a COND_EXPR so backends have the chance to match this to instructions
> > their ISA offers.
> >
> > The patch does this in phiopt where we recognize min/max and instead
> > of giving up when we have to honor NaNs we alter the generated code
> > to a COND_EXPR.
> >
> > This resolves PR88540 and we can then SLP vectorize the min operation
> > for its testcase. It also resolves part of the regressions observed
> > with the change matching bit-inserts of bit-field-refs to vec_perm.
> >
> > Expansion from a COND_EXPR rather than from compare-and-branch
> > regresses gcc.target/i386/pr54855-13.c and gcc.target/i386/pr54855-9.c
> > by producing extra moves while the corresponding min/max operations
> > are now already synthesized by RTL expansion, register selection
> > isn't optimal. This can be also provoked without this change by
> > altering the operand order in the source.
> >
> > It regresses gcc.target/i386/pr110170.c where we end up CSEing the
> > condition which makes RTL expansion no longer produce the min/max
> > directly and code generation is obfuscated enough to confuse
> > RTL if-conversion.
> >
> > It also regresses gcc.target/i386/ssefp-[12].c where oddly one
> > variant isn't if-converted and ix86_expand_fp_movcc doesn't
> > match directly (the FP constants get expanded twice). A fix
> > could be in emit_conditional_move where both prepare_cmp_insn
> > and emit_conditional_move_1 force the constants to (different)
> > registers.
> >
> > Otherwise bootstrapped and tested on x86_64-unknown-linux-gnu.
> >
> >         PR tree-optimization/88540
> >         * tree-ssa-phiopt.cc (minmax_replacement): Do not give up
> >         with NaNs but handle the simple case by if-converting to a
> >         COND_EXPR.
>
> One thing which I was thinking about adding to phiopt is having the
> last pass do the conversion to COND_EXPR if the target supports a
> conditional move for that expression. That should fix this one right?
> This was one of things I was working towards with the moving to use
> match-and-simplify too.

Note that the if-conversion has to happen before BB SLP, and the last
phiopt pass runs too late for that (yes, BB SLP could also be enhanced
to handle conditionals and do if-conversion on-the-fly). For BB SLP
there's also usually jump threading making a mess of same-condition
chains of if-convertible ops ...

As for the min + max case that regresses due to CSE
(gcc.target/i386/pr110170.c), I wonder whether pre-expanding

  _1 = _2 < _3;
  _4 = _1 ? _2 : _3;
  _5 = _1 ? _3 : _2;

to something more clever would be appropriate anyway. We could
adjust this to either duplicate _1 or expand the COND_EXPRs back
to a single CFG diamond. I suppose force-duplicating the non-vector
compares of COND_EXPRs to make TER work again would also fix similar
regressions we might already observe (but I'm not aware of many
COND_EXPR generators).

Richard.
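
For readers without a GIMPLE dump at hand, here is a rough source-level
C analogue of the three statements above: one comparison result feeding
both a min-style and a max-style select, which is the shape left after
the condition is CSEd. This is only a sketch. The name sort2 and its
signature are made up for this illustration and appear neither in the
PR nor in the testsuite.

```c
/* Hypothetical helper, for illustration only.  Mirrors
     _1 = _2 < _3;
     _4 = _1 ? _2 : _3;
     _5 = _1 ? _3 : _2;
   with a single shared comparison.  */
void
sort2 (double a, double b, double *lo, double *hi)
{
  int lt = a < b;       /* _1: the one comparison both selects use */
  *lo = lt ? a : b;     /* _4: min-style select */
  *hi = lt ? b : a;     /* _5: max-style select */
}
```

When the comparison is false, including the unordered case where either
input is a NaN, *lo receives b and *hi receives a, so the two selects
stay consistent with each other only because they share that one
comparison result.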
> Thanks,
> Andrew
>
> >
> >         * gcc.target/i386/pr88540.c: New testcase.
> >         * gcc.target/i386/pr54855-12.c: Adjust.
> >         * gcc.target/i386/pr54855-13.c: Likewise.
> > ---
> >  gcc/testsuite/gcc.target/i386/pr54855-12.c |  2 +-
> >  gcc/testsuite/gcc.target/i386/pr54855-13.c |  2 +-
> >  gcc/testsuite/gcc.target/i386/pr88540.c    | 10 ++++++++++
> >  gcc/tree-ssa-phiopt.cc                     | 21 ++++++++++++++++-----
> >  4 files changed, 28 insertions(+), 7 deletions(-)
> >  create mode 100644 gcc/testsuite/gcc.target/i386/pr88540.c
> >
> > diff --git a/gcc/testsuite/gcc.target/i386/pr54855-12.c b/gcc/testsuite/gcc.target/i386/pr54855-12.c
> > index 2f8af392c83..09e8ab8ae39 100644
> > --- a/gcc/testsuite/gcc.target/i386/pr54855-12.c
> > +++ b/gcc/testsuite/gcc.target/i386/pr54855-12.c
> > @@ -1,6 +1,6 @@
> >  /* { dg-do compile } */
> >  /* { dg-options "-O2 -mavx512fp16" } */
> > -/* { dg-final { scan-assembler-times "vmaxsh\[ \\t\]" 1 } } */
> > +/* { dg-final { scan-assembler-times "vm\[ai\]\[nx\]sh\[ \\t\]" 1 } } */
> >  /* { dg-final { scan-assembler-not "vcomish\[ \\t\]" } } */
> >  /* { dg-final { scan-assembler-not "vmovsh\[ \\t\]" { target { ! ia32 } } } } */
> >
> > diff --git a/gcc/testsuite/gcc.target/i386/pr54855-13.c b/gcc/testsuite/gcc.target/i386/pr54855-13.c
> > index 87b4f459a5a..a4f25066f81 100644
> > --- a/gcc/testsuite/gcc.target/i386/pr54855-13.c
> > +++ b/gcc/testsuite/gcc.target/i386/pr54855-13.c
> > @@ -1,6 +1,6 @@
> >  /* { dg-do compile } */
> >  /* { dg-options "-O2 -mavx512fp16" } */
> > -/* { dg-final { scan-assembler-times "vmaxsh\[ \\t\]" 1 } } */
> > +/* { dg-final { scan-assembler-times "vm\[ai\]\[nx\]sh\[ \\t\]" 1 } } */
> >  /* { dg-final { scan-assembler-not "vcomish\[ \\t\]" } } */
> >  /* { dg-final { scan-assembler-not "vmovsh\[ \\t\]" { target { ! ia32 } } } } */
> >
> > diff --git a/gcc/testsuite/gcc.target/i386/pr88540.c b/gcc/testsuite/gcc.target/i386/pr88540.c
> > new file mode 100644
> > index 00000000000..b927d0c57d5
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/i386/pr88540.c
> > @@ -0,0 +1,10 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-O2 -msse2" } */
> > +
> > +void test(double* __restrict d1, double* __restrict d2, double* __restrict d3)
> > +{
> > +  for (int n = 0; n < 2; ++n)
> > +    d3[n] = d1[n] < d2[n] ? d1[n] : d2[n];
> > +}
> > +
> > +/* { dg-final { scan-assembler "minpd" } } */
> > diff --git a/gcc/tree-ssa-phiopt.cc b/gcc/tree-ssa-phiopt.cc
> > index 467c9fd108a..13ee486831d 100644
> > --- a/gcc/tree-ssa-phiopt.cc
> > +++ b/gcc/tree-ssa-phiopt.cc
> > @@ -1580,10 +1580,6 @@ minmax_replacement (basic_block cond_bb, basic_block middle_bb, basic_block alt_
> >
> >    tree type = TREE_TYPE (PHI_RESULT (phi));
> >
> > -  /* The optimization may be unsafe due to NaNs.  */
> > -  if (HONOR_NANS (type) || HONOR_SIGNED_ZEROS (type))
> > -    return false;
> > -
> >    gcond *cond = as_a <gcond *> (*gsi_last_bb (cond_bb));
> >    enum tree_code cmp = gimple_cond_code (cond);
> >    tree rhs = gimple_cond_rhs (cond);
> > @@ -1770,6 +1766,9 @@ minmax_replacement (basic_block cond_bb, basic_block middle_bb, basic_block alt_
> >        else
> >          return false;
> >      }
> > +  else if (HONOR_NANS (type) || HONOR_SIGNED_ZEROS (type))
> > +    /* The optimization may be unsafe due to NaNs.  */
> > +    return false;
> >    else if (middle_bb != alt_middle_bb && threeway_p)
> >      {
> >        /* Recognize the following case:
> > @@ -2103,7 +2102,19 @@ minmax_replacement (basic_block cond_bb, basic_block middle_bb, basic_block alt_
> >    /* Emit the statement to compute min/max.  */
> >    gimple_seq stmts = NULL;
> >    tree phi_result = PHI_RESULT (phi);
> > -  result = gimple_build (&stmts, minmax, TREE_TYPE (phi_result), arg0, arg1);
> > +
> > +  /* When we can't use a MIN/MAX_EXPR still make sure the expression
> > +     stays in a form to be recognized by ISA that map to IEEE x > y ? x : y
> > +     semantics (that's not IEEE max semantics).  */
> > +  if (HONOR_NANS (type) || HONOR_SIGNED_ZEROS (type))
> > +    {
> > +      result = gimple_build (&stmts, cmp, boolean_type_node,
> > +                             gimple_cond_lhs (cond), rhs);
> > +      result = gimple_build (&stmts, COND_EXPR, TREE_TYPE (phi_result),
> > +                             result, arg_true, arg_false);
> > +    }
> > +  else
> > +    result = gimple_build (&stmts, minmax, TREE_TYPE (phi_result), arg0, arg1);
> >
> >    gsi = gsi_last_bb (cond_bb);
> >    gsi_insert_seq_before (&gsi, stmts, GSI_NEW_STMT);
> > --
> > 2.35.3
>

-- 
Richard Biener
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman;
HRB 36809 (AG Nuernberg)
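
As a footnote on why the ternary form cannot simply become MAX_EXPR or
.FMAX when NaNs and signed zeros are honored, the small C sketch below
spells out what the source-level selects compute. The function names
ternary_max and ternary_min are made up for this illustration; the min
variant has the same shape as the expression in the pr88540.c testcase.

```c
/* Hypothetical names, for illustration only.  */

/* x > y ? x : y.  Whenever the comparison is false, which includes
   equal operands (such as zeros of either sign) and the unordered
   case with a NaN operand, the second operand is returned.  */
double
ternary_max (double x, double y)
{
  return x > y ? x : y;
}

/* Same shape as the testcase: d1[n] < d2[n] ? d1[n] : d2[n].  */
double
ternary_min (double x, double y)
{
  return x < y ? x : y;
}
```

So ternary_max (1.0, NaN) yields NaN while ternary_max (NaN, 1.0)
yields 1.0, and for zeros of opposite sign the result is whichever zero
was passed second. C's fmax, by contrast, returns the non-NaN operand
when exactly one argument is a NaN, regardless of operand order, so the
two are not interchangeable. As far as I know, the "second operand
wins" behaviour is the same rule the SSE min/max instructions document,
which would be why the testcase can scan for minpd.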