From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-bugzilla@gcc.gnu.org>
Received: by sourceware.org (Postfix, from userid 48)
	id 373CE3858C00; Fri, 15 Sep 2023 06:42:34 +0000 (GMT)
DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 373CE3858C00
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org;
	s=default; t=1694760154;
	bh=0DQ4iekPg63xEnB3stMqa7mkLRHTNQZWGoeWrr98XaE=;
	h=From:To:Subject:Date:In-Reply-To:References:From;
	b=HQ/NHeBr8L8HpHJvetfSiLtBzPLYpplVNhpE4h7D8n7gdWhKrsolo928UAE3t4oO4
	 vST5FH5yYX9Yj/5v+RBj2eNAia+ougLBgYdk9DsJxLVc7GalgnKnN1sGje4tQEFX+g
	 ljmzSQishk+Toy+d11tzuMtV0a0d1PDRDhOk/H9k=
From: "rguenth at gcc dot gnu.org" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug middle-end/111401] Middle-end: Missed optimization of
 MASK_LEN_FOLD_LEFT_PLUS
Date: Fri, 15 Sep 2023 06:42:33 +0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: middle-end
X-Bugzilla-Version: 14.0
X-Bugzilla-Keywords: missed-optimization
X-Bugzilla-Severity: normal
X-Bugzilla-Who: rguenth at gcc dot gnu.org
X-Bugzilla-Status: NEW
X-Bugzilla-Resolution: 
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org
X-Bugzilla-Target-Milestone: ---
X-Bugzilla-Flags: 
X-Bugzilla-Changed-Fields: 
Message-ID: <bug-111401-4-vrdoYpLhm7@http.gcc.gnu.org/bugzilla/>
In-Reply-To: <bug-111401-4@http.gcc.gnu.org/bugzilla/>
References: <bug-111401-4@http.gcc.gnu.org/bugzilla/>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
List-Id: <gcc-bugs.sourceware.org>

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111401
--- Comment #7 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Robin Dapp from comment #6)
> Created attachment 55902 [details]
> Tentative
>
> You're referring to the case where we have init = -0.0, the condition is
> false and we end up wrongly doing -0.0 + 0.0 = 0.0?
> I suppose -0.0 is the proper neutral element for PLUS (and WIDEN_SUM?) when
> honoring signed zeros?  And 0.0 for MINUS?  Doesn't that also depend on the
> rounding mode?

Yes, if the rounding mode isn't known there isn't a working neutral element.
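
To make the signed-zero case concrete, here is a small scalar sketch (not
GCC code; the function names are made up for illustration).  It models a
conditional sum two ways: the reference loop that skips inactive elements,
and an unmasked fold-left that substitutes a "neutral" value for them.  It
assumes the default round-to-nearest mode and no -ffast-math.  Under
round-toward-negative the sum of two opposite-signed zeros is -0.0, so the
roles of +0.0 and -0.0 flip, which is why no single choice works when the
rounding mode is unknown.

```c
#include <math.h>
#include <stddef.h>

/* Reference semantics: add a[i] only where cond[i] is set, strictly in order. */
static double cond_sum_scalar(double init, const double *a,
                              const int *cond, size_t n)
{
    double acc = init;
    for (size_t i = 0; i < n; i++)
        if (cond[i])
            acc += a[i];
    return acc;
}

/* Hypothetical model of an unmasked in-order reduction: inactive elements
   are replaced by `neutral` and then added unconditionally. */
static double cond_sum_select(double init, const double *a,
                              const int *cond, size_t n, double neutral)
{
    double acc = init;
    for (size_t i = 0; i < n; i++)
        acc += cond[i] ? a[i] : neutral;
    return acc;
}
```

With init = -0.0 and every condition false, cond_sum_scalar returns -0.0.
cond_sum_select with neutral = 0.0 computes -0.0 + 0.0 and loses the sign,
while neutral = -0.0 keeps it; signbit() from <math.h> is one way to
observe the difference.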

> neutral_op_for_reduction could return a -0 for PLUS if we honor it for that
> type.  Or is that too intrusive?

I suppose that could work, but we need to check that we're not using this
for the initial value.
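
One reading of that caveat, sketched in plain C (an illustration only, not
how the vectorizer represents it; sum_from is a made-up helper): -0.0 is
safe as filler for inactive elements, but it must not replace the
accumulator's starting value when the source starts from +0.0.  Assumes
round-to-nearest and no -ffast-math.

```c
#include <math.h>
#include <stddef.h>

/* In-order sum of a[0..n) starting from `start`. */
static double sum_from(double start, const double *a, size_t n)
{
    double acc = start;
    for (size_t i = 0; i < n; i++)
        acc += a[i];
    return acc;
}
```

For an input consisting only of -0.0 elements (or an empty input),
sum_from(0.0, ...) yields +0.0 but sum_from(-0.0, ...) yields -0.0, so
swapping the start value for the neutral element changes the result.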

> Guess I should add a test case for that as well.
>
> Another thing is that swapping operands is not as easy with COND_ADD because
> the addition would be in the else.  I'd punt for that case for now.
>
> Next problem - might be a mistake on my side.  For avx512 we create a
> COND_ADD but the respective MASK_FOLD_LEFT_PLUS is not available, causing us
> to create numerous vec_extracts as fallback that increase the cost until we
> don't vectorize anymore.

Yeah, but then a fold-left reduction wasn't necessary in the first place?
We should avoid that (it's slow even when the target supports it) when
possible.
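
For context on why a fold-left reduction exists at all: floating-point
addition is not associative, so an in-order sum and a lane-wise sum can
differ.  The sketch below uses made-up helper names and models a two-lane
vector accumulator in scalar code; it assumes IEEE double arithmetic
without excess precision (e.g. x86-64 SSE2) and no -ffast-math.

```c
#include <stddef.h>

/* In-order (fold-left) sum: ((a0 + a1) + a2) + ... */
static double sum_in_order(const double *a, size_t n)
{
    double acc = 0.0;
    for (size_t i = 0; i < n; i++)
        acc += a[i];
    return acc;
}

/* Two-lane model of a reassociated reduction: even and odd elements are
   accumulated separately and combined at the end.  n must be even. */
static double sum_two_lanes(const double *a, size_t n)
{
    double lane0 = 0.0, lane1 = 0.0;
    for (size_t i = 0; i < n; i += 2) {
        lane0 += a[i];
        lane1 += a[i + 1];
    }
    return lane0 + lane1;
}
```

For the input {1e16, 1.0, -1e16, 1.0} the in-order sum is 1.0 (the first
1.0 is lost to rounding in 1e16 + 1.0), while the two-lane sum is 2.0.
The cheaper lane-wise form is only valid when reassociation is allowed,
which is where the in-order variant can be avoided.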

> Therefore I added a
> vectorized_internal_fn_supported_p (IFN_FOLD_LEFT_PLUS, TREE_TYPE (lhs)).
> SLP paths and ncopies != 1 are excluded as well.  Not really happy with how
> the patch looks now but at least the testsuites on aarch and x86 pass.