From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-bugzilla@gcc.gnu.org>
Received: by sourceware.org (Postfix, from userid 48)
 id 714E03857357; Tue, 17 May 2022 06:48:34 +0000 (GMT)
DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 714E03857357
From: "rguenth at gcc dot gnu.org" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug target/105617] [12/13 Regression] Slp is maybe too aggressive
 in some/many cases
Date: Tue, 17 May 2022 06:48:34 +0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: target
X-Bugzilla-Version: 12.1.0
X-Bugzilla-Keywords: missed-optimization
X-Bugzilla-Severity: normal
X-Bugzilla-Who: rguenth at gcc dot gnu.org
X-Bugzilla-Status: NEW
X-Bugzilla-Resolution: 
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org
X-Bugzilla-Target-Milestone: 12.2
X-Bugzilla-Flags: 
X-Bugzilla-Changed-Fields: cf_reconfirmed_on everconfirmed bug_status
Message-ID: <bug-105617-4-euw0qox3rN@http.gcc.gnu.org/bugzilla/>
In-Reply-To: <bug-105617-4@http.gcc.gnu.org/bugzilla/>
References: <bug-105617-4@http.gcc.gnu.org/bugzilla/>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
X-BeenThere: gcc-bugs@gcc.gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Gcc-bugs mailing list <gcc-bugs.gcc.gnu.org>
List-Unsubscribe: <https://gcc.gnu.org/mailman/options/gcc-bugs>,
 <mailto:gcc-bugs-request@gcc.gnu.org?subject=unsubscribe>
List-Archive: <https://gcc.gnu.org/pipermail/gcc-bugs/>
List-Post: <mailto:gcc-bugs@gcc.gnu.org>
List-Help: <mailto:gcc-bugs-request@gcc.gnu.org?subject=help>
List-Subscribe: <https://gcc.gnu.org/mailman/listinfo/gcc-bugs>,
 <mailto:gcc-bugs-request@gcc.gnu.org?subject=subscribe>
X-List-Received-Date: Tue, 17 May 2022 06:48:34 -0000

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105617

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Last reconfirmed|                            |2022-05-17
     Ever confirmed|0                           |1
             Status|UNCONFIRMED                 |NEW
--- Comment #10 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Hongtao.liu from comment #9)
> (In reply to Hongtao.liu from comment #8)
> > (In reply to Hongtao.liu from comment #7)
> > > Hmm, we have specific code to add scalar->vector(vmovq) cost to vector
> > > construct, but it seems not to work here, guess it's because &r0, and thought
> > > it was load not scalar?
> > Yes, true, as gimple_assign_load_p returns true for it
> >
> > (gdb) p debug_gimple_stmt (def)
> > # VUSE <.MEM_46>
> > r0.0_20 = r0;
> It's a load from stack, and finally eliminated in rtl dse1, but here the
> vectorizer doesn't know.

Yes, it's difficult for the SLP vectorizer to guess whether rN will come
from memory or not.  Some friendlier middle-end representation for
add-with-carry might be nice - the x86 backend could for example fold
__builtin_ia32_addcarryx_u64 to use a _Complex unsigned long long for the
return, ferrying the carry in __imag.  Alternatively we could devise
some special GIMPLE_ASM kind ferrying RTL and not assembly so the
backend could fold it directly to RTL on GIMPLE with asm constraints
doing the plumbing ... (we'd need some match-scratch and RTL expansion
would still need to allocate the actual pseudos).

  <bb 2> [local count: 1073741824]:
  _1 = *srcB_17(D);
  _2 = *srcA_18(D);
  _30 = __builtin_ia32_addcarryx_u64 (0, _2, _1, &r0);
  _3 = MEM[(const uint64_t *)srcB_17(D) + 8B];
  _4 = MEM[(const uint64_t *)srcA_18(D) + 8B];
  _5 = (int) _30;
  _29 = __builtin_ia32_addcarryx_u64 (_5, _4, _3, &r1);
  _6 = MEM[(const uint64_t *)srcB_17(D) + 16B];
  _7 = MEM[(const uint64_t *)srcA_18(D) + 16B];
  _8 = (int) _29;
  _28 = __builtin_ia32_addcarryx_u64 (_8, _7, _6, &r2);
  _9 = MEM[(const uint64_t *)srcB_17(D) + 24B];
  _10 = MEM[(const uint64_t *)srcA_18(D) + 24B];
  _11 = (int) _28;
  __builtin_ia32_addcarryx_u64 (_11, _10, _9, &r3);
  r0.0_12 = r0;
  r1.1_13 = r1;
  _36 = {r0.0_12, r1.1_13};
  r2.2_14 = r2;
  r3.3_15 = r3;
  _37 = {r2.2_14, r3.3_15};
  vectp.9_35 = dst_19(D);
  MEM <vector(2) long unsigned int> [(uint64_t *)vectp.9_35] = _36;
  vectp.9_39 = vectp.9_35 + 16;
  MEM <vector(2) long unsigned int> [(uint64_t *)vectp.9_39] = _37;
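The dump computes a four-limb (256-bit) addition dst = srcA + srcB, chaining each carry into the next __builtin_ia32_addcarryx_u64 call and finally storing r0..r3 to dst with two vector(2) stores. The following is a portable C sketch of that computation, offered only for orientation; the name add256 and the use of the GCC/Clang builtin __builtin_add_overflow in place of the x86 builtin are assumptions, not the reporter's actual testcase.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical portable equivalent of the dumped code: dst = srcA + srcB,
   where each operand is a 256-bit number stored as four 64-bit limbs,
   least significant limb first.  Returns the final carry-out.
   Uses __builtin_add_overflow (GCC/Clang) instead of
   __builtin_ia32_addcarryx_u64. */
static unsigned add256(uint64_t *dst, const uint64_t *srcA,
                       const uint64_t *srcB)
{
    unsigned carry = 0;
    for (int i = 0; i < 4; i++) {
        uint64_t s;
        /* c1: carry out of srcA[i] + srcB[i] */
        unsigned c1 = __builtin_add_overflow(srcA[i], srcB[i], &s);
        /* c2: carry out of adding the incoming carry (0 or 1) */
        unsigned c2 = __builtin_add_overflow(s, (uint64_t)carry, &s);
        /* If srcA[i] + srcB[i] wrapped, s <= 2^64 - 2, so adding 1
           cannot wrap again: at most one of c1, c2 is set. */
        assert(!(c1 && c2));
        dst[i] = s;
        carry = c1 | c2;
    }
    return carry;
}
```

Here each limb result s is a local whose address is only passed to __builtin_add_overflow. GCC folds that builtin into the internal function .ADD_OVERFLOW, whose complex result carries the overflow flag in its imaginary part, so s does not have to live in memory; this is close in spirit to the _Complex return suggested above for __builtin_ia32_addcarryx_u64. The sketch makes no claim about how the SLP vectorizer would cost it.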

So for the situation at hand I don't see any reasonable way out that
doesn't risk regressing things in other places (like treating loads
from non-indexed auto variables specially).  The only real solution is
to find a GIMPLE representation for __builtin_ia32_addcarryx_u64 that
doesn't force the alternate output to memory.
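As a source-level illustration of the _Complex return convention suggested above (sum in the real part, carry in __imag), here is a minimal C sketch. It assumes GCC's complex integer types and the __real__/__imag__ operators, which are GNU extensions that Clang also accepts; addc_u64 and addc_result are hypothetical names, and the sketch models only the interface, not how the x86 backend would actually fold __builtin_ia32_addcarryx_u64.

```c
#include <assert.h>

/* Hypothetical interface: add-with-carry returning both outputs in one
   GNU complex integer instead of storing the sum through a pointer.
   __real__ holds the 64-bit sum, __imag__ holds the carry-out (0 or 1). */
typedef _Complex unsigned long long addc_result;

static addc_result addc_u64(unsigned carry_in, unsigned long long a,
                            unsigned long long b)
{
    assert(carry_in <= 1);
    unsigned long long sum = a + b;           /* wraps modulo 2^64 */
    unsigned carry = sum < a;                 /* carry out of a + b */
    unsigned long long sum2 = sum + carry_in;
    carry |= sum2 < sum;                      /* carry out of + carry_in */
    addc_result r = 0;
    __real__ r = sum2;
    __imag__ r = carry;
    return r;
}
```

Because the sum and the carry come back as a single value, neither output has to be written through a pointer such as &r0, which is the property the conclusion above asks for.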