From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-bugzilla@gcc.gnu.org>
Received: by sourceware.org (Postfix, from userid 48)
	id 50E92387054C; Fri,  9 Dec 2022 22:45:02 +0000 (GMT)
From: "law at gcc dot gnu.org" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug tree-optimization/108041] New: ivopts results in extra
 instruction in simple loop
Date: Fri, 09 Dec 2022 22:45:01 +0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: new
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: tree-optimization
X-Bugzilla-Version: 13.0
X-Bugzilla-Keywords: 
X-Bugzilla-Severity: normal
X-Bugzilla-Who: law at gcc dot gnu.org
X-Bugzilla-Status: UNCONFIRMED
X-Bugzilla-Resolution: 
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org
X-Bugzilla-Target-Milestone: ---
X-Bugzilla-Flags: 
X-Bugzilla-Changed-Fields: bug_id short_desc product version bug_status
 bug_severity priority component assigned_to reporter cc target_milestone
Message-ID: <bug-108041-4@http.gcc.gnu.org/bugzilla/>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
List-Id: <gcc-bugs.sourceware.org>

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108041

            Bug ID: 108041
           Summary: ivopts results in extra instruction in simple loop
           Product: gcc
           Version: 13.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: law at gcc dot gnu.org
                CC: rzinsly at ventanamicro dot com
  Target Milestone: ---

ivopts seems to make a bit of a mess of this code, leaving the loop with an
unnecessary instruction.  Compile for rv64 with -O2:

typedef struct network
{
  long nr_group, full_groups, max_elems;
} network_t;
void marc_arcs(network_t* net)
{
  while (net->full_groups < 0) {
    net->full_groups = net->nr_group + net->full_groups;
    net->max_elems--;
  }
}





After slp1 we have this loop:
;;   basic block 3, loop depth 0
;;    pred:       2
  _1 = net_8(D)->nr_group;
  net__max_elems_lsm.4_16 = net_8(D)->max_elems;
;;    succ:       4

;;   basic block 4, loop depth 1
;;    pred:       7
;;                3
  # _13 = PHI <_2(7), _11(3)>
  # net__max_elems_lsm.4_5 = PHI <_4(7), net__max_elems_lsm.4_16(3)>
  _2 = _1 + _13;
  _4 = net__max_elems_lsm.4_5 + -1;
  if (_2 < 0)
    goto <bb 7>; [89.00%]
  else
    goto <bb 5>; [11.00%]
;;    succ:       7
;;                5

;;   basic block 7, loop depth 1
;;    pred:       4
  goto <bb 4>; [100.00%]
;;    succ:       4

;;   basic block 5, loop depth 0
;;    pred:       4
  # _12 = PHI <_2(4)>
  # _17 = PHI <_4(4)>
  net_8(D)->full_groups = _12;
  net_8(D)->max_elems = _17;
;;    succ:       6
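
For readers less used to GIMPLE dumps, the loop above can be read roughly as
the following C.  This is a hand-written approximation, not compiler output:
the function name marc_arcs_lsm is hypothetical, and the early-exit guard
stands in for basic block 2, which is not shown in the dump and is assumed
here to hold the initial full_groups test.

```c
typedef struct network
{
  long nr_group, full_groups, max_elems;
} network_t;

/* Hypothetical C rendering of the post-slp1 loop; comments name the
   matching SSA names from the dump.  */
void marc_arcs_lsm(network_t *net)
{
  long full_groups = net->full_groups;      /* _11 */
  if (full_groups >= 0)                     /* assumed guard from bb 2 */
    return;

  long nr_group = net->nr_group;            /* _1 */
  long max_elems = net->max_elems;          /* net__max_elems_lsm.4_16 */

  do {
    full_groups = nr_group + full_groups;   /* _2 */
    max_elems = max_elems + -1;             /* _4 */
  } while (full_groups < 0);

  /* Stores sunk out of the loop (bb 5).  */
  net->full_groups = full_groups;           /* _12 */
  net->max_elems = max_elems;               /* _17 */
}
```

Both values live in locals inside the loop and are written back only once on
exit, so the loop body needs just two additions and the branch.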


Of particular interest is the max_elems computation into _4.  We accumulate it
in the loop, then do the final store after the loop (thank you LSM!).  After
ivopts we have:


;;   basic block 3, loop depth 0
;;    pred:       2
  _1 = net_8(D)->nr_group;
  net__max_elems_lsm.4_16 = net_8(D)->max_elems;
  _22 = net__max_elems_lsm.4_16 + -1;
  ivtmp.10_21 = (unsigned long) _22;
;;    succ:       4

;;   basic block 4, loop depth 1
;;    pred:       7
;;                3
  # _13 = PHI <_2(7), _11(3)>
  # ivtmp.10_3 = PHI <ivtmp.10_18(7), ivtmp.10_21(3)>
  _2 = _1 + _13;
  _4 = (long int) ivtmp.10_3;
  ivtmp.10_18 = ivtmp.10_3 - 1;
  if (_2 < 0)
    goto <bb 7>; [89.00%]
  else
    goto <bb 5>; [11.00%]
;;    succ:       7
;;                5

;;   basic block 7, loop depth 1
;;    pred:       4
  goto <bb 4>; [100.00%]
;;    succ:       4

;;   basic block 5, loop depth 0
;;    pred:       4
  # _12 = PHI <_2(4)>
  # _17 = PHI <_4(4)>
  net_8(D)->full_groups = _12;
  net_8(D)->max_elems = _17;
;;    succ:       6
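
Written back as C, the post-ivopts loop looks roughly like the sketch below.
Again this is a hand-written approximation rather than compiler output; the
name marc_arcs_ivopts is hypothetical and the guard for the unshown basic
block 2 is an assumption.

```c
typedef struct network
{
  long nr_group, full_groups, max_elems;
} network_t;

/* Hypothetical C rendering of the post-ivopts loop; comments name the
   matching SSA names from the dump.  */
void marc_arcs_ivopts(network_t *net)
{
  long full_groups = net->full_groups;              /* _11 */
  if (full_groups >= 0)                             /* assumed guard from bb 2 */
    return;

  long nr_group = net->nr_group;                    /* _1 */
  long start = net->max_elems + -1;                 /* _22 */
  unsigned long ivtmp = (unsigned long) start;      /* ivtmp.10_21 */
  long max_elems;

  do {
    full_groups = nr_group + full_groups;           /* _2 */
    max_elems = (long) ivtmp;                       /* _4: copy of the old IV */
    ivtmp = ivtmp - 1;                              /* ivtmp.10_18 */
  } while (full_groups < 0);

  net->full_groups = full_groups;                   /* _12 */
  net->max_elems = max_elems;                       /* _17 */
}
```

The copy into max_elems and the decrement of ivtmp are both live across the
back edge, which is where the extra instruction in the loop comes from.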

Note the introduction of the IV and its relationship to _4.  Essentially we
compute both in the loop even though _4 is always one greater than the updated
IV.  Worse yet, the IV is only used to compute _4!  And since they differ by 1,
we actually compute both and keep them alive, resulting in this final code for
rv64:




.L3:
        add     a5,a5,a2
        mv      a3,a4
        addi    a4,a4,-1
        blt     a5,zero,.L3
        sd      a5,8(a0)
        sd      a3,16(a0)


Note how we had to "stash away" the value of a4 before the decrement so that we
could store it after the loop.  The induction variable doesn't really buy us
anything in this loop -- it's actively harmful.  Not using the IV would
probably be best.  Second best would be to realize that _4 (aka a3) can be
derived from the IV (a4) after the loop by adding 1.
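
To illustrate the "second best" option, here is a C sketch in which only the
IV is updated inside the loop and the stored value is recovered afterwards by
adding 1.  It is an illustration of the idea only, not a proposed patch or
actual GCC output; the name marc_arcs_derived is hypothetical and the guard
stands in for the unshown basic block 2.

```c
typedef struct network
{
  long nr_group, full_groups, max_elems;
} network_t;

/* Hypothetical sketch: keep the IV, drop the in-loop copy, and derive the
   final max_elems from the IV after the loop.  */
void marc_arcs_derived(network_t *net)
{
  long full_groups = net->full_groups;
  if (full_groups >= 0)                       /* assumed guard from bb 2 */
    return;

  long nr_group = net->nr_group;
  long start = net->max_elems + -1;
  unsigned long ivtmp = (unsigned long) start;

  do {
    full_groups = nr_group + full_groups;
    ivtmp = ivtmp - 1;                        /* only the IV is updated */
  } while (full_groups < 0);

  net->full_groups = full_groups;
  /* On exit the IV is one less than the value to store, so add 1 back.  */
  net->max_elems = (long) (ivtmp + 1);
}
```

This trades the copy inside the loop for a single addition on the exit path;
dropping the IV altogether, as in the pre-ivopts form, avoids even that.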