From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-bugzilla@gcc.gnu.org>
Received: by sourceware.org (Postfix, from userid 48)
	id E73BF3858D1E; Wed, 11 Oct 2023 12:38:18 +0000 (GMT)
From: "tnfchris at gcc dot gnu.org" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug tree-optimization/111770] New: predicated loads inactive lane
 values not modelled
Date: Wed, 11 Oct 2023 12:38:18 +0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: new
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: tree-optimization
X-Bugzilla-Version: 14.0
X-Bugzilla-Keywords: missed-optimization
X-Bugzilla-Severity: normal
X-Bugzilla-Who: tnfchris at gcc dot gnu.org
X-Bugzilla-Status: UNCONFIRMED
X-Bugzilla-Resolution: 
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org
X-Bugzilla-Target-Milestone: ---
X-Bugzilla-Flags: 
X-Bugzilla-Changed-Fields: bug_id short_desc product version bug_status
 keywords bug_severity priority component assigned_to reporter
 target_milestone
Message-ID: <bug-111770-4@http.gcc.gnu.org/bugzilla/>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
List-Id: <gcc-bugs.sourceware.org>

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111770

            Bug ID: 111770
           Summary: predicated loads inactive lane values not modelled
           Product: gcc
           Version: 14.0
            Status: UNCONFIRMED
          Keywords: missed-optimization
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: tnfchris at gcc dot gnu.org
  Target Milestone: ---

For this example:

int foo(int n, char *a, char *b) {
  int sum = 0;
  for (int i = 0; i < n; ++i) {
    sum += a[i] * b[i];
  }
  return sum;
}

we generate the following with -O3 -march=armv8-a+sve:

.L3:
        ld1b    z29.b, p7/z, [x1, x3]
        ld1b    z31.b, p7/z, [x2, x3]
        add     x3, x3, x4
        sel     z31.b, p7, z31.b, z28.b
        whilelo p7.b, w3, w0
        udot    z30.s, z29.b, z31.b
        b.any   .L3
        uaddv   d30, p6, z30.s
        fmov    w0, s30
        ret

This is pretty good, but the SEL inside the loop ruins it.

In gimple this is:

  vect__7.12_81 = .MASK_LOAD (_21, 8B, loop_mask_77);
  masked_op1_82 = .VCOND_MASK (loop_mask_77, vect__7.12_81, { 0, ... });
  vect_patt_33.13_83 = DOT_PROD_EXPR <vect__3.9_78, masked_op1_82, vect_sum_19.6_74>;
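
The GIMPLE above and the SVE loop it produces can be modelled lane by lane in plain C. The following is a minimal sketch under stated assumptions, not GCC's implementation: VL_BYTES = 16 is a hypothetical vector length (real SVE vectors can be any multiple of 128 bits), unsigned char stands in for char because plain char is unsigned on AArch64 (hence udot), and z28 is assumed to hold zero, as the { 0, ... } operand of the .VCOND_MASK suggests. The names foo_ref, foo_model and check_model are illustrative only.

```c
#include <assert.h>    /* used by check_model */
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical vector length: 16 byte lanes (a 128-bit SVE vector).
   Real SVE implementations may use any multiple of 128 bits. */
#define VL_BYTES 16
#define VL_WORDS (VL_BYTES / 4)

/* Scalar reference, equivalent to foo() with char taken as unsigned
   (plain char is unsigned on AArch64, hence udot). */
int foo_ref(int n, const unsigned char *a, const unsigned char *b) {
  int sum = 0;
  for (int i = 0; i < n; ++i)
    sum += a[i] * b[i];
  return sum;
}

/* Lane-by-lane model of the vectorized loop. */
int foo_model(int n, const unsigned char *a, const unsigned char *b) {
  uint32_t acc[VL_WORDS] = {0};                 /* z30.s accumulator */
  for (int base = 0; base < n; base += VL_BYTES) {
    bool p[VL_BYTES];                           /* whilelo predicate (p7.b) */
    unsigned char va[VL_BYTES], vb[VL_BYTES];
    for (int l = 0; l < VL_BYTES; ++l) {
      p[l] = base + l < n;
      va[l] = p[l] ? a[base + l] : 0;           /* ld1b z29.b, p7/z: inactive lanes zeroed */
      vb[l] = p[l] ? b[base + l] : 0;           /* ld1b z31.b, p7/z */
      vb[l] = p[l] ? vb[l] : 0;                 /* sel with zero (.VCOND_MASK): never changes vb */
    }
    for (int l = 0; l < VL_BYTES; ++l)          /* udot: four byte products per 32-bit lane */
      acc[l / 4] += (uint32_t)va[l] * vb[l];
  }
  uint32_t total = 0;
  for (int w = 0; w < VL_WORDS; ++w)            /* uaddv: horizontal add of the accumulator */
    total += acc[w];
  return (int)total;
}

/* Compares the model against the scalar loop for a trip count n <= 64. */
void check_model(int n) {
  unsigned char a[64], b[64];
  assert(n >= 0 && n <= 64);
  for (int i = 0; i < 64; ++i) {
    a[i] = (unsigned char)(i * 7 + 3);
    b[i] = (unsigned char)(255 - i * 5);
  }
  assert(foo_model(n, a, b) == foo_ref(n, a, b));
}
```

Calling check_model for trip counts that are and are not multiples of VL_BYTES exercises the final partial iteration, which is the only place where the predicate has inactive lanes.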

The missed optimization here is that we don't model what happens with
predicated operations that zero inactive lanes.

In this case .MASK_LOAD zeroes the inactive lanes, so the .VCOND_MASK is
completely superfluous.
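
The redundancy can be checked exhaustively on a small lane model. This sketch assumes a hypothetical width of LANES = 4 so that all 16 masks can be enumerated, and models a masked load whose inactive lanes take a fixed value `inactive`; a value of 0 models the zeroing behaviour of ld1b p/z. A select with zero after such a load is an identity exactly when that value is 0, which is the inactive-lane information that is not currently modelled. The helper names are illustrative, not GCC internals.

```c
#include <assert.h>    /* for the checks that exercise this model */
#include <stdbool.h>
#include <string.h>

#define LANES 4        /* hypothetical width, small enough to enumerate every mask */

/* Masked load whose inactive lanes take the value `inactive`.
   inactive == 0 models a zeroing load such as ld1b p/z. */
void masked_load(const bool *mask, const unsigned char *src,
                 unsigned char inactive, unsigned char *dst) {
  for (int l = 0; l < LANES; ++l)
    dst[l] = mask[l] ? src[l] : inactive;
}

/* Model of .VCOND_MASK (mask, x, { 0, ... }), i.e. sel against a zero vector. */
void select_zero(const bool *mask, const unsigned char *x, unsigned char *dst) {
  for (int l = 0; l < LANES; ++l)
    dst[l] = mask[l] ? x[l] : 0;
}

/* True if select_zero (m, masked_load (m, src)) == masked_load (m, src)
   for every possible mask m. */
bool select_is_redundant(const unsigned char *src, unsigned char inactive) {
  for (unsigned bits = 0; bits < (1u << LANES); ++bits) {
    bool mask[LANES];
    unsigned char loaded[LANES], selected[LANES];
    for (int l = 0; l < LANES; ++l)
      mask[l] = (bits >> l) & 1u;
    masked_load(mask, src, inactive, loaded);
    select_zero(mask, loaded, selected);
    if (memcmp(loaded, selected, LANES) != 0)
      return false;
  }
  return true;
}
```

With a non-zero inactive value (for example a merging load that keeps old register contents) the select is needed, which is why folding it away depends on knowing what the predicated operation leaves in inactive lanes.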

I'm not entirely sure how we should go about fixing this generally.