From: "tnfchris at gcc dot gnu.org"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug tree-optimization/111770] New: predicated loads inactive lane values not modelled
Date: Wed, 11 Oct 2023 12:38:18 +0000

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111770

            Bug ID: 111770
           Summary: predicated loads inactive lane values not modelled
           Product: gcc
           Version: 14.0
            Status: UNCONFIRMED
          Keywords: missed-optimization
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: tnfchris at gcc dot gnu.org
  Target Milestone: ---

For this example:

int foo(int n, char *a, char *b) {
  int sum = 0;
  for (int i = 0; i < n; ++i) {
    sum += a[i] * b[i];
  }
  return sum;
}

we generate with -O3 -march=armv8-a+sve

.L3:
        ld1b    z29.b, p7/z, [x1, x3]
        ld1b    z31.b, p7/z, [x2, x3]
        add     x3, x3, x4
        sel     z31.b, p7, z31.b, z28.b
        whilelo p7.b, w3, w0
        udot    z30.s, z29.b, z31.b
        b.any   .L3
        uaddv   d30, p6, z30.s
        fmov    w0, s30
        ret

Which is pretty good, but we completely ruin it with the SEL.

In gimple this is:

  vect__7.12_81 = .MASK_LOAD (_21, 8B, loop_mask_77);
  masked_op1_82 = .VCOND_MASK (loop_mask_77, vect__7.12_81, { 0, ... });
  vect_patt_33.13_83 = DOT_PROD_EXPR ;

The missed optimization here is that we don't model what happens with
predicated operations that zero inactive lanes. I.e. in this case the
.MASK_LOAD is a zeroing predicated load (ld1b with p7/z), so the inactive
lanes are already zero and the .VCOND_MASK, which selects zero for those
same lanes under the same mask, is completely superfluous.

I'm not entirely sure how we should go about fixing this generally.
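To make the redundancy concrete, here is a minimal scalar sketch in C of the loop body's lane semantics. It is not GCC code and not a proposed fix. The helper names (masked_load_zeroing, vcond_mask, udot_model, masked_dot) and the fixed lane count VL are hypothetical, and the sketch assumes the load zeroes inactive lanes the way ld1b with /z does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define VL 16 /* hypothetical number of byte lanes; real SVE is length-agnostic */

/* Model of a zeroing predicated load: active lanes read memory,
   inactive lanes become 0 (assumed, as for ld1b with p/z). */
static void masked_load_zeroing(uint8_t dst[VL], const uint8_t *src,
                                const bool mask[VL])
{
    for (size_t i = 0; i < VL; ++i)
        dst[i] = mask[i] ? src[i] : 0;
}

/* Model of .VCOND_MASK (mask, a, b): per-lane select. */
static void vcond_mask(uint8_t dst[VL], const bool mask[VL],
                       const uint8_t a[VL], const uint8_t b[VL])
{
    for (size_t i = 0; i < VL; ++i)
        dst[i] = mask[i] ? a[i] : b[i];
}

/* Model of an unsigned dot product: each group of four byte lanes
   accumulates into one 32-bit lane. */
static void udot_model(uint32_t acc[VL / 4], const uint8_t x[VL],
                       const uint8_t y[VL])
{
    for (size_t i = 0; i < VL; ++i)
        acc[i / 4] += (uint32_t)x[i] * y[i];
}

/* One masked vector iteration over the first n elements, with or
   without the extra select on the second operand. */
static uint32_t masked_dot(const uint8_t *a, const uint8_t *b, size_t n,
                           bool use_select)
{
    bool mask[VL];
    uint8_t va[VL], vb[VL], sel[VL];
    const uint8_t zeros[VL] = {0};
    uint32_t acc[VL / 4] = {0};
    uint32_t sum = 0;

    assert(n <= VL);
    for (size_t i = 0; i < VL; ++i)
        mask[i] = i < n; /* first n lanes active, like a whilelo mask */

    masked_load_zeroing(va, a, mask);
    masked_load_zeroing(vb, b, mask);

    if (use_select) {
        /* Inactive lanes of vb are already 0, so this changes nothing. */
        vcond_mask(sel, mask, vb, zeros);
        udot_model(acc, va, sel);
    } else {
        udot_model(acc, va, vb);
    }

    for (size_t i = 0; i < VL / 4; ++i) /* final reduction */
        sum += acc[i];
    return sum;
}
```

In this model masked_dot(a, b, n, true) and masked_dot(a, b, n, false) agree for every n up to VL, because the select only rewrites lanes that the zeroing load has already set to zero. The same argument would not hold for a load that leaves inactive lanes undefined, which is why the zeroing behaviour has to be modelled before the select can be dropped.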