From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-bugzilla@gcc.gnu.org>
Received: by sourceware.org (Postfix, from userid 48)
	id 2DBC83858D39; Wed, 28 Jun 2023 12:53:11 +0000 (GMT)
From: "rguenth at gcc dot gnu.org" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug target/110456] New: vectorization with loop masking prone to
 STLF issues
Date: Wed, 28 Jun 2023 12:53:10 +0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: new
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: target
X-Bugzilla-Version: 14.0
X-Bugzilla-Keywords: 
X-Bugzilla-Severity: normal
X-Bugzilla-Who: rguenth at gcc dot gnu.org
X-Bugzilla-Status: UNCONFIRMED
X-Bugzilla-Resolution: 
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org
X-Bugzilla-Target-Milestone: ---
X-Bugzilla-Flags: 
X-Bugzilla-Changed-Fields: bug_id short_desc product version bug_status
 bug_severity priority component assigned_to reporter target_milestone
Message-ID: <bug-110456-4@http.gcc.gnu.org/bugzilla/>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
List-Id: <gcc-bugs.sourceware.org>

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110456

            Bug ID: 110456
           Summary: vectorization with loop masking prone to STLF issues
           Product: gcc
           Version: 14.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: rguenth at gcc dot gnu.org
  Target Milestone: ---

#include <stdlib.h>  /* atoi */

void __attribute__((noipa))
test (double * __restrict a, double *b, int n, int m)
{
  for (int j = 0; j < m; ++j)
    for (int i = 0; i < n; ++i)
      a[i + j*n] = a[i + j*n /* + 512 */] + b[i + j*n];
}

double a[1024];
double b[1024];

int main(int argc, char **argv)
{
  int m = atoi (argv[1]);
  for (long i = 0; i < 1000000000; ++i)
    test (a + 4, b + 4, 4, m);
}


This shows that when we apply loop masking with
--param vect-partial-vector-usage, masked stores will generally prohibit
store-to-load forwarding (STLF), especially when there's only a partial
overlap with a following load, as happens when traversing a
multi-dimensional array as above.  The above runs noticeably slower
than when the loads are offset (uncomment the /* + 512 */).
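
One way to picture the partial overlap is to compare the element ranges
touched by the store of row j and the load of row j + 1.  The sketch below
is only a model, not something taken from the vectorizer: it assumes one
masked vector iteration per row, an 8-lane vector of doubles (512 bits),
and that the full-width footprint of a masked access is what matters for
forwarding.  The names span, active_span, full_span and spans_overlap are
made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Half-open range [begin, end) of element indices into a[].  */
struct span { size_t begin, end; };

/* Elements touched by the active mask lanes of row j.
   off models the optional "+ 512" load offset from the testcase.  */
static struct span active_span (size_t j, size_t n, size_t off)
{
  struct span s = { off + j * n, off + j * n + n };
  return s;
}

/* Elements the access would cover if all lanes were active.
   lanes is an assumption of this sketch (8 doubles = 512 bits).  */
static struct span full_span (size_t j, size_t n, size_t lanes, size_t off)
{
  assert (lanes >= n);  /* model: one masked vector iteration per row */
  struct span s = { off + j * n, off + j * n + lanes };
  return s;
}

/* True when the two half-open ranges share at least one element.  */
static bool spans_overlap (struct span x, struct span y)
{
  return x.begin < y.end && y.begin < x.end;
}
```

With n = 4 and 8 lanes, the active lanes of the row 0 store, [0, 4), and
of the row 1 load, [4, 8), are disjoint, but the full-width footprints
[0, 8) and [4, 12) overlap.  With the load offset by 512 elements the
row 1 footprint becomes [516, 524) and the overlap disappears.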

The situation is difficult to avoid in general, but there might be easy
heuristics that could be implemented, such as avoiding loop masking when
there's a read-modify-write operation to the same memory location in
a loop (with or without an immediately visible outer loop).  For
unknown dependences, and thus runtime disambiguation, a proper distance
between any read and write operations could be ensured as well.
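
As a rough source-level illustration of the runtime-disambiguation idea,
a versioning check could compare the store and load base addresses and
only take the masked loop when they are far enough apart.  This is a
hypothetical sketch, not existing GCC code: the helper name far_enough
and the min_bytes threshold are assumptions, and what counts as a
"proper distance" on a given target is left open here.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: return true when the byte distance between the
   store base and the load base is at least min_bytes.  A distance of
   zero is the read-modify-write case, for which this returns false
   whenever min_bytes > 0.  */
static bool far_enough (const void *store_base, const void *load_base,
                        size_t min_bytes)
{
  uintptr_t s = (uintptr_t) store_base;
  uintptr_t l = (uintptr_t) load_base;
  uintptr_t dist = s > l ? s - l : l - s;
  return dist >= min_bytes;
}
```

For the testcase above, with an assumed threshold of 64 bytes, the check
fails for the in-place update (distance 0) and passes once the loads are
offset by 512 doubles (distance 4096 bytes), so only the latter would use
the masked loop under this scheme.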