From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-bugzilla@gcc.gnu.org>
Received: by sourceware.org (Postfix, from userid 48)
	id 54F8B3858CDB; Tue, 24 Jan 2023 15:35:14 +0000 (GMT)
DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 54F8B3858CDB
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org;
	s=default; t=1674574514;
	bh=FAMbFCsz5AYsROGulzad6Qy1FvattEoZLzqeZ4wfRsA=;
	h=From:To:Subject:Date:In-Reply-To:References:From;
	b=La6m3LzzMLpdtCwhnQ1PuPYbVGiH08nj1Ao8gYrdpEMM/feSg1n6sL4TxqOWuKbUA
	 S1y4eJf37b/Fj9OjJmp4YsiAzuiWhAVavtSRva770jUkOSpmhN4akEbGk6tZpzUGjW
	 4q6gFyWZKbzukDwEk36WBXoLxjkLuf0pMqSQrbWY=
From: "jakub at gcc dot gnu.org" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug tree-optimization/108498] [11/12/13 Regression] ppc64 big
 endian generates uninitialized reads with -fstore-merging
Date: Tue, 24 Jan 2023 15:35:13 +0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: tree-optimization
X-Bugzilla-Version: 12.0
X-Bugzilla-Keywords: wrong-code
X-Bugzilla-Severity: normal
X-Bugzilla-Who: jakub at gcc dot gnu.org
X-Bugzilla-Status: ASSIGNED
X-Bugzilla-Resolution: 
X-Bugzilla-Priority: P2
X-Bugzilla-Assigned-To: jakub at gcc dot gnu.org
X-Bugzilla-Target-Milestone: 11.4
X-Bugzilla-Flags: 
X-Bugzilla-Changed-Fields: cc
Message-ID: <bug-108498-4-KXQPLbO5TV@http.gcc.gnu.org/bugzilla/>
In-Reply-To: <bug-108498-4@http.gcc.gnu.org/bugzilla/>
References: <bug-108498-4@http.gcc.gnu.org/bugzilla/>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
List-Id: <gcc-bugs.sourceware.org>

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=3D108498

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |ebotcazou at gcc dot gnu.o=
rg
--- Comment #19 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
So, the problem is the separate string_concatenate mode introduced in that
r11-1810-ge362a897655e3b92949b65a2b53e00fb3ab8ded0 commit.
This mode is turned on whenever at least one of the stores is using a
STRING_CST and is successfully merged with the other stores.
Now, this mode assumes that the start of the store is on a byte boundary and
that the store ends on a byte boundary too, it simply creates a character a=
rray
with try_size / BITS_PER_UNIT and ignores any masking.
It is true that merged_store_group::can_be_merged_into does various checks,=
 if
string_concatenation mode is on, it will reject merging with MEM_REF, etc.,
plus
  /* STRING_CST is compatible with INTEGER_CST if no BIT_INSERT_EXPR.  */
  if (info->rhs_code =3D=3D STRING_CST
      && stores[0]->rhs_code =3D=3D INTEGER_CST
      && stores[0]->bitsize =3D=3D CHAR_BIT)
    return !bit_insertion;

  if (stores[0]->rhs_code =3D=3D STRING_CST
      && info->rhs_code =3D=3D INTEGER_CST
      && info->bitsize =3D=3D CHAR_BIT)
    return !bit_insertion;
and also merged_store_group::do_merge turns this mode off if:
  /* But we cannot use it if we don't have consecutive stores.  */
  if (!consecutive)
    string_concatenation =3D false;
and on if:
  /* We want to use concatenation if there is any string.  */
  if (info->rhs_code =3D=3D STRING_CST)
    {
      string_concatenation =3D true;
      gcc_assert (!bit_insertion);
    }
Now, if the goal is to only do the string_concatenation mode if it is some =
set
single character stores and at least one or more STRING_CST stores, the abo=
ve
doesn't achieve that by far.
The first cited hunk above from can_be_merged_into allows it if the first s=
tore
is char sized integral store and some later is STRING_CST.  It doesn't chec=
k if
that
first store is byte aligned, and doesn't check all the other stores in betw=
een.
All do_merge then does is if the stores aren't consecutive the
string_concatenation mode is turned off.  So e.g. we reject merging if ther=
e is
2 byte INTEGER_CST store followed by consecutive STRING_CST, but allow 1, 2=
 and
4 byte INTEGER_CST stores followed by STRING_CST (all consecutive), or say 1
byte, then 3 bits, then STRING_CST etc.
The second one is more strict, when the first store is STRING_CST, we punt =
on
any following INTEGER_CST store even when it is consecutive, if it isn't 1 =
byte
store.

Consider e.g. on x86_64-linux with -O2:
struct U { char c[16]; };
struct V { char c[16]; };
struct S { unsigned int a : 3, b : 8, c : 21; struct U d; unsigned int e;
struct V f; unsigned int g : 5, h : 27; };
struct T { unsigned int a : 16, b : 8, c : 8; struct U d; unsigned int e;
struct V f; unsigned int g : 5, h : 27; };

__attribute__((noipa)) void
foo (struct S *p)
{
  p->b =3D 231;
  p->c =3D 42;
  p->d =3D (struct U) { "abcdefghijklmno" };
  p->e =3D 0xdeadbeef;
  p->f =3D (struct V) { "ABCDEFGHIJKLMNO" };
}

__attribute__((noipa)) void
bar (struct S *p)
{
  p->b =3D 231;
  p->c =3D 42;
  p->d =3D (struct U) { "abcdefghijklmno" };
  p->e =3D 0xdeadbeef;
  p->f =3D (struct V) { "ABCDEFGHIJKLMNO" };
  p->g =3D 12;
}

__attribute__((noipa)) void
baz (struct T *p)
{
  p->c =3D 42;
  p->d =3D (struct U) { "abcdefghijklmno" };
  p->e =3D 0xdeadbeef;
  p->f =3D (struct V) { "ABCDEFGHIJKLMNO" };
  p->g =3D 12;
}

int
main ()
{
  struct S s =3D {};
  struct T t =3D {};
  foo (&s);
  if (s.a !=3D 0 || s.b !=3D 231 || s.c !=3D 42
      || __builtin_memcmp (&s.d.c, "abcdefghijklmno", 16) || s.e !=3D 0xdea=
dbeef
      || __builtin_memcmp (&s.f.c, "ABCDEFGHIJKLMNO", 16) || s.g !=3D 0 || =
s.h !=3D
0)
    __builtin_abort ();
  __builtin_memset (&s, 0, sizeof (s));
  s.a =3D 7;
  s.g =3D 31;
  s.h =3D (1U << 27) - 1;
  foo (&s);
  if (s.a !=3D 7 || s.b !=3D 231 || s.c !=3D 42
      || __builtin_memcmp (&s.d.c, "abcdefghijklmno", 16) || s.e !=3D 0xdea=
dbeef
      || __builtin_memcmp (&s.f.c, "ABCDEFGHIJKLMNO", 16) || s.g !=3D 31 ||=
 s.h
!=3D (1U << 27) - 1)
    __builtin_abort ();
  __builtin_memset (&s, 0, sizeof (s));
  bar (&s);
  if (s.a !=3D 0 || s.b !=3D 231 || s.c !=3D 42
      || __builtin_memcmp (&s.d.c, "abcdefghijklmno", 16) || s.e !=3D 0xdea=
dbeef
      || __builtin_memcmp (&s.f.c, "ABCDEFGHIJKLMNO", 16) || s.g !=3D 12 ||=
 s.h
!=3D 0)
    __builtin_abort ();
  __builtin_memset (&s, 0, sizeof (s));
  s.a =3D 7;
  s.g =3D 31;
  s.h =3D (1U << 27) - 1;
  bar (&s);
  if (s.a !=3D 7 || s.b !=3D 231 || s.c !=3D 42
      || __builtin_memcmp (&s.d.c, "abcdefghijklmno", 16) || s.e !=3D 0xdea=
dbeef
      || __builtin_memcmp (&s.f.c, "ABCDEFGHIJKLMNO", 16) || s.g !=3D 12 ||=
 s.h
!=3D (1U << 27) - 1)
    __builtin_abort ();
  baz (&t);
  if (t.a !=3D 0 || t.b !=3D 0 || t.c !=3D 42
      || __builtin_memcmp (&t.d.c, "abcdefghijklmno", 16) || t.e !=3D 0xdea=
dbeef
      || __builtin_memcmp (&t.f.c, "ABCDEFGHIJKLMNO", 16) || t.g !=3D 12 ||=
 t.h
!=3D 0)
    __builtin_abort ();
  __builtin_memset (&s, 0, sizeof (s));
  t.a =3D 7;
  t.b =3D 255;
  t.g =3D 31;
  t.h =3D (1U << 27) - 1;
  baz (&t);
  if (t.a !=3D 7 || t.b !=3D 255 || t.c !=3D 42
      || __builtin_memcmp (&t.d.c, "abcdefghijklmno", 16) || t.e !=3D 0xdea=
dbeef
      || __builtin_memcmp (&t.f.c, "ABCDEFGHIJKLMNO", 16) || t.g !=3D 12 ||=
 t.h
!=3D (1U << 27) - 1)
    __builtin_abort ();
  return 0;
}

The 3 functions show the various issues, non-byte boundary start, non-byte
boundary start & end and non-byte boundary end.

We have lots of options to fix this, the question is what results in best c=
ode
generation.  One is to only do what has been probably originally meant, i.e.
only
allow merging of STRING_CSTs with byte aligned byte sized INTEGER_CST stores
(of course consecutive).  Unless we'd want to risk O(n^2) complexity, we mi=
ght
need a flag that is set in a group with non-byte sized or non-byte aligned
INTEGER_CST stores and if that flag is set, don't allow merging it with a l=
ater
STRING_CST.

Another option is keep the current (admittedly weird) behavior but say in
merged_store_group::apply_stores clear string_merging flag if start or width
aren't divisible by BITS_PER_UNIT.  After all, we already clear it also for
buf_size <=3D MOVE_MAX.

Or some combination of the above, I mean, e.g. in the original testcase as =
well
as
the https://gcc.gnu.org/bugzilla/show_bug.cgi?id=3D108498#c18 one the only
STRING_CST in there is small and even power of two sized (and aligned the
same); treating such 4 byte
store differently from INTEGER_CST store in how store merging behaves in th=
at
case is weird.  Of course, another example could be fairly long STRING_CST,
especially with
length not multiple of word size.  We don't really try to estimate that the
STRING_CST
store is actually more expensive than a simple scalar store, so if we decid=
e to
merge
such STRING_CST store with other stores in a group, it could be that we just
punt on it
because the multiple scalar stores look more expensive than the original st=
ores
which have one or more expensive STRING_CST stores.  And if we punt, it mig=
ht
be the case that
if we didn't merge the STRING_CST store in at first, say the group before t=
he
STRING_CST or after it or both could each store merge successfully and
beneficially.

That said, I think especially for backports to 11/12 and perhaps even for 13
the easiest will be the second option, clear string_merging on non-aligned
start/width.

Eric, your thoughts on this?=