From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: by sourceware.org (Postfix, from userid 48) id 54F8B3858CDB; Tue, 24 Jan 2023 15:35:14 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 54F8B3858CDB DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org; s=default; t=1674574514; bh=FAMbFCsz5AYsROGulzad6Qy1FvattEoZLzqeZ4wfRsA=; h=From:To:Subject:Date:In-Reply-To:References:From; b=La6m3LzzMLpdtCwhnQ1PuPYbVGiH08nj1Ao8gYrdpEMM/feSg1n6sL4TxqOWuKbUA S1y4eJf37b/Fj9OjJmp4YsiAzuiWhAVavtSRva770jUkOSpmhN4akEbGk6tZpzUGjW 4q6gFyWZKbzukDwEk36WBXoLxjkLuf0pMqSQrbWY= From: "jakub at gcc dot gnu.org" To: gcc-bugs@gcc.gnu.org Subject: [Bug tree-optimization/108498] [11/12/13 Regression] ppc64 big endian generates uninitialized reads with -fstore-merging Date: Tue, 24 Jan 2023 15:35:13 +0000 X-Bugzilla-Reason: CC X-Bugzilla-Type: changed X-Bugzilla-Watch-Reason: None X-Bugzilla-Product: gcc X-Bugzilla-Component: tree-optimization X-Bugzilla-Version: 12.0 X-Bugzilla-Keywords: wrong-code X-Bugzilla-Severity: normal X-Bugzilla-Who: jakub at gcc dot gnu.org X-Bugzilla-Status: ASSIGNED X-Bugzilla-Resolution: X-Bugzilla-Priority: P2 X-Bugzilla-Assigned-To: jakub at gcc dot gnu.org X-Bugzilla-Target-Milestone: 11.4 X-Bugzilla-Flags: X-Bugzilla-Changed-Fields: cc Message-ID: In-Reply-To: References: Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/ Auto-Submitted: auto-generated MIME-Version: 1.0 List-Id: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=3D108498 Jakub Jelinek changed: What |Removed |Added ---------------------------------------------------------------------------- CC| |ebotcazou at gcc dot gnu.o= rg --- Comment #19 from Jakub Jelinek --- So, the problem is the separate string_concatenate mode introduced in that r11-1810-ge362a897655e3b92949b65a2b53e00fb3ab8ded0 commit. This mode is turned on whenever at least one of the stores is using a STRING_CST and is successfully merged with the other stores. Now, this mode assumes that the start of the store is on a byte boundary and that the store ends on a byte boundary too, it simply creates a character a= rray with try_size / BITS_PER_UNIT and ignores any masking. It is true that merged_store_group::can_be_merged_into does various checks,= if string_concatenation mode is on, it will reject merging with MEM_REF, etc., plus /* STRING_CST is compatible with INTEGER_CST if no BIT_INSERT_EXPR. */ if (info->rhs_code =3D=3D STRING_CST && stores[0]->rhs_code =3D=3D INTEGER_CST && stores[0]->bitsize =3D=3D CHAR_BIT) return !bit_insertion; if (stores[0]->rhs_code =3D=3D STRING_CST && info->rhs_code =3D=3D INTEGER_CST && info->bitsize =3D=3D CHAR_BIT) return !bit_insertion; and also merged_store_group::do_merge turns this mode off if: /* But we cannot use it if we don't have consecutive stores. */ if (!consecutive) string_concatenation =3D false; and on if: /* We want to use concatenation if there is any string. */ if (info->rhs_code =3D=3D STRING_CST) { string_concatenation =3D true; gcc_assert (!bit_insertion); } Now, if the goal is to only do the string_concatenation mode if it is some = set single character stores and at least one or more STRING_CST stores, the abo= ve doesn't achieve that by far. The first cited hunk above from can_be_merged_into allows it if the first s= tore is char sized integral store and some later is STRING_CST. It doesn't chec= k if that first store is byte aligned, and doesn't check all the other stores in betw= een. All do_merge then does is if the stores aren't consecutive the string_concatenation mode is turned off. So e.g. we reject merging if ther= e is 2 byte INTEGER_CST store followed by consecutive STRING_CST, but allow 1, 2= and 4 byte INTEGER_CST stores followed by STRING_CST (all consecutive), or say 1 byte, then 3 bits, then STRING_CST etc. The second one is more strict, when the first store is STRING_CST, we punt = on any following INTEGER_CST store even when it is consecutive, if it isn't 1 = byte store. Consider e.g. on x86_64-linux with -O2: struct U { char c[16]; }; struct V { char c[16]; }; struct S { unsigned int a : 3, b : 8, c : 21; struct U d; unsigned int e; struct V f; unsigned int g : 5, h : 27; }; struct T { unsigned int a : 16, b : 8, c : 8; struct U d; unsigned int e; struct V f; unsigned int g : 5, h : 27; }; __attribute__((noipa)) void foo (struct S *p) { p->b =3D 231; p->c =3D 42; p->d =3D (struct U) { "abcdefghijklmno" }; p->e =3D 0xdeadbeef; p->f =3D (struct V) { "ABCDEFGHIJKLMNO" }; } __attribute__((noipa)) void bar (struct S *p) { p->b =3D 231; p->c =3D 42; p->d =3D (struct U) { "abcdefghijklmno" }; p->e =3D 0xdeadbeef; p->f =3D (struct V) { "ABCDEFGHIJKLMNO" }; p->g =3D 12; } __attribute__((noipa)) void baz (struct T *p) { p->c =3D 42; p->d =3D (struct U) { "abcdefghijklmno" }; p->e =3D 0xdeadbeef; p->f =3D (struct V) { "ABCDEFGHIJKLMNO" }; p->g =3D 12; } int main () { struct S s =3D {}; struct T t =3D {}; foo (&s); if (s.a !=3D 0 || s.b !=3D 231 || s.c !=3D 42 || __builtin_memcmp (&s.d.c, "abcdefghijklmno", 16) || s.e !=3D 0xdea= dbeef || __builtin_memcmp (&s.f.c, "ABCDEFGHIJKLMNO", 16) || s.g !=3D 0 || = s.h !=3D 0) __builtin_abort (); __builtin_memset (&s, 0, sizeof (s)); s.a =3D 7; s.g =3D 31; s.h =3D (1U << 27) - 1; foo (&s); if (s.a !=3D 7 || s.b !=3D 231 || s.c !=3D 42 || __builtin_memcmp (&s.d.c, "abcdefghijklmno", 16) || s.e !=3D 0xdea= dbeef || __builtin_memcmp (&s.f.c, "ABCDEFGHIJKLMNO", 16) || s.g !=3D 31 ||= s.h !=3D (1U << 27) - 1) __builtin_abort (); __builtin_memset (&s, 0, sizeof (s)); bar (&s); if (s.a !=3D 0 || s.b !=3D 231 || s.c !=3D 42 || __builtin_memcmp (&s.d.c, "abcdefghijklmno", 16) || s.e !=3D 0xdea= dbeef || __builtin_memcmp (&s.f.c, "ABCDEFGHIJKLMNO", 16) || s.g !=3D 12 ||= s.h !=3D 0) __builtin_abort (); __builtin_memset (&s, 0, sizeof (s)); s.a =3D 7; s.g =3D 31; s.h =3D (1U << 27) - 1; bar (&s); if (s.a !=3D 7 || s.b !=3D 231 || s.c !=3D 42 || __builtin_memcmp (&s.d.c, "abcdefghijklmno", 16) || s.e !=3D 0xdea= dbeef || __builtin_memcmp (&s.f.c, "ABCDEFGHIJKLMNO", 16) || s.g !=3D 12 ||= s.h !=3D (1U << 27) - 1) __builtin_abort (); baz (&t); if (t.a !=3D 0 || t.b !=3D 0 || t.c !=3D 42 || __builtin_memcmp (&t.d.c, "abcdefghijklmno", 16) || t.e !=3D 0xdea= dbeef || __builtin_memcmp (&t.f.c, "ABCDEFGHIJKLMNO", 16) || t.g !=3D 12 ||= t.h !=3D 0) __builtin_abort (); __builtin_memset (&s, 0, sizeof (s)); t.a =3D 7; t.b =3D 255; t.g =3D 31; t.h =3D (1U << 27) - 1; baz (&t); if (t.a !=3D 7 || t.b !=3D 255 || t.c !=3D 42 || __builtin_memcmp (&t.d.c, "abcdefghijklmno", 16) || t.e !=3D 0xdea= dbeef || __builtin_memcmp (&t.f.c, "ABCDEFGHIJKLMNO", 16) || t.g !=3D 12 ||= t.h !=3D (1U << 27) - 1) __builtin_abort (); return 0; } The 3 functions show the various issues, non-byte boundary start, non-byte boundary start & end and non-byte boundary end. We have lots of options to fix this, the question is what results in best c= ode generation. One is to only do what has been probably originally meant, i.e. only allow merging of STRING_CSTs with byte aligned byte sized INTEGER_CST stores (of course consecutive). Unless we'd want to risk O(n^2) complexity, we mi= ght need a flag that is set in a group with non-byte sized or non-byte aligned INTEGER_CST stores and if that flag is set, don't allow merging it with a l= ater STRING_CST. Another option is keep the current (admittedly weird) behavior but say in merged_store_group::apply_stores clear string_merging flag if start or width aren't divisible by BITS_PER_UNIT. After all, we already clear it also for buf_size <=3D MOVE_MAX. Or some combination of the above, I mean, e.g. in the original testcase as = well as the https://gcc.gnu.org/bugzilla/show_bug.cgi?id=3D108498#c18 one the only STRING_CST in there is small and even power of two sized (and aligned the same); treating such 4 byte store differently from INTEGER_CST store in how store merging behaves in th= at case is weird. Of course, another example could be fairly long STRING_CST, especially with length not multiple of word size. We don't really try to estimate that the STRING_CST store is actually more expensive than a simple scalar store, so if we decid= e to merge such STRING_CST store with other stores in a group, it could be that we just punt on it because the multiple scalar stores look more expensive than the original st= ores which have one or more expensive STRING_CST stores. And if we punt, it mig= ht be the case that if we didn't merge the STRING_CST store in at first, say the group before t= he STRING_CST or after it or both could each store merge successfully and beneficially. That said, I think especially for backports to 11/12 and perhaps even for 13 the easiest will be the second option, clear string_merging on non-aligned start/width. Eric, your thoughts on this?=