* Re: [PATCH][RFC] middle-end/110237 - wrong MEM_ATTRs for partial loads/stores [not found] <20230621074951.F3C3C3858433@sourceware.org> @ 2023-06-21 15:29 ` Jeff Law 2023-06-22 6:39 ` Richard Biener 0 siblings, 1 reply; 15+ messages in thread From: Jeff Law @ 2023-06-21 15:29 UTC (permalink / raw) To: Richard Biener, gcc-patches On 6/21/23 01:49, Richard Biener via Gcc-patches wrote: > The following addresses a miscompilation by RTL scheduling related > to the representation of masked stores. For that we have > > (insn 38 35 39 3 (set (mem:V16SI (plus:DI (reg:DI 40 r12 [orig:90 _22 ] [90]) > (const:DI (plus:DI (symbol_ref:DI ("b") [flags 0x2] <var_decl 0x7ffff6e28d80 b>) > (const_int -4 [0xfffffffffffffffc])))) [1 MEM <vector(16) int> [(int *)vectp_b.12_28]+0 S64 A32]) > (vec_merge:V16SI (reg:V16SI 20 xmm0 [118]) > (mem:V16SI (plus:DI (reg:DI 40 r12 [orig:90 _22 ] [90]) > (const:DI (plus:DI (symbol_ref:DI ("b") [flags 0x2] <var_decl 0x7ffff6e28d80 b>) > (const_int -4 [0xfffffffffffffffc])))) [1 MEM <vector(16) int> [(int *)vectp_b.12_28]+0 S64 A32]) > > and specifically the memory attributes > > [1 MEM <vector(16) int> [(int *)vectp_b.12_28]+0 S64 A32] > > are problematic. They tell us the instruction stores and reads a full > vector which it if course does not. There isn't any good MEM_EXPR > we can use here (we lack a way to just specify a pointer and restrict > info for example), and since the MEMs have a vector mode it's > difficult in general as passes do not need to look at the memory > attributes at all. > > The easiest way to avoid running into the alias analysis problem is > to scrap the MEM_EXPR when we expand the internal functions for > partial loads/stores. That avoids the disambiguation we run into > which is realizing that we store to an object of less size as > the size of the mode we appear to store. > > After the patch we see just > > [1 S64 A32] > > so we preserve the alias set, the alignment and the size (the size > is redundant if the MEM insn't BLKmode). That's still not good > in case the RTL alias oracle would implement the same > disambiguation but it fends off the gimple one. > > This fixes gcc.dg/torture/pr58955-2.c when built with AVX512 > and --param=vect-partial-vector-usage=1. > > On the MEM_EXPR side we could use a CALL_EXPR and on the RTL > side we might instead want to use a BLKmode MEM? Any better > ideas here? I'd expect that using BLKmode will fend off the RTL aliasing code. jeff ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH][RFC] middle-end/110237 - wrong MEM_ATTRs for partial loads/stores 2023-06-21 15:29 ` [PATCH][RFC] middle-end/110237 - wrong MEM_ATTRs for partial loads/stores Jeff Law @ 2023-06-22 6:39 ` Richard Biener 2023-06-24 14:32 ` Jeff Law 0 siblings, 1 reply; 15+ messages in thread From: Richard Biener @ 2023-06-22 6:39 UTC (permalink / raw) To: Jeff Law; +Cc: gcc-patches On Wed, 21 Jun 2023, Jeff Law wrote: > > > On 6/21/23 01:49, Richard Biener via Gcc-patches wrote: > > The following addresses a miscompilation by RTL scheduling related > > to the representation of masked stores. For that we have > > > > (insn 38 35 39 3 (set (mem:V16SI (plus:DI (reg:DI 40 r12 [orig:90 _22 ] > > [90]) > > (const:DI (plus:DI (symbol_ref:DI ("b") [flags 0x2] > > <var_decl 0x7ffff6e28d80 b>) > > (const_int -4 [0xfffffffffffffffc])))) [1 MEM > > <vector(16) int> [(int *)vectp_b.12_28]+0 S64 A32]) > > (vec_merge:V16SI (reg:V16SI 20 xmm0 [118]) > > (mem:V16SI (plus:DI (reg:DI 40 r12 [orig:90 _22 ] [90]) > > (const:DI (plus:DI (symbol_ref:DI ("b") [flags 0x2] > > <var_decl 0x7ffff6e28d80 b>) > > (const_int -4 [0xfffffffffffffffc])))) [1 MEM > > <vector(16) int> [(int *)vectp_b.12_28]+0 S64 > > A32]) > > > > and specifically the memory attributes > > > > [1 MEM <vector(16) int> [(int *)vectp_b.12_28]+0 S64 A32] > > > > are problematic. They tell us the instruction stores and reads a full > > vector which it if course does not. There isn't any good MEM_EXPR > > we can use here (we lack a way to just specify a pointer and restrict > > info for example), and since the MEMs have a vector mode it's > > difficult in general as passes do not need to look at the memory > > attributes at all. > > > > The easiest way to avoid running into the alias analysis problem is > > to scrap the MEM_EXPR when we expand the internal functions for > > partial loads/stores. That avoids the disambiguation we run into > > which is realizing that we store to an object of less size as > > the size of the mode we appear to store. > > > > After the patch we see just > > > > [1 S64 A32] > > > > so we preserve the alias set, the alignment and the size (the size > > is redundant if the MEM insn't BLKmode). That's still not good > > in case the RTL alias oracle would implement the same > > disambiguation but it fends off the gimple one. > > > > This fixes gcc.dg/torture/pr58955-2.c when built with AVX512 > > and --param=vect-partial-vector-usage=1. > > > > On the MEM_EXPR side we could use a CALL_EXPR and on the RTL > > side we might instead want to use a BLKmode MEM? Any better > > ideas here? > I'd expect that using BLKmode will fend off the RTL aliasing code. I suspect there's no way to specify the desired semantics? OTOH code that looks at the MEM operand only and not the insn (which should have some UNSPEC wrapped) needs to be conservative, so maybe the alias code shouldn't assume that a (mem:V16SI ..) actually performs an access of the size of V16SI at the specified location? Anyway, any opinion on the actual patch? It's enough to fix the observed miscompilation. Thanks, Richard. ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH][RFC] middle-end/110237 - wrong MEM_ATTRs for partial loads/stores 2023-06-22 6:39 ` Richard Biener @ 2023-06-24 14:32 ` Jeff Law 0 siblings, 0 replies; 15+ messages in thread From: Jeff Law @ 2023-06-24 14:32 UTC (permalink / raw) To: Richard Biener; +Cc: gcc-patches On 6/22/23 00:39, Richard Biener wrote: > > I suspect there's no way to specify the desired semantics? OTOH > code that looks at the MEM operand only and not the insn (which > should have some UNSPEC wrapped) needs to be conservative, so maybe > the alias code shouldn't assume that a (mem:V16SI ..) actually > performs an access of the size of V16SI at the specified location? I'm not aware of a way to express the semantics fully right now. We'd need some way to indicate that the MEM is a partial and pass along the actual length. We could do both through MEM_ATTRS with some work. For example we could declare that for vector modes full semantic information is carried in the MEM_ATTRS rather than by the mode itself. So it falls into a space between how we currently think of something like V16SI and BLK. The mode specifies a maximum size and how to interpret the elements. But actual size and perhaps mask info would be found in MEM_ATTRS. jeff ^ permalink raw reply [flat|nested] 15+ messages in thread
[parent not found: <20230621075019.7CA813858033@sourceware.org>]
* Re: [PATCH][RFC] middle-end/110237 - wrong MEM_ATTRs for partial loads/stores [not found] <20230621075019.7CA813858033@sourceware.org> @ 2023-11-27 12:39 ` Robin Dapp 2023-11-27 15:45 ` Jeff Law 0 siblings, 1 reply; 15+ messages in thread From: Robin Dapp @ 2023-11-27 12:39 UTC (permalink / raw) To: Richard Biener, gcc-patches; +Cc: rdapp.gcc, Li, Pan2 > The easiest way to avoid running into the alias analysis problem is > to scrap the MEM_EXPR when we expand the internal functions for > partial loads/stores. That avoids the disambiguation we run into > which is realizing that we store to an object of less size as > the size of the mode we appear to store. > > After the patch we see just > > [1 S64 A32] > > so we preserve the alias set, the alignment and the size (the size > is redundant if the MEM insn't BLKmode). That's still not good > in case the RTL alias oracle would implement the same > disambiguation but it fends off the gimple one. > > This fixes gcc.dg/torture/pr58955-2.c when built with AVX512 > and --param=vect-partial-vector-usage=1. On riscv we're seeing a similar problem across the testsuite and several execution failures as a result. In the case I looked at we move a scalar load upwards over a partial store that aliases the load. I independently arrived at the spot mentioned in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110237#c4 before knowing about the PR. I can confirm that your RFC patch fixes at least two of the failures, I haven't checked the others but very likely they are similar. Regards Robin ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH][RFC] middle-end/110237 - wrong MEM_ATTRs for partial loads/stores 2023-11-27 12:39 ` Robin Dapp @ 2023-11-27 15:45 ` Jeff Law 2023-11-28 7:50 ` Richard Biener 0 siblings, 1 reply; 15+ messages in thread From: Jeff Law @ 2023-11-27 15:45 UTC (permalink / raw) To: Robin Dapp, Richard Biener, gcc-patches; +Cc: Li, Pan2 On 11/27/23 05:39, Robin Dapp wrote: >> The easiest way to avoid running into the alias analysis problem is >> to scrap the MEM_EXPR when we expand the internal functions for >> partial loads/stores. That avoids the disambiguation we run into >> which is realizing that we store to an object of less size as >> the size of the mode we appear to store. >> >> After the patch we see just >> >> [1 S64 A32] >> >> so we preserve the alias set, the alignment and the size (the size >> is redundant if the MEM insn't BLKmode). That's still not good >> in case the RTL alias oracle would implement the same >> disambiguation but it fends off the gimple one. >> >> This fixes gcc.dg/torture/pr58955-2.c when built with AVX512 >> and --param=vect-partial-vector-usage=1. > > On riscv we're seeing a similar problem across the testsuite > and several execution failures as a result. In the case I > looked at we move a scalar load upwards over a partial store > that aliases the load. > > I independently arrived at the spot mentioned in > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110237#c4 > before knowing about the PR. > > I can confirm that your RFC patch fixes at least two of the > failures, I haven't checked the others but very likely > they are similar. FWIW, it should always be safe to ignore the memory attributes. So if there's a reasonable condition here, then we can use it and just ignore the attribute. Does the attribute on a partial load/store indicate the potential load/store size or does it indicate the actual known load/store size. If the former, then we probably need to treat it as a may-read/may-write kind of reference. Jeff ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH][RFC] middle-end/110237 - wrong MEM_ATTRs for partial loads/stores 2023-11-27 15:45 ` Jeff Law @ 2023-11-28 7:50 ` Richard Biener 2023-11-28 10:31 ` Richard Sandiford 2023-11-28 15:00 ` Jeff Law 0 siblings, 2 replies; 15+ messages in thread From: Richard Biener @ 2023-11-28 7:50 UTC (permalink / raw) To: Jeff Law; +Cc: Robin Dapp, gcc-patches, Li, Pan2 On Mon, 27 Nov 2023, Jeff Law wrote: > > > On 11/27/23 05:39, Robin Dapp wrote: > >> The easiest way to avoid running into the alias analysis problem is > >> to scrap the MEM_EXPR when we expand the internal functions for > >> partial loads/stores. That avoids the disambiguation we run into > >> which is realizing that we store to an object of less size as > >> the size of the mode we appear to store. > >> > >> After the patch we see just > >> > >> [1 S64 A32] > >> > >> so we preserve the alias set, the alignment and the size (the size > >> is redundant if the MEM insn't BLKmode). That's still not good > >> in case the RTL alias oracle would implement the same > >> disambiguation but it fends off the gimple one. > >> > >> This fixes gcc.dg/torture/pr58955-2.c when built with AVX512 > >> and --param=vect-partial-vector-usage=1. > > > > On riscv we're seeing a similar problem across the testsuite > > and several execution failures as a result. In the case I > > looked at we move a scalar load upwards over a partial store > > that aliases the load. > > > > I independently arrived at the spot mentioned in > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110237#c4 > > before knowing about the PR. > > > > I can confirm that your RFC patch fixes at least two of the > > failures, I haven't checked the others but very likely > > they are similar. > FWIW, it should always be safe to ignore the memory attributes. So if > there's a reasonable condition here, then we can use it and just ignore the > attribute. > > Does the attribute on a partial load/store indicate the potential load/store > size or does it indicate the actual known load/store size. If the former, then > we probably need to treat it as a may-read/may-write kind of reference. There's no way to distinguish a partial vs. non-partial MEM on RTL and while without the bogus MEM_ATTR the alias oracle pieces that miscompiled the original case are fended off we still see the load/store as full given they have a mode with a size - that for example means that DSE can elide a previous store to a masked part. Eventually that's fended off by using an UNSPEC, but whether the RTL IL has the correct semantics is questionable. That said, I did propose scrapping the MEM_EXPR which I think is the correct thing to do unless we want to put a CALL_EXPR into it (nothing would use that at the moment) or re-do MEM_EXPR and instead have an ao_ref (or sth slightly more complete) instead of the current MEM_ATTRs - but that would be a lot of work. This leaves the question wrt. semantics of for example x86 mask_store: (insn 23 22 24 5 (set (mem:V4DF (plus:DI (reg/v/f:DI 106 [ x ]) (reg:DI 101 [ ivtmp.15 ])) [2 MEM <vector(4) double> [(double *)x_11(D) + ivtmp.15_33 * 1]+0 S32 A64]) (unspec:V4DF [ (reg:V4DI 104 [ mask__16.8 ]) (reg:V4DF 105 [ vect_cst__42 ]) (mem:V4DF (plus:DI (reg/v/f:DI 106 [ x ]) (reg:DI 101 [ ivtmp.15 ])) [2 MEM <vector(4) double> [(double *)x_11(D) + ivtmp.15_33 * 1]+0 S32 A64]) ] UNSPEC_MASKMOV)) "t.c":5:12 8523 {avx_maskstorepd256} (nil)) it uses a read-modify-write which makes it safe for DSE. mask_load looks like (insn 28 27 29 6 (set (reg:V4DF 115 [ vect__7.11 ]) (unspec:V4DF [ (reg:V4DI 114 [ mask__8.8 ]) (mem:V4DF (plus:DI (reg/v/f:DI 118 [ val ]) (reg:DI 103 [ ivtmp.29 ])) [2 MEM <vector(4) double> [(double *)val_13(D) + ivtmp.29_22 * 1]+0 S32 A64]) ] UNSPEC_MASKMOV)) "t.c":5:17 8515 {avx_maskloadpd256} (nil)) both have (as operand of the UNSPEC) a MEM with V4DFmode (and a MEM_EXPR with a similarly bougs MEM_EXPR) indicating the loads are _not_ partial. That means the disambiguation against a store to an object that's smaller than V4DF is still possible. Setting MEM_SIZE to UNKNOWN doesn't help - that just asks to look at the mode. As discussed using a BLKmode MEM _might_ be a way out but I didn't try what will happen then (patterns would need to be adjusted I guess). That said, I'm happy to commit the partial fix, scrapping the bogus MEM_EXPRs. OK for that? Thanks, Richard. ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH][RFC] middle-end/110237 - wrong MEM_ATTRs for partial loads/stores 2023-11-28 7:50 ` Richard Biener @ 2023-11-28 10:31 ` Richard Sandiford 2023-11-28 11:21 ` Richard Biener 2023-11-28 15:00 ` Jeff Law 1 sibling, 1 reply; 15+ messages in thread From: Richard Sandiford @ 2023-11-28 10:31 UTC (permalink / raw) To: Richard Biener; +Cc: Jeff Law, Robin Dapp, gcc-patches, Li, Pan2 Richard Biener <rguenther@suse.de> writes: > On Mon, 27 Nov 2023, Jeff Law wrote: > >> >> >> On 11/27/23 05:39, Robin Dapp wrote: >> >> The easiest way to avoid running into the alias analysis problem is >> >> to scrap the MEM_EXPR when we expand the internal functions for >> >> partial loads/stores. That avoids the disambiguation we run into >> >> which is realizing that we store to an object of less size as >> >> the size of the mode we appear to store. >> >> >> >> After the patch we see just >> >> >> >> [1 S64 A32] >> >> >> >> so we preserve the alias set, the alignment and the size (the size >> >> is redundant if the MEM insn't BLKmode). That's still not good >> >> in case the RTL alias oracle would implement the same >> >> disambiguation but it fends off the gimple one. >> >> >> >> This fixes gcc.dg/torture/pr58955-2.c when built with AVX512 >> >> and --param=vect-partial-vector-usage=1. >> > >> > On riscv we're seeing a similar problem across the testsuite >> > and several execution failures as a result. In the case I >> > looked at we move a scalar load upwards over a partial store >> > that aliases the load. >> > >> > I independently arrived at the spot mentioned in >> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110237#c4 >> > before knowing about the PR. >> > >> > I can confirm that your RFC patch fixes at least two of the >> > failures, I haven't checked the others but very likely >> > they are similar. >> FWIW, it should always be safe to ignore the memory attributes. So if >> there's a reasonable condition here, then we can use it and just ignore the >> attribute. >> >> Does the attribute on a partial load/store indicate the potential load/store >> size or does it indicate the actual known load/store size. If the former, then >> we probably need to treat it as a may-read/may-write kind of reference. > > There's no way to distinguish a partial vs. non-partial MEM on RTL and > while without the bogus MEM_ATTR the alias oracle pieces that > miscompiled the original case are fended off we still see the load/store > as full given they have a mode with a size - that for example means > that DSE can elide a previous store to a masked part. Eventually > that's fended off by using an UNSPEC, but whether the RTL IL has > the correct semantics is questionable. > > That said, I did propose scrapping the MEM_EXPR which I think is > the correct thing to do unless we want to put a CALL_EXPR into it > (nothing would use that at the moment) or re-do MEM_EXPR and instead > have an ao_ref (or sth slightly more complete) instead of the current > MEM_ATTRs - but that would be a lot of work. > > This leaves the question wrt. semantics of for example x86 mask_store: > > (insn 23 22 24 5 (set (mem:V4DF (plus:DI (reg/v/f:DI 106 [ x ]) > (reg:DI 101 [ ivtmp.15 ])) [2 MEM <vector(4) double> > [(double *)x_11(D) + ivtmp.15_33 * 1]+0 S32 A64]) > (unspec:V4DF [ > (reg:V4DI 104 [ mask__16.8 ]) > (reg:V4DF 105 [ vect_cst__42 ]) > (mem:V4DF (plus:DI (reg/v/f:DI 106 [ x ]) > (reg:DI 101 [ ivtmp.15 ])) [2 MEM <vector(4) > double> [(double *)x_11(D) + ivtmp.15_33 * 1]+0 S32 A64]) > ] UNSPEC_MASKMOV)) "t.c":5:12 8523 {avx_maskstorepd256} > (nil)) > > it uses a read-modify-write which makes it safe for DSE. mask_load > looks like > > (insn 28 27 29 6 (set (reg:V4DF 115 [ vect__7.11 ]) > (unspec:V4DF [ > (reg:V4DI 114 [ mask__8.8 ]) > (mem:V4DF (plus:DI (reg/v/f:DI 118 [ val ]) > (reg:DI 103 [ ivtmp.29 ])) [2 MEM <vector(4) > double> [(double *)val_13(D) + ivtmp.29_22 * 1]+0 S32 A64]) > ] UNSPEC_MASKMOV)) "t.c":5:17 8515 {avx_maskloadpd256} > (nil)) > > both have (as operand of the UNSPEC) a MEM with V4DFmode (and a > MEM_EXPR with a similarly bougs MEM_EXPR) indicating the loads > are _not_ partial. That means the disambiguation against a store > to an object that's smaller than V4DF is still possible. > Setting MEM_SIZE to UNKNOWN doesn't help - that just asks to look > at the mode. As discussed using a BLKmode MEM _might_ be a way > out but I didn't try what will happen then (patterns would need to > be adjusted I guess). > > That said, I'm happy to commit the partial fix, scrapping the > bogus MEM_EXPRs. > > OK for that? Could we restrict it to cases where the offset+size might overstep the bounds? I.e. do the equivalent of: /* Drop the object if the new right end is not within its bounds. */ if (adjust_object && maybe_gt (offset + size, attrs.size)) { attrs.expr = NULL_TREE; attrs.alias = 0; } (from adjust_address_1, although it might be difficult to reuse that code in this context). Richard ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH][RFC] middle-end/110237 - wrong MEM_ATTRs for partial loads/stores 2023-11-28 10:31 ` Richard Sandiford @ 2023-11-28 11:21 ` Richard Biener 2023-11-28 11:32 ` Richard Sandiford 0 siblings, 1 reply; 15+ messages in thread From: Richard Biener @ 2023-11-28 11:21 UTC (permalink / raw) To: Richard Sandiford; +Cc: Jeff Law, Robin Dapp, gcc-patches, Li, Pan2 On Tue, 28 Nov 2023, Richard Sandiford wrote: > Richard Biener <rguenther@suse.de> writes: > > On Mon, 27 Nov 2023, Jeff Law wrote: > > > >> > >> > >> On 11/27/23 05:39, Robin Dapp wrote: > >> >> The easiest way to avoid running into the alias analysis problem is > >> >> to scrap the MEM_EXPR when we expand the internal functions for > >> >> partial loads/stores. That avoids the disambiguation we run into > >> >> which is realizing that we store to an object of less size as > >> >> the size of the mode we appear to store. > >> >> > >> >> After the patch we see just > >> >> > >> >> [1 S64 A32] > >> >> > >> >> so we preserve the alias set, the alignment and the size (the size > >> >> is redundant if the MEM insn't BLKmode). That's still not good > >> >> in case the RTL alias oracle would implement the same > >> >> disambiguation but it fends off the gimple one. > >> >> > >> >> This fixes gcc.dg/torture/pr58955-2.c when built with AVX512 > >> >> and --param=vect-partial-vector-usage=1. > >> > > >> > On riscv we're seeing a similar problem across the testsuite > >> > and several execution failures as a result. In the case I > >> > looked at we move a scalar load upwards over a partial store > >> > that aliases the load. > >> > > >> > I independently arrived at the spot mentioned in > >> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110237#c4 > >> > before knowing about the PR. > >> > > >> > I can confirm that your RFC patch fixes at least two of the > >> > failures, I haven't checked the others but very likely > >> > they are similar. > >> FWIW, it should always be safe to ignore the memory attributes. So if > >> there's a reasonable condition here, then we can use it and just ignore the > >> attribute. > >> > >> Does the attribute on a partial load/store indicate the potential load/store > >> size or does it indicate the actual known load/store size. If the former, then > >> we probably need to treat it as a may-read/may-write kind of reference. > > > > There's no way to distinguish a partial vs. non-partial MEM on RTL and > > while without the bogus MEM_ATTR the alias oracle pieces that > > miscompiled the original case are fended off we still see the load/store > > as full given they have a mode with a size - that for example means > > that DSE can elide a previous store to a masked part. Eventually > > that's fended off by using an UNSPEC, but whether the RTL IL has > > the correct semantics is questionable. > > > > That said, I did propose scrapping the MEM_EXPR which I think is > > the correct thing to do unless we want to put a CALL_EXPR into it > > (nothing would use that at the moment) or re-do MEM_EXPR and instead > > have an ao_ref (or sth slightly more complete) instead of the current > > MEM_ATTRs - but that would be a lot of work. > > > > This leaves the question wrt. semantics of for example x86 mask_store: > > > > (insn 23 22 24 5 (set (mem:V4DF (plus:DI (reg/v/f:DI 106 [ x ]) > > (reg:DI 101 [ ivtmp.15 ])) [2 MEM <vector(4) double> > > [(double *)x_11(D) + ivtmp.15_33 * 1]+0 S32 A64]) > > (unspec:V4DF [ > > (reg:V4DI 104 [ mask__16.8 ]) > > (reg:V4DF 105 [ vect_cst__42 ]) > > (mem:V4DF (plus:DI (reg/v/f:DI 106 [ x ]) > > (reg:DI 101 [ ivtmp.15 ])) [2 MEM <vector(4) > > double> [(double *)x_11(D) + ivtmp.15_33 * 1]+0 S32 A64]) > > ] UNSPEC_MASKMOV)) "t.c":5:12 8523 {avx_maskstorepd256} > > (nil)) > > > > it uses a read-modify-write which makes it safe for DSE. mask_load > > looks like > > > > (insn 28 27 29 6 (set (reg:V4DF 115 [ vect__7.11 ]) > > (unspec:V4DF [ > > (reg:V4DI 114 [ mask__8.8 ]) > > (mem:V4DF (plus:DI (reg/v/f:DI 118 [ val ]) > > (reg:DI 103 [ ivtmp.29 ])) [2 MEM <vector(4) > > double> [(double *)val_13(D) + ivtmp.29_22 * 1]+0 S32 A64]) > > ] UNSPEC_MASKMOV)) "t.c":5:17 8515 {avx_maskloadpd256} > > (nil)) > > > > both have (as operand of the UNSPEC) a MEM with V4DFmode (and a > > MEM_EXPR with a similarly bougs MEM_EXPR) indicating the loads > > are _not_ partial. That means the disambiguation against a store > > to an object that's smaller than V4DF is still possible. > > Setting MEM_SIZE to UNKNOWN doesn't help - that just asks to look > > at the mode. As discussed using a BLKmode MEM _might_ be a way > > out but I didn't try what will happen then (patterns would need to > > be adjusted I guess). > > > > That said, I'm happy to commit the partial fix, scrapping the > > bogus MEM_EXPRs. > > > > OK for that? > > Could we restrict it to cases where the offset+size might overstep the > bounds? I.e. do the equivalent of: > > /* Drop the object if the new right end is not within its bounds. */ > if (adjust_object && maybe_gt (offset + size, attrs.size)) > { > attrs.expr = NULL_TREE; > attrs.alias = 0; > } > > (from adjust_address_1, although it might be difficult to reuse that > code in this context). Well, the MEM_EXPR would still not correctly represent the performed access. But yes, if the base decl is visible and the offset is all constant we might be able to prove the access is not out of bounds. But that doesn't seem to be the most likely case? Note we have lost restrict info already when producing the .MASK_LOAD/STORE and TBAA info is also preserved without the MEM_EXPR. The MEM_EXPR itself adds points-to info only (besides the wrong access size). With ao_ref we also don't have a way to carry only points-to info we also have to build a fake access (which then also carries restrict info). Richard. ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH][RFC] middle-end/110237 - wrong MEM_ATTRs for partial loads/stores 2023-11-28 11:21 ` Richard Biener @ 2023-11-28 11:32 ` Richard Sandiford 2023-11-28 12:17 ` Richard Biener 0 siblings, 1 reply; 15+ messages in thread From: Richard Sandiford @ 2023-11-28 11:32 UTC (permalink / raw) To: Richard Biener; +Cc: Jeff Law, Robin Dapp, gcc-patches, Li, Pan2 Richard Biener <rguenther@suse.de> writes: > On Tue, 28 Nov 2023, Richard Sandiford wrote: > >> Richard Biener <rguenther@suse.de> writes: >> > On Mon, 27 Nov 2023, Jeff Law wrote: >> > >> >> >> >> >> >> On 11/27/23 05:39, Robin Dapp wrote: >> >> >> The easiest way to avoid running into the alias analysis problem is >> >> >> to scrap the MEM_EXPR when we expand the internal functions for >> >> >> partial loads/stores. That avoids the disambiguation we run into >> >> >> which is realizing that we store to an object of less size as >> >> >> the size of the mode we appear to store. >> >> >> >> >> >> After the patch we see just >> >> >> >> >> >> [1 S64 A32] >> >> >> >> >> >> so we preserve the alias set, the alignment and the size (the size >> >> >> is redundant if the MEM insn't BLKmode). That's still not good >> >> >> in case the RTL alias oracle would implement the same >> >> >> disambiguation but it fends off the gimple one. >> >> >> >> >> >> This fixes gcc.dg/torture/pr58955-2.c when built with AVX512 >> >> >> and --param=vect-partial-vector-usage=1. >> >> > >> >> > On riscv we're seeing a similar problem across the testsuite >> >> > and several execution failures as a result. In the case I >> >> > looked at we move a scalar load upwards over a partial store >> >> > that aliases the load. >> >> > >> >> > I independently arrived at the spot mentioned in >> >> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110237#c4 >> >> > before knowing about the PR. >> >> > >> >> > I can confirm that your RFC patch fixes at least two of the >> >> > failures, I haven't checked the others but very likely >> >> > they are similar. >> >> FWIW, it should always be safe to ignore the memory attributes. So if >> >> there's a reasonable condition here, then we can use it and just ignore the >> >> attribute. >> >> >> >> Does the attribute on a partial load/store indicate the potential load/store >> >> size or does it indicate the actual known load/store size. If the former, then >> >> we probably need to treat it as a may-read/may-write kind of reference. >> > >> > There's no way to distinguish a partial vs. non-partial MEM on RTL and >> > while without the bogus MEM_ATTR the alias oracle pieces that >> > miscompiled the original case are fended off we still see the load/store >> > as full given they have a mode with a size - that for example means >> > that DSE can elide a previous store to a masked part. Eventually >> > that's fended off by using an UNSPEC, but whether the RTL IL has >> > the correct semantics is questionable. >> > >> > That said, I did propose scrapping the MEM_EXPR which I think is >> > the correct thing to do unless we want to put a CALL_EXPR into it >> > (nothing would use that at the moment) or re-do MEM_EXPR and instead >> > have an ao_ref (or sth slightly more complete) instead of the current >> > MEM_ATTRs - but that would be a lot of work. >> > >> > This leaves the question wrt. semantics of for example x86 mask_store: >> > >> > (insn 23 22 24 5 (set (mem:V4DF (plus:DI (reg/v/f:DI 106 [ x ]) >> > (reg:DI 101 [ ivtmp.15 ])) [2 MEM <vector(4) double> >> > [(double *)x_11(D) + ivtmp.15_33 * 1]+0 S32 A64]) >> > (unspec:V4DF [ >> > (reg:V4DI 104 [ mask__16.8 ]) >> > (reg:V4DF 105 [ vect_cst__42 ]) >> > (mem:V4DF (plus:DI (reg/v/f:DI 106 [ x ]) >> > (reg:DI 101 [ ivtmp.15 ])) [2 MEM <vector(4) >> > double> [(double *)x_11(D) + ivtmp.15_33 * 1]+0 S32 A64]) >> > ] UNSPEC_MASKMOV)) "t.c":5:12 8523 {avx_maskstorepd256} >> > (nil)) >> > >> > it uses a read-modify-write which makes it safe for DSE. mask_load >> > looks like >> > >> > (insn 28 27 29 6 (set (reg:V4DF 115 [ vect__7.11 ]) >> > (unspec:V4DF [ >> > (reg:V4DI 114 [ mask__8.8 ]) >> > (mem:V4DF (plus:DI (reg/v/f:DI 118 [ val ]) >> > (reg:DI 103 [ ivtmp.29 ])) [2 MEM <vector(4) >> > double> [(double *)val_13(D) + ivtmp.29_22 * 1]+0 S32 A64]) >> > ] UNSPEC_MASKMOV)) "t.c":5:17 8515 {avx_maskloadpd256} >> > (nil)) >> > >> > both have (as operand of the UNSPEC) a MEM with V4DFmode (and a >> > MEM_EXPR with a similarly bougs MEM_EXPR) indicating the loads >> > are _not_ partial. That means the disambiguation against a store >> > to an object that's smaller than V4DF is still possible. >> > Setting MEM_SIZE to UNKNOWN doesn't help - that just asks to look >> > at the mode. As discussed using a BLKmode MEM _might_ be a way >> > out but I didn't try what will happen then (patterns would need to >> > be adjusted I guess). >> > >> > That said, I'm happy to commit the partial fix, scrapping the >> > bogus MEM_EXPRs. >> > >> > OK for that? >> >> Could we restrict it to cases where the offset+size might overstep the >> bounds? I.e. do the equivalent of: >> >> /* Drop the object if the new right end is not within its bounds. */ >> if (adjust_object && maybe_gt (offset + size, attrs.size)) >> { >> attrs.expr = NULL_TREE; >> attrs.alias = 0; >> } >> >> (from adjust_address_1, although it might be difficult to reuse that >> code in this context). > > Well, the MEM_EXPR would still not correctly represent the performed > access. But yes, if the base decl is visible and the offset is > all constant we might be able to prove the access is not out of bounds. > But that doesn't seem to be the most likely case? Agree it's not the most likely case. Just thought it was still a distinction worth making, since it's one we already make elsewhere. It would also make the condition more self-documenting, and make it less likely that we're using this to cover up other problems. > Note we have lost restrict info already when producing the > .MASK_LOAD/STORE and TBAA info is also preserved without the > MEM_EXPR. The MEM_EXPR itself adds points-to info only > (besides the wrong access size). With ao_ref we also don't > have a way to carry only points-to info we also have to > build a fake access (which then also carries restrict info). Understood. It's just that one valid use of masked accesses is to optimise something like: int a[1024]; ... for (int i = 0; i < 1024; ++i) a[i] = foo ? bar : a[i]; ... so that no load takes place and the store is masked. In that context it seems like the access really is accurately described as a RMW access, and points-to info can be used in the same way as it would be for an unmasked load and an unmasked store. It would be good to retain the same level of disambiguation, however limited that is in practice. Thanks, Richard ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH][RFC] middle-end/110237 - wrong MEM_ATTRs for partial loads/stores 2023-11-28 11:32 ` Richard Sandiford @ 2023-11-28 12:17 ` Richard Biener 0 siblings, 0 replies; 15+ messages in thread From: Richard Biener @ 2023-11-28 12:17 UTC (permalink / raw) To: Richard Sandiford; +Cc: Jeff Law, Robin Dapp, gcc-patches, Li, Pan2 On Tue, 28 Nov 2023, Richard Sandiford wrote: > Richard Biener <rguenther@suse.de> writes: > > On Tue, 28 Nov 2023, Richard Sandiford wrote: > > > >> Richard Biener <rguenther@suse.de> writes: > >> > On Mon, 27 Nov 2023, Jeff Law wrote: > >> > > >> >> > >> >> > >> >> On 11/27/23 05:39, Robin Dapp wrote: > >> >> >> The easiest way to avoid running into the alias analysis problem is > >> >> >> to scrap the MEM_EXPR when we expand the internal functions for > >> >> >> partial loads/stores. That avoids the disambiguation we run into > >> >> >> which is realizing that we store to an object of less size as > >> >> >> the size of the mode we appear to store. > >> >> >> > >> >> >> After the patch we see just > >> >> >> > >> >> >> [1 S64 A32] > >> >> >> > >> >> >> so we preserve the alias set, the alignment and the size (the size > >> >> >> is redundant if the MEM insn't BLKmode). That's still not good > >> >> >> in case the RTL alias oracle would implement the same > >> >> >> disambiguation but it fends off the gimple one. > >> >> >> > >> >> >> This fixes gcc.dg/torture/pr58955-2.c when built with AVX512 > >> >> >> and --param=vect-partial-vector-usage=1. > >> >> > > >> >> > On riscv we're seeing a similar problem across the testsuite > >> >> > and several execution failures as a result. In the case I > >> >> > looked at we move a scalar load upwards over a partial store > >> >> > that aliases the load. > >> >> > > >> >> > I independently arrived at the spot mentioned in > >> >> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110237#c4 > >> >> > before knowing about the PR. > >> >> > > >> >> > I can confirm that your RFC patch fixes at least two of the > >> >> > failures, I haven't checked the others but very likely > >> >> > they are similar. > >> >> FWIW, it should always be safe to ignore the memory attributes. So if > >> >> there's a reasonable condition here, then we can use it and just ignore the > >> >> attribute. > >> >> > >> >> Does the attribute on a partial load/store indicate the potential load/store > >> >> size or does it indicate the actual known load/store size. If the former, then > >> >> we probably need to treat it as a may-read/may-write kind of reference. > >> > > >> > There's no way to distinguish a partial vs. non-partial MEM on RTL and > >> > while without the bogus MEM_ATTR the alias oracle pieces that > >> > miscompiled the original case are fended off we still see the load/store > >> > as full given they have a mode with a size - that for example means > >> > that DSE can elide a previous store to a masked part. Eventually > >> > that's fended off by using an UNSPEC, but whether the RTL IL has > >> > the correct semantics is questionable. > >> > > >> > That said, I did propose scrapping the MEM_EXPR which I think is > >> > the correct thing to do unless we want to put a CALL_EXPR into it > >> > (nothing would use that at the moment) or re-do MEM_EXPR and instead > >> > have an ao_ref (or sth slightly more complete) instead of the current > >> > MEM_ATTRs - but that would be a lot of work. > >> > > >> > This leaves the question wrt. semantics of for example x86 mask_store: > >> > > >> > (insn 23 22 24 5 (set (mem:V4DF (plus:DI (reg/v/f:DI 106 [ x ]) > >> > (reg:DI 101 [ ivtmp.15 ])) [2 MEM <vector(4) double> > >> > [(double *)x_11(D) + ivtmp.15_33 * 1]+0 S32 A64]) > >> > (unspec:V4DF [ > >> > (reg:V4DI 104 [ mask__16.8 ]) > >> > (reg:V4DF 105 [ vect_cst__42 ]) > >> > (mem:V4DF (plus:DI (reg/v/f:DI 106 [ x ]) > >> > (reg:DI 101 [ ivtmp.15 ])) [2 MEM <vector(4) > >> > double> [(double *)x_11(D) + ivtmp.15_33 * 1]+0 S32 A64]) > >> > ] UNSPEC_MASKMOV)) "t.c":5:12 8523 {avx_maskstorepd256} > >> > (nil)) > >> > > >> > it uses a read-modify-write which makes it safe for DSE. mask_load > >> > looks like > >> > > >> > (insn 28 27 29 6 (set (reg:V4DF 115 [ vect__7.11 ]) > >> > (unspec:V4DF [ > >> > (reg:V4DI 114 [ mask__8.8 ]) > >> > (mem:V4DF (plus:DI (reg/v/f:DI 118 [ val ]) > >> > (reg:DI 103 [ ivtmp.29 ])) [2 MEM <vector(4) > >> > double> [(double *)val_13(D) + ivtmp.29_22 * 1]+0 S32 A64]) > >> > ] UNSPEC_MASKMOV)) "t.c":5:17 8515 {avx_maskloadpd256} > >> > (nil)) > >> > > >> > both have (as operand of the UNSPEC) a MEM with V4DFmode (and a > >> > MEM_EXPR with a similarly bougs MEM_EXPR) indicating the loads > >> > are _not_ partial. That means the disambiguation against a store > >> > to an object that's smaller than V4DF is still possible. > >> > Setting MEM_SIZE to UNKNOWN doesn't help - that just asks to look > >> > at the mode. As discussed using a BLKmode MEM _might_ be a way > >> > out but I didn't try what will happen then (patterns would need to > >> > be adjusted I guess). > >> > > >> > That said, I'm happy to commit the partial fix, scrapping the > >> > bogus MEM_EXPRs. > >> > > >> > OK for that? > >> > >> Could we restrict it to cases where the offset+size might overstep the > >> bounds? I.e. do the equivalent of: > >> > >> /* Drop the object if the new right end is not within its bounds. */ > >> if (adjust_object && maybe_gt (offset + size, attrs.size)) > >> { > >> attrs.expr = NULL_TREE; > >> attrs.alias = 0; > >> } > >> > >> (from adjust_address_1, although it might be difficult to reuse that > >> code in this context). > > > > Well, the MEM_EXPR would still not correctly represent the performed > > access. But yes, if the base decl is visible and the offset is > > all constant we might be able to prove the access is not out of bounds. > > But that doesn't seem to be the most likely case? > > Agree it's not the most likely case. Just thought it was still a > distinction worth making, since it's one we already make elsewhere. > > It would also make the condition more self-documenting, and make it > less likely that we're using this to cover up other problems. > > > Note we have lost restrict info already when producing the > > .MASK_LOAD/STORE and TBAA info is also preserved without the > > MEM_EXPR. The MEM_EXPR itself adds points-to info only > > (besides the wrong access size). With ao_ref we also don't > > have a way to carry only points-to info we also have to > > build a fake access (which then also carries restrict info). > > Understood. It's just that one valid use of masked accesses > is to optimise something like: > > int a[1024]; > ... > for (int i = 0; i < 1024; ++i) > a[i] = foo ? bar : a[i]; > ... > > so that no load takes place and the store is masked. In that context > it seems like the access really is accurately described as a RMW access, > and points-to info can be used in the same way as it would be for an > unmasked load and an unmasked store. It would be good to retain the > same level of disambiguation, however limited that is in practice. Sure, the analysis we can perform at RTL expansion time is limited though. Note the actual problem is not the alias info but the pretended size / offset of the access which is wrong. Richard. ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH][RFC] middle-end/110237 - wrong MEM_ATTRs for partial loads/stores 2023-11-28 7:50 ` Richard Biener 2023-11-28 10:31 ` Richard Sandiford @ 2023-11-28 15:00 ` Jeff Law 2023-11-29 7:16 ` Richard Biener 1 sibling, 1 reply; 15+ messages in thread From: Jeff Law @ 2023-11-28 15:00 UTC (permalink / raw) To: Richard Biener; +Cc: Robin Dapp, gcc-patches, Li, Pan2 On 11/28/23 00:50, Richard Biener wrote: > > There's no way to distinguish a partial vs. non-partial MEM on RTL and > while without the bogus MEM_ATTR the alias oracle pieces that > miscompiled the original case are fended off we still see the load/store > as full given they have a mode with a size - that for example means > that DSE can elide a previous store to a masked part. Eventually > that's fended off by using an UNSPEC, but whether the RTL IL has > the correct semantics is questionable. > > That said, I did propose scrapping the MEM_EXPR which I think is > the correct thing to do unless we want to put a CALL_EXPR into it > (nothing would use that at the moment) or re-do MEM_EXPR and instead > have an ao_ref (or sth slightly more complete) instead of the current > MEM_ATTRs - but that would be a lot of work. > > This leaves the question wrt. semantics of for example x86 mask_store: > > (insn 23 22 24 5 (set (mem:V4DF (plus:DI (reg/v/f:DI 106 [ x ]) > (reg:DI 101 [ ivtmp.15 ])) [2 MEM <vector(4) double> > [(double *)x_11(D) + ivtmp.15_33 * 1]+0 S32 A64]) > (unspec:V4DF [ > (reg:V4DI 104 [ mask__16.8 ]) > (reg:V4DF 105 [ vect_cst__42 ]) > (mem:V4DF (plus:DI (reg/v/f:DI 106 [ x ]) > (reg:DI 101 [ ivtmp.15 ])) [2 MEM <vector(4) > double> [(double *)x_11(D) + ivtmp.15_33 * 1]+0 S32 A64]) > ] UNSPEC_MASKMOV)) "t.c":5:12 8523 {avx_maskstorepd256} > (nil)) > > it uses a read-modify-write which makes it safe for DSE. Agreed. mask_load > looks like > > (insn 28 27 29 6 (set (reg:V4DF 115 [ vect__7.11 ]) > (unspec:V4DF [ > (reg:V4DI 114 [ mask__8.8 ]) > (mem:V4DF (plus:DI (reg/v/f:DI 118 [ val ]) > (reg:DI 103 [ ivtmp.29 ])) [2 MEM <vector(4) > double> [(double *)val_13(D) + ivtmp.29_22 * 1]+0 S32 A64]) > ] UNSPEC_MASKMOV)) "t.c":5:17 8515 {avx_maskloadpd256} > (nil)) So with the mem:V4DF inside the unspec, ISTM we must treat that as a potential full read, but we can't rely on it being a full read. I don't think UNSPEC semantics are that it must read/consume all its operands in full, just that it might. That might be worth a documentation clarification. > > both have (as operand of the UNSPEC) a MEM with V4DFmode (and a > MEM_EXPR with a similarly bougs MEM_EXPR) indicating the loads > are _not_ partial. That means the disambiguation against a store > to an object that's smaller than V4DF is still possible. > Setting MEM_SIZE to UNKNOWN doesn't help - that just asks to look > at the mode. As discussed using a BLKmode MEM _might_ be a way > out but I didn't try what will happen then (patterns would need to > be adjusted I guess). > > That said, I'm happy to commit the partial fix, scrapping the > bogus MEM_EXPRs. > > OK for that? Works for me. jeff ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH][RFC] middle-end/110237 - wrong MEM_ATTRs for partial loads/stores 2023-11-28 15:00 ` Jeff Law @ 2023-11-29 7:16 ` Richard Biener 0 siblings, 0 replies; 15+ messages in thread From: Richard Biener @ 2023-11-29 7:16 UTC (permalink / raw) To: Jeff Law; +Cc: Robin Dapp, gcc-patches, Li, Pan2 On Tue, 28 Nov 2023, Jeff Law wrote: > > > On 11/28/23 00:50, Richard Biener wrote: > > > > > There's no way to distinguish a partial vs. non-partial MEM on RTL and > > while without the bogus MEM_ATTR the alias oracle pieces that > > miscompiled the original case are fended off we still see the load/store > > as full given they have a mode with a size - that for example means > > that DSE can elide a previous store to a masked part. Eventually > > that's fended off by using an UNSPEC, but whether the RTL IL has > > the correct semantics is questionable. > > > > That said, I did propose scrapping the MEM_EXPR which I think is > > the correct thing to do unless we want to put a CALL_EXPR into it > > (nothing would use that at the moment) or re-do MEM_EXPR and instead > > have an ao_ref (or sth slightly more complete) instead of the current > > MEM_ATTRs - but that would be a lot of work. > > > > This leaves the question wrt. semantics of for example x86 mask_store: > > > > (insn 23 22 24 5 (set (mem:V4DF (plus:DI (reg/v/f:DI 106 [ x ]) > > (reg:DI 101 [ ivtmp.15 ])) [2 MEM <vector(4) double> > > [(double *)x_11(D) + ivtmp.15_33 * 1]+0 S32 A64]) > > (unspec:V4DF [ > > (reg:V4DI 104 [ mask__16.8 ]) > > (reg:V4DF 105 [ vect_cst__42 ]) > > (mem:V4DF (plus:DI (reg/v/f:DI 106 [ x ]) > > (reg:DI 101 [ ivtmp.15 ])) [2 MEM <vector(4) > > double> [(double *)x_11(D) + ivtmp.15_33 * 1]+0 S32 A64]) > > ] UNSPEC_MASKMOV)) "t.c":5:12 8523 {avx_maskstorepd256} > > (nil)) > > > > it uses a read-modify-write which makes it safe for DSE. > Agreed. > > > mask_load > > looks like > > > > (insn 28 27 29 6 (set (reg:V4DF 115 [ vect__7.11 ]) > > (unspec:V4DF [ > > (reg:V4DI 114 [ mask__8.8 ]) > > (mem:V4DF (plus:DI (reg/v/f:DI 118 [ val ]) > > (reg:DI 103 [ ivtmp.29 ])) [2 MEM <vector(4) > > double> [(double *)val_13(D) + ivtmp.29_22 * 1]+0 S32 A64]) > > ] UNSPEC_MASKMOV)) "t.c":5:17 8515 {avx_maskloadpd256} > > (nil)) > So with the mem:V4DF inside the unspec, ISTM we must treat that as a potential > full read, but we can't rely on it being a full read. I don't think UNSPEC > semantics are that it must read/consume all its operands in full, just that it > might. That might be worth a documentation clarification. > > > > > > both have (as operand of the UNSPEC) a MEM with V4DFmode (and a > > MEM_EXPR with a similarly bougs MEM_EXPR) indicating the loads > > are _not_ partial. That means the disambiguation against a store > > to an object that's smaller than V4DF is still possible. > > Setting MEM_SIZE to UNKNOWN doesn't help - that just asks to look > > at the mode. As discussed using a BLKmode MEM _might_ be a way > > out but I didn't try what will happen then (patterns would need to > > be adjusted I guess). > > > > That said, I'm happy to commit the partial fix, scrapping the > > bogus MEM_EXPRs. > > > > OK for that? > Works for me. I'm re-testing the change and will push. If the UNSPEC uses are really OK I think we're set. We can incrementally try to restore missing alias info. Richard. > jeff > -- Richard Biener <rguenther@suse.de> SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg, Germany; GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg) ^ permalink raw reply [flat|nested] 15+ messages in thread
[parent not found: <20230621074956.1174B3858288@sourceware.org>]
* Re: [PATCH][RFC] middle-end/110237 - wrong MEM_ATTRs for partial loads/stores [not found] <20230621074956.1174B3858288@sourceware.org> @ 2023-06-26 8:29 ` Hongtao Liu 2023-06-26 8:41 ` Richard Biener 0 siblings, 1 reply; 15+ messages in thread From: Hongtao Liu @ 2023-06-26 8:29 UTC (permalink / raw) To: Richard Biener; +Cc: gcc-patches On Wed, Jun 21, 2023 at 3:49 PM Richard Biener via Gcc-patches <gcc-patches@gcc.gnu.org> wrote: > > The following addresses a miscompilation by RTL scheduling related > to the representation of masked stores. For that we have > > (insn 38 35 39 3 (set (mem:V16SI (plus:DI (reg:DI 40 r12 [orig:90 _22 ] [90]) > (const:DI (plus:DI (symbol_ref:DI ("b") [flags 0x2] <var_decl 0x7ffff6e28d80 b>) > (const_int -4 [0xfffffffffffffffc])))) [1 MEM <vector(16) int> [(int *)vectp_b.12_28]+0 S64 A32]) > (vec_merge:V16SI (reg:V16SI 20 xmm0 [118]) > (mem:V16SI (plus:DI (reg:DI 40 r12 [orig:90 _22 ] [90]) > (const:DI (plus:DI (symbol_ref:DI ("b") [flags 0x2] <var_decl 0x7ffff6e28d80 b>) > (const_int -4 [0xfffffffffffffffc])))) [1 MEM <vector(16) int> [(int *)vectp_b.12_28]+0 S64 A32]) > > and specifically the memory attributes > > [1 MEM <vector(16) int> [(int *)vectp_b.12_28]+0 S64 A32] > > are problematic. They tell us the instruction stores and reads a full > vector which it if course does not. There isn't any good MEM_EXPR > we can use here (we lack a way to just specify a pointer and restrict > info for example), and since the MEMs have a vector mode it's > difficult in general as passes do not need to look at the memory > attributes at all. > > The easiest way to avoid running into the alias analysis problem is > to scrap the MEM_EXPR when we expand the internal functions for > partial loads/stores. That avoids the disambiguation we run into > which is realizing that we store to an object of less size as > the size of the mode we appear to store. > > After the patch we see just > > [1 S64 A32] > > so we preserve the alias set, the alignment and the size (the size > is redundant if the MEM insn't BLKmode). That's still not good > in case the RTL alias oracle would implement the same > disambiguation but it fends off the gimple one. > > This fixes gcc.dg/torture/pr58955-2.c when built with AVX512 > and --param=vect-partial-vector-usage=1. > > On the MEM_EXPR side we could use a CALL_EXPR and on the RTL > side we might instead want to use a BLKmode MEM? Any better > ideas here? Can we introduce a new member in class mem_attrs and ao_ref, initial the member (named partial_access_p) in expand_partial_load_optab_fn and expand_partial_store_optab_fn, pass partial_access_p from mem_attrs to ao_ref in djust ao_ref_from_mem. It looks to me we only want to avoid the below rule in alias analysis. For others, size, max_size, offset is still meaningful, even for rtx_addr_can_trap_p, if size can't trap, partial access must not trap? /* If the pointer based access is bigger than the variable they cannot alias. This is similar to the check below where we use TBAA to increase the size of the pointer based access based on the dynamic type of a containing object we can infer from it. */ poly_int64 dsize2; if (known_size_p (size1) --- should be unknown?? && poly_int_tree_p (DECL_SIZE (base2), &dsize2) && known_lt (dsize2, size1)) return false; > > Thanks, > Richard. > > PR middle-end/110237 > * internal-fn.cc (expand_partial_load_optab_fn): Clear > MEM_EXPR and MEM_OFFSET. > (expand_partial_store_optab_fn): Likewise. > --- > gcc/internal-fn.cc | 8 ++++++++ > 1 file changed, 8 insertions(+) > > diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc > index c911ae790cb..2dc685e7d85 100644 > --- a/gcc/internal-fn.cc > +++ b/gcc/internal-fn.cc > @@ -2903,6 +2903,10 @@ expand_partial_load_optab_fn (internal_fn, gcall *stmt, convert_optab optab) > > mem = expand_expr (rhs, NULL_RTX, VOIDmode, EXPAND_WRITE); > gcc_assert (MEM_P (mem)); > + /* The built MEM_REF does not accurately reflect that the load > + is only partial. Clear it. */ > + set_mem_expr (mem, NULL_TREE); > + clear_mem_offset (mem); > mask = expand_normal (maskt); > target = expand_expr (lhs, NULL_RTX, VOIDmode, EXPAND_WRITE); > create_output_operand (&ops[0], target, TYPE_MODE (type)); > @@ -2971,6 +2975,10 @@ expand_partial_store_optab_fn (internal_fn, gcall *stmt, convert_optab optab) > > mem = expand_expr (lhs, NULL_RTX, VOIDmode, EXPAND_WRITE); > gcc_assert (MEM_P (mem)); > + /* The built MEM_REF does not accurately reflect that the store > + is only partial. Clear it. */ > + set_mem_expr (mem, NULL_TREE); > + clear_mem_offset (mem); > mask = expand_normal (maskt); > reg = expand_normal (rhs); > create_fixed_operand (&ops[0], mem); > -- > 2.35.3 -- BR, Hongtao ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH][RFC] middle-end/110237 - wrong MEM_ATTRs for partial loads/stores 2023-06-26 8:29 ` Hongtao Liu @ 2023-06-26 8:41 ` Richard Biener 0 siblings, 0 replies; 15+ messages in thread From: Richard Biener @ 2023-06-26 8:41 UTC (permalink / raw) To: Hongtao Liu; +Cc: gcc-patches On Mon, 26 Jun 2023, Hongtao Liu wrote: > On Wed, Jun 21, 2023 at 3:49?PM Richard Biener via Gcc-patches > <gcc-patches@gcc.gnu.org> wrote: > > > > The following addresses a miscompilation by RTL scheduling related > > to the representation of masked stores. For that we have > > > > (insn 38 35 39 3 (set (mem:V16SI (plus:DI (reg:DI 40 r12 [orig:90 _22 ] [90]) > > (const:DI (plus:DI (symbol_ref:DI ("b") [flags 0x2] <var_decl 0x7ffff6e28d80 b>) > > (const_int -4 [0xfffffffffffffffc])))) [1 MEM <vector(16) int> [(int *)vectp_b.12_28]+0 S64 A32]) > > (vec_merge:V16SI (reg:V16SI 20 xmm0 [118]) > > (mem:V16SI (plus:DI (reg:DI 40 r12 [orig:90 _22 ] [90]) > > (const:DI (plus:DI (symbol_ref:DI ("b") [flags 0x2] <var_decl 0x7ffff6e28d80 b>) > > (const_int -4 [0xfffffffffffffffc])))) [1 MEM <vector(16) int> [(int *)vectp_b.12_28]+0 S64 A32]) > > > > and specifically the memory attributes > > > > [1 MEM <vector(16) int> [(int *)vectp_b.12_28]+0 S64 A32] > > > > are problematic. They tell us the instruction stores and reads a full > > vector which it if course does not. There isn't any good MEM_EXPR > > we can use here (we lack a way to just specify a pointer and restrict > > info for example), and since the MEMs have a vector mode it's > > difficult in general as passes do not need to look at the memory > > attributes at all. > > > > The easiest way to avoid running into the alias analysis problem is > > to scrap the MEM_EXPR when we expand the internal functions for > > partial loads/stores. That avoids the disambiguation we run into > > which is realizing that we store to an object of less size as > > the size of the mode we appear to store. > > > > After the patch we see just > > > > [1 S64 A32] > > > > so we preserve the alias set, the alignment and the size (the size > > is redundant if the MEM insn't BLKmode). That's still not good > > in case the RTL alias oracle would implement the same > > disambiguation but it fends off the gimple one. > > > > This fixes gcc.dg/torture/pr58955-2.c when built with AVX512 > > and --param=vect-partial-vector-usage=1. > > > > On the MEM_EXPR side we could use a CALL_EXPR and on the RTL > > side we might instead want to use a BLKmode MEM? Any better > > ideas here? > Can we introduce a new member in class mem_attrs and ao_ref, initial > the member (named partial_access_p) in expand_partial_load_optab_fn > and expand_partial_store_optab_fn, pass partial_access_p from > mem_attrs to ao_ref in djust ao_ref_from_mem. > It looks to me we only want to avoid the below rule in alias analysis. > For others, size, max_size, offset is still meaningful, even for > rtx_addr_can_trap_p, if size can't trap, partial access must not trap? The GIMPLE oracle makes sure to set 'size' to -1 (not known) when seeing the IFNs for masked stores/loads. To fix the RTL MEM_EXPR side I would rather try putting a CALL_EXPR there, preserving the masked internal function, instead of using a mem-ref and additional info. That leaves the MEM RTX itself - for a MEM with non-BLKmode I thin MEM_ATTRS are completely optional and it's OK to drop them, we'd have to special case the partial MEM RTX then. Jeff agreed that eventually using BLKmode for them would be "OK", then we can make MEM_SIZE unknown as well. I think the issue is latent on branches so I first wanted to find some minimal change to mitigate the miscompile and then maybe try options to not lose most of the alias info here. Richard. > /* If the pointer based access is bigger than the variable they cannot > alias. This is similar to the check below where we use TBAA to > increase the size of the pointer based access based on the dynamic > type of a containing object we can infer from it. */ > poly_int64 dsize2; > if (known_size_p (size1) --- should be unknown?? > && poly_int_tree_p (DECL_SIZE (base2), &dsize2) > && known_lt (dsize2, size1)) > return false; > > > > > Thanks, > > Richard. > > > > PR middle-end/110237 > > * internal-fn.cc (expand_partial_load_optab_fn): Clear > > MEM_EXPR and MEM_OFFSET. > > (expand_partial_store_optab_fn): Likewise. > > --- > > gcc/internal-fn.cc | 8 ++++++++ > > 1 file changed, 8 insertions(+) > > > > diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc > > index c911ae790cb..2dc685e7d85 100644 > > --- a/gcc/internal-fn.cc > > +++ b/gcc/internal-fn.cc > > @@ -2903,6 +2903,10 @@ expand_partial_load_optab_fn (internal_fn, gcall *stmt, convert_optab optab) > > > > mem = expand_expr (rhs, NULL_RTX, VOIDmode, EXPAND_WRITE); > > gcc_assert (MEM_P (mem)); > > + /* The built MEM_REF does not accurately reflect that the load > > + is only partial. Clear it. */ > > + set_mem_expr (mem, NULL_TREE); > > + clear_mem_offset (mem); > > mask = expand_normal (maskt); > > target = expand_expr (lhs, NULL_RTX, VOIDmode, EXPAND_WRITE); > > create_output_operand (&ops[0], target, TYPE_MODE (type)); > > @@ -2971,6 +2975,10 @@ expand_partial_store_optab_fn (internal_fn, gcall *stmt, convert_optab optab) > > > > mem = expand_expr (lhs, NULL_RTX, VOIDmode, EXPAND_WRITE); > > gcc_assert (MEM_P (mem)); > > + /* The built MEM_REF does not accurately reflect that the store > > + is only partial. Clear it. */ > > + set_mem_expr (mem, NULL_TREE); > > + clear_mem_offset (mem); > > mask = expand_normal (maskt); > > reg = expand_normal (rhs); > > create_fixed_operand (&ops[0], mem); > > -- > > 2.35.3 > > > > -- Richard Biener <rguenther@suse.de> SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg, Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman; HRB 36809 (AG Nuernberg) ^ permalink raw reply [flat|nested] 15+ messages in thread
* [PATCH][RFC] middle-end/110237 - wrong MEM_ATTRs for partial loads/stores @ 2023-06-21 7:49 Richard Biener 0 siblings, 0 replies; 15+ messages in thread From: Richard Biener @ 2023-06-21 7:49 UTC (permalink / raw) To: gcc-patches The following addresses a miscompilation by RTL scheduling related to the representation of masked stores. For that we have (insn 38 35 39 3 (set (mem:V16SI (plus:DI (reg:DI 40 r12 [orig:90 _22 ] [90]) (const:DI (plus:DI (symbol_ref:DI ("b") [flags 0x2] <var_decl 0x7ffff6e28d80 b>) (const_int -4 [0xfffffffffffffffc])))) [1 MEM <vector(16) int> [(int *)vectp_b.12_28]+0 S64 A32]) (vec_merge:V16SI (reg:V16SI 20 xmm0 [118]) (mem:V16SI (plus:DI (reg:DI 40 r12 [orig:90 _22 ] [90]) (const:DI (plus:DI (symbol_ref:DI ("b") [flags 0x2] <var_decl 0x7ffff6e28d80 b>) (const_int -4 [0xfffffffffffffffc])))) [1 MEM <vector(16) int> [(int *)vectp_b.12_28]+0 S64 A32]) and specifically the memory attributes [1 MEM <vector(16) int> [(int *)vectp_b.12_28]+0 S64 A32] are problematic. They tell us the instruction stores and reads a full vector which it if course does not. There isn't any good MEM_EXPR we can use here (we lack a way to just specify a pointer and restrict info for example), and since the MEMs have a vector mode it's difficult in general as passes do not need to look at the memory attributes at all. The easiest way to avoid running into the alias analysis problem is to scrap the MEM_EXPR when we expand the internal functions for partial loads/stores. That avoids the disambiguation we run into which is realizing that we store to an object of less size as the size of the mode we appear to store. After the patch we see just [1 S64 A32] so we preserve the alias set, the alignment and the size (the size is redundant if the MEM insn't BLKmode). That's still not good in case the RTL alias oracle would implement the same disambiguation but it fends off the gimple one. This fixes gcc.dg/torture/pr58955-2.c when built with AVX512 and --param=vect-partial-vector-usage=1. On the MEM_EXPR side we could use a CALL_EXPR and on the RTL side we might instead want to use a BLKmode MEM? Any better ideas here? Thanks, Richard. PR middle-end/110237 * internal-fn.cc (expand_partial_load_optab_fn): Clear MEM_EXPR and MEM_OFFSET. (expand_partial_store_optab_fn): Likewise. --- gcc/internal-fn.cc | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc index c911ae790cb..2dc685e7d85 100644 --- a/gcc/internal-fn.cc +++ b/gcc/internal-fn.cc @@ -2903,6 +2903,10 @@ expand_partial_load_optab_fn (internal_fn, gcall *stmt, convert_optab optab) mem = expand_expr (rhs, NULL_RTX, VOIDmode, EXPAND_WRITE); gcc_assert (MEM_P (mem)); + /* The built MEM_REF does not accurately reflect that the load + is only partial. Clear it. */ + set_mem_expr (mem, NULL_TREE); + clear_mem_offset (mem); mask = expand_normal (maskt); target = expand_expr (lhs, NULL_RTX, VOIDmode, EXPAND_WRITE); create_output_operand (&ops[0], target, TYPE_MODE (type)); @@ -2971,6 +2975,10 @@ expand_partial_store_optab_fn (internal_fn, gcall *stmt, convert_optab optab) mem = expand_expr (lhs, NULL_RTX, VOIDmode, EXPAND_WRITE); gcc_assert (MEM_P (mem)); + /* The built MEM_REF does not accurately reflect that the store + is only partial. Clear it. */ + set_mem_expr (mem, NULL_TREE); + clear_mem_offset (mem); mask = expand_normal (maskt); reg = expand_normal (rhs); create_fixed_operand (&ops[0], mem); -- 2.35.3 ^ permalink raw reply [flat|nested] 15+ messages in thread
end of thread, other threads:[~2023-11-29 7:20 UTC | newest] Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- [not found] <20230621074951.F3C3C3858433@sourceware.org> 2023-06-21 15:29 ` [PATCH][RFC] middle-end/110237 - wrong MEM_ATTRs for partial loads/stores Jeff Law 2023-06-22 6:39 ` Richard Biener 2023-06-24 14:32 ` Jeff Law [not found] <20230621075019.7CA813858033@sourceware.org> 2023-11-27 12:39 ` Robin Dapp 2023-11-27 15:45 ` Jeff Law 2023-11-28 7:50 ` Richard Biener 2023-11-28 10:31 ` Richard Sandiford 2023-11-28 11:21 ` Richard Biener 2023-11-28 11:32 ` Richard Sandiford 2023-11-28 12:17 ` Richard Biener 2023-11-28 15:00 ` Jeff Law 2023-11-29 7:16 ` Richard Biener [not found] <20230621074956.1174B3858288@sourceware.org> 2023-06-26 8:29 ` Hongtao Liu 2023-06-26 8:41 ` Richard Biener 2023-06-21 7:49 Richard Biener
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).