public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* Re: [PATCH][RFC] middle-end/110237 - wrong MEM_ATTRs for partial loads/stores
       [not found] <20230621074956.1174B3858288@sourceware.org>
@ 2023-06-26  8:29 ` Hongtao Liu
  2023-06-26  8:41   ` Richard Biener
  0 siblings, 1 reply; 15+ messages in thread
From: Hongtao Liu @ 2023-06-26  8:29 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc-patches

On Wed, Jun 21, 2023 at 3:49 PM Richard Biener via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
> The following addresses a miscompilation by RTL scheduling related
> to the representation of masked stores.  For that we have
>
> (insn 38 35 39 3 (set (mem:V16SI (plus:DI (reg:DI 40 r12 [orig:90 _22 ] [90])
>                 (const:DI (plus:DI (symbol_ref:DI ("b") [flags 0x2]  <var_decl 0x7ffff6e28d80 b>)
>                         (const_int -4 [0xfffffffffffffffc])))) [1 MEM <vector(16) int> [(int *)vectp_b.12_28]+0 S64 A32])
>         (vec_merge:V16SI (reg:V16SI 20 xmm0 [118])
>             (mem:V16SI (plus:DI (reg:DI 40 r12 [orig:90 _22 ] [90])
>                     (const:DI (plus:DI (symbol_ref:DI ("b") [flags 0x2]  <var_decl 0x7ffff6e28d80 b>)
>                             (const_int -4 [0xfffffffffffffffc])))) [1 MEM <vector(16) int> [(int *)vectp_b.12_28]+0 S64 A32])
>
> and specifically the memory attributes
>
>   [1 MEM <vector(16) int> [(int *)vectp_b.12_28]+0 S64 A32]
>
> are problematic.  They tell us the instruction stores and reads a full
> vector which it if course does not.  There isn't any good MEM_EXPR
> we can use here (we lack a way to just specify a pointer and restrict
> info for example), and since the MEMs have a vector mode it's
> difficult in general as passes do not need to look at the memory
> attributes at all.
>
> The easiest way to avoid running into the alias analysis problem is
> to scrap the MEM_EXPR when we expand the internal functions for
> partial loads/stores.  That avoids the disambiguation we run into
> which is realizing that we store to an object of less size as
> the size of the mode we appear to store.
>
> After the patch we see just
>
>   [1  S64 A32]
>
> so we preserve the alias set, the alignment and the size (the size
> is redundant if the MEM insn't BLKmode).  That's still not good
> in case the RTL alias oracle would implement the same
> disambiguation but it fends off the gimple one.
>
> This fixes gcc.dg/torture/pr58955-2.c when built with AVX512
> and --param=vect-partial-vector-usage=1.
>
> On the MEM_EXPR side we could use a CALL_EXPR and on the RTL
> side we might instead want to use a BLKmode MEM?  Any better
> ideas here?
Can we introduce a new member in class mem_attrs and ao_ref, initial
the member (named partial_access_p) in expand_partial_load_optab_fn
and expand_partial_store_optab_fn, pass partial_access_p from
mem_attrs to ao_ref in djust ao_ref_from_mem.
It looks to me we only want to avoid the below rule in alias analysis.
For others, size, max_size, offset is still meaningful, even for
rtx_addr_can_trap_p, if size can't trap, partial access must not trap?

  /* If the pointer based access is bigger than the variable they cannot
     alias.  This is similar to the check below where we use TBAA to
     increase the size of the pointer based access based on the dynamic
     type of a containing object we can infer from it.  */
  poly_int64 dsize2;
  if (known_size_p (size1) --- should be unknown??
      && poly_int_tree_p (DECL_SIZE (base2), &dsize2)
      && known_lt (dsize2, size1))
    return false;

>
> Thanks,
> Richard.
>
>         PR middle-end/110237
>         * internal-fn.cc (expand_partial_load_optab_fn): Clear
>         MEM_EXPR and MEM_OFFSET.
>         (expand_partial_store_optab_fn): Likewise.
> ---
>  gcc/internal-fn.cc | 8 ++++++++
>  1 file changed, 8 insertions(+)
>
> diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
> index c911ae790cb..2dc685e7d85 100644
> --- a/gcc/internal-fn.cc
> +++ b/gcc/internal-fn.cc
> @@ -2903,6 +2903,10 @@ expand_partial_load_optab_fn (internal_fn, gcall *stmt, convert_optab optab)
>
>    mem = expand_expr (rhs, NULL_RTX, VOIDmode, EXPAND_WRITE);
>    gcc_assert (MEM_P (mem));
> +  /* The built MEM_REF does not accurately reflect that the load
> +     is only partial.  Clear it.  */
> +  set_mem_expr (mem, NULL_TREE);
> +  clear_mem_offset (mem);
>    mask = expand_normal (maskt);
>    target = expand_expr (lhs, NULL_RTX, VOIDmode, EXPAND_WRITE);
>    create_output_operand (&ops[0], target, TYPE_MODE (type));
> @@ -2971,6 +2975,10 @@ expand_partial_store_optab_fn (internal_fn, gcall *stmt, convert_optab optab)
>
>    mem = expand_expr (lhs, NULL_RTX, VOIDmode, EXPAND_WRITE);
>    gcc_assert (MEM_P (mem));
> +  /* The built MEM_REF does not accurately reflect that the store
> +     is only partial.  Clear it.  */
> +  set_mem_expr (mem, NULL_TREE);
> +  clear_mem_offset (mem);
>    mask = expand_normal (maskt);
>    reg = expand_normal (rhs);
>    create_fixed_operand (&ops[0], mem);
> --
> 2.35.3



-- 
BR,
Hongtao

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH][RFC] middle-end/110237 - wrong MEM_ATTRs for partial loads/stores
  2023-06-26  8:29 ` [PATCH][RFC] middle-end/110237 - wrong MEM_ATTRs for partial loads/stores Hongtao Liu
@ 2023-06-26  8:41   ` Richard Biener
  0 siblings, 0 replies; 15+ messages in thread
From: Richard Biener @ 2023-06-26  8:41 UTC (permalink / raw)
  To: Hongtao Liu; +Cc: gcc-patches

On Mon, 26 Jun 2023, Hongtao Liu wrote:

> On Wed, Jun 21, 2023 at 3:49?PM Richard Biener via Gcc-patches
> <gcc-patches@gcc.gnu.org> wrote:
> >
> > The following addresses a miscompilation by RTL scheduling related
> > to the representation of masked stores.  For that we have
> >
> > (insn 38 35 39 3 (set (mem:V16SI (plus:DI (reg:DI 40 r12 [orig:90 _22 ] [90])
> >                 (const:DI (plus:DI (symbol_ref:DI ("b") [flags 0x2]  <var_decl 0x7ffff6e28d80 b>)
> >                         (const_int -4 [0xfffffffffffffffc])))) [1 MEM <vector(16) int> [(int *)vectp_b.12_28]+0 S64 A32])
> >         (vec_merge:V16SI (reg:V16SI 20 xmm0 [118])
> >             (mem:V16SI (plus:DI (reg:DI 40 r12 [orig:90 _22 ] [90])
> >                     (const:DI (plus:DI (symbol_ref:DI ("b") [flags 0x2]  <var_decl 0x7ffff6e28d80 b>)
> >                             (const_int -4 [0xfffffffffffffffc])))) [1 MEM <vector(16) int> [(int *)vectp_b.12_28]+0 S64 A32])
> >
> > and specifically the memory attributes
> >
> >   [1 MEM <vector(16) int> [(int *)vectp_b.12_28]+0 S64 A32]
> >
> > are problematic.  They tell us the instruction stores and reads a full
> > vector which it if course does not.  There isn't any good MEM_EXPR
> > we can use here (we lack a way to just specify a pointer and restrict
> > info for example), and since the MEMs have a vector mode it's
> > difficult in general as passes do not need to look at the memory
> > attributes at all.
> >
> > The easiest way to avoid running into the alias analysis problem is
> > to scrap the MEM_EXPR when we expand the internal functions for
> > partial loads/stores.  That avoids the disambiguation we run into
> > which is realizing that we store to an object of less size as
> > the size of the mode we appear to store.
> >
> > After the patch we see just
> >
> >   [1  S64 A32]
> >
> > so we preserve the alias set, the alignment and the size (the size
> > is redundant if the MEM insn't BLKmode).  That's still not good
> > in case the RTL alias oracle would implement the same
> > disambiguation but it fends off the gimple one.
> >
> > This fixes gcc.dg/torture/pr58955-2.c when built with AVX512
> > and --param=vect-partial-vector-usage=1.
> >
> > On the MEM_EXPR side we could use a CALL_EXPR and on the RTL
> > side we might instead want to use a BLKmode MEM?  Any better
> > ideas here?
> Can we introduce a new member in class mem_attrs and ao_ref, initial
> the member (named partial_access_p) in expand_partial_load_optab_fn
> and expand_partial_store_optab_fn, pass partial_access_p from
> mem_attrs to ao_ref in djust ao_ref_from_mem.
> It looks to me we only want to avoid the below rule in alias analysis.
> For others, size, max_size, offset is still meaningful, even for
> rtx_addr_can_trap_p, if size can't trap, partial access must not trap?

The GIMPLE oracle makes sure to set 'size' to -1 (not known) when
seeing the IFNs for masked stores/loads.  To fix the RTL MEM_EXPR
side I would rather try putting a CALL_EXPR there, preserving the
masked internal function, instead of using a mem-ref and additional
info.

That leaves the MEM RTX itself - for a MEM with non-BLKmode I thin
MEM_ATTRS are completely optional and it's OK to drop them, we'd
have to special case the partial MEM RTX then.  Jeff agreed that
eventually using BLKmode for them would be "OK", then we can
make MEM_SIZE unknown as well.

I think the issue is latent on branches so I first wanted to find
some minimal change to mitigate the miscompile and then maybe try
options to not lose most of the alias info here.

Richard.

>   /* If the pointer based access is bigger than the variable they cannot
>      alias.  This is similar to the check below where we use TBAA to
>      increase the size of the pointer based access based on the dynamic
>      type of a containing object we can infer from it.  */
>   poly_int64 dsize2;
>   if (known_size_p (size1) --- should be unknown??
>       && poly_int_tree_p (DECL_SIZE (base2), &dsize2)
>       && known_lt (dsize2, size1))
>     return false;
> 
> >
> > Thanks,
> > Richard.
> >
> >         PR middle-end/110237
> >         * internal-fn.cc (expand_partial_load_optab_fn): Clear
> >         MEM_EXPR and MEM_OFFSET.
> >         (expand_partial_store_optab_fn): Likewise.
> > ---
> >  gcc/internal-fn.cc | 8 ++++++++
> >  1 file changed, 8 insertions(+)
> >
> > diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
> > index c911ae790cb..2dc685e7d85 100644
> > --- a/gcc/internal-fn.cc
> > +++ b/gcc/internal-fn.cc
> > @@ -2903,6 +2903,10 @@ expand_partial_load_optab_fn (internal_fn, gcall *stmt, convert_optab optab)
> >
> >    mem = expand_expr (rhs, NULL_RTX, VOIDmode, EXPAND_WRITE);
> >    gcc_assert (MEM_P (mem));
> > +  /* The built MEM_REF does not accurately reflect that the load
> > +     is only partial.  Clear it.  */
> > +  set_mem_expr (mem, NULL_TREE);
> > +  clear_mem_offset (mem);
> >    mask = expand_normal (maskt);
> >    target = expand_expr (lhs, NULL_RTX, VOIDmode, EXPAND_WRITE);
> >    create_output_operand (&ops[0], target, TYPE_MODE (type));
> > @@ -2971,6 +2975,10 @@ expand_partial_store_optab_fn (internal_fn, gcall *stmt, convert_optab optab)
> >
> >    mem = expand_expr (lhs, NULL_RTX, VOIDmode, EXPAND_WRITE);
> >    gcc_assert (MEM_P (mem));
> > +  /* The built MEM_REF does not accurately reflect that the store
> > +     is only partial.  Clear it.  */
> > +  set_mem_expr (mem, NULL_TREE);
> > +  clear_mem_offset (mem);
> >    mask = expand_normal (maskt);
> >    reg = expand_normal (rhs);
> >    create_fixed_operand (&ops[0], mem);
> > --
> > 2.35.3
> 
> 
> 
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg,
Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman;
HRB 36809 (AG Nuernberg)

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH][RFC] middle-end/110237 - wrong MEM_ATTRs for partial loads/stores
  2023-11-28 15:00       ` Jeff Law
@ 2023-11-29  7:16         ` Richard Biener
  0 siblings, 0 replies; 15+ messages in thread
From: Richard Biener @ 2023-11-29  7:16 UTC (permalink / raw)
  To: Jeff Law; +Cc: Robin Dapp, gcc-patches, Li, Pan2

On Tue, 28 Nov 2023, Jeff Law wrote:

> 
> 
> On 11/28/23 00:50, Richard Biener wrote:
> 
> > 
> > There's no way to distinguish a partial vs. non-partial MEM on RTL and
> > while without the bogus MEM_ATTR the alias oracle pieces that
> > miscompiled the original case are fended off we still see the load/store
> > as full given they have a mode with a size - that for example means
> > that DSE can elide a previous store to a masked part.  Eventually
> > that's fended off by using an UNSPEC, but whether the RTL IL has
> > the correct semantics is questionable.
> > 
> > That said, I did propose scrapping the MEM_EXPR which I think is
> > the correct thing to do unless we want to put a CALL_EXPR into it
> > (nothing would use that at the moment) or re-do MEM_EXPR and instead
> > have an ao_ref (or sth slightly more complete) instead of the current
> > MEM_ATTRs - but that would be a lot of work.
> > 
> > This leaves the question wrt. semantics of for example x86 mask_store:
> > 
> > (insn 23 22 24 5 (set (mem:V4DF (plus:DI (reg/v/f:DI 106 [ x ])
> >                  (reg:DI 101 [ ivtmp.15 ])) [2 MEM <vector(4) double>
> > [(double *)x_11(D) + ivtmp.15_33 * 1]+0 S32 A64])
> >          (unspec:V4DF [
> >                  (reg:V4DI 104 [ mask__16.8 ])
> >                  (reg:V4DF 105 [ vect_cst__42 ])
> >                  (mem:V4DF (plus:DI (reg/v/f:DI 106 [ x ])
> >                          (reg:DI 101 [ ivtmp.15 ])) [2 MEM <vector(4)
> > double> [(double *)x_11(D) + ivtmp.15_33 * 1]+0 S32 A64])
> >              ] UNSPEC_MASKMOV)) "t.c":5:12 8523 {avx_maskstorepd256}
> >       (nil))
> > 
> > it uses a read-modify-write which makes it safe for DSE.
> Agreed.
> 
> 
>   mask_load
> > looks like
> > 
> > (insn 28 27 29 6 (set (reg:V4DF 115 [ vect__7.11 ])
> >          (unspec:V4DF [
> >                  (reg:V4DI 114 [ mask__8.8 ])
> >                  (mem:V4DF (plus:DI (reg/v/f:DI 118 [ val ])
> >                          (reg:DI 103 [ ivtmp.29 ])) [2 MEM <vector(4)
> > double> [(double *)val_13(D) + ivtmp.29_22 * 1]+0 S32 A64])
> >              ] UNSPEC_MASKMOV)) "t.c":5:17 8515 {avx_maskloadpd256}
> >       (nil))
> So with the mem:V4DF inside the unspec, ISTM we must treat that as a potential
> full read, but we can't rely on it being a full read.  I don't think UNSPEC
> semantics are that it must read/consume all its operands in full, just that it
> might.  That might be worth a documentation clarification.
> 
> 
> > 
> > both have (as operand of the UNSPEC) a MEM with V4DFmode (and a
> > MEM_EXPR with a similarly bougs MEM_EXPR) indicating the loads
> > are _not_ partial.  That means the disambiguation against a store
> > to an object that's smaller than V4DF is still possible.
> > Setting MEM_SIZE to UNKNOWN doesn't help - that just asks to look
> > at the mode.  As discussed using a BLKmode MEM _might_ be a way
> > out but I didn't try what will happen then (patterns would need to
> > be adjusted I guess).
> > 
> > That said, I'm happy to commit the partial fix, scrapping the
> > bogus MEM_EXPRs.
> > 
> > OK for that?
> Works for me.

I'm re-testing the change and will push.  If the UNSPEC uses are really
OK I think we're set.  We can incrementally try to restore missing
alias info.

Richard.

> jeff
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH][RFC] middle-end/110237 - wrong MEM_ATTRs for partial loads/stores
  2023-11-28  7:50     ` Richard Biener
  2023-11-28 10:31       ` Richard Sandiford
@ 2023-11-28 15:00       ` Jeff Law
  2023-11-29  7:16         ` Richard Biener
  1 sibling, 1 reply; 15+ messages in thread
From: Jeff Law @ 2023-11-28 15:00 UTC (permalink / raw)
  To: Richard Biener; +Cc: Robin Dapp, gcc-patches, Li, Pan2



On 11/28/23 00:50, Richard Biener wrote:

> 
> There's no way to distinguish a partial vs. non-partial MEM on RTL and
> while without the bogus MEM_ATTR the alias oracle pieces that
> miscompiled the original case are fended off we still see the load/store
> as full given they have a mode with a size - that for example means
> that DSE can elide a previous store to a masked part.  Eventually
> that's fended off by using an UNSPEC, but whether the RTL IL has
> the correct semantics is questionable.
> 
> That said, I did propose scrapping the MEM_EXPR which I think is
> the correct thing to do unless we want to put a CALL_EXPR into it
> (nothing would use that at the moment) or re-do MEM_EXPR and instead
> have an ao_ref (or sth slightly more complete) instead of the current
> MEM_ATTRs - but that would be a lot of work.
> 
> This leaves the question wrt. semantics of for example x86 mask_store:
> 
> (insn 23 22 24 5 (set (mem:V4DF (plus:DI (reg/v/f:DI 106 [ x ])
>                  (reg:DI 101 [ ivtmp.15 ])) [2 MEM <vector(4) double>
> [(double *)x_11(D) + ivtmp.15_33 * 1]+0 S32 A64])
>          (unspec:V4DF [
>                  (reg:V4DI 104 [ mask__16.8 ])
>                  (reg:V4DF 105 [ vect_cst__42 ])
>                  (mem:V4DF (plus:DI (reg/v/f:DI 106 [ x ])
>                          (reg:DI 101 [ ivtmp.15 ])) [2 MEM <vector(4)
> double> [(double *)x_11(D) + ivtmp.15_33 * 1]+0 S32 A64])
>              ] UNSPEC_MASKMOV)) "t.c":5:12 8523 {avx_maskstorepd256}
>       (nil))
> 
> it uses a read-modify-write which makes it safe for DSE.
Agreed.


   mask_load
> looks like
> 
> (insn 28 27 29 6 (set (reg:V4DF 115 [ vect__7.11 ])
>          (unspec:V4DF [
>                  (reg:V4DI 114 [ mask__8.8 ])
>                  (mem:V4DF (plus:DI (reg/v/f:DI 118 [ val ])
>                          (reg:DI 103 [ ivtmp.29 ])) [2 MEM <vector(4)
> double> [(double *)val_13(D) + ivtmp.29_22 * 1]+0 S32 A64])
>              ] UNSPEC_MASKMOV)) "t.c":5:17 8515 {avx_maskloadpd256}
>       (nil))
So with the mem:V4DF inside the unspec, ISTM we must treat that as a 
potential full read, but we can't rely on it being a full read.  I don't 
think UNSPEC semantics are that it must read/consume all its operands in 
full, just that it might.  That might be worth a documentation 
clarification.


> 
> both have (as operand of the UNSPEC) a MEM with V4DFmode (and a
> MEM_EXPR with a similarly bougs MEM_EXPR) indicating the loads
> are _not_ partial.  That means the disambiguation against a store
> to an object that's smaller than V4DF is still possible.
> Setting MEM_SIZE to UNKNOWN doesn't help - that just asks to look
> at the mode.  As discussed using a BLKmode MEM _might_ be a way
> out but I didn't try what will happen then (patterns would need to
> be adjusted I guess).
> 
> That said, I'm happy to commit the partial fix, scrapping the
> bogus MEM_EXPRs.
> 
> OK for that?
Works for me.
jeff

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH][RFC] middle-end/110237 - wrong MEM_ATTRs for partial loads/stores
  2023-11-28 11:32           ` Richard Sandiford
@ 2023-11-28 12:17             ` Richard Biener
  0 siblings, 0 replies; 15+ messages in thread
From: Richard Biener @ 2023-11-28 12:17 UTC (permalink / raw)
  To: Richard Sandiford; +Cc: Jeff Law, Robin Dapp, gcc-patches, Li, Pan2

On Tue, 28 Nov 2023, Richard Sandiford wrote:

> Richard Biener <rguenther@suse.de> writes:
> > On Tue, 28 Nov 2023, Richard Sandiford wrote:
> >
> >> Richard Biener <rguenther@suse.de> writes:
> >> > On Mon, 27 Nov 2023, Jeff Law wrote:
> >> >
> >> >> 
> >> >> 
> >> >> On 11/27/23 05:39, Robin Dapp wrote:
> >> >> >> The easiest way to avoid running into the alias analysis problem is
> >> >> >> to scrap the MEM_EXPR when we expand the internal functions for
> >> >> >> partial loads/stores.  That avoids the disambiguation we run into
> >> >> >> which is realizing that we store to an object of less size as
> >> >> >> the size of the mode we appear to store.
> >> >> >>
> >> >> >> After the patch we see just
> >> >> >>
> >> >> >>    [1  S64 A32]
> >> >> >>
> >> >> >> so we preserve the alias set, the alignment and the size (the size
> >> >> >> is redundant if the MEM insn't BLKmode).  That's still not good
> >> >> >> in case the RTL alias oracle would implement the same
> >> >> >> disambiguation but it fends off the gimple one.
> >> >> >>
> >> >> >> This fixes gcc.dg/torture/pr58955-2.c when built with AVX512
> >> >> >> and --param=vect-partial-vector-usage=1.
> >> >> > 
> >> >> > On riscv we're seeing a similar problem across the testsuite
> >> >> > and several execution failures as a result.  In the case I
> >> >> > looked at we move a scalar load upwards over a partial store
> >> >> > that aliases the load.
> >> >> > 
> >> >> > I independently arrived at the spot mentioned in
> >> >> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110237#c4
> >> >> > before knowing about the PR.
> >> >> > 
> >> >> > I can confirm that your RFC patch fixes at least two of the
> >> >> > failures,  I haven't checked the others but very likely
> >> >> > they are similar.
> >> >> FWIW, it should always be safe to ignore the memory attributes.   So if
> >> >> there's a reasonable condition here, then we can use it and just ignore the
> >> >> attribute.
> >> >> 
> >> >> Does the attribute on a partial load/store indicate the potential load/store
> >> >> size or does it indicate the actual known load/store size. If the former, then
> >> >> we probably need to treat it as a may-read/may-write kind of reference.
> >> >
> >> > There's no way to distinguish a partial vs. non-partial MEM on RTL and
> >> > while without the bogus MEM_ATTR the alias oracle pieces that
> >> > miscompiled the original case are fended off we still see the load/store
> >> > as full given they have a mode with a size - that for example means
> >> > that DSE can elide a previous store to a masked part.  Eventually
> >> > that's fended off by using an UNSPEC, but whether the RTL IL has
> >> > the correct semantics is questionable.
> >> >
> >> > That said, I did propose scrapping the MEM_EXPR which I think is
> >> > the correct thing to do unless we want to put a CALL_EXPR into it
> >> > (nothing would use that at the moment) or re-do MEM_EXPR and instead
> >> > have an ao_ref (or sth slightly more complete) instead of the current
> >> > MEM_ATTRs - but that would be a lot of work.
> >> >
> >> > This leaves the question wrt. semantics of for example x86 mask_store:
> >> >
> >> > (insn 23 22 24 5 (set (mem:V4DF (plus:DI (reg/v/f:DI 106 [ x ])
> >> >                 (reg:DI 101 [ ivtmp.15 ])) [2 MEM <vector(4) double> 
> >> > [(double *)x_11(D) + ivtmp.15_33 * 1]+0 S32 A64])
> >> >         (unspec:V4DF [
> >> >                 (reg:V4DI 104 [ mask__16.8 ])
> >> >                 (reg:V4DF 105 [ vect_cst__42 ])
> >> >                 (mem:V4DF (plus:DI (reg/v/f:DI 106 [ x ])
> >> >                         (reg:DI 101 [ ivtmp.15 ])) [2 MEM <vector(4) 
> >> > double> [(double *)x_11(D) + ivtmp.15_33 * 1]+0 S32 A64])
> >> >             ] UNSPEC_MASKMOV)) "t.c":5:12 8523 {avx_maskstorepd256}
> >> >      (nil))
> >> >
> >> > it uses a read-modify-write which makes it safe for DSE.  mask_load
> >> > looks like
> >> >
> >> > (insn 28 27 29 6 (set (reg:V4DF 115 [ vect__7.11 ])
> >> >         (unspec:V4DF [
> >> >                 (reg:V4DI 114 [ mask__8.8 ])
> >> >                 (mem:V4DF (plus:DI (reg/v/f:DI 118 [ val ])
> >> >                         (reg:DI 103 [ ivtmp.29 ])) [2 MEM <vector(4) 
> >> > double> [(double *)val_13(D) + ivtmp.29_22 * 1]+0 S32 A64])
> >> >             ] UNSPEC_MASKMOV)) "t.c":5:17 8515 {avx_maskloadpd256}
> >> >      (nil))
> >> >
> >> > both have (as operand of the UNSPEC) a MEM with V4DFmode (and a
> >> > MEM_EXPR with a similarly bougs MEM_EXPR) indicating the loads
> >> > are _not_ partial.  That means the disambiguation against a store
> >> > to an object that's smaller than V4DF is still possible.
> >> > Setting MEM_SIZE to UNKNOWN doesn't help - that just asks to look
> >> > at the mode.  As discussed using a BLKmode MEM _might_ be a way
> >> > out but I didn't try what will happen then (patterns would need to
> >> > be adjusted I guess).
> >> >
> >> > That said, I'm happy to commit the partial fix, scrapping the
> >> > bogus MEM_EXPRs.
> >> >
> >> > OK for that?
> >> 
> >> Could we restrict it to cases where the offset+size might overstep the
> >> bounds?  I.e. do the equivalent of:
> >> 
> >>       /* Drop the object if the new right end is not within its bounds.  */
> >>       if (adjust_object && maybe_gt (offset + size, attrs.size))
> >> 	{
> >> 	  attrs.expr = NULL_TREE;
> >> 	  attrs.alias = 0;
> >> 	}
> >> 
> >> (from adjust_address_1, although it might be difficult to reuse that
> >> code in this context).
> >
> > Well, the MEM_EXPR would still not correctly represent the performed
> > access.  But yes, if the base decl is visible and the offset is
> > all constant we might be able to prove the access is not out of bounds.
> > But that doesn't seem to be the most likely case?
> 
> Agree it's not the most likely case.  Just thought it was still a
> distinction worth making, since it's one we already make elsewhere.
> 
> It would also make the condition more self-documenting, and make it
> less likely that we're using this to cover up other problems.
> 
> > Note we have lost restrict info already when producing the
> > .MASK_LOAD/STORE and TBAA info is also preserved without the
> > MEM_EXPR.  The MEM_EXPR itself adds points-to info only
> > (besides the wrong access size).  With ao_ref we also don't
> > have a way to carry only points-to info we also have to
> > build a fake access (which then also carries restrict info).
> 
> Understood.  It's just that one valid use of masked accesses
> is to optimise something like:
> 
> int a[1024];
> ...
>   for (int i = 0; i < 1024; ++i)
>     a[i] = foo ? bar : a[i];
> ...
> 
> so that no load takes place and the store is masked.  In that context
> it seems like the access really is accurately described as a RMW access,
> and points-to info can be used in the same way as it would be for an
> unmasked load and an unmasked store.  It would be good to retain the
> same level of disambiguation, however limited that is in practice.

Sure, the analysis we can perform at RTL expansion time is limited
though.  Note the actual problem is not the alias info but the
pretended size / offset of the access which is wrong.

Richard.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH][RFC] middle-end/110237 - wrong MEM_ATTRs for partial loads/stores
  2023-11-28 11:21         ` Richard Biener
@ 2023-11-28 11:32           ` Richard Sandiford
  2023-11-28 12:17             ` Richard Biener
  0 siblings, 1 reply; 15+ messages in thread
From: Richard Sandiford @ 2023-11-28 11:32 UTC (permalink / raw)
  To: Richard Biener; +Cc: Jeff Law, Robin Dapp, gcc-patches, Li, Pan2

Richard Biener <rguenther@suse.de> writes:
> On Tue, 28 Nov 2023, Richard Sandiford wrote:
>
>> Richard Biener <rguenther@suse.de> writes:
>> > On Mon, 27 Nov 2023, Jeff Law wrote:
>> >
>> >> 
>> >> 
>> >> On 11/27/23 05:39, Robin Dapp wrote:
>> >> >> The easiest way to avoid running into the alias analysis problem is
>> >> >> to scrap the MEM_EXPR when we expand the internal functions for
>> >> >> partial loads/stores.  That avoids the disambiguation we run into
>> >> >> which is realizing that we store to an object of less size as
>> >> >> the size of the mode we appear to store.
>> >> >>
>> >> >> After the patch we see just
>> >> >>
>> >> >>    [1  S64 A32]
>> >> >>
>> >> >> so we preserve the alias set, the alignment and the size (the size
>> >> >> is redundant if the MEM insn't BLKmode).  That's still not good
>> >> >> in case the RTL alias oracle would implement the same
>> >> >> disambiguation but it fends off the gimple one.
>> >> >>
>> >> >> This fixes gcc.dg/torture/pr58955-2.c when built with AVX512
>> >> >> and --param=vect-partial-vector-usage=1.
>> >> > 
>> >> > On riscv we're seeing a similar problem across the testsuite
>> >> > and several execution failures as a result.  In the case I
>> >> > looked at we move a scalar load upwards over a partial store
>> >> > that aliases the load.
>> >> > 
>> >> > I independently arrived at the spot mentioned in
>> >> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110237#c4
>> >> > before knowing about the PR.
>> >> > 
>> >> > I can confirm that your RFC patch fixes at least two of the
>> >> > failures,  I haven't checked the others but very likely
>> >> > they are similar.
>> >> FWIW, it should always be safe to ignore the memory attributes.   So if
>> >> there's a reasonable condition here, then we can use it and just ignore the
>> >> attribute.
>> >> 
>> >> Does the attribute on a partial load/store indicate the potential load/store
>> >> size or does it indicate the actual known load/store size. If the former, then
>> >> we probably need to treat it as a may-read/may-write kind of reference.
>> >
>> > There's no way to distinguish a partial vs. non-partial MEM on RTL and
>> > while without the bogus MEM_ATTR the alias oracle pieces that
>> > miscompiled the original case are fended off we still see the load/store
>> > as full given they have a mode with a size - that for example means
>> > that DSE can elide a previous store to a masked part.  Eventually
>> > that's fended off by using an UNSPEC, but whether the RTL IL has
>> > the correct semantics is questionable.
>> >
>> > That said, I did propose scrapping the MEM_EXPR which I think is
>> > the correct thing to do unless we want to put a CALL_EXPR into it
>> > (nothing would use that at the moment) or re-do MEM_EXPR and instead
>> > have an ao_ref (or sth slightly more complete) instead of the current
>> > MEM_ATTRs - but that would be a lot of work.
>> >
>> > This leaves the question wrt. semantics of for example x86 mask_store:
>> >
>> > (insn 23 22 24 5 (set (mem:V4DF (plus:DI (reg/v/f:DI 106 [ x ])
>> >                 (reg:DI 101 [ ivtmp.15 ])) [2 MEM <vector(4) double> 
>> > [(double *)x_11(D) + ivtmp.15_33 * 1]+0 S32 A64])
>> >         (unspec:V4DF [
>> >                 (reg:V4DI 104 [ mask__16.8 ])
>> >                 (reg:V4DF 105 [ vect_cst__42 ])
>> >                 (mem:V4DF (plus:DI (reg/v/f:DI 106 [ x ])
>> >                         (reg:DI 101 [ ivtmp.15 ])) [2 MEM <vector(4) 
>> > double> [(double *)x_11(D) + ivtmp.15_33 * 1]+0 S32 A64])
>> >             ] UNSPEC_MASKMOV)) "t.c":5:12 8523 {avx_maskstorepd256}
>> >      (nil))
>> >
>> > it uses a read-modify-write which makes it safe for DSE.  mask_load
>> > looks like
>> >
>> > (insn 28 27 29 6 (set (reg:V4DF 115 [ vect__7.11 ])
>> >         (unspec:V4DF [
>> >                 (reg:V4DI 114 [ mask__8.8 ])
>> >                 (mem:V4DF (plus:DI (reg/v/f:DI 118 [ val ])
>> >                         (reg:DI 103 [ ivtmp.29 ])) [2 MEM <vector(4) 
>> > double> [(double *)val_13(D) + ivtmp.29_22 * 1]+0 S32 A64])
>> >             ] UNSPEC_MASKMOV)) "t.c":5:17 8515 {avx_maskloadpd256}
>> >      (nil))
>> >
>> > both have (as operand of the UNSPEC) a MEM with V4DFmode (and a
>> > MEM_EXPR with a similarly bougs MEM_EXPR) indicating the loads
>> > are _not_ partial.  That means the disambiguation against a store
>> > to an object that's smaller than V4DF is still possible.
>> > Setting MEM_SIZE to UNKNOWN doesn't help - that just asks to look
>> > at the mode.  As discussed using a BLKmode MEM _might_ be a way
>> > out but I didn't try what will happen then (patterns would need to
>> > be adjusted I guess).
>> >
>> > That said, I'm happy to commit the partial fix, scrapping the
>> > bogus MEM_EXPRs.
>> >
>> > OK for that?
>> 
>> Could we restrict it to cases where the offset+size might overstep the
>> bounds?  I.e. do the equivalent of:
>> 
>>       /* Drop the object if the new right end is not within its bounds.  */
>>       if (adjust_object && maybe_gt (offset + size, attrs.size))
>> 	{
>> 	  attrs.expr = NULL_TREE;
>> 	  attrs.alias = 0;
>> 	}
>> 
>> (from adjust_address_1, although it might be difficult to reuse that
>> code in this context).
>
> Well, the MEM_EXPR would still not correctly represent the performed
> access.  But yes, if the base decl is visible and the offset is
> all constant we might be able to prove the access is not out of bounds.
> But that doesn't seem to be the most likely case?

Agree it's not the most likely case.  Just thought it was still a
distinction worth making, since it's one we already make elsewhere.

It would also make the condition more self-documenting, and make it
less likely that we're using this to cover up other problems.

> Note we have lost restrict info already when producing the
> .MASK_LOAD/STORE and TBAA info is also preserved without the
> MEM_EXPR.  The MEM_EXPR itself adds points-to info only
> (besides the wrong access size).  With ao_ref we also don't
> have a way to carry only points-to info we also have to
> build a fake access (which then also carries restrict info).

Understood.  It's just that one valid use of masked accesses
is to optimise something like:

int a[1024];
...
  for (int i = 0; i < 1024; ++i)
    a[i] = foo ? bar : a[i];
...

so that no load takes place and the store is masked.  In that context
it seems like the access really is accurately described as a RMW access,
and points-to info can be used in the same way as it would be for an
unmasked load and an unmasked store.  It would be good to retain the
same level of disambiguation, however limited that is in practice.

Thanks,
Richard

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH][RFC] middle-end/110237 - wrong MEM_ATTRs for partial loads/stores
  2023-11-28 10:31       ` Richard Sandiford
@ 2023-11-28 11:21         ` Richard Biener
  2023-11-28 11:32           ` Richard Sandiford
  0 siblings, 1 reply; 15+ messages in thread
From: Richard Biener @ 2023-11-28 11:21 UTC (permalink / raw)
  To: Richard Sandiford; +Cc: Jeff Law, Robin Dapp, gcc-patches, Li, Pan2

On Tue, 28 Nov 2023, Richard Sandiford wrote:

> Richard Biener <rguenther@suse.de> writes:
> > On Mon, 27 Nov 2023, Jeff Law wrote:
> >
> >> 
> >> 
> >> On 11/27/23 05:39, Robin Dapp wrote:
> >> >> The easiest way to avoid running into the alias analysis problem is
> >> >> to scrap the MEM_EXPR when we expand the internal functions for
> >> >> partial loads/stores.  That avoids the disambiguation we run into
> >> >> which is realizing that we store to an object of less size as
> >> >> the size of the mode we appear to store.
> >> >>
> >> >> After the patch we see just
> >> >>
> >> >>    [1  S64 A32]
> >> >>
> >> >> so we preserve the alias set, the alignment and the size (the size
> >> >> is redundant if the MEM insn't BLKmode).  That's still not good
> >> >> in case the RTL alias oracle would implement the same
> >> >> disambiguation but it fends off the gimple one.
> >> >>
> >> >> This fixes gcc.dg/torture/pr58955-2.c when built with AVX512
> >> >> and --param=vect-partial-vector-usage=1.
> >> > 
> >> > On riscv we're seeing a similar problem across the testsuite
> >> > and several execution failures as a result.  In the case I
> >> > looked at we move a scalar load upwards over a partial store
> >> > that aliases the load.
> >> > 
> >> > I independently arrived at the spot mentioned in
> >> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110237#c4
> >> > before knowing about the PR.
> >> > 
> >> > I can confirm that your RFC patch fixes at least two of the
> >> > failures,  I haven't checked the others but very likely
> >> > they are similar.
> >> FWIW, it should always be safe to ignore the memory attributes.   So if
> >> there's a reasonable condition here, then we can use it and just ignore the
> >> attribute.
> >> 
> >> Does the attribute on a partial load/store indicate the potential load/store
> >> size or does it indicate the actual known load/store size. If the former, then
> >> we probably need to treat it as a may-read/may-write kind of reference.
> >
> > There's no way to distinguish a partial vs. non-partial MEM on RTL and
> > while without the bogus MEM_ATTR the alias oracle pieces that
> > miscompiled the original case are fended off we still see the load/store
> > as full given they have a mode with a size - that for example means
> > that DSE can elide a previous store to a masked part.  Eventually
> > that's fended off by using an UNSPEC, but whether the RTL IL has
> > the correct semantics is questionable.
> >
> > That said, I did propose scrapping the MEM_EXPR which I think is
> > the correct thing to do unless we want to put a CALL_EXPR into it
> > (nothing would use that at the moment) or re-do MEM_EXPR and instead
> > have an ao_ref (or sth slightly more complete) instead of the current
> > MEM_ATTRs - but that would be a lot of work.
> >
> > This leaves the question wrt. semantics of for example x86 mask_store:
> >
> > (insn 23 22 24 5 (set (mem:V4DF (plus:DI (reg/v/f:DI 106 [ x ])
> >                 (reg:DI 101 [ ivtmp.15 ])) [2 MEM <vector(4) double> 
> > [(double *)x_11(D) + ivtmp.15_33 * 1]+0 S32 A64])
> >         (unspec:V4DF [
> >                 (reg:V4DI 104 [ mask__16.8 ])
> >                 (reg:V4DF 105 [ vect_cst__42 ])
> >                 (mem:V4DF (plus:DI (reg/v/f:DI 106 [ x ])
> >                         (reg:DI 101 [ ivtmp.15 ])) [2 MEM <vector(4) 
> > double> [(double *)x_11(D) + ivtmp.15_33 * 1]+0 S32 A64])
> >             ] UNSPEC_MASKMOV)) "t.c":5:12 8523 {avx_maskstorepd256}
> >      (nil))
> >
> > it uses a read-modify-write which makes it safe for DSE.  mask_load
> > looks like
> >
> > (insn 28 27 29 6 (set (reg:V4DF 115 [ vect__7.11 ])
> >         (unspec:V4DF [
> >                 (reg:V4DI 114 [ mask__8.8 ])
> >                 (mem:V4DF (plus:DI (reg/v/f:DI 118 [ val ])
> >                         (reg:DI 103 [ ivtmp.29 ])) [2 MEM <vector(4) 
> > double> [(double *)val_13(D) + ivtmp.29_22 * 1]+0 S32 A64])
> >             ] UNSPEC_MASKMOV)) "t.c":5:17 8515 {avx_maskloadpd256}
> >      (nil))
> >
> > both have (as operand of the UNSPEC) a MEM with V4DFmode (and a
> > MEM_EXPR with a similarly bougs MEM_EXPR) indicating the loads
> > are _not_ partial.  That means the disambiguation against a store
> > to an object that's smaller than V4DF is still possible.
> > Setting MEM_SIZE to UNKNOWN doesn't help - that just asks to look
> > at the mode.  As discussed using a BLKmode MEM _might_ be a way
> > out but I didn't try what will happen then (patterns would need to
> > be adjusted I guess).
> >
> > That said, I'm happy to commit the partial fix, scrapping the
> > bogus MEM_EXPRs.
> >
> > OK for that?
> 
> Could we restrict it to cases where the offset+size might overstep the
> bounds?  I.e. do the equivalent of:
> 
>       /* Drop the object if the new right end is not within its bounds.  */
>       if (adjust_object && maybe_gt (offset + size, attrs.size))
> 	{
> 	  attrs.expr = NULL_TREE;
> 	  attrs.alias = 0;
> 	}
> 
> (from adjust_address_1, although it might be difficult to reuse that
> code in this context).

Well, the MEM_EXPR would still not correctly represent the performed
access.  But yes, if the base decl is visible and the offset is
all constant we might be able to prove the access is not out of bounds.
But that doesn't seem to be the most likely case?

Note we have lost restrict info already when producing the
.MASK_LOAD/STORE and TBAA info is also preserved without the
MEM_EXPR.  The MEM_EXPR itself adds points-to info only
(besides the wrong access size).  With ao_ref we also don't
have a way to carry only points-to info we also have to
build a fake access (which then also carries restrict info).

Richard.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH][RFC] middle-end/110237 - wrong MEM_ATTRs for partial loads/stores
  2023-11-28  7:50     ` Richard Biener
@ 2023-11-28 10:31       ` Richard Sandiford
  2023-11-28 11:21         ` Richard Biener
  2023-11-28 15:00       ` Jeff Law
  1 sibling, 1 reply; 15+ messages in thread
From: Richard Sandiford @ 2023-11-28 10:31 UTC (permalink / raw)
  To: Richard Biener; +Cc: Jeff Law, Robin Dapp, gcc-patches, Li, Pan2

Richard Biener <rguenther@suse.de> writes:
> On Mon, 27 Nov 2023, Jeff Law wrote:
>
>> 
>> 
>> On 11/27/23 05:39, Robin Dapp wrote:
>> >> The easiest way to avoid running into the alias analysis problem is
>> >> to scrap the MEM_EXPR when we expand the internal functions for
>> >> partial loads/stores.  That avoids the disambiguation we run into
>> >> which is realizing that we store to an object of less size as
>> >> the size of the mode we appear to store.
>> >>
>> >> After the patch we see just
>> >>
>> >>    [1  S64 A32]
>> >>
>> >> so we preserve the alias set, the alignment and the size (the size
>> >> is redundant if the MEM insn't BLKmode).  That's still not good
>> >> in case the RTL alias oracle would implement the same
>> >> disambiguation but it fends off the gimple one.
>> >>
>> >> This fixes gcc.dg/torture/pr58955-2.c when built with AVX512
>> >> and --param=vect-partial-vector-usage=1.
>> > 
>> > On riscv we're seeing a similar problem across the testsuite
>> > and several execution failures as a result.  In the case I
>> > looked at we move a scalar load upwards over a partial store
>> > that aliases the load.
>> > 
>> > I independently arrived at the spot mentioned in
>> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110237#c4
>> > before knowing about the PR.
>> > 
>> > I can confirm that your RFC patch fixes at least two of the
>> > failures,  I haven't checked the others but very likely
>> > they are similar.
>> FWIW, it should always be safe to ignore the memory attributes.   So if
>> there's a reasonable condition here, then we can use it and just ignore the
>> attribute.
>> 
>> Does the attribute on a partial load/store indicate the potential load/store
>> size or does it indicate the actual known load/store size. If the former, then
>> we probably need to treat it as a may-read/may-write kind of reference.
>
> There's no way to distinguish a partial vs. non-partial MEM on RTL and
> while without the bogus MEM_ATTR the alias oracle pieces that
> miscompiled the original case are fended off we still see the load/store
> as full given they have a mode with a size - that for example means
> that DSE can elide a previous store to a masked part.  Eventually
> that's fended off by using an UNSPEC, but whether the RTL IL has
> the correct semantics is questionable.
>
> That said, I did propose scrapping the MEM_EXPR which I think is
> the correct thing to do unless we want to put a CALL_EXPR into it
> (nothing would use that at the moment) or re-do MEM_EXPR and instead
> have an ao_ref (or sth slightly more complete) instead of the current
> MEM_ATTRs - but that would be a lot of work.
>
> This leaves the question wrt. semantics of for example x86 mask_store:
>
> (insn 23 22 24 5 (set (mem:V4DF (plus:DI (reg/v/f:DI 106 [ x ])
>                 (reg:DI 101 [ ivtmp.15 ])) [2 MEM <vector(4) double> 
> [(double *)x_11(D) + ivtmp.15_33 * 1]+0 S32 A64])
>         (unspec:V4DF [
>                 (reg:V4DI 104 [ mask__16.8 ])
>                 (reg:V4DF 105 [ vect_cst__42 ])
>                 (mem:V4DF (plus:DI (reg/v/f:DI 106 [ x ])
>                         (reg:DI 101 [ ivtmp.15 ])) [2 MEM <vector(4) 
> double> [(double *)x_11(D) + ivtmp.15_33 * 1]+0 S32 A64])
>             ] UNSPEC_MASKMOV)) "t.c":5:12 8523 {avx_maskstorepd256}
>      (nil))
>
> it uses a read-modify-write which makes it safe for DSE.  mask_load
> looks like
>
> (insn 28 27 29 6 (set (reg:V4DF 115 [ vect__7.11 ])
>         (unspec:V4DF [
>                 (reg:V4DI 114 [ mask__8.8 ])
>                 (mem:V4DF (plus:DI (reg/v/f:DI 118 [ val ])
>                         (reg:DI 103 [ ivtmp.29 ])) [2 MEM <vector(4) 
> double> [(double *)val_13(D) + ivtmp.29_22 * 1]+0 S32 A64])
>             ] UNSPEC_MASKMOV)) "t.c":5:17 8515 {avx_maskloadpd256}
>      (nil))
>
> both have (as operand of the UNSPEC) a MEM with V4DFmode (and a
> MEM_EXPR with a similarly bougs MEM_EXPR) indicating the loads
> are _not_ partial.  That means the disambiguation against a store
> to an object that's smaller than V4DF is still possible.
> Setting MEM_SIZE to UNKNOWN doesn't help - that just asks to look
> at the mode.  As discussed using a BLKmode MEM _might_ be a way
> out but I didn't try what will happen then (patterns would need to
> be adjusted I guess).
>
> That said, I'm happy to commit the partial fix, scrapping the
> bogus MEM_EXPRs.
>
> OK for that?

Could we restrict it to cases where the offset+size might overstep the
bounds?  I.e. do the equivalent of:

      /* Drop the object if the new right end is not within its bounds.  */
      if (adjust_object && maybe_gt (offset + size, attrs.size))
	{
	  attrs.expr = NULL_TREE;
	  attrs.alias = 0;
	}

(from adjust_address_1, although it might be difficult to reuse that
code in this context).

Richard

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH][RFC] middle-end/110237 - wrong MEM_ATTRs for partial loads/stores
  2023-11-27 15:45   ` Jeff Law
@ 2023-11-28  7:50     ` Richard Biener
  2023-11-28 10:31       ` Richard Sandiford
  2023-11-28 15:00       ` Jeff Law
  0 siblings, 2 replies; 15+ messages in thread
From: Richard Biener @ 2023-11-28  7:50 UTC (permalink / raw)
  To: Jeff Law; +Cc: Robin Dapp, gcc-patches, Li, Pan2

On Mon, 27 Nov 2023, Jeff Law wrote:

> 
> 
> On 11/27/23 05:39, Robin Dapp wrote:
> >> The easiest way to avoid running into the alias analysis problem is
> >> to scrap the MEM_EXPR when we expand the internal functions for
> >> partial loads/stores.  That avoids the disambiguation we run into
> >> which is realizing that we store to an object of less size as
> >> the size of the mode we appear to store.
> >>
> >> After the patch we see just
> >>
> >>    [1  S64 A32]
> >>
> >> so we preserve the alias set, the alignment and the size (the size
> >> is redundant if the MEM insn't BLKmode).  That's still not good
> >> in case the RTL alias oracle would implement the same
> >> disambiguation but it fends off the gimple one.
> >>
> >> This fixes gcc.dg/torture/pr58955-2.c when built with AVX512
> >> and --param=vect-partial-vector-usage=1.
> > 
> > On riscv we're seeing a similar problem across the testsuite
> > and several execution failures as a result.  In the case I
> > looked at we move a scalar load upwards over a partial store
> > that aliases the load.
> > 
> > I independently arrived at the spot mentioned in
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110237#c4
> > before knowing about the PR.
> > 
> > I can confirm that your RFC patch fixes at least two of the
> > failures,  I haven't checked the others but very likely
> > they are similar.
> FWIW, it should always be safe to ignore the memory attributes.   So if
> there's a reasonable condition here, then we can use it and just ignore the
> attribute.
> 
> Does the attribute on a partial load/store indicate the potential load/store
> size or does it indicate the actual known load/store size. If the former, then
> we probably need to treat it as a may-read/may-write kind of reference.

There's no way to distinguish a partial vs. non-partial MEM on RTL and
while without the bogus MEM_ATTR the alias oracle pieces that
miscompiled the original case are fended off we still see the load/store
as full given they have a mode with a size - that for example means
that DSE can elide a previous store to a masked part.  Eventually
that's fended off by using an UNSPEC, but whether the RTL IL has
the correct semantics is questionable.

That said, I did propose scrapping the MEM_EXPR which I think is
the correct thing to do unless we want to put a CALL_EXPR into it
(nothing would use that at the moment) or re-do MEM_EXPR and instead
have an ao_ref (or sth slightly more complete) instead of the current
MEM_ATTRs - but that would be a lot of work.

This leaves the question wrt. semantics of for example x86 mask_store:

(insn 23 22 24 5 (set (mem:V4DF (plus:DI (reg/v/f:DI 106 [ x ])
                (reg:DI 101 [ ivtmp.15 ])) [2 MEM <vector(4) double> 
[(double *)x_11(D) + ivtmp.15_33 * 1]+0 S32 A64])
        (unspec:V4DF [
                (reg:V4DI 104 [ mask__16.8 ])
                (reg:V4DF 105 [ vect_cst__42 ])
                (mem:V4DF (plus:DI (reg/v/f:DI 106 [ x ])
                        (reg:DI 101 [ ivtmp.15 ])) [2 MEM <vector(4) 
double> [(double *)x_11(D) + ivtmp.15_33 * 1]+0 S32 A64])
            ] UNSPEC_MASKMOV)) "t.c":5:12 8523 {avx_maskstorepd256}
     (nil))

it uses a read-modify-write which makes it safe for DSE.  mask_load
looks like

(insn 28 27 29 6 (set (reg:V4DF 115 [ vect__7.11 ])
        (unspec:V4DF [
                (reg:V4DI 114 [ mask__8.8 ])
                (mem:V4DF (plus:DI (reg/v/f:DI 118 [ val ])
                        (reg:DI 103 [ ivtmp.29 ])) [2 MEM <vector(4) 
double> [(double *)val_13(D) + ivtmp.29_22 * 1]+0 S32 A64])
            ] UNSPEC_MASKMOV)) "t.c":5:17 8515 {avx_maskloadpd256}
     (nil))

both have (as operand of the UNSPEC) a MEM with V4DFmode (and a
MEM_EXPR with a similarly bougs MEM_EXPR) indicating the loads
are _not_ partial.  That means the disambiguation against a store
to an object that's smaller than V4DF is still possible.
Setting MEM_SIZE to UNKNOWN doesn't help - that just asks to look
at the mode.  As discussed using a BLKmode MEM _might_ be a way
out but I didn't try what will happen then (patterns would need to
be adjusted I guess).

That said, I'm happy to commit the partial fix, scrapping the
bogus MEM_EXPRs.

OK for that?

Thanks,
Richard.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH][RFC] middle-end/110237 - wrong MEM_ATTRs for partial loads/stores
  2023-11-27 12:39 ` Robin Dapp
@ 2023-11-27 15:45   ` Jeff Law
  2023-11-28  7:50     ` Richard Biener
  0 siblings, 1 reply; 15+ messages in thread
From: Jeff Law @ 2023-11-27 15:45 UTC (permalink / raw)
  To: Robin Dapp, Richard Biener, gcc-patches; +Cc: Li, Pan2



On 11/27/23 05:39, Robin Dapp wrote:
>> The easiest way to avoid running into the alias analysis problem is
>> to scrap the MEM_EXPR when we expand the internal functions for
>> partial loads/stores.  That avoids the disambiguation we run into
>> which is realizing that we store to an object of less size as
>> the size of the mode we appear to store.
>>
>> After the patch we see just
>>
>>    [1  S64 A32]
>>
>> so we preserve the alias set, the alignment and the size (the size
>> is redundant if the MEM insn't BLKmode).  That's still not good
>> in case the RTL alias oracle would implement the same
>> disambiguation but it fends off the gimple one.
>>
>> This fixes gcc.dg/torture/pr58955-2.c when built with AVX512
>> and --param=vect-partial-vector-usage=1.
> 
> On riscv we're seeing a similar problem across the testsuite
> and several execution failures as a result.  In the case I
> looked at we move a scalar load upwards over a partial store
> that aliases the load.
> 
> I independently arrived at the spot mentioned in
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110237#c4
> before knowing about the PR.
> 
> I can confirm that your RFC patch fixes at least two of the
> failures,  I haven't checked the others but very likely
> they are similar.
FWIW, it should always be safe to ignore the memory attributes.   So if 
there's a reasonable condition here, then we can use it and just ignore 
the attribute.

Does the attribute on a partial load/store indicate the potential 
load/store size or does it indicate the actual known load/store size. 
If the former, then we probably need to treat it as a may-read/may-write 
kind of reference.

Jeff

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH][RFC] middle-end/110237 - wrong MEM_ATTRs for partial loads/stores
       [not found] <20230621075019.7CA813858033@sourceware.org>
@ 2023-11-27 12:39 ` Robin Dapp
  2023-11-27 15:45   ` Jeff Law
  0 siblings, 1 reply; 15+ messages in thread
From: Robin Dapp @ 2023-11-27 12:39 UTC (permalink / raw)
  To: Richard Biener, gcc-patches; +Cc: rdapp.gcc, Li, Pan2

> The easiest way to avoid running into the alias analysis problem is
> to scrap the MEM_EXPR when we expand the internal functions for
> partial loads/stores.  That avoids the disambiguation we run into
> which is realizing that we store to an object of less size as
> the size of the mode we appear to store.
> 
> After the patch we see just
> 
>   [1  S64 A32]
> 
> so we preserve the alias set, the alignment and the size (the size
> is redundant if the MEM insn't BLKmode).  That's still not good
> in case the RTL alias oracle would implement the same
> disambiguation but it fends off the gimple one.
> 
> This fixes gcc.dg/torture/pr58955-2.c when built with AVX512
> and --param=vect-partial-vector-usage=1.

On riscv we're seeing a similar problem across the testsuite
and several execution failures as a result.  In the case I
looked at we move a scalar load upwards over a partial store
that aliases the load.

I independently arrived at the spot mentioned in
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110237#c4
before knowing about the PR.

I can confirm that your RFC patch fixes at least two of the
failures,  I haven't checked the others but very likely
they are similar.

Regards
 Robin


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH][RFC] middle-end/110237 - wrong MEM_ATTRs for partial loads/stores
  2023-06-22  6:39   ` Richard Biener
@ 2023-06-24 14:32     ` Jeff Law
  0 siblings, 0 replies; 15+ messages in thread
From: Jeff Law @ 2023-06-24 14:32 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc-patches



On 6/22/23 00:39, Richard Biener wrote:


> 
> I suspect there's no way to specify the desired semantics?  OTOH
> code that looks at the MEM operand only and not the insn (which
> should have some UNSPEC wrapped) needs to be conservative, so maybe
> the alias code shouldn't assume that a (mem:V16SI ..) actually
> performs an access of the size of V16SI at the specified location?
I'm not aware of a way to express the semantics fully right now.  We'd 
need some way to indicate that the MEM is a partial and pass along the 
actual length.

We could do both through MEM_ATTRS with some work. For example we could 
declare that for vector modes full semantic information is carried in 
the MEM_ATTRS rather than by the mode itself.  So it falls into a space 
between how we currently think of something like V16SI and BLK.  The 
mode specifies a maximum size and how to interpret the elements.  But 
actual size and perhaps mask info would be found in MEM_ATTRS.

jeff

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH][RFC] middle-end/110237 - wrong MEM_ATTRs for partial loads/stores
  2023-06-21 15:29 ` Jeff Law
@ 2023-06-22  6:39   ` Richard Biener
  2023-06-24 14:32     ` Jeff Law
  0 siblings, 1 reply; 15+ messages in thread
From: Richard Biener @ 2023-06-22  6:39 UTC (permalink / raw)
  To: Jeff Law; +Cc: gcc-patches

On Wed, 21 Jun 2023, Jeff Law wrote:

> 
> 
> On 6/21/23 01:49, Richard Biener via Gcc-patches wrote:
> > The following addresses a miscompilation by RTL scheduling related
> > to the representation of masked stores.  For that we have
> > 
> > (insn 38 35 39 3 (set (mem:V16SI (plus:DI (reg:DI 40 r12 [orig:90 _22 ]
> > [90])
> >                  (const:DI (plus:DI (symbol_ref:DI ("b") [flags 0x2]
> >                  <var_decl 0x7ffff6e28d80 b>)
> >                          (const_int -4 [0xfffffffffffffffc])))) [1 MEM
> >          <vector(16) int> [(int *)vectp_b.12_28]+0 S64 A32])
> >          (vec_merge:V16SI (reg:V16SI 20 xmm0 [118])
> >              (mem:V16SI (plus:DI (reg:DI 40 r12 [orig:90 _22 ] [90])
> >                      (const:DI (plus:DI (symbol_ref:DI ("b") [flags 0x2]
> >                      <var_decl 0x7ffff6e28d80 b>)
> >                              (const_int -4 [0xfffffffffffffffc])))) [1 MEM
> >                              <vector(16) int> [(int *)vectp_b.12_28]+0 S64
> >                              A32])
> > 
> > and specifically the memory attributes
> > 
> >    [1 MEM <vector(16) int> [(int *)vectp_b.12_28]+0 S64 A32]
> > 
> > are problematic.  They tell us the instruction stores and reads a full
> > vector which it if course does not.  There isn't any good MEM_EXPR
> > we can use here (we lack a way to just specify a pointer and restrict
> > info for example), and since the MEMs have a vector mode it's
> > difficult in general as passes do not need to look at the memory
> > attributes at all.
> > 
> > The easiest way to avoid running into the alias analysis problem is
> > to scrap the MEM_EXPR when we expand the internal functions for
> > partial loads/stores.  That avoids the disambiguation we run into
> > which is realizing that we store to an object of less size as
> > the size of the mode we appear to store.
> > 
> > After the patch we see just
> > 
> >    [1  S64 A32]
> > 
> > so we preserve the alias set, the alignment and the size (the size
> > is redundant if the MEM insn't BLKmode).  That's still not good
> > in case the RTL alias oracle would implement the same
> > disambiguation but it fends off the gimple one.
> > 
> > This fixes gcc.dg/torture/pr58955-2.c when built with AVX512
> > and --param=vect-partial-vector-usage=1.
> > 
> > On the MEM_EXPR side we could use a CALL_EXPR and on the RTL
> > side we might instead want to use a BLKmode MEM?  Any better
> > ideas here?
> I'd expect that using BLKmode will fend off the RTL aliasing code.

I suspect there's no way to specify the desired semantics?  OTOH
code that looks at the MEM operand only and not the insn (which
should have some UNSPEC wrapped) needs to be conservative, so maybe
the alias code shouldn't assume that a (mem:V16SI ..) actually
performs an access of the size of V16SI at the specified location?

Anyway, any opinion on the actual patch?  It's enough to fix the
observed miscompilation.

Thanks,
Richard.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH][RFC] middle-end/110237 - wrong MEM_ATTRs for partial loads/stores
       [not found] <20230621074951.F3C3C3858433@sourceware.org>
@ 2023-06-21 15:29 ` Jeff Law
  2023-06-22  6:39   ` Richard Biener
  0 siblings, 1 reply; 15+ messages in thread
From: Jeff Law @ 2023-06-21 15:29 UTC (permalink / raw)
  To: Richard Biener, gcc-patches



On 6/21/23 01:49, Richard Biener via Gcc-patches wrote:
> The following addresses a miscompilation by RTL scheduling related
> to the representation of masked stores.  For that we have
> 
> (insn 38 35 39 3 (set (mem:V16SI (plus:DI (reg:DI 40 r12 [orig:90 _22 ] [90])
>                  (const:DI (plus:DI (symbol_ref:DI ("b") [flags 0x2]  <var_decl 0x7ffff6e28d80 b>)
>                          (const_int -4 [0xfffffffffffffffc])))) [1 MEM <vector(16) int> [(int *)vectp_b.12_28]+0 S64 A32])
>          (vec_merge:V16SI (reg:V16SI 20 xmm0 [118])
>              (mem:V16SI (plus:DI (reg:DI 40 r12 [orig:90 _22 ] [90])
>                      (const:DI (plus:DI (symbol_ref:DI ("b") [flags 0x2]  <var_decl 0x7ffff6e28d80 b>)
>                              (const_int -4 [0xfffffffffffffffc])))) [1 MEM <vector(16) int> [(int *)vectp_b.12_28]+0 S64 A32])
> 
> and specifically the memory attributes
> 
>    [1 MEM <vector(16) int> [(int *)vectp_b.12_28]+0 S64 A32]
> 
> are problematic.  They tell us the instruction stores and reads a full
> vector which it if course does not.  There isn't any good MEM_EXPR
> we can use here (we lack a way to just specify a pointer and restrict
> info for example), and since the MEMs have a vector mode it's
> difficult in general as passes do not need to look at the memory
> attributes at all.
> 
> The easiest way to avoid running into the alias analysis problem is
> to scrap the MEM_EXPR when we expand the internal functions for
> partial loads/stores.  That avoids the disambiguation we run into
> which is realizing that we store to an object of less size as
> the size of the mode we appear to store.
> 
> After the patch we see just
> 
>    [1  S64 A32]
> 
> so we preserve the alias set, the alignment and the size (the size
> is redundant if the MEM insn't BLKmode).  That's still not good
> in case the RTL alias oracle would implement the same
> disambiguation but it fends off the gimple one.
> 
> This fixes gcc.dg/torture/pr58955-2.c when built with AVX512
> and --param=vect-partial-vector-usage=1.
> 
> On the MEM_EXPR side we could use a CALL_EXPR and on the RTL
> side we might instead want to use a BLKmode MEM?  Any better
> ideas here?
I'd expect that using BLKmode will fend off the RTL aliasing code.

jeff

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH][RFC] middle-end/110237 - wrong MEM_ATTRs for partial loads/stores
@ 2023-06-21  7:49 Richard Biener
  0 siblings, 0 replies; 15+ messages in thread
From: Richard Biener @ 2023-06-21  7:49 UTC (permalink / raw)
  To: gcc-patches

The following addresses a miscompilation by RTL scheduling related
to the representation of masked stores.  For that we have

(insn 38 35 39 3 (set (mem:V16SI (plus:DI (reg:DI 40 r12 [orig:90 _22 ] [90])
                (const:DI (plus:DI (symbol_ref:DI ("b") [flags 0x2]  <var_decl 0x7ffff6e28d80 b>)
                        (const_int -4 [0xfffffffffffffffc])))) [1 MEM <vector(16) int> [(int *)vectp_b.12_28]+0 S64 A32])
        (vec_merge:V16SI (reg:V16SI 20 xmm0 [118])
            (mem:V16SI (plus:DI (reg:DI 40 r12 [orig:90 _22 ] [90])
                    (const:DI (plus:DI (symbol_ref:DI ("b") [flags 0x2]  <var_decl 0x7ffff6e28d80 b>)
                            (const_int -4 [0xfffffffffffffffc])))) [1 MEM <vector(16) int> [(int *)vectp_b.12_28]+0 S64 A32])

and specifically the memory attributes

  [1 MEM <vector(16) int> [(int *)vectp_b.12_28]+0 S64 A32]

are problematic.  They tell us the instruction stores and reads a full
vector which it if course does not.  There isn't any good MEM_EXPR
we can use here (we lack a way to just specify a pointer and restrict
info for example), and since the MEMs have a vector mode it's
difficult in general as passes do not need to look at the memory
attributes at all.

The easiest way to avoid running into the alias analysis problem is
to scrap the MEM_EXPR when we expand the internal functions for
partial loads/stores.  That avoids the disambiguation we run into
which is realizing that we store to an object of less size as
the size of the mode we appear to store.

After the patch we see just

  [1  S64 A32]

so we preserve the alias set, the alignment and the size (the size
is redundant if the MEM insn't BLKmode).  That's still not good
in case the RTL alias oracle would implement the same
disambiguation but it fends off the gimple one.

This fixes gcc.dg/torture/pr58955-2.c when built with AVX512
and --param=vect-partial-vector-usage=1.

On the MEM_EXPR side we could use a CALL_EXPR and on the RTL
side we might instead want to use a BLKmode MEM?  Any better
ideas here?

Thanks,
Richard.

	PR middle-end/110237
	* internal-fn.cc (expand_partial_load_optab_fn): Clear
	MEM_EXPR and MEM_OFFSET.
	(expand_partial_store_optab_fn): Likewise.
---
 gcc/internal-fn.cc | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
index c911ae790cb..2dc685e7d85 100644
--- a/gcc/internal-fn.cc
+++ b/gcc/internal-fn.cc
@@ -2903,6 +2903,10 @@ expand_partial_load_optab_fn (internal_fn, gcall *stmt, convert_optab optab)
 
   mem = expand_expr (rhs, NULL_RTX, VOIDmode, EXPAND_WRITE);
   gcc_assert (MEM_P (mem));
+  /* The built MEM_REF does not accurately reflect that the load
+     is only partial.  Clear it.  */
+  set_mem_expr (mem, NULL_TREE);
+  clear_mem_offset (mem);
   mask = expand_normal (maskt);
   target = expand_expr (lhs, NULL_RTX, VOIDmode, EXPAND_WRITE);
   create_output_operand (&ops[0], target, TYPE_MODE (type));
@@ -2971,6 +2975,10 @@ expand_partial_store_optab_fn (internal_fn, gcall *stmt, convert_optab optab)
 
   mem = expand_expr (lhs, NULL_RTX, VOIDmode, EXPAND_WRITE);
   gcc_assert (MEM_P (mem));
+  /* The built MEM_REF does not accurately reflect that the store
+     is only partial.  Clear it.  */
+  set_mem_expr (mem, NULL_TREE);
+  clear_mem_offset (mem);
   mask = expand_normal (maskt);
   reg = expand_normal (rhs);
   create_fixed_operand (&ops[0], mem);
-- 
2.35.3

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2023-11-29  7:20 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <20230621074956.1174B3858288@sourceware.org>
2023-06-26  8:29 ` [PATCH][RFC] middle-end/110237 - wrong MEM_ATTRs for partial loads/stores Hongtao Liu
2023-06-26  8:41   ` Richard Biener
     [not found] <20230621075019.7CA813858033@sourceware.org>
2023-11-27 12:39 ` Robin Dapp
2023-11-27 15:45   ` Jeff Law
2023-11-28  7:50     ` Richard Biener
2023-11-28 10:31       ` Richard Sandiford
2023-11-28 11:21         ` Richard Biener
2023-11-28 11:32           ` Richard Sandiford
2023-11-28 12:17             ` Richard Biener
2023-11-28 15:00       ` Jeff Law
2023-11-29  7:16         ` Richard Biener
     [not found] <20230621074951.F3C3C3858433@sourceware.org>
2023-06-21 15:29 ` Jeff Law
2023-06-22  6:39   ` Richard Biener
2023-06-24 14:32     ` Jeff Law
2023-06-21  7:49 Richard Biener

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).