[PATCH 0/3] lower more cases of memcpy [PR102125]

public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed

* [PATCH 0/3] lower more cases of memcpy [PR102125]
@ 2021-09-06 10:40 Richard Earnshaw
  2021-09-06 10:40 ` [PATCH 1/3] rtl: allow forming subregs of already unaligned mems [PR102125] Richard Earnshaw
                   ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: Richard Earnshaw @ 2021-09-06 10:40 UTC (permalink / raw)
  To: gcc-patches; +Cc: Richard Earnshaw, richard.guenther, bernd.edlinger

This short patch series is designed to address some more cases where we
can usefully lower memcpy operations during gimple fold.  The current
code restricts this lowering to a maximum size of MOVE_MAX, ie the size
of a single integer register on the machine, but with modern architectures
this is likely too restrictive.  The motivating example is

uint64_t bar64(const uint8_t *rData1)
{
    uint64_t buffer;
    __builtin_memcpy(&buffer, rData1, sizeof(buffer));
    return buffer;
}

which on a 32-bit machine ends up with an inlined memcpy followed by a load
from the copied buffer.

The patch series is in three parts, although the middle patch is an
arm-specific tweak to handle unaligned 64-bit moves on more versions
of the Arm architecture.

Patch 1 relaxes slightly the restriction added by Bernd for PR 100106.
That patch was intended to prevent a situation such as (subreg:DI
(mem:SI (A32)) was forming a misaligned DI mem when the underlying mem
was naturally aligned.  However, if the underlying mem is already unaligned
for the inner mode, forming a new unaligned mem shouldn't be a problem, so
permit this case.

Patch 2 addresses an issue in the arm backend.  Currently movmisaligndi
only supports vector targets.  This patch reworks the code so that
the pattern can work on any architecture version that supports misaligned
accesses.

Patch 3 then relaxes the gimple fold simplification of memcpy to allow
larger memcpy operations to be folded away, provided that the total size
is less than MOVE_MAX * MOVE_RATIO and provided that the machine has a
suitable SET insn for the appropriate integer mode.

With these three changes, the testcase above now optimizes to

        mov     r3, r0
        ldr     r0, [r0]        @ unaligned
        ldr     r1, [r3, #4]    @ unaligned
        bx      lr
R.

Richard Earnshaw (3):
  rtl: allow forming subregs of already unaligned mems [PR102125]
  arm: expand handling of movmisalign for DImode [PR102125]
  gimple: allow more folding of memcpy [PR102125]

 gcc/config/arm/arm.md        | 34 ++++++++++++++++++++++++++++++++++
 gcc/config/arm/vec-common.md |  4 ++--
 gcc/gimple-fold.c            | 16 +++++++++++-----
 gcc/simplify-rtx.c           |  6 +++++-
 4 files changed, 52 insertions(+), 8 deletions(-)

-- 
2.25.1

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH 1/3] rtl: allow forming subregs of already unaligned mems [PR102125]
  2021-09-06 10:40 [PATCH 0/3] lower more cases of memcpy [PR102125] Richard Earnshaw
@ 2021-09-06 10:40 ` Richard Earnshaw
  2021-09-06 10:58   ` Richard Biener
  2021-09-06 10:40 ` [PATCH 2/3] arm: expand handling of movmisalign for DImode [PR102125] Richard Earnshaw
  2021-09-06 10:40 ` [PATCH 3/3] gimple: allow more folding of memcpy [PR102125] Richard Earnshaw
  2 siblings, 1 reply; 10+ messages in thread
From: Richard Earnshaw @ 2021-09-06 10:40 UTC (permalink / raw)
  To: gcc-patches; +Cc: Richard Earnshaw, richard.guenther, bernd.edlinger

[-- Attachment #1: Type: text/plain, Size: 742 bytes --]


GCC was recently changed to prevent simplify_subreg from simplifying
a subreg of a mem when the mode of the new mem would have stricter alignment
constraints than the inner mem already has when the target requires
STRICT_ALIGNMENT.

However, such targets may have specialist patterns that can handle
unaligned accesses and this restriction turns out to be unduly restrictive.
So limit this restriction to only apply when the inner mem is naturally
aligned to the inner mode.

gcc/ChangeLog:

	PR target/102125
	* simplify-rtx.c (simplify_context::simplify_subreg): Allow
	simplifying (subreg (mem())) when the inner mem is already
	misaligned for its type.
---
 gcc/simplify-rtx.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: 0001-rtl-allow-forming-subregs-of-already-unaligned-mems-.patch --]
[-- Type: text/x-patch; name="0001-rtl-allow-forming-subregs-of-already-unaligned-mems-.patch", Size: 821 bytes --]

diff --git a/gcc/simplify-rtx.c b/gcc/simplify-rtx.c
index ebad5cb5a79..1baa50cb1b9 100644
--- a/gcc/simplify-rtx.c
+++ b/gcc/simplify-rtx.c
@@ -7317,7 +7317,11 @@ simplify_context::simplify_subreg (machine_mode outermode, rtx op,
          have instruction to move the whole thing.  */
       && (! MEM_VOLATILE_P (op)
 	  || ! have_insn_for (SET, innermode))
-      && !(STRICT_ALIGNMENT && MEM_ALIGN (op) < GET_MODE_ALIGNMENT (outermode))
+      /* On STRICT_ALIGNMENT targets, don't allow the alignment to be increased
+	 if the inner object is already naturally aligned.  */
+      && !(STRICT_ALIGNMENT
+	   && MEM_ALIGN (op) >= GET_MODE_ALIGNMENT (innermode)
+	   && MEM_ALIGN (op) < GET_MODE_ALIGNMENT (outermode))
       && known_le (outersize, innersize))
     return adjust_address_nv (op, outermode, byte);
 

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 1/3] rtl: allow forming subregs of already unaligned mems [PR102125]
  2021-09-06 10:40 ` [PATCH 1/3] rtl: allow forming subregs of already unaligned mems [PR102125] Richard Earnshaw
@ 2021-09-06 10:58   ` Richard Biener
  2021-09-06 11:08     ` Richard Earnshaw
  0 siblings, 1 reply; 10+ messages in thread
From: Richard Biener @ 2021-09-06 10:58 UTC (permalink / raw)
  To: Richard Earnshaw; +Cc: GCC Patches, Bernd Edlinger

On Mon, Sep 6, 2021 at 12:40 PM Richard Earnshaw <rearnsha@arm.com> wrote:
>
>
> GCC was recently changed to prevent simplify_subreg from simplifying
> a subreg of a mem when the mode of the new mem would have stricter alignment
> constraints than the inner mem already has when the target requires
> STRICT_ALIGNMENT.
>
> However, such targets may have specialist patterns that can handle
> unaligned accesses and this restriction turns out to be unduly restrictive.
> So limit this restriction to only apply when the inner mem is naturally
> aligned to the inner mode.

Hmm, I think this can end up either generating wrong code or
recog fails.  The specific combination of alignment and mode of 'op'
has been validated to be supported, replacing the mode with sth
else would need re-validation of the combination.  I'm not sure
we can for example just query movmisalign support here and
hope for LRA to reload the mem with that.

So - where do you run into this?  Is it possible to catch the
situation on a higher level where more context as in the whole insn
is visible?

Thanks,
Richard.

>
> gcc/ChangeLog:
>
>         PR target/102125
>         * simplify-rtx.c (simplify_context::simplify_subreg): Allow
>         simplifying (subreg (mem())) when the inner mem is already
>         misaligned for its type.
> ---
>  gcc/simplify-rtx.c | 6 +++++-
>  1 file changed, 5 insertions(+), 1 deletion(-)
>

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 1/3] rtl: allow forming subregs of already unaligned mems [PR102125]
  2021-09-06 10:58   ` Richard Biener
@ 2021-09-06 11:08     ` Richard Earnshaw
  2021-09-06 11:13       ` Richard Biener
  0 siblings, 1 reply; 10+ messages in thread
From: Richard Earnshaw @ 2021-09-06 11:08 UTC (permalink / raw)
  To: Richard Biener, Richard Earnshaw; +Cc: Bernd Edlinger, GCC Patches



On 06/09/2021 11:58, Richard Biener via Gcc-patches wrote:
> On Mon, Sep 6, 2021 at 12:40 PM Richard Earnshaw <rearnsha@arm.com> wrote:
>>
>>
>> GCC was recently changed to prevent simplify_subreg from simplifying
>> a subreg of a mem when the mode of the new mem would have stricter alignment
>> constraints than the inner mem already has when the target requires
>> STRICT_ALIGNMENT.
>>
>> However, such targets may have specialist patterns that can handle
>> unaligned accesses and this restriction turns out to be unduly restrictive.
>> So limit this restriction to only apply when the inner mem is naturally
>> aligned to the inner mode.
> 
> Hmm, I think this can end up either generating wrong code or
> recog fails.  The specific combination of alignment and mode of 'op'
> has been validated to be supported, replacing the mode with sth
> else would need re-validation of the combination.  I'm not sure
> we can for example just query movmisalign support here and
> hope for LRA to reload the mem with that.
> 
> So - where do you run into this?  Is it possible to catch the
> situation on a higher level where more context as in the whole insn
> is visible?

I ran into it with patch 2 of this series when calling gen_highpart on a 
misaligned mem.  IIRC gen_highpart would end up returning (subreg:SI 
(mem:DI (addr [A8])) 4), while gen_lowpart would simplify the operation 
to (mem:SI (addr [A8])) as expected.

(subreg:SI (mem:DI (addr [A8])) 4) is really problematic, because it's 
not a memory_operand (from the manual: it will get reloaded into a 
register later on).  But that's no good here, I don't want this 
reloading into a wide register later, I need it to be narrowed to the 
component part now.

R.

> 
> Thanks,
> Richard.
> 
>>
>> gcc/ChangeLog:
>>
>>          PR target/102125
>>          * simplify-rtx.c (simplify_context::simplify_subreg): Allow
>>          simplifying (subreg (mem())) when the inner mem is already
>>          misaligned for its type.
>> ---
>>   gcc/simplify-rtx.c | 6 +++++-
>>   1 file changed, 5 insertions(+), 1 deletion(-)
>>

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 1/3] rtl: allow forming subregs of already unaligned mems [PR102125]
  2021-09-06 11:08     ` Richard Earnshaw
@ 2021-09-06 11:13       ` Richard Biener
  2021-09-06 11:23         ` Richard Earnshaw
  0 siblings, 1 reply; 10+ messages in thread
From: Richard Biener @ 2021-09-06 11:13 UTC (permalink / raw)
  To: Richard Earnshaw; +Cc: Richard Earnshaw, Bernd Edlinger, GCC Patches

On Mon, Sep 6, 2021 at 1:08 PM Richard Earnshaw
<Richard.Earnshaw@foss.arm.com> wrote:
>
>
>
> On 06/09/2021 11:58, Richard Biener via Gcc-patches wrote:
> > On Mon, Sep 6, 2021 at 12:40 PM Richard Earnshaw <rearnsha@arm.com> wrote:
> >>
> >>
> >> GCC was recently changed to prevent simplify_subreg from simplifying
> >> a subreg of a mem when the mode of the new mem would have stricter alignment
> >> constraints than the inner mem already has when the target requires
> >> STRICT_ALIGNMENT.
> >>
> >> However, such targets may have specialist patterns that can handle
> >> unaligned accesses and this restriction turns out to be unduly restrictive.
> >> So limit this restriction to only apply when the inner mem is naturally
> >> aligned to the inner mode.
> >
> > Hmm, I think this can end up either generating wrong code or
> > recog fails.  The specific combination of alignment and mode of 'op'
> > has been validated to be supported, replacing the mode with sth
> > else would need re-validation of the combination.  I'm not sure
> > we can for example just query movmisalign support here and
> > hope for LRA to reload the mem with that.
> >
> > So - where do you run into this?  Is it possible to catch the
> > situation on a higher level where more context as in the whole insn
> > is visible?
>
> I ran into it with patch 2 of this series when calling gen_highpart on a
> misaligned mem.  IIRC gen_highpart would end up returning (subreg:SI
> (mem:DI (addr [A8])) 4), while gen_lowpart would simplify the operation
> to (mem:SI (addr [A8])) as expected.
>
> (subreg:SI (mem:DI (addr [A8])) 4) is really problematic, because it's
> not a memory_operand (from the manual: it will get reloaded into a
> register later on).  But that's no good here, I don't want this
> reloading into a wide register later, I need it to be narrowed to the
> component part now.

So maybe calling gen_highpart is not what you want then?
adjust_address is IIRC what one uses to offset a MEM and change
its mode.

Richard.

>
> R.
>
> >
> > Thanks,
> > Richard.
> >
> >>
> >> gcc/ChangeLog:
> >>
> >>          PR target/102125
> >>          * simplify-rtx.c (simplify_context::simplify_subreg): Allow
> >>          simplifying (subreg (mem())) when the inner mem is already
> >>          misaligned for its type.
> >> ---
> >>   gcc/simplify-rtx.c | 6 +++++-
> >>   1 file changed, 5 insertions(+), 1 deletion(-)
> >>

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 1/3] rtl: allow forming subregs of already unaligned mems [PR102125]
  2021-09-06 11:13       ` Richard Biener
@ 2021-09-06 11:23         ` Richard Earnshaw
  2021-09-06 12:01           ` Richard Biener
  0 siblings, 1 reply; 10+ messages in thread
From: Richard Earnshaw @ 2021-09-06 11:23 UTC (permalink / raw)
  To: Richard Biener; +Cc: Richard Earnshaw, Bernd Edlinger, GCC Patches



On 06/09/2021 12:13, Richard Biener wrote:
> On Mon, Sep 6, 2021 at 1:08 PM Richard Earnshaw
> <Richard.Earnshaw@foss.arm.com> wrote:
>>
>>
>>
>> On 06/09/2021 11:58, Richard Biener via Gcc-patches wrote:
>>> On Mon, Sep 6, 2021 at 12:40 PM Richard Earnshaw <rearnsha@arm.com> wrote:
>>>>
>>>>
>>>> GCC was recently changed to prevent simplify_subreg from simplifying
>>>> a subreg of a mem when the mode of the new mem would have stricter alignment
>>>> constraints than the inner mem already has when the target requires
>>>> STRICT_ALIGNMENT.
>>>>
>>>> However, such targets may have specialist patterns that can handle
>>>> unaligned accesses and this restriction turns out to be unduly restrictive.
>>>> So limit this restriction to only apply when the inner mem is naturally
>>>> aligned to the inner mode.
>>>
>>> Hmm, I think this can end up either generating wrong code or
>>> recog fails.  The specific combination of alignment and mode of 'op'
>>> has been validated to be supported, replacing the mode with sth
>>> else would need re-validation of the combination.  I'm not sure
>>> we can for example just query movmisalign support here and
>>> hope for LRA to reload the mem with that.
>>>
>>> So - where do you run into this?  Is it possible to catch the
>>> situation on a higher level where more context as in the whole insn
>>> is visible?
>>
>> I ran into it with patch 2 of this series when calling gen_highpart on a
>> misaligned mem.  IIRC gen_highpart would end up returning (subreg:SI
>> (mem:DI (addr [A8])) 4), while gen_lowpart would simplify the operation
>> to (mem:SI (addr [A8])) as expected.
>>
>> (subreg:SI (mem:DI (addr [A8])) 4) is really problematic, because it's
>> not a memory_operand (from the manual: it will get reloaded into a
>> register later on).  But that's no good here, I don't want this
>> reloading into a wide register later, I need it to be narrowed to the
>> component part now.
> 
> So maybe calling gen_highpart is not what you want then?
> adjust_address is IIRC what one uses to offset a MEM and change
> its mode.

It was based on looking at the patch for PR 100106 
(r12-163-c33db31d9ad9).  That patch added the MEM_ALIGN constraint when 
previously there was none here and my call would have been simplified. 
Are you saying that GCC was always wrong in this respect?  All I've done 
was to tightened the check that Bernd added.

R.

> 
> Richard.
> 
>>
>> R.
>>
>>>
>>> Thanks,
>>> Richard.
>>>
>>>>
>>>> gcc/ChangeLog:
>>>>
>>>>           PR target/102125
>>>>           * simplify-rtx.c (simplify_context::simplify_subreg): Allow
>>>>           simplifying (subreg (mem())) when the inner mem is already
>>>>           misaligned for its type.
>>>> ---
>>>>    gcc/simplify-rtx.c | 6 +++++-
>>>>    1 file changed, 5 insertions(+), 1 deletion(-)
>>>>

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 1/3] rtl: allow forming subregs of already unaligned mems [PR102125]
  2021-09-06 11:23         ` Richard Earnshaw
@ 2021-09-06 12:01           ` Richard Biener
  0 siblings, 0 replies; 10+ messages in thread
From: Richard Biener @ 2021-09-06 12:01 UTC (permalink / raw)
  To: Richard Earnshaw; +Cc: Richard Earnshaw, Bernd Edlinger, GCC Patches

On Mon, Sep 6, 2021 at 1:23 PM Richard Earnshaw
<Richard.Earnshaw@foss.arm.com> wrote:
>
>
>
> On 06/09/2021 12:13, Richard Biener wrote:
> > On Mon, Sep 6, 2021 at 1:08 PM Richard Earnshaw
> > <Richard.Earnshaw@foss.arm.com> wrote:
> >>
> >>
> >>
> >> On 06/09/2021 11:58, Richard Biener via Gcc-patches wrote:
> >>> On Mon, Sep 6, 2021 at 12:40 PM Richard Earnshaw <rearnsha@arm.com> wrote:
> >>>>
> >>>>
> >>>> GCC was recently changed to prevent simplify_subreg from simplifying
> >>>> a subreg of a mem when the mode of the new mem would have stricter alignment
> >>>> constraints than the inner mem already has when the target requires
> >>>> STRICT_ALIGNMENT.
> >>>>
> >>>> However, such targets may have specialist patterns that can handle
> >>>> unaligned accesses and this restriction turns out to be unduly restrictive.
> >>>> So limit this restriction to only apply when the inner mem is naturally
> >>>> aligned to the inner mode.
> >>>
> >>> Hmm, I think this can end up either generating wrong code or
> >>> recog fails.  The specific combination of alignment and mode of 'op'
> >>> has been validated to be supported, replacing the mode with sth
> >>> else would need re-validation of the combination.  I'm not sure
> >>> we can for example just query movmisalign support here and
> >>> hope for LRA to reload the mem with that.
> >>>
> >>> So - where do you run into this?  Is it possible to catch the
> >>> situation on a higher level where more context as in the whole insn
> >>> is visible?
> >>
> >> I ran into it with patch 2 of this series when calling gen_highpart on a
> >> misaligned mem.  IIRC gen_highpart would end up returning (subreg:SI
> >> (mem:DI (addr [A8])) 4), while gen_lowpart would simplify the operation
> >> to (mem:SI (addr [A8])) as expected.
> >>
> >> (subreg:SI (mem:DI (addr [A8])) 4) is really problematic, because it's
> >> not a memory_operand (from the manual: it will get reloaded into a
> >> register later on).  But that's no good here, I don't want this
> >> reloading into a wide register later, I need it to be narrowed to the
> >> component part now.
> >
> > So maybe calling gen_highpart is not what you want then?
> > adjust_address is IIRC what one uses to offset a MEM and change
> > its mode.
>
> It was based on looking at the patch for PR 100106
> (r12-163-c33db31d9ad9).  That patch added the MEM_ALIGN constraint when
> previously there was none here and my call would have been simplified.
> Are you saying that GCC was always wrong in this respect?

Yes.

> All I've done was to tightened the check that Bernd added.

The check was supposed to verify that the generated MEM is
aligned according to its mode.  If you'd have a misaligned
valid DImode according to movmisalign the target might not
have a move insn to move a misaligned SImode so tightening
the check to more closely match the originally motivating testcase
isn't correct IMHO.

Richard.

>
> R.
>
> >
> > Richard.
> >
> >>
> >> R.
> >>
> >>>
> >>> Thanks,
> >>> Richard.
> >>>
> >>>>
> >>>> gcc/ChangeLog:
> >>>>
> >>>>           PR target/102125
> >>>>           * simplify-rtx.c (simplify_context::simplify_subreg): Allow
> >>>>           simplifying (subreg (mem())) when the inner mem is already
> >>>>           misaligned for its type.
> >>>> ---
> >>>>    gcc/simplify-rtx.c | 6 +++++-
> >>>>    1 file changed, 5 insertions(+), 1 deletion(-)
> >>>>

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH 2/3] arm: expand handling of movmisalign for DImode [PR102125]
  2021-09-06 10:40 [PATCH 0/3] lower more cases of memcpy [PR102125] Richard Earnshaw
  2021-09-06 10:40 ` [PATCH 1/3] rtl: allow forming subregs of already unaligned mems [PR102125] Richard Earnshaw
@ 2021-09-06 10:40 ` Richard Earnshaw
  2021-09-06 10:40 ` [PATCH 3/3] gimple: allow more folding of memcpy [PR102125] Richard Earnshaw
  2 siblings, 0 replies; 10+ messages in thread
From: Richard Earnshaw @ 2021-09-06 10:40 UTC (permalink / raw)
  To: gcc-patches; +Cc: Richard Earnshaw

[-- Attachment #1: Type: text/plain, Size: 483 bytes --]


DImode is currently handled only for machines with vector modes
enabled, but this is unduly restrictive and is generally better done
in core registers.

gcc/ChangeLog:

	PR target/102125
	* config/arm/arm.md (movmisaligndi): New define_expand.
	* config/arm/vec-common.md (movmisalign<mode>): Iterate over VDQ mode.
---
 gcc/config/arm/arm.md        | 34 ++++++++++++++++++++++++++++++++++
 gcc/config/arm/vec-common.md |  4 ++--
 2 files changed, 36 insertions(+), 2 deletions(-)


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: 0002-arm-expand-handling-of-movmisalign-for-DImode-PR1021.patch --]
[-- Type: text/x-patch; name="0002-arm-expand-handling-of-movmisalign-for-DImode-PR1021.patch", Size: 2165 bytes --]

diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index 5d3f21b91c4..f66ed7da012 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -12617,6 +12617,40 @@ (define_expand "copysigndf3"
   }"
 )
 
+;; movmisalign for DImode
+(define_expand "movmisaligndi"
+  [(match_operand:DI 0 "general_operand")
+   (match_operand:DI 1 "general_operand")]
+  "unaligned_access"
+{
+  /* Avoid breaking up an aligned load or store, this avoids problems if
+     that operand might be volatile.  */
+  if (MEM_P (operands[0])
+      && MEM_ALIGN (operands[0]) >= GET_MODE_ALIGNMENT (DImode))
+    {
+      rtx tmp = gen_reg_rtx (DImode);
+      emit_insn (gen_movmisaligndi (tmp, operands[1]));
+      emit_insn (gen_movdi (operands[0], tmp));
+      DONE;
+    }
+  else if (MEM_P (operands[1])
+      && MEM_ALIGN (operands[1]) >= GET_MODE_ALIGNMENT (DImode))
+    {
+      rtx tmp = gen_reg_rtx (DImode);
+      emit_insn (gen_movdi (tmp, operands[1]));
+      operands[1] = tmp;
+    }
+
+  rtx lo_op0 = gen_lowpart (SImode, operands[0]);
+  rtx lo_op1 = gen_lowpart (SImode, operands[1]);
+  rtx hi_op0 = gen_highpart_mode (SImode, DImode, operands[0]);
+  rtx hi_op1 = gen_highpart_mode (SImode, DImode, operands[1]);
+
+  emit_insn (gen_movmisalignsi (lo_op0, lo_op1));
+  emit_insn (gen_movmisalignsi (hi_op0, hi_op1));
+  DONE;
+})
+
 ;; movmisalign patterns for HImode and SImode.
 (define_expand "movmisalign<mode>"
   [(match_operand:HSI 0 "general_operand")
diff --git a/gcc/config/arm/vec-common.md b/gcc/config/arm/vec-common.md
index 68de4f0f943..e71d9b3811f 100644
--- a/gcc/config/arm/vec-common.md
+++ b/gcc/config/arm/vec-common.md
@@ -281,8 +281,8 @@ (define_expand "cml<fcmac1><conj_op><mode>4"
 })
 
 (define_expand "movmisalign<mode>"
- [(set (match_operand:VDQX 0 "neon_perm_struct_or_reg_operand")
-	(unspec:VDQX [(match_operand:VDQX 1 "neon_perm_struct_or_reg_operand")]
+ [(set (match_operand:VDQ 0 "neon_perm_struct_or_reg_operand")
+	(unspec:VDQ [(match_operand:VDQ 1 "neon_perm_struct_or_reg_operand")]
 	 UNSPEC_MISALIGNED_ACCESS))]
  "ARM_HAVE_<MODE>_LDST && !BYTES_BIG_ENDIAN
   && unaligned_access && !TARGET_REALLY_IWMMXT"

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH 3/3] gimple: allow more folding of memcpy [PR102125]
  2021-09-06 10:40 [PATCH 0/3] lower more cases of memcpy [PR102125] Richard Earnshaw
  2021-09-06 10:40 ` [PATCH 1/3] rtl: allow forming subregs of already unaligned mems [PR102125] Richard Earnshaw
  2021-09-06 10:40 ` [PATCH 2/3] arm: expand handling of movmisalign for DImode [PR102125] Richard Earnshaw
@ 2021-09-06 10:40 ` Richard Earnshaw
  2021-09-06 10:51   ` Richard Biener
  2 siblings, 1 reply; 10+ messages in thread
From: Richard Earnshaw @ 2021-09-06 10:40 UTC (permalink / raw)
  To: gcc-patches; +Cc: Richard Earnshaw, richard.guenther

[-- Attachment #1: Type: text/plain, Size: 1292 bytes --]


The current restriction on folding memcpy to a single element of size
MOVE_MAX is excessively cautious on most machines and limits some
significant further optimizations.  So relax the restriction provided
the copy size does not exceed MOVE_MAX * MOVE_RATIO and that a SET
insn exists for moving the value into machine registers.

Note that there were already checks in place for having misaligned
move operations when one or more of the operands were unaligned.

On Arm this now permits optimizing

uint64_t bar64(const uint8_t *rData1)
{
    uint64_t buffer;
    memcpy(&buffer, rData1, sizeof(buffer));
    return buffer;
}

from
        ldr     r2, [r0]        @ unaligned
        sub     sp, sp, #8
        ldr     r3, [r0, #4]    @ unaligned
        strd    r2, [sp]
        ldrd    r0, [sp]
        add     sp, sp, #8

to
        mov     r3, r0
        ldr     r0, [r0]        @ unaligned
        ldr     r1, [r3, #4]    @ unaligned

PR target/102125 - (ARM Cortex-M3 and newer) missed optimization. memcpy not needed operations

gcc/ChangeLog:

	PR target/102125
	* gimple-fold.c (gimple_fold_builtin_memory_op): Allow folding
	memcpy if the size is not more than MOVE_MAX * MOVE_RATIO.
---
 gcc/gimple-fold.c | 16 +++++++++++-----
 1 file changed, 11 insertions(+), 5 deletions(-)


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: 0003-gimple-allow-more-folding-of-memcpy-PR102125.patch --]
[-- Type: text/x-patch; name="0003-gimple-allow-more-folding-of-memcpy-PR102125.patch", Size: 2038 bytes --]

diff --git a/gcc/gimple-fold.c b/gcc/gimple-fold.c
index 3f2c176cff6..d9ffb5006f5 100644
--- a/gcc/gimple-fold.c
+++ b/gcc/gimple-fold.c
@@ -67,6 +67,8 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-vector-builder.h"
 #include "tree-ssa-strlen.h"
 #include "varasm.h"
+#include "memmodel.h"
+#include "optabs.h"
 
 enum strlen_range_kind {
   /* Compute the exact constant string length.  */
@@ -957,14 +959,17 @@ gimple_fold_builtin_memory_op (gimple_stmt_iterator *gsi,
 	= build_int_cst (build_pointer_type_for_mode (char_type_node,
 						      ptr_mode, true), 0);
 
-      /* If we can perform the copy efficiently with first doing all loads
-         and then all stores inline it that way.  Currently efficiently
-	 means that we can load all the memory into a single integer
-	 register which is what MOVE_MAX gives us.  */
+      /* If we can perform the copy efficiently with first doing all loads and
+	 then all stores inline it that way.  Currently efficiently means that
+	 we can load all the memory with a single set operation and that the
+	 total size is less than MOVE_MAX * MOVE_RATIO.  */
       src_align = get_pointer_alignment (src);
       dest_align = get_pointer_alignment (dest);
       if (tree_fits_uhwi_p (len)
-	  && compare_tree_int (len, MOVE_MAX) <= 0
+	  && (compare_tree_int
+	      (len, (MOVE_MAX
+		     * MOVE_RATIO (optimize_function_for_size_p (cfun))))
+	      <= 0)
 	  /* FIXME: Don't transform copies from strings with known length.
 	     Until GCC 9 this prevented a case in gcc.dg/strlenopt-8.c
 	     from being handled, and the case was XFAILed for that reason.
@@ -1000,6 +1005,7 @@ gimple_fold_builtin_memory_op (gimple_stmt_iterator *gsi,
 	      if (type
 		  && is_a <scalar_int_mode> (TYPE_MODE (type), &mode)
 		  && GET_MODE_SIZE (mode) * BITS_PER_UNIT == ilen * 8
+		  && have_insn_for (SET, mode)
 		  /* If the destination pointer is not aligned we must be able
 		     to emit an unaligned store.  */
 		  && (dest_align >= GET_MODE_ALIGNMENT (mode)

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 3/3] gimple: allow more folding of memcpy [PR102125]
  2021-09-06 10:40 ` [PATCH 3/3] gimple: allow more folding of memcpy [PR102125] Richard Earnshaw
@ 2021-09-06 10:51   ` Richard Biener
  0 siblings, 0 replies; 10+ messages in thread
From: Richard Biener @ 2021-09-06 10:51 UTC (permalink / raw)
  To: Richard Earnshaw; +Cc: GCC Patches

On Mon, Sep 6, 2021 at 12:40 PM Richard Earnshaw <rearnsha@arm.com> wrote:
>
>
> The current restriction on folding memcpy to a single element of size
> MOVE_MAX is excessively cautious on most machines and limits some
> significant further optimizations.  So relax the restriction provided
> the copy size does not exceed MOVE_MAX * MOVE_RATIO and that a SET
> insn exists for moving the value into machine registers.
>
> Note that there were already checks in place for having misaligned
> move operations when one or more of the operands were unaligned.
>
> On Arm this now permits optimizing
>
> uint64_t bar64(const uint8_t *rData1)
> {
>     uint64_t buffer;
>     memcpy(&buffer, rData1, sizeof(buffer));
>     return buffer;
> }
>
> from
>         ldr     r2, [r0]        @ unaligned
>         sub     sp, sp, #8
>         ldr     r3, [r0, #4]    @ unaligned
>         strd    r2, [sp]
>         ldrd    r0, [sp]
>         add     sp, sp, #8
>
> to
>         mov     r3, r0
>         ldr     r0, [r0]        @ unaligned
>         ldr     r1, [r3, #4]    @ unaligned
>
> PR target/102125 - (ARM Cortex-M3 and newer) missed optimization. memcpy not needed operations

OK.

Thanks,
Richard.

> gcc/ChangeLog:
>
>         PR target/102125
>         * gimple-fold.c (gimple_fold_builtin_memory_op): Allow folding
>         memcpy if the size is not more than MOVE_MAX * MOVE_RATIO.
> ---
>  gcc/gimple-fold.c | 16 +++++++++++-----
>  1 file changed, 11 insertions(+), 5 deletions(-)
>

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2021-09-06 12:02 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-09-06 10:40 [PATCH 0/3] lower more cases of memcpy [PR102125] Richard Earnshaw
2021-09-06 10:40 ` [PATCH 1/3] rtl: allow forming subregs of already unaligned mems [PR102125] Richard Earnshaw
2021-09-06 10:58   ` Richard Biener
2021-09-06 11:08     ` Richard Earnshaw
2021-09-06 11:13       ` Richard Biener
2021-09-06 11:23         ` Richard Earnshaw
2021-09-06 12:01           ` Richard Biener
2021-09-06 10:40 ` [PATCH 2/3] arm: expand handling of movmisalign for DImode [PR102125] Richard Earnshaw
2021-09-06 10:40 ` [PATCH 3/3] gimple: allow more folding of memcpy [PR102125] Richard Earnshaw
2021-09-06 10:51   ` Richard Biener

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).