public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [PATCH v2] ARM: thumb1: Use LDMIA/STMIA for DI/DF loads/stores
@ 2024-06-18 18:14 Siarhei Volkau
  2024-06-19 12:19 ` Richard Earnshaw (lists)
  0 siblings, 1 reply; 4+ messages in thread
From: Siarhei Volkau @ 2024-06-18 18:14 UTC (permalink / raw)
  To: gcc-patches; +Cc: Siarhei Volkau

If the address register is dead after load/store operation it looks
beneficial to use LDMIA/STMIA instead of pair of LDR/STR instructions,
at least if optimizing for size.

Changes v1 -> v2:
 - switching to peephole2 approach
 - added test case

gcc/ChangeLog:

        * config/arm/thumb1.md (peephole2 to rewrite DI/DF load): New.
        (peephole2 to rewrite DI/DF store): New.
        (thumb1_movdi_insn): Handle overlapped regs ldmia case.
        (thumb_movdf_insn): Likewise.

	* config/arm/iterators.md (DIDF): New.

gcc/testsuite:

        * gcc.target/arm/thumb1-load-store-64bit.c: Add new test.

Signed-off-by: Siarhei Volkau <lis8215@gmail.com>
---
 gcc/config/arm/iterators.md                   |  3 +++
 gcc/config/arm/thumb1.md                      | 27 ++++++++++++++++++-
 .../gcc.target/arm/thumb1-load-store-64bit.c  | 16 +++++++++++
 3 files changed, 45 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/arm/thumb1-load-store-64bit.c

diff --git a/gcc/config/arm/iterators.md b/gcc/config/arm/iterators.md
index 8d066fcf05d..09046bff83b 100644
--- a/gcc/config/arm/iterators.md
+++ b/gcc/config/arm/iterators.md
@@ -50,6 +50,9 @@ (define_mode_iterator QHSD [QI HI SI DI])
 ;; A list of the 32bit and 64bit integer modes
 (define_mode_iterator SIDI [SI DI])
 
+;; A list of the 64bit modes for thumb1.
+(define_mode_iterator DIDF [DI DF])
+
 ;; A list of atomic compare and swap success return modes
 (define_mode_iterator CCSI [(CC_Z "TARGET_32BIT") (SI "TARGET_THUMB1")])
 
diff --git a/gcc/config/arm/thumb1.md b/gcc/config/arm/thumb1.md
index d7074b43f60..ed4b706773a 100644
--- a/gcc/config/arm/thumb1.md
+++ b/gcc/config/arm/thumb1.md
@@ -633,6 +633,8 @@ (define_insn "*thumb1_movdi_insn"
       gcc_assert (TARGET_HAVE_MOVT);
       return \"movw\\t%Q0, %L1\;movs\\tR0, #0\";
     case 4:
+      if (reg_overlap_mentioned_p (operands[0], operands[1]))
+	return \"ldmia\\t%m1, {%0, %H0}\";
       return \"ldmia\\t%1, {%0, %H0}\";
     case 5:
       return \"stmia\\t%0, {%1, %H1}\";
@@ -966,6 +968,8 @@ (define_insn "*thumb_movdf_insn"
 	return \"adds\\t%0, %1, #0\;adds\\t%H0, %H1, #0\";
       return \"adds\\t%H0, %H1, #0\;adds\\t%0, %1, #0\";
     case 1:
+      if (reg_overlap_mentioned_p (operands[0], operands[1]))
+	return \"ldmia\\t%m1, {%0, %H0}\";
       return \"ldmia\\t%1, {%0, %H0}\";
     case 2:
       return \"stmia\\t%0, {%1, %H1}\";
@@ -2055,4 +2059,25 @@ (define_insn "thumb1_stack_protect_test_insn"
    (set_attr "conds" "clob")
    (set_attr "type" "multiple")]
 )
-\f
+
+;; match patterns usable by ldmia/stmia
+(define_peephole2
+  [(set (match_operand:DIDF 0 "low_register_operand" "")
+	(mem:DIDF (match_operand:SI 1 "low_register_operand")))]
+  "TARGET_THUMB1
+   && (peep2_reg_dead_p (1, operands[1])
+       || REGNO (operands[0]) + 1 == REGNO (operands[1]))"
+  [(set (match_dup 0)
+	(mem:DIDF (post_inc:SI (match_dup 1))))]
+  ""
+)
+
+(define_peephole2
+  [(set (mem:DIDF (match_operand:SI 1 "low_register_operand"))
+	(match_operand:DIDF 0 "low_register_operand" ""))]
+  "TARGET_THUMB1
+   && peep2_reg_dead_p (1, operands[1])"
+  [(set (mem:DIDF (post_inc:SI (match_dup 1)))
+	(match_dup 0))]
+  ""
+)
diff --git a/gcc/testsuite/gcc.target/arm/thumb1-load-store-64bit.c b/gcc/testsuite/gcc.target/arm/thumb1-load-store-64bit.c
new file mode 100644
index 00000000000..167fa9ec876
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/thumb1-load-store-64bit.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-mthumb -Os" }  */
+/* { dg-require-effective-target arm_thumb1_ok } */
+
+void copy_df(double *dst, const double *src)
+{
+    *dst = *src;
+}
+
+void copy_di(unsigned long long *dst, const unsigned long long *src)
+{
+    *dst = *src;
+}
+
+/* { dg-final { scan-assembler-times "ldmia\tr\[0-7\]" 2 } } */
+/* { dg-final { scan-assembler-times "stmia\tr\[0-7\]!" 2 } } */
-- 
2.45.2


^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH v2] ARM: thumb1: Use LDMIA/STMIA for DI/DF loads/stores
  2024-06-18 18:14 [PATCH v2] ARM: thumb1: Use LDMIA/STMIA for DI/DF loads/stores Siarhei Volkau
@ 2024-06-19 12:19 ` Richard Earnshaw (lists)
  2024-06-19 15:11   ` Siarhei Volkau
  0 siblings, 1 reply; 4+ messages in thread
From: Richard Earnshaw (lists) @ 2024-06-19 12:19 UTC (permalink / raw)
  To: Siarhei Volkau, gcc-patches

On 18/06/2024 19:14, Siarhei Volkau wrote:
> If the address register is dead after load/store operation it looks
> beneficial to use LDMIA/STMIA instead of pair of LDR/STR instructions,
> at least if optimizing for size.
> 
> Changes v1 -> v2:
>  - switching to peephole2 approach
>  - added test case
> 
> gcc/ChangeLog:
> 
>         * config/arm/thumb1.md (peephole2 to rewrite DI/DF load): New.
>         (peephole2 to rewrite DI/DF store): New.
>         (thumb1_movdi_insn): Handle overlapped regs ldmia case.
>         (thumb_movdf_insn): Likewise.
> 
> 	* config/arm/iterators.md (DIDF): New.
> 
> gcc/testsuite:
> 
>         * gcc.target/arm/thumb1-load-store-64bit.c: Add new test.
> 
> Signed-off-by: Siarhei Volkau <lis8215@gmail.com>
> ---
>  gcc/config/arm/iterators.md                   |  3 +++
>  gcc/config/arm/thumb1.md                      | 27 ++++++++++++++++++-
>  .../gcc.target/arm/thumb1-load-store-64bit.c  | 16 +++++++++++
>  3 files changed, 45 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/gcc.target/arm/thumb1-load-store-64bit.c
> 
> diff --git a/gcc/config/arm/iterators.md b/gcc/config/arm/iterators.md
> index 8d066fcf05d..09046bff83b 100644
> --- a/gcc/config/arm/iterators.md
> +++ b/gcc/config/arm/iterators.md
> @@ -50,6 +50,9 @@ (define_mode_iterator QHSD [QI HI SI DI])
>  ;; A list of the 32bit and 64bit integer modes
>  (define_mode_iterator SIDI [SI DI])
>  
> +;; A list of the 64bit modes for thumb1.
> +(define_mode_iterator DIDF [DI DF])
> +
>  ;; A list of atomic compare and swap success return modes
>  (define_mode_iterator CCSI [(CC_Z "TARGET_32BIT") (SI "TARGET_THUMB1")])
>  
> diff --git a/gcc/config/arm/thumb1.md b/gcc/config/arm/thumb1.md
> index d7074b43f60..ed4b706773a 100644
> --- a/gcc/config/arm/thumb1.md
> +++ b/gcc/config/arm/thumb1.md
> @@ -633,6 +633,8 @@ (define_insn "*thumb1_movdi_insn"
>        gcc_assert (TARGET_HAVE_MOVT);
>        return \"movw\\t%Q0, %L1\;movs\\tR0, #0\";
>      case 4:
> +      if (reg_overlap_mentioned_p (operands[0], operands[1]))
> +	return \"ldmia\\t%m1, {%0, %H0}\";

See below for why I don't think this is a case we need to consider here.

>        return \"ldmia\\t%1, {%0, %H0}\";
>      case 5:
>        return \"stmia\\t%0, {%1, %H1}\";
> @@ -966,6 +968,8 @@ (define_insn "*thumb_movdf_insn"
>  	return \"adds\\t%0, %1, #0\;adds\\t%H0, %H1, #0\";
>        return \"adds\\t%H0, %H1, #0\;adds\\t%0, %1, #0\";
>      case 1:
> +      if (reg_overlap_mentioned_p (operands[0], operands[1]))
> +	return \"ldmia\\t%m1, {%0, %H0}\";
>        return \"ldmia\\t%1, {%0, %H0}\";
>      case 2:
>        return \"stmia\\t%0, {%1, %H1}\";
> @@ -2055,4 +2059,25 @@ (define_insn "thumb1_stack_protect_test_insn"
>     (set_attr "conds" "clob")
>     (set_attr "type" "multiple")]
>  )
> -\f
> +
> +;; match patterns usable by ldmia/stmia
> +(define_peephole2
> +  [(set (match_operand:DIDF 0 "low_register_operand" "")
> +	(mem:DIDF (match_operand:SI 1 "low_register_operand")))]
> +  "TARGET_THUMB1
> +   && (peep2_reg_dead_p (1, operands[1])
> +       || REGNO (operands[0]) + 1 == REGNO (operands[1]))"

I don't understand this second condition (partial overlap of the base address with the value loaded), what are you guarding against here?  The instruction specification says that there is no writeback if the base register has any overlap with any of the loaded registers, so really we should reject the peephole if that's true (we'd have invalid RTL as well as there would end up being two writes to the same register).

> +  [(set (match_dup 0)
> +	(mem:DIDF (post_inc:SI (match_dup 1))))]
> +  ""

This is not enough, unfortunately.  MEM() objects carry attributes about the memory accessed (alias sets, known alignment, etc) and these will not be propagated correctly if you rewrite the pattern this way.  The correct solution is to match the entire mem as operand1, then use change_address to rewrite that.  Something like:

     operands[1] = change_address (operands[1], VOIDmode,
				   gen_rtx_POST_INC (SImode,
						     XEXP (operands[1], 0)));

> +)
> +
> +(define_peephole2
> +  [(set (mem:DIDF (match_operand:SI 1 "low_register_operand"))
> +	(match_operand:DIDF 0 "low_register_operand" ""))]
> +  "TARGET_THUMB1
> +   && peep2_reg_dead_p (1, operands[1])"
> +  [(set (mem:DIDF (post_inc:SI (match_dup 1)))
> +	(match_dup 0))]

There's a similar address rewriting issue here as well, but we also have to guard against the (unlikely) case where the base register is part of the value stored if it isn't the lowest numbered register in the list.  In that case the value stored would be undefined.  That is:

  stmia r0!, {r0, r1}

is ok, but

  stmia r1!, {r0, r1}

is not.

> +  ""
> +)
> diff --git a/gcc/testsuite/gcc.target/arm/thumb1-load-store-64bit.c b/gcc/testsuite/gcc.target/arm/thumb1-load-store-64bit.c
> new file mode 100644
> index 00000000000..167fa9ec876
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/arm/thumb1-load-store-64bit.c
> @@ -0,0 +1,16 @@
> +/* { dg-do compile } */
> +/* { dg-options "-mthumb -Os" }  */
> +/* { dg-require-effective-target arm_thumb1_ok } */
> +
> +void copy_df(double *dst, const double *src)
> +{
> +    *dst = *src;
> +}
> +
> +void copy_di(unsigned long long *dst, const unsigned long long *src)
> +{
> +    *dst = *src;
> +}
> +
> +/* { dg-final { scan-assembler-times "ldmia\tr\[0-7\]" 2 } } */
> +/* { dg-final { scan-assembler-times "stmia\tr\[0-7\]!" 2 } } */

R.


^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH v2] ARM: thumb1: Use LDMIA/STMIA for DI/DF loads/stores
  2024-06-19 12:19 ` Richard Earnshaw (lists)
@ 2024-06-19 15:11   ` Siarhei Volkau
  2024-06-19 15:32     ` Richard Earnshaw (lists)
  0 siblings, 1 reply; 4+ messages in thread
From: Siarhei Volkau @ 2024-06-19 15:11 UTC (permalink / raw)
  To: Richard Earnshaw (lists); +Cc: gcc-patches

ср, 19 июн. 2024 г. в 15:19, Richard Earnshaw (lists)
<Richard.Earnshaw@arm.com>:
>
> On 18/06/2024 19:14, Siarhei Volkau wrote:
> > If the address register is dead after load/store operation it looks
> > beneficial to use LDMIA/STMIA instead of pair of LDR/STR instructions,
> > at least if optimizing for size.
> >
> > Changes v1 -> v2:
> >  - switching to peephole2 approach
> >  - added test case
> >
> > gcc/ChangeLog:
> >
> >         * config/arm/thumb1.md (peephole2 to rewrite DI/DF load): New.
> >         (peephole2 to rewrite DI/DF store): New.
> >         (thumb1_movdi_insn): Handle overlapped regs ldmia case.
> >         (thumb_movdf_insn): Likewise.
> >
> >       * config/arm/iterators.md (DIDF): New.
> >
> > gcc/testsuite:
> >
> >         * gcc.target/arm/thumb1-load-store-64bit.c: Add new test.
> >
> > Signed-off-by: Siarhei Volkau <lis8215@gmail.com>
> > ---
> >  gcc/config/arm/iterators.md                   |  3 +++
> >  gcc/config/arm/thumb1.md                      | 27 ++++++++++++++++++-
> >  .../gcc.target/arm/thumb1-load-store-64bit.c  | 16 +++++++++++
> >  3 files changed, 45 insertions(+), 1 deletion(-)
> >  create mode 100644 gcc/testsuite/gcc.target/arm/thumb1-load-store-64bit.c
> >
> > diff --git a/gcc/config/arm/iterators.md b/gcc/config/arm/iterators.md
> > index 8d066fcf05d..09046bff83b 100644
> > --- a/gcc/config/arm/iterators.md
> > +++ b/gcc/config/arm/iterators.md
> > @@ -50,6 +50,9 @@ (define_mode_iterator QHSD [QI HI SI DI])
> >  ;; A list of the 32bit and 64bit integer modes
> >  (define_mode_iterator SIDI [SI DI])
> >
> > +;; A list of the 64bit modes for thumb1.
> > +(define_mode_iterator DIDF [DI DF])
> > +
> >  ;; A list of atomic compare and swap success return modes
> >  (define_mode_iterator CCSI [(CC_Z "TARGET_32BIT") (SI "TARGET_THUMB1")])
> >
> > diff --git a/gcc/config/arm/thumb1.md b/gcc/config/arm/thumb1.md
> > index d7074b43f60..ed4b706773a 100644
> > --- a/gcc/config/arm/thumb1.md
> > +++ b/gcc/config/arm/thumb1.md
> > @@ -633,6 +633,8 @@ (define_insn "*thumb1_movdi_insn"
> >        gcc_assert (TARGET_HAVE_MOVT);
> >        return \"movw\\t%Q0, %L1\;movs\\tR0, #0\";
> >      case 4:
> > +      if (reg_overlap_mentioned_p (operands[0], operands[1]))
> > +     return \"ldmia\\t%m1, {%0, %H0}\";
>
> See below for why I don't think this is a case we need to consider here.
>
> >        return \"ldmia\\t%1, {%0, %H0}\";
> >      case 5:
> >        return \"stmia\\t%0, {%1, %H1}\";
> > @@ -966,6 +968,8 @@ (define_insn "*thumb_movdf_insn"
> >       return \"adds\\t%0, %1, #0\;adds\\t%H0, %H1, #0\";
> >        return \"adds\\t%H0, %H1, #0\;adds\\t%0, %1, #0\";
> >      case 1:
> > +      if (reg_overlap_mentioned_p (operands[0], operands[1]))
> > +     return \"ldmia\\t%m1, {%0, %H0}\";
> >        return \"ldmia\\t%1, {%0, %H0}\";
> >      case 2:
> >        return \"stmia\\t%0, {%1, %H1}\";
> > @@ -2055,4 +2059,25 @@ (define_insn "thumb1_stack_protect_test_insn"
> >     (set_attr "conds" "clob")
> >     (set_attr "type" "multiple")]
> >  )
> > -
> > +
> > +;; match patterns usable by ldmia/stmia
> > +(define_peephole2
> > +  [(set (match_operand:DIDF 0 "low_register_operand" "")
> > +     (mem:DIDF (match_operand:SI 1 "low_register_operand")))]
> > +  "TARGET_THUMB1
> > +   && (peep2_reg_dead_p (1, operands[1])
> > +       || REGNO (operands[0]) + 1 == REGNO (operands[1]))"
>
> I don't understand this second condition (partial overlap of the base address with the value loaded), what are you guarding against here?  The instruction specification says that there is no writeback if the base register has any overlap with any of the loaded registers, so really we should reject the peephole if that's true (we'd have invalid RTL as well as there would end up being two writes to the same register).
>

The first condition here is for ldmia cases with writeback, the second
condition is for cases without writeback. I falsely thought that base
register has to be the last register in the list similarly like for
stmia it has to be first.
So, I have to reject the no-writeback case there because it emits
incorrect RTL, this is clear now.

However, to not miss possible optimization, no-writeback cases check
might be moved to instruction itself as it was done in v1 patch
(incorrect instruction length will appear again then).
Any objections on that?

> > +  [(set (match_dup 0)
> > +     (mem:DIDF (post_inc:SI (match_dup 1))))]
> > +  ""
>
> This is not enough, unfortunately.  MEM() objects carry attributes about the memory accessed (alias sets, known alignment, etc) and these will not be propagated correctly if you rewrite the pattern this way.  The correct solution is to match the entire mem as operand1, then use change_address to rewrite that.  Something like:
>
>      operands[1] = change_address (operands[1], VOIDmode,
>                                    gen_rtx_POST_INC (SImode,
>                                                      XEXP (operands[1], 0)));
>

Got it, will rewrite, thanks.

> > +)
> > +
> > +(define_peephole2
> > +  [(set (mem:DIDF (match_operand:SI 1 "low_register_operand"))
> > +     (match_operand:DIDF 0 "low_register_operand" ""))]
> > +  "TARGET_THUMB1
> > +   && peep2_reg_dead_p (1, operands[1])"
> > +  [(set (mem:DIDF (post_inc:SI (match_dup 1)))
> > +     (match_dup 0))]
>
> There's a similar address rewriting issue here as well, but we also have to guard against the (unlikely) case where the base register is part of the value stored if it isn't the lowest numbered register in the list.  In that case the value stored would be undefined.  That is:
>
>   stmia r0!, {r0, r1}
>
> is ok, but
>
>   stmia r1!, {r0, r1}
>
> is not.
>

I don't think that base register will appear also in the list here,
because it can't be Pmode and DI/DFmode at the same time, can it?

> > +  ""
> > +)
> > diff --git a/gcc/testsuite/gcc.target/arm/thumb1-load-store-64bit.c b/gcc/testsuite/gcc.target/arm/thumb1-load-store-64bit.c
> > new file mode 100644
> > index 00000000000..167fa9ec876
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/arm/thumb1-load-store-64bit.c
> > @@ -0,0 +1,16 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-mthumb -Os" }  */
> > +/* { dg-require-effective-target arm_thumb1_ok } */
> > +
> > +void copy_df(double *dst, const double *src)
> > +{
> > +    *dst = *src;
> > +}
> > +
> > +void copy_di(unsigned long long *dst, const unsigned long long *src)
> > +{
> > +    *dst = *src;
> > +}
> > +
> > +/* { dg-final { scan-assembler-times "ldmia\tr\[0-7\]" 2 } } */
> > +/* { dg-final { scan-assembler-times "stmia\tr\[0-7\]!" 2 } } */
>
> R.
>

BR,
Siarhei

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH v2] ARM: thumb1: Use LDMIA/STMIA for DI/DF loads/stores
  2024-06-19 15:11   ` Siarhei Volkau
@ 2024-06-19 15:32     ` Richard Earnshaw (lists)
  0 siblings, 0 replies; 4+ messages in thread
From: Richard Earnshaw (lists) @ 2024-06-19 15:32 UTC (permalink / raw)
  To: Siarhei Volkau; +Cc: gcc-patches

On 19/06/2024 16:11, Siarhei Volkau wrote:
> ср, 19 июн. 2024 г. в 15:19, Richard Earnshaw (lists)
> <Richard.Earnshaw@arm.com>:
>>
>> On 18/06/2024 19:14, Siarhei Volkau wrote:
>>> If the address register is dead after load/store operation it looks
>>> beneficial to use LDMIA/STMIA instead of pair of LDR/STR instructions,
>>> at least if optimizing for size.
>>>
>>> Changes v1 -> v2:
>>>  - switching to peephole2 approach
>>>  - added test case
>>>
>>> gcc/ChangeLog:
>>>
>>>         * config/arm/thumb1.md (peephole2 to rewrite DI/DF load): New.
>>>         (peephole2 to rewrite DI/DF store): New.
>>>         (thumb1_movdi_insn): Handle overlapped regs ldmia case.
>>>         (thumb_movdf_insn): Likewise.
>>>
>>>       * config/arm/iterators.md (DIDF): New.
>>>
>>> gcc/testsuite:
>>>
>>>         * gcc.target/arm/thumb1-load-store-64bit.c: Add new test.
>>>
>>> Signed-off-by: Siarhei Volkau <lis8215@gmail.com>
>>> ---
>>>  gcc/config/arm/iterators.md                   |  3 +++
>>>  gcc/config/arm/thumb1.md                      | 27 ++++++++++++++++++-
>>>  .../gcc.target/arm/thumb1-load-store-64bit.c  | 16 +++++++++++
>>>  3 files changed, 45 insertions(+), 1 deletion(-)
>>>  create mode 100644 gcc/testsuite/gcc.target/arm/thumb1-load-store-64bit.c
>>>
>>> diff --git a/gcc/config/arm/iterators.md b/gcc/config/arm/iterators.md
>>> index 8d066fcf05d..09046bff83b 100644
>>> --- a/gcc/config/arm/iterators.md
>>> +++ b/gcc/config/arm/iterators.md
>>> @@ -50,6 +50,9 @@ (define_mode_iterator QHSD [QI HI SI DI])
>>>  ;; A list of the 32bit and 64bit integer modes
>>>  (define_mode_iterator SIDI [SI DI])
>>>
>>> +;; A list of the 64bit modes for thumb1.
>>> +(define_mode_iterator DIDF [DI DF])
>>> +
>>>  ;; A list of atomic compare and swap success return modes
>>>  (define_mode_iterator CCSI [(CC_Z "TARGET_32BIT") (SI "TARGET_THUMB1")])
>>>
>>> diff --git a/gcc/config/arm/thumb1.md b/gcc/config/arm/thumb1.md
>>> index d7074b43f60..ed4b706773a 100644
>>> --- a/gcc/config/arm/thumb1.md
>>> +++ b/gcc/config/arm/thumb1.md
>>> @@ -633,6 +633,8 @@ (define_insn "*thumb1_movdi_insn"
>>>        gcc_assert (TARGET_HAVE_MOVT);
>>>        return \"movw\\t%Q0, %L1\;movs\\tR0, #0\";
>>>      case 4:
>>> +      if (reg_overlap_mentioned_p (operands[0], operands[1]))
>>> +     return \"ldmia\\t%m1, {%0, %H0}\";
>>
>> See below for why I don't think this is a case we need to consider here.
>>
>>>        return \"ldmia\\t%1, {%0, %H0}\";
>>>      case 5:
>>>        return \"stmia\\t%0, {%1, %H1}\";
>>> @@ -966,6 +968,8 @@ (define_insn "*thumb_movdf_insn"
>>>       return \"adds\\t%0, %1, #0\;adds\\t%H0, %H1, #0\";
>>>        return \"adds\\t%H0, %H1, #0\;adds\\t%0, %1, #0\";
>>>      case 1:
>>> +      if (reg_overlap_mentioned_p (operands[0], operands[1]))
>>> +     return \"ldmia\\t%m1, {%0, %H0}\";
>>>        return \"ldmia\\t%1, {%0, %H0}\";
>>>      case 2:
>>>        return \"stmia\\t%0, {%1, %H1}\";
>>> @@ -2055,4 +2059,25 @@ (define_insn "thumb1_stack_protect_test_insn"
>>>     (set_attr "conds" "clob")
>>>     (set_attr "type" "multiple")]
>>>  )
>>> -
>>> +
>>> +;; match patterns usable by ldmia/stmia
>>> +(define_peephole2
>>> +  [(set (match_operand:DIDF 0 "low_register_operand" "")
>>> +     (mem:DIDF (match_operand:SI 1 "low_register_operand")))]
>>> +  "TARGET_THUMB1
>>> +   && (peep2_reg_dead_p (1, operands[1])
>>> +       || REGNO (operands[0]) + 1 == REGNO (operands[1]))"
>>
>> I don't understand this second condition (partial overlap of the base address with the value loaded), what are you guarding against here?  The instruction specification says that there is no writeback if the base register has any overlap with any of the loaded registers, so really we should reject the peephole if that's true (we'd have invalid RTL as well as there would end up being two writes to the same register).
>>
> 
> The first condition here is for ldmia cases with writeback, the second
> condition is for cases without writeback. I falsely thought that base
> register has to be the last register in the list similarly like for
> stmia it has to be first.
> So, I have to reject the no-writeback case there because it emits
> incorrect RTL, this is clear now.
> 
> However, to not miss possible optimization, no-writeback cases check
> might be moved to instruction itself as it was done in v1 patch
> (incorrect instruction length will appear again then).
> Any objections on that?

Yes, I think the no-writeback case will need to be handled inside the code that prints the instruction (like you were doing originally).  We'll just have to forgo the knowledge that this case only needs 2 bytes for the instruction here (it's not a major loss).

> 
>>> +  [(set (match_dup 0)
>>> +     (mem:DIDF (post_inc:SI (match_dup 1))))]
>>> +  ""
>>
>> This is not enough, unfortunately.  MEM() objects carry attributes about the memory accessed (alias sets, known alignment, etc) and these will not be propagated correctly if you rewrite the pattern this way.  The correct solution is to match the entire mem as operand1, then use change_address to rewrite that.  Something like:
>>
>>      operands[1] = change_address (operands[1], VOIDmode,
>>                                    gen_rtx_POST_INC (SImode,
>>                                                      XEXP (operands[1], 0)));
>>
> 
> Got it, will rewrite, thanks.
> 
>>> +)
>>> +
>>> +(define_peephole2
>>> +  [(set (mem:DIDF (match_operand:SI 1 "low_register_operand"))
>>> +     (match_operand:DIDF 0 "low_register_operand" ""))]
>>> +  "TARGET_THUMB1
>>> +   && peep2_reg_dead_p (1, operands[1])"
>>> +  [(set (mem:DIDF (post_inc:SI (match_dup 1)))
>>> +     (match_dup 0))]
>>
>> There's a similar address rewriting issue here as well, but we also have to guard against the (unlikely) case where the base register is part of the value stored if it isn't the lowest numbered register in the list.  In that case the value stored would be undefined.  That is:
>>
>>   stmia r0!, {r0, r1}
>>
>> is ok, but
>>
>>   stmia r1!, {r0, r1}
>>
>> is not.
>>
> 
> I don't think that base register will appear also in the list here,
> because it can't be Pmode and DI/DFmode at the same time, can it?

It's highly unlikely (as I said), but it could theoretically happen with some very weird casts.  Something a bit like

int64_t f;
...
*(int64_t *)f = f;

might trigger it.  This would be OK when compiling little-endian code, as the low bits used for the pointer are in the first transfer value; but incorrect on big-endian since then we'd be using the second transfer register for the pointer.

But compilers are all about weird corner cases :)

R.

> 
>>> +  ""
>>> +)
>>> diff --git a/gcc/testsuite/gcc.target/arm/thumb1-load-store-64bit.c b/gcc/testsuite/gcc.target/arm/thumb1-load-store-64bit.c
>>> new file mode 100644
>>> index 00000000000..167fa9ec876
>>> --- /dev/null
>>> +++ b/gcc/testsuite/gcc.target/arm/thumb1-load-store-64bit.c
>>> @@ -0,0 +1,16 @@
>>> +/* { dg-do compile } */
>>> +/* { dg-options "-mthumb -Os" }  */
>>> +/* { dg-require-effective-target arm_thumb1_ok } */
>>> +
>>> +void copy_df(double *dst, const double *src)
>>> +{
>>> +    *dst = *src;
>>> +}
>>> +
>>> +void copy_di(unsigned long long *dst, const unsigned long long *src)
>>> +{
>>> +    *dst = *src;
>>> +}
>>> +
>>> +/* { dg-final { scan-assembler-times "ldmia\tr\[0-7\]" 2 } } */
>>> +/* { dg-final { scan-assembler-times "stmia\tr\[0-7\]!" 2 } } */
>>
>> R.
>>
> 
> BR,
> Siarhei


^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2024-06-19 15:32 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-06-18 18:14 [PATCH v2] ARM: thumb1: Use LDMIA/STMIA for DI/DF loads/stores Siarhei Volkau
2024-06-19 12:19 ` Richard Earnshaw (lists)
2024-06-19 15:11   ` Siarhei Volkau
2024-06-19 15:32     ` Richard Earnshaw (lists)

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).