* [PATCH v2 1/3] rtl: properly handle subreg (mem) in gen_highpart [PR102125]
2021-09-09 11:09 [PATCH v2 0/3] lower more cases of memcpy [PR102125] Richard Earnshaw
@ 2021-09-09 11:09 ` Richard Earnshaw
2021-09-09 12:23 ` Richard Biener
2021-09-09 11:09 ` [PATCH v2 2/3] arm: expand handling of movmisalign for DImode [PR102125] Richard Earnshaw
2021-09-09 11:09 ` [PATCH v2 3/3] gimple: allow more folding of memcpy [PR102125] Richard Earnshaw
2 siblings, 1 reply; 6+ messages in thread
From: Richard Earnshaw @ 2021-09-09 11:09 UTC (permalink / raw)
To: gcc-patches; +Cc: Richard Earnshaw, richard.guenther
gen_lowpart_general handles forming a SUBREG of a MEM by using
adjust_address to rework and validate a new version of the MEM.
However, gen_highpart does not attempt this and simply returns (SUBREG
(MEM)) if the change is not 'obviously' safe. Improve on that by
using a similar approach so that gen_lowpart and gen_highpart are
mostly symmetrical in this regard.
gcc/ChangeLog:
PR target/102125
* emit-rtl.c (gen_highpart): If simplify_gen_subreg returns
SUBREG (MEM) for a MEM, use adjust_address to produce a new
MEM.
---
gcc/emit-rtl.c | 8 ++++++++
1 file changed, 8 insertions(+)
[-- Attachment #2: v2-0001-rtl-properly-handle-subreg-mem-in-gen_highpart-PR.patch --]
diff --git a/gcc/emit-rtl.c b/gcc/emit-rtl.c
index 77ea8948ee8..bacf6fffa22 100644
--- a/gcc/emit-rtl.c
+++ b/gcc/emit-rtl.c
@@ -1597,6 +1597,14 @@ gen_highpart (machine_mode mode, rtx x)
result = validize_mem (result);
gcc_assert (result);
}
+ /* It may also just put a SUBREG wrapper around a MEM for the same
+ reason. */
+ else if (GET_CODE (result) == SUBREG && MEM_P (SUBREG_REG (result))
+ && MEM_P (x))
+ {
+ poly_int64 offset = subreg_highpart_offset (mode, GET_MODE (x));
+ result = adjust_address (x, mode, offset);
+ }
return result;
}
* Re: [PATCH v2 1/3] rtl: properly handle subreg (mem) in gen_highpart [PR102125]
2021-09-09 11:09 ` [PATCH v2 1/3] rtl: properly handle subreg (mem) in gen_highpart [PR102125] Richard Earnshaw
@ 2021-09-09 12:23 ` Richard Biener
2021-09-09 14:39 ` Richard Earnshaw
0 siblings, 1 reply; 6+ messages in thread
From: Richard Biener @ 2021-09-09 12:23 UTC (permalink / raw)
To: Richard Earnshaw; +Cc: GCC Patches
On Thu, Sep 9, 2021 at 1:09 PM Richard Earnshaw <rearnsha@arm.com> wrote:
>
>
> gen_lowpart_general handles forming a SUBREG of a MEM by using
> adjust_address to rework and validate a new version of the MEM.
> However, gen_highpart does not attempt this and simply returns (SUBREG
> (MEM)) if the change is not 'obviously' safe. Improve on that by
> using a similar approach so that gen_lowpart and gen_highpart are
> mostly symmetrical in this regard.
If I read gen_lowpart correctly, it doesn't generate the
subreg of the mem in the first place, so doing it that way in gen_highpart
would _not_ invoke simplify_gen_subreg on a MEM_P but would instead
do directly what you now do?
I also wonder why gen_lowpart_general uses byte_lowpart_offset
while you use subreg_highpart_offset where subreg_lowpart_offset
is also available ... huh - and there's also
subreg_size_{lowpart,highpart}_offset.
So it looks like your case wouldn't handle the paradoxical highpart
(which better shouldn't be accessed?).
So like
diff --git a/gcc/emit-rtl.c b/gcc/emit-rtl.c
index 77ea8948ee8..c3dae7d8075 100644
--- a/gcc/emit-rtl.c
+++ b/gcc/emit-rtl.c
@@ -1585,6 +1585,13 @@ gen_highpart (machine_mode mode, rtx x)
gcc_assert (known_le (msize, (unsigned int) UNITS_PER_WORD)
|| known_eq (msize, GET_MODE_UNIT_SIZE (GET_MODE (x))));
+ /* Offset MEMs. */
+ if (MEM_P (x))
+ {
+ poly_int64 offset = subreg_highpart_offset (mode, GET_MODE (x));
+ return adjust_address (x, mode, offset);
+ }
+
result = simplify_gen_subreg (mode, x, GET_MODE (x),
subreg_highpart_offset (mode, GET_MODE (x)));
gcc_assert (result);
Testing
+ else if (GET_CODE (result) == SUBREG && MEM_P (SUBREG_REG (result))
+ && MEM_P (x))
looks a bit odd to me.
I'll note it leaves gen_highpart_mode "unfixed", some refactoring should
instead commonize the worker for both interfaces, making gen_highpart
invoke gen_highpart_mode or so.
> gcc/ChangeLog:
>
> PR target/102125
> * emit-rtl.c (gen_highpart): If simplify_gen_subreg returns
> SUBREG (MEM) for a MEM, use adjust_address to produce a new
> MEM.
> ---
> gcc/emit-rtl.c | 8 ++++++++
> 1 file changed, 8 insertions(+)
>
* Re: [PATCH v2 1/3] rtl: properly handle subreg (mem) in gen_highpart [PR102125]
2021-09-09 12:23 ` Richard Biener
@ 2021-09-09 14:39 ` Richard Earnshaw
0 siblings, 0 replies; 6+ messages in thread
From: Richard Earnshaw @ 2021-09-09 14:39 UTC (permalink / raw)
To: Richard Biener, Richard Earnshaw; +Cc: GCC Patches
On 09/09/2021 13:23, Richard Biener via Gcc-patches wrote:
> On Thu, Sep 9, 2021 at 1:09 PM Richard Earnshaw <rearnsha@arm.com> wrote:
>>
>>
>> gen_lowpart_general handles forming a SUBREG of a MEM by using
>> adjust_address to rework and validate a new version of the MEM.
>> However, gen_highpart does not attempt this and simply returns (SUBREG
>> (MEM)) if the change is not 'obviously' safe. Improve on that by
>> using a similar approach so that gen_lowpart and gen_highpart are
>> mostly symmetrical in this regard.
>
> If I read gen_lowpart correctly, it doesn't generate the
> subreg of the mem in the first place, so doing it that way in gen_highpart
> would _not_ invoke simplify_gen_subreg on a MEM_P but would instead
> do directly what you now do?
>
> I also wonder why gen_lowpart_general uses byte_lowpart_offset
> while you use subreg_highpart_offset where subreg_lowpart_offset
> is also available ... huh - and there's also
> subreg_size_{lowpart,highpart}_offset.
> So it looks like your case wouldn't handle the paradoxical highpart
> (which better shouldn't be accessed?).
>
Surely the highpart of a paradoxical subreg is meaningless... what's the
highpart when the outer subreg is wider than the inner one?
And that's why there are subreg_lowpart_offset, subreg_highpart_offset
and byte_lowpart_offset, but no byte_highpart_offset (byte_lowpart_offset
exists to handle the paradoxical cases, and decays to
subreg_lowpart_offset in the non-paradoxical subreg case).
> So like
>
> diff --git a/gcc/emit-rtl.c b/gcc/emit-rtl.c
> index 77ea8948ee8..c3dae7d8075 100644
> --- a/gcc/emit-rtl.c
> +++ b/gcc/emit-rtl.c
> @@ -1585,6 +1585,13 @@ gen_highpart (machine_mode mode, rtx x)
> gcc_assert (known_le (msize, (unsigned int) UNITS_PER_WORD)
> || known_eq (msize, GET_MODE_UNIT_SIZE (GET_MODE (x))));
>
> + /* Offset MEMs. */
> + if (MEM_P (x))
> + {
> + poly_int64 offset = subreg_highpart_offset (mode, GET_MODE (x));
> + return adjust_address (x, mode, offset);
> + }
> +
> result = simplify_gen_subreg (mode, x, GET_MODE (x),
> subreg_highpart_offset (mode, GET_MODE (x)));
> gcc_assert (result);
>
In which case, I'm pretty certain the subsequent MEM_P (result) test can
be removed, as I can't see how simplify_gen_subreg would return a MEM
with such a change.
> Testing
>
> + else if (GET_CODE (result) == SUBREG && MEM_P (SUBREG_REG (result))
> + && MEM_P (x))
>
> looks a bit odd to me.
>
> I'll note it leaves gen_highpart_mode "unfixed", some refactoring should
> instead commonize the worker for both interfaces, making gen_highpart
> invoke gen_highpart_mode or so.
>
gen_highpart_mode invokes gen_highpart if the inner mode is not
VOIDmode. Perhaps the logic is somewhat backwards, or perhaps it's just
a bit more efficient that way.
I'll try your suggested change.
R.
>> gcc/ChangeLog:
>>
>> PR target/102125
>> * emit-rtl.c (gen_highpart): If simplify_gen_subreg returns
>> SUBREG (MEM) for a MEM, use adjust_address to produce a new
>> MEM.
>> ---
>> gcc/emit-rtl.c | 8 ++++++++
>> 1 file changed, 8 insertions(+)
>>
* [PATCH v2 2/3] arm: expand handling of movmisalign for DImode [PR102125]
2021-09-09 11:09 [PATCH v2 0/3] lower more cases of memcpy [PR102125] Richard Earnshaw
2021-09-09 11:09 ` [PATCH v2 1/3] rtl: properly handle subreg (mem) in gen_highpart [PR102125] Richard Earnshaw
@ 2021-09-09 11:09 ` Richard Earnshaw
2021-09-09 11:09 ` [PATCH v2 3/3] gimple: allow more folding of memcpy [PR102125] Richard Earnshaw
2 siblings, 0 replies; 6+ messages in thread
From: Richard Earnshaw @ 2021-09-09 11:09 UTC (permalink / raw)
To: gcc-patches; +Cc: Richard Earnshaw
DImode is currently handled only for machines with vector modes
enabled, but this is unduly restrictive: such a move is generally
better done in core registers.
gcc/ChangeLog:
PR target/102125
* config/arm/arm.md (movmisaligndi): New define_expand.
* config/arm/vec-common.md (movmisalign<mode>): Iterate over VDQ mode.
---
gcc/config/arm/arm.md | 16 ++++++++++++++++
gcc/config/arm/vec-common.md | 4 ++--
2 files changed, 18 insertions(+), 2 deletions(-)
[-- Attachment #2: v2-0002-arm-expand-handling-of-movmisalign-for-DImode-PR1.patch --]
diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index 5d3f21b91c4..4adc976b8b6 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -12617,6 +12617,22 @@ (define_expand "copysigndf3"
}"
)
+;; movmisalign for DImode
+(define_expand "movmisaligndi"
+ [(match_operand:DI 0 "general_operand")
+ (match_operand:DI 1 "general_operand")]
+ "unaligned_access"
+{
+ rtx lo_op0 = gen_lowpart (SImode, operands[0]);
+ rtx lo_op1 = gen_lowpart (SImode, operands[1]);
+ rtx hi_op0 = gen_highpart_mode (SImode, DImode, operands[0]);
+ rtx hi_op1 = gen_highpart_mode (SImode, DImode, operands[1]);
+
+ emit_insn (gen_movmisalignsi (lo_op0, lo_op1));
+ emit_insn (gen_movmisalignsi (hi_op0, hi_op1));
+ DONE;
+})
+
;; movmisalign patterns for HImode and SImode.
(define_expand "movmisalign<mode>"
[(match_operand:HSI 0 "general_operand")
diff --git a/gcc/config/arm/vec-common.md b/gcc/config/arm/vec-common.md
index 68de4f0f943..e71d9b3811f 100644
--- a/gcc/config/arm/vec-common.md
+++ b/gcc/config/arm/vec-common.md
@@ -281,8 +281,8 @@ (define_expand "cml<fcmac1><conj_op><mode>4"
})
(define_expand "movmisalign<mode>"
- [(set (match_operand:VDQX 0 "neon_perm_struct_or_reg_operand")
- (unspec:VDQX [(match_operand:VDQX 1 "neon_perm_struct_or_reg_operand")]
+ [(set (match_operand:VDQ 0 "neon_perm_struct_or_reg_operand")
+ (unspec:VDQ [(match_operand:VDQ 1 "neon_perm_struct_or_reg_operand")]
UNSPEC_MISALIGNED_ACCESS))]
"ARM_HAVE_<MODE>_LDST && !BYTES_BIG_ENDIAN
&& unaligned_access && !TARGET_REALLY_IWMMXT"
* [PATCH v2 3/3] gimple: allow more folding of memcpy [PR102125]
2021-09-09 11:09 [PATCH v2 0/3] lower more cases of memcpy [PR102125] Richard Earnshaw
2021-09-09 11:09 ` [PATCH v2 1/3] rtl: properly handle subreg (mem) in gen_highpart [PR102125] Richard Earnshaw
2021-09-09 11:09 ` [PATCH v2 2/3] arm: expand handling of movmisalign for DImode [PR102125] Richard Earnshaw
@ 2021-09-09 11:09 ` Richard Earnshaw
2 siblings, 0 replies; 6+ messages in thread
From: Richard Earnshaw @ 2021-09-09 11:09 UTC (permalink / raw)
To: gcc-patches; +Cc: Richard Earnshaw
The current restriction on folding memcpy to a single element of size
MOVE_MAX is excessively cautious on most machines and limits some
significant further optimizations. So relax the restriction, provided
that the copy size does not exceed MOVE_MAX * MOVE_RATIO and that a SET
insn exists for moving the value into machine registers.
Note that there were already checks in place for having misaligned
move operations when one or more of the operands were unaligned.
On Arm this now permits optimizing
uint64_t bar64(const uint8_t *rData1)
{
uint64_t buffer;
memcpy(&buffer, rData1, sizeof(buffer));
return buffer;
}
from
ldr r2, [r0] @ unaligned
sub sp, sp, #8
ldr r3, [r0, #4] @ unaligned
strd r2, [sp]
ldrd r0, [sp]
add sp, sp, #8
to
mov r3, r0
ldr r0, [r0] @ unaligned
ldr r1, [r3, #4] @ unaligned
PR target/102125 - (ARM Cortex-M3 and newer) missed optimization. memcpy not needed operations
gcc/ChangeLog:
PR target/102125
* gimple-fold.c (gimple_fold_builtin_memory_op): Allow folding
memcpy if the size is not more than MOVE_MAX * MOVE_RATIO.
---
gcc/gimple-fold.c | 16 +++++++++++-----
1 file changed, 11 insertions(+), 5 deletions(-)
[-- Attachment #2: v2-0003-gimple-allow-more-folding-of-memcpy-PR102125.patch --]
diff --git a/gcc/gimple-fold.c b/gcc/gimple-fold.c
index 3f2c176cff6..d9ffb5006f5 100644
--- a/gcc/gimple-fold.c
+++ b/gcc/gimple-fold.c
@@ -67,6 +67,8 @@ along with GCC; see the file COPYING3. If not see
#include "tree-vector-builder.h"
#include "tree-ssa-strlen.h"
#include "varasm.h"
+#include "memmodel.h"
+#include "optabs.h"
enum strlen_range_kind {
/* Compute the exact constant string length. */
@@ -957,14 +959,17 @@ gimple_fold_builtin_memory_op (gimple_stmt_iterator *gsi,
= build_int_cst (build_pointer_type_for_mode (char_type_node,
ptr_mode, true), 0);
- /* If we can perform the copy efficiently with first doing all loads
- and then all stores inline it that way. Currently efficiently
- means that we can load all the memory into a single integer
- register which is what MOVE_MAX gives us. */
+ /* If we can perform the copy efficiently with first doing all loads and
+ then all stores inline it that way. Currently efficiently means that
+ we can load all the memory with a single set operation and that the
+ total size is less than MOVE_MAX * MOVE_RATIO. */
src_align = get_pointer_alignment (src);
dest_align = get_pointer_alignment (dest);
if (tree_fits_uhwi_p (len)
- && compare_tree_int (len, MOVE_MAX) <= 0
+ && (compare_tree_int
+ (len, (MOVE_MAX
+ * MOVE_RATIO (optimize_function_for_size_p (cfun))))
+ <= 0)
/* FIXME: Don't transform copies from strings with known length.
Until GCC 9 this prevented a case in gcc.dg/strlenopt-8.c
from being handled, and the case was XFAILed for that reason.
@@ -1000,6 +1005,7 @@ gimple_fold_builtin_memory_op (gimple_stmt_iterator *gsi,
if (type
&& is_a <scalar_int_mode> (TYPE_MODE (type), &mode)
&& GET_MODE_SIZE (mode) * BITS_PER_UNIT == ilen * 8
+ && have_insn_for (SET, mode)
/* If the destination pointer is not aligned we must be able
to emit an unaligned store. */
&& (dest_align >= GET_MODE_ALIGNMENT (mode)