* [PATCH v3 1/3] rtl: directly handle MEM in gen_highpart [PR102125]
2021-09-10 14:48 [PATCH v3 0/3] lower more cases of memcpy [PR102125] Richard Earnshaw
@ 2021-09-10 14:48 ` Richard Earnshaw
2021-09-13 9:38 ` Richard Biener
2021-09-13 9:38 ` Richard Sandiford
2021-09-10 14:48 ` [PATCH v3 2/3] arm: expand handling of movmisalign for DImode [PR102125] Richard Earnshaw
2021-09-10 14:48 ` [PATCH v3 3/3] gimple: allow more folding of memcpy [PR102125] Richard Earnshaw
2 siblings, 2 replies; 8+ messages in thread
From: Richard Earnshaw @ 2021-09-10 14:48 UTC (permalink / raw)
To: gcc-patches; +Cc: Richard Earnshaw, richard.guenther
[-- Attachment #1: Type: text/plain, Size: 454 bytes --]
gen_lowpart_general handles forming a lowpart of a MEM by using
adjust_address to rework and validate a new version of the MEM.
Do the same for gen_highpart rather than calling simplify_gen_subreg
for this case.
gcc/ChangeLog:
PR target/102125
* emit-rtl.c (gen_highpart): Use adjust_address to handle
MEM rather than calling simplify_gen_subreg.
---
gcc/emit-rtl.c | 23 +++++++++++++----------
1 file changed, 13 insertions(+), 10 deletions(-)
[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: v3-0001-rtl-directly-handle-MEM-in-gen_highpart-PR102125.patch --]
[-- Type: text/x-patch; name="v3-0001-rtl-directly-handle-MEM-in-gen_highpart-PR102125.patch", Size: 1449 bytes --]
diff --git a/gcc/emit-rtl.c b/gcc/emit-rtl.c
index 77ea8948ee8..0ba110879aa 100644
--- a/gcc/emit-rtl.c
+++ b/gcc/emit-rtl.c
@@ -1585,19 +1585,22 @@ gen_highpart (machine_mode mode, rtx x)
gcc_assert (known_le (msize, (unsigned int) UNITS_PER_WORD)
|| known_eq (msize, GET_MODE_UNIT_SIZE (GET_MODE (x))));
- result = simplify_gen_subreg (mode, x, GET_MODE (x),
- subreg_highpart_offset (mode, GET_MODE (x)));
- gcc_assert (result);
-
- /* simplify_gen_subreg is not guaranteed to return a valid operand for
- the target if we have a MEM. gen_highpart must return a valid operand,
- emitting code if necessary to do so. */
- if (MEM_P (result))
+ /* gen_lowpart_common handles a lot of special cases due to needing to handle
+ paradoxical subregs; it only calls simplify_gen_subreg when certain that
+ it will produce something meaningful. The only case we need to handle
+ specially here is MEM. */
+ if (MEM_P (x))
{
- result = validize_mem (result);
- gcc_assert (result);
+ poly_int64 offset = subreg_highpart_offset (mode, GET_MODE (x));
+ return adjust_address (x, mode, offset);
}
+ result = simplify_gen_subreg (mode, x, GET_MODE (x),
+ subreg_highpart_offset (mode, GET_MODE (x)));
+ /* Since we handle MEM directly above, we should never get a MEM back
+ from simplify_gen_subreg. */
+ gcc_assert (result && !MEM_P (result));
+
return result;
}
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH v3 1/3] rtl: directly handle MEM in gen_highpart [PR102125]
2021-09-10 14:48 ` [PATCH v3 1/3] rtl: directly handle MEM in gen_highpart [PR102125] Richard Earnshaw
@ 2021-09-13 9:38 ` Richard Biener
2021-09-13 9:38 ` Richard Sandiford
1 sibling, 0 replies; 8+ messages in thread
From: Richard Biener @ 2021-09-13 9:38 UTC (permalink / raw)
To: Richard Earnshaw; +Cc: GCC Patches
On Fri, Sep 10, 2021 at 4:48 PM Richard Earnshaw <rearnsha@arm.com> wrote:
>
>
> gen_lowpart_general handles forming a lowpart of a MEM by using
> adjust_address to rework and validate a new version of the MEM.
> Do the same for gen_highpart rather than calling simplify_gen_subreg
> for this case.
OK from my side.
Thanks,
Richard.
> gcc/ChangeLog:
>
> PR target/102125
> * emit-rtl.c (gen_highpart): Use adjust_address to handle
> MEM rather than calling simplify_gen_subreg.
> ---
> gcc/emit-rtl.c | 23 +++++++++++++----------
> 1 file changed, 13 insertions(+), 10 deletions(-)
>
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH v3 1/3] rtl: directly handle MEM in gen_highpart [PR102125]
2021-09-10 14:48 ` [PATCH v3 1/3] rtl: directly handle MEM in gen_highpart [PR102125] Richard Earnshaw
2021-09-13 9:38 ` Richard Biener
@ 2021-09-13 9:38 ` Richard Sandiford
2021-09-13 9:51 ` Richard Earnshaw
1 sibling, 1 reply; 8+ messages in thread
From: Richard Sandiford @ 2021-09-13 9:38 UTC (permalink / raw)
To: Richard Earnshaw via Gcc-patches; +Cc: Richard Earnshaw
Richard Earnshaw via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
> gen_lowpart_general handles forming a lowpart of a MEM by using
> adjust_address to rework and validate a new version of the MEM.
> Do the same for gen_highpart rather than calling simplify_gen_subreg
> for this case.
Looks OK, but what went wrong with the existing code? Did
simplify_gen_subreg refuse to handle a MEM that you wanted
it to handle, or did the validize_mem go wrong for some reason?
Thanks,
Richard
> gcc/ChangeLog:
>
> PR target/102125
> * emit-rtl.c (gen_highpart): Use adjust_address to handle
> MEM rather than calling simplify_gen_subreg.
> ---
> gcc/emit-rtl.c | 23 +++++++++++++----------
> 1 file changed, 13 insertions(+), 10 deletions(-)
>
> diff --git a/gcc/emit-rtl.c b/gcc/emit-rtl.c
> index 77ea8948ee8..0ba110879aa 100644
> --- a/gcc/emit-rtl.c
> +++ b/gcc/emit-rtl.c
> @@ -1585,19 +1585,22 @@ gen_highpart (machine_mode mode, rtx x)
> gcc_assert (known_le (msize, (unsigned int) UNITS_PER_WORD)
> || known_eq (msize, GET_MODE_UNIT_SIZE (GET_MODE (x))));
>
> - result = simplify_gen_subreg (mode, x, GET_MODE (x),
> - subreg_highpart_offset (mode, GET_MODE (x)));
> - gcc_assert (result);
> -
> - /* simplify_gen_subreg is not guaranteed to return a valid operand for
> - the target if we have a MEM. gen_highpart must return a valid operand,
> - emitting code if necessary to do so. */
> - if (MEM_P (result))
> + /* gen_lowpart_common handles a lot of special cases due to needing to handle
> + paradoxical subregs; it only calls simplify_gen_subreg when certain that
> + it will produce something meaningful. The only case we need to handle
> + specially here is MEM. */
> + if (MEM_P (x))
> {
> - result = validize_mem (result);
> - gcc_assert (result);
> + poly_int64 offset = subreg_highpart_offset (mode, GET_MODE (x));
> + return adjust_address (x, mode, offset);
> }
>
> + result = simplify_gen_subreg (mode, x, GET_MODE (x),
> + subreg_highpart_offset (mode, GET_MODE (x)));
> + /* Since we handle MEM directly above, we should never get a MEM back
> + from simplify_gen_subreg. */
> + gcc_assert (result && !MEM_P (result));
> +
> return result;
> }
>
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH v3 1/3] rtl: directly handle MEM in gen_highpart [PR102125]
2021-09-13 9:38 ` Richard Sandiford
@ 2021-09-13 9:51 ` Richard Earnshaw
2021-09-13 9:54 ` Richard Sandiford
0 siblings, 1 reply; 8+ messages in thread
From: Richard Earnshaw @ 2021-09-13 9:51 UTC (permalink / raw)
To: Richard Earnshaw via Gcc-patches, Richard Earnshaw, richard.sandiford
On 13/09/2021 10:38, Richard Sandiford via Gcc-patches wrote:
> Richard Earnshaw via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
>> gen_lowpart_general handles forming a lowpart of a MEM by using
>> adjust_address to rework and validate a new version of the MEM.
>> Do the same for gen_highpart rather than calling simplify_gen_subreg
>> for this case.
>
> Looks OK, but what went wrong with the existing code? Did
> simplify_gen_subreg refuse to handle a MEM that you wanted
> it to handle, or did the validize_mem go wrong for some reason?
It refused to handle it and simply returned (subreg (mem)) - see the
discussion on version 1 of the patch series.
R.
>
> Thanks,
> Richard
>
>> gcc/ChangeLog:
>>
>> PR target/102125
>> * emit-rtl.c (gen_highpart): Use adjust_address to handle
>> MEM rather than calling simplify_gen_subreg.
>> ---
>> gcc/emit-rtl.c | 23 +++++++++++++----------
>> 1 file changed, 13 insertions(+), 10 deletions(-)
>>
>> diff --git a/gcc/emit-rtl.c b/gcc/emit-rtl.c
>> index 77ea8948ee8..0ba110879aa 100644
>> --- a/gcc/emit-rtl.c
>> +++ b/gcc/emit-rtl.c
>> @@ -1585,19 +1585,22 @@ gen_highpart (machine_mode mode, rtx x)
>> gcc_assert (known_le (msize, (unsigned int) UNITS_PER_WORD)
>> || known_eq (msize, GET_MODE_UNIT_SIZE (GET_MODE (x))));
>>
>> - result = simplify_gen_subreg (mode, x, GET_MODE (x),
>> - subreg_highpart_offset (mode, GET_MODE (x)));
>> - gcc_assert (result);
>> -
>> - /* simplify_gen_subreg is not guaranteed to return a valid operand for
>> - the target if we have a MEM. gen_highpart must return a valid operand,
>> - emitting code if necessary to do so. */
>> - if (MEM_P (result))
>> + /* gen_lowpart_common handles a lot of special cases due to needing to handle
>> + paradoxical subregs; it only calls simplify_gen_subreg when certain that
>> + it will produce something meaningful. The only case we need to handle
>> + specially here is MEM. */
>> + if (MEM_P (x))
>> {
>> - result = validize_mem (result);
>> - gcc_assert (result);
>> + poly_int64 offset = subreg_highpart_offset (mode, GET_MODE (x));
>> + return adjust_address (x, mode, offset);
>> }
>>
>> + result = simplify_gen_subreg (mode, x, GET_MODE (x),
>> + subreg_highpart_offset (mode, GET_MODE (x)));
>> + /* Since we handle MEM directly above, we should never get a MEM back
>> + from simplify_gen_subreg. */
>> + gcc_assert (result && !MEM_P (result));
>> +
>> return result;
>> }
>>
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH v3 1/3] rtl: directly handle MEM in gen_highpart [PR102125]
2021-09-13 9:51 ` Richard Earnshaw
@ 2021-09-13 9:54 ` Richard Sandiford
0 siblings, 0 replies; 8+ messages in thread
From: Richard Sandiford @ 2021-09-13 9:54 UTC (permalink / raw)
To: Richard Earnshaw; +Cc: Richard Earnshaw via Gcc-patches, Richard Earnshaw
Richard Earnshaw <Richard.Earnshaw@foss.arm.com> writes:
> On 13/09/2021 10:38, Richard Sandiford via Gcc-patches wrote:
>> Richard Earnshaw via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
>>> gen_lowpart_general handles forming a lowpart of a MEM by using
>>> adjust_address to rework and validate a new version of the MEM.
>>> Do the same for gen_highpart rather than calling simplify_gen_subreg
>>> for this case.
>>
>> Looks OK, but what went wrong with the existing code? Did
>> simplify_gen_subreg refuse to handle a MEM that you wanted
>> it to handle, or did the validize_mem go wrong for some reason?
>
> It refused to handle it and simply returned (subreg (mem)) - see the
> discussion on version 1 of the patch series.
OK, that's good then. The patch is OK from my POV too FWIW.
Richard
>>> gcc/ChangeLog:
>>>
>>> PR target/102125
>>> * emit-rtl.c (gen_highpart): Use adjust_address to handle
>>> MEM rather than calling simplify_gen_subreg.
>>> ---
>>> gcc/emit-rtl.c | 23 +++++++++++++----------
>>> 1 file changed, 13 insertions(+), 10 deletions(-)
>>>
>>> diff --git a/gcc/emit-rtl.c b/gcc/emit-rtl.c
>>> index 77ea8948ee8..0ba110879aa 100644
>>> --- a/gcc/emit-rtl.c
>>> +++ b/gcc/emit-rtl.c
>>> @@ -1585,19 +1585,22 @@ gen_highpart (machine_mode mode, rtx x)
>>> gcc_assert (known_le (msize, (unsigned int) UNITS_PER_WORD)
>>> || known_eq (msize, GET_MODE_UNIT_SIZE (GET_MODE (x))));
>>>
>>> - result = simplify_gen_subreg (mode, x, GET_MODE (x),
>>> - subreg_highpart_offset (mode, GET_MODE (x)));
>>> - gcc_assert (result);
>>> -
>>> - /* simplify_gen_subreg is not guaranteed to return a valid operand for
>>> - the target if we have a MEM. gen_highpart must return a valid operand,
>>> - emitting code if necessary to do so. */
>>> - if (MEM_P (result))
>>> + /* gen_lowpart_common handles a lot of special cases due to needing to handle
>>> + paradoxical subregs; it only calls simplify_gen_subreg when certain that
>>> + it will produce something meaningful. The only case we need to handle
>>> + specially here is MEM. */
>>> + if (MEM_P (x))
>>> {
>>> - result = validize_mem (result);
>>> - gcc_assert (result);
>>> + poly_int64 offset = subreg_highpart_offset (mode, GET_MODE (x));
>>> + return adjust_address (x, mode, offset);
>>> }
>>>
>>> + result = simplify_gen_subreg (mode, x, GET_MODE (x),
>>> + subreg_highpart_offset (mode, GET_MODE (x)));
>>> + /* Since we handle MEM directly above, we should never get a MEM back
>>> + from simplify_gen_subreg. */
>>> + gcc_assert (result && !MEM_P (result));
>>> +
>>> return result;
>>> }
>>>
^ permalink raw reply [flat|nested] 8+ messages in thread
* [PATCH v3 2/3] arm: expand handling of movmisalign for DImode [PR102125]
2021-09-10 14:48 [PATCH v3 0/3] lower more cases of memcpy [PR102125] Richard Earnshaw
2021-09-10 14:48 ` [PATCH v3 1/3] rtl: directly handle MEM in gen_highpart [PR102125] Richard Earnshaw
@ 2021-09-10 14:48 ` Richard Earnshaw
2021-09-10 14:48 ` [PATCH v3 3/3] gimple: allow more folding of memcpy [PR102125] Richard Earnshaw
2 siblings, 0 replies; 8+ messages in thread
From: Richard Earnshaw @ 2021-09-10 14:48 UTC (permalink / raw)
To: gcc-patches; +Cc: Richard Earnshaw
[-- Attachment #1: Type: text/plain, Size: 465 bytes --]
DImode is currently handled only for machines with vector modes
enabled, but this is unduly restrictive and is generally better done
in core registers.
gcc/ChangeLog:
PR target/102125
* config/arm/arm.md (movmisaligndi): New define_expand.
* config/arm/vec-common.md (movmisalign<mode>): Iterate over VDQ mode.
---
gcc/config/arm/arm.md | 16 ++++++++++++++++
gcc/config/arm/vec-common.md | 4 ++--
2 files changed, 18 insertions(+), 2 deletions(-)
[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: v3-0002-arm-expand-handling-of-movmisalign-for-DImode-PR1.patch --]
[-- Type: text/x-patch; name="v3-0002-arm-expand-handling-of-movmisalign-for-DImode-PR1.patch", Size: 1556 bytes --]
diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index 5d3f21b91c4..4adc976b8b6 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -12617,6 +12617,22 @@ (define_expand "copysigndf3"
}"
)
+;; movmisalign for DImode
+(define_expand "movmisaligndi"
+ [(match_operand:DI 0 "general_operand")
+ (match_operand:DI 1 "general_operand")]
+ "unaligned_access"
+{
+ rtx lo_op0 = gen_lowpart (SImode, operands[0]);
+ rtx lo_op1 = gen_lowpart (SImode, operands[1]);
+ rtx hi_op0 = gen_highpart_mode (SImode, DImode, operands[0]);
+ rtx hi_op1 = gen_highpart_mode (SImode, DImode, operands[1]);
+
+ emit_insn (gen_movmisalignsi (lo_op0, lo_op1));
+ emit_insn (gen_movmisalignsi (hi_op0, hi_op1));
+ DONE;
+})
+
;; movmisalign patterns for HImode and SImode.
(define_expand "movmisalign<mode>"
[(match_operand:HSI 0 "general_operand")
diff --git a/gcc/config/arm/vec-common.md b/gcc/config/arm/vec-common.md
index 68de4f0f943..e71d9b3811f 100644
--- a/gcc/config/arm/vec-common.md
+++ b/gcc/config/arm/vec-common.md
@@ -281,8 +281,8 @@ (define_expand "cml<fcmac1><conj_op><mode>4"
})
(define_expand "movmisalign<mode>"
- [(set (match_operand:VDQX 0 "neon_perm_struct_or_reg_operand")
- (unspec:VDQX [(match_operand:VDQX 1 "neon_perm_struct_or_reg_operand")]
+ [(set (match_operand:VDQ 0 "neon_perm_struct_or_reg_operand")
+ (unspec:VDQ [(match_operand:VDQ 1 "neon_perm_struct_or_reg_operand")]
UNSPEC_MISALIGNED_ACCESS))]
"ARM_HAVE_<MODE>_LDST && !BYTES_BIG_ENDIAN
&& unaligned_access && !TARGET_REALLY_IWMMXT"
^ permalink raw reply [flat|nested] 8+ messages in thread
* [PATCH v3 3/3] gimple: allow more folding of memcpy [PR102125]
2021-09-10 14:48 [PATCH v3 0/3] lower more cases of memcpy [PR102125] Richard Earnshaw
2021-09-10 14:48 ` [PATCH v3 1/3] rtl: directly handle MEM in gen_highpart [PR102125] Richard Earnshaw
2021-09-10 14:48 ` [PATCH v3 2/3] arm: expand handling of movmisalign for DImode [PR102125] Richard Earnshaw
@ 2021-09-10 14:48 ` Richard Earnshaw
2 siblings, 0 replies; 8+ messages in thread
From: Richard Earnshaw @ 2021-09-10 14:48 UTC (permalink / raw)
To: gcc-patches; +Cc: Richard Earnshaw
[-- Attachment #1: Type: text/plain, Size: 1292 bytes --]
The current restriction on folding memcpy to a single element of size
MOVE_MAX is excessively cautious on most machines and limits some
significant further optimizations. So relax the restriction provided
the copy size does not exceed MOVE_MAX * MOVE_RATIO and that a SET
insn exists for moving the value into machine registers.
Note that there were already checks in place for having misaligned
move operations when one or more of the operands were unaligned.
On Arm this now permits optimizing
uint64_t bar64(const uint8_t *rData1)
{
uint64_t buffer;
memcpy(&buffer, rData1, sizeof(buffer));
return buffer;
}
from
ldr r2, [r0] @ unaligned
sub sp, sp, #8
ldr r3, [r0, #4] @ unaligned
strd r2, [sp]
ldrd r0, [sp]
add sp, sp, #8
to
mov r3, r0
ldr r0, [r0] @ unaligned
ldr r1, [r3, #4] @ unaligned
PR target/102125 - (ARM Cortex-M3 and newer) missed optimization. memcpy not needed operations
gcc/ChangeLog:
PR target/102125
* gimple-fold.c (gimple_fold_builtin_memory_op): Allow folding
memcpy if the size is not more than MOVE_MAX * MOVE_RATIO.
---
gcc/gimple-fold.c | 16 +++++++++++-----
1 file changed, 11 insertions(+), 5 deletions(-)
[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: v3-0003-gimple-allow-more-folding-of-memcpy-PR102125.patch --]
[-- Type: text/x-patch; name="v3-0003-gimple-allow-more-folding-of-memcpy-PR102125.patch", Size: 2038 bytes --]
diff --git a/gcc/gimple-fold.c b/gcc/gimple-fold.c
index 3f2c176cff6..d9ffb5006f5 100644
--- a/gcc/gimple-fold.c
+++ b/gcc/gimple-fold.c
@@ -67,6 +67,8 @@ along with GCC; see the file COPYING3. If not see
#include "tree-vector-builder.h"
#include "tree-ssa-strlen.h"
#include "varasm.h"
+#include "memmodel.h"
+#include "optabs.h"
enum strlen_range_kind {
/* Compute the exact constant string length. */
@@ -957,14 +959,17 @@ gimple_fold_builtin_memory_op (gimple_stmt_iterator *gsi,
= build_int_cst (build_pointer_type_for_mode (char_type_node,
ptr_mode, true), 0);
- /* If we can perform the copy efficiently with first doing all loads
- and then all stores inline it that way. Currently efficiently
- means that we can load all the memory into a single integer
- register which is what MOVE_MAX gives us. */
+ /* If we can perform the copy efficiently with first doing all loads and
+ then all stores inline it that way. Currently efficiently means that
+ we can load all the memory with a single set operation and that the
+ total size is less than MOVE_MAX * MOVE_RATIO. */
src_align = get_pointer_alignment (src);
dest_align = get_pointer_alignment (dest);
if (tree_fits_uhwi_p (len)
- && compare_tree_int (len, MOVE_MAX) <= 0
+ && (compare_tree_int
+ (len, (MOVE_MAX
+ * MOVE_RATIO (optimize_function_for_size_p (cfun))))
+ <= 0)
/* FIXME: Don't transform copies from strings with known length.
Until GCC 9 this prevented a case in gcc.dg/strlenopt-8.c
from being handled, and the case was XFAILed for that reason.
@@ -1000,6 +1005,7 @@ gimple_fold_builtin_memory_op (gimple_stmt_iterator *gsi,
if (type
&& is_a <scalar_int_mode> (TYPE_MODE (type), &mode)
&& GET_MODE_SIZE (mode) * BITS_PER_UNIT == ilen * 8
+ && have_insn_for (SET, mode)
/* If the destination pointer is not aligned we must be able
to emit an unaligned store. */
&& (dest_align >= GET_MODE_ALIGNMENT (mode)
^ permalink raw reply [flat|nested] 8+ messages in thread