public inbox for gcc-patches@gcc.gnu.org
* [PATCH v2 0/3] lower more cases of memcpy [PR102125]
@ 2021-09-09 11:09 Richard Earnshaw
From: Richard Earnshaw @ 2021-09-09 11:09 UTC (permalink / raw)
  To: gcc-patches; +Cc: Richard Earnshaw, richard.guenther

Changes since version 1:

patch 1 is reworked entirely to handle SUBREG (MEM) in gen_highpart.  This
brings it more in line with the way gen_lowpart_general handles this case.

patch 2 is simplified because, having reread the manual description of
movmisalign, I realised that this pattern can never be called with two
MEM operands.

patch 3 is unchanged; it's included here only for completeness.


 -------------

This short patch series is designed to address some more cases where we
can usefully lower memcpy operations during gimple fold.  The current
code restricts this lowering to a maximum size of MOVE_MAX, i.e. the size
of a single integer register on the machine, but with modern architectures
this is likely too restrictive.  The motivating example is

uint64_t bar64(const uint8_t *rData1)
{
    uint64_t buffer;
    __builtin_memcpy(&buffer, rData1, sizeof(buffer));
    return buffer;
}

which on a 32-bit machine ends up with an inlined memcpy followed by a load
from the copied buffer.
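
To make the intended result concrete, here is a rough sketch of what the
lowered form amounts to at the source level on a 32-bit target: two
word-sized unaligned loads taken in memory order.  The name
bar64_two_words is hypothetical and used only for illustration; this is
not code from the series, just a C model of the desired behaviour.

```c
#include <stdint.h>
#include <string.h>

/* The motivating function from the cover letter, unchanged. */
uint64_t bar64(const uint8_t *rData1)
{
    uint64_t buffer;
    __builtin_memcpy(&buffer, rData1, sizeof(buffer));
    return buffer;
}

/* Hypothetical hand-written equivalent: fetch the two 32-bit halves
   separately (each may be unaligned), keeping them in memory order so
   the result does not depend on the target's endianness. */
uint64_t bar64_two_words(const uint8_t *p)
{
    uint32_t half[2];
    uint64_t result;

    memcpy(&half[0], p, sizeof(half[0]));      /* first word  */
    memcpy(&half[1], p + 4, sizeof(half[1]));  /* second word */
    memcpy(&result, half, sizeof(result));     /* reassemble  */
    return result;
}
```

Both functions should return the same value for any readable 8-byte
source, including a misaligned one such as raw + 1.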

The patch series is in three parts, although the middle patch is an
arm-specific tweak to handle unaligned 64-bit moves on more versions
of the Arm architecture.

Patch 1 changes how gen_highpart handles simplify_gen_subreg returning
SUBREG (MEM) so that it is more in line with the way gen_lowpart handles
this case.

Patch 2 addresses an issue in the arm backend.  Currently movmisaligndi
only supports vector targets.  This patch reworks the code so that
the pattern can work on any architecture version that supports misaligned
accesses.

Patch 3 then relaxes the gimple fold simplification of memcpy to allow
larger memcpy operations to be folded away, provided that the total size
is less than MOVE_MAX * MOVE_RATIO and provided that the machine has a
suitable SET insn for the appropriate integer mode.
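
As a rough illustration of the condition described for patch 3, the
check can be modelled as a small predicate.  Everything here is an
assumption made for the sketch: struct target_limits, may_fold_memcpy
and the has_set_insn flag are hypothetical stand-ins, not the actual
code in gimple-fold.c, and the strict "less than" simply follows the
wording above.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the target macros MOVE_MAX and MOVE_RATIO. */
struct target_limits {
    size_t move_max;    /* bytes moved by one integer-register move */
    size_t move_ratio;  /* how many such moves are considered cheap */
};

/* Model of the relaxed test: fold the memcpy only if the total size is
   below move_max * move_ratio and the target has a suitable SET insn
   for an integer mode of that size (passed in here as a flag). */
bool may_fold_memcpy(size_t len, const struct target_limits *t,
                     bool has_set_insn)
{
    return len < t->move_max * t->move_ratio && has_set_insn;
}
```

With illustrative limits of move_max = 4 and move_ratio = 4, an 8-byte
copy passes the size test while a 16-byte copy does not; neither is
folded if the target lacks a matching SET insn.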

With these three changes, the testcase above now optimizes to

        mov     r3, r0
        ldr     r0, [r0]        @ unaligned
        ldr     r1, [r3, #4]    @ unaligned
        bx      lr

R.

Richard Earnshaw (3):
  rtl: properly handle subreg (mem) in gen_highpart [PR102125]
  arm: expand handling of movmisalign for DImode [PR102125]
  gimple: allow more folding of memcpy [PR102125]

 gcc/config/arm/arm.md        | 16 ++++++++++++++++
 gcc/config/arm/vec-common.md |  4 ++--
 gcc/emit-rtl.c               |  8 ++++++++
 gcc/gimple-fold.c            | 16 +++++++++++-----
 4 files changed, 37 insertions(+), 7 deletions(-)

-- 
2.25.1

