public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed
* [Bug c/102125] New: (ARM Cortex-M3 and newer) missed optimization. memcpy not needed operations
@ 2021-08-30 8:27 jankowski938 at gmail dot com
2021-08-30 8:32 ` [Bug c/102125] " jankowski938 at gmail dot com
` (10 more replies)
0 siblings, 11 replies; 12+ messages in thread
From: jankowski938 at gmail dot com @ 2021-08-30 8:27 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102125
Bug ID: 102125
Summary: (ARM Cortex-M3 and newer) missed optimization. memcpy
not needed operations
Product: gcc
Version: 10.2.1
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c
Assignee: unassigned at gcc dot gnu.org
Reporter: jankowski938 at gmail dot com
Target Milestone: ---
uint64_t bar64(const uint8_t *rData1)
{
uint64_t buffer;
memcpy(&buffer, rData1, sizeof(buffer));
return buffer;
}
compiler options:
-Ox -mthumb -mcpu=cortex-my
where x : 2,3,s y:3,4,7
```
bar64:
sub sp, sp, #8
mov r2, r0
ldr r0, [r0] @ unaligned
ldr r1, [r2, #4] @ unaligned
mov r3, sp
stmia r3!, {r0, r1}
ldrd r0, [sp]
add sp, sp, #8
bx lr
```
it is enough to:
```
mov r3, r0
ldr r0, [r0] @ unaligned
ldr r1, [r3, #4] @ unaligned
bx lr
```
32 bit memcpy is optimized correctly:
Full example code:
```
uint64_t foo64(const uint8_t *rData1)
{
uint64_t buffer;
buffer = (((uint64_t)rData1[7]) << 56)|((uint64_t)(rData1[6]) <<
48)|((uint64_t)(rData1[5]) << 40)|(((uint64_t)rData1[4]) << 32)|
(((uint64_t)rData1[3]) <<
24)|(((uint64_t)rData1[2]) << 16)|((uint64_t)(rData1[1]) << 8)|rData1[0];
return buffer;
}
uint64_t bar64(const uint8_t *rData1)
{
uint64_t buffer;
memcpy(&buffer, rData1, sizeof(buffer));
return buffer;
}
uint32_t foo32(const uint8_t *rData1)
{
uint32_t buffer;
buffer = (((uint32_t)rData1[3]) << 24)|(((uint32_t)rData1[2]) <<
16)|((uint32_t)(rData1[1]) << 8)|rData1[0];
return buffer;
}
uint32_t bar32(const uint8_t *rData1)
{
uint32_t buffer;
memcpy(&buffer, rData1, sizeof(buffer));
return buffer;
}
```
compiler output:
```
foo64:
mov r3, r0
ldr r0, [r0] @ unaligned
ldr r1, [r3, #4] @ unaligned
bx lr
bar64:
sub sp, sp, #8
mov r2, r0
ldr r0, [r0] @ unaligned
ldr r1, [r2, #4] @ unaligned
mov r3, sp
stmia r3!, {r0, r1}
ldrd r0, [sp]
add sp, sp, #8
bx lr
foo32:
ldr r0, [r0] @ unaligned
bx lr
bar32:
ldr r0, [r0] @ unaligned
bx lr
```
Clang compiles without overhead:
https://godbolt.org/z/P7G7Whxqz
^ permalink raw reply [flat|nested] 12+ messages in thread
* [Bug c/102125] (ARM Cortex-M3 and newer) missed optimization. memcpy not needed operations
2021-08-30 8:27 [Bug c/102125] New: (ARM Cortex-M3 and newer) missed optimization. memcpy not needed operations jankowski938 at gmail dot com
@ 2021-08-30 8:32 ` jankowski938 at gmail dot com
2021-08-30 11:40 ` [Bug target/102125] " rguenth at gcc dot gnu.org
` (9 subsequent siblings)
10 siblings, 0 replies; 12+ messages in thread
From: jankowski938 at gmail dot com @ 2021-08-30 8:32 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102125
--- Comment #1 from Piotr <jankowski938 at gmail dot com> ---
IMO it is quite important as `memcpy` type punning is considered as the safest
^ permalink raw reply [flat|nested] 12+ messages in thread
* [Bug target/102125] (ARM Cortex-M3 and newer) missed optimization. memcpy not needed operations
2021-08-30 8:27 [Bug c/102125] New: (ARM Cortex-M3 and newer) missed optimization. memcpy not needed operations jankowski938 at gmail dot com
2021-08-30 8:32 ` [Bug c/102125] " jankowski938 at gmail dot com
@ 2021-08-30 11:40 ` rguenth at gcc dot gnu.org
2021-08-30 19:29 ` pinskia at gcc dot gnu.org
` (8 subsequent siblings)
10 siblings, 0 replies; 12+ messages in thread
From: rguenth at gcc dot gnu.org @ 2021-08-30 11:40 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102125
Richard Biener <rguenth at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Component|c |target
Last reconfirmed| |2021-08-30
Target| |arm
Keywords| |missed-optimization
Ever confirmed|0 |1
Status|UNCONFIRMED |NEW
--- Comment #2 from Richard Biener <rguenth at gcc dot gnu.org> ---
One common source of missed optimizations is gimple_fold_builtin_memory_op
which has
/* If we can perform the copy efficiently with first doing all loads
and then all stores inline it that way. Currently efficiently
means that we can load all the memory into a single integer
register which is what MOVE_MAX gives us. */
src_align = get_pointer_alignment (src);
dest_align = get_pointer_alignment (dest);
if (tree_fits_uhwi_p (len)
&& compare_tree_int (len, MOVE_MAX) <= 0
...
/* If the destination pointer is not aligned we must be able
to emit an unaligned store. */
&& (dest_align >= GET_MODE_ALIGNMENT (mode)
|| !targetm.slow_unaligned_access (mode, dest_align)
|| (optab_handler (movmisalign_optab, mode)
!= CODE_FOR_nothing)))
where here likely the MOVE_MAX limit applies (it is 4). Since we actually
do need to perform two loads the code seems to do what is intended (but
that's of course "bad" for 64bit copies on 32bit archs and likewise for
128bit copies on 64bit archs).
It's usually too late for RTL memcpy expansion to fully elide stack storage.
^ permalink raw reply [flat|nested] 12+ messages in thread
* [Bug target/102125] (ARM Cortex-M3 and newer) missed optimization. memcpy not needed operations
2021-08-30 8:27 [Bug c/102125] New: (ARM Cortex-M3 and newer) missed optimization. memcpy not needed operations jankowski938 at gmail dot com
2021-08-30 8:32 ` [Bug c/102125] " jankowski938 at gmail dot com
2021-08-30 11:40 ` [Bug target/102125] " rguenth at gcc dot gnu.org
@ 2021-08-30 19:29 ` pinskia at gcc dot gnu.org
2021-08-30 20:14 ` jankowski938 at gmail dot com
` (7 subsequent siblings)
10 siblings, 0 replies; 12+ messages in thread
From: pinskia at gcc dot gnu.org @ 2021-08-30 19:29 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102125
Andrew Pinski <pinskia at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
See Also| |https://gcc.gnu.org/bugzill
| |a/show_bug.cgi?id=91674
--- Comment #3 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
I suspect PR 91674 is the same.
^ permalink raw reply [flat|nested] 12+ messages in thread
* [Bug target/102125] (ARM Cortex-M3 and newer) missed optimization. memcpy not needed operations
2021-08-30 8:27 [Bug c/102125] New: (ARM Cortex-M3 and newer) missed optimization. memcpy not needed operations jankowski938 at gmail dot com
` (2 preceding siblings ...)
2021-08-30 19:29 ` pinskia at gcc dot gnu.org
@ 2021-08-30 20:14 ` jankowski938 at gmail dot com
2021-08-31 11:54 ` rearnsha at gcc dot gnu.org
` (6 subsequent siblings)
10 siblings, 0 replies; 12+ messages in thread
From: jankowski938 at gmail dot com @ 2021-08-30 20:14 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102125
--- Comment #4 from Piotr <jankowski938 at gmail dot com> ---
mov r3, r0
ldr r0, [r0] @ unaligned
ldr r1, [r3, #4] @ unaligned
bx lr
can be optimized even more
ldr r1, [r0, #4] @ unaligned
ldr r0, [r0] @ unaligned
bx lr
^ permalink raw reply [flat|nested] 12+ messages in thread
* [Bug target/102125] (ARM Cortex-M3 and newer) missed optimization. memcpy not needed operations
2021-08-30 8:27 [Bug c/102125] New: (ARM Cortex-M3 and newer) missed optimization. memcpy not needed operations jankowski938 at gmail dot com
` (3 preceding siblings ...)
2021-08-30 20:14 ` jankowski938 at gmail dot com
@ 2021-08-31 11:54 ` rearnsha at gcc dot gnu.org
2021-08-31 16:42 ` rearnsha at gcc dot gnu.org
` (5 subsequent siblings)
10 siblings, 0 replies; 12+ messages in thread
From: rearnsha at gcc dot gnu.org @ 2021-08-31 11:54 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102125
--- Comment #5 from Richard Earnshaw <rearnsha at gcc dot gnu.org> ---
Testcase was not quite complete. Extending it to:
typedef unsigned long long uint64_t;
typedef unsigned long uint32_t;
typedef unsigned char uint8_t;
uint64_t bar64(const uint8_t *rData1)
{
uint64_t buffer;
__builtin_memcpy(&buffer, rData1, sizeof(buffer));
return buffer;
}
uint32_t bar32(const uint8_t *rData1)
{
uint32_t buffer;
__builtin_memcpy(&buffer, rData1, sizeof(buffer));
return buffer;
}
and then looking at the optimized tree output we see:
;; Function bar64 (bar64, funcdef_no=0, decl_uid=4196, cgraph_uid=1,
symbol_order=0)
uint64_t bar64 (const uint8_t * rData1)
{
uint64_t buffer;
uint64_t _4;
<bb 2> [local count: 1073741824]:
__builtin_memcpy (&buffer, rData1_2(D), 8);
_4 = buffer;
buffer ={v} {CLOBBER};
return _4;
}
;; Function bar32 (bar32, funcdef_no=1, decl_uid=4200, cgraph_uid=2,
symbol_order=1)
uint32_t bar32 (const uint8_t * rData1)
{
unsigned int _3;
<bb 2> [local count: 1073741824]:
_3 = MEM <unsigned int> [(char * {ref-all})rData1_2(D)];
return _3;
}
So in the 32-bit case we've eliminated the memcpy at the tree level, but failed
to do that for 64-bit objects.
We probably need to add 64-bit support to the movmisalign<mode> pattern.
^ permalink raw reply [flat|nested] 12+ messages in thread
* [Bug target/102125] (ARM Cortex-M3 and newer) missed optimization. memcpy not needed operations
2021-08-30 8:27 [Bug c/102125] New: (ARM Cortex-M3 and newer) missed optimization. memcpy not needed operations jankowski938 at gmail dot com
` (4 preceding siblings ...)
2021-08-31 11:54 ` rearnsha at gcc dot gnu.org
@ 2021-08-31 16:42 ` rearnsha at gcc dot gnu.org
2021-09-13 10:27 ` cvs-commit at gcc dot gnu.org
` (4 subsequent siblings)
10 siblings, 0 replies; 12+ messages in thread
From: rearnsha at gcc dot gnu.org @ 2021-08-31 16:42 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102125
--- Comment #6 from Richard Earnshaw <rearnsha at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #2)
> One common source of missed optimizations is gimple_fold_builtin_memory_op
> which has [...]
Yes, this is the source of the problem. I wonder if this should be scaled by
something like MOVE_RATIO to get a more acceptable limit, especially at higher
optimization levels.
^ permalink raw reply [flat|nested] 12+ messages in thread
* [Bug target/102125] (ARM Cortex-M3 and newer) missed optimization. memcpy not needed operations
2021-08-30 8:27 [Bug c/102125] New: (ARM Cortex-M3 and newer) missed optimization. memcpy not needed operations jankowski938 at gmail dot com
` (5 preceding siblings ...)
2021-08-31 16:42 ` rearnsha at gcc dot gnu.org
@ 2021-09-13 10:27 ` cvs-commit at gcc dot gnu.org
2021-09-13 10:27 ` cvs-commit at gcc dot gnu.org
` (3 subsequent siblings)
10 siblings, 0 replies; 12+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2021-09-13 10:27 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102125
--- Comment #7 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Richard Earnshaw <rearnsha@gcc.gnu.org>:
https://gcc.gnu.org/g:408e8b906632f215f6652b8851bba612cde07c25
commit r12-3480-g408e8b906632f215f6652b8851bba612cde07c25
Author: Richard Earnshaw <rearnsha@arm.com>
Date: Thu Sep 9 10:56:01 2021 +0100
rtl: directly handle MEM in gen_highpart [PR102125]
gen_lowpart_general handles forming a lowpart of a MEM by using
adjust_address to rework and validate a new version of the MEM.
Do the same for gen_highpart rather than calling simplify_gen_subreg
for this case.
gcc/ChangeLog:
PR target/102125
* emit-rtl.c (gen_highpart): Use adjust_address to handle
MEM rather than calling simplify_gen_subreg.
^ permalink raw reply [flat|nested] 12+ messages in thread
* [Bug target/102125] (ARM Cortex-M3 and newer) missed optimization. memcpy not needed operations
2021-08-30 8:27 [Bug c/102125] New: (ARM Cortex-M3 and newer) missed optimization. memcpy not needed operations jankowski938 at gmail dot com
` (6 preceding siblings ...)
2021-09-13 10:27 ` cvs-commit at gcc dot gnu.org
@ 2021-09-13 10:27 ` cvs-commit at gcc dot gnu.org
2021-09-13 10:27 ` cvs-commit at gcc dot gnu.org
` (2 subsequent siblings)
10 siblings, 0 replies; 12+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2021-09-13 10:27 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102125
--- Comment #8 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Richard Earnshaw <rearnsha@gcc.gnu.org>:
https://gcc.gnu.org/g:f0cfd070b68772eaaa19a3b711fbd9e85b244240
commit r12-3481-gf0cfd070b68772eaaa19a3b711fbd9e85b244240
Author: Richard Earnshaw <rearnsha@arm.com>
Date: Fri Sep 3 16:53:13 2021 +0100
arm: expand handling of movmisalign for DImode [PR102125]
DImode is currently handled only for machines with vector modes
enabled, but this is unduly restrictive and is generally better done
in core registers.
gcc/ChangeLog:
PR target/102125
* config/arm/arm.md (movmisaligndi): New define_expand.
* config/arm/vec-common.md (movmisalign<mode>): Iterate over VDQ
mode.
^ permalink raw reply [flat|nested] 12+ messages in thread
* [Bug target/102125] (ARM Cortex-M3 and newer) missed optimization. memcpy not needed operations
2021-08-30 8:27 [Bug c/102125] New: (ARM Cortex-M3 and newer) missed optimization. memcpy not needed operations jankowski938 at gmail dot com
` (7 preceding siblings ...)
2021-09-13 10:27 ` cvs-commit at gcc dot gnu.org
@ 2021-09-13 10:27 ` cvs-commit at gcc dot gnu.org
2021-09-13 10:29 ` rearnsha at gcc dot gnu.org
2022-03-23 14:57 ` cvs-commit at gcc dot gnu.org
10 siblings, 0 replies; 12+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2021-09-13 10:27 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102125
--- Comment #9 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Richard Earnshaw <rearnsha@gcc.gnu.org>:
https://gcc.gnu.org/g:5f6a6c91d7c592cb49f7c519f289777eac09bb74
commit r12-3482-g5f6a6c91d7c592cb49f7c519f289777eac09bb74
Author: Richard Earnshaw <rearnsha@arm.com>
Date: Fri Sep 3 17:06:15 2021 +0100
gimple: allow more folding of memcpy [PR102125]
The current restriction on folding memcpy to a single element of size
MOVE_MAX is excessively cautious on most machines and limits some
significant further optimizations. So relax the restriction provided
the copy size does not exceed MOVE_MAX * MOVE_RATIO and that a SET
insn exists for moving the value into machine registers.
Note that there were already checks in place for having misaligned
move operations when one or more of the operands were unaligned.
On Arm this now permits optimizing
uint64_t bar64(const uint8_t *rData1)
{
uint64_t buffer;
memcpy(&buffer, rData1, sizeof(buffer));
return buffer;
}
from
ldr r2, [r0] @ unaligned
sub sp, sp, #8
ldr r3, [r0, #4] @ unaligned
strd r2, [sp]
ldrd r0, [sp]
add sp, sp, #8
to
mov r3, r0
ldr r0, [r0] @ unaligned
ldr r1, [r3, #4] @ unaligned
PR target/102125 - (ARM Cortex-M3 and newer) missed optimization. memcpy
not needed operations
gcc/ChangeLog:
PR target/102125
* gimple-fold.c (gimple_fold_builtin_memory_op): Allow folding
memcpy if the size is not more than MOVE_MAX * MOVE_RATIO.
^ permalink raw reply [flat|nested] 12+ messages in thread
* [Bug target/102125] (ARM Cortex-M3 and newer) missed optimization. memcpy not needed operations
2021-08-30 8:27 [Bug c/102125] New: (ARM Cortex-M3 and newer) missed optimization. memcpy not needed operations jankowski938 at gmail dot com
` (8 preceding siblings ...)
2021-09-13 10:27 ` cvs-commit at gcc dot gnu.org
@ 2021-09-13 10:29 ` rearnsha at gcc dot gnu.org
2022-03-23 14:57 ` cvs-commit at gcc dot gnu.org
10 siblings, 0 replies; 12+ messages in thread
From: rearnsha at gcc dot gnu.org @ 2021-09-13 10:29 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102125
Richard Earnshaw <rearnsha at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Resolution|--- |FIXED
Status|NEW |RESOLVED
--- Comment #10 from Richard Earnshaw <rearnsha at gcc dot gnu.org> ---
Fixed on master branch.
^ permalink raw reply [flat|nested] 12+ messages in thread
* [Bug target/102125] (ARM Cortex-M3 and newer) missed optimization. memcpy not needed operations
2021-08-30 8:27 [Bug c/102125] New: (ARM Cortex-M3 and newer) missed optimization. memcpy not needed operations jankowski938 at gmail dot com
` (9 preceding siblings ...)
2021-09-13 10:29 ` rearnsha at gcc dot gnu.org
@ 2022-03-23 14:57 ` cvs-commit at gcc dot gnu.org
10 siblings, 0 replies; 12+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2022-03-23 14:57 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102125
--- Comment #11 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Richard Biener <rguenth@gcc.gnu.org>:
https://gcc.gnu.org/g:d9792f8d227cdd409c2b082ef0685b47ccfaa334
commit r12-7786-gd9792f8d227cdd409c2b082ef0685b47ccfaa334
Author: Richard Biener <rguenther@suse.de>
Date: Wed Mar 23 14:53:49 2022 +0100
target/102125 - alternative memcpy folding improvement
The following extends the heuristical memcpy folding path with the
ability to use misaligned accesses on strict-alignment targets just
like the size-based path does. That avoids regressing the following
testcase on arm
uint64_t bar64(const uint8_t *rData1)
{
uint64_t buffer;
memcpy(&buffer, rData1, sizeof(buffer));
return buffer;
}
when r12-3482-g5f6a6c91d7c592 is reverted.
2022-03-23 Richard Biener <rguenther@suse.de>
PR target/102125
* gimple-fold.cc (gimple_fold_builtin_memory_op): Allow the
use of movmisalign when either the source or destination
decl is properly aligned.
^ permalink raw reply [flat|nested] 12+ messages in thread
end of thread, other threads:[~2022-03-23 14:57 UTC | newest]
Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-08-30 8:27 [Bug c/102125] New: (ARM Cortex-M3 and newer) missed optimization. memcpy not needed operations jankowski938 at gmail dot com
2021-08-30 8:32 ` [Bug c/102125] " jankowski938 at gmail dot com
2021-08-30 11:40 ` [Bug target/102125] " rguenth at gcc dot gnu.org
2021-08-30 19:29 ` pinskia at gcc dot gnu.org
2021-08-30 20:14 ` jankowski938 at gmail dot com
2021-08-31 11:54 ` rearnsha at gcc dot gnu.org
2021-08-31 16:42 ` rearnsha at gcc dot gnu.org
2021-09-13 10:27 ` cvs-commit at gcc dot gnu.org
2021-09-13 10:27 ` cvs-commit at gcc dot gnu.org
2021-09-13 10:27 ` cvs-commit at gcc dot gnu.org
2021-09-13 10:29 ` rearnsha at gcc dot gnu.org
2022-03-23 14:57 ` cvs-commit at gcc dot gnu.org
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).