public inbox for gcc-bugs@sourceware.org
* [Bug c/99937] New: Optimization needed for ARM with single cycle multiplier
From: mike.robins at talktalk dot net @ 2021-04-06 14:02 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99937
Bug ID: 99937
Summary: Optimization needed for ARM with single cycle multiplier
Product: gcc
Version: 10.2.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c
Assignee: unassigned at gcc dot gnu.org
Reporter: mike.robins at talktalk dot net
Target Milestone: ---
Created attachment 50512
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=50512&action=edit
Source file(s)
I am cross-compiling using "arm-none-eabi-gcc -mcpu=cortex-m0plus -Wall -Wextra
-fno-strict-aliasing -fwrapv -O3 -S foobar.c" for a target architecture that
performs a multiply in a single cycle, using gcc version 10.2.0 on a PC running
Fedora Linux.
Is there an option to persuade the compiler to use the multiply instruction
automatically instead of shifts and adds when multiplying by a constant?
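To make the two strategies in the question concrete, here is a minimal sketch in C. It is not taken from the attached source; the function names mul_shift_add and mul_direct are hypothetical. The first spells out a naive shift-and-add expansion of a multiplication by a constant, the second leaves the product to the hardware multiplier. Both wrap modulo 2^32.

```c
#include <stdint.h>

/* Multiply x by the constant c using only shifts and adds:
   one shifted copy of x is added for every set bit of c.
   The result wraps modulo 2^32, like a 32-bit multiply. */
uint32_t mul_shift_add(uint32_t x, uint32_t c)
{
    uint32_t result = 0;
    for (unsigned bit = 0; bit < 32; ++bit) {
        if (c & (UINT32_C(1) << bit))
            result += x << bit;
    }
    return result;
}

/* The same product left to the multiply instruction. */
uint32_t mul_direct(uint32_t x, uint32_t c)
{
    return x * c;
}
```

For a constant with few set bits (for example 10 = 8 + 2) the expansion is short; for a constant with many set bits it grows accordingly, which is where a single-cycle multiplier would be expected to win.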
In the example code attached, gcc uses the trick of multiplying by a big number
instead of dividing by a small one (12 in this case). For my target, the code
from "-O3" is both longer and slower than that for "-Os".
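The "multiply by a big number" trick can be sketched as follows. This is an illustration under assumptions, not the attached code: it assumes an unsigned 32-bit dividend, and the function names are hypothetical. The constant 0xAAAAAAAB is ceil(2^33 / 3), so multiplying by it and shifting right by 33 divides by 3; two further bits of shift divide by 4, giving a division by 12.

```c
#include <stdint.h>

/* Unsigned division by 12 via reciprocal multiplication.
   0xAAAAAAAB == ceil(2^33 / 3); the shift by 35 is 33 (for /3)
   plus 2 (for /4). The 64-bit intermediate keeps the full product. */
uint32_t div12_reciprocal(uint32_t x)
{
    return (uint32_t)(((uint64_t)x * UINT32_C(0xAAAAAAAB)) >> 35);
}

/* Reference version: let the compiler pick the strategy. */
uint32_t div12_plain(uint32_t x)
{
    return x / 12u;
}
```

Note that the trick needs the upper half of a 64-bit product. On a core whose multiply instruction returns only the low 32 bits, that upper half has to be assembled from narrower operations, which may account for some of the extra code size.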
* [Bug target/99937] Optimization needed for ARM with single cycle multiplier
From: rguenth at gcc dot gnu.org @ 2021-04-06 14:20 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99937
Richard Biener <rguenth at gcc dot gnu.org> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
          Component|c                           |target
           Keywords|                            |missed-optimization
             Target|                            |arm
--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
You need to adjust RTX costing accordingly which likely means adding a new
subtarget tuning.
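The idea behind cost-driven selection can be sketched with a toy model in C. This is not GCC's RTX cost code; the struct tune_costs, its fields, and both functions are hypothetical, and the expansion it prices is the naive one-shift-and-add-per-set-bit form. It only shows why the per-target multiply cost decides which form is chosen.

```c
#include <stdint.h>

/* Hypothetical per-target tuning costs, in cycles. */
struct tune_costs {
    unsigned mul_cost;    /* one multiply instruction */
    unsigned shift_cost;  /* one shift instruction */
    unsigned add_cost;    /* one add instruction */
};

/* Count the set bits of v. */
static unsigned popcount32(uint32_t v)
{
    unsigned n = 0;
    while (v) {
        v &= v - 1;  /* clear the lowest set bit */
        ++n;
    }
    return n;
}

/* Cost of a naive shift-and-add expansion of x * c:
   one shift per set bit other than bit 0, and one add
   for every set bit after the first. */
unsigned shift_add_cost(const struct tune_costs *t, uint32_t c)
{
    unsigned bits = popcount32(c);
    if (bits == 0)
        return 0;
    unsigned shifts = bits - (c & 1u);
    unsigned adds = bits - 1;
    return shifts * t->shift_cost + adds * t->add_cost;
}

/* Nonzero when the multiply instruction is at least as cheap. */
int prefer_multiply(const struct tune_costs *t, uint32_t c)
{
    return t->mul_cost <= shift_add_cost(t, c);
}
```

With a multiply cost of 1 the multiply instruction wins for almost every constant; with a cost of 32 (an assumed figure for an iterative multiplier) the shift-and-add form wins for constants with few set bits. A tuning entry that describes the multiplier as slow would therefore push the compiler toward shifts and adds.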
* [Bug target/99937] Optimization needed for ARM with single cycle multiplier
From: mike.robins at talktalk dot net @ 2021-04-06 14:41 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99937
--- Comment #2 from mike.robins at talktalk dot net ---
(In reply to Richard Biener from comment #1)
> You need to adjust RTX costing accordingly which likely means adding a new
> subtarget tuning.
Hi Richard
Are you saying that this would have to be added at the GCC source level
somehow, i.e. that there is no existing -mtune... or -f... to achieve this?
Mike
* [Bug target/99937] Optimization needed for ARM with single cycle multiplier
From: rguenth at gcc dot gnu.org @ 2021-04-07 7:11 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99937
--- Comment #3 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to mike.robins from comment #2)
> Hi Richard
> Are you saying that this would have to be added at the GCC source level
> somehow, i.e. that there is no existing -mtune... or -f... to achieve this?
> Mike
Generally yes. I don't know the arm backend well enough to tell whether there
exists an ARM variant with the multiplier behaving in this way (I suppose an
in-order, non-pipelined m0 core might behave this way ...)
* [Bug target/99937] Optimization needed for ARM with single cycle multiplier
From: mike.robins at talktalk dot net @ 2021-04-07 16:16 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99937
--- Comment #4 from mike.robins at talktalk dot net ---
(In reply to Richard Biener from comment #3)
> Generally yes. I don't know the arm backend well enough to tell whether there
> exists an ARM variant with the multiplier behaving in this way (I suppose an
> in-order, non-pipelined m0 core might behave this way ...)
It appears that other compilers default to the fast multiplier implementation
and provide a "small-multiply" option to tune for the slower, smaller-silicon
version.
See
https://community.nxp.com/t5/LPCXpresso-IDE-FAQs/Use-of-Cortex-M0-M0-multiply-instructions-on-LPC43xx-and/m-p/461571
and the -mtune section in https://gcc.gnu.org/onlinedocs/gcc/ARM-Options.html.
Is it possible that the GCC default is somehow to use the small/slow multiply
whereas it should default to the large/fast one if the small/slow version isn't
specified?
* [Bug target/99937] Optimization needed for ARM with single cycle multiplier
From: clyon at gcc dot gnu.org @ 2021-04-07 16:23 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99937
Christophe Lyon <clyon at gcc dot gnu.org> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |clyon at gcc dot gnu.org
--- Comment #5 from Christophe Lyon <clyon at gcc dot gnu.org> ---
Note that GCC also accepts another -mcpu value: cortex-m0plus.small-multiply