public inbox for gcc-bugs@sourceware.org
* [Bug c/99937] New: Optimization needed for ARM with single cycle multiplier
From: mike.robins at talktalk dot net @ 2021-04-06 14:02 UTC
To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99937

            Bug ID: 99937
           Summary: Optimization needed for ARM with single cycle multiplier
           Product: gcc
           Version: 10.2.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c
          Assignee: unassigned at gcc dot gnu.org
          Reporter: mike.robins at talktalk dot net
  Target Milestone: ---

Created attachment 50512
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=50512&action=edit
Source file(s)

I am cross-compiling with gcc version 10.2.0 on a PC running Fedora Linux,
using
"arm-none-eabi-gcc -mcpu=cortex-m0plus -Wall -Wextra -fno-strict-aliasing
-fwrapv -O3 -S foobar.c"
for a target architecture that performs a multiply in a single cycle.

Is there an option to persuade the compiler to use the multiply instruction
automatically, instead of shifts and adds, when multiplying by a constant?

In the attached example code, gcc uses the trick of multiplying by a big
number instead of dividing by a small one (12 in this case). For my target,
the code from "-O3" is both longer and slower than that for "-Os".
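For readers unfamiliar with the "multiply by a big number instead of dividing" trick mentioned in the report, here is a minimal sketch in C. The attachment is not reproduced in this thread, so this is not the reporter's code: the function name is hypothetical, and the unsigned 32-bit case is an assumption chosen to keep the arithmetic simple. It relies on x / 12 == (x / 4) / 3 and on the constant 0xAAAAAAAB, which is ceil(2^33 / 3).

```c
#include <stdint.h>

/* Hypothetical illustration: unsigned division by 12 without a divide
 * instruction. Assumes 32-bit unsigned input.
 *
 *   x / 12 == (x >> 2) / 3
 *   n / 3  == (n * 0xAAAAAAAB) >> 33   for every 32-bit n,
 *
 * where the product is taken at 64-bit width. */
uint32_t div12_by_reciprocal(uint32_t x)
{
    uint64_t product = (uint64_t)(x >> 2) * 0xAAAAAAABull;
    return (uint32_t)(product >> 33);
}
```

How a compiler expands the 64-bit product on a particular core (one multiply instruction, several narrower multiplies, or a sequence of shifts and adds) depends on its cost model for that core, which is what the rest of this thread is about.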
* [Bug target/99937] Optimization needed for ARM with single cycle multiplier
From: rguenth at gcc dot gnu.org @ 2021-04-06 14:20 UTC
To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99937

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
          Component|c                           |target
           Keywords|                            |missed-optimization
             Target|                            |arm

--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
You need to adjust RTX costing accordingly which likely means adding a new
subtarget tuning.
* [Bug target/99937] Optimization needed for ARM with single cycle multiplier
From: mike.robins at talktalk dot net @ 2021-04-06 14:41 UTC
To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99937

--- Comment #2 from mike.robins at talktalk dot net ---
(In reply to Richard Biener from comment #1)
> You need to adjust RTX costing accordingly which likely means adding a new
> subtarget tuning.

Hi Richard,
Are you saying that this would have to be added at the GCC source level
somehow, i.e. that there is no existing -mtune... or -f... option to
achieve this?
Mike
* [Bug target/99937] Optimization needed for ARM with single cycle multiplier
From: rguenth at gcc dot gnu.org @ 2021-04-07 7:11 UTC
To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99937

--- Comment #3 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to mike.robins from comment #2)
> (In reply to Richard Biener from comment #1)
> > You need to adjust RTX costing accordingly which likely means adding a new
> > subtarget tuning.
>
> Hi Richard
> Are you saying that this would have to be added at the GCC source level
> somehow. I.e that there is no existing -mtune... or -f... to achieve this?
> Mike

Generally yes. I don't know the arm backend well enough to tell whether
there exists an ARM variant with the multiplier behaving in this way
(I suppose an in-order, non-pipelined m0 core might behave this way ...)
* [Bug target/99937] Optimization needed for ARM with single cycle multiplier
From: mike.robins at talktalk dot net @ 2021-04-07 16:16 UTC
To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99937

--- Comment #4 from mike.robins at talktalk dot net ---
(In reply to Richard Biener from comment #3)
> (In reply to mike.robins from comment #2)
> > (In reply to Richard Biener from comment #1)
> > > You need to adjust RTX costing accordingly which likely means adding a new
> > > subtarget tuning.
> >
> > Hi Richard
> > Are you saying that this would have to be added at the GCC source level
> > somehow. I.e that there is no existing -mtune... or -f... to achieve this?
> > Mike
>
> Generally yes. I don't know the arm backend enough to tell whether there
> exists an ARM variant with the multiplier behaving in this way (I suppose an
> in-order,
> non-pipelined m0 core might behave this way ...)

It appears that other compilers default to the fast multiplier
implementation and provide a "small-multiply" option to tune for the
version that uses less silicon and is slower. See
https://community.nxp.com/t5/LPCXpresso-IDE-FAQs/Use-of-Cortex-M0-M0-multiply-instructions-on-LPC43xx-and/m-p/461571
and the -mtune section in https://gcc.gnu.org/onlinedocs/gcc/ARM-Options.html.

Is it possible that GCC somehow defaults to assuming the small/slow
multiplier, whereas it should assume the large/fast one unless the
small/slow version is specified?
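To make the fast versus small multiplier trade-off concrete, here is a small sketch in C of the two ways a multiplication by a constant can be expressed. The constant 10 and both function names are hypothetical and do not come from the attachment. The two functions return the same value for every 32-bit input (unsigned arithmetic wraps modulo 2^32 in both); they differ only in which machine operations they suggest.

```c
#include <stdint.h>

/* One multiply: attractive when the core's multiplier takes a single cycle. */
uint32_t times10_mul(uint32_t x)
{
    return x * 10u;
}

/* Shifts and adds (10 = 8 + 2): attractive when the multiplier is the
 * small, iterative kind that needs many cycles per multiply. */
uint32_t times10_shift_add(uint32_t x)
{
    return (x << 3) + (x << 1);
}
```

An optimizing compiler is free to turn either form into the other, so writing the multiply explicitly in source does not by itself guarantee a multiply instruction; the choice is driven by the per-CPU cost tables discussed in comment #1.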
* [Bug target/99937] Optimization needed for ARM with single cycle multiplier
From: clyon at gcc dot gnu.org @ 2021-04-07 16:23 UTC
To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99937

Christophe Lyon <clyon at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |clyon at gcc dot gnu.org

--- Comment #5 from Christophe Lyon <clyon at gcc dot gnu.org> ---
Note that gcc also has another -mcpu: cortex-m0plus.small-multiply