public inbox for gcc-bugs@sourceware.org
* [Bug c/99937] New: Optimization needed for ARM with single cycle multiplier
From: mike.robins at talktalk dot net @ 2021-04-06 14:02 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99937

            Bug ID: 99937
           Summary: Optimization needed for ARM with single cycle
                    multiplier
           Product: gcc
           Version: 10.2.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c
          Assignee: unassigned at gcc dot gnu.org
          Reporter: mike.robins at talktalk dot net
  Target Milestone: ---

Created attachment 50512
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=50512&action=edit
Source file(s)

I am cross-compiling using "arm-none-eabi-gcc -mcpu=cortex-m0plus -Wall -Wextra
-fno-strict-aliasing -fwrapv -O3 -S foobar.c" for a target architecture that
performs a multiply in a single cycle, using gcc version 10.2.0 on a PC running
Fedora Linux.

Is there an option to persuade the compiler to use the multiply instruction
automatically instead of shifts and adds when multiplying by a constant?
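Editorial sketch, for context on the question above: a multiplication by a constant can be rewritten as a sum of shifted copies of the operand, one term per set bit of the constant. The C below uses 12 (the constant in this report) purely for illustration. The helper names are hypothetical, and this is not the exact sequence GCC emits for the attachment, which is not reproduced here.

```c
#include <stdint.h>

/* Hypothetical helper: plain multiply. On a Cortex-M0+ this maps onto
   loading the constant into a register followed by a single MULS. */
uint32_t mul12_mul(uint32_t x)
{
    return x * 12u;
}

/* Hypothetical helper: shift-and-add form.
   12 = 0b1100 = 8 + 4, so x * 12 = (x << 3) + (x << 2).
   Unsigned arithmetic wraps modulo 2^32 in both forms, so the two
   helpers agree for every 32-bit input. */
uint32_t mul12_shift_add(uint32_t x)
{
    return (x << 3) + (x << 2);
}
```

With a single-cycle MULS, the first form costs roughly a constant load plus one multiply, against two shifts and an add for the second; with a slow multiplier the balance tips the other way.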

In the example code attached, gcc uses the trick of multiplying by a large
constant (a fixed-point reciprocal) instead of dividing by a small one (12 in
this case). For my target, the code from "-O3" is both longer and slower than
that from "-Os".
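A sketch of the reciprocal trick, assuming the attached code divides an unsigned 32-bit value by 12 (the attachment is not reproduced here, and all helper names are hypothetical). The constant 0xAAAAAAAB equals (2^33 + 1) / 3, so (x * 0xAAAAAAAB) >> 33 is x / 3, and a further shift by 2 gives x / 12 exactly for every 32-bit x. Because the Cortex-M0+ MULS instruction returns only the low 32 bits of a product, the second version rebuilds the high word from four 16x16 partial products. It illustrates how such a sequence grows on a core without a widening multiply; it is not a transcript of GCC's output.

```c
#include <stdint.h>

/* Reference result (hypothetical helper). */
uint32_t div12_ref(uint32_t x)
{
    return x / 12u;
}

/* Reciprocal method using a 64-bit product:
   x / 12 == (x * 0xAAAAAAAB) >> 35 for every 32-bit unsigned x. */
uint32_t div12_recip64(uint32_t x)
{
    return (uint32_t)(((uint64_t)x * 0xAAAAAAABu) >> 35);
}

/* High 32 bits of a 32x32-bit product using only 32x32->32 multiplies,
   as needed on a core whose multiply returns just the low word. */
uint32_t umulhi32(uint32_t a, uint32_t b)
{
    uint32_t al = a & 0xFFFFu, ah = a >> 16;
    uint32_t bl = b & 0xFFFFu, bh = b >> 16;

    uint32_t ll = al * bl;   /* contributes to bits  0..31 */
    uint32_t lh = al * bh;   /* contributes to bits 16..47 */
    uint32_t hl = ah * bl;   /* contributes to bits 16..47 */
    uint32_t hh = ah * bh;   /* contributes to bits 32..63 */

    /* Middle column: at most about 2^18, so it cannot overflow. */
    uint32_t mid = (ll >> 16) + (lh & 0xFFFFu) + (hl & 0xFFFFu);

    return hh + (lh >> 16) + (hl >> 16) + (mid >> 16);
}

/* Reciprocal method using only 32-bit multiplies: the >> 35 becomes
   "take the high word" (>> 32) followed by >> 3. */
uint32_t div12_recip32(uint32_t x)
{
    return umulhi32(x, 0xAAAAAAABu) >> 3;
}
```

Both reciprocal versions agree with div12_ref for all 32-bit inputs; the 32-bit-only version trades the divide for four multiplies plus a handful of shifts, masks and adds.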

* [Bug target/99937] Optimization needed for ARM with single cycle multiplier
From: rguenth at gcc dot gnu.org @ 2021-04-06 14:20 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99937

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
          Component|c                           |target
           Keywords|                            |missed-optimization
             Target|                            |arm

--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
You need to adjust RTX costing accordingly, which likely means adding a new
subtarget tuning.
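Editorial sketch of what "RTX costing" decides here: the compiler compares an estimated cost for a MUL against an estimated cost for an equivalent shift/add sequence, so a core with a single-cycle multiplier needs cost figures that say so. The C below is a toy model of that comparison, not GCC's actual ARM cost tables. The function names, the cost of 1 per shift or add, and the assumption that a constant above 255 costs two instructions to load are all hypothetical. The 32-cycle figure corresponds to the small iterative multiplier option of the Cortex-M0+; treat all numbers as illustrative.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy model (hypothetical): instructions needed to form x * c with the
   naive binary method, one shifted term per set bit of c. */
unsigned shift_add_cost(uint32_t c)
{
    unsigned bits = 0;
    for (uint32_t v = c; v != 0; v &= v - 1)
        bits++;                                  /* count set bits in c */

    if (bits == 0)
        return 1;                                /* x * 0: just load zero */

    unsigned shifts = (c & 1u) ? bits - 1 : bits; /* bit 0 needs no shift */
    unsigned adds = bits - 1;                    /* combine shifted terms */
    return shifts + adds;
}

/* Toy model (hypothetical): load the constant, then one MUL that takes
   mul_latency cycles. Assumes an 8-bit immediate fits a single MOVS and
   anything larger needs one extra instruction. */
unsigned mul_cost(uint32_t c, unsigned mul_latency)
{
    unsigned load = (c <= 255u) ? 1u : 2u;
    return load + mul_latency;
}

/* Prefer the multiply instruction only when the model says it is cheaper. */
bool prefer_mul(uint32_t c, unsigned mul_latency)
{
    return mul_cost(c, mul_latency) < shift_add_cost(c);
}
```

With a 1-cycle multiply this model prefers MUL both for 12 and for the 0xAAAAAAAB reciprocal; with a 32-cycle multiply it prefers shifts and adds for both. GCC's own search for shift/add sequences is smarter than the naive binary method used here, so real crossover points differ.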

* [Bug target/99937] Optimization needed for ARM with single cycle multiplier
From: mike.robins at talktalk dot net @ 2021-04-06 14:41 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99937

--- Comment #2 from mike.robins at talktalk dot net ---
(In reply to Richard Biener from comment #1)
> You need to adjust RTX costing accordingly which likely means adding a new
> subtarget tuning.

Hi Richard
Are you saying that this would have to be added at the GCC source level
somehow, i.e. that there is no existing -mtune... or -f... option to achieve this?
Mike

* [Bug target/99937] Optimization needed for ARM with single cycle multiplier
From: rguenth at gcc dot gnu.org @ 2021-04-07  7:11 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99937

--- Comment #3 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to mike.robins from comment #2)
> (In reply to Richard Biener from comment #1)
> > You need to adjust RTX costing accordingly which likely means adding a new
> > subtarget tuning.
> 
> Hi Richard
> Are you saying that this would have to be added at the GCC source level
> somehow. I.e that there is no existing -mtune... or -f... to achieve this?
> Mike

Generally yes.  I don't know the arm backend well enough to tell whether there
exists an ARM variant with the multiplier behaving in this way (I suppose an
in-order, non-pipelined m0 core might behave this way ...)

* [Bug target/99937] Optimization needed for ARM with single cycle multiplier
From: mike.robins at talktalk dot net @ 2021-04-07 16:16 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99937

--- Comment #4 from mike.robins at talktalk dot net ---
(In reply to Richard Biener from comment #3)
> (In reply to mike.robins from comment #2)
> > (In reply to Richard Biener from comment #1)
> > > You need to adjust RTX costing accordingly which likely means adding a new
> > > subtarget tuning.
> > 
> > Hi Richard
> > Are you saying that this would have to be added at the GCC source level
> > somehow. I.e that there is no existing -mtune... or -f... to achieve this?
> > Mike
> 
> Generally yes.  I don't know the arm backend enough to tell whether there
> exists an ARM variant with the multiplier behaving in this way (I suppose an
> in-order, non-pipelined m0 core might behave this way ...)

It appears that other compilers default to the fast multiplier implementation,
using a "small-multiply" option to tune for the smaller-silicon, slower version.
See
https://community.nxp.com/t5/LPCXpresso-IDE-FAQs/Use-of-Cortex-M0-M0-multiply-instructions-on-LPC43xx-and/m-p/461571
and the -mtune section in https://gcc.gnu.org/onlinedocs/gcc/ARM-Options.html.
Is it possible that GCC by default assumes the small/slow multiplier, whereas it
should assume the large/fast one unless the small/slow version is specified?

* [Bug target/99937] Optimization needed for ARM with single cycle multiplier
From: clyon at gcc dot gnu.org @ 2021-04-07 16:23 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99937

Christophe Lyon <clyon at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |clyon at gcc dot gnu.org

--- Comment #5 from Christophe Lyon <clyon at gcc dot gnu.org> ---
Note that gcc also accepts another -mcpu value: cortex-m0plus.small-multiply
