public inbox for gcc-help@gcc.gnu.org
* Fast multipliers on ARM
@ 2021-04-02 22:27 Michael Robins
  2021-04-06 11:23 ` Richard Earnshaw
From: Michael Robins @ 2021-04-02 22:27 UTC
  To: gcc-help

I am cross-compiling using "arm-none-eabi-gcc -mcpu=cortex-m0plus -O3" 
for a target architecture that performs a multiply in a single cycle, 
using gcc version 10.2.0 on a PC running Fedora Linux.

Is there an option to persuade the compiler to use the multiply 
instruction automatically instead of shifts and adds when multiplying by 
a constant?

In the example code below, gcc replaces a division by a small constant
(12 in this case, the size of the struct) with a multiplication by a
large constant. For my target, the code from "-O3" is both longer and
slower than that from "-Os".

foobar.c:

typedef struct {int x[3];} threeInts;
int foo(threeInts * p, threeInts * q)
{
     return p - q;
}
#pragma GCC push_options
#pragma GCC optimize("-Os")
int bar(threeInts * p, threeInts * q)
{
     return p - q;
}
#pragma GCC pop_options
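
The same per-function override can also be written with GCC's "optimize" function attribute instead of the push_options/pop_options pragmas. This is only a sketch: the name bar_os is hypothetical, the attribute is GCC-specific, and the GCC manual describes it as intended for debugging rather than production use.

```c
typedef struct { int x[3]; } threeInts;

/* Assumption: GCC-specific attribute; other compilers may ignore it or warn.
   bar_os is a hypothetical name, equivalent to bar() in foobar.c. */
__attribute__((optimize("Os")))
int bar_os(const threeInts *p, const threeInts *q)
{
    /* Pointer difference in elements, i.e. byte difference divided by 12. */
    return (int)(p - q);
}
```

As with the pragma form, only this one function is compiled as if "-Os" were given; the rest of the file keeps the command-line optimization level.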


foobar.s:

     .cpu cortex-m0plus
     .eabi_attribute 20, 1
     .eabi_attribute 21, 1
     .eabi_attribute 23, 3
     .eabi_attribute 24, 1
     .eabi_attribute 25, 1
     .eabi_attribute 26, 1
     .eabi_attribute 30, 2
     .eabi_attribute 34, 0
     .eabi_attribute 18, 4
     .file    "foobar.c"
     .text
     .align    1
     .p2align 2,,3
     .global    foo
     .arch armv6s-m
     .syntax unified
     .code    16
     .thumb_func
     .fpu softvfp
     .type    foo, %function
foo:
     @ args = 0, pretend = 0, frame = 0
     @ frame_needed = 0, uses_anonymous_args = 0
     @ link register save eliminated.
     subs    r1, r0, r1
     asrs    r1, r1, #2
     lsls    r3, r1, #2
     adds    r3, r3, r1
     lsls    r0, r3, #4
     adds    r3, r3, r0
     lsls    r0, r3, #8
     adds    r3, r3, r0
     lsls    r0, r3, #16
     adds    r0, r3, r0
     lsls    r0, r0, #1
     adds    r0, r0, r1
     @ sp needed
     bx    lr
     .size    foo, .-foo
     .align    1
     .global    bar
     .syntax unified
     .code    16
     .thumb_func
     .fpu softvfp
     .type    bar, %function
bar:
     @ args = 0, pretend = 0, frame = 0
     @ frame_needed = 0, uses_anonymous_args = 0
     @ link register save eliminated.
     subs    r0, r0, r1
     ldr    r1, .L4
     asrs    r0, r0, #2
     muls    r0, r1
     @ sp needed
     bx    lr
.L5:
     .align    2
.L4:
     .word    -1431655765
     .size    bar, .-bar
     .ident    "GCC: (GNU) 10.2.0"
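
For reference, the arithmetic behind the two listings above can be sketched in portable C. Because the byte difference between two threeInts pointers into the same array is always a multiple of 12, dividing by 4 and then multiplying by 0xAAAAAAAB (the inverse of 3 modulo 2^32, printed as -1431655765 in the listing) gives the exact quotient. The names INV3, div12_mul, div12_shift_add and check_div12 are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Multiplicative inverse of 3 modulo 2^32: 3 * 0xAAAAAAAB == 0x200000001. */
#define INV3 UINT32_C(0xAAAAAAAB)
_Static_assert((uint32_t)(3u * INV3) == 1u, "INV3 must invert 3 mod 2^32");

/* What "bar" does: divide by 4, then a single multiply.
   Assumes byte_diff is a multiple of 12. The final conversion to int32_t
   relies on the wrap-around behaviour that GCC documents. */
static int32_t div12_mul(int32_t byte_diff)
{
    uint32_t x = (uint32_t)(byte_diff / 4);   /* exact, matches asrs #2 */
    return (int32_t)(x * INV3);               /* exact divide by 3 */
}

/* What "foo" does: the same multiply built from shifts and adds. */
static int32_t div12_shift_add(int32_t byte_diff)
{
    uint32_t x = (uint32_t)(byte_diff / 4);
    uint32_t t = (x << 2) + x;                /* 5 * x */
    t += t << 4;                              /* 0x55 * x */
    t += t << 8;                              /* 0x5555 * x */
    t += t << 16;                             /* 0x55555555 * x */
    return (int32_t)((t << 1) + x);           /* 0xAAAAAAAB * x */
}

/* Checks both forms against the expected element count. */
static void check_div12(int32_t elems)
{
    int32_t byte_diff = elems * 12;
    assert(div12_mul(byte_diff) == elems);
    assert(div12_shift_add(byte_diff) == elems);
}
```

Calling check_div12 with a few positive and negative element counts exercises both forms. They compute the same value; the difference is one muls plus a literal load versus ten shift and add instructions.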



Kind regards

Mike Robins



* Re: Fast multipliers on ARM
From: Richard Earnshaw @ 2021-04-06 11:23 UTC
  To: Michael Robins, gcc-help

On 02/04/2021 23:27, Michael Robins via Gcc-help wrote:
> I am cross-compiling using "arm-none-eabi-gcc -mcpu=cortex-m0plus -O3" 
> for a target architecture that performs a multiply in a single cycle, 
> using gcc version 10.2.0 on a PC running Fedora Linux.
> 
> Is there an option to persuade the compiler to use the multiply 
> instruction automatically instead of shifts and adds when multiplying by 
> a constant?
> 
> In the example code below, gcc replaces a division by a small constant
> (12 in this case, the size of the struct) with a multiplication by a
> large constant. For my target, the code from "-O3" is both longer and
> slower than that from "-Os".
> 
> foobar.c:
> 
> typedef struct {int x[3];} threeInts;
> int foo(threeInts * p, threeInts * q)
> {
>      return p - q;
> }
> #pragma GCC push_options
> #pragma GCC optimize("-Os")
> int bar(threeInts * p, threeInts * q)
> {
>      return p - q;
> }
> #pragma GCC pop_options
> 
> 
> foobar.s:
> 
>      .cpu cortex-m0plus
>      .eabi_attribute 20, 1
>      .eabi_attribute 21, 1
>      .eabi_attribute 23, 3
>      .eabi_attribute 24, 1
>      .eabi_attribute 25, 1
>      .eabi_attribute 26, 1
>      .eabi_attribute 30, 2
>      .eabi_attribute 34, 0
>      .eabi_attribute 18, 4
>      .file    "foobar.c"
>      .text
>      .align    1
>      .p2align 2,,3
>      .global    foo
>      .arch armv6s-m
>      .syntax unified
>      .code    16
>      .thumb_func
>      .fpu softvfp
>      .type    foo, %function
> foo:
>      @ args = 0, pretend = 0, frame = 0
>      @ frame_needed = 0, uses_anonymous_args = 0
>      @ link register save eliminated.
>      subs    r1, r0, r1
>      asrs    r1, r1, #2
>      lsls    r3, r1, #2
>      adds    r3, r3, r1
>      lsls    r0, r3, #4
>      adds    r3, r3, r0
>      lsls    r0, r3, #8
>      adds    r3, r3, r0
>      lsls    r0, r3, #16
>      adds    r0, r3, r0
>      lsls    r0, r0, #1
>      adds    r0, r0, r1
>      @ sp needed
>      bx    lr
>      .size    foo, .-foo
>      .align    1
>      .global    bar
>      .syntax unified
>      .code    16
>      .thumb_func
>      .fpu softvfp
>      .type    bar, %function
> bar:
>      @ args = 0, pretend = 0, frame = 0
>      @ frame_needed = 0, uses_anonymous_args = 0
>      @ link register save eliminated.
>      subs    r0, r0, r1
>      ldr    r1, .L4
>      asrs    r0, r0, #2
>      muls    r0, r1
>      @ sp needed
>      bx    lr
> .L5:
>      .align    2
> .L4:
>      .word    -1431655765
>      .size    bar, .-bar
>      .ident    "GCC: (GNU) 10.2.0"
> 
> 
> 
> Kind regards
> 
> Mike Robins
> 

Could you raise a report in GCC bugzilla, with your testcase attached, 
please?

R.
