Subject: Re: Fast multipliers on ARM
To: Michael Robins, gcc-help@gcc.gnu.org
From: Richard Earnshaw
Date: Tue, 6 Apr 2021 12:23:23 +0100
List-Id: Gcc-help mailing list

On 02/04/2021 23:27, Michael Robins via Gcc-help wrote:
> I am cross-compiling using "arm-none-eabi-gcc -mcpu=cortex-m0plus -O3"
> for a target architecture that performs a multiply in a single cycle,
> using gcc version 10.2.0 on a PC running Fedora Linux.
>
> Is there an option to persuade the compiler to use the multiply
> instruction automatically instead of shifts and adds when multiplying by
> a constant?
>
> In the example code below, gcc uses the trick of multiplying by a big
> number instead of dividing by a small one (12 in this case). For my
> target, the code from "-O3" is both longer and slower than that for "-Os".
>
> foobar.c:
>
> typedef struct {int x[3];} threeInts;
> int foo(threeInts * p, threeInts * q)
> {
>     return p - q;
> }
> #pragma GCC push_options
> #pragma GCC optimize("-Os")
> int bar(threeInts * p, threeInts * q)
> {
>     return p - q;
> }
> #pragma GCC pop_options
>
>
> foobar.s:
>
>     .cpu cortex-m0plus
>     .eabi_attribute 20, 1
>     .eabi_attribute 21, 1
>     .eabi_attribute 23, 3
>     .eabi_attribute 24, 1
>     .eabi_attribute 25, 1
>     .eabi_attribute 26, 1
>     .eabi_attribute 30, 2
>     .eabi_attribute 34, 0
>     .eabi_attribute 18, 4
>     .file    "foobar.c"
>     .text
>     .align    1
>     .p2align 2,,3
>     .global    foo
>     .arch armv6s-m
>     .syntax unified
>     .code    16
>     .thumb_func
>     .fpu softvfp
>     .type    foo, %function
> foo:
>     @ args = 0, pretend = 0, frame = 0
>     @ frame_needed = 0, uses_anonymous_args = 0
>     @ link register save eliminated.
>     subs    r1, r0, r1
>     asrs    r1, r1, #2
>     lsls    r3, r1, #2
>     adds    r3, r3, r1
>     lsls    r0, r3, #4
>     adds    r3, r3, r0
>     lsls    r0, r3, #8
>     adds    r3, r3, r0
>     lsls    r0, r3, #16
>     adds    r0, r3, r0
>     lsls    r0, r0, #1
>     adds    r0, r0, r1
>     @ sp needed
>     bx    lr
>     .size    foo, .-foo
>     .align    1
>     .global    bar
>     .syntax unified
>     .code    16
>     .thumb_func
>     .fpu softvfp
>     .type    bar, %function
> bar:
>     @ args = 0, pretend = 0, frame = 0
>     @ frame_needed = 0, uses_anonymous_args = 0
>     @ link register save eliminated.
>     subs    r0, r0, r1
>     ldr    r1, .L4
>     asrs    r0, r0, #2
>     muls    r0, r1
>     @ sp needed
>     bx    lr
> .L5:
>     .align    2
> .L4:
>     .word    -1431655765
>     .size    bar, .-bar
>     .ident    "GCC: (GNU) 10.2.0"
>
>
>
> Kind regards
>
> Mike Robins
>

Could you raise a report in GCC bugzilla, with your testcase attached,
please?

R.
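
For anyone reading the quoted assembly later, here is a rough C sketch of the arithmetic both sequences appear to perform. The pointer difference is a byte count that is an exact multiple of 12, so it can be divided by shifting right by 2 and then multiplying by the inverse of 3 modulo 2^32, which is 0xAAAAAAAB (the -1431655765 in the .word for bar). The foo sequence seems to build that same multiplication out of shifts and adds. The names INV3, div12_mul, div12_shift_add and check_div12 are made up for illustration; the sketch also assumes GCC's behaviour for right-shifting negative values and for converting out-of-range unsigned values to int32_t.

```c
#include <assert.h>
#include <stdint.h>

/* Multiplicative inverse of 3 modulo 2^32: 3 * 0xAAAAAAAB == 1 (mod 2^32).
   Read as a signed 32-bit value this is -1431655765. */
#define INV3 0xAAAAAAABu

/* Exact division of a byte difference by 12, in the style of bar():
   arithmetic shift right by 2, then a single multiply by INV3.
   Only meaningful when byte_diff is a multiple of 12. */
int32_t div12_mul(int32_t byte_diff)
{
    int32_t x = byte_diff >> 2;              /* assumes an arithmetic shift */
    return (int32_t)((uint32_t)x * INV3);    /* wraps modulo 2^32 */
}

/* The same multiply expanded into shifts and adds, in the style of foo(). */
int32_t div12_shift_add(int32_t byte_diff)
{
    uint32_t x = (uint32_t)(byte_diff >> 2);
    uint32_t r = (x << 2) + x;   /* 0x5 * x */
    r += r << 4;                 /* 0x55 * x */
    r += r << 8;                 /* 0x5555 * x */
    r += r << 16;                /* 0x55555555 * x */
    r = (r << 1) + x;            /* 0xAAAAAAAB * x */
    return (int32_t)r;
}

/* Hypothetical self-check: both forms return k for a byte difference
   of 12 * k, for every k in [-n, n]. */
void check_div12(int32_t n)
{
    for (int32_t k = -n; k <= n; k++) {
        assert(div12_mul(12 * k) == k);
        assert(div12_shift_add(12 * k) == k);
    }
}
```

Calling check_div12(1000) from a test program exercises both versions over a small range. On a core with a single-cycle multiplier, the first form is one load plus one multiply, while the second is the ten-instruction shift/add chain.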