From: "mike.robins at talktalk dot net"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug target/99937] Optimization needed for ARM with single cycle multiplier
Date: Wed, 07 Apr 2021 16:16:07 +0000
X-Bugzilla-Product: gcc
X-Bugzilla-Component: target
X-Bugzilla-Version: 10.2.0
X-Bugzilla-Keywords: missed-optimization
X-Bugzilla-Status: UNCONFIRMED

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99937

--- Comment #4 from mike.robins at talktalk dot net ---
(In reply to Richard Biener from comment #3)
> (In reply to mike.robins from comment #2)
> > (In reply to Richard Biener from comment #1)
> > > You need to adjust RTX costing accordingly which likely means adding a new
> > > subtarget tuning.
> >
> > Hi Richard
> > Are you saying that this would have to be added at the GCC source level
> > somehow. I.e that there is no existing -mtune... or -f... to achieve this?
> > Mike
>
> Generally yes.  I don't know the arm backend enough to tell whether there
> exists an ARM variant with the multiplier behaving in this way (I suppose an
> in-order, non-pipelined m0 core might behave this way ...)

It appears that other compilers default to the fast multiplier implementation,
and use a "small-multiply" option to tune for the smaller-silicon, slower
version.

See
https://community.nxp.com/t5/LPCXpresso-IDE-FAQs/Use-of-Cortex-M0-M0-multiply-instructions-on-LPC43xx-and/m-p/461571
and the -mtune section in
https://gcc.gnu.org/onlinedocs/gcc/ARM-Options.html.

Is it possible that GCC currently defaults to the small/slow multiply, whereas
it should default to the large/fast one unless the small/slow version is
specified?
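
For readers following the costing discussion, here is a minimal sketch of the trade-off in question. It is not the test case from this bug report (that is not quoted in this comment); the constant 10 and both function names are hypothetical, chosen only for illustration. The first function writes the operation as a plain multiply. The second spells out a shift-and-add sequence of the kind a compiler may prefer when its cost model treats the multiply instruction as slow.

```c
#include <stdint.h>

/* Hypothetical example: the constant 10 is an assumption made for
   illustration and does not come from the bug report. */

/* Plain multiply by a constant. On a core with a single-cycle
   multiplier, one multiply instruction would be the cheap choice. */
uint32_t mul_direct(uint32_t x)
{
    return x * 10u;
}

/* The same value built from shifts and an add:
   10 * x == (x << 3) + (x << 1), modulo 2^32.
   A compiler that costs multiply as expensive may pick a sequence
   like this instead of a multiply instruction. */
uint32_t mul_shift_add(uint32_t x)
{
    return (x << 3) + (x << 1);
}
```

Both functions return the same result for every 32-bit input, since unsigned arithmetic wraps modulo 2^32. One way to see which form the compiler picks is to build the file with something like `arm-none-eabi-gcc -O2 -mcpu=cortex-m0 -S` and inspect the assembly generated for `mul_direct`; the outcome depends on the GCC version and the tuning in effect, so no particular output is claimed here.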