From: "petr at nejedli dot cz"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug c/96305] New: Unnecessary signed x unsigned multiplication with squares or signed variables
Date: Fri, 24 Jul 2020 03:22:00 +0000

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96305

            Bug ID: 96305
           Summary: Unnecessary signed x unsigned multiplication with squares or signed variables
           Product: gcc
           Version: 7.2.1
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c
          Assignee: unassigned at gcc dot gnu.org
          Reporter: petr at nejedli dot cz
  Target Milestone: ---

When a signed variable is multiplied by itself, the compiler seems to
recognize that the result is necessarily non-negative, and from then on it
treats the result as unsigned. This leads to unnecessarily complicated code
further down the line.

I initially reproduced the issue with 7.2.1 for ARM, but I have verified that
the same issue occurs with the latest GCC version available on godbolt
(Compiler Explorer).

---
[nenik@Pix2 ~]$ arm-none-eabi-gcc --version
arm-none-eabi-gcc (GNU Tools for Arm Embedded Processors 7-2017-q4-major) 7.2.1 20170904 (release) [ARM/embedded-7-branch revision 255204]
Copyright (C) 2017 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

[nenik@Pix2 ~]$ cat mull-issue.c
inline int hmull(int a, int b) {
    return ((long long)a * b) >> 32;
}

int compute(int a, int b) {
    int t = hmull(a,a);
    return hmull(t, b);
}
[nenik@Pix2 ~]$ arm-none-eabi-gcc -Os -S -mcpu=cortex-m3 mull-issue.c
[nenik@Pix2 ~]$ cat mull-issue.s
	.cpu cortex-m3
	.eabi_attribute 20, 1
	.eabi_attribute 21, 1
	.eabi_attribute 23, 3
	.eabi_attribute 24, 1
	.eabi_attribute 25, 1
	.eabi_attribute 26, 1
	.eabi_attribute 30, 4
	.eabi_attribute 34, 1
	.eabi_attribute 18, 4
	.file "mull-issue.c"
	.text
	.align 1
	.global compute
	.syntax unified
	.thumb
	.thumb_func
	.fpu softvfp
	.type compute, %function
compute:
	@ args = 0, pretend = 0, frame = 0
	@ frame_needed = 0, uses_anonymous_args = 0
	smull r2, r3, r0, r0
	push {r4, r6, r7, lr}
	asrs r7, r1, #31
	mul r0, r3, r7
	asrs r4, r3, #31
	mla r0, r1, r4, r0
	umull r2, r3, r3, r1
	add r0, r0, r3
	pop {r4, r6, r7, pc}
	.size compute, .-compute
	.ident "GCC: (GNU Tools for Arm Embedded Processors 7-2017-q4-major) 7.2.1 20170904 (release) [ARM/embedded-7-branch revision 255204]"
---

https://godbolt.org/z/v186Yz

The expected code is essentially:

	smull r2, r3, r0, r0
	smull r2, r0, r3, r1
	bx lr

The reasoning is simple: after the first smull, r3 is at most 0x40000000 for
any argument (the largest possible square, INT_MIN * INT_MIN = 2^62, has a
high word of exactly 0x40000000). So while r3 is certainly non-negative, it
never has the highest bit set. Consequently r4, after the second asrs, is
always zero, and so is the product term of the following mla. That removes
the need to go through umull and then fix up the result.

I have gotten clang to generate optimal code in a more complicated piece of
software. I can also get gcc to generate two smulls (and smaller code overall)
if I add an unknown extra argument (or even a small constant) to the variable
t before the second hmull call. However, if I add a constant zero instead and
the compiler manages to deduce that, it goes back to the suboptimal code.