public inbox for gcc-bugs@sourceware.org
* [Bug c/96305] New: Unnecessary signed x unsigned multiplication with squares or signed variables
@ 2020-07-24 3:22 petr at nejedli dot cz
2020-07-24 3:35 ` [Bug target/96305] " pinskia at gcc dot gnu.org
` (2 more replies)
0 siblings, 3 replies; 4+ messages in thread
From: petr at nejedli dot cz @ 2020-07-24 3:22 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96305
Bug ID: 96305
Summary: Unnecessary signed x unsigned multiplication with
squares or signed variables
Product: gcc
Version: 7.2.1
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c
Assignee: unassigned at gcc dot gnu.org
Reporter: petr at nejedli dot cz
Target Milestone: ---
When a signed variable is multiplied by itself, the compiler seems to
recognize that the result is necessarily non-negative and then treats the
result as unsigned from that point on, which leads to unnecessarily complicated
code further down.
I initially reproduced the issue on 7.2.1 for arm, and I have verified that the
same issue occurs with the latest GCC version available on godbolt.
---
[nenik@Pix2 ~]$ arm-none-eabi-gcc --version
arm-none-eabi-gcc (GNU Tools for Arm Embedded Processors 7-2017-q4-major) 7.2.1
20170904 (release) [ARM/embedded-7-branch revision 255204]
Copyright (C) 2017 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
[nenik@Pix2 ~]$ cat mull-issue.c
inline int hmull(int a, int b) {
    return ((long long)a * b) >> 32;
}
int compute(int a, int b) {
    int t = hmull(a,a);
    return hmull(t, b);
}
[nenik@Pix2 ~]$ arm-none-eabi-gcc -Os -S -mcpu=cortex-m3 mull-issue.c
[nenik@Pix2 ~]$ cat mull-issue.s
.cpu cortex-m3
.eabi_attribute 20, 1
.eabi_attribute 21, 1
.eabi_attribute 23, 3
.eabi_attribute 24, 1
.eabi_attribute 25, 1
.eabi_attribute 26, 1
.eabi_attribute 30, 4
.eabi_attribute 34, 1
.eabi_attribute 18, 4
.file "mull-issue.c"
.text
.align 1
.global compute
.syntax unified
.thumb
.thumb_func
.fpu softvfp
.type compute, %function
compute:
@ args = 0, pretend = 0, frame = 0
@ frame_needed = 0, uses_anonymous_args = 0
smull r2, r3, r0, r0
push {r4, r6, r7, lr}
asrs r7, r1, #31
mul r0, r3, r7
asrs r4, r3, #31
mla r0, r1, r4, r0
umull r2, r3, r3, r1
add r0, r0, r3
pop {r4, r6, r7, pc}
.size compute, .-compute
.ident "GCC: (GNU Tools for Arm Embedded Processors 7-2017-q4-major) 7.2.1 20170904 (release) [ARM/embedded-7-branch revision 255204]"
---
https://godbolt.org/z/v186Yz
Expected code should be pretty much:
smull r2, r3, r0, r0
smull r2, r0, r3, r1
bx lr
The reasoning is simple: after the first smull, r3 is at most 0x40000000 for
any argument, so it is never negative and never has its highest bit set. r4
after the second asrs is therefore always zero, and so is the product it
contributes to the following mla, which removes the need for the umull and for
the subsequent correction of the result.
I have got clang to generate optimal code in a more complicated piece of
software. I can also get gcc to generate two smulls (and smaller code overall)
if I add an unknown extra argument (or even a small constant) to the "t"
variable before the second hmull call. If I use a constant of zero and the
compiler manages to figure that out, however, it goes back to the suboptimal
code.
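The variant described above might look roughly like the following. It is a guess at the shape of the workaround, not the reporter's actual code: compute_offset and its parameter c are hypothetical, and whether a given GCC version emits two smulls for it has to be checked with -S.

```c
#include <assert.h>
#include <limits.h>

/* Same helper as in the report (assumes 32-bit int). */
int hmull(int a, int b) {
    return (int)(((long long)a * b) >> 32);
}

/* The function from the report. */
int compute(int a, int b) {
    int t = hmull(a, a);
    return hmull(t, b);
}

/* Hypothetical variant: c is a value the compiler cannot see through.
   Once c is added, t is no longer known to be non-negative.
   Note: the addition overflows (undefined behaviour) if c is large
   enough that the sum exceeds INT_MAX. */
int compute_offset(int a, int b, int c) {
    int t = hmull(a, a) + c;
    return hmull(t, b);
}

/* With c == 0 both functions must return the same value. */
void check_offset_zero(void) {
    assert(compute_offset(65536, 65536, 0) == compute(65536, 65536));
    assert(compute_offset(INT_MIN, 4, 0) == compute(INT_MIN, 4));
}
```

The two functions are equivalent for c == 0, so any difference in the generated code comes only from what the compiler knows about the range of t.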
* [Bug target/96305] Unnecessary signed x unsigned multiplication with squares or signed variables
2020-07-24 3:22 [Bug c/96305] New: Unnecessary signed x unsigned multiplication with squares or signed variables petr at nejedli dot cz
@ 2020-07-24 3:35 ` pinskia at gcc dot gnu.org
2020-07-24 8:46 ` [Bug target/96305] Unnecessary signed x unsigned multiplication with squares of " ktkachov at gcc dot gnu.org
2021-09-27 7:39 ` [Bug tree-optimization/96305] not detecting widen multiple after a widen multiply with shift pinskia at gcc dot gnu.org
2 siblings, 0 replies; 4+ messages in thread
From: pinskia at gcc dot gnu.org @ 2020-07-24 3:35 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96305
Andrew Pinski <pinskia at gcc dot gnu.org> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
           Severity|normal                      |enhancement
* [Bug target/96305] Unnecessary signed x unsigned multiplication with squares of signed variables
2020-07-24 3:22 [Bug c/96305] New: Unnecessary signed x unsigned multiplication with squares or signed variables petr at nejedli dot cz
2020-07-24 3:35 ` [Bug target/96305] " pinskia at gcc dot gnu.org
@ 2020-07-24 8:46 ` ktkachov at gcc dot gnu.org
2021-09-27 7:39 ` [Bug tree-optimization/96305] not detecting widen multiple after a widen multiply with shift pinskia at gcc dot gnu.org
2 siblings, 0 replies; 4+ messages in thread
From: ktkachov at gcc dot gnu.org @ 2020-07-24 8:46 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96305
ktkachov at gcc dot gnu.org changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |ktkachov at gcc dot gnu.org
--- Comment #1 from ktkachov at gcc dot gnu.org ---
This did get a bit better in GCC 10.1, which generates:
compute:
smull r0, r3, r0, r0
asrs r0, r1, #31
muls r0, r3, r0
asrs r2, r3, #31
mla r0, r1, r2, r0
umull r3, r1, r3, r1
add r0, r0, r1
bx lr
* [Bug tree-optimization/96305] not detecting widen multiple after a widen multiply with shift
2020-07-24 3:22 [Bug c/96305] New: Unnecessary signed x unsigned multiplication with squares or signed variables petr at nejedli dot cz
2020-07-24 3:35 ` [Bug target/96305] " pinskia at gcc dot gnu.org
2020-07-24 8:46 ` [Bug target/96305] Unnecessary signed x unsigned multiplication with squares of " ktkachov at gcc dot gnu.org
@ 2021-09-27 7:39 ` pinskia at gcc dot gnu.org
2 siblings, 0 replies; 4+ messages in thread
From: pinskia at gcc dot gnu.org @ 2021-09-27 7:39 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96305
Andrew Pinski <pinskia at gcc dot gnu.org> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|Unnecessary signed x        |not detecting widen
                   |unsigned multiplication     |multiple after a widen
                   |with squares of signed      |multiply with shift
                   |variables                   |
             Target|arm-*-*                     |arm-*-*, aarch64-*-*
             Status|UNCONFIRMED                 |NEW
     Ever confirmed|0                           |1
   Last reconfirmed|                            |2021-09-27
          Component|target                      |tree-optimization
--- Comment #2 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
This is really a gimple-level issue.
We are able to detect the first widening multiply with shift but not the
second one:
_10 = a_2(D) w* a_2(D);
_11 = _10 >> 32;
_3 = (long long int) b_4(D);
_6 = _3 * _11;
_7 = _6 >> 32;
_8 = (int) _7;
The issue is visible on aarch64 too.
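The plain 64-bit multiply `_6 = _3 * _11` in the dump above is what gets expanded into the mul/mla/umull sequence shown in the earlier comments. A rough C model of that expansion follows. It assumes GCC's modular unsigned-to-signed conversion and arithmetic right shift; hmull_generic, t_hi, b_hi, cross and check_generic_matches are names made up for the illustration, not GCC internals.

```c
#include <assert.h>
#include <stdint.h>

/* Reference: high word via one signed widening multiply (what smull gives). */
int32_t hmull(int32_t a, int32_t b) {
    return (int32_t)(((int64_t)a * b) >> 32);
}

/* Model of the generic 64x64 expansion: bits 32..63 of the product of the
   two sign-extended operands, built from 32-bit pieces. */
int32_t hmull_generic(int32_t t, int32_t b) {
    uint32_t t_lo = (uint32_t)t;
    uint32_t b_lo = (uint32_t)b;
    uint32_t t_hi = t < 0 ? 0xFFFFFFFFu : 0u;    /* asrs ..., #31 on t */
    uint32_t b_hi = b < 0 ? 0xFFFFFFFFu : 0u;    /* asrs ..., #31 on b */
    uint32_t cross = t_lo * b_hi + b_lo * t_hi;  /* mul + mla, mod 2^32 */
    uint32_t high = (uint32_t)(((uint64_t)t_lo * b_lo) >> 32); /* umull */
    return (int32_t)(high + cross);              /* final add */
}

/* Both forms agree on a handful of sample values, including the extremes. */
void check_generic_matches(void) {
    static const int32_t v[] = { 0, 1, -1, 7, -5, 0x40000000,
                                 INT32_MAX, INT32_MIN };
    for (unsigned i = 0; i < sizeof v / sizeof v[0]; i++)
        for (unsigned j = 0; j < sizeof v / sizeof v[0]; j++)
            assert(hmull_generic(v[i], v[j]) == hmull(v[i], v[j]));
}
```

Both functions compute the same value, so the long sequence is correct, just redundant: a single signed widening multiply already yields the same high word.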
If we do this:
inline int hmull(int a, int b) {
    return ((long long)a * b) >> 32;
}
int compute(int a, int b) {
    int t = hmull(a,a);
    asm("":"+r"(t));
    return hmull(t, b);
}
------- CUT ----
On aarch64 we get:
smull x0, w0, w0
asr x2, x0, 32
smull x0, w1, w2
lsr x0, x0, 32
ret
which is exactly what we want.
And on arm we get:
smull r3, r0, r0, r0
smull r1, r0, r1, r0
bx lr
Gimple level:
_11 = a_2(D) w* a_2(D);
_12 = _11 >> 32;
_13 = (int) _12;
__asm__("" : "=r" t_4 : "0" _13);
_7 = b_5(D) w* t_4;
_8 = _7 >> 32;
_9 = (int) _8;
Notice w* there :).
Note that the inline asm helps clang as well.