* [Bug rtl-optimization/68106] New: c-c++-common/torture/builtin-arith-overflow-11.c FAILs with -flra-remat @ aarch64
@ 2015-10-26 22:29 zsojka at seznam dot cz
2015-10-30 17:37 ` [Bug rtl-optimization/68106] " vmakarov at gcc dot gnu.org
` (2 more replies)
0 siblings, 3 replies; 4+ messages in thread
From: zsojka at seznam dot cz @ 2015-10-26 22:29 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68106
Bug ID: 68106
Summary: c-c++-common/torture/builtin-arith-overflow-11.c FAILs
with -flra-remat @ aarch64
Product: gcc
Version: 6.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: rtl-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: zsojka at seznam dot cz
Target Milestone: ---
Created attachment 36594
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=36594&action=edit
reduced testcase
The testcase fails on aarch64 on both trunk and 5-branch with -O -flra-remat. I
haven't managed to generate wrong code with -O2, which enables -flra-remat. I
am using qemu userspace emulation to run the testcase.
$ gcc -O -flra-remat testcase.c
$ ./a.out
qemu: uncaught target signal 6 (Aborted) - core dumped
Aborted
$ gcc -O -flra-remat testcase.c -S
$ gcc -O -fno-lra-remat testcase.c -S -o testcase-no-lra-remat.s
$ diff -u testcase.s testcase-no-lra-remat.s
...
@@ -104,20 +104,21 @@
.L25:
lsl x20, x23, 56
add x19, x20, x26
+ str x19, [x29, 128]
asr x22, x19, 63
adds x27, x21, x19
adc x0, x24, x22
- str x0, [x29, 120]
- add x2, x29, 140
+ str x0, [x29, 136]
+ add x2, x29, 156
mov x1, x19
mov x0, x25
bl upseu
cmp x0, x27
bne .L14
- adc x0, x24, x22
+ ldr x0, [x29, 136]
cmp x0, xzr
cset w1, ne
- ldr w0, [x29, 140]
+ ldr w0, [x29, 156]
cmp w1, w0
bne .L14
subs x1, x21, x20
...
If I am reading the assembly correctly, the important difference is:
...
adds x27, x21, x19
adc x0, x24, x22
- str x0, [x29, 120]
- add x2, x29, 140
+ str x0, [x29, 136]
+ add x2, x29, 156
...
- adc x0, x24, x22
+ ldr x0, [x29, 136]
...
Normally, without -flra-remat, the result of "adc x0, x24, x22" is stored to
the stack and then reloaded.
With -flra-remat, the value is stored as well, but later "adc" is used to
recompute it. That saves one stack access, but the condition flags (cpsr) have
changed in the meantime (the intervening "cmp x0, x27" sets them), so the
second "adc" uses the wrong value of the C bit.
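To make the failure mode concrete, here is a minimal sketch in C that models only the carry flag. The helper names (adds, adc, cmp, remat_mismatch) are hypothetical and are not taken from the attached testcase or from GCC; the model assumes only that adds and cmp write the carry and that adc reads it without writing it.

```c
#include <stdint.h>

/* Toy model of the C (carry) flag only; all names here are illustrative. */
static int carry;

/* adds: add and set carry on unsigned overflow. */
static uint64_t adds(uint64_t a, uint64_t b)
{
    uint64_t r = a + b;
    carry = r < a;
    return r;
}

/* adc: add with carry-in; reads the flag but does not update it. */
static uint64_t adc(uint64_t a, uint64_t b)
{
    return a + b + (uint64_t)carry;
}

/* cmp: computes a - b for flags only; carry is set when there is no borrow. */
static void cmp(uint64_t a, uint64_t b)
{
    carry = a >= b;
}

/* Returns 1 when recomputing adc after an intervening cmp yields a
   different high word than the value computed right after adds. */
static int remat_mismatch(uint64_t lo_a, uint64_t lo_b,
                          uint64_t hi_a, uint64_t hi_b)
{
    uint64_t lo = adds(lo_a, lo_b);
    uint64_t hi_saved = adc(hi_a, hi_b);  /* value that should be reloaded */
    cmp(lo, lo);                          /* equal operands, so carry becomes 1 */
    uint64_t hi_remat = adc(hi_a, hi_b);  /* rematerialized: reads the new carry */
    return hi_saved != hi_remat;
}
```

In this model, remat_mismatch(1, 2, 0, 0) returns 1, because adds leaves the carry clear while the equal compare sets it. remat_mismatch(UINT64_MAX, 1, 0, 0) returns 0, because the carry happens to be 1 both times, which may explain why such a bug only shows up for some inputs.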
Tested revisions:
r229293 - FAIL
5-branch r229305 - FAIL
4_9-branch - doesn't know -flra-remat
* [Bug rtl-optimization/68106] c-c++-common/torture/builtin-arith-overflow-11.c FAILs with -flra-remat @ aarch64
From: vmakarov at gcc dot gnu.org @ 2015-10-30 17:37 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68106
--- Comment #1 from Vladimir Makarov <vmakarov at gcc dot gnu.org> ---
(In reply to Zdenek Sojka from comment #0)
I could not reproduce it on the current trunk, but I did reproduce it on
r229293. I have been working on it and plan to submit a patch for the
trunk today.
* [Bug rtl-optimization/68106] c-c++-common/torture/builtin-arith-overflow-11.c FAILs with -flra-remat @ aarch64
From: vmakarov at gcc dot gnu.org @ 2015-10-30 17:45 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68106
--- Comment #2 from Vladimir Makarov <vmakarov at gcc dot gnu.org> ---
Author: vmakarov
Date: Fri Oct 30 17:45:16 2015
New Revision: 229593
URL: https://gcc.gnu.org/viewcvs?rev=229593&root=gcc&view=rev
Log:
2015-10-30 Vladimir Makarov <vmakarov@redhat.com>
PR rtl-optimization/68106
* lra-remat.c (input_regno_present_p): Process hard regs
explicitly present in machine description insns.
(call_used_input_regno_present_p): Ditto.
(calculate_gen_cands): Ditto.
(do_remat): Ditto.
2015-10-30 Vladimir Makarov <vmakarov@redhat.com>
PR rtl-optimization/68106
* gcc.target/aarch64/pr68106.c: New.
Added:
trunk/gcc/testsuite/gcc.target/aarch64/pr68106.c
Modified:
trunk/gcc/ChangeLog
trunk/gcc/lra-remat.c
trunk/gcc/testsuite/ChangeLog
* [Bug rtl-optimization/68106] c-c++-common/torture/builtin-arith-overflow-11.c FAILs with -flra-remat @ aarch64
From: vmakarov at gcc dot gnu.org @ 2015-10-30 17:52 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68106
--- Comment #3 from Vladimir Makarov <vmakarov at gcc dot gnu.org> ---
The problem was that the LRA rematerialization subpass ignored hard registers
explicitly present in machine description insns.
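As a rough illustration of that diagnosis, the sketch below contrasts a check that looks only at an insn's operands with one that also considers hard registers named directly in the insn pattern. This is not the code in lra-remat.c: the struct, the function names and the register number TOY_CC_REGNUM are all hypothetical, chosen only to show the idea.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical register number standing in for the flags hard register. */
#define TOY_CC_REGNUM 66

/* Simplified, made-up view of an insn: registers read through operands,
   plus hard registers that the insn pattern names explicitly. */
struct toy_insn {
    const int *operand_inputs;
    size_t n_operand_inputs;
    const int *explicit_hard_inputs;
    size_t n_explicit_hard_inputs;
};

static bool regno_in(const int *regs, size_t n, int regno)
{
    for (size_t i = 0; i < n; i++)
        if (regs[i] == regno)
            return true;
    return false;
}

/* Does the insn read regno?  With consider_explicit_hard_regs == false
   this mimics the described bug: explicitly named hard regs are ignored. */
static bool toy_reads_regno(const struct toy_insn *insn, int regno,
                            bool consider_explicit_hard_regs)
{
    if (regno_in(insn->operand_inputs, insn->n_operand_inputs, regno))
        return true;
    return consider_explicit_hard_regs
           && regno_in(insn->explicit_hard_inputs,
                       insn->n_explicit_hard_inputs, regno);
}

/* A rematerialization candidate stays valid across a write to
   clobbered_regno only if the candidate does not read that register. */
static bool remat_candidate_survives(const struct toy_insn *cand,
                                     int clobbered_regno,
                                     bool consider_explicit_hard_regs)
{
    return !toy_reads_regno(cand, clobbered_regno,
                            consider_explicit_hard_regs);
}
```

For a toy "adc" whose operands are registers 24 and 22 and whose pattern explicitly reads TOY_CC_REGNUM, the operand-only check lets the candidate survive a "cmp" that writes the flags, whereas the check that also considers explicit hard registers invalidates it.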
I'll wait a few days before backporting this to gcc-5-branch.