* [Bug target/42575] arm-eabi-gcc 4.2.1 64-bit multiply weirdness
2010-01-01 17:33 [Bug c/42575] New: arm-eabi-gcc 4.2.1 64-bit multiply weirdness sliao at google dot com
@ 2010-01-01 17:40 ` rguenth at gcc dot gnu dot org
2010-01-04 10:54 ` [Bug rtl-optimization/42575] arm-eabi-gcc " ramana at gcc dot gnu dot org
` (6 subsequent siblings)
7 siblings, 0 replies; 9+ messages in thread
From: rguenth at gcc dot gnu dot org @ 2010-01-01 17:40 UTC (permalink / raw)
To: gcc-bugs
------- Comment #1 from rguenth at gcc dot gnu dot org 2010-01-01 17:40 -------
GCC 4.2 is no longer maintained, please reproduce with current trunk.
--
rguenth at gcc dot gnu dot org changed:
What |Removed |Added
----------------------------------------------------------------------------
Component|c |target
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42575
* [Bug rtl-optimization/42575] arm-eabi-gcc 64-bit multiply weirdness
From: ramana at gcc dot gnu dot org @ 2010-01-04 10:54 UTC (permalink / raw)
To: gcc-bugs
------- Comment #2 from ramana at gcc dot gnu dot org 2010-01-04 10:54 -------
Confirmed. With trunk I get:
longfunc:
@ args = 0, pretend = 0, frame = 0
@ frame_needed = 0, uses_anonymous_args = 0
@ link register save eliminated.
mul r1, r2, r1
mla r1, r0, r3, r1
stmfd sp!, {r4, r5}
umull r4, r5, r2, r0
add r1, r1, r5
mov r0, r4
mov r5, r1
ldmfd sp!, {r4, r5}
bx lr
r4 and r5 need not be used here; r2 and r3 would do instead, which also
removes the need to save and restore r4 and r5, i.e.
mul r1, r2, r1
mla r1, r0, r3, r1
umull r2, r3, r2, r0
add r1, r1, r3
mov r0, r2
bx lr
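For readers less familiar with ARM, the six-instruction sequence above can be sketched in C. This is only an illustration: it assumes the test case (not shown in this part of the thread) is a plain 64-bit by 64-bit multiply with x in r0 (low) / r1 (high) and y in r2 (low) / r3 (high). The names mul64_by_parts and check_mul64 are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper mirroring the mul/mla/umull/add sequence.
   Assumed register mapping: x = r1:r0, y = r3:r2, result = r1:r0. */
static uint64_t mul64_by_parts(uint64_t x, uint64_t y)
{
    uint32_t xlo = (uint32_t)x, xhi = (uint32_t)(x >> 32);
    uint32_t ylo = (uint32_t)y, yhi = (uint32_t)(y >> 32);

    uint32_t cross = ylo * xhi;               /* mul   r1, r2, r1     */
    cross = xlo * yhi + cross;                /* mla   r1, r0, r3, r1 */
    uint64_t low = (uint64_t)ylo * xlo;       /* umull r2, r3, r2, r0 */
    uint32_t hi = cross + (uint32_t)(low >> 32); /* add r1, r1, r3    */

    return ((uint64_t)hi << 32) | (uint32_t)low; /* mov r0, r2        */
}

/* Compare against the compiler's own 64-bit multiply (wraps mod 2^64). */
static void check_mul64(uint64_t x, uint64_t y)
{
    assert(mul64_by_parts(x, y) == x * y);
}
```

Only the low 32x32 product needs the full 64-bit result; the two cross products contribute to the high word only, which is why a single umull plus mul and mla suffices.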
--
ramana at gcc dot gnu dot org changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|UNCONFIRMED |NEW
Component|target |rtl-optimization
Ever Confirmed|0 |1
Keywords| |missed-optimization, ra
Last reconfirmed|0000-00-00 00:00:00 |2010-01-04 10:54:28
date| |
Summary|arm-eabi-gcc 4.2.1 64-bit |arm-eabi-gcc 64-bit multiply
|multiply weirdness |weirdness
* [Bug rtl-optimization/42575] arm-eabi-gcc 64-bit multiply weirdness
From: steven at gcc dot gnu dot org @ 2010-02-08 10:47 UTC (permalink / raw)
To: gcc-bugs
------- Comment #3 from steven at gcc dot gnu dot org 2010-02-08 10:47 -------
Trunk today produces this (with -dAP hacked to print slim RTL):
.file "t.c"
.text
.align 2
.global longfunc
.type longfunc, %function
longfunc:
@ args = 0, pretend = 0, frame = 0
@ frame_needed = 0, uses_anonymous_args = 0
@ link register save eliminated.
@ basic block 2
@ 8 ip:SI=r2:SI*r1:SI
@ REG_DEAD: r1:SI
mul ip, r2, r1 @ 8 *arm_mulsi3/2 [length = 4]
@ 35 {[--sp:SI]=unspec[r4:SI] 2;use r5:SI;}
@ REG_DEAD: r5:SI
@ REG_DEAD: r4:SI
@ REG_FRAME_RELATED_EXPR: sequence
stmfd sp!, {r4, r5} @ 35 *push_multi [length = 4]
@ 9 r1:SI=r0:SI*r3:SI+ip:SI
@ REG_DEAD: ip:SI
@ REG_DEAD: r3:SI
@ REG_DEAD: r0:SI
mla r1, r0, r3, ip @ 9 *mulsi3addsi/2 [length = 4]
@ 10 r4:DI=zero_extend(r2:SI)*zero_extend(r0:SI)
@ REG_DEAD: r2:SI
umull r4, r5, r2, r0 @ 10 *umulsidi3_nov6 [length = 4]
@ 11 r1:SI=r1:SI+r5:SI
@ REG_DEAD: r5:SI
add r1, r1, r5 @ 11 *arm_addsi3/1 [length = 4]
@ 12 r5:SI=r1:SI
mov r5, r1 @ 12 *arm_movsi_insn/1 [length = 4]
@ 31 r0:SI=r4:SI
mov r0, r4 @ 31 *arm_movsi_insn/1 [length = 4]
@ 38 unspec/v{return;}
ldmfd sp!, {r4, r5}
bx lr
.size longfunc, .-longfunc
.ident "GCC: (GNU) 4.5.0 20100208 (experimental) [trunk revision 156595]"
Questions for those who know ARM:
* What is the purpose of insn 12 here? It looks to me like this is dead code,
since r5 is restored in insn 38 (although, not knowing ARM so well, I may be
wrong).
* After combine we have these two insns:
9 r138:SI=r142:SI*r3:SI+r139:SI
REG_DEAD: r3:SI
REG_DEAD: r139:SI
10 r137:DI=zero_extend(r144:SI)*zero_extend(r142:SI)
REG_DEAD: r144:SI
REG_DEAD: r142:SI
which translate to the mla insn and to the umull insn that uses r4 and r5:
@ 10 r4:DI=zero_extend(r2:SI)*zero_extend(r0:SI)
@ REG_DEAD: r2:SI
umull r4, r5, r2, r0 @ 10 *umulsidi3_nov6 [length = 4]
@ 9 r1:SI=r0:SI*r3:SI+ip:SI
@ REG_DEAD: ip:SI
@ REG_DEAD: r3:SI
@ REG_DEAD: r0:SI
mla r1, r0, r3, ip @ 9 *mulsi3addsi/2 [length = 4]
Note how the sched1 pass has switched the two insns around. The register
allocator now decides to use two new registers here, because r0 and r3 are both
live. After RA, sched2 switches insn 9 and insn 10 again, and r2 and r3 become
available in insn 10 -- but this is too late.
The question for the ARM maintainer now is: Why does sched1 want to swap insns 9
and 10, when sched2 wants to swap them back again?
(Note, btw, how wrong the REG_DEAD notes are: r0 dies in insn 9 and is used in
insn 10, because the sched2 pass fails to update the notes when it moves insn 9
before insn 10. But that's a separate issue...)
* If I compile with -fno-schedule-insns, I still don't get the optimal code:
mul ip, r2, r1
str r4, [sp, #-4]!
mla r1, r0, r3, ip
umull r3, r4, r2, r0
add r1, r1, r4
mov r4, r1
mov r0, r3
ldmfd sp!, {r4}
bx lr
This time the compiler chooses to use r3:DI in the umull, instead of r2:DI (that
is r2 and r3). I am guessing this may be a target REG_ALLOC_ORDER issue, where
r3 comes before r2. That's another thing for a target maintainer to look into.
If IRA selected r2:DI, you would also lose the save/restore of r4 and get
the perfect code of comment #2.
So two issues:
1. Why does the sched1 pass schedule insn 10 before insn 9?
2. With -fno-schedule-insns, why does IRA prefer (r3,r4) over (r2,r3)?
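The effect of the insn 9 / insn 10 ordering on register availability can be illustrated with a toy backward liveness walk. This is a simplified sketch, not how IRA or the scheduler actually work: it treats the argument registers as if they were already assigned, ignores operand constraints, and models only insns 8, 9 and 10. All names (struct insn, live_after, pair_is_free, HI, PROD) are hypothetical.

```c
#include <assert.h>

/* One bit per register; HI and PROD stand in for the pseudos that
   hold the mla result and the 64-bit umull result. */
enum { R0 = 1 << 0, R1 = 1 << 1, R2 = 1 << 2, R3 = 1 << 3, IP = 1 << 4,
       HI = 1 << 5, PROD = 1 << 6 };

struct insn { const char *name; unsigned uses, defs; };

/* Order before sched1 (and again after sched2): 8, 9, 10. */
static const struct insn order_9_then_10[] = {
    { "mul   (8)",  R2 | R1,      IP   },
    { "mla   (9)",  R0 | R3 | IP, HI   },
    { "umull (10)", R2 | R0,      PROD },
};

/* Order chosen by sched1: 8, 10, 9. */
static const struct insn order_10_then_9[] = {
    { "mul   (8)",  R2 | R1,      IP   },
    { "umull (10)", R2 | R0,      PROD },
    { "mla   (9)",  R0 | R3 | IP, HI   },
};

/* Registers live immediately after insns[target], found by walking
   backward from the end of the block. */
static unsigned live_after(const struct insn *insns, int n, int target,
                           unsigned live_out)
{
    assert(target >= 0 && target < n);
    unsigned live = live_out;
    for (int i = n - 1; i > target; i--)
        live = (live & ~insns[i].defs) | insns[i].uses;
    return live;
}

/* A register pair can receive the result of insns[target] only if
   neither half is still needed afterwards. */
static int pair_is_free(const struct insn *insns, int n, int target,
                        unsigned live_out, unsigned pair)
{
    return (live_after(insns, n, target, live_out) & pair) == 0;
}
```

With live_out set to HI | PROD, pair_is_free reports that R2 | R3 is free after the umull in order_9_then_10 (target index 2) but not in order_10_then_9 (target index 1), where r0 and r3 are still needed by the mla. That matches the observation that the allocator reaches for r4 and r5 when it sees the sched1 order.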
* [Bug rtl-optimization/42575] arm-eabi-gcc 64-bit multiply weirdness
From: steven at gcc dot gnu dot org @ 2010-02-08 10:52 UTC (permalink / raw)
To: gcc-bugs
------- Comment #4 from steven at gcc dot gnu dot org 2010-02-08 10:51 -------
Add an ARM guy to the CC:
--
steven at gcc dot gnu dot org changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |ramana at gcc dot gnu dot
| |org
* [Bug rtl-optimization/42575] arm-eabi-gcc 64-bit multiply weirdness
From: drow at gcc dot gnu dot org @ 2010-02-22 21:06 UTC (permalink / raw)
To: gcc-bugs
------- Comment #5 from drow at gcc dot gnu dot org 2010-02-22 21:06 -------
(In reply to comment #3)
> * What is the purpose of insn 12 here? It looks to me like this is dead code,
> since r5 is restored in insn 38 (although, not knowing ARM so well, I may be
> wrong).
I couldn't figure this out either. Where did it come from - was it so late
that we never DCE'd it, or does something bizarre claim to be dependent on the
value?
> Note how the sched1 pass has switched the two insns around. The register
> allocator now decides to use two new registers here, because r0 and r3 are both
> live. After RA, sched2 switches insn 9 and insn 10 again, and r2 and r3 become
> available in insn 10 -- but this is too late.
>
> Question for the ARM maintainer now is: Why does sched1 want to swap insns 9
> and 10, when sched2 wants to swap them back again?
I'm guessing, but presumably we want to separate the mul from the mla because
they're dependent; the umull isn't. But I don't know what would swap them back
again and that's probably the crux.
* [Bug rtl-optimization/42575] arm-eabi-gcc 64-bit multiply weirdness
From: bernds at gcc dot gnu dot org @ 2010-07-29 12:40 UTC (permalink / raw)
To: gcc-bugs
------- Comment #6 from bernds at gcc dot gnu dot org 2010-07-29 12:40 -------
Subject: Bug 42575
Author: bernds
Date: Thu Jul 29 12:39:57 2010
New Revision: 162678
URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=162678
Log:
PR rtl-optimization/42575
* dce.c (word_dce_process_block): Renamed from byte_dce_process_block.
Argument AU removed. All callers changed. Ignore artificial refs.
Use return value of df_word_lr_simulate_defs to decide whether an insn
is necessary.
(fast_dce): Rename arg to WORD_LEVEL.
(run_word_dce): Renamed from rest_of_handle_fast_byte_dce. No longer
static.
(pass_fast_rtl_byte_dce): Delete.
* dce.h (run_word_dce): Declare.
* df-core.c (df_print_word_regset): Renamed from df_print_byteregset.
All callers changed. Simplify code to only deal with two-word regs.
* df.h (DF_WORD_LR): Renamed from DF_BYTE_LR.
(DF_WORD_LR_BB_INFO): Renamed from DF_BYTE_LR_BB_INFO.
(DF_WORD_LR_IN): Renamed from DF_BYTE_LR_IN.
(DF_WORD_LR_OUT): Renamed from DF_BYTE_LR_OUT.
(struct df_word_lr_bb_info): Renamed from df_byte_lr_bb_info.
(df_word_lr_mark_ref): Declare.
(df_word_lr_add_problem, df_word_lr_mark_ref, df_word_lr_simulate_defs,
df_word_lr_simulate_uses): Declare or rename from byte variants.
(df_byte_lr_simulate_artificial_refs_at_top,
df_byte_lr_simulate_artificial_refs_at_end, df_byte_lr_get_regno_start,
df_byte_lr_get_regno_len, df_compute_accessed_bytes): Delete
declarations.
(df_word_lr_get_bb_info): Rename from df_byte_lr_get_bb_info.
(enum df_mm): Delete.
* df-byte-scan.c: Delete file.
* df-problems.c (df_word_lr_problem_data): Renamed from
df_byte_lr_problem_data, all members deleted except for
WORD_LR_BITMAPS, which is renamed from BYTE_LR_BITMAPS. Uses changed.
(df_word_lr_expand_bitmap, df_byte_lr_simulate_artificial_refs_at_top,
df_byte_lr_simulate_artificial_refs_at_end, df_byte_lr_get_regno_start,
df_byte_lr_get_regno_len, df_byte_lr_check_regs,
df_byte_lr_confluence_0): Delete functions.
(df_word_lr_free_bb_info): Renamed from df_byte_lr_free_bb_info; all
callers changed.
(df_word_lr_alloc): Renamed from df_byte_lr_alloc; all callers changed.
Don't initialize members that were deleted, don't try to discover data
about registers. Ignore hard regs.
(df_word_lr_reset): Renamed from df_byte_lr_reset; all callers changed.
(df_word_lr_mark_ref): New function.
(df_word_lr_bb_local_compute): Renamed from
df_byte_bb_lr_local_compute; all callers changed. Use
df_word_lr_mark_ref. Assert that artificial refs don't include
pseudos. Ignore hard registers.
(df_word_lr_local_compute): Renamed from df_byte_lr_local_compute.
Assert that exit block uses don't contain pseudos.
(df_word_lr_init): Renamed from df_byte_lr_init; all callers changed.
(df_word_lr_confluence_n): Renamed from df_byte_lr_confluence_n; all
callers changed. Ignore hard regs.
(df_word_lr_transfer_function): Renamed from
df_byte_lr_transfer_function; all callers changed.
(df_word_lr_free): Renamed from df_byte_lr_free; all callers changed.
(df_word_lr_top_dump): Renamed from df_byte_lr_top_dump; all callers
changed.
(df_word_lr_bottom_dump): Renamed from df_byte_lr_bottom_dump; all
callers changed.
(problem_WORD_LR): Renamed from problem_BYTE_LR; uses changed;
confluence operator 0 set to NULL.
(df_word_lr_add_problem): Renamed from df_byte_lr_add_problem; all
callers changed.
(df_word_lr_simulate_defs): Renamed from df_byte_lr_simulate_defs.
Return bool, true if bitmap changed or insn otherwise necessary.
All callers changed. Simplify using df_word_lr_mark_ref.
(df_word_lr_simulate_uses): Renamed from df_byte_lr_simulate_uses;
all callers changed. Simplify using df_word_lr_mark_ref.
* lower-subreg.c: Include "dce.h"
(decompose_multiword_subregs): Call run_word_dce if df available.
* Makefile.in (lower-subreg.o): Adjust dependencies.
(df-byte-scan.o): Delete.
* timevar.def (TV_DF_WORD_LR): Renamed from TV_DF_BYTE_LR.
Removed:
trunk/gcc/df-byte-scan.c
Modified:
trunk/gcc/ChangeLog
trunk/gcc/Makefile.in
trunk/gcc/dce.c
trunk/gcc/dce.h
trunk/gcc/df-core.c
trunk/gcc/df-problems.c
trunk/gcc/df.h
trunk/gcc/lower-subreg.c
trunk/gcc/timevar.def
* [Bug rtl-optimization/42575] arm-eabi-gcc 64-bit multiply weirdness
From: mkuvyrkov at gcc dot gnu dot org @ 2010-08-18 10:34 UTC (permalink / raw)
To: gcc-bugs
------- Comment #7 from mkuvyrkov at gcc dot gnu dot org 2010-08-18 10:34 -------
Subject: Bug 42575
Author: mkuvyrkov
Date: Wed Aug 18 10:34:02 2010
New Revision: 163334
URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=163334
Log:
gcc/
PR rtl-optimization/42575
* optabs.c (expand_doubleword_mult): Generate new pseudos to shorten
live ranges.
gcc/testsuite/
PR rtl-optimization/42575
* gcc.target/arm/pr42575.c: New test.
Added:
trunk/gcc/testsuite/gcc.target/arm/pr42575.c
Modified:
trunk/gcc/ChangeLog
trunk/gcc/optabs.c
trunk/gcc/testsuite/ChangeLog
* [Bug rtl-optimization/42575] arm-eabi-gcc 64-bit multiply weirdness
From: mkuvyrkov at gcc dot gnu dot org @ 2010-08-18 10:43 UTC (permalink / raw)
To: gcc-bugs
------- Comment #8 from mkuvyrkov at gcc dot gnu dot org 2010-08-18 10:43 -------
Bernd did all the heavy lifting for this patch. The above patch fixes the last
piece of the problem -- an extra move when compiling for ARMv7-A.
--
mkuvyrkov at gcc dot gnu dot org changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|NEW |RESOLVED
Resolution| |FIXED