* [Bug rtl-optimization/24810] mov + mov + testl generated instead of testb
2005-11-11 19:28 [Bug rtl-optimization/24810] New: mov + mov + testl generated instead of testb dann at godzilla dot ics dot uci dot edu
@ 2005-11-11 19:29 ` dann at godzilla dot ics dot uci dot edu
2005-11-13 2:47 ` [Bug rtl-optimization/24810] [4.1 Regression] " dann at godzilla dot ics dot uci dot edu
` (7 subsequent siblings)
8 siblings, 0 replies; 10+ messages in thread
From: dann at godzilla dot ics dot uci dot edu @ 2005-11-11 19:29 UTC (permalink / raw)
To: gcc-bugs
------- Comment #1 from dann at godzilla dot ics dot uci dot edu 2005-11-11 19:29 -------
Created an attachment (id=10220)
--> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=10220&action=view)
Preprocessed code containing the functions that exhibit the problem
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=24810
* [Bug rtl-optimization/24810] [4.1 Regression] mov + mov + testl generated instead of testb
From: dann at godzilla dot ics dot uci dot edu @ 2005-11-13 2:47 UTC (permalink / raw)
To: gcc-bugs
------- Comment #2 from dann at godzilla dot ics dot uci dot edu 2005-11-13 02:47 -------
Simplified testcase:
struct cpuinfo_x86 {
  unsigned char x86;
  unsigned char x86_vendor;
  unsigned char x86_model;
  unsigned char x86_mask;
  char wp_works_ok;
  char hlt_works_ok;
  char hard_math;
  char rfu;
  int cpuid_level;
  unsigned long x86_capability[7];
} __attribute__((__aligned__((1 << (7)))));
struct task_struct;
extern void foo (struct task_struct *tsk);
extern void bar (struct task_struct *tsk);
extern struct cpuinfo_x86 boot_cpu_data;
static inline __attribute__((always_inline)) int
constant_test_bit(int nr, const volatile unsigned long *addr)
{
  return ((1UL << (nr & 31)) & (addr[nr >> 5])) != 0;
}
void
restore_fpu(struct task_struct *tsk)
{
  if (constant_test_bit(24, boot_cpu_data.x86_capability))
    foo (tsk);
  else
    bar (tsk);
}
The generated code for this simplified testcase shows one additional issue:
restore_fpu:
movl %eax, %edx
movl boot_cpu_data+12, %eax ; edx could be used here
testl $16777216, %eax ; and here
je .L2
movl %edx, %eax ; then all the mov %eax, %edx and mov %edx, %eax
jmp foo ; instructions could be eliminated.
.p2align 4,,7
.L2:
movl %edx, %eax
jmp bar
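As a rough illustration of where the two constants in the listing above come from, the sketch below recomputes them for bit 24. The struct name cpuinfo_sketch and the helpers word_offset and word_mask are hypothetical, and uint32_t/int32_t stand in for the 32-bit unsigned long and int of the i686 target. The alignment attribute is left out because it does not change member offsets.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of the testcase's struct; fixed-width types stand in
   for the 32-bit `unsigned long` and `int` of the i686 target. */
struct cpuinfo_sketch {
    unsigned char x86, x86_vendor, x86_model, x86_mask;
    char wp_works_ok, hlt_works_ok, hard_math, rfu;
    int32_t cpuid_level;
    uint32_t x86_capability[7];
};

/* Byte offset, from the start of the struct, of the 32-bit word that
   constant_test_bit(nr, ...) would load. */
static size_t word_offset(int nr)
{
    assert(nr >= 0);
    return offsetof(struct cpuinfo_sketch, x86_capability)
           + (size_t)(nr >> 5) * sizeof(uint32_t);
}

/* Mask applied to that word to isolate bit nr. */
static uint32_t word_mask(int nr)
{
    assert(nr >= 0);
    return UINT32_C(1) << (nr & 31);
}
```

For nr = 24 this gives offset 12 and mask 16777216, matching "movl boot_cpu_data+12, %eax" and "testl $16777216, %eax".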
--
dann at godzilla dot ics dot uci dot edu changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|mov + mov + testl generated |[4.1 Regression] mov + mov +
                   |instead of testb            |testl generated instead of
                   |                            |testb
* [Bug rtl-optimization/24810] [4.1 Regression] mov + mov + testl generated instead of testb
From: pinskia at gcc dot gnu dot org @ 2005-11-14 13:24 UTC (permalink / raw)
To: gcc-bugs
--
pinskia at gcc dot gnu dot org changed:
What |Removed |Added
----------------------------------------------------------------------------
GCC target triplet|i686-pc-linux-gnu |i?86-*-*, x86_64-*-*
Target Milestone|--- |4.1.0
* [Bug rtl-optimization/24810] [4.1 Regression] mov + mov + testl generated instead of testb
From: janis at gcc dot gnu dot org @ 2005-11-14 22:17 UTC (permalink / raw)
To: gcc-bugs
------- Comment #3 from janis at gcc dot gnu dot org 2005-11-14 22:17 -------
A regression hunt using an i686-linux cross compiler identified the following
patch where the code generation changes:
http://gcc.gnu.org/viewcvs?view=rev&rev=99658
r99658 | hubicka | 2005-05-13 13:57:19 +0000 (Fri, 13 May 2005) | 15 lines
* gcc.dg/builtins-43.c: Use gimple dump instead of generic.
* gcc.dg/fold-xor-?.c: Likewise.
* gcc.dg/pr15784-?.c: Likewise.
* gcc.dg/pr20922-?.c: Likewise.
* gcc.dg/tree-ssa/20050128-1.c: Likewise.
* gcc.dg/tree-ssa/pr17598.c: Likewise.
* gcc.dg/tree-ssa/pr20470.c: Likewise.
* tree-inline.c (copy_body_r): Simplify substituted ADDR_EXPRs.
* tree-optimize.c (pass_gimple): Kill.
(init_tree_optimization_passes): Kill pass_gimple.
* tree-cfg.c (build_tree_cfg): Do verify_stmts to check that we are
gimple.
* tree-dump.c (dump_files): Rename .generic to .gimple.*
* [Bug rtl-optimization/24810] [4.1 Regression] mov + mov + testl generated instead of testb
From: mmitchel at gcc dot gnu dot org @ 2005-11-19 2:10 UTC (permalink / raw)
To: gcc-bugs
------- Comment #4 from mmitchel at gcc dot gnu dot org 2005-11-19 02:10 -------
Should be fixed before 4.1, if possible.
--
mmitchel at gcc dot gnu dot org changed:
What |Removed |Added
----------------------------------------------------------------------------
Priority|P3 |P2
* [Bug rtl-optimization/24810] [4.1/4.2 Regression] mov + mov + testl generated instead of testb
From: hubicka at gcc dot gnu dot org @ 2005-12-18 20:53 UTC (permalink / raw)
To: gcc-bugs
------- Comment #5 from hubicka at gcc dot gnu dot org 2005-12-18 20:53 -------
Simplified testcase seems to work for me on 4.1 branch:
restore_fpu:
movl 4(%esp), %edx
movl boot_cpu_data+12, %eax
testl $16777216, %eax
je .L2
jmp foo
.L2:
movl %edx, 4(%esp)
jmp bar
"jmp foo" is not eliminated because we don't have a pattern for conditional
tailcalls. It should not be a big issue to add the necessary patterns, however.
Honza
* [Bug rtl-optimization/24810] [4.1/4.2 Regression] mov + mov + testl generated instead of testb
From: dann at godzilla dot ics dot uci dot edu @ 2005-12-18 22:57 UTC (permalink / raw)
To: gcc-bugs
------- Comment #6 from dann at godzilla dot ics dot uci dot edu 2005-12-18 22:57 -------
(In reply to comment #5)
> Simplified testcase seems to work for me on 4.1 branch:
> restore_fpu:
> movl 4(%esp), %edx
> movl boot_cpu_data+12, %eax
> testl $16777216, %eax
4.0 still does better: it uses a single "testb" instruction instead of two
dependent movl + testl instructions.
* [Bug rtl-optimization/24810] [4.1/4.2 Regression] mov + mov + testl generated instead of testb
From: kazu at gcc dot gnu dot org @ 2005-12-19 0:37 UTC (permalink / raw)
To: gcc-bugs
------- Comment #7 from kazu at gcc dot gnu dot org 2005-12-19 00:37 -------
We are basically talking about narrowing the memory being loaded for testing.
Now, can we really optimize this case? We've got
const volatile unsigned long *addr
I am not sure if "volatile" allows us to change the width of a memory read.
I know a chip that expects you to read memory at one address repeatedly to
transfer a block of data, and people probably use volatile
for this kind of case. If the compiler changes the width of memory access,
we may be screwing up something.
IMHO, if byte access is really desired, the code should be rewritten that way.
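To make "narrowing" concrete, here is a minimal sketch of the two access widths being compared. The function names test_bit_word and test_bit_byte are hypothetical, uint32_t stands in for the 32-bit unsigned long of the i686 target, and the byte form assumes a little-endian layout as on x86. The pointers are deliberately not volatile; whether a compiler may make this substitution for a volatile object is exactly the open question.

```c
#include <assert.h>
#include <stdint.h>

/* Full-width test: load the whole 32-bit word, then mask it
   (the movl + testl shape). */
static int test_bit_word(const uint32_t *words, int nr)
{
    assert(nr >= 0);
    return (words[nr >> 5] & (UINT32_C(1) << (nr & 31))) != 0;
}

/* Narrowed test: load only the byte that holds the bit
   (the testb shape). Assumes little-endian byte order. */
static int test_bit_byte(const uint32_t *words, int nr)
{
    assert(nr >= 0);
    const unsigned char *bytes = (const unsigned char *)words;
    return (bytes[nr >> 3] & (1u << (nr & 7))) != 0;
}
```

For nr = 24 the byte form reads byte 3 of the first word with mask 1. Both functions return the same value for ordinary memory, but they issue reads of different sizes, which is what matters for a device register.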
--
kazu at gcc dot gnu dot org changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |kazu at gcc dot gnu dot org
* [Bug rtl-optimization/24810] [4.1/4.2 Regression] mov + mov + testl generated instead of testb
From: jakub at gcc dot gnu dot org @ 2005-12-29 11:53 UTC (permalink / raw)
To: gcc-bugs
------- Comment #8 from jakub at gcc dot gnu dot org 2005-12-29 11:53 -------
I don't think this is a bug; in fact, not honoring the volatile in GCC 4.0.x
and earlier was a bug. If you want to allow byte access rather than word
access, you need to remove the volatile keyword, and then it compiles
into
restore_fpu:
testb $1, boot_cpu_data+15
je .L2
jmp foo
.L2:
jmp bar
.size restore_fpu, .-restore_fpu
.ident "GCC: (GNU) 4.2.0 20051223 (experimental)"
You should report this against the Linux kernel; it shouldn't use volatile
there.
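A minimal sketch of the suggested source change, assuming the only difference is the qualifier on the pointer parameter. The names constant_test_bit_nv and constant_test_bit_v are hypothetical and exist only so both variants can sit side by side; they are not the kernel's actual helpers.

```c
#include <assert.h>

/* Variant without `volatile`: the compiler is free to pick the access
   width, so it may (but need not) narrow the load to a single byte. */
static inline int
constant_test_bit_nv(int nr, const unsigned long *addr)
{
    assert(nr >= 0);
    return ((1UL << (nr & 31)) & (addr[nr >> 5])) != 0;
}

/* Original signature from the testcase, kept for comparison: the
   `volatile` qualifier asks for the access to be performed as written. */
static inline int
constant_test_bit_v(int nr, const volatile unsigned long *addr)
{
    assert(nr >= 0);
    return ((1UL << (nr & 31)) & (addr[nr >> 5])) != 0;
}
```

Both variants return the same value for ordinary memory; the difference is only in what code the compiler is allowed to generate for the load.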
--
jakub at gcc dot gnu dot org changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|UNCONFIRMED |RESOLVED
Resolution| |INVALID