public inbox for gcc-bugs@sourceware.org
* [Bug rtl-optimization/24810] New: mov + mov + testl generated instead of testb
From: dann at godzilla dot ics dot uci dot edu @ 2005-11-11 19:28 UTC
To: gcc-bugs
Compiling i387.c from the Linux kernel using:
-nostdinc -isystem /usr/lib/gcc/i386-redhat-linux/4.0.1/include -D__KERNEL__
-Iinclude -Wall -Wstrict-prototypes -Wno-trigraphs -fno-strict-aliasing
-fno-common -ffreestanding -O2 -fomit-frame-pointer -g -save-temps -msoft-float
-m32 -fno-builtin-sprintf -fno-builtin-log2 -fno-builtin-puts
-mpreferred-stack-boundary=2 -fno-unit-at-a-time -march=i686 -mtune=pentium4
-mregparm=3 -Iinclude/asm-i386/mach-default -Wdeclaration-after-statement
-Wno-pointer-sign -DKBUILD_BASENAME=i387 -DKBUILD_MODNAME=i387
-c arch/i386/kernel/i387.c
(these are the flags generated by rpmbuild on a Fedora Core 4 system)
With GCC 4.0, the restore_fpu function compiles to:
restore_fpu:
        testb   $1, boot_cpu_data+15
        je      .L23
        [snip]
With GCC 4.1, it compiles to:
restore_fpu:
        movl    %eax, %edx
        movl    boot_cpu_data+12, %eax
        testl   $16777216, %eax
        je      .L24
        [snip]
Similar code sequences appear in other functions in the same file:
get_fpu_mxcsr, get_fpu_swd, get_fpu_cwd, set_fpregs.
The size of these functions increases by 5 bytes (i.e. 20%).
Some of these functions may be on critical paths in the kernel, so the
size increase (and possibly a speed penalty) could have an impact.
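Both sequences test the same bit. The testl immediate 16777216 is 0x1000000, i.e. 1 << 24, applied to the 32-bit x86_capability word at offset 12 of boot_cpu_data (as the boot_cpu_data+12 operand shows). On a little-endian target such as i386, bit 24 of that word is bit 0 of its fourth byte, at offset 12 + 24/8 = 15, which is exactly what `testb $1, boot_cpu_data+15` checks. A minimal C sketch of that arithmetic follows; the helper names are hypothetical and only model the two tests on a plain value:

```c
#include <assert.h>
#include <stdint.h>

/* 4.1 style: test bit 24 of the whole 32-bit word (testl $16777216). */
static int
test_word(uint32_t word)
{
        return (word & UINT32_C(0x1000000)) != 0;   /* 0x1000000 == 1 << 24 */
}

/* 4.0 style: test bit 0 of byte 3 of the same word (testb $1).
 * Byte 3 is extracted with a shift, so this check is endian-neutral;
 * only on a little-endian target does byte 3 sit at word + 3 (12 + 3 = 15). */
static int
test_byte(uint32_t word)
{
        uint8_t byte3 = (uint8_t)(word >> 24);       /* bits 24..31 */
        return (byte3 & 1u) != 0;                    /* bit 24 of the word */
}

/* Offset and mask of the byte holding bit `bit` of a little-endian word
 * stored at offset `word_off` (hypothetical helpers). */
static unsigned byte_offset(unsigned word_off, unsigned bit) { return word_off + bit / 8; }
static unsigned byte_mask(unsigned bit) { return 1u << (bit % 8); }

static_assert((UINT32_C(1) << 24) == 16777216u, "the testl immediate is bit 24");
```

With these, byte_offset(12, 24) is 15 and byte_mask(24) is 1, matching the operands of the 4.0 testb.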
For 4.0 the 00.expand dump looks like:
(insn 9 7 10 1 (set (reg/f:SI 59)
        (const:SI (plus:SI (symbol_ref:SI ("boot_cpu_data") [flags 0x40] <var_decl 0xb7ee2d80 boot_cpu_data>)
                (const_int 12 [0xc])))) -1 (nil)
    (nil))

(insn 10 9 11 1 (set (reg:SI 60)
        (mem/s/j:SI (reg/f:SI 59) [0 boot_cpu_data.x86_capability+0 S4 A32])) -1 (nil)
    (nil))

(insn 11 10 12 1 (parallel [
            (set (reg:SI 61)
                (and:SI (reg:SI 60)
                    (const_int 16777216 [0x1000000])))
            (clobber (reg:CC 17 flags))
        ]) -1 (nil)
    (nil))

(insn 12 11 13 1 (set (reg:CCZ 17 flags)
        (compare:CCZ (reg:SI 61)
            (const_int 0 [0x0]))) -1 (nil)
    (nil))
For 4.1 the dump is identical, except that insn 10 has mem/s/v/j:SI
instead of mem/s/j:SI (the /v flag marks the memory reference as volatile).
The combine pass in 4.0 deletes insn 10; in 4.1 that does not happen.
For 4.1 the generated code does not change when using -Os or -march=pentium4.
This is one of the causes of PR23153.
--
Summary: mov + mov + testl generated instead of testb
Product: gcc
Version: 4.1.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: rtl-optimization
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: dann at godzilla dot ics dot uci dot edu
GCC target triplet: i686-pc-linux-gnu
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=24810
* [Bug rtl-optimization/24810] mov + mov + testl generated instead of testb
From: dann at godzilla dot ics dot uci dot edu @ 2005-11-11 19:29 UTC
To: gcc-bugs
------- Comment #1 from dann at godzilla dot ics dot uci dot edu 2005-11-11 19:29 -------
Created an attachment (id=10220)
--> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=10220&action=view)
Preprocessed code containing the functions that exhibit the problem
* [Bug rtl-optimization/24810] [4.1 Regression] mov + mov + testl generated instead of testb
From: dann at godzilla dot ics dot uci dot edu @ 2005-11-13 2:47 UTC
To: gcc-bugs
------- Comment #2 from dann at godzilla dot ics dot uci dot edu 2005-11-13 02:47 -------
Simplified testcase:
struct cpuinfo_x86 {
        unsigned char x86;
        unsigned char x86_vendor;
        unsigned char x86_model;
        unsigned char x86_mask;
        char wp_works_ok;
        char hlt_works_ok;
        char hard_math;
        char rfu;
        int cpuid_level;
        unsigned long x86_capability[7];
} __attribute__((__aligned__((1 << (7)))));

struct task_struct;
extern void foo (struct task_struct *tsk);
extern void bar (struct task_struct *tsk);
extern struct cpuinfo_x86 boot_cpu_data;

static inline __attribute__((always_inline)) int
constant_test_bit(int nr, const volatile unsigned long *addr)
{
        return ((1UL << (nr & 31)) & (addr[nr >> 5])) != 0;
}

void
restore_fpu(struct task_struct *tsk)
{
        if (constant_test_bit(24, boot_cpu_data.x86_capability))
                foo (tsk);
        else
                bar (tsk);
}
The generated code for this simplified testcase shows one additional issue:
restore_fpu:
        movl    %eax, %edx
        movl    boot_cpu_data+12, %eax  ; edx could be used here
        testl   $16777216, %eax         ; and here
        je      .L2
        movl    %edx, %eax              ; then all the mov %eax, %edx and mov %edx, %eax
        jmp     foo                     ; instructions could be eliminated.
        .p2align 4,,7
.L2:
        movl    %edx, %eax
        jmp     bar
--
dann at godzilla dot ics dot uci dot edu changed:
What |Removed |Added
----------------------------------------------------------------------------
Summary|mov + mov + testl generated |[4.1 Regression] mov + mov +
|instead of testb |testl generated instead of
| |testb
* [Bug rtl-optimization/24810] [4.1 Regression] mov + mov + testl generated instead of testb
From: pinskia at gcc dot gnu dot org @ 2005-11-14 13:24 UTC
To: gcc-bugs
--
pinskia at gcc dot gnu dot org changed:
What |Removed |Added
----------------------------------------------------------------------------
GCC target triplet|i686-pc-linux-gnu |i?86-*-*, x86_64-*-*
Target Milestone|--- |4.1.0
* [Bug rtl-optimization/24810] [4.1 Regression] mov + mov + testl generated instead of testb
From: janis at gcc dot gnu dot org @ 2005-11-14 22:17 UTC
To: gcc-bugs
------- Comment #3 from janis at gcc dot gnu dot org 2005-11-14 22:17 -------
A regression hunt using an i686-linux cross compiler identified the following
patch where the code generation changes:
http://gcc.gnu.org/viewcvs?view=rev&rev=99658
r99658 | hubicka | 2005-05-13 13:57:19 +0000 (Fri, 13 May 2005) | 15 lines
        * gcc.dg/builtins-43.c: Use gimple dump instead of generic.
        * gcc.dg/fold-xor-?.c: Likewise.
        * gcc.dg/pr15784-?.c: Likewise.
        * gcc.dg/pr20922-?.c: Likewise.
        * gcc.dg/tree-ssa/20050128-1.c: Likewise.
        * gcc.dg/tree-ssa/pr17598.c: Likewise.
        * gcc.dg/tree-ssa/pr20470.c: Likewise.
        * tree-inline.c (copy_body_r): Simplify substituted ADDR_EXPRs.
        * tree-optimize.c (pass_gimple): Kill.
        (init_tree_optimization_passes): Kill pass_gimple.
        * tree-cfg.c (build_tree_cfg): Do verify_stmts to check that we are
        gimple.
        * tree-dump.c (dump_files): Rename .generic to .gimple.*
* [Bug rtl-optimization/24810] [4.1 Regression] mov + mov + testl generated instead of testb
From: mmitchel at gcc dot gnu dot org @ 2005-11-19 2:10 UTC
To: gcc-bugs
------- Comment #4 from mmitchel at gcc dot gnu dot org 2005-11-19 02:10 -------
Should be fixed before 4.1, if possible.
--
mmitchel at gcc dot gnu dot org changed:
What |Removed |Added
----------------------------------------------------------------------------
Priority|P3 |P2
* [Bug rtl-optimization/24810] [4.1/4.2 Regression] mov + mov + testl generated instead of testb
From: hubicka at gcc dot gnu dot org @ 2005-12-18 20:53 UTC
To: gcc-bugs
------- Comment #5 from hubicka at gcc dot gnu dot org 2005-12-18 20:53 -------
The simplified testcase seems to work for me on the 4.1 branch:
restore_fpu:
        movl    4(%esp), %edx
        movl    boot_cpu_data+12, %eax
        testl   $16777216, %eax
        je      .L2
        jmp     foo
.L2:
        movl    %edx, 4(%esp)
        jmp     bar
"jmp foo" is not eliminated because we don't have a pattern for conditional
tail calls. It should not be a big issue to add the necessary patterns, however.
Honza
* [Bug rtl-optimization/24810] [4.1/4.2 Regression] mov + mov + testl generated instead of testb
From: dann at godzilla dot ics dot uci dot edu @ 2005-12-18 22:57 UTC
To: gcc-bugs
------- Comment #6 from dann at godzilla dot ics dot uci dot edu 2005-12-18 22:57 -------
(In reply to comment #5)
> Simplified testcase seems to work for me on 4.1 branch:
> restore_fpu:
> movl 4(%esp), %edx
> movl boot_cpu_data+12, %eax
> testl $16777216, %eax
4.0 still does better: it uses a single "testb" instruction instead of two
dependent movl + testl instructions.
* [Bug rtl-optimization/24810] [4.1/4.2 Regression] mov + mov + testl generated instead of testb
From: kazu at gcc dot gnu dot org @ 2005-12-19 0:37 UTC
------- Comment #7 from kazu at gcc dot gnu dot org 2005-12-19 00:37 -------
We are basically talking about narrowing the memory load used for the test.
Now, can we really optimize this case? We've got
        const volatile unsigned long *addr
I am not sure whether "volatile" allows us to change the width of a memory read.
I know of a chip that expects you to read memory at one address repeatedly to
transfer a block of data, and people probably use volatile for this kind of
case. If the compiler changes the width of the memory access, we may break
something.
IMHO, if byte access is really desired, the code should be rewritten that way.
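Rewriting the code "that way" could look roughly like the sketch below: keep volatile, but make the one-byte width explicit in the source by reading through a `const volatile unsigned char *`. It assumes a little-endian target (as on i386), where bit nr of the unsigned long array is bit (nr & 7) of byte (nr >> 3) of the same memory. `constant_test_bit_u8` and its check function are hypothetical names, not kernel or GCC APIs, and whether a given GCC then emits a single testb is something to confirm with -S:

```c
#include <assert.h>

/* Hypothetical byte-wide variant of constant_test_bit from the testcase.
 * Assumes a little-endian target: bit nr of the unsigned long array is
 * then bit (nr & 7) of byte (nr >> 3) of the same memory.
 * The volatile unsigned char access requests exactly one 1-byte load,
 * so the access width is chosen by the source rather than the compiler. */
static inline int
constant_test_bit_u8(int nr, const volatile unsigned long *addr)
{
        const volatile unsigned char *bytes =
                (const volatile unsigned char *)addr;
        return ((1u << (nr & 7)) & bytes[nr >> 3]) != 0;
}

/* Usage check: set bit 24 and probe it and its neighbours.
 * Skipped on big-endian hosts, where the byte-layout assumption fails. */
static void
check_constant_test_bit_u8(void)
{
        const unsigned int one = 1;
        unsigned long caps[7] = { 0 };

        if (*(const unsigned char *)&one != 1)
                return;                 /* not little-endian */
        caps[0] = 1UL << 24;
        assert(constant_test_bit_u8(24, caps) == 1);
        assert(constant_test_bit_u8(23, caps) == 0);
        assert(constant_test_bit_u8(25, caps) == 0);
}
```

The trade-off is portability: the byte index is only correct on little-endian layouts, whereas the word-wide original is endian-neutral.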
--
kazu at gcc dot gnu dot org changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |kazu at gcc dot gnu dot org
* [Bug rtl-optimization/24810] [4.1/4.2 Regression] mov + mov + testl generated instead of testb
From: jakub at gcc dot gnu dot org @ 2005-12-29 11:53 UTC
To: gcc-bugs
------- Comment #8 from jakub at gcc dot gnu dot org 2005-12-29 11:53 -------
I don't think this is a bug; in fact, not honoring the volatile in GCC 4.0.x
and earlier was a bug. If you want to allow byte access rather than word
access, you really need to remove the volatile keyword, and then it compiles
into:
restore_fpu:
        testb   $1, boot_cpu_data+15
        je      .L2
        jmp     foo
.L2:
        jmp     bar
        .size   restore_fpu, .-restore_fpu
        .ident  "GCC: (GNU) 4.2.0 20051223 (experimental)"
You should report this against the Linux kernel; it shouldn't use volatile
there.
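For reference, the change described here amounts to dropping the volatile qualifier from the pointer parameter of the testcase's helper. A sketch of that non-volatile form, keeping the original's assumption of 32 bits per array element (nr & 31, nr >> 5) as on i386; the check function is a hypothetical addition for illustration. Per this comment, GCC is then free to narrow the load, but the exact instructions depend on the compiler version and flags:

```c
#include <assert.h>

/* constant_test_bit from the testcase with volatile removed.
 * Like the original, it assumes 32 bits per array element (nr & 31,
 * nr >> 5). Without volatile, GCC may choose the access width, e.g. a
 * single testb on the byte holding the bit. */
static inline __attribute__((always_inline)) int
constant_test_bit(int nr, const unsigned long *addr)
{
        return ((1UL << (nr & 31)) & (addr[nr >> 5])) != 0;
}

/* Usage check: bit 24 set in element 0, bit 0 set in element 1. */
static void
check_constant_test_bit(void)
{
        unsigned long caps[7] = { 0 };

        caps[0] = 1UL << 24;
        caps[1] = 1UL;
        assert(constant_test_bit(24, caps));
        assert(!constant_test_bit(0, caps));
        assert(constant_test_bit(32, caps));
}
```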
--
jakub at gcc dot gnu dot org changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|UNCONFIRMED |RESOLVED
Resolution| |INVALID