public inbox for gcc-bugs@sourceware.org
* [Bug optimization/6585] useless memory store instructions on x86
       [not found] <20020506130600.6585.bruno@clisp.org>
@ 2003-07-04 11:09 ` steven at gcc dot gnu dot org
  2003-07-10 20:09 ` [Bug optimization/6585] Reduntant store/load instruction pairs on ix86 steven at gcc dot gnu dot org
                   ` (8 subsequent siblings)
  9 siblings, 0 replies; 10+ messages in thread
From: steven at gcc dot gnu dot org @ 2003-07-04 11:09 UTC (permalink / raw)
  To: gcc-bugs

PLEASE REPLY TO gcc-bugzilla@gcc.gnu.org ONLY, *NOT* gcc-bugs@gcc.gnu.org.

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=6585


steven at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Last reconfirmed|0000-00-00 00:00:00         |2003-07-04 11:09:34
               date|                            |
   Target Milestone|---                         |3.4


------- Additional Comments From steven at gcc dot gnu dot org  2003-07-04 11:09 -------
AFAICT this looks a little better, but still not optimal:

        .file   "6585.c"
# GNU C version 3.4 20030703 (experimental) (i686-pc-linux-gnu)
#       compiled by GNU C version 3.4 20030703 (experimental).
# GGC heuristics: --param ggc-min-expand=47 --param ggc-min-heapsize=31916
# options passed:  -O3 -fverbose-asm -march=i686 -mtune=i686
# -fomit-frame-pointer
# options enabled:  -feliminate-unused-debug-types -fdefer-pop
# -fomit-frame-pointer -foptimize-sibling-calls -funit-at-a-time
# -fcse-follow-jumps -fcse-skip-blocks -fexpensive-optimizations
# -fthread-jumps -fstrength-reduce -funswitch-loops -fpeephole -fforce-mem
# -ffunction-cse -fkeep-static-consts -fcaller-saves -fpcc-struct-return
# -fgcse -fgcse-lm -fgcse-sm -floop-optimize -fcrossjumping -fif-conversion
# -fif-conversion2 -frerun-cse-after-loop -frerun-loop-opt
# -fdelete-null-pointer-checks -fschedule-insns2 -fsched-interblock
# -fsched-spec -fbranch-count-reg -freorder-blocks -freorder-functions
# -frename-registers -fcprop-registers -fcommon -fverbose-asm -fgnu-linker
# -fregmove -foptimize-register-move -fargument-alias -fstrict-aliasing
# -fmerge-constants -fzero-initialized-in-bss -fident -fpeephole2
# -fguess-branch-probability -fmath-errno -ftrapping-math -m80387
# -mhard-float -mno-soft-float -mieee-fp -mfp-ret-in-387
# -maccumulate-outgoing-args -mno-red-zone -mtls-direct-seg-refs
# -mtune=i686 -march=i686
                                                                       
        .text
        .p2align 4,,15
.globl mul
        .type   mul, @function
mul:
        subl    $20, %esp       #
        movl    32(%esp), %ecx  # load b0 in ecx
        movl    %edi, 12(%esp)  # save %edi
        movl    24(%esp), %eax  # load a0 in eax
        movl    %ebx, 8(%esp)   # save %ebx
        movl    36(%esp), %edi  # load b1 in edi
        movl    %ebp, 16(%esp)  # save %ebp             !!! USELESS
        movl    16(%esp), %ebp  # restore %ebp          !!! USELESS
        mull    %ecx            # %edx:%eax := a0*b0
        movl    %edx, %ebx      # save high part (==%edx) in %ebx
        movl    28(%esp), %edx  # load a1 in %edx
        movl    %eax, (%esp)    # save %eax (now a0*b0 in ebx:(%esp))
        movl    24(%esp), %eax  # load a0 in eax
        imull   %edx, %ecx      # %ecx := a1*b0
        imull   %edi, %eax      # %eax := b1*%eax
        movl    12(%esp), %edi  # restore edi
        addl    %eax, %ebx      # compute final result...
        leal    (%ebx,%ecx), %eax       #...
        movl    8(%esp), %ebx   # restore ebx
        movl    %eax, 4(%esp)   # Why not just: movl %eax,%edx
        movl    4(%esp), %edx   # !!! USELESS
        movl    (%esp), %eax    # reload low part of a0*b0 into %eax
        addl    $20, %esp       #
        ret
        .size   mul, .-mul
        .section        .note.GNU-stack,"",@progbits
        .ident  "GCC: (GNU) 3.4 20030703 (experimental)"

Especially the last "USELESS" is surprising.  Why go through the stack here?!
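
For readers without the test case at hand: the source of 6585.c is not quoted
in this thread, but the register comments (a0, a1, b0, b1, "high part") suggest
that mul returns the low 64 bits of a 64x64-bit product.  A minimal C sketch
under that assumption follows; the exact signature and the helper name
mul_by_parts are hypothetical and not taken from the bug report.

```c
#include <stdint.h>

/* Assumed shape of the test case: a plain 64-bit multiply. */
uint64_t mul(uint64_t a, uint64_t b)
{
    return a * b;
}

/* The same computation spelled out in 32-bit halves, mirroring the
   mull/imull/addl sequence in the listing above. */
uint64_t mul_by_parts(uint64_t a, uint64_t b)
{
    uint32_t a0 = (uint32_t)a, a1 = (uint32_t)(a >> 32);
    uint32_t b0 = (uint32_t)b, b1 = (uint32_t)(b >> 32);

    uint64_t low   = (uint64_t)a0 * b0;       /* mull: %edx:%eax := a0*b0 */
    uint32_t hi    = (uint32_t)(low >> 32);   /* high part, initially in %edx */
    uint32_t cross = a0 * b1 + a1 * b0;       /* the two imull results, mod 2^32 */

    /* Only the low 32 bits of hi + cross survive in the upper half. */
    return ((uint64_t)(uint32_t)(hi + cross) << 32) | (uint32_t)low;
}
```

Seen this way, the function needs only three multiplies and two adds, which is
why every extra store/load pair in the generated code stands out.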



* [Bug optimization/6585] Reduntant store/load instruction pairs on ix86
       [not found] <20020506130600.6585.bruno@clisp.org>
  2003-07-04 11:09 ` [Bug optimization/6585] useless memory store instructions on x86 steven at gcc dot gnu dot org
@ 2003-07-10 20:09 ` steven at gcc dot gnu dot org
  2003-08-23  1:38 ` [Bug optimization/6585] Redundant " dhazeghi at yahoo dot com
                   ` (7 subsequent siblings)
  9 siblings, 0 replies; 10+ messages in thread
From: steven at gcc dot gnu dot org @ 2003-07-10 20:09 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=6585


steven at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|useless memory store        |Reduntant store/load
                   |instructions on x86         |instruction pairs on ix86


------- Additional Comments From steven at gcc dot gnu dot org  2003-07-10 20:09 -------
After looking into this a bit further and consulting with Zdenek, I
believe we should blame this bug on something in the register allocator.  But
then again, the same problem exists with -fnew-ra, so maybe it's something in
the i386 backend (register sets???).  Maybe someone can try a different 32-bit
target and see whether the problem exists there as well.

Anyway, before register allocation it
looks reasonable, but after global reg alloc we get these insns:

(insn 39 18 24 0 (set (mem:SI (plus:SI (reg/f:SI 7 esp)
                (const_int 4 [0x4])) [3 S4 A8])
        (reg:SI 2 ecx)) 39 {*movsi_1_nointernunit} (nil)
    (nil))
 
(note:HI 24 39 27 0 NOTE_INSN_FUNCTION_END)
 
(insn:HI 27 24 30 0 (set (reg/i:DI 0 eax [ <result> ])
        (mem:DI (reg/f:SI 7 esp) [3 S8 A8])) 60 {*movdi_2} (insn_list 18 (nil))
    (nil))

%ecx is put in the stack and subsequently loaded into %edx when the
result insn is expanded.  (After some black magic apparently %ecx is
replaced with %ebx post reload...)
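
The data flow described above can be modelled in C.  This is only an
illustration, not GCC code: the names via_stack_slot and in_registers are made
up for this sketch, and the memcpy version assumes a little-endian target such
as ix86.

```c
#include <stdint.h>
#include <string.h>

/* Models the allocated code: both halves are stored to adjacent stack
   words, then the DImode result is loaded back from that slot. */
uint64_t via_stack_slot(uint32_t lo, uint32_t hi)
{
    uint32_t slot[2];
    uint64_t result;

    slot[0] = lo;                          /* store low half at (%esp)   */
    slot[1] = hi;                          /* insn 39: store at 4(%esp)  */
    memcpy(&result, slot, sizeof result);  /* insn 27: DImode load       */
    return result;
}

/* What one would hope for: the high half is placed directly in the
   register that returns it, with no trip through memory. */
uint64_t in_registers(uint32_t lo, uint32_t hi)
{
    return ((uint64_t)hi << 32) | lo;
}
```

Both functions return the same value on a little-endian machine; the first
simply pays for a store and a load per half.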



* [Bug optimization/6585] Redundant store/load instruction pairs on ix86
       [not found] <20020506130600.6585.bruno@clisp.org>
  2003-07-04 11:09 ` [Bug optimization/6585] useless memory store instructions on x86 steven at gcc dot gnu dot org
  2003-07-10 20:09 ` [Bug optimization/6585] Reduntant store/load instruction pairs on ix86 steven at gcc dot gnu dot org
@ 2003-08-23  1:38 ` dhazeghi at yahoo dot com
  2003-11-25  7:51 ` pinskia at gcc dot gnu dot org
                   ` (6 subsequent siblings)
  9 siblings, 0 replies; 10+ messages in thread
From: dhazeghi at yahoo dot com @ 2003-08-23  1:38 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=6585


dhazeghi at yahoo dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|3.4                         |---



* [Bug optimization/6585] Redundant store/load instruction pairs on ix86
       [not found] <20020506130600.6585.bruno@clisp.org>
                   ` (2 preceding siblings ...)
  2003-08-23  1:38 ` [Bug optimization/6585] Redundant " dhazeghi at yahoo dot com
@ 2003-11-25  7:51 ` pinskia at gcc dot gnu dot org
  2004-03-17  8:03 ` kazu at cs dot umass dot edu
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 10+ messages in thread
From: pinskia at gcc dot gnu dot org @ 2003-11-25  7:51 UTC (permalink / raw)
  To: gcc-bugs



-- 
           What    |Removed                     |Added
----------------------------------------------------------------------------
           Severity|normal                      |enhancement
   Last reconfirmed|2003-07-04 11:09:34         |2003-11-25 07:51:11
               date|                            |


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=6585



* [Bug optimization/6585] Redundant store/load instruction pairs on ix86
       [not found] <20020506130600.6585.bruno@clisp.org>
                   ` (3 preceding siblings ...)
  2003-11-25  7:51 ` pinskia at gcc dot gnu dot org
@ 2004-03-17  8:03 ` kazu at cs dot umass dot edu
  2004-03-17 12:10 ` bruno at clisp dot org
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 10+ messages in thread
From: kazu at cs dot umass dot edu @ 2004-03-17  8:03 UTC (permalink / raw)
  To: gcc-bugs


------- Additional Comments From kazu at cs dot umass dot edu  2004-03-17 08:03 -------
With -O3 -fnew-ra -march=i686 -mtune=i686 -fomit-frame-pointer, I get:

mul:
	subl	$20, %esp
	movl	%esi, 12(%esp)	; save %esi
	movl	24(%esp), %eax
	movl	32(%esp), %esi
	movl	28(%esp), %ecx
	movl	%edi, 16(%esp)	; save %edi
	movl	36(%esp), %edi
	movl	%eax, (%esp)
	mull	%esi
	movl	%ecx, 4(%esp)
	movl	%ebx, 8(%esp)
	movl	%eax, %ecx
	movl	(%esp), %eax
	imull	%eax, %edi
	movl	4(%esp), %eax
	imull	%esi, %eax
	movl	12(%esp), %esi	; restore %esi
	leal	(%edx,%edi), %edi
	leal	(%edi,%eax), %ebx
	movl	16(%esp), %edi	; restore %edi
	movl	%ecx, %eax
	movl	%ebx, %edx
	movl	8(%esp), %ebx	; restore %ebx
	addl	$20, %esp
	ret

Note that those "why not" instructions are gone.
"USELESS" instructions are gone even without -fnew-ra as of today's mainline.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=6585



* [Bug optimization/6585] Redundant store/load instruction pairs on ix86
       [not found] <20020506130600.6585.bruno@clisp.org>
                   ` (4 preceding siblings ...)
  2004-03-17  8:03 ` kazu at cs dot umass dot edu
@ 2004-03-17 12:10 ` bruno at clisp dot org
  2004-10-11  2:57 ` [Bug rtl-optimization/6585] " pinskia at gcc dot gnu dot org
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 10+ messages in thread
From: bruno at clisp dot org @ 2004-03-17 12:10 UTC (permalink / raw)
  To: gcc-bugs


------- Additional Comments From bruno at clisp dot org  2004-03-17 12:10 -------
This result is much better, indeed. Still, there is room for two more optimizations:

1) The final "movl %ebx, %edx" instruction could be removed if the preceding lea
instruction wrote its result into %edx instead of %ebx.

2) Storing a0 and a1 in temporary stack locations is useless, since a0 and a1
already come from the stack and data-flow analysis could find out that they are
read-only.
 
Commented listing: 
 
mul: 
	subl	$20, %esp 
	movl	%esi, 12(%esp)				; save %esi 
	movl	24(%esp), %eax		a0 
	movl	32(%esp), %esi		b0 
	movl	28(%esp), %ecx		a1 
	movl	%edi, 16(%esp)				; save %edi 
	movl	36(%esp), %edi		b1 
	movl	%eax, (%esp)				USELESS! Use 24(%esp) instead 
	mull	%esi			%edx:%eax := a0*b0 
	movl	%ecx, 4(%esp)				USELESS! Use 28(%esp) instead 
	movl	%ebx, 8(%esp)				; save %ebx 
	movl	%eax, %ecx 
	movl	(%esp), %eax 
	imull	%eax, %edi		a0*b1 
	movl	4(%esp), %eax 
	imull	%esi, %eax		a1*b0 
	movl	12(%esp), %esi				; restore %esi 
	leal	(%edx,%edi), %edi	hi+a0*b1 
	leal	(%edi,%eax), %ebx	hi+a0*b1+a1*b0	COULD BE SIMPLIFIED 
	movl	16(%esp), %edi				; restore %edi 
	movl	%ecx, %eax		lo 
	movl	%ebx, %edx				USELESS! 
	movl	8(%esp), %ebx				; restore %ebx 
	addl	$20, %esp 
	ret 
 

-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=6585



* [Bug rtl-optimization/6585] Redundant store/load instruction pairs on ix86
       [not found] <20020506130600.6585.bruno@clisp.org>
                   ` (5 preceding siblings ...)
  2004-03-17 12:10 ` bruno at clisp dot org
@ 2004-10-11  2:57 ` pinskia at gcc dot gnu dot org
  2004-10-11 11:55 ` bruno at clisp dot org
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 10+ messages in thread
From: pinskia at gcc dot gnu dot org @ 2004-10-11  2:57 UTC (permalink / raw)
  To: gcc-bugs


------- Additional Comments From pinskia at gcc dot gnu dot org  2004-10-11 02:57 -------
Here is the latest asm from the mainline:
mul:
        subl    $20, %esp
        movl    32(%esp), %ecx
        movl    24(%esp), %eax
        movl    %ebx, 8(%esp)
        movl    36(%esp), %ebx
        movl    %edi, 12(%esp)
        movl    24(%esp), %edi
        movl    %ebp, 16(%esp)
        mull    %ecx
        imull   28(%esp), %ecx
        imull   %ebx, %edi
        movl    8(%esp), %ebx
        movl    %edx, %ebp
        movl    %eax, (%esp)
        addl    %edi, %ebp
        movl    (%esp), %eax
        leal    (%ebp,%ecx), %ecx
        movl    12(%esp), %edi
        movl    %ecx, 4(%esp)
        movl    16(%esp), %ebp
        movl    4(%esp), %edx
        addl    $20, %esp
        ret

-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=6585



* [Bug rtl-optimization/6585] Redundant store/load instruction pairs on ix86
       [not found] <20020506130600.6585.bruno@clisp.org>
                   ` (6 preceding siblings ...)
  2004-10-11  2:57 ` [Bug rtl-optimization/6585] " pinskia at gcc dot gnu dot org
@ 2004-10-11 11:55 ` bruno at clisp dot org
  2005-06-26 13:35 ` steven at gcc dot gnu dot org
  2005-06-27 11:50 ` bruno at clisp dot org
  9 siblings, 0 replies; 10+ messages in thread
From: bruno at clisp dot org @ 2004-10-11 11:55 UTC (permalink / raw)
  To: gcc-bugs


------- Additional Comments From bruno at clisp dot org  2004-10-11 11:55 -------
This result is even better: shorter than the previous ones, and there are 
no useless moves between registers any more. 
 
However, there are more useless moves from register to stack slot and back 
from stack slot to register. They could be eliminated. 
 
Commented listing: 
 
mul: 
        subl    $20, %esp 
        movl    32(%esp), %ecx		b0 
        movl    24(%esp), %eax		a0 
        movl    %ebx, 8(%esp)				; save %ebx 
        movl    36(%esp), %ebx		b1 
        movl    %edi, 12(%esp)				; save %edi 
        movl    24(%esp), %edi		a0 
        movl    %ebp, 16(%esp)				; save %ebp 
        mull    %ecx			%edx:%eax := a0*b0 
        imull   28(%esp), %ecx		a1*b0 
        imull   %ebx, %edi		a0*b1 
        movl    8(%esp), %ebx				; restore %ebx 
        movl    %edx, %ebp		hi 
        movl    %eax, (%esp)				USELESS! 
        addl    %edi, %ebp		hi+a0*b1 
        movl    (%esp), %eax				USELESS! 
        leal    (%ebp,%ecx), %ecx	hi+a0*b1+a1*b0	COULD GO INTO %edx DIRECTLY
        movl    12(%esp), %edi				; restore %edi 
        movl    %ecx, 4(%esp)				USELESS! 
        movl    16(%esp), %ebp				; restore %ebp 
        movl    4(%esp), %edx				USELESS! 
        addl    $20, %esp 
        ret 
 

-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=6585



* [Bug rtl-optimization/6585] Redundant store/load instruction pairs on ix86
       [not found] <20020506130600.6585.bruno@clisp.org>
                   ` (7 preceding siblings ...)
  2004-10-11 11:55 ` bruno at clisp dot org
@ 2005-06-26 13:35 ` steven at gcc dot gnu dot org
  2005-06-27 11:50 ` bruno at clisp dot org
  9 siblings, 0 replies; 10+ messages in thread
From: steven at gcc dot gnu dot org @ 2005-06-26 13:35 UTC (permalink / raw)
  To: gcc-bugs


------- Additional Comments From steven at gcc dot gnu dot org  2005-06-26 13:35 -------
Today's results ("-O2 -m32 -march=i686 -mtune=i686 -fomit-frame-pointer"): 
 
        .file   "t.c" 
        .text 
        .p2align 4,,15 
.globl mul 
        .type   mul, @function 
mul: 
        subl    $12, %esp               # get space to save three registers 
        movl    %ebx, (%esp)            # save %ebx 
        movl    20(%esp), %edx          # %edx <- a1  COULD GO INTO %esi 
        movl    28(%esp), %ebx          # %ebx <- b1  COULD GO INTO %edi 
        movl    16(%esp), %eax          # %eax <- a0 
        movl    24(%esp), %ecx          # %ecx <- b0 
        movl    %esi, 4(%esp)           # save %esi 
        movl    %edx, %esi              # %esi <- a1 
        movl    %edi, 8(%esp)           # save %edi 
        movl    %ebx, %edi              # %edi <- b1 
        movl    (%esp), %ebx            # restore %ebx 
        imull   %eax, %edi              # %edi <- a0*b1 
        imull   %ecx, %esi              # %esi <- b0*a1 
        mull    %ecx                    # %edx:%eax := a0*b0 
        addl    %edi, %esi              # %esi <- a0*b1 + b0*a1 
        movl    8(%esp), %edi           # restore %edi 
        leal    (%esi,%edx), %edx       # %edx <- a0*b1 + b0*a1 + hi(a0*b0) 
        movl    4(%esp), %esi           # restore %esi
        addl    $12, %esp               # free stack space 
        ret                             # return result in %edx:%eax 
        .size   mul, .-mul 
        .ident  "GCC: (GNU) 4.1.0 20050626 (experimental)" 
        .section        .note.GNU-stack,"",@progbits 
 
The moves through %ebx and %edx are still questionable, but the code is
better than before.
 

-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=6585



* [Bug rtl-optimization/6585] Redundant store/load instruction pairs on ix86
       [not found] <20020506130600.6585.bruno@clisp.org>
                   ` (8 preceding siblings ...)
  2005-06-26 13:35 ` steven at gcc dot gnu dot org
@ 2005-06-27 11:50 ` bruno at clisp dot org
  9 siblings, 0 replies; 10+ messages in thread
From: bruno at clisp dot org @ 2005-06-27 11:50 UTC (permalink / raw)
  To: gcc-bugs


------- Additional Comments From bruno at clisp dot org  2005-06-27 11:50 -------
Indeed, the result is much better now, nearly optimal. 
As you say, the only further optimization possible is that a better 
register allocation could get rid of the 
      movl    %edx, %esi 
and 
      movl    %ebx, %edi 
instructions. 
 

-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=6585


