public inbox for gcc-bugs@sourceware.org
* [Bug tree-optimization/18557] Inefficient code generated by -ftree-vectorize on Alpha
       [not found] <bug-18557-4@http.gcc.gnu.org/bugzilla/>
@ 2012-07-13  8:54 ` rguenth at gcc dot gnu.org
  2012-07-13 13:53 ` rguenth at gcc dot gnu.org
  1 sibling, 0 replies; 13+ messages in thread
From: rguenth at gcc dot gnu.org @ 2012-07-13  8:54 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18557

Richard Guenther <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Blocks|                            |53947

--- Comment #12 from Richard Guenther <rguenth at gcc dot gnu.org> 2012-07-13 08:53:21 UTC ---
Link to vectorizer missed-optimization meta-bug.



* [Bug tree-optimization/18557] Inefficient code generated by -ftree-vectorize on Alpha
       [not found] <bug-18557-4@http.gcc.gnu.org/bugzilla/>
  2012-07-13  8:54 ` [Bug tree-optimization/18557] Inefficient code generated by -ftree-vectorize on Alpha rguenth at gcc dot gnu.org
@ 2012-07-13 13:53 ` rguenth at gcc dot gnu.org
  1 sibling, 0 replies; 13+ messages in thread
From: rguenth at gcc dot gnu.org @ 2012-07-13 13:53 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18557

Richard Guenther <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |DUPLICATE

--- Comment #13 from Richard Guenther <rguenth at gcc dot gnu.org> 2012-07-13 13:52:25 UTC ---
At -O3 we now get calls to memset for the original testcase.  Changing it to
store 1 instead, we get

$f..ng:
f:
        .frame $30,0,$26,0
        .prologue 0
        and $16,4,$1
        cmpult $31,$1,$1
        lda $7,64($31)
        addl $31,$1,$2
        mov $31,$22
        beq $2,$L2
        lda $7,63($31)
        lda $22,1($31)
        lda $3,1($31)
        stl $3,0($16)
$L2:
        lda $3,64($31)
        subl $3,$2,$2
        zapnot $2,15,$8
        lda $5,1($31)
        srl $8,1,$6
        sll $5,32,$5
        addl $6,$6,$2
        s4addq $1,$16,$1
        mov $31,$3
        zapnot $6,15,$6
        lda $5,1($5)
        .align 4
$L6:
        addl $3,1,$3
        stq $5,0($1)
        zapnot $3,15,$4
        lda $1,8($1)
        cmpult $4,$6,$4
        bne $4,$L6
        zapnot $2,15,$3
        addl $22,$2,$1
        cmpeq $8,$3,$8
        cpys $f31,$f31,$f31
        subl $7,$2,$2
        bne $8,$L8
        s4addq $1,0,$1
        lda $4,1($31)
        .align 4
$L5:
        addq $16,$1,$3
        subl $2,1,$2
        stl $4,0($3)
        lda $1,4($1)
        bne $2,$L5
$L8:
        ret $31,($26),1
        .end f

which seems to be reasonable.  We still run into the issue that we do
not recognize that the epilogue loop may at most iterate once.  The
vectorizer makes a mess out of induction variables for the prologue/epilogue
loops.  See PR53355 for where I track this general issue.
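
To illustrate why the epilogue loop can iterate at most once, here is a
minimal sketch of the iteration split that the listing above appears to
compute.  It assumes 4-byte elements, 8-byte (two-element) vector stores and
a pointer that is at least 4-byte aligned; the names peel_counts and
split_iterations are made up for this illustration and do not come from GCC.

```c
#include <stdint.h>

/* Iteration counts of the three loops in the peeled code:
   scalar prologue, 8-byte vector loop, scalar epilogue. */
struct peel_counts {
    unsigned prologue;  /* scalar stores needed to reach 8-byte alignment */
    unsigned vector;    /* number of 8-byte stores (the stq loop) */
    unsigned epilogue;  /* scalar stores left over after the vector loop */
};

/* Hypothetical helper: split n 4-byte stores starting at address addr.
   addr is assumed to be a multiple of 4. */
struct peel_counts split_iterations(uintptr_t addr, unsigned n)
{
    struct peel_counts c;

    /* Mirrors "and $16,4,$1 ; cmpult $31,$1,$1": 1 if misaligned, else 0. */
    c.prologue = (addr & 4) != 0;
    if (c.prologue > n)
        c.prologue = n;

    unsigned rest = n - c.prologue;

    /* Mirrors "srl $8,1,$6": two elements per vector store. */
    c.vector = rest / 2;

    /* Remainder of a division by 2, so it is always 0 or 1. */
    c.epilogue = rest - 2 * c.vector;
    return c;
}
```

For n = 64 an aligned pointer gives 0/32/0 and a misaligned one gives 1/31/1,
so under these assumptions the $L5 loop never needs more than one trip.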

*** This bug has been marked as a duplicate of bug 53355 ***



* [Bug tree-optimization/18557] Inefficient code generated by -ftree-vectorize on Alpha
       [not found] <bug-18557-2744@http.gcc.gnu.org/bugzilla/>
                   ` (2 preceding siblings ...)
  2009-02-03 12:31 ` ubizjak at gmail dot com
@ 2009-02-03 12:51 ` falk at debian dot org
  3 siblings, 0 replies; 13+ messages in thread
From: falk at debian dot org @ 2009-02-03 12:51 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #11 from falk at debian dot org  2009-02-03 12:50 -------
(In reply to comment #10)
> By changing the test to:
> 
> unsigned int p[64];

In this case 8-byte alignment is guaranteed, so no peeling is needed.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18557



* [Bug tree-optimization/18557] Inefficient code generated by -ftree-vectorize on Alpha
       [not found] <bug-18557-2744@http.gcc.gnu.org/bugzilla/>
  2009-02-03 12:24 ` ubizjak at gmail dot com
  2009-02-03 12:25 ` ubizjak at gmail dot com
@ 2009-02-03 12:31 ` ubizjak at gmail dot com
  2009-02-03 12:51 ` falk at debian dot org
  3 siblings, 0 replies; 13+ messages in thread
From: ubizjak at gmail dot com @ 2009-02-03 12:31 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #10 from ubizjak at gmail dot com  2009-02-03 12:31 -------
By changing the test to:

--cut here--
unsigned int p[64];

int f(void) {
    for (int i = 0; i < 64; ++i)
        p[i] = 0;
}
--cut here--

gcc -O2 -ftree-vectorize -mcpu=ev67 -std=c99

f:
        .frame $30,0,$26,0
        ldgp $29,0($27)  # 63   *prologue_ldgp_1        [length = 4]
$f..ng:
        .prologue 1
        lda $1,p         # 37   *movdi_fix/4    [length = 4]
        lda $3,256($1)   # 38   *adddi_internal/2       [length = 4]
        .align 4
$L2:
        stq $31,0($1)    # 40   *movv2si_fix/4  [length = 4]
        lda $1,8($1)     # 41   *adddi_internal/2       [length = 4]
        cmpeq $1,$3,$2   # 43   *setcc_internal [length = 4]
        beq $2,$L2       # 44   *bcc_normal     [length = 4]
        ret $31,($26),1  # 67   *return_internal        [length = 4]
        .end f
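
For readers who do not read Alpha assembly, the loop above can be modelled
in C roughly as follows.  This is only a sketch: the function name
zero_p_by_quadwords and its return value are invented for illustration,
_Alignas (C11) spells out the 8-byte alignment that the compiler already
knows for the global, and memcpy stands in for the stq store so that the
model stays free of aliasing problems.

```c
#include <stdint.h>
#include <string.h>

/* The global from the test case; the explicit alignment is an assumption
   made here so the model does not depend on the target ABI. */
_Alignas(8) unsigned int p[64];

/* Zero p with one 8-byte store per iteration and return the number of
   stores performed.  No prologue or epilogue loop is needed because the
   start is 8-byte aligned and the size is a multiple of 8. */
unsigned zero_p_by_quadwords(void)
{
    const uint64_t zero = 0;
    unsigned char *cur = (unsigned char *)p;   /* lda $1,p */
    unsigned char *end = cur + sizeof p;       /* lda $3,256($1) when int is 4 bytes */
    unsigned stores = 0;

    do {
        memcpy(cur, &zero, sizeof zero);       /* stq $31,0($1) */
        cur += sizeof zero;                    /* lda $1,8($1) */
        ++stores;
    } while (cur != end);                      /* cmpeq $1,$3,$2 ; beq $2,$L2 */

    return stores;
}
```

With a 4-byte unsigned int the loop runs sizeof p / 8 = 32 times, which
matches the 256-byte end pointer in the listing.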


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18557



* [Bug tree-optimization/18557] Inefficient code generated by -ftree-vectorize on Alpha
       [not found] <bug-18557-2744@http.gcc.gnu.org/bugzilla/>
  2009-02-03 12:24 ` ubizjak at gmail dot com
@ 2009-02-03 12:25 ` ubizjak at gmail dot com
  2009-02-03 12:31 ` ubizjak at gmail dot com
  2009-02-03 12:51 ` falk at debian dot org
  3 siblings, 0 replies; 13+ messages in thread
From: ubizjak at gmail dot com @ 2009-02-03 12:25 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #9 from ubizjak at gmail dot com  2009-02-03 12:25 -------
Together with the patch for PR 8603:

f:
        .frame $30,0,$26,0
        .prologue 0
        and $16,4,$4
        lda $1,64($31)
        bis $31,$31,$2
        cmpult $31,$4,$4
        beq $4,$L3
        stl $31,0($16)
        lda $1,63($31)
        lda $2,1($31)
$L3:
        lda $8,64($31)
        subl $8,$4,$8
        zapnot $8,15,$8
        srl $8,1,$6
        addl $6,$6,$7
        beq $7,$L4
        s4addq $4,$16,$4
        zapnot $6,15,$6
        bis $31,$31,$3
        .align 4
$L5:
        addl $3,1,$3
        stq $31,0($4)
        lda $4,8($4)
        zapnot $3,15,$5
        cmpult $5,$6,$5
        bne $5,$L5
        zapnot $7,15,$3
        addl $2,$7,$2
        subl $1,$7,$1
        cmpeq $8,$3,$8
        bne $8,$L9
$L4:
        s4addq $2,$16,$2
        .align 4
$L7:
        stl $31,0($2)
        subl $1,1,$1
        lda $2,4($2)
        bne $1,$L7
$L9:
        ret $31,($26),1


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18557



* [Bug tree-optimization/18557] Inefficient code generated by -ftree-vectorize on Alpha
       [not found] <bug-18557-2744@http.gcc.gnu.org/bugzilla/>
@ 2009-02-03 12:24 ` ubizjak at gmail dot com
  2009-02-03 12:25 ` ubizjak at gmail dot com
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 13+ messages in thread
From: ubizjak at gmail dot com @ 2009-02-03 12:24 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #8 from ubizjak at gmail dot com  2009-02-03 12:24 -------
Current trunk produces:

f:
        .frame $30,0,$26,0
        .prologue 0
        and $16,4,$3
        lda $6,64($31)
        bis $31,$31,$2
        cmpult $31,$3,$3
        beq $3,$L3
        stl $31,0($16)
        lda $6,63($31)
        lda $2,1($31)
$L3:
        lda $8,64($31)
        subq $8,$3,$8
        zapnot $8,15,$8
        srl $8,1,$5
        addl $5,$5,$7
        beq $7,$L4
        s4addq $3,$16,$3
        zapnot $5,15,$5
        bis $31,$31,$1
        .align 4
$L5:
        addl $1,1,$1
        stq $31,0($3)
        lda $3,8($3)
        zapnot $1,15,$4
        cmpult $4,$5,$4
        bne $4,$L5
        zapnot $7,15,$1
        addl $2,$7,$2
        subl $6,$7,$6
        cmpeq $8,$1,$8
        bne $8,$L9
$L4:
        lda $4,-1($6)
        s4addq $2,$16,$2
        bis $31,$31,$1
        zapnot $4,15,$4
        s4addq $4,4,$4
        .align 4
$L7:
        lda $1,4($1)
        stl $31,0($2)
        lda $2,4($2)
        cmpeq $1,$4,$3
        beq $3,$L7
$L9:
        ret $31,($26),1


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18557



* [Bug tree-optimization/18557] Inefficient code generated by -ftree-vectorize on Alpha
  2004-11-18 23:54 [Bug tree-optimization/18557] New: " falk at debian dot org
                   ` (5 preceding siblings ...)
  2004-11-19 18:24 ` dorit at il dot ibm dot com
@ 2004-11-19 20:16 ` falk at debian dot org
  6 siblings, 0 replies; 13+ messages in thread
From: falk at debian dot org @ 2004-11-19 20:16 UTC (permalink / raw)
  To: gcc-bugs


------- Additional Comments From falk at debian dot org  2004-11-19 20:16 -------
(In reply to comment #6)
> I expect these would go away with this patch:
> http://gcc.gnu.org/ml/gcc-patches/2004-11/msg01512.html

Not quite. Code looks like this:

f:
	and $16,4,$1
	mov $31,$7
	lda $6,64($31)
	cmpult $31,$1,$1
	cmpeq $1,0,$8
	lda $8,1($8)
	zapnot $8,15,$5
	beq $5,$L4
	mov $31,$3
	mov $31,$4
	.align 4
$L12:
	lda $3,1($3)
	s4addq $4,$16,$2
	addl $31,$3,$4
	stl $31,0($2)
	zapnot $4,15,$1
	cmpule $5,$1,$1
	beq $1,$L12
	lda $1,64($31)
	addl $31,$4,$7
	subl $1,$4,$6
$L4:
	cmpeq $5,64,$1
	bne $1,$L15
	lda $1,64($31)
	subq $1,$8,$1
	zapnot $1,15,$23
	srl $23,1,$1
	addl $1,$1,$8
	zapnot $8,15,$22
	beq $22,$L8
	s4addq $5,$16,$2
	zapnot $1,15,$4
	mov $31,$3
	.align 4
$L10:
	lda $3,1($3)
	stq $31,0($2)
	lda $2,8($2)
	zapnot $3,15,$1
	cmpule $4,$1,$1
	beq $1,$L10
	subl $6,$8,$6
	addl $7,$8,$7
$L8:
	cmpeq $23,$22,$1
	bne $1,$L15
	mov $31,$2
	.align 4
$L14:
	addl $2,$7,$1
	subl $6,1,$6
	lda $2,1($2)
	s4addq $1,$16,$1
	stl $31,0($1)
	bne $6,$L14
$L15:
	ret

The first branch can never be taken, and all sign extensions (sextl) and zero
extensions (zapnot 15) are useless.
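
The claim about the first branch can be checked by replaying the
instructions that feed "beq $5,$L4" in C.  This is a sketch based on my
reading of the listing above; the function name first_branch_operand is
invented for the illustration.

```c
#include <stdint.h>

/* Value held in $5 when "beq $5,$L4" is reached, given the pointer
   argument in $16.  Each line mirrors one instruction of the listing. */
uint64_t first_branch_operand(uint64_t r16)
{
    uint64_t r1 = r16 & 4;            /* and    $16,4,$1   */
    r1 = (0 < r1);                    /* cmpult $31,$1,$1  */
    uint64_t r8 = (r1 == 0);          /* cmpeq  $1,0,$8    */
    r8 = r8 + 1;                      /* lda    $8,1($8)   */
    uint64_t r5 = r8 & 0xffffffffu;   /* zapnot $8,15,$5: keep the low 4 bytes */
    return r5;
}
```

The result is 2 when bit 2 of the address is clear and 1 when it is set, so
it is never zero and the beq cannot be taken.  Since the value already fits
in 32 bits, the zapnot does not change it either.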

-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18557



* [Bug tree-optimization/18557] Inefficient code generated by -ftree-vectorize on Alpha
  2004-11-18 23:54 [Bug tree-optimization/18557] New: " falk at debian dot org
                   ` (4 preceding siblings ...)
  2004-11-19 14:21 ` pinskia at gcc dot gnu dot org
@ 2004-11-19 18:24 ` dorit at il dot ibm dot com
  2004-11-19 20:16 ` falk at debian dot org
  6 siblings, 0 replies; 13+ messages in thread
From: dorit at il dot ibm dot com @ 2004-11-19 18:24 UTC (permalink / raw)
  To: gcc-bugs


------- Additional Comments From dorit at il dot ibm dot com  2004-11-19 18:24 -------
(In reply to comment #5)
> (In reply to comment #2)
> > Subject: Re:  Inefficient code generated by
> >         -ftree-vectorize on Alpha
> > On Fri, 2004-11-19 at 00:04 +0000, pinskia at gcc dot gnu dot org wrote:
> > > ------- Additional Comments From pinskia at gcc dot gnu dot org  2004-11-19 00:04 -------
> > > Confirmed.
> > > One issue is that we don't fold stuff:
> > >   D.1061 = 8 - 1;
> > >   D.1065 = 2 - 1;
> > 
I expect these would go away with this patch:
http://gcc.gnu.org/ml/gcc-patches/2004-11/msg01512.html



-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18557



* [Bug tree-optimization/18557] Inefficient code generated by -ftree-vectorize on Alpha
  2004-11-18 23:54 [Bug tree-optimization/18557] New: " falk at debian dot org
                   ` (3 preceding siblings ...)
  2004-11-19 11:29 ` falk at debian dot org
@ 2004-11-19 14:21 ` pinskia at gcc dot gnu dot org
  2004-11-19 18:24 ` dorit at il dot ibm dot com
  2004-11-19 20:16 ` falk at debian dot org
  6 siblings, 0 replies; 13+ messages in thread
From: pinskia at gcc dot gnu dot org @ 2004-11-19 14:21 UTC (permalink / raw)
  To: gcc-bugs


------- Additional Comments From pinskia at gcc dot gnu dot org  2004-11-19 14:21 -------
(In reply to comment #2)
> Subject: Re:  Inefficient code generated by
>         -ftree-vectorize on Alpha
> On Fri, 2004-11-19 at 00:04 +0000, pinskia at gcc dot gnu dot org wrote:
> > ------- Additional Comments From pinskia at gcc dot gnu dot org  2004-11-19 00:04 -------
> > Confirmed.
> > One issue is that we don't fold stuff:
> >   D.1061 = 8 - 1;
> >   D.1065 = 2 - 1;
> 
> We correctly rely on CCP to clean it up :)
> 
> Having every single pass fold every statement it generates isn't
> necessarily a win when CCP will do it for you :)

This is how the vectorizer produced the statement in the first place.

-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18557



* [Bug tree-optimization/18557] Inefficient code generated by -ftree-vectorize on Alpha
  2004-11-18 23:54 [Bug tree-optimization/18557] New: " falk at debian dot org
                   ` (2 preceding siblings ...)
  2004-11-19 10:50 ` giovannibajo at libero dot it
@ 2004-11-19 11:29 ` falk at debian dot org
  2004-11-19 14:21 ` pinskia at gcc dot gnu dot org
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 13+ messages in thread
From: falk at debian dot org @ 2004-11-19 11:29 UTC (permalink / raw)
  To: gcc-bugs


------- Additional Comments From falk at debian dot org  2004-11-19 11:29 -------
(In reply to comment #3)
> Can we get some numbers to understand how much worse we are behaving?

The code size is inflated by a factor of about 3. The run-time difference
depends a lot on how many bytes are actually copied, how predictable the
branches are, etc. If I run the test case on the same data every time, which
is about the best possible case, the vectorized code is 5% slower than f2.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18557



* [Bug tree-optimization/18557] Inefficient code generated by -ftree-vectorize on Alpha
  2004-11-18 23:54 [Bug tree-optimization/18557] New: " falk at debian dot org
  2004-11-19  0:04 ` [Bug tree-optimization/18557] " pinskia at gcc dot gnu dot org
  2004-11-19  3:42 ` dberlin at dberlin dot org
@ 2004-11-19 10:50 ` giovannibajo at libero dot it
  2004-11-19 11:29 ` falk at debian dot org
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 13+ messages in thread
From: giovannibajo at libero dot it @ 2004-11-19 10:50 UTC (permalink / raw)
  To: gcc-bugs


------- Additional Comments From giovannibajo at libero dot it  2004-11-19 10:50 -------
Can we get some numbers to understand how much worse we are behaving?

-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18557



* [Bug tree-optimization/18557] Inefficient code generated by -ftree-vectorize on Alpha
  2004-11-18 23:54 [Bug tree-optimization/18557] New: " falk at debian dot org
  2004-11-19  0:04 ` [Bug tree-optimization/18557] " pinskia at gcc dot gnu dot org
@ 2004-11-19  3:42 ` dberlin at dberlin dot org
  2004-11-19 10:50 ` giovannibajo at libero dot it
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 13+ messages in thread
From: dberlin at dberlin dot org @ 2004-11-19  3:42 UTC (permalink / raw)
  To: gcc-bugs


------- Additional Comments From dberlin at dberlin dot org  2004-11-19 03:42 -------
Subject: Re:  Inefficient code generated by
	-ftree-vectorize on Alpha

On Fri, 2004-11-19 at 00:04 +0000, pinskia at gcc dot gnu dot org wrote:
> ------- Additional Comments From pinskia at gcc dot gnu dot org  2004-11-19 00:04 -------
> Confirmed.
> One issue is that we don't fold stuff:
>   D.1061 = 8 - 1;
>   D.1065 = 2 - 1;

We correctly rely on CCP to clean it up :)

Having every single pass fold every statement it generates isn't
necessarily a win when CCP will do it for you :)
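
To make concrete what "folding" a statement such as D.1061 = 8 - 1 means,
here is a toy sketch.  It is not GCC's CCP pass or its fold routines; the
types and the function fold_constant are hypothetical and only show the
idea of replacing an operation on two constants by its value.

```c
#include <stdbool.h>

/* Toy statement form: result = lhs OP rhs. */
enum op { OP_ADD, OP_SUB, OP_MUL };

struct operand {
    bool is_const;  /* true if the operand is a literal constant */
    long value;     /* meaningful only when is_const is true */
};

struct stmt {
    enum op op;
    struct operand lhs, rhs;
};

/* If both operands are constants, compute the value at "compile time",
   store it in *result and return true; otherwise leave *result alone.
   Overflow handling is ignored in this sketch. */
bool fold_constant(const struct stmt *s, long *result)
{
    if (!s->lhs.is_const || !s->rhs.is_const)
        return false;

    switch (s->op) {
    case OP_ADD: *result = s->lhs.value + s->rhs.value; return true;
    case OP_SUB: *result = s->lhs.value - s->rhs.value; return true;
    case OP_MUL: *result = s->lhs.value * s->rhs.value; return true;
    }
    return false;
}
```

Applied to the two statements quoted above, such a step would turn 8 - 1
into 7 and 2 - 1 into 1, whichever pass ends up doing it.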



-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18557



* [Bug tree-optimization/18557] Inefficient code generated by -ftree-vectorize on Alpha
  2004-11-18 23:54 [Bug tree-optimization/18557] New: " falk at debian dot org
@ 2004-11-19  0:04 ` pinskia at gcc dot gnu dot org
  2004-11-19  3:42 ` dberlin at dberlin dot org
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 13+ messages in thread
From: pinskia at gcc dot gnu dot org @ 2004-11-19  0:04 UTC (permalink / raw)
  To: gcc-bugs


------- Additional Comments From pinskia at gcc dot gnu dot org  2004-11-19 00:04 -------
Confirmed.
One issue is that we don't fold stuff:
  D.1061 = 8 - 1;
  D.1065 = 2 - 1;

-- 
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
     Ever Confirmed|                            |1
   Last reconfirmed|0000-00-00 00:00:00         |2004-11-19 00:04:13
               date|                            |


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18557


