public inbox for gcc-bugs@sourceware.org
* [Bug tree-optimization/54116] New: suboptimal code for tight loops
@ 2012-07-29  9:09 neleai at seznam dot cz
  2012-07-29 10:13 ` [Bug tree-optimization/54116] " pinskia at gcc dot gnu.org
                   ` (2 more replies)
  0 siblings, 3 replies; 4+ messages in thread
From: neleai at seznam dot cz @ 2012-07-29  9:09 UTC
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54116

             Bug #: 54116
           Summary: suboptimal code for tight loops
    Classification: Unclassified
           Product: gcc
           Version: 4.7.1
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
        AssignedTo: unassigned@gcc.gnu.org
        ReportedBy: neleai@seznam.cz


Consider the following loop.

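int recal(int *x)
{
  int i;
  /* Scan four ints at a time until any of them is nonzero. */
  for (i = 0;; i += 4) {
    if (__builtin_expect((x[i] | x[i+1]) | (x[i+2] | x[i+3]), 0))
      break;
  }
  /* The two OR pairs are written out again here, after the loop. */
  return (x[i] | x[i+1]) * (x[i+2] | x[i+3]);
}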

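On x86-64 the orl instruction is destructive: it overwrites its destination
operand. GCC keeps the two intermediate results in registers instead of
recalculating them after the loop exits, so the loop needs an extra register
copy (the movl %edx, %esi in .L3) and runs slower.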
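
One way to experiment with the "recalculate after the loop" variant is
sketched below. This is an assumption-laden illustration, not part of the
report: recal_reload is a hypothetical name, and the empty asm statement with
a "memory" clobber is a GCC extension used here as a compiler barrier so that
the values loaded inside the loop cannot simply be reused after it. Whether
the resulting loop is actually faster has not been measured.

```c
/* Same loop as in the report, with a compiler barrier after the exit. */
int recal_reload(const int *x)
{
    int i;
    for (i = 0;; i += 4) {
        if (__builtin_expect((x[i] | x[i + 1]) | (x[i + 2] | x[i + 3]), 0))
            break;
    }
    /* GCC extension: an empty asm with a "memory" clobber. The compiler has
       to assume memory may have changed, so x[i]..x[i+3] are loaded again
       below rather than being kept live in registers across the loop. */
    __asm__ __volatile__("" ::: "memory");
    return (x[i] | x[i + 1]) * (x[i + 2] | x[i + 3]);
}
```

Comparing the -O3 -S output of this function with the listing for recal shows
whether the extra copy disappears from the loop body on a given GCC version.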

The relevant assembly output follows:

gcc-4.7 -O3 -S
        .file   "recal.c"
        .text
        .p2align 4,,15
        .globl  recal
        .type   recal, @function
recal:
.LFB0:
        .cfi_startproc
        movl    12(%rdi), %edx
        orl     8(%rdi), %edx
        movl    4(%rdi), %ecx
        orl     (%rdi), %ecx
        movl    %edx, %eax
        orl     %ecx, %eax
        jne     .L2
        leaq    16(%rdi), %rax
        .p2align 4,,10
        .p2align 3
.L3:
        movl    12(%rax), %edx
        orl     8(%rax), %edx
        movl    4(%rax), %ecx
        orl     (%rax), %ecx
        addq    $16, %rax
        movl    %edx, %esi
        orl     %ecx, %esi
        je      .L3
.L2:
        movl    %ecx, %eax
        imull   %edx, %eax
        ret
        .cfi_endproc
.LFE0:
        .size   recal, .-recal
        .ident  "GCC: (Debian 4.7.1-2) 4.7.1"
        .section        .note.GNU-stack,"",@progbits
--



* [Bug tree-optimization/54116] suboptimal code for tight loops
From: pinskia at gcc dot gnu.org @ 2012-07-29 10:13 UTC
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54116

--- Comment #1 from Andrew Pinski <pinskia at gcc dot gnu.org> 2012-07-29 10:13:41 UTC ---
The tree level looks correct:
<bb 4>:
  # ivtmp.23_69 = PHI <ivtmp.23_68(5), ivtmp.23_66(3)>
  D.1771_65 = (void *) ivtmp.23_69;
  D.1717_6 = MEM[base: D.1771_65, offset: 0B];
  D.1722_10 = MEM[base: D.1771_65, offset: 4B];
  D.1723_11 = D.1722_10 | D.1717_6;
  D.1727_15 = MEM[base: D.1771_65, offset: 8B];
  D.1731_19 = MEM[base: D.1771_65, offset: 12B];
  D.1732_20 = D.1731_19 | D.1727_15;
  D.1733_21 = D.1732_20 | D.1723_11;
  ivtmp.23_68 = ivtmp.23_69 + 16;
  if (D.1733_21 != 0)
    goto <bb 6>;
  else
    goto <bb 5>;

<bb 5>:
  goto <bb 4>;

<bb 6>:
  # D.1723_22 = PHI <D.1723_11(4), D.1723_36(2)>
  # D.1732_23 = PHI <D.1732_20(4), D.1732_45(2)>
  D.1737_25 = D.1723_22 * D.1732_23;
  return D.1737_25;


--- CUT ----
Are you trying to say GCC should copy the loop header in this case?
Or that keeping the x[i]|x[i+1] and x[i+2]|x[i+3] results around increases
register pressure and exposes issues with 2-operand machines in some cases?
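
Read back as C, the dump above corresponds roughly to the sketch below. It is
a hand-written approximation under stated assumptions: recal_gimple_like, lo
and hi are hypothetical names, the pointer p stands in for ivtmp.23, and the
first iteration that the dump handles in bb 2 is folded into the do-while
loop.

```c
/* Hand-written approximation of the GIMPLE in bb 4 and bb 6. */
int recal_gimple_like(const int *x)
{
    const int *p = x;   /* plays the role of ivtmp.23 */
    int lo, hi;
    do {
        lo = p[1] | p[0];   /* D.1723_11 */
        hi = p[3] | p[2];   /* D.1732_20 */
        p += 4;             /* ivtmp.23 + 16 bytes */
    } while ((hi | lo) == 0);   /* D.1733_21 != 0 leaves the loop */
    /* lo and hi stay live across the loop exit, like the PHI nodes in
       bb 6, so both values need their own register inside the loop. */
    return lo * hi;         /* D.1737_25 */
}
```

The two values feeding the multiply are exactly the ones the question refers
to: they are computed in the loop and consumed after it.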



* [Bug tree-optimization/54116] suboptimal code for tight loops
From: neleai at seznam dot cz @ 2012-07-29 10:31 UTC
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54116

--- Comment #2 from Ondrej Bilka <neleai at seznam dot cz> 2012-07-29 10:30:46 UTC ---
On Sun, Jul 29, 2012 at 10:13:41AM +0000, pinskia at gcc dot gnu.org wrote:
> Are you trying to say GCC should copy the loop header in this case?
> Or that keeping the x[i]|x[i+1] and x[i+2]|x[i+3] results around increases
> register pressure and exposes issues with 2-operand machines in some cases?

The latter: keeping those results around increases register pressure.


* [Bug tree-optimization/54116] suboptimal code for tight loops
From: pinskia at gcc dot gnu.org @ 2021-08-07  5:39 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54116

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
     Ever confirmed|0                           |1
   Last reconfirmed|                            |2021-08-07
             Status|UNCONFIRMED                 |WAITING

--- Comment #3 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
GCC, clang and ICC all optimize it this way.

Do you have a testcase where the increased register pressure causes a
performance problem?
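
A testcase of that kind would need some timing harness around the kernel. The
following is only a minimal sketch of one, not something taken from the
report: make_input and time_recal are hypothetical helper names, the buffer
layout is an assumption, and no measurements are claimed here.

```c
#include <stdlib.h>
#include <time.h>

/* Kernel from the original report. */
static int recal(const int *x)
{
    int i;
    for (i = 0;; i += 4) {
        if (__builtin_expect((x[i] | x[i + 1]) | (x[i + 2] | x[i + 3]), 0))
            break;
    }
    return (x[i] | x[i + 1]) * (x[i + 2] | x[i + 3]);
}

/* Hypothetical helper: `groups` all-zero groups of four ints followed by
   one nonzero group, so the otherwise unbounded loop terminates. */
static int *make_input(size_t groups)
{
    int *buf = calloc((groups + 1) * 4, sizeof *buf);
    if (buf != NULL) {
        buf[groups * 4] = 3;      /* first OR pair of the last group is 3 */
        buf[groups * 4 + 2] = 5;  /* second OR pair of the last group is 5 */
    }
    return buf;
}

/* Hypothetical helper: call the kernel `reps` times and return the CPU
   time used, in seconds. Results are added to *sink so they are used. */
static double time_recal(const int *buf, int reps, long *sink)
{
    clock_t start = clock();
    for (int r = 0; r < reps; r++) {
        *sink += recal(buf);
        /* GCC extension: compiler barrier, so the call is repeated on
           every iteration instead of being hoisted out of the loop. */
        __asm__ __volatile__("" ::: "memory");
    }
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Timing the same harness with an alternative kernel in place of recal would
give the two numbers to compare.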
