public inbox for gcc-bugs@sourceware.org
* [Bug rtl-optimization/42612] post-increment addressing not used
       [not found] <bug-42612-4@http.gcc.gnu.org/bugzilla/>
@ 2012-08-23 14:16 ` olegendo at gcc dot gnu.org
  2015-05-22 13:57 ` olegendo at gcc dot gnu.org
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 8+ messages in thread
From: olegendo at gcc dot gnu.org @ 2012-08-23 14:16 UTC
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42612

Oleg Endo <olegendo at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |olegendo at gcc dot gnu.org
      Known to fail|                            |

--- Comment #4 from Oleg Endo <olegendo at gcc dot gnu.org> 2012-08-23 14:15:47 UTC ---
I haven't checked this on ARM, but on SH there's a similar problem.  See PR
50749.
As far as I understand it, it is a problem of the auto-inc-dec pass, which is
unable to find related addresses due to prior optimizations.



* [Bug rtl-optimization/42612] post-increment addressing not used
       [not found] <bug-42612-4@http.gcc.gnu.org/bugzilla/>
  2012-08-23 14:16 ` [Bug rtl-optimization/42612] post-increment addressing not used olegendo at gcc dot gnu.org
@ 2015-05-22 13:57 ` olegendo at gcc dot gnu.org
  2022-07-12  5:21 ` bd at mail dot ru
  2022-07-12  5:39 ` pinskia at gcc dot gnu.org
  3 siblings, 0 replies; 8+ messages in thread
From: olegendo at gcc dot gnu.org @ 2015-05-22 13:57 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=42612

--- Comment #5 from Oleg Endo <olegendo at gcc dot gnu.org> ---
There is a GSoC 2015 project which will try to address the AMS (addressing
mode selection) problem.
https://www.google-melange.com/gsoc/project/details/google/gsoc2015/erikvarga/5693417237512192

It will be initially for SH.  If it works out, it can be generalized so that
other targets can benefit from it, too.



* [Bug rtl-optimization/42612] post-increment addressing not used
       [not found] <bug-42612-4@http.gcc.gnu.org/bugzilla/>
  2012-08-23 14:16 ` [Bug rtl-optimization/42612] post-increment addressing not used olegendo at gcc dot gnu.org
  2015-05-22 13:57 ` olegendo at gcc dot gnu.org
@ 2022-07-12  5:21 ` bd at mail dot ru
  2022-07-12  5:39 ` pinskia at gcc dot gnu.org
  3 siblings, 0 replies; 8+ messages in thread
From: bd at mail dot ru @ 2022-07-12  5:21 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=42612

Dmitry Baksheev <bd at mail dot ru> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |bd at mail dot ru

--- Comment #6 from Dmitry Baksheev <bd at mail dot ru> ---
Please consider fixing this issue. Here is another example where not using
post-increment for loops produces suboptimal code on AArch64. The code is 4x
slower than LLVM-generated code for a dot-product function:

    #include <cstddef>

    double dotprod(std::size_t n,
         const double* __restrict__ a, 
         const double* __restrict__ b) 
    {
        double ans = 0;
        #if __clang__
        #pragma clang loop vectorize(assume_safety)
        #else
        #pragma GCC ivdep
        #endif  
        for (std::size_t i = 0; i < n; ++i) {
            ans += a[i] * b[i];
        }
        return ans;
    }


Compile with: $(CXX) -march=armv8.2-a -O3 dp.cpp

GCC-generated loop does not have post-increment loads:
    .L3:                                                                        
        ldr d2, [x1, x3, lsl 3]                                                 
        ldr d1, [x2, x3, lsl 3]                                                 
        add x3, x3, 1                                                           
        fmadd   d0, d2, d1, d0                                                  
        cmp x0, x3                                                              
        bne .L3 

Clang emits this:
    .LBB0_4:
        ldr d1, [x10], #8                                                       
        ldr d2, [x8], #8                                                        
        subs    x9, x9, #1
        fmadd   d0, d1, d2, d0                                                  
        b.ne    .LBB0_4
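
For comparison, the same reduction can be written with explicit pointer
walking, which is the source-level shape that maps most directly onto
post-indexed loads such as "ldr d1, [x10], #8". This is only a sketch: the
name dotprod_ptr is made up for this illustration, and nothing here
guarantees that GCC will pick post-increment addressing for it, since the
induction-variable and cost decisions are made later in the compiler.

```cpp
#include <cstddef>

// Hypothetical pointer-walking variant of dotprod (not from the report).
// Both pointers advance by one element per iteration, so each load is a
// "load, then bump the address" pair at the source level.
double dotprod_ptr(std::size_t n,
                   const double* __restrict__ a,
                   const double* __restrict__ b)
{
    double ans = 0;
    for (const double* end = a + n; a != end; ++a, ++b) {
        ans += *a * *b;   // one load from each array, then advance both
    }
    return ans;
}
```

Comparing the assembly of both variants at the same -O level is a quick way
to see which addressing form the compiler settles on for a given target.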


* [Bug rtl-optimization/42612] post-increment addressing not used
       [not found] <bug-42612-4@http.gcc.gnu.org/bugzilla/>
                   ` (2 preceding siblings ...)
  2022-07-12  5:21 ` bd at mail dot ru
@ 2022-07-12  5:39 ` pinskia at gcc dot gnu.org
  3 siblings, 0 replies; 8+ messages in thread
From: pinskia at gcc dot gnu.org @ 2022-07-12  5:39 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=42612

--- Comment #7 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
(In reply to Dmitry Baksheev from comment #6)
> Please consider fixing this issue. Here is another example where not using
> post-increment for loops produces suboptimal code on AArch64. The code is 4x
> slower than LLVM-generated code for a dot-product function:
> 
>     #include <cstddef>
> 
>     double dotprod(std::size_t n,
>          const double* __restrict__ a, 
>          const double* __restrict__ b) 
>     {
>         double ans = 0;
>         #if __clang__
>         #pragma clang loop vectorize(assume_safety)
>         #else
>         #pragma GCC ivdep
>         #endif  
>         for (std::size_t i = 0; i < n; ++i) {
>             ans += a[i] * b[i];
>         }
>         return ans;
>     }
> 
> 
> Compile with: $(CXX) -march=armv8.2-a -O3 dp.cpp
> 
> GCC-generated loop does not have post-increment loads:
>     .L3:
>         ldr d2, [x1, x3, lsl 3]
>         ldr d1, [x2, x3, lsl 3]
>         add x3, x3, 1
>         fmadd   d0, d2, d1, d0
>         cmp x0, x3
>         bne .L3
> 
> Clang emits this:
>     .LBB0_4:
>         ldr d1, [x10], #8
>         ldr d2, [x8], #8
>         subs    x9, x9, #1
>         fmadd   d0, d1, d2, d0
>         b.ne    .LBB0_4

I suspect that is a different issue. I also suspect it is a target cost issue
which really depends on the core, because on some cores the separate add is
better.


* [Bug rtl-optimization/42612] post-increment addressing not used
  2010-01-04 16:02 [Bug c/42612] New: [4.4/4.5] " jon at beniston dot com
                   ` (2 preceding siblings ...)
  2010-01-05 11:43 ` bonzini at gnu dot org
@ 2010-01-05 12:13 ` jon at beniston dot com
  3 siblings, 0 replies; 8+ messages in thread
From: jon at beniston dot com @ 2010-01-05 12:13 UTC
  To: gcc-bugs



------- Comment #3 from jon at beniston dot com  2010-01-05 12:13 -------
GCC 4.1.2 seems to produce the same code.

       mov     r2, #0
       mov     r3, r0
       strb    r2, [r3], #1
       strb    r2, [r0, #1]
       add     r0, r3, #2
       @ lr needed for prologue
       strb    r2, [r3, #1]
       bx      lr
       .size   func, .-func
       .ident  "GCC: (GNU) 4.1.2"
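
Read instruction by instruction, the sequence above uses post-increment only
for the first store and plain offsets for the other two. A C++ model of it,
as a sketch: the variable names follow the ARM registers, and the function
name func_as_emitted is made up for this illustration. Presumably the
hoped-for code would instead be three post-increment stores through a single
pointer.

```cpp
// Model of the GCC 4.1.2 output above, one statement per instruction.
// r0 holds the incoming pointer p and, on exit, the return value.
char *func_as_emitted(char *r0)
{
    char r2 = 0;        // mov  r2, #0
    char *r3 = r0;      // mov  r3, r0
    *r3++ = r2;         // strb r2, [r3], #1   store at p, r3 becomes p + 1
    r0[1] = r2;         // strb r2, [r0, #1]   store at p + 1
    r0 = r3 + 2;        // add  r0, r3, #2     return value is p + 3
    r3[1] = r2;         // strb r2, [r3, #1]   store at p + 2
    return r0;          // bx   lr
}
```

The model zeroes three consecutive bytes and returns a pointer just past
them, so the emitted code is correct; the complaint is only about the extra
copy of the pointer and the extra add.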


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42612



* [Bug rtl-optimization/42612] post-increment addressing not used
  2010-01-04 16:02 [Bug c/42612] New: [4.4/4.5] " jon at beniston dot com
  2010-01-04 16:11 ` [Bug rtl-optimization/42612] " rguenth at gcc dot gnu dot org
  2010-01-04 18:53 ` steven at gcc dot gnu dot org
@ 2010-01-05 11:43 ` bonzini at gnu dot org
  2010-01-05 12:13 ` jon at beniston dot com
  3 siblings, 0 replies; 8+ messages in thread
From: bonzini at gnu dot org @ 2010-01-05 11:43 UTC
  To: gcc-bugs



------- Comment #2 from bonzini at gnu dot org  2010-01-05 11:43 -------
Combine is doing what it knows best (forming complicated instructions,
addressing modes in this case); to do this it is already damaging the nice
shape of the code after the tree optimizers, and synthesizing things like x+2.

I wonder more about what the RTL looks like before auto-inc-dec, and whether it
is missing something because it must be taught some trick...

Is this a regression from pre-DF (that would be 4.2)?


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42612



* [Bug rtl-optimization/42612] post-increment addressing not used
  2010-01-04 16:02 [Bug c/42612] New: [4.4/4.5] " jon at beniston dot com
  2010-01-04 16:11 ` [Bug rtl-optimization/42612] " rguenth at gcc dot gnu dot org
@ 2010-01-04 18:53 ` steven at gcc dot gnu dot org
  2010-01-05 11:43 ` bonzini at gnu dot org
  2010-01-05 12:13 ` jon at beniston dot com
  3 siblings, 0 replies; 8+ messages in thread
From: steven at gcc dot gnu dot org @ 2010-01-04 18:53 UTC
  To: gcc-bugs



------- Comment #1 from steven at gcc dot gnu dot org  2010-01-04 18:53 -------
From the tree optimizers we go to expand with the following code (from
PR42612.c.139t.optimized):

;; Function func (func)

func (char * p)
{
<bb 2>:
  *p_1(D) = 0;
  p_2 = p_1(D) + 1;
  *p_2 = 0;
  p_3 = p_2 + 1;
  *p_3 = 0;
  p_4 = p_3 + 1;
  return p_4;

}
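
For orientation, a source function of roughly the following shape would
lower to the dump above. This is a reconstruction from the GIMPLE, not the
reporter's original test case, which is not quoted in this message; the name
func and the char * parameter come from the dump, everything else is
assumed.

```cpp
// Assumed reconstruction of the test case from the GIMPLE dump:
// three byte stores through a pointer that advances by one each time.
char *func(char *p)
{
    *p++ = 0;   // *p_1 = 0;  p_2 = p_1 + 1
    *p++ = 0;   // *p_2 = 0;  p_3 = p_2 + 1
    *p++ = 0;   // *p_3 = 0;  p_4 = p_3 + 1
    return p;   // return p_4
}
```

Each statement is a store followed by a bump of the same pointer, which is
the pattern a post-increment store instruction implements directly.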



The code remains in this form until combine, which changes the code as follows
(left is PR42612.c.174r.dce, right is PR42612.c.175r.combine, dumped with
-fdump-rtl-all-slim):

    4 NOTE_INSN_BASIC_BLOCK                 4 NOTE_INSN_BASIC_BLOCK
    2 r137:SI=r0:SI                         2 r137:SI=r0:SI
      REG_DEAD: r0:SI                         REG_DEAD: r0:SI
    3 NOTE_INSN_FUNCTION_BEG                3 NOTE_INSN_FUNCTION_BEG
    6 r138:SI=0x0                           6 r138:SI=0x0
   28 r133:SI=r137:SI                      28 r133:SI=r137:SI
    8 [r133:SI++]=r138:SI#0                 8 [r133:SI++]=r138:SI#0
      REG_INC: r133:SI                        REG_INC: r133:SI
      REG_EQUAL: 0x0                          REG_EQUAL: 0x0
   12 [r137:SI+0x1]=r138:SI#0              12 [r137:SI+0x1]=r138:SI#0
      REG_DEAD: r137:SI                       REG_DEAD: r137:SI
      REG_EQUAL: 0x0                          REG_EQUAL: 0x0
   13 r134:SI=r133:SI+0x1             |    13 NOTE_INSN_DELETED
   16 [r133:SI+0x1]=r138:SI#0              16 [r133:SI+0x1]=r138:SI#0
      REG_DEAD: r138:SI                       REG_DEAD: r138:SI
      REG_DEAD: r133:SI               <
      REG_EQUAL: 0x0                          REG_EQUAL: 0x0
   17 r144:SI=r134:SI+0x1             |    17 NOTE_INSN_DELETED
      REG_DEAD: r134:SI               |    22 r0:SI=r133:SI+0x2
   22 r0:SI=r144:SI                   |       REG_DEAD: r133:SI
      REG_DEAD: r144:SI               <
   25 use r0:SI                            25 use r0:SI

Paolo, you know combine best. Is there a way, you think, to teach combine about
post-increment addressing?


-- 

steven at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |bonzini at gnu dot org
             Status|UNCONFIRMED                 |NEW
     Ever Confirmed|0                           |1
   Last reconfirmed|0000-00-00 00:00:00         |2010-01-04 18:53:12
               date|                            |


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42612



* [Bug rtl-optimization/42612] post-increment addressing not used
  2010-01-04 16:02 [Bug c/42612] New: [4.4/4.5] " jon at beniston dot com
@ 2010-01-04 16:11 ` rguenth at gcc dot gnu dot org
  2010-01-04 18:53 ` steven at gcc dot gnu dot org
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 8+ messages in thread
From: rguenth at gcc dot gnu dot org @ 2010-01-04 16:11 UTC
  To: gcc-bugs



-- 

rguenth at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Severity|normal                      |enhancement
          Component|c                           |rtl-optimization
           Keywords|                            |missed-optimization
      Known to fail|                            |4.4.2 4.5.0
            Summary|[4.4/4.5] post-increment    |post-increment addressing
                   |addressing not used         |not used


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42612


