public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed
* [Bug target/50749] New: SH Target: Post-increment addressing used only for first memory access
@ 2011-10-16 20:29 oleg.endo@t-online.de
  2011-10-16 23:34 ` [Bug target/50749] " kkojima at gcc dot gnu.org
                   ` (21 more replies)
  0 siblings, 22 replies; 23+ messages in thread
From: oleg.endo@t-online.de @ 2011-10-16 20:29 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50749

             Bug #: 50749
           Summary: SH Target: Post-increment addressing used only for
                    first memory access
    Classification: Unclassified
           Product: gcc
           Version: 4.7.0
            Status: UNCONFIRMED
          Severity: enhancement
          Priority: P3
         Component: target
        AssignedTo: unassigned@gcc.gnu.org
        ReportedBy: oleg.endo@t-online.de
                CC: kkojima@gcc.gnu.org
            Target: sh*-*-*


Post-increment addressing is generated only for the first
memory access. Any subsequent memory access does not use 
post-increment addressing.
The following two functions are reduced examples and result
in the same code being generated.
The problem exists with any number of memory accesses > 1
and at any optimization level.


int test_0 (char* p, int c)
{
  int r = 0;
  r += *p++;
  r += *p++;
  r += *p++;
  return r;
}

int test_1 (char* p, int c)
{
  int r = 0;
  r += p[0];
  r += p[1];
  r += p[2];
  return r;
}

compiled with -fno-ivopts -Os -m4-single -ml ... 

    mov    r4,r1
    mov.b    @r1+,r0
    add    #2,r4
    mov.b    @r1,r1
    add    r1,r0
    mov.b    @r4,r1
    rts    
    add    r1,r0

This could be done better ...

    mov.b    @r4+,r0
    mov.b    @r4+,r1
    add    r1,r0
    mov.b    @r4+,r1
    rts    
    add    r1,r0



Another example with a loop:

int test_func_1 (char* p, int c)
{
  int r = 0;
  do
  {
    r += *p++;
    r += *p++;
  } while (--c);
  return r;
}

compiled with -fno-ivopts -Os -m4-single -ml ... 

    mov    #0,r0
.L5:
    mov    r4,r1
    mov.b    @r1+,r2
    dt    r5
    mov.b    @r1,r1
    add    r2,r0
    add    #2,r4
    bf/s    .L5
    add    r1,r0
    rts    
    nop

would be better as:

    mov    #0, r0
.L5:
    mov.b    @r4+, r1
    dt    r5
    mov.b    @r4+, r2
    add    r1, r0
    bf/s    .L5
    add    r2, r0

    rts
    nop




Using built-in specs.
COLLECT_GCC=sh-elf-gcc
COLLECT_LTO_WRAPPER=/usr/local/libexec/gcc/sh-elf/4.7.0/lto-wrapper
Target: sh-elf
Configured with: ../gcc-trunk/configure --target=sh-elf --prefix=/usr/local
--enable-languages=c,c++ --enable-multilib --disable-libssp --disable-nls
--disable-werror --enable-lto --with-newlib --with-gnu-as --with-gnu-ld
--with-system-zlib
Thread model: single
gcc version 4.7.0 20111016 (experimental) (GCC)


^ permalink raw reply	[flat|nested] 23+ messages in thread

* [Bug target/50749] SH Target: Post-increment addressing used only for first memory access
  2011-10-16 20:29 [Bug target/50749] New: SH Target: Post-increment addressing used only for first memory access oleg.endo@t-online.de
@ 2011-10-16 23:34 ` kkojima at gcc dot gnu.org
  2011-10-17  0:33 ` kkojima at gcc dot gnu.org
                   ` (20 subsequent siblings)
  21 siblings, 0 replies; 23+ messages in thread
From: kkojima at gcc dot gnu.org @ 2011-10-16 23:34 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50749

--- Comment #1 from Kazumoto Kojima <kkojima at gcc dot gnu.org> 2011-10-16 23:33:40 UTC ---
GCC makes usual mem accesses into those with post_inc/pre_dec at
auto_inc_dec pass.  I guess that auto_inc_dec pass can't find
post_inc insns well in that case because other tree/rtl optimizers
tweak the code already.  If this is the case, the problem would be
not target specific.


^ permalink raw reply	[flat|nested] 23+ messages in thread

* [Bug target/50749] SH Target: Post-increment addressing used only for first memory access
  2011-10-16 20:29 [Bug target/50749] New: SH Target: Post-increment addressing used only for first memory access oleg.endo@t-online.de
  2011-10-16 23:34 ` [Bug target/50749] " kkojima at gcc dot gnu.org
@ 2011-10-17  0:33 ` kkojima at gcc dot gnu.org
  2011-10-19  0:00 ` oleg.endo@t-online.de
                   ` (19 subsequent siblings)
  21 siblings, 0 replies; 23+ messages in thread
From: kkojima at gcc dot gnu.org @ 2011-10-17  0:33 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50749

--- Comment #2 from Kazumoto Kojima <kkojima at gcc dot gnu.org> 2011-10-17 00:32:39 UTC ---
*** Bug 50750 has been marked as a duplicate of this bug. ***


^ permalink raw reply	[flat|nested] 23+ messages in thread

* [Bug target/50749] SH Target: Post-increment addressing used only for first memory access
  2011-10-16 20:29 [Bug target/50749] New: SH Target: Post-increment addressing used only for first memory access oleg.endo@t-online.de
  2011-10-16 23:34 ` [Bug target/50749] " kkojima at gcc dot gnu.org
  2011-10-17  0:33 ` kkojima at gcc dot gnu.org
@ 2011-10-19  0:00 ` oleg.endo@t-online.de
  2011-10-19 21:37 ` kkojima at gcc dot gnu.org
                   ` (18 subsequent siblings)
  21 siblings, 0 replies; 23+ messages in thread
From: oleg.endo@t-online.de @ 2011-10-19  0:00 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50749

--- Comment #3 from Oleg Endo <oleg.endo@t-online.de> 2011-10-19 00:00:01 UTC ---
Kaz,

do you happen to know why the following is defined in sh.h?

#define USE_LOAD_POST_INCREMENT(mode)    ((mode == SImode || mode == DImode) \
                                           ? 0 : TARGET_SH1)

#define USE_STORE_PRE_DECREMENT(mode)    ((mode == SImode || mode == DImode) \
                                           ? 0 : TARGET_SH1)

Is there any (historical) reason to disable post-inc / pre-dec addressing
for SImode / DImode?

Thanks,
Oleg


^ permalink raw reply	[flat|nested] 23+ messages in thread

* [Bug target/50749] SH Target: Post-increment addressing used only for first memory access
  2011-10-16 20:29 [Bug target/50749] New: SH Target: Post-increment addressing used only for first memory access oleg.endo@t-online.de
                   ` (2 preceding siblings ...)
  2011-10-19  0:00 ` oleg.endo@t-online.de
@ 2011-10-19 21:37 ` kkojima at gcc dot gnu.org
  2011-10-30 12:37 ` oleg.endo@t-online.de
                   ` (17 subsequent siblings)
  21 siblings, 0 replies; 23+ messages in thread
From: kkojima at gcc dot gnu.org @ 2011-10-19 21:37 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50749

--- Comment #4 from Kazumoto Kojima <kkojima at gcc dot gnu.org> 2011-10-19 21:36:56 UTC ---
(In reply to comment #3)

USE_LOAD_POST_INCREMENT and USE_STORE_PRE_DECREMENT are used only
in move_by_pieces which is for some block operations when
MOVE_BY_PIECES_P says OK.  They don't disable post_inc/pre_dec
addressing for SI/DImode in general, I think.  It seems that they
are 0 for SI/DImode because we have addressing with display for
a limited size of memory chunk in these modes, though I'm wrong
about it.  I'm a bit curious to see what happens if they are changed
to non-zero for SI/DImode.


^ permalink raw reply	[flat|nested] 23+ messages in thread

* [Bug target/50749] SH Target: Post-increment addressing used only for first memory access
  2011-10-16 20:29 [Bug target/50749] New: SH Target: Post-increment addressing used only for first memory access oleg.endo@t-online.de
                   ` (3 preceding siblings ...)
  2011-10-19 21:37 ` kkojima at gcc dot gnu.org
@ 2011-10-30 12:37 ` oleg.endo@t-online.de
  2011-10-30 13:54 ` oleg.endo@t-online.de
                   ` (16 subsequent siblings)
  21 siblings, 0 replies; 23+ messages in thread
From: oleg.endo@t-online.de @ 2011-10-30 12:37 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50749

--- Comment #5 from Oleg Endo <oleg.endo@t-online.de> 2011-10-30 12:36:51 UTC ---
(In reply to comment #4)
> 
> I'm a bit curious to see what happens if they are changed
> to non-zero for SI/DImode.

..not much actually. I've tried defining both as TARGET_SH1 and compared 
before/after CSiBE results.  No code size changes at all.  Also the micro test
cases do not seem to be affected by that.  Of course I might have missed some
border case...


^ permalink raw reply	[flat|nested] 23+ messages in thread

* [Bug target/50749] SH Target: Post-increment addressing used only for first memory access
  2011-10-16 20:29 [Bug target/50749] New: SH Target: Post-increment addressing used only for first memory access oleg.endo@t-online.de
                   ` (4 preceding siblings ...)
  2011-10-30 12:37 ` oleg.endo@t-online.de
@ 2011-10-30 13:54 ` oleg.endo@t-online.de
  2011-10-30 23:37 ` kkojima at gcc dot gnu.org
                   ` (15 subsequent siblings)
  21 siblings, 0 replies; 23+ messages in thread
From: oleg.endo@t-online.de @ 2011-10-30 13:54 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50749

--- Comment #6 from Oleg Endo <oleg.endo@t-online.de> 2011-10-30 13:53:51 UTC ---
(In reply to comment #1)
> GCC makes usual mem accesses into those with post_inc/pre_dec at
> auto_inc_dec pass.  I guess that auto_inc_dec pass can't find
> post_inc insns well in that case because other tree/rtl optimizers
> tweak the code already.  If this is the case, the problem would be
> not target specific.

Looks like so.  For the simple test case...

int test (char* p, int c)
{
  int r = 0;
  r += *p++;
  r += *p++;
  r += *p++;        
  return r;
}

...the load insns are initially expanded as:

(insn 2 5 3 2 (set (reg/v/f:SI 169 [ p ])
        (reg:SI 4 r4 [ p ]))
     (nil))


(insn 7 6 8 3 (set (reg:SI 171)
        (sign_extend:SI (mem:QI (reg/v/f:SI 169 [ p ]) [0 *p_2(D)+0 S1 A8]))) 
     (nil))


(insn 8 7 9 3 (set (reg:SI 172)
        (reg/v/f:SI 169 [ p ]))
     (nil))


(insn 9 8 10 3 (set (reg/f:SI 173)
        (plus:SI (reg/v/f:SI 169 [ p ])
            (const_int 1 [0x1])))
     (nil))


(insn 10 9 11 3 (set (reg:SI 174)
        (sign_extend:SI (mem:QI (reg/f:SI 173) [0 MEM[(char *)p_2(D) + 1B]+0 S1
A8])))
     (nil))

(insn 13 12 14 3 (set (reg/f:SI 177)
        (plus:SI (reg/v/f:SI 169 [ p ])
            (const_int 2 [0x2])))
     (nil))

(insn 14 13 15 3 (set (reg:SI 178)
        (sign_extend:SI (mem:QI (reg/f:SI 177) [0 MEM[(char *)p_2(D) + 2B]+0 S1
A8])))
     (nil))


The auto_inc_dec pass then fails to realize that (reg:SI 177) = (reg:SI 169) +
2 = (reg:SI 173) + 1.

I wonder whether there might be something in the target code that suggests the
early optimizers to do that?  I've tried playing with the TARGET_ADDRESS_COST
hook but it didn't have any effect in this case.


^ permalink raw reply	[flat|nested] 23+ messages in thread

* [Bug target/50749] SH Target: Post-increment addressing used only for first memory access
  2011-10-16 20:29 [Bug target/50749] New: SH Target: Post-increment addressing used only for first memory access oleg.endo@t-online.de
                   ` (5 preceding siblings ...)
  2011-10-30 13:54 ` oleg.endo@t-online.de
@ 2011-10-30 23:37 ` kkojima at gcc dot gnu.org
  2011-11-28 22:54 ` oleg.endo@t-online.de
                   ` (14 subsequent siblings)
  21 siblings, 0 replies; 23+ messages in thread
From: kkojima at gcc dot gnu.org @ 2011-10-30 23:37 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50749

--- Comment #7 from Kazumoto Kojima <kkojima at gcc dot gnu.org> 2011-10-30 23:36:27 UTC ---
(In reply to comment #6)
> I wonder whether there might be something in the target code that suggests the
> early optimizers to do that?  I've tried playing with the TARGET_ADDRESS_COST
> hook but it didn't have any effect in this case.

-ftree-dump-all shows that forward propagation on ssa trees makes
those memory accesses into simple array accesses.  You can try
-fno-tree-forwprop and see the effect of that option.
It seems that there are no special knobs to control forwprop from
the target side.
The problem is that SH target can't do those simple array accesses
well at QI/HImode because of the lack of displacement addressing
for those modes.


^ permalink raw reply	[flat|nested] 23+ messages in thread

* [Bug target/50749] SH Target: Post-increment addressing used only for first memory access
  2011-10-16 20:29 [Bug target/50749] New: SH Target: Post-increment addressing used only for first memory access oleg.endo@t-online.de
                   ` (6 preceding siblings ...)
  2011-10-30 23:37 ` kkojima at gcc dot gnu.org
@ 2011-11-28 22:54 ` oleg.endo@t-online.de
  2011-11-29  0:25 ` kkojima at gcc dot gnu.org
                   ` (13 subsequent siblings)
  21 siblings, 0 replies; 23+ messages in thread
From: oleg.endo@t-online.de @ 2011-11-28 22:54 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50749

--- Comment #8 from Oleg Endo <oleg.endo@t-online.de> 2011-11-28 22:31:44 UTC ---
(In reply to comment #7)
> The problem is that SH target can't do those simple array accesses
> well at QI/HImode because of the lack of displacement addressing
> for those modes.

In these particular cases, this is true.  Even if there was a working QI/HImode
displacement addressing, it would most likely result in worse code compared to
post-inc / pre-dec addressing opportunities, because of register pressure on
R0.  Apart from that there is no displacement addressing for FPU loads/stores,
which also misses some opportunities:

float test_func_5 (float* p, int c)
{
  float r = 0;
  do
  {
    r += *p++;
    r += *p++;
    r += *p++;
  } while (--c);
  return r;
}

Compiled with -Os -m4-single:

    fldi0    fr0
.L11:
    mov    r4,r1
    fmov.s    @r1+,fr1
    dt    r5
    fadd    fr1,fr0
    fmov.s    @r1,fr1
    mov    r4,r1
    add    #8,r1
    add    #12,r4
    fadd    fr1,fr0
    fmov.s    @r1,fr1
    bf/s    .L11
    fadd    fr1,fr0
    rts    
    nop


..could be:
    fldi0    fr0
.L11:
    fmov.s    @r4+,fr1
    dt    r5
    fadd    fr1,fr0
    fmov.s    @r4+,fr1
    fadd    fr1,fr0
    fmov.s    @r4+,fr1
    bf/s    .L11
    fadd    fr1,fr0
    rts    
    nop


Specifying -fno-tree-forwprop doesn't seem to have any effect on these cases.


^ permalink raw reply	[flat|nested] 23+ messages in thread

* [Bug target/50749] SH Target: Post-increment addressing used only for first memory access
  2011-10-16 20:29 [Bug target/50749] New: SH Target: Post-increment addressing used only for first memory access oleg.endo@t-online.de
                   ` (7 preceding siblings ...)
  2011-11-28 22:54 ` oleg.endo@t-online.de
@ 2011-11-29  0:25 ` kkojima at gcc dot gnu.org
  2011-12-30  2:14 ` oleg.endo@t-online.de
                   ` (12 subsequent siblings)
  21 siblings, 0 replies; 23+ messages in thread
From: kkojima at gcc dot gnu.org @ 2011-11-29  0:25 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50749

--- Comment #9 from Kazumoto Kojima <kkojima at gcc dot gnu.org> 2011-11-28 23:29:57 UTC ---
(In reply to comment #8)
> Specifying -fno-tree-forwprop doesn't seem to have any effect on these cases.

For that function, -fdump-tree-all shows that the tree loop ivopts
optimization does it.  Try -fno-tree-forwprop -fno-ivopts.


^ permalink raw reply	[flat|nested] 23+ messages in thread

* [Bug target/50749] SH Target: Post-increment addressing used only for first memory access
  2011-10-16 20:29 [Bug target/50749] New: SH Target: Post-increment addressing used only for first memory access oleg.endo@t-online.de
                   ` (8 preceding siblings ...)
  2011-11-29  0:25 ` kkojima at gcc dot gnu.org
@ 2011-12-30  2:14 ` oleg.endo@t-online.de
  2011-12-30  3:36 ` kkojima at gcc dot gnu.org
                   ` (11 subsequent siblings)
  21 siblings, 0 replies; 23+ messages in thread
From: oleg.endo@t-online.de @ 2011-12-30  2:14 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50749

--- Comment #10 from Oleg Endo <oleg.endo@t-online.de> 2011-12-30 02:14:00 UTC ---
(In reply to comment #9)
> (In reply to comment #8)
> > Specifying -fno-tree-forwprop doesn't seem to have any effect on these cases.
> 
> For that function, -fdump-tree-all shows that the tree loop ivopts
> optimization does it.  Try -fno-tree-forwprop -fno-ivopts.

This makes even worse code.
Anyway, I think this issue should be addressed by the auto_inc_dec pass.

If OK, I'd like to change it from target PR to middle-end PR.


^ permalink raw reply	[flat|nested] 23+ messages in thread

* [Bug target/50749] SH Target: Post-increment addressing used only for first memory access
  2011-10-16 20:29 [Bug target/50749] New: SH Target: Post-increment addressing used only for first memory access oleg.endo@t-online.de
                   ` (9 preceding siblings ...)
  2011-12-30  2:14 ` oleg.endo@t-online.de
@ 2011-12-30  3:36 ` kkojima at gcc dot gnu.org
  2012-06-12  7:10 ` [Bug middle-end/50749] Auto-inc-dec does not find subsequent contiguous mem accesses olegendo at gcc dot gnu.org
                   ` (10 subsequent siblings)
  21 siblings, 0 replies; 23+ messages in thread
From: kkojima at gcc dot gnu.org @ 2011-12-30  3:36 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50749

--- Comment #11 from Kazumoto Kojima <kkojima at gcc dot gnu.org> 2011-12-30 03:24:01 UTC ---
(In reply to comment #10)
> If OK, I'd like to change it from target PR to middle-end PR.

Sure.


^ permalink raw reply	[flat|nested] 23+ messages in thread

* [Bug middle-end/50749] Auto-inc-dec does not find subsequent contiguous mem accesses
  2011-10-16 20:29 [Bug target/50749] New: SH Target: Post-increment addressing used only for first memory access oleg.endo@t-online.de
                   ` (10 preceding siblings ...)
  2011-12-30  3:36 ` kkojima at gcc dot gnu.org
@ 2012-06-12  7:10 ` olegendo at gcc dot gnu.org
  2012-08-23 14:17 ` [Bug rtl-optimization/50749] " olegendo at gcc dot gnu.org
                   ` (9 subsequent siblings)
  21 siblings, 0 replies; 23+ messages in thread
From: olegendo at gcc dot gnu.org @ 2012-06-12  7:10 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50749

--- Comment #12 from Oleg Endo <olegendo at gcc dot gnu.org> 2012-06-12 07:09:58 UTC ---
Author: olegendo
Date: Tue Jun 12 07:09:52 2012
New Revision: 188426

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=188426
Log:
    PR target/50749
    * gcc.target/sh/pr50749-sf-postinc-2.c: New.
    * gcc.target/sh/pr50749-sf-postinc-4.c: New.
    * gcc.target/sh/pr50749-qihisi-postinc-2.c: New.
    * gcc.target/sh/pr50749-qihisi-postinc-4.c: New.
    * gcc.target/sh/pr50749-sf-predec-2.c: New.
    * gcc.target/sh/pr50749-sf-predec-4.c: New.
    * gcc.target/sh/pr50749-qihisi-predec-1.c: New.
    * gcc.target/sh/pr50749-qihisi-predec-3.c: New.
    * gcc.target/sh/pr50749-sf-postinc-1.c: New.
    * gcc.target/sh/pr50749-sf-postinc-3.c: New.
    * gcc.target/sh/pr50749-qihisi-postinc-1.c: New.
    * gcc.target/sh/pr50749-qihisi-postinc-3.c: New.
    * gcc.target/sh/pr50749-sf-predec-1.c: New.
    * gcc.target/sh/pr50749-sf-predec-3.c: New.
    * gcc.target/sh/pr50749-qihisi-predec-2.c: New.
    * gcc.target/sh/pr50749-qihisi-predec-4.c: New.


Added:
    trunk/gcc/testsuite/gcc.target/sh/pr50749-qihisi-postinc-1.c
    trunk/gcc/testsuite/gcc.target/sh/pr50749-qihisi-postinc-2.c
    trunk/gcc/testsuite/gcc.target/sh/pr50749-qihisi-postinc-3.c
    trunk/gcc/testsuite/gcc.target/sh/pr50749-qihisi-postinc-4.c
    trunk/gcc/testsuite/gcc.target/sh/pr50749-qihisi-predec-1.c
    trunk/gcc/testsuite/gcc.target/sh/pr50749-qihisi-predec-2.c
    trunk/gcc/testsuite/gcc.target/sh/pr50749-qihisi-predec-3.c
    trunk/gcc/testsuite/gcc.target/sh/pr50749-qihisi-predec-4.c
    trunk/gcc/testsuite/gcc.target/sh/pr50749-sf-postinc-1.c
    trunk/gcc/testsuite/gcc.target/sh/pr50749-sf-postinc-2.c
    trunk/gcc/testsuite/gcc.target/sh/pr50749-sf-postinc-3.c
    trunk/gcc/testsuite/gcc.target/sh/pr50749-sf-postinc-4.c
    trunk/gcc/testsuite/gcc.target/sh/pr50749-sf-predec-1.c
    trunk/gcc/testsuite/gcc.target/sh/pr50749-sf-predec-2.c
    trunk/gcc/testsuite/gcc.target/sh/pr50749-sf-predec-3.c
    trunk/gcc/testsuite/gcc.target/sh/pr50749-sf-predec-4.c
Modified:
    trunk/gcc/testsuite/ChangeLog


^ permalink raw reply	[flat|nested] 23+ messages in thread

* [Bug rtl-optimization/50749] Auto-inc-dec does not find subsequent contiguous mem accesses
  2011-10-16 20:29 [Bug target/50749] New: SH Target: Post-increment addressing used only for first memory access oleg.endo@t-online.de
                   ` (11 preceding siblings ...)
  2012-06-12  7:10 ` [Bug middle-end/50749] Auto-inc-dec does not find subsequent contiguous mem accesses olegendo at gcc dot gnu.org
@ 2012-08-23 14:17 ` olegendo at gcc dot gnu.org
  2013-06-22 12:22 ` olegendo at gcc dot gnu.org
                   ` (8 subsequent siblings)
  21 siblings, 0 replies; 23+ messages in thread
From: olegendo at gcc dot gnu.org @ 2012-08-23 14:17 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50749

Oleg Endo <olegendo at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |olegendo at gcc dot gnu.org
          Component|middle-end                  |rtl-optimization

--- Comment #13 from Oleg Endo <olegendo at gcc dot gnu.org> 2012-08-23 14:17:29 UTC ---
PR 42612 describes a similar problem on ARM.


^ permalink raw reply	[flat|nested] 23+ messages in thread

* [Bug rtl-optimization/50749] Auto-inc-dec does not find subsequent contiguous mem accesses
  2011-10-16 20:29 [Bug target/50749] New: SH Target: Post-increment addressing used only for first memory access oleg.endo@t-online.de
                   ` (12 preceding siblings ...)
  2012-08-23 14:17 ` [Bug rtl-optimization/50749] " olegendo at gcc dot gnu.org
@ 2013-06-22 12:22 ` olegendo at gcc dot gnu.org
  2013-09-30  9:06 ` amker.cheng at gmail dot com
                   ` (7 subsequent siblings)
  21 siblings, 0 replies; 23+ messages in thread
From: olegendo at gcc dot gnu.org @ 2013-06-22 12:22 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50749

--- Comment #14 from Oleg Endo <olegendo at gcc dot gnu.org> ---
Just wanted to clarify the reason why the examples in the description have
'-fno-ivopts', as it caused some confusion on the mailing list:

int test_0 (char* p, int c)
{
  int r = 0;
  r += *p++;
  r += *p++;
  r += *p++;
  return r;
}

compiled with -m4 -O2:

        mov.b   @(1,r4),r0
        mov.b   @r4,r1
        add     r0,r1
        mov.b   @(2,r4),r0
        rts
        add     r1,r0

compiled with -m4 -Os:

        mov.b   @(1,r4),r0
        mov.b   @r4,r2
        add     r0,r2
        mov.b   @(2,r4),r0
        add     r0,r2
        rts
        mov     r2,r0

As seen above, the memory accesses are turned into reg+disp addressing,
although on SH using post-inc addressing would be cheaper (as it is also
reported 'sh_address_cost').
Turning off ivopts leaves the post-inc addressing and points out the original
problem of this PR.  I believe that the proper fix for this would be having PR
56590 fixed.


^ permalink raw reply	[flat|nested] 23+ messages in thread

* [Bug rtl-optimization/50749] Auto-inc-dec does not find subsequent contiguous mem accesses
  2011-10-16 20:29 [Bug target/50749] New: SH Target: Post-increment addressing used only for first memory access oleg.endo@t-online.de
                   ` (13 preceding siblings ...)
  2013-06-22 12:22 ` olegendo at gcc dot gnu.org
@ 2013-09-30  9:06 ` amker.cheng at gmail dot com
  2013-10-03 10:47 ` olegendo at gcc dot gnu.org
                   ` (6 subsequent siblings)
  21 siblings, 0 replies; 23+ messages in thread
From: amker.cheng at gmail dot com @ 2013-09-30  9:06 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50749

bin.cheng <amker.cheng at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |amker.cheng at gmail dot com

--- Comment #15 from bin.cheng <amker.cheng at gmail dot com> ---
There must be another scenario for the example, and in this case example:

int test_0 (char* p, int c)
{
  int r = 0;
  r += *p++;
  r += *p++;
  r += *p++;
  return r;
}

should be translated into sth like:
  //...
  ldrb [rx]
  ldrb [rx+1]
  ldrb [rx+2]
  add rx, rx, #3
  //...
This way all loads are independent and can be issued on super scalar machine. 
Actuall for targets like arm which supports post-increment constant (other than
size of memory access), it can be further changed into:
  //...
  ldrb [rx], #3
  ldrb [rx-2]
  ldrb [rx-1]
  //...
For now auto-increment pass can't do this optimization.  I once have a patch
for this but benchmark shows the case is not common.

This case is common especially after loop unrolling and rtl passes deliberately
break down long dependence of RX, which I think is right.


^ permalink raw reply	[flat|nested] 23+ messages in thread

* [Bug rtl-optimization/50749] Auto-inc-dec does not find subsequent contiguous mem accesses
  2011-10-16 20:29 [Bug target/50749] New: SH Target: Post-increment addressing used only for first memory access oleg.endo@t-online.de
                   ` (14 preceding siblings ...)
  2013-09-30  9:06 ` amker.cheng at gmail dot com
@ 2013-10-03 10:47 ` olegendo at gcc dot gnu.org
  2013-12-06 20:28 ` olegendo at gcc dot gnu.org
                   ` (5 subsequent siblings)
  21 siblings, 0 replies; 23+ messages in thread
From: olegendo at gcc dot gnu.org @ 2013-10-03 10:47 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50749

--- Comment #16 from Oleg Endo <olegendo at gcc dot gnu.org> ---
(In reply to bin.cheng from comment #15)
> There must be another scenario for the example, and in this case example:
> 
> int test_0 (char* p, int c)
> {
>   int r = 0;
>   r += *p++;
>   r += *p++;
>   r += *p++;
>   return r;
> }
> 
> should be translated into sth like:
>   //...
>   ldrb [rx]
>   ldrb [rx+1]
>   ldrb [rx+2]
>   add rx, rx, #3
>   //...

As mentioned above, on SH this is the case (displacement addressing mode is
selected).  However, the order of the memory accesses is not the same as in the
original source code (which is OK for non-volatile mems).

> This way all loads are independent and can be issued on super scalar
> machine.  Actuall for targets like arm which supports post-increment
> constant (other than size of memory access), it can be further changed into:
>   //...
>   ldrb [rx], #3
>   ldrb [rx-2]
>   ldrb [rx-1]
>   //...

Whether this is transformation is beneficial or not depends on the target
architecture of course.  E.g. SH2A and SH4* is 2-way super scalar, but they
have only one memory load/store unit.  Thus the loads would not be done in
parallel anyway and the latency of the post-incremented address register can be
neglected.
There is a similar case on SH regarding floating point loads.  SH ISAs (other
than SH2A) don't have a displacement addressing mode for floating point
loads/stores.  When loading adjacent memory locations it's best to use post-inc
addressing modes and when storing adjacent memory locations it's best to use
pre-dec stores.  I.e. things like

float* x = ...;
*x++ = a;
*x++ = b;
*x++ = c;

should become:

add      #8,r1
fmov.s   fr0,@r1  // store c
fmov.s   fr1,@-r1 // store b
fmov.s   fr2,@-r1 // store a


> For now auto-increment pass can't do this optimization.  I once have a patch
> for this but benchmark shows the case is not common.

To be honest, I think such optimizations as mentioned above are out of scope of
the auto-increment pass.  Of course we can try to wallpaper its shortcomings
and get some improvements here and there, but as soon as ivopts or another tree
pass is changed and outputs different sequences it will break again.
Thus I suggested to replace the auto-inc-dec pass with a more generic
addressing mode selection RTL pass (PR 56590).  Unfortunately, I don't have
anything useful yet.  But I could share some notes and thoughts regarding AMS
optimization, if you're interested.


^ permalink raw reply	[flat|nested] 23+ messages in thread

* [Bug rtl-optimization/50749] Auto-inc-dec does not find subsequent contiguous mem accesses
  2011-10-16 20:29 [Bug target/50749] New: SH Target: Post-increment addressing used only for first memory access oleg.endo@t-online.de
                   ` (15 preceding siblings ...)
  2013-10-03 10:47 ` olegendo at gcc dot gnu.org
@ 2013-12-06 20:28 ` olegendo at gcc dot gnu.org
  2013-12-24 22:55 ` steven at gcc dot gnu.org
                   ` (4 subsequent siblings)
  21 siblings, 0 replies; 23+ messages in thread
From: olegendo at gcc dot gnu.org @ 2013-12-06 20:28 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50749

Oleg Endo <olegendo at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |tir5c3 at yahoo dot co.uk

--- Comment #17 from Oleg Endo <olegendo at gcc dot gnu.org> ---
*** Bug 59078 has been marked as a duplicate of this bug. ***


^ permalink raw reply	[flat|nested] 23+ messages in thread

* [Bug rtl-optimization/50749] Auto-inc-dec does not find subsequent contiguous mem accesses
  2011-10-16 20:29 [Bug target/50749] New: SH Target: Post-increment addressing used only for first memory access oleg.endo@t-online.de
                   ` (16 preceding siblings ...)
  2013-12-06 20:28 ` olegendo at gcc dot gnu.org
@ 2013-12-24 22:55 ` steven at gcc dot gnu.org
  2015-05-22 13:57 ` olegendo at gcc dot gnu.org
                   ` (3 subsequent siblings)
  21 siblings, 0 replies; 23+ messages in thread
From: steven at gcc dot gnu.org @ 2013-12-24 22:55 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50749

Steven Bosscher <steven at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |chris at bubblescope dot net

--- Comment #18 from Steven Bosscher <steven at gcc dot gnu.org> ---
*** Bug 19078 has been marked as a duplicate of this bug. ***


^ permalink raw reply	[flat|nested] 23+ messages in thread

* [Bug rtl-optimization/50749] Auto-inc-dec does not find subsequent contiguous mem accesses
  2011-10-16 20:29 [Bug target/50749] New: SH Target: Post-increment addressing used only for first memory access oleg.endo@t-online.de
                   ` (17 preceding siblings ...)
  2013-12-24 22:55 ` steven at gcc dot gnu.org
@ 2015-05-22 13:57 ` olegendo at gcc dot gnu.org
  2015-09-17  9:00 ` urjaman at gmail dot com
                   ` (2 subsequent siblings)
  21 siblings, 0 replies; 23+ messages in thread
From: olegendo at gcc dot gnu.org @ 2015-05-22 13:57 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=50749

--- Comment #19 from Oleg Endo <olegendo at gcc dot gnu.org> ---
There is a GSoC 2015 project which will try to address the AMS problem.
https://www.google-melange.com/gsoc/project/details/google/gsoc2015/erikvarga/5693417237512192

It will be initially for SH.  If it works out, it can be generalized so that
other targets can benefit from it, too.


^ permalink raw reply	[flat|nested] 23+ messages in thread

* [Bug rtl-optimization/50749] Auto-inc-dec does not find subsequent contiguous mem accesses
  2011-10-16 20:29 [Bug target/50749] New: SH Target: Post-increment addressing used only for first memory access oleg.endo@t-online.de
                   ` (18 preceding siblings ...)
  2015-05-22 13:57 ` olegendo at gcc dot gnu.org
@ 2015-09-17  9:00 ` urjaman at gmail dot com
  2015-09-17 14:26 ` olegendo at gcc dot gnu.org
  2015-09-17 19:45 ` urjaman at gmail dot com
  21 siblings, 0 replies; 23+ messages in thread
From: urjaman at gmail dot com @ 2015-09-17  9:00 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=50749

Urja Rannikko <urjaman at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |urjaman at gmail dot com

--- Comment #20 from Urja Rannikko <urjaman at gmail dot com> ---
I'll add a note here that this seems to also affect the AVR target when it is
under register pressure and cant use Z or Y registers which can do Z+n /Y+n
addressing, it ends up doing really stupid things with X register (aka r26:r27)
(which could post-inc):
adiw    r26, 0x01
ld      r24, X
sbiw    r26, 0x01
(followed by adiw r26, 2...)

This was my test case for showing this behaviour on AVR:
unsigned char test_0 (unsigned char* p1, unsigned char *p2, unsigned char *p3)
{
  unsigned char r = 0;
  r += *p1++;
  r += *p2++;
  r += *p3++;

  r += *p1++;
  r += *p2++;
  r += *p3++;

  r += *p1++;
  r += *p2++;
  r += *p3++;

  return r;
}

This note added for anyone later trying to google for this after seeing that X
reg stupidity on AVR, sorry for maybe-a-bit-noise ...


^ permalink raw reply	[flat|nested] 23+ messages in thread

* [Bug rtl-optimization/50749] Auto-inc-dec does not find subsequent contiguous mem accesses
  2011-10-16 20:29 [Bug target/50749] New: SH Target: Post-increment addressing used only for first memory access oleg.endo@t-online.de
                   ` (19 preceding siblings ...)
  2015-09-17  9:00 ` urjaman at gmail dot com
@ 2015-09-17 14:26 ` olegendo at gcc dot gnu.org
  2015-09-17 19:45 ` urjaman at gmail dot com
  21 siblings, 0 replies; 23+ messages in thread
From: olegendo at gcc dot gnu.org @ 2015-09-17 14:26 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=50749

--- Comment #21 from Oleg Endo <olegendo at gcc dot gnu.org> ---
(In reply to Urja Rannikko from comment #20)
> I'll add a note here that this seems to also affect the AVR target when it
> is under register pressure and cant use Z or Y registers which can do Z+n
> /Y+n addressing, it ends up doing really stupid things with X register 
> ...

I don't know AVR, but if it has only two addr regs which can do post-inc loads,
in your example a loop distribution is required.  In other words, you need to
use two loops, first with p1,p2 and second loop with p3.  Because the
calculation in the loop is simple, it would make sense.  If the calculations in
the loop are more complex, maybe not.  It's similar to the problems in PR
53949.


^ permalink raw reply	[flat|nested] 23+ messages in thread

* [Bug rtl-optimization/50749] Auto-inc-dec does not find subsequent contiguous mem accesses
  2011-10-16 20:29 [Bug target/50749] New: SH Target: Post-increment addressing used only for first memory access oleg.endo@t-online.de
                   ` (20 preceding siblings ...)
  2015-09-17 14:26 ` olegendo at gcc dot gnu.org
@ 2015-09-17 19:45 ` urjaman at gmail dot com
  21 siblings, 0 replies; 23+ messages in thread
From: urjaman at gmail dot com @ 2015-09-17 19:45 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=50749

--- Comment #22 from Urja Rannikko <urjaman at gmail dot com> ---
All of the 3 pointer regs (X,Y,Z) can do post-inc operations, but only Y and Z
can do displacement (which is what gcc tries to use, even when the increment
operation would be better all around since we usually want the +4 pointer).

The given code wasnt meant to be anywhere near the real code, just a code that
shows the behaviour exists. The real code is just a function with lots going on
involving loading 32-bit values from pointers.

This code is WIP and i was just looking how it'll end up compiled, but for your
curiosity, here's a fragment (not enough to compile), with a comment added on
the part where i found the ld, adiw, ld, sbiw, adiw, ld, sbiw, adiw, ..
sequence, but AFAIK it could be triggered at any usage of inlined buf2u32:

static uint32_t buf2u32(uint8_t *buf) {
        return *(uint32_t*)buf;
}

static uint8_t do_exec_opbuf_extwrite(uint8_t *ewsq, uint32_t addr, uint8_t
*buf, uint16_t dlen) {
        uint8_t* end = ewsq + ewsq[0] + 1;
        for (uint16_t i=0;i<dlen;i++,addr++) {
                uint8_t c = *buf++;
                uint8_t *cmd;
                for (cmd=ewsq+1;cmd<end;) {
                        uint8_t op = *cmd++;
                        uint8_t rop = op&0x3F;
                        if (rop==S_CMD_O_WRITEB) {
                                uint32_t ra;
                                uint8_t rc;
                                if (op & 0x80) {
                                        ra = addr;
                                } else {
                                        ra = buf2u32(cmd);
                                        cmd += 4;
                                }
                                if (op & 0x40) {
                                        rc = c;
                                } else {
                                        rc = *cmd++;
                                }
                                flash_write(ra,rc);
                                continue;
                        }
                        if (rop==S_CMD_O_DELAY) {
                                udelay(buf2u32(cmd));
                                cmd += 4;
                                continue;
                        }
                        if ((rop==S_CMD_O_POLL)||(rop==S_CMD_O_POLL_DLY)) {
                                uint8_t details = *cmd++;
                                uint8_t mask = pgm_read_byte(
&(bit_mask_lut[details&7]) );
                                uint32_t ra = addr;
                                uint32_t usecs = 0;
                                if ((op & 0x40)&&(!(details & 0x10))) { 
                                        if (c & mask) {
                                                details |= 0x20;
                                        } else {
                                                details &= ~0x20;
                                        }
                                }
                                if (!(op & 0x80)) {
                                        ra = buf2u32(cmd); // Problem Here
                                        cmd += 4;
                                }
                                if (rop == S_CMD_O_POLL_DLY) {
                                        usecs = buf2u32(cmd);
                                        cmd += 4;
                                }
                                do_exec_poll(details,mask,ra,usecs);
                                continue;
                        }
                        return 1;
                }
                if (cmd>end) return 1;
        }
        return 0;
}
If i end up liking this code enough it'll end up in a branch at
https://github.com/urjaman/libfrser


^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2015-09-17 19:45 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-10-16 20:29 [Bug target/50749] New: SH Target: Post-increment addressing used only for first memory access oleg.endo@t-online.de
2011-10-16 23:34 ` [Bug target/50749] " kkojima at gcc dot gnu.org
2011-10-17  0:33 ` kkojima at gcc dot gnu.org
2011-10-19  0:00 ` oleg.endo@t-online.de
2011-10-19 21:37 ` kkojima at gcc dot gnu.org
2011-10-30 12:37 ` oleg.endo@t-online.de
2011-10-30 13:54 ` oleg.endo@t-online.de
2011-10-30 23:37 ` kkojima at gcc dot gnu.org
2011-11-28 22:54 ` oleg.endo@t-online.de
2011-11-29  0:25 ` kkojima at gcc dot gnu.org
2011-12-30  2:14 ` oleg.endo@t-online.de
2011-12-30  3:36 ` kkojima at gcc dot gnu.org
2012-06-12  7:10 ` [Bug middle-end/50749] Auto-inc-dec does not find subsequent contiguous mem accesses olegendo at gcc dot gnu.org
2012-08-23 14:17 ` [Bug rtl-optimization/50749] " olegendo at gcc dot gnu.org
2013-06-22 12:22 ` olegendo at gcc dot gnu.org
2013-09-30  9:06 ` amker.cheng at gmail dot com
2013-10-03 10:47 ` olegendo at gcc dot gnu.org
2013-12-06 20:28 ` olegendo at gcc dot gnu.org
2013-12-24 22:55 ` steven at gcc dot gnu.org
2015-05-22 13:57 ` olegendo at gcc dot gnu.org
2015-09-17  9:00 ` urjaman at gmail dot com
2015-09-17 14:26 ` olegendo at gcc dot gnu.org
2015-09-17 19:45 ` urjaman at gmail dot com

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).