public inbox for gcc-bugs@sourceware.org
* [Bug target/96201] New: x86 movsd/movsq string instructions and alignment inference
@ 2020-07-15  1:50 michaeljclark at mac dot com
  2020-07-15  5:42 ` [Bug target/96201] " crazylht at gmail dot com
  2020-09-15 11:15 ` amker at gcc dot gnu.org
  0 siblings, 2 replies; 3+ messages in thread
From: michaeljclark at mac dot com @ 2020-07-15  1:50 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96201

            Bug ID: 96201
           Summary: x86 movsd/movsq string instructions and alignment
                    inference
           Product: gcc
           Version: 10.1.1
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: michaeljclark at mac dot com
  Target Milestone: ---

I'm taking the time to record some observations and to extract minimal test code
for alignment inference and x86 string instruction selection.

GCC9 and GCC10 do not generate x86 string instructions in some cases, apparently
because the compiler believes the addresses are not aligned.

GCC10 appears to have an additional issue whereby x86 string instructions are
not selected unless the address is aligned to twice the natural alignment.

Two observations:

* (GCC9/10) alignment is not inferred from integer mask expressions, e.g. x & ~3
* (GCC10) __builtin_assume_aligned appears to require double the alignment

The double alignment issue was observed with both int/movsd and long/movsq:
GCC10 will only generate movsd or movsq if the alignment is double the type's
natural alignment. The test case here is for int.
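To make the first observation concrete: masking an address with ~(a - 1), for a
power-of-two a, clears its low bits, so the result is a multiple of a whatever
the input was. The helpers align_down and is_aligned below are hypothetical and
only illustrate that x & ~3l and x & ~7l carry 4- and 8-byte alignment
information which a compiler could, in principle, derive from the expression.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: round addr down to a multiple of align. align must
 * be a power of two; then align - 1 is a mask of the low bits, and
 * addr & ~(align - 1) clears them, as x & ~3l (align 4) and x & ~7l
 * (align 8) do in the test case below. */
static uintptr_t align_down(uintptr_t addr, uintptr_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return addr & ~(align - 1);
}

/* Hypothetical helper: nonzero if addr is a multiple of align. */
static int is_aligned(uintptr_t addr, uintptr_t align)
{
    return (addr & (align - 1)) == 0;
}
```

For every addr, is_aligned(align_down(addr, 4), 4) holds; that is the fact the
first observation says GCC does not currently infer.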


--- BEGIN SAMPLE CODE ---

/* f1: 4-byte alignment expressed only through an integer mask */
void f1(long d, long s, unsigned n)
{
    int *sn = (int*)( (long)(s    ) & ~3l );
    int *dn = (int*)( (long)(d    ) & ~3l );
    int *de = (int*)( (long)(d + n) & ~3l );

    while (dn < de) *dn++ = *sn++;
}

/* f2: 8-byte alignment expressed only through an integer mask */
void f2(long d, long s, unsigned n)
{
    int *sn = (int*)( (long)(s    ) & ~7l );
    int *dn = (int*)( (long)(d    ) & ~7l );
    int *de = (int*)( (long)(d + n) & ~7l );

    while (dn < de) *dn++ = *sn++;
}

/* f3: 4-byte mask plus __builtin_assume_aligned(..., 4) */
void f3(long d, long s, unsigned n)
{
    int *sn = __builtin_assume_aligned( (int*)( (long)(s    ) & ~3l ), 4 );
    int *dn = __builtin_assume_aligned( (int*)( (long)(d    ) & ~3l ), 4 );
    int *de = __builtin_assume_aligned( (int*)( (long)(d + n) & ~3l ), 4 );

    while (dn < de) *dn++ = *sn++;
}

/* f4: same 4-byte mask, but __builtin_assume_aligned promises 8 bytes */
void f4(long d, long s, unsigned n)
{
    int *sn = __builtin_assume_aligned( (int*)((long)(s    ) & ~3l ), 8 );
    int *dn = __builtin_assume_aligned( (int*)((long)(d    ) & ~3l ), 8 );
    int *de = __builtin_assume_aligned( (int*)((long)(d + n) & ~3l ), 8 );

    while (dn < de) *dn++ = *sn++;
}

--- END SAMPLE CODE ---
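For anyone reproducing this, here is a minimal harness (an addition, not part of
the original report) that checks what the test functions compute rather than the
code generated for them. It repeats f1's body under the hypothetical name
f1_copy; copy_ints is likewise hypothetical. Like the test case itself, it
assumes an LP64 target such as x86-64 Linux, where long can hold a pointer. Note
that n is a byte count and that d, s and d + n are all rounded down, so only
whole 4-byte words are copied.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Same body as f1 in the report, renamed (hypothetical name). d and s are
 * addresses carried in integers; n is a byte count. Assumes LP64, where
 * long is as wide as a pointer. */
void f1_copy(long d, long s, unsigned n)
{
    int *sn = (int*)( (long)(s    ) & ~3l );
    int *dn = (int*)( (long)(d    ) & ~3l );
    int *de = (int*)( (long)(d + n) & ~3l );

    while (dn < de) *dn++ = *sn++;
}

/* Hypothetical wrapper: copy count ints from src to dst via f1_copy.
 * int arrays are already 4-byte aligned on x86-64, so the masks in
 * f1_copy leave both addresses unchanged. */
static void copy_ints(int *dst, const int *src, unsigned count)
{
    assert(((uintptr_t)dst & 3) == 0 && ((uintptr_t)src & 3) == 0);
    f1_copy((long)(intptr_t)dst, (long)(intptr_t)src,
            count * (unsigned)sizeof(int));
}
```

Compiling such a file with -S and comparing the loop against the listings below
is one way to check a given GCC version.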


GCC9 generates the following for f1 and f2, and GCC10 for f1, f2 and f3:

.Ln:
        leaq    (%rax,%rsi), %rcx
        movq    %rax, %rdx
        addq    $4, %rax
        movl    (%rcx), %ecx
        movl    %ecx, (%rdx)
        cmpq    %rax, %rdi
        ja      .Ln

GCC9 generates the following for f3 and f4, but GCC10 only for f4:

.Ln:
        movsl
        cmpq    %rdi, %rdx
        ja      .Ln
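For reference, movsl (movsd in Intel syntax) copies 4 bytes from the address in
%rsi to the address in %rdi and then adjusts both registers by 4; with the
direction flag clear, which the x86-64 System V ABI requires on function entry,
both are incremented. The sketch below is a hypothetical C model of that step and
of the loop above, treating %rdx as holding the end pointer (de), which is an
assumption about register allocation. The names movs_regs, movsl_step and
movsl_loop_model are invented for illustration.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical model of the registers one movsl step uses. The direction
 * flag is assumed clear, so only the forward case is modelled. */
struct movs_regs {
    unsigned char *rdi;        /* destination address */
    const unsigned char *rsi;  /* source address */
};

/* One movsl: copy 4 bytes from [rsi] to [rdi], then add 4 to both. */
static void movsl_step(struct movs_regs *r)
{
    memcpy(r->rdi, r->rsi, 4);
    r->rdi += 4;
    r->rsi += 4;
}

/* The loop above: movsl; cmpq %rdi, %rdx; ja .Ln, with end standing in
 * for %rdx. ja is an unsigned "above" test, so the loop repeats while
 * end > rdi. Like the asm, it runs at least once, so the caller must
 * ensure at least one whole word is left to copy. */
static void movsl_loop_model(struct movs_regs *r, const unsigned char *end)
{
    assert(end > r->rdi && (end - r->rdi) % 4 == 0);
    do {
        movsl_step(r);
    } while (end > r->rdi);
}
```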


* [Bug target/96201] x86 movsd/movsq string instructions and alignment inference
From: crazylht at gmail dot com @ 2020-07-15  5:42 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96201

Hongtao.liu <crazylht at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |crazylht at gmail dot com

--- Comment #1 from Hongtao.liu <crazylht at gmail dot com> ---
The issue is caused by pass_ivopt: IVOPTs selects only one IV for f3 (dn), which
seems suboptimal, and two IVs for f4 (sn, dn), which seems optimal.
---
loop in f3:

Selected IV set for loop 1 at pr96201.c:25, 10 avg niters, 1 IVs:
Candidate 8:
  Var befor: dn_24
  Var after: dn_18
  Incr POS: orig biv
  IV struct:
    Type:       int *
    Base:       (int *) _3
    Step:       4
    Biv:        N
    Overflowness wrto loop niter:       Overflow

loop in f4:

Selected IV set for loop 1 at pr96201.c:34, 10 avg niters, 2 IVs:
Candidate 6:
  Var befor: sn_26
  Var after: sn_20
  Incr POS: orig biv
  IV struct:
    Type:       int *
    Base:       sn_14
    Step:       4
    Object:     (void *) sn_14
    Biv:        N
    Overflowness wrto loop niter:       Overflow
Candidate 8:
  Var befor: dn_27
  Var after: dn_21
  Incr POS: orig biv
  IV struct:
    Type:       int *
    Base:       dn_16
    Step:       4
    Object:     (void *) dn_16
    Biv:        N
    Overflowness wrto loop niter:       Overflow

---
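A source-level approximation of the two IV sets above may help; it is not GCC's
actual GIMPLE. With one IV (f3), only dn is stepped and the source address is
rebuilt each iteration as dn plus a fixed offset, which appears to be what the
leaq (%rax,%rsi), %rcx in the non-movsl listing does. With two IVs (f4), sn and
dn each advance by 4 bytes, the shape that can become a single movsl. The
function names are hypothetical, and the pointer subtraction in copy_one_iv is
only well-defined when both pointers point into the same array, as in the check
included here.

```c
#include <assert.h>
#include <stddef.h>

/* f3-like shape (approximation): a single IV, dn. The source is reached as
 * dn + off, where off is the fixed distance from dn to sn. Only valid when
 * both pointers point into the same array. */
static void copy_one_iv(int *dn, int *de, const int *sn)
{
    ptrdiff_t off = sn - dn;   /* in ints; the asm appears to keep it in bytes */
    while (dn < de) {
        *dn = *(dn + off);
        dn++;
    }
}

/* f4-like shape: two IVs, sn and dn, each stepped by one int (4 bytes).
 * A load, a store and two increments per iteration. */
static void copy_two_iv(int *dn, int *de, const int *sn)
{
    while (dn < de)
        *dn++ = *sn++;
}

/* Both shapes copy the same data; only the generated code can differ. */
static void check_same_result(void)
{
    int a[16], b[16];
    for (int i = 0; i < 16; i++)
        a[i] = b[i] = (i < 8) ? i + 1 : 0;
    copy_one_iv(a + 8, a + 16, a);
    copy_two_iv(b + 8, b + 16, b);
    for (int i = 0; i < 16; i++)
        assert(a[i] == b[i]);
}
```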

As a result, more instructions are generated for f3, and pass_combine fails to
combine them.

---
loop in f3:

Trying 19 -> 22:
   19: r83:DI=r92:DI
   22: [r83:DI]=r89:SI
      REG_DEAD r89:SI
      REG_DEAD r83:DI
Can't combine i2 into i3

Trying 21 -> 22:
   21: r89:SI=[r93:DI]
      REG_DEAD r93:DI
   22: [r83:DI]=r89:SI
      REG_DEAD r89:SI
      REG_DEAD r83:DI
Failed to match this instruction:
(set (mem:SI (reg/v/f:DI 83 [ dn ]) [1 *dn_2+0 S4 A32])
    (mem:SI (reg/f:DI 93 [ _20 ]) [1 *_20+0 S4 A32]))

Trying 18, 21 -> 22:
   18: {r93:DI=r92:DI+r102:DI;clobber flags:CC;}
      REG_UNUSED flags:CC
   21: r89:SI=[r93:DI]
      REG_DEAD r93:DI
   22: [r83:DI]=r89:SI
      REG_DEAD r89:SI
      REG_DEAD r83:DI
Can't combine i1 into i3

Trying 21, 19 -> 22:
   21: r89:SI=[r93:DI]
      REG_DEAD r93:DI
   19: r83:DI=r92:DI
   22: [r83:DI]=r89:SI
      REG_DEAD r89:SI
      REG_DEAD r83:DI
Can't combine i1 into i3

(insn 18 16 19 4 (parallel [
            (set (reg/f:DI 93 [ _20 ])
                (plus:DI (reg/v/f:DI 92 [ dn ])
                    (reg:DI 102)))
            (clobber (reg:CC 17 flags))
        ]) 210 {*adddi_1}
     (expr_list:REG_UNUSED (reg:CC 17 flags)
        (nil)))
(insn 19 18 20 4 (set (reg/v/f:DI 83 [ dn ])
        (reg/v/f:DI 92 [ dn ])) 74 {*movdi_internal}
     (nil))
(insn 20 19 21 4 (parallel [
            (set (reg/v/f:DI 92 [ dn ])
                (plus:DI (reg/v/f:DI 92 [ dn ])
                    (const_int 4 [0x4])))
            (clobber (reg:CC 17 flags))
        ]) "pr96201.c":25:24 210 {*adddi_1}
     (expr_list:REG_UNUSED (reg:CC 17 flags)
        (nil)))
(insn 21 20 22 4 (set (reg:SI 89 [ _9 ])
        (mem:SI (reg/f:DI 93 [ _20 ]) [1 *_20+0 S4 A32])) "pr96201.c":25:29 75 {*movsi_internal}
     (expr_list:REG_DEAD (reg/f:DI 93 [ _20 ])
        (nil)))
(insn 22 21 24 4 (set (mem:SI (reg/v/f:DI 83 [ dn ]) [1 *dn_2+0 S4 A32])
        (reg:SI 89 [ _9 ])) "pr96201.c":25:27 75 {*movsi_internal}
     (expr_list:REG_DEAD (reg:SI 89 [ _9 ])
        (expr_list:REG_DEAD (reg/v/f:DI 83 [ dn ])
            (nil))))



loop in f4:

Trying 16, 18, 17 -> 19:
   16: {r89:DI=r89:DI+0x4;clobber flags:CC;}
      REG_UNUSED flags:CC
   18: r88:SI=[r89:DI-0x4]
   17: {r90:DI=r90:DI+0x4;clobber flags:CC;}
      REG_UNUSED flags:CC
   19: [r90:DI-0x4]=r88:SI
      REG_DEAD r88:SI
Successfully matched this instruction:
(parallel [
        (set (mem:SI (reg/v/f:DI 90 [ dn ]) [1 MEM[base: dn_21, offset: -4B]+0 S4 A32])
            (mem:SI (reg/v/f:DI 89 [ sn ]) [1 MEM[base: sn_20, offset: -4B]+0 S4 A32]))
        (set (reg/v/f:DI 90 [ dn ])
            (plus:DI (reg/v/f:DI 90 [ dn ])
                (const_int 4 [0x4])))
        (set (reg/v/f:DI 89 [ sn ])
            (plus:DI (reg/v/f:DI 89 [ sn ])
                (const_int 4 [0x4])))
    ])

(insn 16 15 17 3 (parallel [
            (set (reg/v/f:DI 89 [ sn ])
                (plus:DI (reg/v/f:DI 89 [ sn ])
                    (const_int 4 [0x4])))
            (clobber (reg:CC 17 flags))
        ]) "pr96201.c":34:32 210 {*adddi_1}
     (expr_list:REG_UNUSED (reg:CC 17 flags)
        (nil)))
(insn 17 16 18 3 (parallel [
            (set (reg/v/f:DI 90 [ dn ])
                (plus:DI (reg/v/f:DI 90 [ dn ])
                    (const_int 4 [0x4])))
            (clobber (reg:CC 17 flags))
        ]) "pr96201.c":34:24 210 {*adddi_1}
     (expr_list:REG_UNUSED (reg:CC 17 flags)
        (nil)))
(insn 18 17 19 3 (set (reg:SI 88 [ _9 ])
        (mem:SI (plus:DI (reg/v/f:DI 89 [ sn ])
                (const_int -4 [0xfffffffffffffffc])) [1 MEM[base: sn_20, offset: -4B]+0 S4 A32])) "pr96201.c":34:29 75 {*movsi_internal}
     (nil))
(insn 19 18 21 3 (set (mem:SI (plus:DI (reg/v/f:DI 90 [ dn ])
                (const_int -4 [0xfffffffffffffffc])) [1 MEM[base: dn_21, offset: -4B]+0 S4 A32])
        (reg:SI 88 [ _9 ])) "pr96201.c":34:27 75 {*movsi_internal}
     (expr_list:REG_DEAD (reg:SI 88 [ _9 ])
        (nil)))


---


* [Bug target/96201] x86 movsd/movsq string instructions and alignment inference
From: amker at gcc dot gnu.org @ 2020-09-15 11:15 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96201

bin cheng <amker at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |amker at gcc dot gnu.org

--- Comment #2 from bin cheng <amker at gcc dot gnu.org> ---
The reason is that the memory references in f3 are not identified as
address-type IV uses.  I don't remember the details, but this is intended by the
commit below:
commit 653a4b32fe72e33bfd4cdd4c25493049524a3805
Author: Bin Cheng <bin.cheng@arm.com>
Date:   Thu Mar 2 11:25:11 2017 +0000

    re PR tree-optimization/66768 (address space gets lost on literal pointer)

            PR tree-optimization/66768
            * tree-ssa-loop-ivopts.c (find_interesting_uses_address): Skip addr
            iv_use if base object can't be determined.

            gcc/testsuite
            * gcc.target/i386/pr66768.c: New test.

    From-SVN: r245837

For f1/f2, IVOPTs fails to identify the base object because the pointers are
converted from integers.  We need to tell these cases apart better.

For f3, __builtin_assume_aligned is optimized away by GCC 10 before IVOPTs runs.
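To illustrate the distinction drawn for f1/f2: when the aligned pointers are
derived from pointer parameters by pointer arithmetic, rather than by converting
integers to pointers, each result is still based on the object that d or s
points to. The hypothetical f1_ptr below is such a variant. Whether IVOPTs then
identifies the base object, or whether movsl is selected, has not been checked
here on any GCC version; the sketch also assumes, like the original, that
rounding down stays within the objects passed in.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical pointer-based variant of f1: the aligned pointers are
 * formed by stepping back from d and s, not by converting integers to
 * pointers. Assumes int is 4-byte aligned (as on x86-64) and that the
 * rounded-down addresses still lie inside the objects passed in. */
void f1_ptr(char *d, const char *s, unsigned n)
{
    const int *sn = (const int *)(s - ((uintptr_t)s & 3));
    int *dn = (int *)(d - ((uintptr_t)d & 3));
    int *de = (int *)(d + n - ((uintptr_t)(d + n) & 3));

    while (dn < de) *dn++ = *sn++;
}

/* Usage: copy four ints between int arrays passed as byte pointers. */
static void f1_ptr_demo(void)
{
    int src[4] = {10, 20, 30, 40}, dst[4] = {0};
    f1_ptr((char *)dst, (const char *)src, sizeof dst);
    assert(dst[0] == 10 && dst[3] == 40);
}
```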

