public inbox for gcc-bugs@sourceware.org
* [Bug rtl-optimization/50489] New: [UPC/IA64] mis-schedule of MEM ref with -ftree-vectorize and -fschedule-insns2
@ 2011-09-22 19:22 gary at intrepid dot com
  2011-09-22 19:30 ` [Bug rtl-optimization/50489] " gary at intrepid dot com
                   ` (8 more replies)
  0 siblings, 9 replies; 10+ messages in thread
From: gary at intrepid dot com @ 2011-09-22 19:22 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50489

             Bug #: 50489
           Summary: [UPC/IA64] mis-schedule of MEM ref with
                    -ftree-vectorize and -fschedule-insns2
    Classification: Unclassified
           Product: gcc
           Version: 4.7.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: rtl-optimization
        AssignedTo: unassigned@gcc.gnu.org
        ReportedBy: gary@intrepid.com
            Target: IA64


After a change in GUPC's tree-lowering pass made a couple of months back (that
simplified the tree code being generated), we saw regressions where several
small test cases were failing on an IA64 target (SGI Altix, running SUSE).

We have been unable so far to reduce this to a "C" only test case that
demonstrates the problem, so we are submitting this as a "UPC" bug report,
along with a script that will build the UPC compiler from the GUPC branch, and
create the various bug artifacts.  Perhaps someone knowledgeable about the
instruction scheduler will understand how this mis-scheduling happens and
either reproduce the issue as a "C" test case, or propose a patch.

We also do not know at this time if the UPC compiler should be generating
memory barriers or some other metadata to avoid this mis-scheduling,
and would appreciate any suggestions in that regard.

The attached UPC test case works fine when "-O2 -ftree-vectorize
-fno-schedule-insns2" is asserted, but demonstrates a mis-schedule when "-O2
-ftree-vectorize" is asserted.

The following description is copied from the
README-ia64-upc-sched-insn2-bug.txt file that is included in the attached zip
file as well.

Background
----------

On a 64-bit target (using the "struct PTS" configuration),
the UPC compiler represents UPC pointer-to-shared values 
as 128-bit structures with three fields: (vaddr, thread, phase)
as shown in the declaration below.

typedef struct shared_ptr_struct
  {
    void     *vaddr;
    uint32_t thread;
    uint32_t phase;
  } upc_shared_ptr_t
  __attribute__ ((aligned (16)))
  ;
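
The layout implied by this declaration can be checked at compile time.
The following is a minimal sketch, assuming an LP64 target (8-byte
pointers) and a C11 compiler; the helper name pts_phase_offset is
hypothetical and is not part of GUPC.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>   /* offsetof, size_t */
#include <stdint.h>   /* uint32_t */

typedef struct shared_ptr_struct
  {
    void     *vaddr;
    uint32_t thread;
    uint32_t phase;
  } upc_shared_ptr_t
  __attribute__ ((aligned (16)));

/* Assumption: LP64, so 'vaddr' occupies bytes 0..7 of the struct.  */
static_assert (sizeof (upc_shared_ptr_t) == 16, "PTS is 128 bits");
static_assert (offsetof (upc_shared_ptr_t, vaddr) == 0, "vaddr at +0");
static_assert (offsetof (upc_shared_ptr_t, thread) == 8, "thread at +8");
static_assert (offsetof (upc_shared_ptr_t, phase) == 12, "phase at +12");

/* Hypothetical helper: byte offset of 'phase' within the struct.  */
size_t
pts_phase_offset (void)
{
  return offsetof (upc_shared_ptr_t, phase);
}
```

These offsets matter later in this report: an 8-byte store at offset 8
covers both 'thread' and 'phase', while a 4-byte load at offset 12 reads
'phase' alone.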

In ia64-upc-vaddr-bug.upc.143t.optimized, there is the following
sequence of tree statements:

  unsigned int D.3062;
  unsigned int D.3061;
  shared [8] struct foo * D.3060;
  shared [8] struct foo[1] * D.3059;
  struct upc_shared_ptr_t D.3058;
  unsigned int D.3057;

  D.3057_10 = D.3056_9 * 8;
  D.3058.vaddr = &_u_barray;
  MEM[(struct upc_shared_ptr_t *)&D.3058 + 8B] = { 0, 0 };
  D.3059_11 = VIEW_CONVERT_EXPR<shared [8] struct foo[1] *>(D.3058);
  D.3060_12 = (shared [8] struct foo *) D.3059_11;
  D.3061_13 = VIEW_CONVERT_EXPR<struct upc_shared_ptr_t>(D.3060_12).phase;
  D.3062_14 = D.3057_10 + D.3061_13;

D.3059_11 and D.3060_12 are UPC pointers-to-shared (PTS's);
these are 128-bit "fat" pointers with internal
{vaddr, thread, phase} fields.

D.3058 is a PTS representation struct that is initialized
to {&_u_barray, 0, 0}.  Note that D.3059_11 and D.3060_12
are copies of the PTS representation structure, D.3058,
that have been recast into UPC pointers-to-shared (PTS's).

The casts above might impose inefficiencies, and there may
be ways to improve the code, but this is the current
tree code that is generated.

This assignment statement:
  D.3061_13 = VIEW_CONVERT_EXPR<struct upc_shared_ptr_t>(D.3060_12).phase;
extracts the 'phase' field from D.3060_12, which is a copy
of the value of D.3058.phase.  The value of D.3058.phase was
previously initialized to zero by the MEM[] assignment.
The fetched phase value D.3061_13 should be zero when this
assignment is executed.
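
The expected semantics of this sequence can be modeled in plain C.  This
is only a sketch of what the tree statements are meant to compute; it is
not a reproducer for the bug (no "C" only test case is known at this
time).  The names u_barray_model and model_phase_fetch are hypothetical,
and the 8-byte store is written with memcpy as an assumption about how
to express MEM[... + 8B] = { 0, 0 } portably.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>   /* offsetof */
#include <stdint.h>   /* uint32_t */
#include <string.h>   /* memcpy */

typedef struct shared_ptr_struct
  {
    void     *vaddr;
    uint32_t thread;
    uint32_t phase;
  } upc_shared_ptr_t
  __attribute__ ((aligned (16)));

/* The single 8-byte store below relies on 'phase' directly
   following 'thread'.  */
static_assert (offsetof (upc_shared_ptr_t, phase)
               == offsetof (upc_shared_ptr_t, thread) + sizeof (uint32_t),
               "phase follows thread");

/* Hypothetical stand-in for the shared array _u_barray.  */
static char u_barray_model[64];

uint32_t
model_phase_fetch (void)
{
  upc_shared_ptr_t rep;                  /* models D.3058 */
  const uint32_t zeros[2] = { 0, 0 };

  /* D.3058.vaddr = &_u_barray;  */
  rep.vaddr = u_barray_model;

  /* MEM[&D.3058 + 8B] = { 0, 0 }: one store covering thread and phase.  */
  memcpy ((char *) &rep + offsetof (upc_shared_ptr_t, thread),
          zeros, sizeof zeros);

  /* D.3059_11 and D.3060_12 are copies of D.3058.  */
  upc_shared_ptr_t copy = rep;

  /* D.3061_13 = ...(D.3060_12).phase; this must read back zero.  */
  return copy.phase;
}
```

Any correct compilation of this model returns 0, because the load of
'phase' depends on the preceding 8-byte store.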

Bug
---

It is this latter access to D.3060_12.phase that ends up in
incorrectly ordered RTL after the selective instruction scheduling
pass is run.  The access to D.3060_12.phase is scheduled
ahead of the code that sets D.3058.phase.

Valid RTL
---------

The 'ok' and 'bug' compilations share the same RTL dump output all
the way through ia64-upc-vaddr-bug.upc.213r.compgotos.
In that file there are RTL statements that are affected by the
mis-scheduling of instructions (additional notes added
as '#' comments):

# D.3058.vaddr = &_u_barray (the base address of barray).
#
# r34 was previously assigned the value of &_u_barray
# r47 = r12 + 32;
# r12 is the stack pointer and r47 points to the beginning
# of the D.3058 structure, which also happens to be the
# address of the first field, D.3058.vaddr.
# Therefore, r47 points to D.3058.vaddr

(insn 46 42 331 4 (set (mem/s/f/c:DI (reg/f:DI 47 r47 [532]) [2 D.3058.vaddr+0 S8 A128])
        (reg/f:DI 34 r34 [533])) ia64-upc-vaddr-bug.upc:11 5 {movdi_internal}
     (nil))

# This vector op assigns: {D.3058.thread = 0; D.3058.phase = 0;}
#
# This is done by using r46 as the destination address and r36 as the source.
# r46 = r12 + 40, which is the base address of D.3058.thread.
#                 (D.3058.phase is the field following the D.3058.thread)
# r36 was previously set to {0, 0}.

(insn 52 69 65 4 (set (mem/s/c:V2SI (reg/f:DI 46 r46 [534]) [3 MEM[(struct upc_shared_ptr_t *)&D.3058 + 8B]+0 S8 A64])
        (reg:V2SI 36 r36 [536])) ia64-upc-vaddr-bug.upc:11 377 {*movv2si_internal}
     (nil))

# r37 = D.3058.phase by indirecting through r45
#
# r12 is the stack pointer
# D.3058.vaddr  starts at r12 + 32
# D.3058.thread starts at r12 + 40
# D.3058.phase  starts at r12 + 44
# r45 = (r12 + 44); where r12 is the stack pointer and 44
#                   is the offset of D.3058.phase
# Therefore, r45 points to D.3058.phase.

(insn 57 59 68 4 (set (reg:DI 37 r37)
        (zero_extend:DI (mem/s/j/c:SI (reg/f:DI 45 r45 [524]) [0 VIEW_CONVERT_EXPR<struct upc_shared_ptr_t>(D.3060_12).phase+0 S4 A32]))) ia64-upc-vaddr-bug.upc:11 136 {zero_extendsidi2}
     (nil))

Although there are intervening instructions, the key thing to note
is that the first two instructions (46 and 52) initialize the
contents of D.3058, and the third instruction (57) fetches the
value of D.3058.phase.  This is a valid ordering.

Incorrect RTL: after instruction scheduling
-------------------------------------------

The file ia64-upc-vaddr-bug.upc.215r.mach dumps the RTL
*after* the selective scheduling pass has run.  Here, we
see that D.3058.phase is fetched *before* it is set by
the vector operation.

The following RTL is copied directly from the .mach dump file and
appears exactly in the order shown.

# r37 = D.3058.phase by indirecting through r45
# r45 points to D.3058.phase [r12 + 44]
#
# BUG: D.3058.phase has *not* been initialized at this point.
#

(insn:TI 57 507 506 4 (set (reg:DI 37 r37)
        (zero_extend:DI (mem/s/j/c:SI (reg/f:DI 45 r45 [524]) [0 VIEW_CONVERT_EXPR<struct upc_shared_ptr_t>(D.3060_12).phase+0 S4 A32]))) ia64-upc-vaddr-bug.upc:11 136 {zero_extendsidi2}
     (nil))

# Initialize {D.3058.thread = 0; D.3058.phase = 0} via a vector operation.
#
# BUG: this vector operation should precede the instruction that
# fetches D.3058.phase, but the instruction scheduler has incorrectly
# scheduled this vector assignment after the fetch.  It apparently
# did *not* notice that the memory vector beginning at r46 [r12 + 40] aliases
# both D.3058.thread and D.3058.phase and that r45 [r12 + 44] points
# to D.3058.phase, and therefore is being used to fetch
# the value of D.3058.phase.
#

(insn 52 501 17 4 (set (mem/s/c:V2SI (reg/f:DI 46 r46 [534]) [3 MEM[(struct upc_shared_ptr_t *)&D.3058 + 8B]+0 S8 A64])
        (reg:V2SI 36 r36 [536])) ia64-upc-vaddr-bug.upc:11 377 {*movv2si_internal}
     (nil))
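
The overlap between the two memory references can be confirmed with
simple interval arithmetic.  The sketch below only illustrates the
dependence that the scheduler should have honored; it says nothing about
how GCC's alias analysis is actually implemented.  The helper name
byte_ranges_overlap is hypothetical.  The offsets (relative to r12) and
sizes are taken from the notes and the S8/S4 annotations in the RTL
above.

```c
#include <assert.h>    /* static_assert (C11) */
#include <stdbool.h>   /* bool */
#include <stddef.h>    /* size_t */

/* Hypothetical helper: true if the half-open byte ranges
   [a, a + a_size) and [b, b + b_size) share at least one byte.  */
bool
byte_ranges_overlap (size_t a, size_t a_size, size_t b, size_t b_size)
{
  return a < b + b_size && b < a + a_size;
}

enum
{
  STORE_OFF = 40, STORE_SIZE = 8,   /* insn 52: V2SI store via r46 */
  LOAD_OFF = 44, LOAD_SIZE = 4      /* insn 57: SI load via r45 */
};

/* The load of D.3058.phase lies entirely inside the stored vector.  */
static_assert (STORE_OFF <= LOAD_OFF
               && LOAD_OFF + LOAD_SIZE <= STORE_OFF + STORE_SIZE,
               "load is contained in the store");
```

Calling byte_ranges_overlap (STORE_OFF, STORE_SIZE, LOAD_OFF, LOAD_SIZE)
yields true, so insn 57 has a true (read-after-write) dependence on
insn 52 and must not be moved ahead of it.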


Thread overview: 10+ messages
2011-09-22 19:22 [Bug rtl-optimization/50489] New: [UPC/IA64] mis-schedule of MEM ref with -ftree-vectorize and -fschedule-insns2 gary at intrepid dot com
2011-09-22 19:30 ` [Bug rtl-optimization/50489] " gary at intrepid dot com
2011-09-22 19:33 ` gary at intrepid dot com
2011-09-23 10:11 ` amonakov at gcc dot gnu.org
2011-09-23 17:59 ` gary at intrepid dot com
2011-09-25 12:26 ` rguenth at gcc dot gnu.org
2011-09-25 20:06 ` gary at intrepid dot com
2011-10-17  3:04 ` gary at intrepid dot com
2012-08-20 20:54 ` olegendo at gcc dot gnu.org
2024-03-28  4:31 ` pinskia at gcc dot gnu.org
