From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (qmail 8593 invoked by alias); 22 Sep 2011 19:17:29 -0000
Received: (qmail 8585 invoked by uid 22791); 22 Sep 2011 19:17:27 -0000
X-SWARE-Spam-Status: No, hits=-2.9 required=5.0 tests=ALL_TRUSTED,AWL,BAYES_00
X-Spam-Check-By: sourceware.org
Received: from localhost (HELO gcc.gnu.org) (127.0.0.1) by sourceware.org (qpsmtpd/0.43rc1) with ESMTP; Thu, 22 Sep 2011 19:17:12 +0000
From: "gary at intrepid dot com"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug rtl-optimization/50489] New: [UPC/IA64] mis-schedule of MEM ref with -ftree-vectorize and -fschedule-insns2
Date: Thu, 22 Sep 2011 19:22:00 -0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: new
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: rtl-optimization
X-Bugzilla-Keywords:
X-Bugzilla-Severity: normal
X-Bugzilla-Who: gary at intrepid dot com
X-Bugzilla-Status: UNCONFIRMED
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org
X-Bugzilla-Target-Milestone: ---
X-Bugzilla-Changed-Fields:
Message-ID:
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
Content-Type: text/plain; charset="UTF-8"
MIME-Version: 1.0
Mailing-List: contact gcc-bugs-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id:
List-Archive:
List-Post:
List-Help:
Sender: gcc-bugs-owner@gcc.gnu.org
X-SW-Source: 2011-09/txt/msg01579.txt.bz2

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50489

             Bug #: 50489
           Summary: [UPC/IA64] mis-schedule of MEM ref with -ftree-vectorize
                    and -fschedule-insns2
    Classification: Unclassified
           Product: gcc
           Version: 4.7.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: rtl-optimization
        AssignedTo: unassigned@gcc.gnu.org
        ReportedBy: gary@intrepid.com
            Target: IA64

After a change in GUPC's tree-lowering pass made a couple of months back
(that simplified the tree code being generated), we saw regressions where
several small test cases were failing on an IA64 target (SGI Altix,
running SUSE).

We have been unable so far to reduce this to a "C" only test case that
demonstrates the problem, so we are submitting this as a "UPC" bug report,
along with a script that will build the UPC compiler from the GUPC branch,
and create the various bug artifacts.

Perhaps someone knowledgeable with the instruction scheduler will understand
how this mis-scheduling happens and either reproduce the issue as a "C" test
case, or propose a patch.  We also do not know at this time if the UPC
compiler should be generating memory barriers or generate some other metadata
to avoid this mis-scheduling, and would appreciate any suggestions in that
regard.

The attached UPC test case works fine when
"-O2 -ftree-vectorize -fno-schedule-insns2" is asserted, but demonstrates
a mis-schedule when "-O2 -ftree-vectorize" is asserted.

The following description is copied from the
README-ia64-upc-sched-insn2-bug.txt file that is included in the attached
zip file as well.

Background
----------

On a 64-bit target (using the "struct PTS" configuration), the UPC compiler
represents UPC pointer-to-shared values as 128-bit structures with three
fields: (vaddr, thread, phase) as shown in the declaration below.

    typedef struct shared_ptr_struct
      {
        void *vaddr;
        uint32_t thread;
        uint32_t phase;
      } upc_shared_ptr_t __attribute__ ((aligned (16)));

In: ia64-upc-vaddr-bug.upc.143t.optimized, there is the following sequence
of tree statements.

    unsigned int D.3062;
    unsigned int D.3061;
    shared [8] struct foo * D.3060;
    shared [8] struct foo[1] * D.3059;
    struct upc_shared_ptr_t D.3058;
    unsigned int D.3057;

    D.3057_10 = D.3056_9 * 8;
    D.3058.vaddr = &_u_barray;
    MEM[(struct upc_shared_ptr_t *)&D.3058 + 8B] = { 0, 0 };
    D.3059_11 = VIEW_CONVERT_EXPR(D.3058);
    D.3060_12 = (shared [8] struct foo *) D.3059_11;
    D.3061_13 = VIEW_CONVERT_EXPR(D.3060_12).phase;
    D.3062_14 = D.3057_10 + D.3061_13;

D.3059_11 and D.3060_12 are UPC pointers-to-shared (PTS's); these are 128-bit
"fat" pointers with internal {vaddr, thread, phase} fields.
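
For readers less familiar with the tree dump, the intended semantics of the
statements above can be approximated in plain C.  This is only an
illustrative sketch, not the reduced "C" test case (it is not known to
trigger the mis-schedule): u_barray_stub and fetch_phase_after_init are
hypothetical names, memcpy stands in for the vector store and for
VIEW_CONVERT_EXPR, and the aligned attribute assumes GCC or a compatible
compiler.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct shared_ptr_struct
{
  void *vaddr;
  uint32_t thread;
  uint32_t phase;
} upc_shared_ptr_t __attribute__ ((aligned (16)));

/* Hypothetical stand-in for the shared array symbol _u_barray.  */
static char u_barray_stub[64];

/* Mirrors the tree sequence; the return value plays the role of D.3061_13.  */
static uint32_t
fetch_phase_after_init (void)
{
  upc_shared_ptr_t rep;                 /* D.3058 */
  const uint32_t zeros[2] = { 0, 0 };

  /* D.3058.vaddr = &_u_barray;  */
  rep.vaddr = u_barray_stub;

  /* MEM[(struct upc_shared_ptr_t *)&D.3058 + 8B] = { 0, 0 };
     one 8-byte store covering both 'thread' and 'phase'.  */
  memcpy ((char *) &rep + offsetof (upc_shared_ptr_t, thread),
          zeros, sizeof zeros);

  /* D.3059_11 / D.3060_12: a bitwise copy of the representation.  */
  upc_shared_ptr_t copy;
  memcpy (&copy, &rep, sizeof copy);

  /* D.3061_13 = (copy).phase; must observe the store above.  */
  return copy.phase;
}
```

With correct ordering of the store and the load, fetch_phase_after_init ()
returns 0, which is the value the failing test cases expect for the phase.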

D.3058 is a PTS representation struct that is initialized to
{&_u_barray, 0, 0}.  Note that D.3059_11 and D.3060_12 are copies of the PTS
representation structure, D.3058, that have been recast into a UPC
pointer-to-shared (PTS).  The casts above might impose inefficiencies, and
there may be ways to improve the code, but this is the current tree code that
is generated.

This assignment statement:

    D.3061_13 = VIEW_CONVERT_EXPR(D.3060_12).phase;

extracts the 'phase' field from D.3060_12, which is a copy of the value of
D.3058.phase.  The value of D.3058.phase was previously initialized to zero
by the MEM[] assignment.  The fetched phase value D.3061_13 should be zero
when this assignment is executed.

Bug
---

It is this latter access to D.3060_12.phase that expands into incorrect RTL
after the selective instruction scheduling pass is run.  The access to
D.3060_12.phase is scheduled ahead of the code that sets D.3058.phase.

Valid RTL
---------

The 'ok' and 'bug' compilations share the same RTL dump output all the way
through ia64-upc-vaddr-bug.upc.213r.compgotos.  In that file there are the
RTL statements that are affected by the mis-scheduling of instructions
(additional notes added as '#' comments):

# D.3058.vaddr = &_u_barray (the base address of barray).
#
# r34 was previously assigned the value of &_u_barray
# r47 = r12 + 32;
# r12 is the stack pointer and r47 points to the beginning
# of the D.3058 structure, which also happens to be the
# address of the first field, D.3058.vaddr.
# Therefore, r47 points to D.3058.vaddr
(insn 46 42 331 4 (set (mem/s/f/c:DI (reg/f:DI 47 r47 [532]) [2 D.3058.vaddr+0 S8 A128])
        (reg/f:DI 34 r34 [533])) ia64-upc-vaddr-bug.upc:11 5 {movdi_internal}
     (nil))

# This vector op assigns: {D.3058.thread = 0; D.3058.phase = 0;}
#
# This is done by using r46 as the destination address and r36 as the source.
# r46 = r12 + 40, which is the base address of D.3058.thread.
# (D.3058.phase is the field following the D.3058.thread)
# r36 was previously set to {0, 0}.
(insn 52 69 65 4 (set (mem/s/c:V2SI (reg/f:DI 46 r46 [534]) [3 MEM[(struct upc_shared_ptr_t *)&D.3058 + 8B]+0 S8 A64])
        (reg:V2SI 36 r36 [536])) ia64-upc-vaddr-bug.upc:11 377 {*movv2si_internal}
     (nil))

# r37 = D.3058.phase by indirecting through r45
#
# r12 is the stack pointer
# D.3058.vaddr starts at r12 + 32
# D.3058.thread starts at r12 + 40
# D.3058.phase starts at r12 + 44
# r45 = (r12 + 44); where r12 is the stack pointer and 44
# is the offset of D.3058.phase
# Therefore, r45 points to D.3058.phase.
(insn 57 59 68 4 (set (reg:DI 37 r37)
        (zero_extend:DI (mem/s/j/c:SI (reg/f:DI 45 r45 [524]) [0 VIEW_CONVERT_EXPR(D.3060_12).phase+0 S4 A32]))) ia64-upc-vaddr-bug.upc:11 136 {zero_extendsidi2}
     (nil))

Although there are intervening instructions, the key thing to note is that
the first two instructions (46 and 52) initialize the contents of D.3058,
and the instruction (57) fetches the value of D.3058.phase.  This is a valid
ordering.

Incorrect RTL: after instruction scheduling
-------------------------------------------

The file ia64-upc-vaddr-bug.upc.215r.mach dumps the RTL *after* the selective
scheduling pass has run.  Here, we see that D.3058.phase is fetched *before*
it is set by the vector operation.  The following RTL is copied directly from
the .mach dump file and appears exactly in the order shown.

# r37 = D.3058.phase by indirecting through r45
# r45 points to D.3058.phase [r12 + 44]
#
# BUG: D.3058.phase has *not* been initialized at this point.
#
(insn:TI 57 507 506 4 (set (reg:DI 37 r37)
        (zero_extend:DI (mem/s/j/c:SI (reg/f:DI 45 r45 [524]) [0 VIEW_CONVERT_EXPR(D.3060_12).phase+0 S4 A32]))) ia64-upc-vaddr-bug.upc:11 136 {zero_extendsidi2}
     (nil))

# Initialize {D.3058.thread = 0; D.3058.phase = 0} via a vector operation.
#
# BUG: this vector operation should precede the instruction that
# fetches D.3058.phase, but the instruction scheduler has incorrectly
# scheduled this vector assignment after the fetch.
# It apparently did *not* notice that the memory vector beginning at
# r46 [r12 + 40] aliases both D.3058.thread and D.3058.phase and that
# r45 [r12 + 44] points to D.3058.phase, and therefore is being used
# to fetch the value of D.3058.phase.
#
(insn 52 501 17 4 (set (mem/s/c:V2SI (reg/f:DI 46 r46 [534]) [3 MEM[(struct upc_shared_ptr_t *)&D.3058 + 8B]+0 S8 A64])
        (reg:V2SI 36 r36 [536])) ia64-upc-vaddr-bug.upc:11 377 {*movv2si_internal}
     (nil))
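
The overlap between the two memory references can be checked by hand from
the offsets and sizes quoted in the comments above: the V2SI store covers 8
bytes starting at r12 + 40, and the SI load covers 4 bytes starting at
r12 + 44.  The following is a minimal sketch of that arithmetic only; it is
not GCC code, and struct access and accesses_overlap are hypothetical names
introduced for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical description of a memory access: byte offset from the
   stack pointer (r12) and size in bytes.  Not a GCC data structure.  */
struct access
{
  ptrdiff_t offset;
  size_t size;
};

/* Two half-open byte ranges [offset, offset + size) overlap when each
   one starts before the other one ends.  */
static bool
accesses_overlap (struct access a, struct access b)
{
  return a.offset < b.offset + (ptrdiff_t) b.size
         && b.offset < a.offset + (ptrdiff_t) a.size;
}
```

With the values from the dumps, the store in insn 52 ({40, 8}) and the load
in insn 57 ({44, 4}) overlap, so the load depends on the store; the vaddr
store in insn 46 ({32, 8}) does not overlap the load.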