Hi, This patch fixes PR69110, a wrong-code bug in autopar. I. consider testcase test.c: ... #define N 1000 unsigned int i = 0; static void __attribute__((noinline, noclone)) foo (void) { unsigned int z; for (z = 0; z < N; ++z) ++i; } extern void abort (void); int main (void) { foo (); if (i != N) abort (); return 0; } ... When compiled with -O1 -ftree-parallelize-loops=2 -fno-tree-loop-im, the test fails: ... $ gcc test.c -O1 -ftree-parallelize-loops=2 -Wl,-rpath=$(pwd -P)//install/lib64 -fno-tree-loop-im $ ./a.out Aborted (core dumped) $ ... II. Before parloops, at ivcanon we have the loop body: ... : # z_10 = PHI # ivtmp_12 = PHI i.1_4 = i; _5 = i.1_4 + 1; i = _5; z_7 = z_10 + 1; ivtmp_2 = ivtmp_12 - 1; if (ivtmp_2 != 0) goto ; else goto ; ... There's a loop-carried dependency in i, that is, the read from i in iteration z == 1 depends on the write to i in iteration z == 0. So the loop cannot be parallelized. The test-case fails because parloops still parallelizes the loop. III. Since the loop carried dependency is in-memory, it is not handled by the code analyzing reductions, since that code ignores the virtual phi. So, AFAIU, this loop carried dependency should be handled by the dependency testing in loop_parallel_p. And loop_parallel_p returns true for this loop. A comment in loop_parallel_p reads: "Check for problems with dependences. If the loop can be reversed, the iterations are independent." AFAIU, the loop order can actually be reversed. But, it cannot be executed in parallel. So from this perspective, it seems in this case the comment matches the check, but the check is not sufficient. IV. OTOH, if we replace the declaration of i with i[1], and replace the references of i with i[0], we see that loop_parallel_p fails. So the loop_parallel_p check in this case seems sufficient, and there's something else that causes the check to fail in this case. The difference is in the generated data ref: - in the 'i[1]' case, we set DR_ACCESS_FNS in dr_analyze_indices to vector with a single element: access function 0. - in the 'i' case, we set DR_ACCESS_FNS to NULL. This difference causes different handling in the dependency generation, in particular in add_distance_for_zero_overlaps which has no effect for the 'i' case because DDR_NUM_SUBSCRIPTS (ddr) == 0 (as a consequence of the NULL access_fns of both the source and sink data refs). From this perspective, it seems that the loop_parallel_p check is sufficient, and that dr_analyze_indices shouldn't return a NULL access_fns for 'i'. V. When compiling with graphite using -floop-parallelize-all --param graphite-min-loops-per-function=1, we find: ... [scop-detection-fail] Graphite cannot handle data-refs in stmt: # VUSE <.MEM_11> i.1_4 = i; ... The function scop_detection::stmt_has_simple_data_refs_p returns false because of the code recently added for PR66980 at r228357: ... int nb_subscripts = DR_NUM_DIMENSIONS (dr); if (nb_subscripts < 1) { free_data_refs (drs); return false; } ... [ DR_NUM_DIMENSIONS (dr) is 0 as a consequence of the NULL access_fns. ] This code labels DR_NUM_DIMENSIONS (dr) == 0 as 'data reference analysis has failed'. From this perspective, it seems that the dependence handling should bail out once it finds a data ref with DR_NUM_DIMENSIONS (dr) == 0 (or DR_ACCESS_FNS == 0). VI. This test-case used to pass in 4.6 because in find_data_references_in_stmt we had: ... /* FIXME -- data dependence analysis does not work correctly for objects with invariant addresses in loop nests. Let us fail here until the problem is fixed. */ if (dr_address_invariant_p (dr) && nest) { free_data_ref (dr); if (dump_file && (dump_flags & TDF_DETAILS)) fprintf (dump_file, "\tFAILED as dr address is invariant\n"); ret = false; break; } ... That FIXME was removed in the fix for PR46787, at r175704. The test-case fails in 4.8, and I guess from there onwards. VII. The attached patch fixes the problem by returning a zero access function for 'i' in dr_analyze_indices. [ But I can also imagine a solution similar to the graphite fix: ... @@ -3997,6 +3999,12 @@ find_data_references_in_stmt dr = create_data_ref (nest, loop_containing_stmt (stmt), ref->ref, stmt, ref->is_read); gcc_assert (dr != NULL); + if (DR_NUM_DIMENSIONS (dr) == 0) + { + datarefs->release (); + return false; + } + datarefs->safe_push (dr); } references.release (); ... I'm not familiar enough with the dependency analysis code to know where exactly this should be fixed. ] Bootstrapped and reg-tested on x86_64. OK for trunk? OK for release branches? Thanks, - Tom