From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 9578 invoked by alias); 12 Jan 2016 11:22:21 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Received: (qmail 9565 invoked by uid 89); 12 Jan 2016 11:22:20 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=2.8 required=5.0 tests=BAYES_50,RP_MATCHES_RCVD,SPF_PASS,UNSUBSCRIBE_BODY autolearn=no version=3.3.2 spammy=III, vii, VII, Graphite X-HELO: fencepost.gnu.org Received: from fencepost.gnu.org (HELO fencepost.gnu.org) (208.118.235.10) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with (AES128-SHA encrypted) ESMTPS; Tue, 12 Jan 2016 11:22:19 +0000 Received: from eggs.gnu.org ([2001:4830:134:3::10]:59994) by fencepost.gnu.org with esmtps (TLS1.0:RSA_AES_256_CBC_SHA1:256) (Exim 4.82) (envelope-from ) id 1aIx1k-0004zp-IM for gcc-patches@gnu.org; Tue, 12 Jan 2016 06:22:16 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1aIx1f-0006Sr-6d for gcc-patches@gnu.org; Tue, 12 Jan 2016 06:22:16 -0500 Received: from mx2.suse.de ([195.135.220.15]:52617) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1aIx1e-0006SV-Jo for gcc-patches@gnu.org; Tue, 12 Jan 2016 06:22:11 -0500 Received: from relay1.suse.de (charybdis-ext.suse.de [195.135.220.254]) by mx2.suse.de (Postfix) with ESMTP id CD19BAC01; Tue, 12 Jan 2016 11:22:05 +0000 (UTC) Date: Tue, 12 Jan 2016 11:22:00 -0000 From: Richard Biener To: Tom de Vries cc: "gcc-patches@gnu.org" , Sebastian Pop Subject: Re: [PATCH, PR69110] Don't return NULL access_fns in dr_analyze_indices In-Reply-To: <5694CF85.1020708@mentor.com> Message-ID: References: <5694CF85.1020708@mentor.com> User-Agent: Alpine 2.11 (LSU 23 2013-08-11) MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x (no timestamps) [generic] X-Received-From: 195.135.220.15 X-SW-Source: 2016-01/txt/msg00686.txt.bz2 On Tue, 12 Jan 2016, Tom de Vries wrote: > Hi, > > This patch fixes PR69110, a wrong-code bug in autopar. > > > I. > > consider testcase test.c: > ... > #define N 1000 > > unsigned int i = 0; > > static void __attribute__((noinline, noclone)) > foo (void) > { > unsigned int z; > > for (z = 0; z < N; ++z) > ++i; > } > > extern void abort (void); > > int > main (void) > { > foo (); > if (i != N) > abort (); > > return 0; > } > ... > > When compiled with -O1 -ftree-parallelize-loops=2 -fno-tree-loop-im, the test > fails: > ... > $ gcc test.c -O1 -ftree-parallelize-loops=2 -Wl,-rpath=$(pwd > -P)//install/lib64 -fno-tree-loop-im > $ ./a.out > Aborted (core dumped) > $ > ... > > > II. > > Before parloops, at ivcanon we have the loop body: > ... > : > # z_10 = PHI > # ivtmp_12 = PHI > i.1_4 = i; > _5 = i.1_4 + 1; > i = _5; > z_7 = z_10 + 1; > ivtmp_2 = ivtmp_12 - 1; > if (ivtmp_2 != 0) > goto ; > else > goto ; > ... > > There's a loop-carried dependency in i, that is, the read from i in iteration > z == 1 depends on the write to i in iteration z == 0. So the loop cannot be > parallelized. The test-case fails because parloops still parallelizes the > loop. > > > III. > > Since the loop carried dependency is in-memory, it is not handled by the code > analyzing reductions, since that code ignores the virtual phi. > > So, AFAIU, this loop carried dependency should be handled by the dependency > testing in loop_parallel_p. And loop_parallel_p returns true for this loop. > > A comment in loop_parallel_p reads: "Check for problems with dependences. If > the loop can be reversed, the iterations are independent." > > AFAIU, the loop order can actually be reversed. But, it cannot be executed in > parallel. > > So from this perspective, it seems in this case the comment matches the check, > but the check is not sufficient. > > > IV. > > OTOH, if we replace the declaration of i with i[1], and replace the references > of i with i[0], we see that loop_parallel_p fails. So the loop_parallel_p > check in this case seems sufficient, and there's something else that causes > the check to fail in this case. > > The difference is in the generated data ref: > - in the 'i[1]' case, we set DR_ACCESS_FNS in dr_analyze_indices to > vector with a single element: access function 0. > - in the 'i' case, we set DR_ACCESS_FNS to NULL. > > This difference causes different handling in the dependency generation, in > particular in add_distance_for_zero_overlaps which has no effect for the 'i' > case because DDR_NUM_SUBSCRIPTS (ddr) == 0 (as a consequence of the NULL > access_fns of both the source and sink data refs). > > From this perspective, it seems that the loop_parallel_p check is sufficient, > and that dr_analyze_indices shouldn't return a NULL access_fns for 'i'. > > > V. > > When compiling with graphite using -floop-parallelize-all --param > graphite-min-loops-per-function=1, we find: > ... > [scop-detection-fail] Graphite cannot handle data-refs in stmt: > # VUSE <.MEM_11> > i.1_4 = i; > ... > > The function scop_detection::stmt_has_simple_data_refs_p returns false because > of the code recently added for PR66980 at r228357: > ... > int nb_subscripts = DR_NUM_DIMENSIONS (dr); > > if (nb_subscripts < 1) > { > free_data_refs (drs); > return false; > } > ... > > [ DR_NUM_DIMENSIONS (dr) is 0 as a consequence of the NULL access_fns. ] > > This code labels DR_NUM_DIMENSIONS (dr) == 0 as 'data reference analysis has > failed'. > > From this perspective, it seems that the dependence handling should bail out > once it finds a data ref with DR_NUM_DIMENSIONS (dr) == 0 (or DR_ACCESS_FNS == > 0). > > > VI. > > This test-case used to pass in 4.6 because in find_data_references_in_stmt we > had: > ... > /* FIXME -- data dependence analysis does not work correctly for > objects with invariant addresses in loop nests. Let us fail > here until the problem is fixed. */ > if (dr_address_invariant_p (dr) && nest) > { > free_data_ref (dr); > if (dump_file && (dump_flags & TDF_DETAILS)) > fprintf (dump_file, > "\tFAILED as dr address is invariant\n"); > ret = false; > break; > } > ... > > That FIXME was removed in the fix for PR46787, at r175704. > > The test-case fails in 4.8, and I guess from there onwards. > > > VII. > > The attached patch fixes the problem by returning a zero access function for > 'i' in dr_analyze_indices. > > [ But I can also imagine a solution similar to the graphite fix: > ... > @@ -3997,6 +3999,12 @@ find_data_references_in_stmt > dr = create_data_ref (nest, loop_containing_stmt (stmt), > ref->ref, stmt, ref->is_read); > gcc_assert (dr != NULL); > + if (DR_NUM_DIMENSIONS (dr) == 0) > + { > + datarefs->release (); > + return false; > + } > + > datarefs->safe_push (dr); > } > references.release (); > ... > > I'm not familiar enough with the dependency analysis code to know where > exactly this should be fixed. ] > > Bootstrapped and reg-tested on x86_64. > > OK for trunk? > > OK for release branches? Bah - attachement [please post patches inline] So without quote ;) Doesnt' the same issue apply to > unsigned int *p; > > static void __attribute__((noinline, noclone)) > foo (void) > { > unsigned int z; > > for (z = 0; z < N; ++z) > ++(*p); > } thus when we have a MEM_REF[p_1]? SCEV will not analyze its evolution to a POLYNOMIAL_CHREC and thus access_fns will be NULL again. I think avoiding a NULL access_fns is ok but it should be done unconditionally, not only for the DECL_P case. Thanks, Richard.