From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-patches-return-418756-listarch-gcc-patches=gcc.gnu.org@gcc.gnu.org>
Received: (qmail 9578 invoked by alias); 12 Jan 2016 11:22:21 -0000
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc-patches.gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-help@gcc.gnu.org>
Sender: gcc-patches-owner@gcc.gnu.org
Received: (qmail 9565 invoked by uid 89); 12 Jan 2016 11:22:20 -0000
Authentication-Results: sourceware.org; auth=none
X-Virus-Found: No
X-Spam-SWARE-Status: No, score=2.8 required=5.0 tests=BAYES_50,RP_MATCHES_RCVD,SPF_PASS,UNSUBSCRIBE_BODY autolearn=no version=3.3.2 spammy=III, vii, VII, Graphite
X-HELO: fencepost.gnu.org
Received: from fencepost.gnu.org (HELO fencepost.gnu.org) (208.118.235.10) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with (AES128-SHA encrypted) ESMTPS; Tue, 12 Jan 2016 11:22:19 +0000
Received: from eggs.gnu.org ([2001:4830:134:3::10]:59994)	by fencepost.gnu.org with esmtps (TLS1.0:RSA_AES_256_CBC_SHA1:256)	(Exim 4.82)	(envelope-from <rguenther@suse.de>)	id 1aIx1k-0004zp-IM	for gcc-patches@gnu.org; Tue, 12 Jan 2016 06:22:16 -0500
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)	(envelope-from <rguenther@suse.de>)	id 1aIx1f-0006Sr-6d	for gcc-patches@gnu.org; Tue, 12 Jan 2016 06:22:16 -0500
Received: from mx2.suse.de ([195.135.220.15]:52617)	by eggs.gnu.org with esmtp (Exim 4.71)	(envelope-from <rguenther@suse.de>)	id 1aIx1e-0006SV-Jo	for gcc-patches@gnu.org; Tue, 12 Jan 2016 06:22:11 -0500
Received: from relay1.suse.de (charybdis-ext.suse.de [195.135.220.254])	by mx2.suse.de (Postfix) with ESMTP id CD19BAC01;	Tue, 12 Jan 2016 11:22:05 +0000 (UTC)
Date: Tue, 12 Jan 2016 11:22:00 -0000
From: Richard Biener <rguenther@suse.de>
To: Tom de Vries <Tom_deVries@mentor.com>
cc: "gcc-patches@gnu.org" <gcc-patches@gnu.org>,     Sebastian Pop <sebpop@gmail.com>
Subject: Re: [PATCH, PR69110] Don't return NULL access_fns in dr_analyze_indices
In-Reply-To: <5694CF85.1020708@mentor.com>
Message-ID: <alpine.LSU.2.11.1601121218410.31122@t29.fhfr.qr>
References: <5694CF85.1020708@mentor.com>
User-Agent: Alpine 2.11 (LSU 23 2013-08-11)
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x (no timestamps) [generic]
X-Received-From: 195.135.220.15
X-SW-Source: 2016-01/txt/msg00686.txt.bz2

On Tue, 12 Jan 2016, Tom de Vries wrote:

> Hi,
> 
> This patch fixes PR69110, a wrong-code bug in autopar.
> 
> 
> I.
> 
> consider testcase test.c:
> ...
> #define N 1000
> 
> unsigned int i = 0;
> 
> static void __attribute__((noinline, noclone))
> foo (void)
> {
>   unsigned int z;
> 
>   for (z = 0; z < N; ++z)
>     ++i;
> }
> 
> extern void abort (void);
> 
> int
> main (void)
> {
>   foo ();
>   if (i != N)
>     abort ();
> 
>   return 0;
> }
> ...
> 
> When compiled with -O1 -ftree-parallelize-loops=2 -fno-tree-loop-im, the test
> fails:
> ...
> $ gcc test.c -O1 -ftree-parallelize-loops=2 -Wl,-rpath=$(pwd
> -P)//install/lib64 -fno-tree-loop-im
> $ ./a.out
> Aborted (core dumped)
> $
> ...
> 
> 
> II.
> 
> Before parloops, at ivcanon we have the loop body:
> ...
>   <bb 3>:
>   # z_10 = PHI <z_7(4), 0(2)>
>   # ivtmp_12 = PHI <ivtmp_2(4), 1000(2)>
>   i.1_4 = i;
>   _5 = i.1_4 + 1;
>   i = _5;
>   z_7 = z_10 + 1;
>   ivtmp_2 = ivtmp_12 - 1;
>   if (ivtmp_2 != 0)
>     goto <bb 4>;
>   else
>     goto <bb 5>;
> ...
> 
> There's a loop-carried dependency in i, that is, the read from i in iteration
> z == 1 depends on the write to i in iteration z == 0. So the loop cannot be
> parallelized. The test-case fails because parloops still parallelizes the
> loop.
> 
> 
> III.
> 
> Since the loop carried dependency is in-memory, it is not handled by the code
> analyzing reductions, since that code ignores the virtual phi.
> 
> So, AFAIU, this loop carried dependency should be handled by the dependency
> testing in loop_parallel_p. And loop_parallel_p returns true for this loop.
> 
> A comment in loop_parallel_p reads: "Check for problems with dependences.  If
> the loop can be reversed, the iterations are independent."
> 
> AFAIU, the loop order can actually be reversed. But, it cannot be executed in
> parallel.
> 
> So from this perspective, it seems in this case the comment matches the check,
> but the check is not sufficient.
> 
> 
> IV.
> 
> OTOH, if we replace the declaration of i with i[1], and replace the references
> of i with i[0], we see that loop_parallel_p fails.  So the loop_parallel_p
> check in this case seems sufficient, and there's something else that causes
> the check to fail in this case.
> 
> The difference is in the generated data ref:
> - in the 'i[1]' case, we set DR_ACCESS_FNS in dr_analyze_indices to
>   vector with a single element: access function 0.
> - in the 'i' case, we set DR_ACCESS_FNS to NULL.
> 
> This difference causes different handling in the dependency generation, in
> particular in add_distance_for_zero_overlaps which has no effect for the 'i'
> case because  DDR_NUM_SUBSCRIPTS (ddr) == 0 (as a consequence of the NULL
> access_fns of both the source and sink data refs).
> 
> From this perspective, it seems that the loop_parallel_p check is sufficient,
> and that dr_analyze_indices shouldn't return a NULL access_fns for 'i'.
> 
> 
> V.
> 
> When compiling with graphite using -floop-parallelize-all --param
> graphite-min-loops-per-function=1, we find:
> ...
> [scop-detection-fail] Graphite cannot handle data-refs in stmt:
> # VUSE <.MEM_11>
> i.1_4 = i;
> ...
> 
> The function scop_detection::stmt_has_simple_data_refs_p returns false because
> of the code recently added for PR66980 at r228357:
> ...
>       int nb_subscripts = DR_NUM_DIMENSIONS (dr);
> 
>       if (nb_subscripts < 1)
> 	{
>           free_data_refs (drs);
>           return false;
>         }
> ...
> 
> [ DR_NUM_DIMENSIONS (dr) is 0 as a consequence of the NULL access_fns. ]
> 
> This code labels DR_NUM_DIMENSIONS (dr) == 0 as 'data reference analysis has
> failed'.
> 
> From this perspective, it seems that the dependence handling should bail out
> once it finds a data ref with DR_NUM_DIMENSIONS (dr) == 0 (or DR_ACCESS_FNS ==
> 0).
> 
> 
> VI.
> 
> This test-case used to pass in 4.6 because in find_data_references_in_stmt we
> had:
> ...
>       /* FIXME -- data dependence analysis does not work correctly for
>          objects with invariant addresses in loop nests.  Let us fail
>          here until the problem is fixed.  */
>       if (dr_address_invariant_p (dr) && nest)
> 	{
>           free_data_ref (dr);
>           if (dump_file && (dump_flags & TDF_DETAILS))
>             fprintf (dump_file,
>                      "\tFAILED as dr address is invariant\n");
>           ret = false;
>           break;
> 	}
> ...
> 
> That FIXME was removed in the fix for PR46787, at r175704.
> 
> The test-case fails in 4.8, and I guess from there onwards.
> 
> 
> VII.
> 
> The attached patch fixes the problem by returning a zero access function for
> 'i' in dr_analyze_indices.
> 
> [ But I can also imagine a solution similar to the graphite fix:
> ...
> @@ -3997,6 +3999,12 @@ find_data_references_in_stmt
>        dr = create_data_ref (nest, loop_containing_stmt (stmt),
>                             ref->ref, stmt, ref->is_read);
>        gcc_assert (dr != NULL);
> +      if (DR_NUM_DIMENSIONS (dr) == 0)
> +       {
> +         datarefs->release ();
> +         return false;
> +       }
> +
>        datarefs->safe_push (dr);
>      }
>    references.release ();
> ...
> 
> I'm not familiar enough with the dependency analysis code to know where
> exactly this should be fixed. ]
> 
> Bootstrapped and reg-tested on x86_64.
> 
> OK for trunk?
> 
> OK for release branches?

Bah - attachement [please post patches inline]

So without quote ;)

Doesnt' the same issue apply to 

> unsigned int *p;
>       
> static void __attribute__((noinline, noclone))
> foo (void)
> {         
>   unsigned int z;
>   
>   for (z = 0; z < N; ++z)
>     ++(*p);
> }

thus when we have a MEM_REF[p_1]?  SCEV will not analyze
its evolution to a POLYNOMIAL_CHREC and thus access_fns will
be NULL again.

I think avoiding a NULL access_fns is ok but it should be done
unconditionally, not only for the DECL_P case.

Thanks,
Richard.