From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 32420 invoked by alias); 9 May 2017 10:09:07 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Received: (qmail 32409 invoked by uid 89); 9 May 2017 10:09:06 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-1.7 required=5.0 tests=AWL,BAYES_00,FREEMAIL_FROM,RCVD_IN_DNSWL_NONE,RCVD_IN_SORBS_SPAM,SPF_PASS autolearn=no version=3.3.2 spammy=conveniently, disambiguation, type2, Things X-HELO: mail-oi0-f53.google.com Received: from mail-oi0-f53.google.com (HELO mail-oi0-f53.google.com) (209.85.218.53) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Tue, 09 May 2017 10:09:04 +0000 Received: by mail-oi0-f53.google.com with SMTP id h4so81182642oib.3 for ; Tue, 09 May 2017 03:09:07 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:in-reply-to:references:from:date :message-id:subject:to; bh=Mgo1zWrR9RJoASG9vR/DUuqbKfiNdazcEyHQZnr4Wvg=; b=OQYeFPu/YAi1BDJ/QQcfqE4elds2EbqGu6wCxBJAF8rKR6JtmqgNn5A337g8mj8PP0 1XzzH9DnDNEbV35SiZ/O27CmJych97zieC5nhX/YLGxGgJPqBvPD3/z5W0+NfkLjOnUZ ZebTpLa31xhY5YA/F/Rfua2UXujgW/F0uoKP/blc0NIootMjn7jNxn1aJyKIwUQuf2Rv O2Vf4hVPzI/5YBxTf06NfBY86lYPeSO0RYyUCOZmUhXV7QjQyq5XIof79Ibm1/uVR62J /twk0pNZ6G9SFNdth4P/PTFqmoID7aV2+tgoOe76zjVOH6QkEVtbqj33nrkgWUnJb/yF 7/Wg== X-Gm-Message-State: AN3rC/6wFv9bKbtmN8hqQSbClPHTlYyxz+g8vk0KBCOkpx92wJs1prnP Bh0eJsyKtsVjS3pcHTWGjMCozJZvAg== X-Received: by 10.157.46.234 with SMTP id w97mr24344450ota.78.1494324545246; Tue, 09 May 2017 03:09:05 -0700 (PDT) MIME-Version: 1.0 Received: by 10.157.51.83 with HTTP; Tue, 9 May 2017 03:09:04 -0700 (PDT) In-Reply-To: <87a86sfsgg.fsf@linaro.org> References: <87tw52s73r.fsf@linaro.org> <87a86sfsgg.fsf@linaro.org> From: Richard Biener Date: Tue, 09 May 2017 10:10:00 -0000 Message-ID: Subject: Re: Handle data dependence relations with different bases To: Richard Biener , GCC Patches , Richard Sandiford Content-Type: text/plain; charset=UTF-8 X-IsSubscribed: yes X-SW-Source: 2017-05/txt/msg00616.txt.bz2 On Thu, May 4, 2017 at 7:21 PM, Richard Sandiford wrote: > Richard Biener writes: >> On Thu, May 4, 2017 at 2:12 PM, Richard Biener >> wrote: >>> On Wed, May 3, 2017 at 10:00 AM, Richard Sandiford >>> wrote: >>>> This patch tries to calculate conservatively-correct distance >>>> vectors for two references whose base addresses are not the same. >>>> It sets a new flag DDR_COULD_BE_INDEPENDENT_P if the dependence >>>> isn't guaranteed to occur. >>>> >>>> The motivating example is: >>>> >>>> struct s { int x[8]; }; >>>> void >>>> f (struct s *a, struct s *b) >>>> { >>>> for (int i = 0; i < 8; ++i) >>>> a->x[i] += b->x[i]; >>>> } >>>> >>>> in which the "a" and "b" accesses are either independent or have a >>>> dependence distance of 0 (assuming -fstrict-aliasing). Neither case >>>> prevents vectorisation, so we can vectorise without an alias check. >>>> >>>> I'd originally wanted to do the same thing for arrays as well, e.g.: >>>> >>>> void >>>> f (int a[][8], struct b[][8]) >>>> { >>>> for (int i = 0; i < 8; ++i) >>>> a[0][i] += b[0][i]; >>>> } >>>> >>>> I think this is valid because C11 6.7.6.2/6 says: >>>> >>>> For two array types to be compatible, both shall have compatible >>>> element types, and if both size specifiers are present, and are >>>> integer constant expressions, then both size specifiers shall have >>>> the same constant value. >>>> >>>> So if we access an array through an int (*)[8], it must have type X[8] >>>> or X[], where X is compatible with int. It doesn't seem possible in >>>> either case for "a[0]" and "b[0]" to overlap when "a != b". >>>> >>>> However, Richard B said that (at least in gimple) we support arbitrary >>>> overlap of arrays and allow arrays to be accessed with different >>>> dimensionality. There are examples of this in PR50067. I've therefore >>>> only handled references that end in a structure field access. >>>> >>>> There are two ways of handling these dependences in the vectoriser: >>>> use them to limit VF, or check at runtime as before. I've gone for >>>> the approach of checking at runtime if we can, to avoid limiting VF >>>> unnecessarily. We still fall back to a VF cap when runtime checks >>>> aren't allowed. >>>> >>>> The patch tests whether we queued an alias check with a dependence >>>> distance of X and then picked a VF <= X, in which case it's safe to >>>> drop the alias check. Since vect_prune_runtime_alias_check_list can >>>> be called twice with different VF for the same loop, it's no longer >>>> safe to clear may_alias_ddrs on exit. Instead we should use >>>> comp_alias_ddrs to check whether versioning is necessary. >>>> >>>> Tested on aarch64-linux-gnu and x86_64-linux-gnu. OK to install? >>> >>> You seem to do your "fancy" thing but also later compute the old >>> base equality anyway (for same_base_p). It looks to me for this >>> case the new fancy code can be simply skipped, keeping num_dimensions >>> as before? >>> >>> + /* Try to approach equal type sizes. */ >>> + if (!COMPLETE_TYPE_P (type_a) >>> + || !COMPLETE_TYPE_P (type_b) >>> + || !tree_fits_uhwi_p (TYPE_SIZE_UNIT (type_a)) >>> + || !tree_fits_uhwi_p (TYPE_SIZE_UNIT (type_b))) >>> + break; >>> >>> ah, interesting idea to avoid a quadratic search. Note that you should >>> conservatively handle both BIT_FIELD_REF and VIEW_CONVERT_EXPR >>> as they are used for type-punning. > > All the component refs here should be REALPART_EXPRs, IMAGPART_EXPRs, > ARRAY_REFs or COMPONENT_REFs of structures, since that's all that > dr_analyze_indices allows, so I think we safe in terms of the tree codes. Yeah. I think we need to document that we should have a 1:1 match here. >>> I see nonoverlapping_component_refs_of_decl_p should simply skip >>> ARRAY_REFs - but I also see there: >>> >>> /* ??? We cannot simply use the type of operand #0 of the refs here >>> as the Fortran compiler smuggles type punning into COMPONENT_REFs >>> for common blocks instead of using unions like everyone else. */ >>> tree type1 = DECL_CONTEXT (field1); >>> tree type2 = DECL_CONTEXT (field2); >>> >>> so you probably can't simply use TREE_TYPE (outer_ref) for type compatibility. >>> You also may not use types_compatible_p here as for LTO that is _way_ too >>> lax for aggregates. The above uses >>> >>> /* We cannot disambiguate fields in a union or qualified union. */ >>> if (type1 != type2 || TREE_CODE (type1) != RECORD_TYPE) >>> return false; >>> >>> so you should also bail out on unions here, rather than the check you do later. > > The loop stops before we get to a union, so I think "only" the RECORD_TYPE > COMPONENT_REF handling is a potential problem. Does this mean that > I should use the nonoverlapping_component_refs_of_decl_p code: > > tree field1 = TREE_OPERAND (ref1, 1); > tree field2 = TREE_OPERAND (ref2, 1); > > /* ??? We cannot simply use the type of operand #0 of the refs here > as the Fortran compiler smuggles type punning into COMPONENT_REFs > for common blocks instead of using unions like everyone else. */ > tree type1 = DECL_CONTEXT (field1); > tree type2 = DECL_CONTEXT (field2); > > /* We cannot disambiguate fields in a union or qualified union. */ > if (type1 != type2 || TREE_CODE (type1) != RECORD_TYPE) > return false; > > if (field1 != field2) > { > /* A field and its representative need to be considered the > same. */ > if (DECL_BIT_FIELD_REPRESENTATIVE (field1) == field2 > || DECL_BIT_FIELD_REPRESENTATIVE (field2) == field1) > return false; > /* Different fields of the same record type cannot overlap. > ??? Bitfields can overlap at RTL level so punt on them. */ > if (DECL_BIT_FIELD (field1) && DECL_BIT_FIELD (field2)) > return false; > return true; > } > > as the disambiguation test for COMPONENT_REFs, instead of types_compatible_p > during the new loop? Yes. OTOH you want to "match" while the above disambiguates. So it means you should use either FIELD_DECL equality or DECL_CONTEXT of the FIELD_DECL equality (which should be the same in the end). The RTL concern should not matter here. > And test for this as well as unions in the outer > references? So looking at dr_analyze_indices a union would be always the DR_BASE_OBJECT, and you (should) stop the ref walk at DR_BASE_OBJECT. The dr_analyze_indices code is also somewhat fishy in that it simply ignores everything below unhandled component-refs even if there are indices involved (and it gets away with this because dependence analysis likely/hopefully gives up on the DR_BASE_OBJECT equality test in case it is sth like a[i].union for example ... hopefully ...). >>> You seem to rely on getting an access_fn entry for each handled_component_p. >>> It looks like this is the case -- we even seem to stop at unions (with the same >>> fortran "issue"). I'm not sure that's the best thing to do but you >>> rely on that. > > Yeah, the loop is deliberately limited to the components associated with > an access_fn. I did wonder at first whether dr_analyze_indices should > store the original component reference trees for each access function. > That would make things simpler and more explicit, but would also eat up > more memory. Things like object_address_invariant_in_loop_p rely on the > access_fns in the same way that the loop in the patch does. in fact it fails to handle ARRAY_RANGE_REFs ... >>> I don't understand the looping, it needs more comments. You seem to be >>> looking for the innermost compatible RECORD_TYPE but then num_dimensions >>> is how many compatible refs you found on the way (with incompatible ones >>> not counting?!). What about an inner varying array of structs? This seems to >>> be disregarded in the analysis now? Thus, a[i].s.b[i].j vs. __real >>> b[i].s.b[i].j? > > I'll try to improve the comments. But the idea is that both sequences are > as long as possible, while that still gives compatible types. If there is > more than one such sequence, we pick the one nearest the base. > > So in your example, the access functions would be: > > 0 1 2 3 4 > a: .j [i] .b .s [i] > > 0 1 2 3 4 5 > b: __real .j [i] .b .s [i] > > If a and b are pointers, the final access functions would be > unconstrained base accesses, so we'd end up with: > > a: [0, 3] > b: [1, 4] > > for both sequences. > >>> nonoverlapping_component_refs_of_decl_p/nonoverlapping_component_refs_p >>> conveniently start from the other >>> end of the ref here. >> >> That said, for the motivational cases we either have one ref having >> more dimensions than the other (the __real vs. full complex access) or >> they have the same number of dimensions (and no access fn for the >> base). >> >> For the first case we should simply "drop" access_fns of the larger >> dimensional ref (from the start, plus outer component refs) up to the >> point the number of dimensions are equal. > > Yeah, that's what happens for your example. But if we had: > > a[i].s.c.d > __real b[i].s.b[i].j > > (where d is the same type as the real component) then the access > functions would be: > > 0 1 2 3 > a: .d .c .s [i] > > 0 1 2 3 4 5 > b: __real .j [i] .b .s [i] > > Comparing the a0/b2 column doesn't make sense, because one's an array > and the other is a structure. In this case the sequence we care about is: > > a: [1, 3] > b: [3, 5] > > which is what the loop gives. The a1/b3 column is the one that proves > there's no dependence. > >> Then we have the case of >> >> ! types_compatible_p (TREE_TYPE (base_a), TREE_TYPE (base_b)) >> >> where we have to punt. >> >> Then we have the case of >> >> ! operand_equal_p (base_a, base_b, OEP_ADDRESS_OF) >> >> which is where the new code should kick in to see if we can drop access_fns >> from the other end (as unanalyzable but either having distance zero or not >> aliased because of TBAA). >> >> At least your testcases suggest you do not want to handle >> >> struct s { int x[N]; }; >> struct r { struct s s; }; >> f (struct s *a, struct r *b) >> { >> for (i = 0; i < N; ++i) >> a->s.x[i] = b->x[i]; >> } >> >> ? >> >> With this example your loop which seems to search for a "common" >> sequence in (different) midst of the reference trees makes more sense >> (still that loop is awkward to understand). > > Yeah, I want to handle that too, just hadn't thought of it as a specific > testcase. The code does give the expected dependence distance of 0. Ok. I think the patch is reasonable, maybe the loop can be restructured / simplified a bit and handling of the union case for example be done first (by looking at DR_BASE_OBJECT). Thanks, Richard. > > Thanks, > Richard