From: Richard Sandiford
To: Richard Biener
Cc: GCC Patches
Subject: Re: PR81136: ICE from inconsistent DR_MISALIGNMENTs
References: <87k244z2c5.fsf@linaro.org> <8760fmgb90.fsf@linaro.org> <874lv36lww.fsf@linaro.org>
Date: Mon, 26 Jun 2017 11:50:00 -0000
In-Reply-To: (Richard Biener's message of "Mon, 26 Jun 2017 13:36:20 +0200")
Message-ID: <87zicv55p0.fsf@linaro.org>

Richard Biener writes:
> On Mon, Jun 26, 2017 at 1:14 PM, Richard Sandiford
> wrote:
>> Richard Biener writes:
>>> On Fri, Jun 23, 2017 at 2:05 PM, Richard Sandiford
>>> wrote:
>>>> Richard Biener writes:
>>>>> On Thu, Jun 22, 2017 at 1:30 PM, Richard Sandiford
>>>>> wrote:
>>>>>> The test case triggered this assert in vect_update_misalignment_for_peel:
>>>>>>
>>>>>>   gcc_assert (DR_MISALIGNMENT (dr) / dr_size ==
>>>>>>               DR_MISALIGNMENT (dr_peel) / dr_peel_size);
>>>>>>
>>>>>> We knew that the two DRs had the same misalignment at runtime, but when
>>>>>> considered in isolation, one data reference guaranteed a higher compile-time
>>>>>> base alignment than the other.
>>>>>>
>>>>>> In the test case this looks like a missed opportunity.  Both references
>>>>>> are unconditional, so it should be possible to use the highest of the
>>>>>> available base alignment guarantees when analyzing each reference.
>>>>>> The patch does this.
>>>>>>
>>>>>> However, as the comment in the patch says, the base alignment guarantees
>>>>>> provided by a conditional reference only apply if the reference occurs
>>>>>> at least once.  In this case it would be legitimate for two references
>>>>>> to have the same runtime misalignment and for one reference to provide a
>>>>>> stronger compile-time guarantee than the other about what the misalignment
>>>>>> actually is.  The patch therefore relaxes the assert to handle that case.
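
As a purely hypothetical sketch of the kind of situation being described
(not the actual PR81136 test case; the exact types and sizes needed to
trigger the ICE may differ), consider a loop in which the store is
expressed through an over-aligned structure type while the load only goes
through a plain int pointer.  In isolation the store's DR_REF yields a
base alignment of at least the vector size, but the load's DR_REF does
not, even though the two references provably have the same runtime
misalignment:

  struct __attribute__ ((aligned (32))) s
  {
    char c;
    int foo[128];
    int bar[128];
  };

  void
  f (struct s *p, int n)
  {
    int *src = p->foo;
    for (int i = 0; i < n; i++)
      /* The store is through the 32-byte-aligned struct; the load is
         through a plain int *, although it addresses the same aligned
         object at a constant offset.  */
      p->bar[i] = src[i];
  }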
>>>>>
>>>>> Hmm, but you don't actually check whether a reference occurs only
>>>>> conditional,
>>>>> do you?  You just seem to say that for masked loads/stores the reference
>>>>> is conditional (I believe that's not true).  But for a loop like
>>>>>
>>>>>   for (;;)
>>>>>     if (a[i])
>>>>>       sum += b[j];
>>>>>
>>>>> you still assume b[j] executes unconditionally?
>>>>
>>>> Maybe the documentation isn't clear enough, but DR_IS_CONDITIONAL
>>>> was supposed to mean "even if the containing statement executes
>>>> and runs to completion, the reference might not actually occur".
>>>> The example above isn't conditional in that sense because the
>>>> reference to b[j] does occur if the store is reached and completes.
>>>>
>>>> Masked loads and stores are conditional in that sense though.
>>>> The reference only occurs if the mask is nonzero; the memory
>>>> isn't touched otherwise.  The functions are used to if-convert
>>>> things like:
>>>>
>>>>   for (...)
>>>>     a[i] = b[i] ? c[i] : d[i];
>>>>
>>>> where there's no guarantee that it's safe to access c[i] when !b[i]
>>>> (or d[i] when b[i]).  No reference occurs for an all-false mask.
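
As a rough, schematic illustration (the SSA names and exact operands are
invented rather than copied from a real dump), the if-converted body for
that kind of loop looks something like:

  _mask_1 = b[i] != 0;
  _inv_2 = b[i] == 0;
  _c_3 = .MASK_LOAD (&c[i], align, _mask_1);
  _d_4 = .MASK_LOAD (&d[i], align, _inv_2);
  _val_5 = _mask_1 ? _c_3 : _d_4;
  a[i] = _val_5;

The c[i] and d[i] accesses survive only as address arguments to the
internal calls, with no MEM_REF in the statements themselves, while the
store to a[i] stays unconditional because it happens whichever way the
condition goes.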
>>>
>>> But as you touch generic data-ref code here you should apply more
>>> sensible semantics to DR_IS_CONDITIONAL than just marking
>>> masked loads/stores but not DRs occurring inside BBs only executed
>>> conditionally ...
>>
>> I don't see why that's more sensible though.  If a statement is only
>> conditionally executed in a loop, it's up to the consumer to decide
>> what to do about that.  The conditions under which the statement
>> is reached are a control-flow issue and tree-data-ref.c doesn't
>> have any special information about it.
>>
>> Masked loads and stores are special because the DR_REFs created by
>> tree-data-ref.c are artificial: they didn't exist as MEM_REFs in the
>> original DR_STMT.  And AIUI they didn't exist as MEM_REFs precisely
>> because they're not guaranteed to happen, even if the load or store
>> statement itself is executed.  So in this case the DR_IS_CONDITIONAL
>> is reflecting something that tree-data-ref.c itself has done.
>>
>> How about calling it DR_IS_CONDITIONAL_IN_STMT to avoid the
>> general-sounding name?
>
> That sounds better and avoids the ambiguity.  OK.

>>>>> The vectorizer of course only sees unconditionally executed stmts.
>>>>>
>>>>> So - I'd simply not add this DR_IS_CONDITIONAL.  Did you run into
>>>>> any real-world (testsuite) issues without this?
>>>>
>>>> Dropping DR_IS_CONDITIONAL would cause us to make invalid alignment
>>>> assumptions in silly corner cases.  I could add a scan test for it,
>>>> for targets with masked loads and stores.  It wouldn't trigger
>>>> an execution failure though because we assume that targets with
>>>> masked loads and stores allow unaligned accesses:
>>>>
>>>>   /* For now assume all conditional loads/stores support unaligned
>>>>      access without any special code.  */
>>>>   if (is_gimple_call (stmt)
>>>>       && gimple_call_internal_p (stmt)
>>>>       && (gimple_call_internal_fn (stmt) == IFN_MASK_LOAD
>>>>           || gimple_call_internal_fn (stmt) == IFN_MASK_STORE))
>>>>     return dr_unaligned_supported;
>>>>
>>>> So the worst that would happen is that we'd supposedly peel for
>>>> alignment, but actually misalign everything instead, and so make
>>>> things slower rather than quicker.
>>>>
>>>>> Note that the assert is to prevent bogus information.  Iff we aligned
>>>>> DR with base alignment 8 and misalign 3 then if another same-align
>>>>> DR has base alignment 16 we can't simply zero its DR_MISALIGNMENT
>>>>> as it still can be 8 after aligning DR.
>>>>>
>>>>> So I think it's wrong to put DRs with differing base-alignment into
>>>>> the same-align-refs chain, those should get their DR_MISALIGNMENT
>>>>> updated independently after peeling.
>>>>
>>>> DR_MISALIGNMENT is relative to the vector alignment rather than
>>>> the base alignment though.  So:
>>>
>>> We seem to use it that way, yes (looking at set_ptr_info_alignment
>>> uses).  So why not fix the assert then by capping the alignment/misalignment
>>> we compute at this value as well?  (and document this in the header
>>> around DR_MISALIGNMENT)
>>>
>>> Ideally we'd do alignment analysis independent of the vector size
>>> though (for those stupid targets with multiple vector sizes to consider...).
>>>
>>>> a) when looking for references *A1 and *A2 with the same alignment,
>>>>    we simply have to prove that A1 % vecalign == A2 % vecalign.
>>>>    This doesn't require any knowledge about the base alignment.
>>>>    If we break the addresses down as:
>>>>
>>>>      A1 = BASE1 + REST1,  REST1 = INIT1 + OFFSET1 + X * STEP1
>>>>      A2 = BASE2 + REST2,  REST2 = INIT2 + OFFSET2 + X * STEP2
>>>>
>>>>    and can prove that BASE1 == BASE2, the alignment of that base
>>>>    isn't important.  We simply need to prove that REST1 % vecalign
>>>>    == REST2 % vecalign for all X.
>>>>
>>>> b) In the assert, we've peeled the loop so that DR_PEEL is guaranteed
>>>>    to be vector-aligned.  If DR_PEEL is A1 in the example above, we have
>>>>    A1 % vecalign == 0, so A2 % vecalign must be 0 too.  This again doesn't
>>>>    rely on the base alignment being known.
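
For illustration only, here is the same argument with made-up numbers,
taking vecalign == 32, equal steps, and BASE1 == BASE2 == B with unknown
alignment:

  REST1 = 4 + 32 * X     so  A1 = B + 4 + 32 * X
  REST2 = 68 + 32 * X    so  A2 = B + 68 + 32 * X

REST1 % 32 == REST2 % 32 == 4 for every X, so A1 % 32 == A2 % 32 whatever
the alignment of B is.  If peeling makes A1 % 32 == 0, then A2 % 32 == 0
follows as well; knowing B's alignment is only needed to turn the number
of peeled iterations into a compile-time constant, as described below.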
>>>>
>>>> What a high base alignment for DR_PEEL gives us is the ability to know
>>>> at compile time how many iterations need to be peeled to make DR_PEEL aligned.
>>>> But the points above apply regardless of whether we know that value at
>>>> compile time or not.
>>>>
>>>> In examples like the test case, we would have known at compile time that
>>>> VF-1 iterations would need to be peeled if we'd picked the store as the
>>>> DR_PEEL, but would have treated the number of peels as variable if we'd
>>>> picked the load.  The value calculated at runtime would still have been
>>>> VF-1, it's just that the code wouldn't have been as efficient.
>>>>
>>>> One of the benefits of pooling the alignments for unconditional references
>>>> is that it no longer matters which DR we pick: the number of peels will
>>>> be a compile-time constant both ways.
>>>>
>>>> Thanks,
>>>> Richard
>>>>
>>>>> I'd rather not mix fixing this with the improvement to eventually use a
>>>>> larger align for the other DR if possible.
>>>
>>> ^^^
>>> So can you fix the ICE with capping base alignment / DR_MISALIGNMENT?
>>
>> I don't think the problem is the lack of a cap.  In the test case we
>> see that:
>>
>> 1. B is known at compile time to be X * vecsize + Y when considered in
>>    isolation, because the base alignment derived from its DR_REF >= vecsize.
>>    So DR_MISALIGNMENT (B) == Y.
>>
>> 2. A's misalignment wrt vecsize is not known at compile time when
>>    considered in isolation, because no useful base alignment can be
>>    derived from its DR_REF.  (The DR_REF is to a plain int rather than
>>    to a structure with a high alignment.)  So DR_MISALIGNMENT (A) == -1.
>>
>> 3. A and B when considered as a pair trivially have the same misalignment
>>    wrt vecsize, for the reasons above.
>>
>> Each of these results is individually correct.  The problem is that the
>> assert is conflating two things: it's saying that if we know two datarefs
>> have the same misalignment, we must either be able to calculate a
>> compile-time misalignment for both datarefs in isolation, or we must
>> fail to calculate a compile-time misalignment for both datarefs in
>> isolation.  That isn't true: it's valid to have situations in which the
>> compile-time misalignment is known for one dataref in isolation but not
>> for the other.
>
> True.  So the assert should then become
>
>   gcc_assert (! known_alignment_for_access_p (dr)
>               || DR_MISALIGNMENT (dr) / dr_size ==
>                  DR_MISALIGNMENT (dr_peel) / dr_peel_size);
>
> ?

I think it would need to be:

  gcc_assert (!known_alignment_for_access_p (dr)
              || !known_alignment_for_access_p (dr_peel)
              || (DR_MISALIGNMENT (dr) / dr_size
                  == DR_MISALIGNMENT (dr_peel) / dr_peel_size));

But yeah, that would work too.  The idea with the assert in the patch
was that for unconditional references we probably *do* want to try to
compute the same compile-time misalignment, but for optimisation reasons
rather than correctness.  Maybe that's more properly a gcc_checking_assert
though, since nothing goes wrong if it fails.  So perhaps:

  gcc_checking_assert (DR_IS_CONDITIONAL_IN_STMT (dr)
                       || DR_IS_CONDITIONAL_IN_STMT (dr_peel)
                       || (known_alignment_for_access_p (dr)
                           == known_alignment_for_access_p (dr_peel)));

as a follow-on assert.  Should I split it into two patches, one to change
the gcc_assert and another to add the optimisation?

Thanks,
Richard