From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 66689 invoked by alias); 19 Sep 2016 08:18:50 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Received: (qmail 66662 invoked by uid 89); 19 Sep 2016 08:18:49 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-2.1 required=5.0 tests=AWL,BAYES_00,FREEMAIL_FROM,RCVD_IN_DNSWL_LOW,RCVD_IN_SORBS_SPAM,SPF_PASS,URIBL_RED autolearn=ham version=3.3.2 spammy=you! X-Spam-User: qpsmtpd, 2 recipients X-HELO: mail-lf0-f54.google.com Received: from mail-lf0-f54.google.com (HELO mail-lf0-f54.google.com) (209.85.215.54) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Mon, 19 Sep 2016 08:18:38 +0000 Received: by mail-lf0-f54.google.com with SMTP id l131so99853495lfl.2; Mon, 19 Sep 2016 01:18:38 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:mime-version:in-reply-to:references:from:date :message-id:subject:to:cc:content-transfer-encoding; bh=ptKzVGVGFD+67jKxUIX+6ATxO1fPSdpB2kRvmcoHzWk=; b=M1+C/YB31OJ6tj3s+KgK3bEub+q3ggLPAIsj7nZxImssKuQJ+xxzuRGG2C4yZkHHpH BIfWX8dMJFO6dXPNimUVMd/Ruj1E7ZLK0RJF+loTN4euNQYyhR0fyaS8Y5WwMZAIEe3U dtxw9HjNDTBZ9/eINNy7Qy22H/OL8NSPa1kCeS7VJJR3p5qB9A4FIzy3KWVomrzGTMgH uQ93qkDAZ/ufe0/BQblGD0YkEZZDxZY/yXiPF7i84OSShTCpu4HZm0HKP1SoQiVWL4/K NvMUi3LKeWHuJdsApqSJh2uyia2x6DL7hyUUTvYwWmsHUWfer4bm2doTuYp5THIE0KCG KQOg== X-Gm-Message-State: AE9vXwMSTkV9Gnj3dEK15Qc1o+1uP41lRy53RifUwg7/GSRZKnAb33ahVfTXEsxvYrc/VaUxpZ04oSin4Tde8Q== X-Received: by 10.194.0.116 with SMTP id 20mr22508149wjd.123.1474273116155; Mon, 19 Sep 2016 01:18:36 -0700 (PDT) MIME-Version: 1.0 Received: by 10.28.137.129 with HTTP; Mon, 19 Sep 2016 01:18:35 -0700 (PDT) In-Reply-To: <87poo4as2j.fsf@hertz.schwinge.homeip.net> References: <20160817154244.GA39270@arm.com> <87h99s53ls.fsf@kepler.schwinge.homeip.net> <87d1kefwgt.fsf@hertz.schwinge.homeip.net> <87twdgb9zi.fsf@hertz.schwinge.homeip.net> <87poo4as2j.fsf@hertz.schwinge.homeip.net> From: Richard Biener Date: Mon, 19 Sep 2016 08:53:00 -0000 Message-ID: Subject: Re: [PR lto/77458] Avoid ICE in offloading with differing _FloatN, _FloatNx types (was: Advice sought for debugging a lto1 ICE (was: Implement C _FloatN, _FloatNx types [version 6])) To: Thomas Schwinge Cc: GCC Patches , Joseph Myers , GCC Development , Jakub Jelinek , Bernd Schmidt Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable X-IsSubscribed: yes X-SW-Source: 2016-09/txt/msg01147.txt.bz2 On Fri, Sep 16, 2016 at 3:32 PM, Thomas Schwinge wrote: > Hi! > > On Fri, 16 Sep 2016 10:59:16 +0200, Richard Biener wrote: >> On Fri, Sep 16, 2016 at 9:05 AM, Thomas Schwinge >> wrote: >> > On Thu, 08 Sep 2016 13:43:30 +0200, I wrote: >> >> On Wed, 7 Sep 2016 14:23:18 +0200, Richard Biener wrote: >> >> > On Wed, Sep 7, 2016 at 1:52 PM, Thomas Schwinge wrote: >> >> > > As I noted in : >> >> > > >> >> > > As of the PR32187 commit r239625 "Implement C _FloatN, _Float= Nx types", nvptx >> >> > > offloading is broken, ICEs in LTO stream-in. Probably some k= ind of data-type >> >> > > mismatch that is not visible with Intel MIC offloading (using= the same data >> >> > > types) but explodes with nvptx. I'm having a look. >> > >> >> > [...] preload_common_nodes. This is carefully crafted to _not_ div= erge by >> >> > frontend (!) it wasn't even designed to cope with global trees bein= g present >> >> > or not dependent on target (well, because the target is always the >> >> > same! mind you!) >> >> >> >> Scary. ;-/ >> >> >> >> > Now -- in theory it should deal with NULLs just fine (resulting in >> >> > error_mark_node), but it can diverge when there are additional >> >> > compount types (like vectors, complex >> >> > or array or record types) whose element types are not in the set of >> >> > global trees. >> >> > The complex _FloatN types would be such a case given they appear be= fore their >> >> > components. That mixes up the ordering at least. >> >> >> >> ACK, but that's also an issue for "regular" float/complex float, which >> >> also is in "reverse" order. But that's "fixed" by the recursion in >> >> gcc/tree-streamer.c:record_common_node for "TREE_CODE (node) =3D=3D >> >> COMPLEX_TYPE". This likewise seems to work for the _FloatN types. >> > >> > So, that mechanism does work, but what's going wrong is the following: >> > with differing target vs. offload target, we potentially (and actually >> > do, in the case of x86_64 vs. nvptx) have different sets of _FloatN and >> > _FloatNx types. That is, for nvptx, a few of these don't exist (so, >> > NULL_TREE), and so it follows their complex variants also don't exist, >> > and thus the recursion that I just mentioned for complex types is no >> > longer done in lockstep in the x86_64 cc1 vs. the nvptx lto1, hence we >> > get an offset (of two, in this specific case), and consequently stream= ing >> > explodes, for example, as soon as it hits a forward-reference (due to >> > looking for tree 185 (x86_64 cc1 view; as encoded in the stream) when = it >> > should be looking for tree 183 (nvptx lto1 view). >> > >> >> (I've >> >> put "fixed" in quotes -- doesn't that recursion mean that we're thus >> >> putting "complex float", "float", [...], "float" (again) into the cac= he? >> >> Anyway that's for later...) >> > >> > Maybe it would make sense to do this tree streaming in two passes: fir= st >> > build a set of what we actually need, and then stream that, without >> > duplicates. (Or, is also for these "pickled" trees their order releva= nt, >> > so that one tree may only refer to other earlier but not later ones? >> > Anyway, we could still remember the set of trees already streamed, and >> > avoid the double streaming I described?) >> > >> > So I now understand that due to the stream format, the integer tree IDs >> > (cache->next_id) have to match in all cc1/lto1s (etc.), so we'll have = to >> > make that work for x86_64 target vs. nvptx offload target being >> > different. (I'm pondering some ideas about how to rework that integer >> > tree ID generation.) >> > >> > (I have not digested completely yet what the implications are for the >> > skipping we're doing for some trees in preload_common_nodes), but here= is >> > a first patch that at least gets us rid of the immediate problem. (I'= ll >> > fix the TODO by adding a "#define TI_COMPLEX_FLOATN_NX_TYPE_LAST" to >> > gcc/tree-core.h, OK?) Will such a patch be OK for trunk, at least for >> > now? >> >> Humm ... do we anywhere compare to those global trees by pointer equival= ence? > > I have not verified that. Does GCC permit/forbid that on a case-by-case > basis -- which seems very fragile to me? > >> If so then it breaks LTO support for those types. > > OK, I think I understand that -- but I do have a "lto_stream_offload_p" > conditional in my code changes, so these changes should only affect the > offloading stream, in my understanding? > >> I think forcing proper ordering so that we can assert that at the >> point we'd like >> to recurse the nodes we recurse for are already in the cache would be a = better >> fix. > > ACK. That's what I'd planned to look into as a next step. > >> This might need some additional global trees in case components do not >> explicitely exist > > That's what I was afraid of: for example, I can't tell if it holds for > all GCC configurations (back ends), that complex types' component types > will always match one of the already existing global trees? (I can > certainly imagine some "strange" ISAs/data types breaking this assertion > -- no idea whether GCC is to support such things.) The conservative fix > for that appears to be to add a new global tree for every complex types' > component type -- which in "sane" configurations/ISAs would always match > one of the already existing ones... > > Or, of course, we need to change the tree cache ID format, so that > they're not allocated using a monotonically increasing step-one counter > -- explicitly allowing the cc1/lto1s (etc.) to differ? Conceptually a > bit like (but differently) that we allow the mode table to differ, > streamer_mode_table. > >> and it might need re-ordering of a few globals. >> Can you try if all hell breaks lose if you change the recursions to inst= ead do >> >> gcc_assert (streamer_tree_cache_lookup (cache, , = &tem)); >> >> ? > > Yes, ;-) because in certain cases when we reach this code, > cache->node_map is NULL, and so streamer_tree_cache_lookup can't be > called. Anyway, something like the following? (Likely, something needs > to be done to handle more effectively the "!cache->node_map" case, > instead of the linear search.) But such a patch would again solve the > problem only for complex types, so far, and only if that complex > type/component type concern raised above is actually moot. Anyway, with > the following "step II" patch, offloading appears to work (but I have not > tested the _FloatN, _FloatNx types), and no breakage in lto.exp either, > but only lightly tested so far. > > commit cad8fb0754797927ecdd32210d5f981872635b1a > Author: Thomas Schwinge > Date: Fri Sep 16 14:58:06 2016 +0200 > > [PR lto/77458] Avoid ICE in offloading with differing _FloatN, _Float= Nx types, step II > --- > gcc/tree-core.h | 31 +++++++++++++++++-------------- > gcc/tree-streamer.c | 23 ++++++++++++++++------- > 2 files changed, 33 insertions(+), 21 deletions(-) > > diff --git gcc/tree-core.h gcc/tree-core.h > index 8b3e5cc..0d4653a 100644 > --- gcc/tree-core.h > +++ gcc/tree-core.h > @@ -553,20 +553,6 @@ enum tree_index { > TI_BOOLEAN_FALSE, > TI_BOOLEAN_TRUE, > > - TI_COMPLEX_INTEGER_TYPE, > - TI_COMPLEX_FLOAT_TYPE, > - TI_COMPLEX_DOUBLE_TYPE, > - TI_COMPLEX_LONG_DOUBLE_TYPE, > - > - TI_COMPLEX_FLOAT16_TYPE, > - TI_COMPLEX_FLOATN_NX_TYPE_FIRST =3D TI_COMPLEX_FLOAT16_TYPE, > - TI_COMPLEX_FLOAT32_TYPE, > - TI_COMPLEX_FLOAT64_TYPE, > - TI_COMPLEX_FLOAT128_TYPE, > - TI_COMPLEX_FLOAT32X_TYPE, > - TI_COMPLEX_FLOAT64X_TYPE, > - TI_COMPLEX_FLOAT128X_TYPE, > - > TI_FLOAT_TYPE, > TI_DOUBLE_TYPE, > TI_LONG_DOUBLE_TYPE, > @@ -596,6 +582,23 @@ enum tree_index { > - TI_FLOATN_NX_TYPE_FIRST \ > + 1) > > + /* Put the complex types after their component types, so that in (sequ= ential) > + tree streaming we can assert that their component types have alread= y been > + handled (see tree-streamer.c:record_common_node). */ > + TI_COMPLEX_INTEGER_TYPE, > + TI_COMPLEX_FLOAT_TYPE, > + TI_COMPLEX_DOUBLE_TYPE, > + TI_COMPLEX_LONG_DOUBLE_TYPE, > + > + TI_COMPLEX_FLOAT16_TYPE, > + TI_COMPLEX_FLOATN_NX_TYPE_FIRST =3D TI_COMPLEX_FLOAT16_TYPE, > + TI_COMPLEX_FLOAT32_TYPE, > + TI_COMPLEX_FLOAT64_TYPE, > + TI_COMPLEX_FLOAT128_TYPE, > + TI_COMPLEX_FLOAT32X_TYPE, > + TI_COMPLEX_FLOAT64X_TYPE, > + TI_COMPLEX_FLOAT128X_TYPE, > + > TI_FLOAT_PTR_TYPE, > TI_DOUBLE_PTR_TYPE, > TI_LONG_DOUBLE_PTR_TYPE, If the above change alone fixes your issues then it is fine to commit. > diff --git gcc/tree-streamer.c gcc/tree-streamer.c > index 061d831..2a5538e 100644 > --- gcc/tree-streamer.c > +++ gcc/tree-streamer.c > @@ -278,9 +278,23 @@ record_common_node (struct streamer_tree_cache_d *ca= che, tree node) > streamer_tree_cache_append (cache, node, cache->nodes.length ()); > > if (POINTER_TYPE_P (node) > - || TREE_CODE (node) =3D=3D COMPLEX_TYPE > || TREE_CODE (node) =3D=3D ARRAY_TYPE) > record_common_node (cache, TREE_TYPE (node)); > + else if (TREE_CODE (node) =3D=3D COMPLEX_TYPE) > + { > + /* Assert that complex types' component types have already been ha= ndled > + (and we thus don't need to recurse here). See PR lto/77458. */ > + if (cache->node_map) > + gcc_assert (streamer_tree_cache_lookup (cache, TREE_TYPE (node), = NULL)); > + else > + { > + gcc_assert (cache->nodes.exists ()); > + bool found =3D false; > + for (unsigned i =3D 0; !found && i < cache->nodes.length (); ++= i) > + found =3D true; hmm, this doesn't actually test anything? ;) > + gcc_assert (found); > + } > + } > else if (TREE_CODE (node) =3D=3D RECORD_TYPE) > { > /* The FIELD_DECLs of structures should be shared, so that every > @@ -333,12 +347,7 @@ preload_common_nodes (struct streamer_tree_cache_d *= cache) > && (!lto_stream_offload_p > || (i !=3D TI_VA_LIST_TYPE > && i !=3D TI_VA_LIST_GPR_COUNTER_FIELD > - && i !=3D TI_VA_LIST_FPR_COUNTER_FIELD > - /* Likewise for the _FloatN and _FloatNx types. */ > - && (i < TI_FLOATN_NX_TYPE_FIRST > - || i > TI_FLOATN_NX_TYPE_LAST) > - && (i < TI_COMPLEX_FLOATN_NX_TYPE_FIRST > - || i > /* TODO */ TI_COMPLEX_FLOAT128X_TYPE)))) > + && i !=3D TI_VA_LIST_FPR_COUNTER_FIELD))) > record_common_node (cache, global_trees[i]); > } So I very much like to go forward with this kind of change as well (the assert code should go to a separate helper function). Did you try it on more than just the complex type case? Thanks, Richard. > > Here is again the proposed temporary band-aid patch, which at least > unbreaks trunk for different-architecture offloading (without claiming > support for using the _FloatN, _FloatNx types across the offloading > boundary, admittedly): > >> > commit d2045bd028104be2ede5ae1d5d7b9395e67a8180 >> > Author: Thomas Schwinge >> > Date: Thu Sep 15 21:56:16 2016 +0200 >> > >> > [PR lto/77458] Avoid ICE in offloading with differing _FloatN, _Fl= oatNx types >> > >> > gcc/ >> > PR lto/77458 >> > * tree-streamer.c (preload_common_nodes): Skip _FloatN and >> > _FloatNx types if offloading. >> > --- >> > gcc/tree-streamer.c | 7 ++++++- >> > 1 file changed, 6 insertions(+), 1 deletion(-) >> > >> > diff --git gcc/tree-streamer.c gcc/tree-streamer.c >> > index 7ea7096..061d831 100644 >> > --- gcc/tree-streamer.c >> > +++ gcc/tree-streamer.c >> > @@ -333,7 +333,12 @@ preload_common_nodes (struct streamer_tree_cache_= d *cache) >> > && (!lto_stream_offload_p >> > || (i !=3D TI_VA_LIST_TYPE >> > && i !=3D TI_VA_LIST_GPR_COUNTER_FIELD >> > - && i !=3D TI_VA_LIST_FPR_COUNTER_FIELD))) >> > + && i !=3D TI_VA_LIST_FPR_COUNTER_FIELD >> > + /* Likewise for the _FloatN and _FloatNx types. */ >> > + && (i < TI_FLOATN_NX_TYPE_FIRST >> > + || i > TI_FLOATN_NX_TYPE_LAST) >> > + && (i < TI_COMPLEX_FLOATN_NX_TYPE_FIRST >> > + || i > /* TODO */ TI_COMPLEX_FLOAT128X_TYPE)))) >> > record_common_node (cache, global_trees[i]); >> > } > > > Gr=C3=BC=C3=9Fe > Thomas