From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 23066 invoked by alias); 13 Nov 2017 18:47:57 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Received: (qmail 22505 invoked by uid 89); 13 Nov 2017 18:47:56 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-11.9 required=5.0 tests=BAYES_00,GIT_PATCH_2,GIT_PATCH_3,RP_MATCHES_RCVD,SPF_HELO_PASS autolearn=ham version=3.3.2 spammy=Min, wrapped, 2.69, slowdown X-HELO: mx1.redhat.com Received: from mx1.redhat.com (HELO mx1.redhat.com) (209.132.183.28) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Mon, 13 Nov 2017 18:47:53 +0000 Received: from smtp.corp.redhat.com (int-mx03.intmail.prod.int.phx2.redhat.com [10.5.11.13]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 4B32176534; Mon, 13 Nov 2017 18:47:52 +0000 (UTC) Received: from ovpn-116-218.phx2.redhat.com (ovpn-116-218.phx2.redhat.com [10.3.116.218]) by smtp.corp.redhat.com (Postfix) with ESMTP id 88AE298271; Mon, 13 Nov 2017 18:47:51 +0000 (UTC) Message-ID: <1510598870.15212.31.camel@redhat.com> Subject: Re: [PATCH v2: 00/14] Preserving locations for variable-uses and constants (PR 43486) From: David Malcolm To: Jason Merrill Cc: Nathan Sidwell , Jakub Jelinek , Richard Biener , gcc-patches List Date: Mon, 13 Nov 2017 19:20:00 -0000 In-Reply-To: <1510350329-48956-1-git-send-email-dmalcolm@redhat.com> References: <1510350329-48956-1-git-send-email-dmalcolm@redhat.com> Content-Type: text/plain; charset="UTF-8" Mime-Version: 1.0 Content-Transfer-Encoding: 8bit X-IsSubscribed: yes X-SW-Source: 2017-11/txt/msg01013.txt.bz2 On Fri, 2017-11-10 at 16:45 -0500, David Malcolm wrote: > On Thu, 2017-11-02 at 10:46 -0400, Jason Merrill wrote: > > On Tue, Oct 31, 2017 at 5:09 PM, David Malcolm > > > > wrote: > > > On Tue, 2017-10-24 at 09:53 -0400, Jason Merrill wrote: > > > > On Fri, Oct 20, 2017 at 5:53 PM, David Malcolm > > > .c > > > > om> > > > > wrote: > > > > > Design questions: > > > > > > > > > > * The patch introduces a new kind of tree node, currently > > > > > called > > > > > DECL_WRAPPER_EXPR (although it's used for wrapping > > > > > constants > > > > > as > > > > > well > > > > > as decls). Should wrappers be a new kind of tree node, or > > > > > should > > > > > they > > > > > reuse an existing TREE_CODE? (e.g. NOP_EXPR, CONVERT_EXPR, > > > > > etc). > > > > > * NOP_EXPR: seems to be for use as an rvalue > > > > > * CONVERT_EXPR: for type conversions > > > > > * NON_LVALUE_EXPR: "Value is same as argument, but > > > > > guaranteed > > > > > not an > > > > > lvalue" > > > > > * but we *do* want to support lvalues here > > > > > > > > I think using NON_LVALUE_EXPR for constants would be > > > > appropriate. > > > > > > > > > * VIEW_CONVERT_EXPR: viewing one thing as of a different > > > > > type > > > > > * can it support lvalues? > > > > > > > > Yes, the purpose of VIEW_CONVERT_EXPR is to support lvalues, it > > > > seems > > > > like the right choice. > > > > > > > > Jason > > > > > > Thanks. I've been working on a new version of the patch using > > > those > > > tree codes, but have run into an issue. > > > > > > In g++.dg/conversion/reinterpret1.C: > > > > > > // PR c++/15076 > > > > > > struct Y { Y(int &); }; > > > > > > int v; > > > Y y1(reinterpret_cast(v)); // { dg-error "" } > > > > > > With trunk, this successfully generates an error: > > > > > > reinterpret1.C:6:6: error: cannot bind non-const lvalue > > > reference > > > of type ‘int&’ to an rvalue of type ‘int’ > > > Y y1(reinterpret_cast(v)); // { dg-error "" } > > > ^~~~~~~~~~~~~~~~~~~~~~~~ > > > reinterpret1.C:3:12: note: initializing argument 1 of > > > ‘Y::Y(int&)’ > > > struct Y { Y(int &); }; > > > ^ > > > > > > where internally there's a NON_LVALUE_EXPR around a VAR_DECL, > > > where > > > both have the same type: > > > > > > (gdb) call debug_tree (expr) > > > > > type > > size > > > unit-size > > > align:32 warn_if_not_align:0 symtab:0 alias-set -1 > > > canonical-type 0x7ffff132e5e8 precision:32 min > > 0x7ffff13310d8 -2147483648> max > > 2147483647> > > > pointer_to_this > > > reference_to_this > > > > > > > arg:0 > > 0x7ffff132e5e8 int> > > > used public static tree_1 read SI /home/david/coding- > > > 3/gcc- > > > git-expr-vs- > > > decl/src/gcc/testsuite/g++.dg/conversion/reinterpret1.C:5:5 size > > > unit-size > > 0x7ffff1331138 4> > > > align:32 warn_if_not_align:0 context > > > > > 0x7ffff131e168 /home/david/coding-3/gcc-git-expr-vs- > > > decl/src/gcc/testsuite/g++.dg/conversion/reinterpret1.C> > > > chain > > 0x7ffff144c150 Y> > > > public decl_2 VOID /home/david/coding-3/gcc-git-expr- > > > vs-decl/src/gcc/testsuite/g++.dg/conversion/reinterpret1.C:3:8 > > > align:8 warn_if_not_align:0 context > > > > > git- > > > expr-vs-decl/src/gcc/testsuite/g++.dg/conversion/reinterpret1.C> > > > chain >> > > > /home/david/coding-3/gcc-git-expr-vs- > > > decl/src/gcc/testsuite/g++.dg/conversion/reinterpret1.C:6:6 > > > start: > > > /home/david/coding-3/gcc-git-expr-vs- > > > decl/src/gcc/testsuite/g++.dg/conversion/reinterpret1.C:6:6 > > > finish: > > > /home/david/coding-3/gcc-git-expr-vs- > > > decl/src/gcc/testsuite/g++.dg/conversion/reinterpret1.C:6:29> > > > > > > The problem is that this reinterpret cast "looks" just like one > > > of > > > my > > > location wrappers. > > > > Your code shouldn't strip a NON_LVALUE_EXPR around a VAR_DECL. > > > I see a similar issue with constants, where with: > > > > > > struct Y { Y(int &); }; > > > Y y1(reinterpret_cast(42)); > > > > > > trunk generates an error like the above, but my code handles the > > > NON_LVALUE_EXPR(INTEGER_CST(42)) > > > as if it were a location wrapper around the INTEGER_CST, and thus > > > doesn't emit the error. > > > > Why doesn't it emit the error? We should get the same error > > whether > > or not we strip the wrapper. > > Thanks: my stripping macro was over-zealous: it was stripping any > NON_LVALUE_EXPR or VIEW_CONVERT_EXPR where the type matched that of > the wrapped node. I've added the additional condition that a > NON_LVALUE_EXPR has to be around a CONSTANT_CLASS_P, and > a VIEW_CONVERT_EXPR around a !CONSTANT_CLASS_P. > > Here's an updated version of the patch (v2), now a patch kit (on top > of r254387). I split it up thematically for ease of review, but all > the patches go together. > > This version of the patch kit bootstraps and passes the regression > tests (on x86_64-pc-linux-gnu) > > To do so, I've made some simplfications to how wrappers nodes are > added. > > The previous patch added wrappers in the C++ parser around constants > and uses-of-declarations, along with some other places in the > parser (typeid, alignof, sizeof, offsetof). > > This version takes a much more minimal approach: it only adds > location wrapper nodes around the arguments at callsites, thus > not adding wrapper nodes around uses of constants and decls in other > locations. > > It keeps them for the other places in the parser (typeid, alignof, > sizeof, offsetof). > > In addition, for now, each site that adds wrapper nodes is guarded > with !processing_template_decl, suppressing the creation of wrapper > nodes when processing template declarations. This is to simplify > the patch kit so that we don't have to support wrapper nodes during > template expansion. > > With this, we get a big usability win: we always have a location_t > for every argument at a callsite, and so various errors involving > mismatching arguments are much easier to read (as the pertinent > argument is underlined). > > I marked this as "PR 43486" as it's a big step towards solving that > PR (which seeks to preserve locations all the way through to the > middle end), but much more would need to be done to solve it: > > * this patch kit only adds wrappers to the C++ frontend, not to C > > * as noted above, location_t wrapper nodes are only added at > arguments of callsites (and a few other places), with some > restrictions. > > * none of the wrapper nodes survive past gimplification; > it's not clear to me how best to preserve them into the > middle-end. But even without doing so, we get the big usability > win. > > The later parts of the patch kit add STRIP_ANY_LOCATION_WRAPPER uses > in various places where the tree code of an expression is examined, > so that such conditions use the code of the wrapped node, rather > than that of the wrapper. One of the risks of the patch kit is that > although the testsuite passes, there are probably places in our code > which still need uses of STRIP_ANY_LOCATION_WRAPPER. > > > Performance of the patch kit > **************************** > > Benchmarking shows an apparent 2-3% in cc1plus wallclock compile-time > for kdecore.cc -O3 -g: > > Compilation of kdecore.cc at -O3 with -g for x86_64-pc-linux-gnu: > wall > control: [56.55, 56.54, 56.68, 56.51, 56.45, 56.5, 56.45, > 56.46, 56.49, 56.5, 56.42, 56.37, 56.41, 56.55] > experiment: [57.32, 58.37, 58.17, 58.18, 58.78, 58.48, 57.99, > 58.16, 58.14, 57.62, 58.36, 58.1, 57.71, 57.7] > Min: 56.370000 -> 57.320000: 1.02x slower > Avg: 56.491429 -> 58.077143: 1.03x slower > Significant (t=-15.14) > Stddev: 0.07655 -> 0.38426: 5.0199x larger > > although I'm not quite sure why; the difference is in "phase setup", > which > gains a large "wall" value (but not in usr or sys), whereas "phase > parsing" > is unaffected: > > unpatched: > > phase setup : 0.00 ( 0%) usr 0.02 ( 0%) sys 0.02 ( > 0%) wall 1488 kB ( 0%) ggc > phase parsing : 1.83 ( 4%) usr 0.82 (16%) sys 2.66 ( > 5%) wall 156603 kB (12%) ggc > phase lang. deferred : 0.30 ( 1%) usr 0.09 ( 2%) sys 0.38 ( > 1%) wall 29861 kB ( 2%) ggc > phase opt and generate : 48.09 (94%) usr 4.28 (81%) sys 52.55 > (93%) wall 1085420 kB (84%) ggc > phase last asm : 0.86 ( 2%) usr 0.05 ( 1%) sys 0.92 ( > 2%) wall 15806 kB ( 1%) ggc > phase finalize : 0.00 ( 0%) usr 0.01 ( 0%) sys 0.01 ( > 0%) wall 0 kB ( 0%) ggc > |name lookup : 0.30 ( 1%) usr 0.13 ( 2%) sys 0.21 ( > 0%) wall 4955 kB ( 0%) ggc > |overload resolution : 0.49 ( 1%) usr 0.06 ( 1%) sys 0.68 ( > 1%) wall 27263 kB ( 2%) ggc > garbage collection : 0.77 ( 2%) usr 0.00 ( 0%) sys 0.77 ( > 1%) wall 0 kB ( 0%) ggc > [...] > TOTAL : 51.08 5.27 56.54 > 1289189 kB > > patched: > phase setup : 0.01 ( 0%) usr 0.03 ( 1%) sys 1.91 ( > 3%) wall 1488 kB ( 0%) ggc > phase parsing : 1.80 ( 4%) usr 0.84 (16%) sys 2.66 ( > 5%) wall 158066 kB (12%) ggc > phase lang. deferred : 0.29 ( 1%) usr 0.09 ( 2%) sys 0.39 ( > 1%) wall 29861 kB ( 2%) ggc > phase opt and generate : 48.09 (94%) usr 4.27 (81%) sys 52.52 > (90%) wall 1085428 kB (84%) ggc > phase last asm : 0.82 ( 2%) usr 0.05 ( 1%) sys 0.89 ( > 2%) wall 15806 kB ( 1%) ggc > phase finalize : 0.00 ( 0%) usr 0.01 ( 0%) sys 0.00 ( > 0%) wall 0 kB ( 0%) ggc > |name lookup : 0.29 ( 1%) usr 0.07 ( 1%) sys 0.36 ( > 1%) wall 4955 kB ( 0%) ggc > |overload resolution : 0.51 ( 1%) usr 0.07 ( 1%) sys 0.50 ( > 1%) wall 27296 kB ( 2%) ggc > garbage collection : 0.79 ( 2%) usr 0.00 ( 0%) sys 0.78 ( > 1%) wall 0 kB ( 0%) ggc > TOTAL : 51.01 5.29 58.37 > 1290660 kB > > What's up with that "phase setup" change? Did I mess up my testing > somehow? I had used taskset to pin everything onto one processor. On using "echo 3 > /proc/sys/vm/drop_caches" and rerunning, after a warmup, the slowdown moved to the unpatched binary, with the patched binary becoming faster. My theory is that this is NUMA-related, and that the buffer cache's copy of the binaries can end up on the "wrong" NUMA node. Depending on which I run first, I can make the patched or unpatched binary faster (based on this slowdown of "wall" seen in "phase setup"). Speed differences in the "parsing" phase are lost in the noise e.g.: Compilation of cp-stdlib.cc at -O3 with -g for x86_64-pc-linux-gnu: phase parsing: usr control: [2.2, 2.25, 2.27, 2.25, 2.23, 2.22, 2.37, 2.23, 2.22, 2.23, 2.25, 2.21, 2.21, 2.26, 2.23, 2.26, 2.26, 2.23, 2.22, 2.25] experiment: [2.22, 2.26, 2.26, 2.2, 2.25, 2.2, 2.21, 2.17, 2.25, 2.23, 2.21, 2.21, 2.2, 2.2, 2.25, 2.28, 2.19, 2.2, 2.24, 2.24] Min: 2.200000 -> 2.170000: 1.01x faster Avg: 2.242500 -> 2.223500: 1.01x faster Stddev: 0.03582 -> 0.02889: 1.2399x smaller Timeline: http://chart.apis.google.com/chart?cht=lc&chs=700x400&chxt=x,y,x,y&chxr=1,1.17,3.37&chco=FF0000,0000FF&chdl=control|experiment&chds=1.17,3.37&chd=t:2.2,2.25,2.27,2.25,2.23,2.22,2.37,2.23,2.22,2.23,2.25,2.21,2.21,2.26,2.23,2.26,2.26,2.23,2.22,2.25|2.22,2.26,2.26,2.2,2.25,2.2,2.21,2.17,2.25,2.23,2.21,2.21,2.2,2.2,2.25,2.28,2.19,2.2,2.24,2.24&chxl=0:|1|2|3|4|5|6|7|8|9|10|11|12|13|14|15|16|17|18|19|20|2:||Iteration|3:||Time+(secs)&chtt=Compilation+of+cp-stdlib.cc+at+-O3+with+-g+for+x86_64-pc-linux-gnu:+phase+parsing:+usr Compilation of kdecore.cc at -O3 with -g for x86_64-pc-linux-gnu: phase parsing: usr control: [2.66, 2.61, 2.65, 2.69, 2.62, 2.67, 2.63, 2.6, 2.68, 2.67, 2.64, 2.67, 2.66, 2.67, 2.65, 2.64, 2.69, 2.68, 2.67, 2.67] experiment: [2.64, 2.66, 2.7, 2.69, 2.71, 2.67, 2.67, 2.73, 2.68, 2.64, 2.66, 2.71, 2.66, 2.68, 2.66, 2.66, 2.66, 2.72, 2.64, 2.68] Min: 2.600000 -> 2.640000: 1.02x slower Avg: 2.656000 -> 2.676000: 1.01x slower Stddev: 0.02563 -> 0.02664: 1.0393x larger Timeline: http://chart.apis.google.com/chart?cht=lc&chs=700x400&chxt=x,y,x,y&chxr=1,1.6,3.73&chco=FF0000,0000FF&chdl=control|experiment&chds=1.6,3.73&chd=t:2.66,2.61,2.65,2.69,2.62,2.67,2.63,2.6,2.68,2.67,2.64,2.67,2.66,2.67,2.65,2.64,2.69,2.68,2.67,2.67|2.64,2.66,2.7,2.69,2.71,2.67,2.67,2.73,2.68,2.64,2.66,2.71,2.66,2.68,2.66,2.66,2.66,2.72,2.64,2.68&chxl=0:|1|2|3|4|5|6|7|8|9|10|11|12|13|14|15|16|17|18|19|20|2:||Iteration|3:||Time+(secs)&chtt=Compilation+of+kdecore.cc+at+-O3+with+-g+for+x86_64-pc-linux-gnu:+phase+parsing:+usr > ...and a slight increase in GGC usage: > > Compilation of kdecore.cc at -O3 with -g for x86_64-pc-linux-gnu: ggc > control: [1289179.0, 1289189.0, 1289170.0, 1289186.0, > 1289194.0, 1289172.0, 1289176.0, 1289192.0, 1289189.0, 1289179.0, > 1289172.0, 1289190.0, 1289169.0, 1289185.0] > experiment: [1290654.0, 1290660.0, 1290655.0, 1290659.0, > 1290631.0, 1290655.0, 1290650.0, 1290652.0, 1290642.0, 1290650.0, > 1290658.0, 1290662.0, 1290638.0, 1290655.0] > Mem max: 1289194.000 -> 1290662.000: 1.0011x larger > > (this is with stripped binaries, and --enable-checking=release) > > Testing with a simple file that includes all of the C++ standard > library > (but doesn't do anything with it) shows no change in time, and GGC > usage > in the parsing phase increasing 50kB from 149954 kB to 150004 kB. > > > Next steps? > *********** > > I'm working on extending the patch kit to add wrapper nodes at all > constants and uses-of-decls in the C++ parser, but of course this > could have some effect on time/memory, and require more uses of > STRIP_ANY_LOCATION_WRAPPER. > > An alternative appoach is: > > "[PATCH] C++: use an optional vec for callsites" > https://gcc.gnu.org/ml/gcc-patches/2017-08/msg01392.html > > which eschews wrapper nodes in favor of duplicating the C frontend's > workaround here (though Jason disliked this approach, when we > discussed > it at Cauldron). > > Thoughts? > > (I'm keen on getting *some* solution for providing location_t values > for arguments at C++ callsites into gcc 8) > > > David Malcolm (14): > C++: preserve locations within build_address > Support for adding and stripping location_t wrapper nodes > C++: add location_t wrapper nodes during parsing (minimal impl) > Update testsuite to show improvements > tree.c: strip location wrappers from integer_zerop etc > Fix Wsizeof-pointer-memaccess*.c > reject_gcc_builtin: strip any location wrappers > cp/tree.c: strip location wrappers in lvalue_kind > Strip location wrappers in null_ptr_cst_p > warn_for_memset: handle location wrappers > Handle location wrappers in string_conv_p > C++: introduce null_node_p > c-format.c: handle location wrappers > pp_c_cast_expression: don't print casts for location wrappers > > gcc/c-family/c-common.c | 3 + > gcc/c-family/c-common.h | 1 + > gcc/c-family/c-format.c | 9 +- > gcc/c-family/c-pretty-print.c | 66 +++++- > gcc/c-family/c-warn.c | 8 + > gcc/cp/call.c | 6 +- > gcc/cp/cp-tree.h | 13 ++ > gcc/cp/cvt.c | 2 +- > gcc/cp/error.c | 2 +- > gcc/cp/except.c | 2 +- > gcc/cp/parser.c | 30 ++- > gcc/cp/tree.c | 2 + > gcc/cp/typeck.c | 6 +- > .../g++.dg/diagnostic/param-type-mismatch.C | 27 +-- > .../g++.dg/plugin/diagnostic-test-expressions-1.C | 260 > +++++++++++++-------- > gcc/tree.c | 59 +++++ > gcc/tree.h | 26 +++ > 17 files changed, 392 insertions(+), 130 deletions(-) >