From: Prathamesh Kulkarni
Date: Mon, 25 Sep 2017 18:13:00 -0000
Subject: Re: [RFC] propagate malloc attribute in ipa-pure-const pass
To: Jan Hubicka
Cc: gcc Patches, Richard Biener
References: <20170519133212.GB36419@kam.mff.cuni.cz>
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
X-SW-Source: 2017-09/txt/msg01678.txt.bz2

On 15 September 2017 at 17:49, Prathamesh Kulkarni wrote:
> On 1 September 2017 at 08:09, Prathamesh Kulkarni wrote:
>> On 17 August 2017 at 18:02, Prathamesh Kulkarni wrote:
>>> On 8 August 2017 at 09:50, Prathamesh Kulkarni wrote:
>>>> On 31 July 2017 at 23:53, Prathamesh Kulkarni wrote:
>>>>> On 23 May 2017 at 19:10, Prathamesh Kulkarni wrote:
>>>>>> On 19 May 2017 at 19:02, Jan Hubicka wrote:
>>>>>>>>
>>>>>>>> * LTO and memory management
>>>>>>>> This is a general question about LTO and memory management.
>>>>>>>> IIUC the following sequence takes place during normal LTO:
>>>>>>>> LGEN: generate_summary, write_summary
>>>>>>>> WPA: read_summary, execute ipa passes, write_opt_summary
>>>>>>>>
>>>>>>>> So I assumed it was OK in LGEN to allocate return_callees_map in
>>>>>>>> generate_summary and free it in write_summary and during WPA, allocate
>>>>>>>> return_callees_map in read_summary and free it after execute (since
>>>>>>>> write_opt_summary does not require return_callees_map).
>>>>>>>>
>>>>>>>> However with fat LTO, it seems the sequence changes for LGEN with
>>>>>>>> execute phase takes place after write_summary. However since
>>>>>>>> return_callees_map is freed in pure_const_write_summary and
>>>>>>>> propagate_malloc() accesses it in execute stage, it results in
>>>>>>>> segmentation fault.
>>>>>>>>
>>>>>>>> To work around this, I am using the following hack in pure_const_write_summary:
>>>>>>>> // FIXME: Do not free if -ffat-lto-objects is enabled.
>>>>>>>> if (!global_options.x_flag_fat_lto_objects)
>>>>>>>>   free_return_callees_map ();
>>>>>>>> Is there a better approach for handling this ?
>>>>>>>
>>>>>>> I think most passes just do not free summaries with -flto. We probably want
>>>>>>> to fix it to make it possible to compile multiple units i.e. from plugin by
>>>>>>> adding release_summaries method...
>>>>>>> So I would say it is OK to do the same as others do and leak it with -flto.
>>>>>>>> diff --git a/gcc/ipa-pure-const.c b/gcc/ipa-pure-const.c
>>>>>>>> index e457166ea39..724c26e03f6 100644
>>>>>>>> --- a/gcc/ipa-pure-const.c
>>>>>>>> +++ b/gcc/ipa-pure-const.c
>>>>>>>> @@ -56,6 +56,7 @@ along with GCC; see the file COPYING3. If not see
>>>>>>>>  #include "tree-scalar-evolution.h"
>>>>>>>>  #include "intl.h"
>>>>>>>>  #include "opts.h"
>>>>>>>> +#include "ssa.h"
>>>>>>>>
>>>>>>>>  /* Lattice values for const and pure functions. Everything starts out
>>>>>>>>     being const, then may drop to pure and then neither depending on
>>>>>>>> @@ -69,6 +70,15 @@ enum pure_const_state_e
>>>>>>>>
>>>>>>>>  const char *pure_const_names[3] = {"const", "pure", "neither"};
>>>>>>>>
>>>>>>>> +enum malloc_state_e
>>>>>>>> +{
>>>>>>>> +  PURE_CONST_MALLOC_TOP,
>>>>>>>> +  PURE_CONST_MALLOC,
>>>>>>>> +  PURE_CONST_MALLOC_BOTTOM
>>>>>>>> +};
>>>>>>>
>>>>>>> It took me a while to work out what PURE_CONST means here :)
>>>>>>> I would just call it something like STATE_MALLOC_TOP... or so.
>>>>>>> ipa_pure_const is outdated name from the time pass was doing only
>>>>>>> those two.
>>>>>>>> @@ -109,6 +121,10 @@ typedef struct funct_state_d * funct_state;
>>>>>>>>
>>>>>>>>  static vec funct_state_vec;
>>>>>>>>
>>>>>>>> +/* A map from node to subset of callees. The subset contains those callees
>>>>>>>> + * whose return-value is returned by the node. */
>>>>>>>> +static hash_map< cgraph_node *, vec* > *return_callees_map;
>>>>>>>> +
>>>>>>>
>>>>>>> Hehe, a special case of return jump function. We ought to support those more generally.
>>>>>>> How do you keep it up to date over callgraph changes?
>>>>>>>> @@ -921,6 +1055,23 @@ end:
>>>>>>>>    if (TREE_NOTHROW (decl))
>>>>>>>>      l->can_throw = false;
>>>>>>>>
>>>>>>>> +  if (ipa)
>>>>>>>> +    {
>>>>>>>> +      vec v = vNULL;
>>>>>>>> +      l->malloc_state = PURE_CONST_MALLOC_BOTTOM;
>>>>>>>> +      if (DECL_IS_MALLOC (decl))
>>>>>>>> +        l->malloc_state = PURE_CONST_MALLOC;
>>>>>>>> +      else if (malloc_candidate_p (DECL_STRUCT_FUNCTION (decl), v))
>>>>>>>> +        {
>>>>>>>> +          l->malloc_state = PURE_CONST_MALLOC_TOP;
>>>>>>>> +          vec *callees_p = new vec (vNULL);
>>>>>>>> +          for (unsigned i = 0; i < v.length (); ++i)
>>>>>>>> +            callees_p->safe_push (v[i]);
>>>>>>>> +          return_callees_map->put (fn, callees_p);
>>>>>>>> +        }
>>>>>>>> +      v.release ();
>>>>>>>> +    }
>>>>>>>> +
>>>>>>>
>>>>>>> I would do non-ipa variant, too. I think most attributes can be detected that way
>>>>>>> as well.
>>>>>>>
>>>>>>> The patch generally makes sense to me. It would be nice to make it easier to write such
>>>>>>> a basic propagators across callgraph (perhaps adding a template doing the basic
>>>>>>> propagation logic). Also I think you need to solve the problem with keeping your
>>>>>>> summaries up to date across callgraph node removal and duplications.
>>>>>> Thanks for the suggestions, I will try to address them in a follow-up patch.
>>>>>> IIUC, I would need to modify ipa-pure-const cgraph hooks -
>>>>>> add_new_function, remove_node_data, duplicate_node_data
>>>>>> to keep return_callees_map up-to-date across callgraph node insertions
>>>>>> and removal ?
>>>>>>
>>>>>> Also, if instead of having a separate data-structure like return_callees_map,
>>>>>> should we rather have a flag within cgraph_edge, which marks that the
>>>>>> caller may return the value of the callee ?
>>>>> Hi,
>>>>> Sorry for the very late response. I have attached an updated version
>>>>> of the prototype patch, which adds a non-ipa variant, and keeps
>>>>> return_callees_map up-to-date across callgraph node insertions and
>>>>> removal. For the non-ipa variant, malloc_candidate_p() additionally checks
>>>>> that all the "return callees" have DECL_IS_MALLOC set to true.
>>>>> Bootstrapped+tested and LTO bootstrapped+tested on x86_64-unknown-linux-gnu.
>>>>> Does it look OK so far ?
>>>>>
>>>>> Um sorry for this silly question, but I don't really understand how
>>>>> does indirect call propagation work in ipa-pure-const ? For example
>>>>> consider propagation of nothrow attribute in following test-case:
>>>>>
>>>>> __attribute__((noinline, noclone, nothrow))
>>>>> int f1(int k) { return k; }
>>>>>
>>>>> __attribute__((noinline, noclone))
>>>>> static int foo(int (*p)(int))
>>>>> {
>>>>>   return p(10);
>>>>> }
>>>>>
>>>>> __attribute__((noinline, noclone))
>>>>> int bar(void)
>>>>> {
>>>>>   return foo(f1);
>>>>> }
>>>>>
>>>>> Shouldn't foo and bar be also marked as nothrow ?
>>>>> Since foo indirectly calls f1 which is nothrow and bar only calls foo ?
>>>>> The local-pure-const2 dump shows function is locally throwing for
>>>>> "foo" and "bar".
>>>>>
>>>>> Um, I was wondering how to get "points-to" analysis for function-pointers,
>>>>> to get list of callees that may be indirectly called from that
>>>>> function pointer ?
>>>>> In the patch I just set node to bottom if it contains indirect calls
>>>>> which is far from ideal :(
>>>>> I would be much grateful for suggestions on how to handle indirect calls.
>>>>> Thanks!
>>>> ping https://gcc.gnu.org/ml/gcc-patches/2017-07/msg02063.html
>>> ping * 2 https://gcc.gnu.org/ml/gcc-patches/2017-07/msg02063.html
>> ping * 3 https://gcc.gnu.org/ml/gcc-patches/2017-07/msg02063.html
> ping * 4 https://gcc.gnu.org/ml/gcc-patches/2017-07/msg02063.html

Hi Honza,
Could you please have a look at this patch ?
https://gcc.gnu.org/ml/gcc-patches/2017-07/msg02063.html
I tested it with SPEC2006 on an AArch64 Cortex-A57 processor and saw some
improvement for 433.milc (+1.79%), 437.leslie3d (+2.84%) and 470.lbm (+4%),
and not much difference for the other benchmarks. I don't expect these numbers
to be precise though, since SPEC was run with only one iteration.

Thanks!
Regards,
Prathamesh
>
> Thanks,
> Prathamesh
>>
>> Thanks,
>> Prathamesh
>>>
>>> Thanks,
>>> Prathamesh
>>>>
>>>> Thanks,
>>>> Prathamesh
>>>>>
>>>>> Regards,
>>>>> Prathamesh
>>>>>>
>>>>>> Thanks,
>>>>>> Prathamesh
>>>>>>>
>>>>>>> Honza
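
P.S. For anyone skimming the thread, the propagation idea discussed in the
quoted review can be illustrated with a small standalone C++ toy. This is only
a sketch under stated assumptions, not the code from the patch: toy_node,
toy_callgraph, propagate_malloc_toy and make_example are hypothetical names,
a std::map stands in for GCC's callgraph and for return_callees_map, and the
state names follow the STATE_MALLOC_* spelling suggested in the review. The
toy assumes a node that is still a candidate (TOP) drops to BOTTOM as soon as
one of the callees whose value it returns is BOTTOM or unknown, and that
candidates surviving to the fixpoint are treated as malloc-like.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Three-state toy lattice (names are illustrative, see the note above).
enum malloc_state { STATE_MALLOC_TOP, STATE_MALLOC, STATE_MALLOC_BOTTOM };

// Hypothetical stand-in for a callgraph node; this is NOT GCC's cgraph_node.
struct toy_node {
  malloc_state state;
  // Names of the callees whose return value this node returns.
  std::vector<std::string> return_callees;
};

// Hypothetical stand-in for the callgraph plus the return-callees map.
using toy_callgraph = std::map<std::string, toy_node>;

// Iterate to a fixpoint: a TOP node drops to BOTTOM when any of its return
// callees is BOTTOM or missing from the graph. Nodes that are still TOP once
// nothing changes are promoted to MALLOC.
void propagate_malloc_toy(toy_callgraph &g)
{
  bool changed = true;
  while (changed) {
    changed = false;
    for (auto &entry : g) {
      toy_node &n = entry.second;
      if (n.state != STATE_MALLOC_TOP)
        continue;
      for (const std::string &callee : n.return_callees) {
        auto it = g.find(callee);
        if (it == g.end() || it->second.state == STATE_MALLOC_BOTTOM) {
          n.state = STATE_MALLOC_BOTTOM;
          changed = true;
          break;
        }
      }
    }
  }
  for (auto &entry : g)
    if (entry.second.state == STATE_MALLOC_TOP)
      entry.second.state = STATE_MALLOC;
}

// Small example graph: xmalloc returns malloc's result, make_buf returns
// xmalloc's result, and pick may also return the result of get_global,
// which is known not to be malloc-like.
toy_callgraph make_example()
{
  toy_callgraph g;
  g["malloc"] = {STATE_MALLOC, {}};
  g["xmalloc"] = {STATE_MALLOC_TOP, {"malloc"}};
  g["make_buf"] = {STATE_MALLOC_TOP, {"xmalloc"}};
  g["get_global"] = {STATE_MALLOC_BOTTOM, {}};
  g["pick"] = {STATE_MALLOC_TOP, {"xmalloc", "get_global"}};
  assert(g.size() == 5);
  return g;
}
```

Calling propagate_malloc_toy on the graph from make_example leaves xmalloc and
make_buf in STATE_MALLOC and moves pick to STATE_MALLOC_BOTTOM. The toy does
not model indirect calls at all; as noted in the quoted mail, the prototype
patch simply sets a node to bottom when it contains indirect calls.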