From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 55709 invoked by alias); 5 Jul 2016 09:53:27 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Received: (qmail 55694 invoked by uid 89); 5 Jul 2016 09:53:26 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-1.2 required=5.0 tests=AWL,BAYES_50,RCVD_IN_DNSWL_LOW,SPF_PASS autolearn=ham version=3.3.2 spammy=prathameshkulkarnilinaroorg, prathamesh.kulkarni@linaro.org, IPA, 82526 X-HELO: mail-io0-f172.google.com Received: from mail-io0-f172.google.com (HELO mail-io0-f172.google.com) (209.85.223.172) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with (AES128-GCM-SHA256 encrypted) ESMTPS; Tue, 05 Jul 2016 09:53:16 +0000 Received: by mail-io0-f172.google.com with SMTP id g13so168493578ioj.1 for ; Tue, 05 Jul 2016 02:53:15 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:mime-version:in-reply-to:references:from:date :message-id:subject:to:cc; bh=4hU7cbYGTXoQGkpFo03+ItDshafC50D/d3gR4U4+E3k=; b=T2bvutuxiW8bN4xSyy5bozeoCePkSqE+PS7KGhee3pLODR3J7RPcbrIolwZIxdsPiP ZFnH2VFU/dfy3rbvSMKMQeG90Gw2gqA8FEWU9+cjbxt2kDvG0CO6kw1bsc7bxkAFRIek cdhhQqHNuK/dW627QJW+r7EoOl2g+84uDqoHJatV1Yfx1LvIL35P3ztgZb0G+3xDrB5I upNuBP1X9b8akpXb7KucDYqLiauSjJdvs/PQmlTkieY/XQCFPTMJnVwQaMJ5VvEzhn9o ltPBrS6nPOfCeCAXTtv8ssbLji7t6mkR+N2FkF5Ad6jFEr+DJxMNLpuGUU9g/TQnGir3 3SaA== X-Gm-Message-State: ALyK8tIN0QIrCuq1/NxKQNKJOihQv0F89ZXRLei/bcDdU0JSlG4ZuTBZ1Zt5Qp3MZsq+EgmW/mB8192obivd2U2h X-Received: by 10.107.130.195 with SMTP id m64mr14044118ioi.137.1467712393835; Tue, 05 Jul 2016 02:53:13 -0700 (PDT) MIME-Version: 1.0 Received: by 10.36.48.133 with HTTP; Tue, 5 Jul 2016 02:53:13 -0700 (PDT) In-Reply-To: References: <20160603080543.GA78035@kam.mff.cuni.cz> <20160608150855.GB2550@atrey.karlin.mff.cuni.cz> <20160609202322.GB98613@kam.mff.cuni.cz> <20160613104321.GA26957@kam.mff.cuni.cz> From: Prathamesh Kulkarni Date: Tue, 05 Jul 2016 09:53:00 -0000 Message-ID: Subject: Re: move increase_alignment from simple to regular ipa pass To: Jan Hubicka Cc: Richard Biener , David Edelsohn , GCC Patches , "William J. Schmidt" , Segher Boessenkool Content-Type: text/plain; charset=UTF-8 X-IsSubscribed: yes X-SW-Source: 2016-07/txt/msg00172.txt.bz2 ping * 2 ping https://gcc.gnu.org/ml/gcc-patches/2016-06/msg01703.html Thanks, Prathamesh On 28 June 2016 at 14:49, Prathamesh Kulkarni wrote: > ping https://gcc.gnu.org/ml/gcc-patches/2016-06/msg01703.html > > Thanks, > Prathamesh > > On 23 June 2016 at 22:51, Prathamesh Kulkarni > wrote: >> On 17 June 2016 at 19:52, Prathamesh Kulkarni >> wrote: >>> On 14 June 2016 at 18:31, Prathamesh Kulkarni >>> wrote: >>>> On 13 June 2016 at 16:13, Jan Hubicka wrote: >>>>>> diff --git a/gcc/cgraph.h b/gcc/cgraph.h >>>>>> index ecafe63..41ac408 100644 >>>>>> --- a/gcc/cgraph.h >>>>>> +++ b/gcc/cgraph.h >>>>>> @@ -1874,6 +1874,9 @@ public: >>>>>> if we did not do any inter-procedural code movement. */ >>>>>> unsigned used_by_single_function : 1; >>>>>> >>>>>> + /* Set if -fsection-anchors is set. */ >>>>>> + unsigned section_anchor : 1; >>>>>> + >>>>>> private: >>>>>> /* Assemble thunks and aliases associated to varpool node. */ >>>>>> void assemble_aliases (void); >>>>>> diff --git a/gcc/cgraphunit.c b/gcc/cgraphunit.c >>>>>> index 4bfcad7..e75d5c0 100644 >>>>>> --- a/gcc/cgraphunit.c >>>>>> +++ b/gcc/cgraphunit.c >>>>>> @@ -800,6 +800,9 @@ varpool_node::finalize_decl (tree decl) >>>>>> it is available to notice_global_symbol. */ >>>>>> node->definition = true; >>>>>> notice_global_symbol (decl); >>>>>> + >>>>>> + node->section_anchor = flag_section_anchors; >>>>>> + >>>>>> if (TREE_THIS_VOLATILE (decl) || DECL_PRESERVE_P (decl) >>>>>> /* Traditionally we do not eliminate static variables when not >>>>>> optimizing and when not doing toplevel reoder. */ >>>>>> diff --git a/gcc/common.opt b/gcc/common.opt >>>>>> index f0d7196..e497795 100644 >>>>>> --- a/gcc/common.opt >>>>>> +++ b/gcc/common.opt >>>>>> @@ -1590,6 +1590,10 @@ fira-algorithm= >>>>>> Common Joined RejectNegative Enum(ira_algorithm) Var(flag_ira_algorithm) Init(IRA_ALGORITHM_CB) Optimization >>>>>> -fira-algorithm=[CB|priority] Set the used IRA algorithm. >>>>>> >>>>>> +fipa-increase_alignment >>>>>> +Common Report Var(flag_ipa_increase_alignment) Init(0) Optimization >>>>>> +Option to gate increase_alignment ipa pass. >>>>>> + >>>>>> Enum >>>>>> Name(ira_algorithm) Type(enum ira_algorithm) UnknownError(unknown IRA algorithm %qs) >>>>>> >>>>>> @@ -2133,7 +2137,7 @@ Common Report Var(flag_sched_dep_count_heuristic) Init(1) Optimization >>>>>> Enable the dependent count heuristic in the scheduler. >>>>>> >>>>>> fsection-anchors >>>>>> -Common Report Var(flag_section_anchors) Optimization >>>>>> +Common Report Var(flag_section_anchors) >>>>>> Access data in the same section from shared anchor points. >>>>>> >>>>>> fsee >>>>>> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c >>>>>> index a0db3a4..1482566 100644 >>>>>> --- a/gcc/config/aarch64/aarch64.c >>>>>> +++ b/gcc/config/aarch64/aarch64.c >>>>>> @@ -8252,6 +8252,8 @@ aarch64_override_options (void) >>>>>> >>>>>> aarch64_register_fma_steering (); >>>>>> >>>>>> + /* Enable increase_alignment pass. */ >>>>>> + flag_ipa_increase_alignment = 1; >>>>> >>>>> I would rather enable it always on targets that do support anchors. >>>> AFAIK aarch64 supports section anchors. >>>>>> diff --git a/gcc/lto/lto-symtab.c b/gcc/lto/lto-symtab.c >>>>>> index ce9e146..7f09f3a 100644 >>>>>> --- a/gcc/lto/lto-symtab.c >>>>>> +++ b/gcc/lto/lto-symtab.c >>>>>> @@ -342,6 +342,13 @@ lto_symtab_merge (symtab_node *prevailing, symtab_node *entry) >>>>>> The type compatibility checks or the completing of types has properly >>>>>> dealt with most issues. */ >>>>>> >>>>>> + /* ??? is this assert necessary ? */ >>>>>> + varpool_node *v_prevailing = dyn_cast (prevailing); >>>>>> + varpool_node *v_entry = dyn_cast (entry); >>>>>> + gcc_assert (v_prevailing && v_entry); >>>>>> + /* section_anchor of prevailing_decl wins. */ >>>>>> + v_entry->section_anchor = v_prevailing->section_anchor; >>>>>> + >>>>> Other flags are merged in lto_varpool_replace_node so please move this there. >>>> Ah indeed, thanks for the pointers. >>>> I wonder though if we need to set >>>> prevailing_node->section_anchor = vnode->section_anchor ? >>>> IIUC, the function merges flags from vnode into prevailing_node >>>> and removes vnode. However we want prevailing_node->section_anchor >>>> to always take precedence. >>>>>> +/* Return true if alignment should be increased for this vnode. >>>>>> + This is done if every function that references/referring to vnode >>>>>> + has flag_tree_loop_vectorize set. */ >>>>>> + >>>>>> +static bool >>>>>> +increase_alignment_p (varpool_node *vnode) >>>>>> +{ >>>>>> + ipa_ref *ref; >>>>>> + >>>>>> + for (int i = 0; vnode->iterate_reference (i, ref); i++) >>>>>> + if (cgraph_node *cnode = dyn_cast (ref->referred)) >>>>>> + { >>>>>> + struct cl_optimization *opts = opts_for_fn (cnode->decl); >>>>>> + if (!opts->x_flag_tree_loop_vectorize) >>>>>> + return false; >>>>>> + } >>>>> >>>>> If you take address of function that has vectorizer enabled probably doesn't >>>>> imply need to increase alignment of that var. So please drop the loop. >>>>> >>>>> You only want function that read/writes or takes address of the symbol. But >>>>> onthe other hand, you need to walk all aliases of the symbol by >>>>> call_for_symbol_and_aliases >>>>>> + >>>>>> + for (int i = 0; vnode->iterate_referring (i, ref); i++) >>>>>> + if (cgraph_node *cnode = dyn_cast (ref->referring)) >>>>>> + { >>>>>> + struct cl_optimization *opts = opts_for_fn (cnode->decl); >>>>>> + if (!opts->x_flag_tree_loop_vectorize) >>>>>> + return false; >>>>>> + } >>>>>> + >>>>>> + return true; >>>>>> +} >>>>>> + >>>>>> /* Entry point to increase_alignment pass. */ >>>>>> static unsigned int >>>>>> increase_alignment (void) >>>>>> @@ -914,9 +942,12 @@ increase_alignment (void) >>>>>> tree decl = vnode->decl; >>>>>> unsigned int alignment; >>>>>> >>>>>> - if ((decl_in_symtab_p (decl) >>>>>> - && !symtab_node::get (decl)->can_increase_alignment_p ()) >>>>>> - || DECL_USER_ALIGN (decl) || DECL_ARTIFICIAL (decl)) >>>>>> + if (!vnode->section_anchor >>>>>> + || (decl_in_symtab_p (decl) >>>>>> + && !symtab_node::get (decl)->can_increase_alignment_p ()) >>>>>> + || DECL_USER_ALIGN (decl) >>>>>> + || DECL_ARTIFICIAL (decl) >>>>>> + || !increase_alignment_p (vnode)) >>>>> >>>>> Incrementally we probably should do more testing whether the variable looks like >>>>> someting that can be vectorized, i.e. it contains array, has address taken or the >>>>> accesses are array accesses within loop. >>>>> This can be done by the analysis phase of the IPA pass inspecting the function >>>>> bodies. >>>> Thanks, I will try to check for array accesses are within a loop in >>>> followup patch. >>>> I was wondering if we could we treat a homogeneous global struct >>>> (having members of one type), >>>> as a global array of that type and increase it's alignment if required ? >>>>> >>>>> I think it is important waste to bump up everything including error messages etc. >>>>> At least on i386 the effect on firefox datasegment of various alignment setting is >>>>> very visible. >>>> Um for a start, would it be OK to check if all functions referencing variable >>>> have attribute noreturn, and in that case we skip increasing the alignment ? >>>> I suppose that error functions would be having attribute noreturn set ? >>>>> >>>>> Looks OK to me otherwise. please send updated patch. >>>> I have done the changes in the attached patch (stage-1 built). >>>> I am not sure what to return from the callback function and >>>> arbitrarily chose to return true. >>> Hi, >>> In this version (stage-1 built), I added read/write summaries which >>> stream those variables >>> which we want to increase alignment for. >>> >>> I have a few questions: >>> >>> a) To check if global var is used inside a loop, I obtained >>> corresponding ipa_ref >>> and checked loop_containing_stmt (ref->stmt), however it returned non-NULL >>> even when ref->stmt was not inside a loop. >>> for instance: >>> int a[32]; >>> int f() { int x = a[0]; return x; } >>> Um how to check if stmt is used inside a loop ? >>> >>> b) Is it necessary during WPA to check if function has >>> flag_tree_vectorize_set since >>> during analysis phase in increase_alignment_generate_summary() I check >>> if cnode has flag_tree_loop_vectorize_set ? >>> >>> c) In LTO_increase_alignment_section, the following is streamed: >>> i) a "count" of type uwhi, to represent number of variables >>> ii) decls corresponding to the variables >>> The variables are then obtained during read_summay using >>> symtab_node::get_create(). >>> I suppose since decls for varpool_nodes would already be streamed in >>> LTO_section_decls, I was wondering if I >>> could somehow refer to the decls in that section to avoid duplicate >>> streaming of decls ? >> Hi, >> In this version, the variable is streamed if it occurs within a loop >> or it's address is taken. >> To check if stmt is inside a loop I am using: >> >> struct loop *l = loop_containing_stmt (ref->stmt); >> if (l != DECL_STRUCT_FUNCTION (cnode->decl)->x_current_loops->tree_root) >> vars->add (vnode); >> Is this correct ? >> >> I came across an unexpected issue in my previous patch with -ffat-lto-objects. >> I am allocating vars = new hash_set () in >> generate_summary() and freeing it in write_summary(). >> However with -ffat-lto-objects, it appears execute() gets called after >> write_summary() >> during LGEN and we hit segfault in execute() at: >> for (hash_set::iterator it = vars.begin (); it != >> vars.end (); it++) >> { ... } >> because write_summary() has freed vars. >> To workaround the issue, I gated call to free vars in write_summary on >> flag_fat_lto_objects, >> I am not sure if that's a good solution. >> >> Cross tested on arm*-*-*, aarch64*-*-*. >> >> Thanks, >> Prathamesh >>> >>> Thanks, >>> Prathamesh >>>> >>>> Thanks, >>>> Prathamesh >>>>> >>>>> Honza