From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 48929 invoked by alias); 28 Jun 2016 09:19:52 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Received: (qmail 48843 invoked by uid 89); 28 Jun 2016 09:19:50 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-2.0 required=5.0 tests=AWL,BAYES_00,RCVD_IN_DNSWL_LOW,SPF_PASS autolearn=ham version=3.3.2 spammy=movement, wins, ira, aarch64.c X-HELO: mail-it0-f48.google.com Received: from mail-it0-f48.google.com (HELO mail-it0-f48.google.com) (209.85.214.48) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with (AES128-GCM-SHA256 encrypted) ESMTPS; Tue, 28 Jun 2016 09:19:36 +0000 Received: by mail-it0-f48.google.com with SMTP id f6so19995537ith.0 for ; Tue, 28 Jun 2016 02:19:36 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:mime-version:in-reply-to:references:from:date :message-id:subject:to:cc; bh=WpO9Q4WWcRZs/xquGNb9k+5cHWFTn/kSaewgqwKwmjA=; b=J7Tkzv7U9JpvOQTGAU0kkNDSCr+zXDMZPJcCanWYifDXZWPKfJr5IyvYrsc6Tj/DwQ P9tHpy60GEYsbo7S/Y8gLoBZODQMtWGn76mt6HR7Xkv4g0k1ubjN49bMjbHzu6LjV0K6 WRlmSLyZeOSQLvVDuUs3dFPenvWHTjKEPN5ACxuxpshIu/EZwtVOlA+FXw5C7zMzRcEQ dsgYe0JgcMQZyGnTfaI8xLwWTJjzFSRmd+GG3xXxdB48EdvbwghaFQ/1S1F1Hq8kIx7o fBzMlST84yxMFE+3gb2EBJi0dAcfayYAXZCvvPel1RgBYdp2XVJ4xiPY2NUa+pL2hg9J 3Zuw== X-Gm-Message-State: ALyK8tLcb/6sKtsjjqxYDDX3mrtiVKHcI0X0rOkdKs9zGieQTZqeUKDNU3VGiigJvI77FAPIBlIXDDxULFKZ1hR4 X-Received: by 10.36.123.75 with SMTP id q72mr12538714itc.44.1467105574474; Tue, 28 Jun 2016 02:19:34 -0700 (PDT) MIME-Version: 1.0 Received: by 10.36.48.133 with HTTP; Tue, 28 Jun 2016 02:19:33 -0700 (PDT) In-Reply-To: References: <20160603080543.GA78035@kam.mff.cuni.cz> <20160608150855.GB2550@atrey.karlin.mff.cuni.cz> <20160609202322.GB98613@kam.mff.cuni.cz> <20160613104321.GA26957@kam.mff.cuni.cz> From: Prathamesh Kulkarni Date: Tue, 28 Jun 2016 09:27:00 -0000 Message-ID: Subject: Re: move increase_alignment from simple to regular ipa pass To: Jan Hubicka Cc: Richard Biener , David Edelsohn , GCC Patches , "William J. Schmidt" , Segher Boessenkool Content-Type: text/plain; charset=UTF-8 X-IsSubscribed: yes X-SW-Source: 2016-06/txt/msg01865.txt.bz2 ping https://gcc.gnu.org/ml/gcc-patches/2016-06/msg01703.html Thanks, Prathamesh On 23 June 2016 at 22:51, Prathamesh Kulkarni wrote: > On 17 June 2016 at 19:52, Prathamesh Kulkarni > wrote: >> On 14 June 2016 at 18:31, Prathamesh Kulkarni >> wrote: >>> On 13 June 2016 at 16:13, Jan Hubicka wrote: >>>>> diff --git a/gcc/cgraph.h b/gcc/cgraph.h >>>>> index ecafe63..41ac408 100644 >>>>> --- a/gcc/cgraph.h >>>>> +++ b/gcc/cgraph.h >>>>> @@ -1874,6 +1874,9 @@ public: >>>>> if we did not do any inter-procedural code movement. */ >>>>> unsigned used_by_single_function : 1; >>>>> >>>>> + /* Set if -fsection-anchors is set. */ >>>>> + unsigned section_anchor : 1; >>>>> + >>>>> private: >>>>> /* Assemble thunks and aliases associated to varpool node. */ >>>>> void assemble_aliases (void); >>>>> diff --git a/gcc/cgraphunit.c b/gcc/cgraphunit.c >>>>> index 4bfcad7..e75d5c0 100644 >>>>> --- a/gcc/cgraphunit.c >>>>> +++ b/gcc/cgraphunit.c >>>>> @@ -800,6 +800,9 @@ varpool_node::finalize_decl (tree decl) >>>>> it is available to notice_global_symbol. */ >>>>> node->definition = true; >>>>> notice_global_symbol (decl); >>>>> + >>>>> + node->section_anchor = flag_section_anchors; >>>>> + >>>>> if (TREE_THIS_VOLATILE (decl) || DECL_PRESERVE_P (decl) >>>>> /* Traditionally we do not eliminate static variables when not >>>>> optimizing and when not doing toplevel reoder. */ >>>>> diff --git a/gcc/common.opt b/gcc/common.opt >>>>> index f0d7196..e497795 100644 >>>>> --- a/gcc/common.opt >>>>> +++ b/gcc/common.opt >>>>> @@ -1590,6 +1590,10 @@ fira-algorithm= >>>>> Common Joined RejectNegative Enum(ira_algorithm) Var(flag_ira_algorithm) Init(IRA_ALGORITHM_CB) Optimization >>>>> -fira-algorithm=[CB|priority] Set the used IRA algorithm. >>>>> >>>>> +fipa-increase_alignment >>>>> +Common Report Var(flag_ipa_increase_alignment) Init(0) Optimization >>>>> +Option to gate increase_alignment ipa pass. >>>>> + >>>>> Enum >>>>> Name(ira_algorithm) Type(enum ira_algorithm) UnknownError(unknown IRA algorithm %qs) >>>>> >>>>> @@ -2133,7 +2137,7 @@ Common Report Var(flag_sched_dep_count_heuristic) Init(1) Optimization >>>>> Enable the dependent count heuristic in the scheduler. >>>>> >>>>> fsection-anchors >>>>> -Common Report Var(flag_section_anchors) Optimization >>>>> +Common Report Var(flag_section_anchors) >>>>> Access data in the same section from shared anchor points. >>>>> >>>>> fsee >>>>> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c >>>>> index a0db3a4..1482566 100644 >>>>> --- a/gcc/config/aarch64/aarch64.c >>>>> +++ b/gcc/config/aarch64/aarch64.c >>>>> @@ -8252,6 +8252,8 @@ aarch64_override_options (void) >>>>> >>>>> aarch64_register_fma_steering (); >>>>> >>>>> + /* Enable increase_alignment pass. */ >>>>> + flag_ipa_increase_alignment = 1; >>>> >>>> I would rather enable it always on targets that do support anchors. >>> AFAIK aarch64 supports section anchors. >>>>> diff --git a/gcc/lto/lto-symtab.c b/gcc/lto/lto-symtab.c >>>>> index ce9e146..7f09f3a 100644 >>>>> --- a/gcc/lto/lto-symtab.c >>>>> +++ b/gcc/lto/lto-symtab.c >>>>> @@ -342,6 +342,13 @@ lto_symtab_merge (symtab_node *prevailing, symtab_node *entry) >>>>> The type compatibility checks or the completing of types has properly >>>>> dealt with most issues. */ >>>>> >>>>> + /* ??? is this assert necessary ? */ >>>>> + varpool_node *v_prevailing = dyn_cast (prevailing); >>>>> + varpool_node *v_entry = dyn_cast (entry); >>>>> + gcc_assert (v_prevailing && v_entry); >>>>> + /* section_anchor of prevailing_decl wins. */ >>>>> + v_entry->section_anchor = v_prevailing->section_anchor; >>>>> + >>>> Other flags are merged in lto_varpool_replace_node so please move this there. >>> Ah indeed, thanks for the pointers. >>> I wonder though if we need to set >>> prevailing_node->section_anchor = vnode->section_anchor ? >>> IIUC, the function merges flags from vnode into prevailing_node >>> and removes vnode. However we want prevailing_node->section_anchor >>> to always take precedence. >>>>> +/* Return true if alignment should be increased for this vnode. >>>>> + This is done if every function that references/referring to vnode >>>>> + has flag_tree_loop_vectorize set. */ >>>>> + >>>>> +static bool >>>>> +increase_alignment_p (varpool_node *vnode) >>>>> +{ >>>>> + ipa_ref *ref; >>>>> + >>>>> + for (int i = 0; vnode->iterate_reference (i, ref); i++) >>>>> + if (cgraph_node *cnode = dyn_cast (ref->referred)) >>>>> + { >>>>> + struct cl_optimization *opts = opts_for_fn (cnode->decl); >>>>> + if (!opts->x_flag_tree_loop_vectorize) >>>>> + return false; >>>>> + } >>>> >>>> If you take address of function that has vectorizer enabled probably doesn't >>>> imply need to increase alignment of that var. So please drop the loop. >>>> >>>> You only want function that read/writes or takes address of the symbol. But >>>> onthe other hand, you need to walk all aliases of the symbol by >>>> call_for_symbol_and_aliases >>>>> + >>>>> + for (int i = 0; vnode->iterate_referring (i, ref); i++) >>>>> + if (cgraph_node *cnode = dyn_cast (ref->referring)) >>>>> + { >>>>> + struct cl_optimization *opts = opts_for_fn (cnode->decl); >>>>> + if (!opts->x_flag_tree_loop_vectorize) >>>>> + return false; >>>>> + } >>>>> + >>>>> + return true; >>>>> +} >>>>> + >>>>> /* Entry point to increase_alignment pass. */ >>>>> static unsigned int >>>>> increase_alignment (void) >>>>> @@ -914,9 +942,12 @@ increase_alignment (void) >>>>> tree decl = vnode->decl; >>>>> unsigned int alignment; >>>>> >>>>> - if ((decl_in_symtab_p (decl) >>>>> - && !symtab_node::get (decl)->can_increase_alignment_p ()) >>>>> - || DECL_USER_ALIGN (decl) || DECL_ARTIFICIAL (decl)) >>>>> + if (!vnode->section_anchor >>>>> + || (decl_in_symtab_p (decl) >>>>> + && !symtab_node::get (decl)->can_increase_alignment_p ()) >>>>> + || DECL_USER_ALIGN (decl) >>>>> + || DECL_ARTIFICIAL (decl) >>>>> + || !increase_alignment_p (vnode)) >>>> >>>> Incrementally we probably should do more testing whether the variable looks like >>>> someting that can be vectorized, i.e. it contains array, has address taken or the >>>> accesses are array accesses within loop. >>>> This can be done by the analysis phase of the IPA pass inspecting the function >>>> bodies. >>> Thanks, I will try to check for array accesses are within a loop in >>> followup patch. >>> I was wondering if we could we treat a homogeneous global struct >>> (having members of one type), >>> as a global array of that type and increase it's alignment if required ? >>>> >>>> I think it is important waste to bump up everything including error messages etc. >>>> At least on i386 the effect on firefox datasegment of various alignment setting is >>>> very visible. >>> Um for a start, would it be OK to check if all functions referencing variable >>> have attribute noreturn, and in that case we skip increasing the alignment ? >>> I suppose that error functions would be having attribute noreturn set ? >>>> >>>> Looks OK to me otherwise. please send updated patch. >>> I have done the changes in the attached patch (stage-1 built). >>> I am not sure what to return from the callback function and >>> arbitrarily chose to return true. >> Hi, >> In this version (stage-1 built), I added read/write summaries which >> stream those variables >> which we want to increase alignment for. >> >> I have a few questions: >> >> a) To check if global var is used inside a loop, I obtained >> corresponding ipa_ref >> and checked loop_containing_stmt (ref->stmt), however it returned non-NULL >> even when ref->stmt was not inside a loop. >> for instance: >> int a[32]; >> int f() { int x = a[0]; return x; } >> Um how to check if stmt is used inside a loop ? >> >> b) Is it necessary during WPA to check if function has >> flag_tree_vectorize_set since >> during analysis phase in increase_alignment_generate_summary() I check >> if cnode has flag_tree_loop_vectorize_set ? >> >> c) In LTO_increase_alignment_section, the following is streamed: >> i) a "count" of type uwhi, to represent number of variables >> ii) decls corresponding to the variables >> The variables are then obtained during read_summay using >> symtab_node::get_create(). >> I suppose since decls for varpool_nodes would already be streamed in >> LTO_section_decls, I was wondering if I >> could somehow refer to the decls in that section to avoid duplicate >> streaming of decls ? > Hi, > In this version, the variable is streamed if it occurs within a loop > or it's address is taken. > To check if stmt is inside a loop I am using: > > struct loop *l = loop_containing_stmt (ref->stmt); > if (l != DECL_STRUCT_FUNCTION (cnode->decl)->x_current_loops->tree_root) > vars->add (vnode); > Is this correct ? > > I came across an unexpected issue in my previous patch with -ffat-lto-objects. > I am allocating vars = new hash_set () in > generate_summary() and freeing it in write_summary(). > However with -ffat-lto-objects, it appears execute() gets called after > write_summary() > during LGEN and we hit segfault in execute() at: > for (hash_set::iterator it = vars.begin (); it != > vars.end (); it++) > { ... } > because write_summary() has freed vars. > To workaround the issue, I gated call to free vars in write_summary on > flag_fat_lto_objects, > I am not sure if that's a good solution. > > Cross tested on arm*-*-*, aarch64*-*-*. > > Thanks, > Prathamesh >> >> Thanks, >> Prathamesh >>> >>> Thanks, >>> Prathamesh >>>> >>>> Honza