From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 96586 invoked by alias); 16 May 2016 22:27:48 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Received: (qmail 96487 invoked by uid 89); 16 May 2016 22:27:47 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-2.5 required=5.0 tests=AWL,BAYES_00,FREEMAIL_FROM,RCVD_IN_DNSWL_LOW,SPF_PASS autolearn=ham version=3.3.2 spammy=speeds, pressure X-HELO: mail-vk0-f47.google.com Received: from mail-vk0-f47.google.com (HELO mail-vk0-f47.google.com) (209.85.213.47) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with (AES128-GCM-SHA256 encrypted) ESMTPS; Mon, 16 May 2016 22:27:40 +0000 Received: by mail-vk0-f47.google.com with SMTP id c189so78363244vkb.1 for ; Mon, 16 May 2016 15:27:40 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:mime-version:in-reply-to:references:date :message-id:subject:from:to:cc; bh=Oc6gftk3saq1CM6nvAlBHaLs9WnEWcRoafZCqMSt+Zg=; b=mF0GbkIvxsoyVorS4XmFljgMO7rXwGgr98Ov8fEumBxlDtTw0okogc2fjU4IwQZ5Lh JCn1NTihTZTRZc+ALr55AwV6VfQU1GM2EdtjIID0PISvw3gM0+j+qBLa7e2APdHaFY/T Ab4pKHpbEM8chqQfLjYuFewLZwAmIWF5b0EzbMRJ4SpjxKhpUnQAeGOyl7h6qoxnAVrY D5gCHElM3lFcKU4t4obDnU0Kr9Oq3jdGmH3/8mc4y8Zw83ommDgiiSxfBen5AoOv3/lo dgOn2mWY9DNDibFhNwIx9nxwzTq+ScgiLd4Wc/EUnHbQ3My/Oj8Q3GCHjo8QSFowfZHo G5Rg== X-Gm-Message-State: AOPr4FWNpMxxf+tf3xiBnNLGTlkkO62NguzStYZP+A7yXwl7R5tj6Y0BDprAutjC9UCMgv8aTJBSSl3B+4uyCQ== MIME-Version: 1.0 X-Received: by 10.176.2.87 with SMTP id 81mr5907041uas.104.1463437658337; Mon, 16 May 2016 15:27:38 -0700 (PDT) Received: by 10.103.44.148 with HTTP; Mon, 16 May 2016 15:27:38 -0700 (PDT) In-Reply-To: <5739D16A.9020907@suse.cz> References: <00578bce6fdccccacdd740ff0067ccde46f98f51.1461931011.git.mliska@suse.cz> <5739D16A.9020907@suse.cz> Date: Mon, 16 May 2016 22:27:00 -0000 Message-ID: Subject: Re: [PATCH 2/3] Add profiling support for IVOPTS From: "Bin.Cheng" To: =?UTF-8?Q?Martin_Li=C5=A1ka?= Cc: gcc-patches List Content-Type: text/plain; charset=UTF-8 X-IsSubscribed: yes X-SW-Source: 2016-05/txt/msg01153.txt.bz2 > As profile-guided optimization can provide very useful information > about basic block frequencies within a loop, following patch set leverages > that information. It speeds up a single benchmark from upcoming SPECv6 > suite by 20% (-O2 -profile-generate/-fprofile use) and I think it can > also improve others (currently measuring numbers for PGO). Hi, Is this 20% improvement from this patch, or does it include the existing PGO's improvement? For the patch: > + > + /* Return true if the frequency has a valid value. */ > + bool has_frequency (); > + > /* Return infinite comp_cost. */ > static comp_cost get_infinite (); > > @@ -249,6 +272,9 @@ private: > complexity field should be larger for more > complex expressions and addressing modes). */ > int m_scratch; /* Scratch used during cost computation. */ > + sreal m_frequency; /* Frequency of the basic block this comp_cost > + belongs to. */ > + sreal m_cost_scaled; /* Scalled runtime cost. */ IMHO we shouldn't embed frequency in comp_cost, neither record scaled cost in it. I would suggest we compute cost and amortize the cost over frequency in get_computation_cost_at before storing it into comp_cost. That is, once cost is computed/stored in comp_cost, it is already scaled with frequency. One argument is frequency info is only valid for use's statement/basic_block, it really doesn't have clear meaning in comp_cost structure. Outside of function get_computation_cost_at, I found it's hard to understand/remember what's the meaning of comp_cost.m_frequency and where it came from. There are other reasons embedded in below comments. > > > comp_cost& > @@ -257,6 +283,8 @@ comp_cost::operator= (const comp_cost& other) > m_cost = other.m_cost; > m_complexity = other.m_complexity; > m_scratch = other.m_scratch; > + m_frequency = other.m_frequency; > + m_cost_scaled = other.m_cost_scaled; > > return *this; > } > @@ -275,6 +303,7 @@ operator+ (comp_cost cost1, comp_cost cost2) > > cost1.m_cost += cost2.m_cost; > cost1.m_complexity += cost2.m_complexity; > + cost1.m_cost_scaled += cost2.m_cost_scaled; > > return cost1; > } > @@ -290,6 +319,8 @@ comp_cost > comp_cost::operator+= (HOST_WIDE_INT c) This and below operators need check for infinite cost first and return immediately. > { > this->m_cost += c; > + if (has_frequency ()) > + this->m_cost_scaled += scale_cost (c); > > return *this; > } > @@ -5047,18 +5128,21 @@ get_computation_cost_at (struct ivopts_data *data, > (symbol/var1/const parts may be omitted). If we are looking for an > address, find the cost of addressing this. */ > if (address_p) > - return cost + get_address_cost (symbol_present, var_present, > - offset, ratio, cstepi, > - mem_mode, > - TYPE_ADDR_SPACE (TREE_TYPE (utype)), > - speed, stmt_is_after_inc, can_autoinc); > + { > + cost += get_address_cost (symbol_present, var_present, > + offset, ratio, cstepi, > + mem_mode, > + TYPE_ADDR_SPACE (TREE_TYPE (utype)), > + speed, stmt_is_after_inc, can_autoinc); > + goto ret; > + } > > /* Otherwise estimate the costs for computing the expression. */ > if (!symbol_present && !var_present && !offset) > { > if (ratio != 1) > cost += mult_by_coeff_cost (ratio, TYPE_MODE (ctype), speed); > - return cost; > + goto ret; > } > > /* Symbol + offset should be compile-time computable so consider that they > @@ -5077,7 +5161,8 @@ get_computation_cost_at (struct ivopts_data *data, > aratio = ratio > 0 ? ratio : -ratio; > if (aratio != 1) > cost += mult_by_coeff_cost (aratio, TYPE_MODE (ctype), speed); > - return cost; > + > + goto ret; > > fallback: > if (can_autoinc) > @@ -5093,8 +5178,13 @@ fallback: > if (address_p) > comp = build_simple_mem_ref (comp); > > - return comp_cost (computation_cost (comp, speed), 0); > + cost = comp_cost (computation_cost (comp, speed), 0); > } > + > +ret: > + cost.calculate_scaled_cost (at->bb->frequency, > + data->current_loop->header->frequency); Here cost consists of two parts. One is for loop invariant computation, we amortize is against avg_loop_niter and record register pressure (either via invriant variables or invariant expressions) for it; the other is loop variant part. For the first part, we should not scaled it using frequency, since we have already assumed it would be hoisted out of loop. No matter where the use is, hoisted loop invariant has the same frequency as loop header. This is the second reason I want to factor frequency out of comp_cost. It's easier to scale with frequency only it's necessary. > + return cost; > } > > /* Determines the cost of the computation by that USE is expressed > @@ -5922,16 +6012,19 @@ determine_group_iv_costs (struct ivopts_data *data) > group = data->vgroups[i]; > > fprintf (dump_file, "Group %d:\n", i); > - fprintf (dump_file, " cand\tcost\tcompl.\tinv.ex.\tdepends on\n"); > + fprintf (dump_file, " cand\tcost\tscaled\tfreq.\tcompl.\tinv.ex." > + "\tdepends on\n"); > for (j = 0; j < group->n_map_members; j++) > { > if (!group->cost_map[j].cand > || group->cost_map[j].cost.infinite_cost_p ()) > continue; > > - fprintf (dump_file, " %d\t%d\t%d\t", > + fprintf (dump_file, " %d\t%d\t%2.2f\t%2.2f\t%d\t", > group->cost_map[j].cand->id, > group->cost_map[j].cost.get_cost (), > + group->cost_map[j].cost.get_cost_scaled (), > + group->cost_map[j].cost.get_frequency (), > group->cost_map[j].cost.get_complexity ()); > if (group->cost_map[j].inv_expr != NULL) > fprintf (dump_file, "%d\t", > @@ -5948,6 +6041,19 @@ determine_group_iv_costs (struct ivopts_data *data) > } > fprintf (dump_file, "\n"); > } > + > + for (i = 0; i < data->vgroups.length (); i++) > + { > + group = data->vgroups[i]; > + for (j = 0; j < group->n_map_members; j++) > + { > + if (!group->cost_map[j].cand > + || group->cost_map[j].cost.infinite_cost_p ()) > + continue; > + > + group->cost_map[j].cost.propagate_scaled_cost (); > + } > + } This is wrong. m_frequency and m_cost_scaled are initialized to sreal(0) by default, and are never changed later for conditional iv_use. As a matter of factor, costs computed for all conditional iv_uses are wrong (value is 0). This makes the observed improvement not that promising. Considering code generation is very sensitive to cost computation, it maybe just hit some special cases. Eitherway we need more work/investigation on the impact of this patch. Again, I would suggest we factor out frequency out of comp_cost and only scale the cost in place when we compute cost for each use. Then we can measure what's the impact on code generation. Thanks, bin