From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-patches-return-427404-listarch-gcc-patches=gcc.gnu.org@gcc.gnu.org>
Received: (qmail 96586 invoked by alias); 16 May 2016 22:27:48 -0000
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc-patches.gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-help@gcc.gnu.org>
Sender: gcc-patches-owner@gcc.gnu.org
Received: (qmail 96487 invoked by uid 89); 16 May 2016 22:27:47 -0000
Authentication-Results: sourceware.org; auth=none
X-Virus-Found: No
X-Spam-SWARE-Status: No, score=-2.5 required=5.0 tests=AWL,BAYES_00,FREEMAIL_FROM,RCVD_IN_DNSWL_LOW,SPF_PASS autolearn=ham version=3.3.2 spammy=speeds, pressure
X-HELO: mail-vk0-f47.google.com
Received: from mail-vk0-f47.google.com (HELO mail-vk0-f47.google.com) (209.85.213.47) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with (AES128-GCM-SHA256 encrypted) ESMTPS; Mon, 16 May 2016 22:27:40 +0000
Received: by mail-vk0-f47.google.com with SMTP id c189so78363244vkb.1        for <gcc-patches@gcc.gnu.org>; Mon, 16 May 2016 15:27:40 -0700 (PDT)
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;        d=1e100.net; s=20130820;        h=x-gm-message-state:mime-version:in-reply-to:references:date         :message-id:subject:from:to:cc;        bh=Oc6gftk3saq1CM6nvAlBHaLs9WnEWcRoafZCqMSt+Zg=;        b=mF0GbkIvxsoyVorS4XmFljgMO7rXwGgr98Ov8fEumBxlDtTw0okogc2fjU4IwQZ5Lh         JCn1NTihTZTRZc+ALr55AwV6VfQU1GM2EdtjIID0PISvw3gM0+j+qBLa7e2APdHaFY/T         Ab4pKHpbEM8chqQfLjYuFewLZwAmIWF5b0EzbMRJ4SpjxKhpUnQAeGOyl7h6qoxnAVrY         D5gCHElM3lFcKU4t4obDnU0Kr9Oq3jdGmH3/8mc4y8Zw83ommDgiiSxfBen5AoOv3/lo         dgOn2mWY9DNDibFhNwIx9nxwzTq+ScgiLd4Wc/EUnHbQ3My/Oj8Q3GCHjo8QSFowfZHo         G5Rg==
X-Gm-Message-State: AOPr4FWNpMxxf+tf3xiBnNLGTlkkO62NguzStYZP+A7yXwl7R5tj6Y0BDprAutjC9UCMgv8aTJBSSl3B+4uyCQ==
MIME-Version: 1.0
X-Received: by 10.176.2.87 with SMTP id 81mr5907041uas.104.1463437658337; Mon, 16 May 2016 15:27:38 -0700 (PDT)
Received: by 10.103.44.148 with HTTP; Mon, 16 May 2016 15:27:38 -0700 (PDT)
In-Reply-To: <5739D16A.9020907@suse.cz>
References: <cover.1461931011.git.mliska@suse.cz>	<00578bce6fdccccacdd740ff0067ccde46f98f51.1461931011.git.mliska@suse.cz>	<5739D16A.9020907@suse.cz>
Date: Mon, 16 May 2016 22:27:00 -0000
Message-ID: <CAHFci2_tWaneWe2kWfyrcAC-s+Mw_zc7f42X0ARZ12oD3wbtwg@mail.gmail.com>
Subject: Re: [PATCH 2/3] Add profiling support for IVOPTS
From: "Bin.Cheng" <amker.cheng@gmail.com>
To: =?UTF-8?Q?Martin_Li=C5=A1ka?= <mliska@suse.cz>
Cc: gcc-patches List <gcc-patches@gcc.gnu.org>
Content-Type: text/plain; charset=UTF-8
X-IsSubscribed: yes
X-SW-Source: 2016-05/txt/msg01153.txt.bz2

> As profile-guided optimization can provide very useful information
> about basic block frequencies within a loop, following patch set leverages
> that information. It speeds up a single benchmark from upcoming SPECv6
> suite by 20% (-O2 -profile-generate/-fprofile use) and I think it can
> also improve others (currently measuring numbers for PGO).
Hi,
Is this 20% improvement from this patch, or does it include the
existing PGO's improvement?

For the patch:
> +
> +  /* Return true if the frequency has a valid value.  */
> +  bool has_frequency ();
> +
>    /* Return infinite comp_cost.  */
>    static comp_cost get_infinite ();
>
> @@ -249,6 +272,9 @@ private:
>       complexity field should be larger for more
>       complex expressions and addressing modes).  */
>    int m_scratch;  /* Scratch used during cost computation.  */
> +  sreal m_frequency;  /* Frequency of the basic block this comp_cost
> +     belongs to.  */
> +  sreal m_cost_scaled;  /* Scalled runtime cost.  */
IMHO we shouldn't embed frequency in comp_cost, neither record scaled
cost in it.  I would suggest we compute cost and amortize the cost
over frequency in get_computation_cost_at before storing it into
comp_cost.  That is, once cost is computed/stored in comp_cost, it is
already scaled with frequency.  One argument is frequency info is only
valid for use's statement/basic_block, it really doesn't have clear
meaning in comp_cost structure.  Outside of function
get_computation_cost_at, I found it's hard to understand/remember
what's the meaning of comp_cost.m_frequency and where it came from.
There are other reasons embedded in below comments.
>
>
>  comp_cost&
> @@ -257,6 +283,8 @@ comp_cost::operator= (const comp_cost& other)
>    m_cost = other.m_cost;
>    m_complexity = other.m_complexity;
>    m_scratch = other.m_scratch;
> +  m_frequency = other.m_frequency;
> +  m_cost_scaled = other.m_cost_scaled;
>
>    return *this;
>  }
> @@ -275,6 +303,7 @@ operator+ (comp_cost cost1, comp_cost cost2)
>
>    cost1.m_cost += cost2.m_cost;
>    cost1.m_complexity += cost2.m_complexity;
> +  cost1.m_cost_scaled += cost2.m_cost_scaled;
>
>    return cost1;
>  }
> @@ -290,6 +319,8 @@ comp_cost
>  comp_cost::operator+= (HOST_WIDE_INT c)
This and below operators need check for infinite cost first and return
immediately.
>  {
>    this->m_cost += c;
> +  if (has_frequency ())
> +    this->m_cost_scaled += scale_cost (c);
>
>    return *this;
>  }
> @@ -5047,18 +5128,21 @@ get_computation_cost_at (struct ivopts_data *data,
>       (symbol/var1/const parts may be omitted).  If we are looking for an
>       address, find the cost of addressing this.  */
>    if (address_p)
> -    return cost + get_address_cost (symbol_present, var_present,
> -    offset, ratio, cstepi,
> -    mem_mode,
> -    TYPE_ADDR_SPACE (TREE_TYPE (utype)),
> -    speed, stmt_is_after_inc, can_autoinc);
> +    {
> +      cost += get_address_cost (symbol_present, var_present,
> + offset, ratio, cstepi,
> + mem_mode,
> + TYPE_ADDR_SPACE (TREE_TYPE (utype)),
> + speed, stmt_is_after_inc, can_autoinc);
> +      goto ret;
> +    }
>
>    /* Otherwise estimate the costs for computing the expression.  */
>    if (!symbol_present && !var_present && !offset)
>      {
>        if (ratio != 1)
>   cost += mult_by_coeff_cost (ratio, TYPE_MODE (ctype), speed);
> -      return cost;
> +      goto ret;
>      }
>
>    /* Symbol + offset should be compile-time computable so consider that they
> @@ -5077,7 +5161,8 @@ get_computation_cost_at (struct ivopts_data *data,
>    aratio = ratio > 0 ? ratio : -ratio;
>    if (aratio != 1)
>      cost += mult_by_coeff_cost (aratio, TYPE_MODE (ctype), speed);
> -  return cost;
> +
> +  goto ret;
>
>  fallback:
>    if (can_autoinc)
> @@ -5093,8 +5178,13 @@ fallback:
>      if (address_p)
>        comp = build_simple_mem_ref (comp);
>
> -    return comp_cost (computation_cost (comp, speed), 0);
> +    cost = comp_cost (computation_cost (comp, speed), 0);
>    }
> +
> +ret:
> +  cost.calculate_scaled_cost (at->bb->frequency,
> +      data->current_loop->header->frequency);
Here cost consists of two parts.  One is for loop invariant
computation, we amortize is against avg_loop_niter and record register
pressure (either via invriant variables or invariant expressions) for
it;  the other is loop variant part.  For the first part, we should
not scaled it using frequency, since we have already assumed it would
be hoisted out of loop.  No matter where the use is, hoisted loop
invariant has the same frequency as loop header.  This is the second
reason I want to factor frequency out of comp_cost.  It's easier to
scale with frequency only it's necessary.

> +  return cost;
>  }
>
>  /* Determines the cost of the computation by that USE is expressed
> @@ -5922,16 +6012,19 @@ determine_group_iv_costs (struct ivopts_data *data)
>    group = data->vgroups[i];
>
>    fprintf (dump_file, "Group %d:\n", i);
> -  fprintf (dump_file, "  cand\tcost\tcompl.\tinv.ex.\tdepends on\n");
> +  fprintf (dump_file, "  cand\tcost\tscaled\tfreq.\tcompl.\tinv.ex."
> +   "\tdepends on\n");
>    for (j = 0; j < group->n_map_members; j++)
>      {
>        if (!group->cost_map[j].cand
>    || group->cost_map[j].cost.infinite_cost_p ())
>   continue;
>
> -      fprintf (dump_file, "  %d\t%d\t%d\t",
> +      fprintf (dump_file, "  %d\t%d\t%2.2f\t%2.2f\t%d\t",
>         group->cost_map[j].cand->id,
>         group->cost_map[j].cost.get_cost (),
> +       group->cost_map[j].cost.get_cost_scaled (),
> +       group->cost_map[j].cost.get_frequency (),
>         group->cost_map[j].cost.get_complexity ());
>        if (group->cost_map[j].inv_expr != NULL)
>   fprintf (dump_file, "%d\t",
> @@ -5948,6 +6041,19 @@ determine_group_iv_costs (struct ivopts_data *data)
>   }
>        fprintf (dump_file, "\n");
>      }
> +
> +  for (i = 0; i < data->vgroups.length (); i++)
> +    {
> +      group = data->vgroups[i];
> +      for (j = 0; j < group->n_map_members; j++)
> + {
> +  if (!group->cost_map[j].cand
> +      || group->cost_map[j].cost.infinite_cost_p ())
> +    continue;
> +
> +  group->cost_map[j].cost.propagate_scaled_cost ();
> + }
> +    }
This is wrong.  m_frequency and m_cost_scaled are initialized to
sreal(0) by default, and are never changed later for conditional
iv_use.  As a matter of factor, costs computed for all conditional
iv_uses are wrong (value is 0).  This makes the observed improvement
not that promising.  Considering code generation is very sensitive to
cost computation, it maybe just hit some special cases.  Eitherway we
need more work/investigation on the impact of this patch.

Again, I would suggest we factor out frequency out of comp_cost and
only scale the cost in place when we compute cost for each use.  Then
we can measure what's the impact on code generation.

Thanks,
bin