[PATCH] Fix bdverN vector cost of cond_[not_]taken_branch

public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed

* [PATCH] Fix bdverN vector cost of cond_[not_]taken_branch_cost
@ 2015-04-07  8:19 Richard Biener
  2015-04-07 11:20 ` Uros Bizjak
  0 siblings, 1 reply; 5+ messages in thread
From: Richard Biener @ 2015-04-07  8:19 UTC (permalink / raw)
  To: gcc-patches; +Cc: ubizjak


They are suspiciously low (compared to say scalar_stmt_cost) and with
them and the fix for the vectorizer cost model to properly account
scalar stmt costs (and thus correctly dealing with odd costs as bdverN
have) we regress 252.eon because we consider a loop vectorized and
peeled for alignment loop profitable which clearly isn't.

Bootstrap and regtest running on x86_64-unknown-linux-gnu.  I've
tested with all b[dt]verN -marchs and the slp-pr56812.cc testcase
(yes, we've run into a similar issue earlier).  I've also put the
patch on our SPEC tester to look for fallout.

It really looks like the costs were derived by some automatic
searching of the parameter space and thus "optimizing" for bugs
in the vectorizer cost model that have meanwhile been fixed
(scalar stmt cost == 6 but scalar load/store cost == 4!?).  It is 
not a good idea to put in paramters that you can't make sense of
from an architectural point of view (yes, taken/not-taken branch
is somewhat bogus kinds, I'd like to change that to correctly
predicted / wrongly predicted for GCC 6).

Ok for trunk and 4.9 branch?

Thanks,
Richard.

2015-04-07  Richard Biener  <rguenther@suse.de>

	PR target/65660
	* config/i386/i386.c (bdver1_cost): Double cond_taken_branch_cost
	and cond_not_taken_branch_cost to 4 and 2.
	(bdver2_cost): Likewise.
	(bdver3_cost): Likewise.
	(bdver4_cost): Likewise.

Index: gcc/config/i386/i386.c
===================================================================
*** gcc/config/i386/i386.c	(revision 221888)
--- gcc/config/i386/i386.c	(working copy)
*************** const struct processor_costs bdver1_cost
*** 1025,1032 ****
    4,					/* vec_align_load_cost.  */
    4,					/* vec_unalign_load_cost.  */
    4,					/* vec_store_cost.  */
!   2,					/* cond_taken_branch_cost.  */
!   1,					/* cond_not_taken_branch_cost.  */
  };
  
  /*  BDVER2 has optimized REP instruction for medium sized blocks, but for
--- 1025,1032 ----
    4,					/* vec_align_load_cost.  */
    4,					/* vec_unalign_load_cost.  */
    4,					/* vec_store_cost.  */
!   4,					/* cond_taken_branch_cost.  */
!   2,					/* cond_not_taken_branch_cost.  */
  };
  
  /*  BDVER2 has optimized REP instruction for medium sized blocks, but for
*************** const struct processor_costs bdver2_cost
*** 1121,1128 ****
    4,					/* vec_align_load_cost.  */
    4,					/* vec_unalign_load_cost.  */
    4,					/* vec_store_cost.  */
!   2,					/* cond_taken_branch_cost.  */
!   1,					/* cond_not_taken_branch_cost.  */
  };
  
  
--- 1121,1128 ----
    4,					/* vec_align_load_cost.  */
    4,					/* vec_unalign_load_cost.  */
    4,					/* vec_store_cost.  */
!   4,					/* cond_taken_branch_cost.  */
!   2,					/* cond_not_taken_branch_cost.  */
  };
  
  
*************** struct processor_costs bdver3_cost = {
*** 1208,1215 ****
    4,					/* vec_align_load_cost.  */
    4,					/* vec_unalign_load_cost.  */
    4,					/* vec_store_cost.  */
!   2,					/* cond_taken_branch_cost.  */
!   1,					/* cond_not_taken_branch_cost.  */
  };
  
  /*  BDVER4 has optimized REP instruction for medium sized blocks, but for
--- 1208,1215 ----
    4,					/* vec_align_load_cost.  */
    4,					/* vec_unalign_load_cost.  */
    4,					/* vec_store_cost.  */
!   4,					/* cond_taken_branch_cost.  */
!   2,					/* cond_not_taken_branch_cost.  */
  };
  
  /*  BDVER4 has optimized REP instruction for medium sized blocks, but for
*************** struct processor_costs bdver4_cost = {
*** 1294,1301 ****
    4,					/* vec_align_load_cost.  */
    4,					/* vec_unalign_load_cost.  */
    4,					/* vec_store_cost.  */
!   2,					/* cond_taken_branch_cost.  */
!   1,					/* cond_not_taken_branch_cost.  */
  };
  
    /* BTVER1 has optimized REP instruction for medium sized blocks, but for
--- 1294,1301 ----
    4,					/* vec_align_load_cost.  */
    4,					/* vec_unalign_load_cost.  */
    4,					/* vec_store_cost.  */
!   4,					/* cond_taken_branch_cost.  */
!   2,					/* cond_not_taken_branch_cost.  */
  };
  
    /* BTVER1 has optimized REP instruction for medium sized blocks, but for

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] Fix bdverN vector cost of cond_[not_]taken_branch_cost
  2015-04-07  8:19 [PATCH] Fix bdverN vector cost of cond_[not_]taken_branch_cost Richard Biener
@ 2015-04-07 11:20 ` Uros Bizjak
  2015-04-08  6:39   ` Gopalasubramanian, Ganesh
  0 siblings, 1 reply; 5+ messages in thread
From: Uros Bizjak @ 2015-04-07 11:20 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc-patches, Gopalasubramanian, Ganesh

On Tue, Apr 7, 2015 at 10:19 AM, Richard Biener <rguenther@suse.de> wrote:
>
> They are suspiciously low (compared to say scalar_stmt_cost) and with
> them and the fix for the vectorizer cost model to properly account
> scalar stmt costs (and thus correctly dealing with odd costs as bdverN
> have) we regress 252.eon because we consider a loop vectorized and
> peeled for alignment loop profitable which clearly isn't.
>
> Bootstrap and regtest running on x86_64-unknown-linux-gnu.  I've
> tested with all b[dt]verN -marchs and the slp-pr56812.cc testcase
> (yes, we've run into a similar issue earlier).  I've also put the
> patch on our SPEC tester to look for fallout.
>
> It really looks like the costs were derived by some automatic
> searching of the parameter space and thus "optimizing" for bugs
> in the vectorizer cost model that have meanwhile been fixed
> (scalar stmt cost == 6 but scalar load/store cost == 4!?).  It is
> not a good idea to put in paramters that you can't make sense of
> from an architectural point of view (yes, taken/not-taken branch
> is somewhat bogus kinds, I'd like to change that to correctly
> predicted / wrongly predicted for GCC 6).
>
> Ok for trunk and 4.9 branch?

I have added a person from AMD to comment on the decision.

Otherwise, the patch looks OK, but please wait a couple of days for
possible comments.

Thanks,
Uros.

> Thanks,
> Richard.
>
> 2015-04-07  Richard Biener  <rguenther@suse.de>
>
>         PR target/65660
>         * config/i386/i386.c (bdver1_cost): Double cond_taken_branch_cost
>         and cond_not_taken_branch_cost to 4 and 2.
>         (bdver2_cost): Likewise.
>         (bdver3_cost): Likewise.
>         (bdver4_cost): Likewise.
>
> Index: gcc/config/i386/i386.c
> ===================================================================
> *** gcc/config/i386/i386.c      (revision 221888)
> --- gcc/config/i386/i386.c      (working copy)
> *************** const struct processor_costs bdver1_cost
> *** 1025,1032 ****
>     4,                                  /* vec_align_load_cost.  */
>     4,                                  /* vec_unalign_load_cost.  */
>     4,                                  /* vec_store_cost.  */
> !   2,                                  /* cond_taken_branch_cost.  */
> !   1,                                  /* cond_not_taken_branch_cost.  */
>   };
>
>   /*  BDVER2 has optimized REP instruction for medium sized blocks, but for
> --- 1025,1032 ----
>     4,                                  /* vec_align_load_cost.  */
>     4,                                  /* vec_unalign_load_cost.  */
>     4,                                  /* vec_store_cost.  */
> !   4,                                  /* cond_taken_branch_cost.  */
> !   2,                                  /* cond_not_taken_branch_cost.  */
>   };
>
>   /*  BDVER2 has optimized REP instruction for medium sized blocks, but for
> *************** const struct processor_costs bdver2_cost
> *** 1121,1128 ****
>     4,                                  /* vec_align_load_cost.  */
>     4,                                  /* vec_unalign_load_cost.  */
>     4,                                  /* vec_store_cost.  */
> !   2,                                  /* cond_taken_branch_cost.  */
> !   1,                                  /* cond_not_taken_branch_cost.  */
>   };
>
>
> --- 1121,1128 ----
>     4,                                  /* vec_align_load_cost.  */
>     4,                                  /* vec_unalign_load_cost.  */
>     4,                                  /* vec_store_cost.  */
> !   4,                                  /* cond_taken_branch_cost.  */
> !   2,                                  /* cond_not_taken_branch_cost.  */
>   };
>
>
> *************** struct processor_costs bdver3_cost = {
> *** 1208,1215 ****
>     4,                                  /* vec_align_load_cost.  */
>     4,                                  /* vec_unalign_load_cost.  */
>     4,                                  /* vec_store_cost.  */
> !   2,                                  /* cond_taken_branch_cost.  */
> !   1,                                  /* cond_not_taken_branch_cost.  */
>   };
>
>   /*  BDVER4 has optimized REP instruction for medium sized blocks, but for
> --- 1208,1215 ----
>     4,                                  /* vec_align_load_cost.  */
>     4,                                  /* vec_unalign_load_cost.  */
>     4,                                  /* vec_store_cost.  */
> !   4,                                  /* cond_taken_branch_cost.  */
> !   2,                                  /* cond_not_taken_branch_cost.  */
>   };
>
>   /*  BDVER4 has optimized REP instruction for medium sized blocks, but for
> *************** struct processor_costs bdver4_cost = {
> *** 1294,1301 ****
>     4,                                  /* vec_align_load_cost.  */
>     4,                                  /* vec_unalign_load_cost.  */
>     4,                                  /* vec_store_cost.  */
> !   2,                                  /* cond_taken_branch_cost.  */
> !   1,                                  /* cond_not_taken_branch_cost.  */
>   };
>
>     /* BTVER1 has optimized REP instruction for medium sized blocks, but for
> --- 1294,1301 ----
>     4,                                  /* vec_align_load_cost.  */
>     4,                                  /* vec_unalign_load_cost.  */
>     4,                                  /* vec_store_cost.  */
> !   4,                                  /* cond_taken_branch_cost.  */
> !   2,                                  /* cond_not_taken_branch_cost.  */
>   };
>
>     /* BTVER1 has optimized REP instruction for medium sized blocks, but for

^ permalink raw reply	[flat|nested] 5+ messages in thread

* RE: [PATCH] Fix bdverN vector cost of cond_[not_]taken_branch_cost
  2015-04-07 11:20 ` Uros Bizjak
@ 2015-04-08  6:39   ` Gopalasubramanian, Ganesh
  2015-04-08  7:38     ` Richard Biener
  0 siblings, 1 reply; 5+ messages in thread
From: Gopalasubramanian, Ganesh @ 2015-04-08  6:39 UTC (permalink / raw)
  To: Uros Bizjak, Richard Biener; +Cc: gcc-patches

> I have added a person from AMD to comment on the decision.
> Otherwise, the patch looks OK, but please wait a couple of days for possible comments.

Thank you Uros!
I am checking the changes with few tests and benchmarking them.
Please wait for a couple of days.

-Ganesh

^ permalink raw reply	[flat|nested] 5+ messages in thread

* RE: [PATCH] Fix bdverN vector cost of cond_[not_]taken_branch_cost
  2015-04-08  6:39   ` Gopalasubramanian, Ganesh
@ 2015-04-08  7:38     ` Richard Biener
  2015-04-13  5:49       ` Gopalasubramanian, Ganesh
  0 siblings, 1 reply; 5+ messages in thread
From: Richard Biener @ 2015-04-08  7:38 UTC (permalink / raw)
  To: Gopalasubramanian, Ganesh; +Cc: Uros Bizjak, gcc-patches

On Wed, 8 Apr 2015, Gopalasubramanian, Ganesh wrote:

> > I have added a person from AMD to comment on the decision.
> > Otherwise, the patch looks OK, but please wait a couple of days for possible comments.
> 
> Thank you Uros!
> I am checking the changes with few tests and benchmarking them.
> Please wait for a couple of days.

Note that before the fixes for PR64909 the epilogue/prologue loops
had very large costs associated due to a bug in the cost model
implementation.  After the fix their cost is reasonable but the
cost of the extra jumps is way under-accounted for due to the
numbers for cond_taken_branch_cost and cond_not_taken_branch_cost.
The proposes match mitigates that somewhat.

How did you arrive at the original cost model?

Thanks,
Richard.

-- 
Richard Biener <rguenther@suse.de>
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Jennifer Guild,
Dilip Upmanyu, Graham Norton HRB 21284 (AG Nuernberg)

^ permalink raw reply	[flat|nested] 5+ messages in thread

* RE: [PATCH] Fix bdverN vector cost of cond_[not_]taken_branch_cost
  2015-04-08  7:38     ` Richard Biener
@ 2015-04-13  5:49       ` Gopalasubramanian, Ganesh
  0 siblings, 0 replies; 5+ messages in thread
From: Gopalasubramanian, Ganesh @ 2015-04-13  5:49 UTC (permalink / raw)
  To: Richard Biener; +Cc: Uros Bizjak, gcc-patches

>Note that before the fixes for PR64909 the epilogue/prologue loops had very large costs associated due to a bug in the cost model implementation.  After the fix their cost is reasonable but the cost of the extra jumps is way under-accounted for due to the numbers for >cond_taken_branch_cost and cond_not_taken_branch_cost.
> The proposes match mitigates that somewhat.

Richard! The patch is good. We are done with our benchmarking and found no regressions.

> How did you arrive at the original cost model?

The original cost model as you suspect is not based on architecture alone. Those are the numbers arrived at by analyzing benchmarks and the cost model bugs then. These initial numbers were copied for subsequent architectures too.
Cost assignments saying "scalar_stmt_cost = 6" and "scalar load_cost = 4" doesn't make sense at all. We will have a look into it.

Regards
Ganesh

-----Original Message-----
From: Richard Biener [mailto:rguenther@suse.de] 
Sent: Wednesday, April 08, 2015 1:08 PM
To: Gopalasubramanian, Ganesh
Cc: Uros Bizjak; gcc-patches@gcc.gnu.org
Subject: RE: [PATCH] Fix bdverN vector cost of cond_[not_]taken_branch_cost

On Wed, 8 Apr 2015, Gopalasubramanian, Ganesh wrote:

> > I have added a person from AMD to comment on the decision.
> > Otherwise, the patch looks OK, but please wait a couple of days for possible comments.
> 
> Thank you Uros!
> I am checking the changes with few tests and benchmarking them.
> Please wait for a couple of days.

Note that before the fixes for PR64909 the epilogue/prologue loops had very large costs associated due to a bug in the cost model implementation.  After the fix their cost is reasonable but the cost of the extra jumps is way under-accounted for due to the numbers for cond_taken_branch_cost and cond_not_taken_branch_cost.
The proposes match mitigates that somewhat.

How did you arrive at the original cost model?

Thanks,
Richard.

--
Richard Biener <rguenther@suse.de>
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Jennifer Guild, Dilip Upmanyu, Graham Norton HRB 21284 (AG Nuernberg)

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2015-04-13  5:49 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-04-07  8:19 [PATCH] Fix bdverN vector cost of cond_[not_]taken_branch_cost Richard Biener
2015-04-07 11:20 ` Uros Bizjak
2015-04-08  6:39   ` Gopalasubramanian, Ganesh
2015-04-08  7:38     ` Richard Biener
2015-04-13  5:49       ` Gopalasubramanian, Ganesh

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).