Date: Tue, 07 Apr 2015 11:20:00 -0000
Subject: Re: [PATCH] Fix bdverN vector cost of cond_[not_]taken_branch_cost
From: Uros Bizjak
To: Richard Biener
Cc: "gcc-patches@gcc.gnu.org", "Gopalasubramanian, Ganesh"

On Tue, Apr 7, 2015 at 10:19 AM, Richard Biener wrote:
>
> They are suspiciously low (compared to, say, scalar_stmt_cost). With
> them, plus the fix for the vectorizer cost model to properly account
> scalar stmt costs (and thus correctly deal with odd costs such as the
> ones bdverN has), we regress 252.eon because we consider a loop that is
> vectorized and peeled for alignment profitable when it clearly isn't.
>
> Bootstrap and regtest running on x86_64-unknown-linux-gnu.
> I've tested with all b[dt]verN -marchs and the slp-pr56812.cc testcase
> (yes, we've run into a similar issue earlier). I've also put the
> patch on our SPEC tester to look for fallout.
>
> It really looks like the costs were derived by some automatic
> searching of the parameter space and thus "optimizing" for bugs
> in the vectorizer cost model that have meanwhile been fixed
> (scalar stmt cost == 6 but scalar load/store cost == 4!?). It is
> not a good idea to put in parameters that you can't make sense of
> from an architectural point of view (yes, taken/not-taken branch
> is somewhat bogus; I'd like to change that to correctly
> predicted / wrongly predicted for GCC 6).
>
> Ok for trunk and 4.9 branch?

I have added a person from AMD to comment on the decision.

Otherwise, the patch looks OK, but please wait a couple of days for
possible comments.

Thanks,
Uros.

> Thanks,
> Richard.
>
> 2015-04-07  Richard Biener
>
> 	PR target/65660
> 	* config/i386/i386.c (bdver1_cost): Double cond_taken_branch_cost
> 	and cond_not_taken_branch_cost to 4 and 2.
> 	(bdver2_cost): Likewise.
> 	(bdver3_cost): Likewise.
> 	(bdver4_cost): Likewise.
>
> Index: gcc/config/i386/i386.c
> ===================================================================
> *** gcc/config/i386/i386.c	(revision 221888)
> --- gcc/config/i386/i386.c	(working copy)
> *************** const struct processor_costs bdver1_cost
> *** 1025,1032 ****
>     4,			/* vec_align_load_cost. */
>     4,			/* vec_unalign_load_cost. */
>     4,			/* vec_store_cost. */
> !   2,			/* cond_taken_branch_cost. */
> !   1,			/* cond_not_taken_branch_cost. */
>   };
>
>   /* BDVER2 has optimized REP instruction for medium sized blocks, but for
> --- 1025,1032 ----
>     4,			/* vec_align_load_cost. */
>     4,			/* vec_unalign_load_cost. */
>     4,			/* vec_store_cost. */
> !   4,			/* cond_taken_branch_cost. */
> !   2,			/* cond_not_taken_branch_cost. */
>   };
>
>   /* BDVER2 has optimized REP instruction for medium sized blocks, but for
> *************** const struct processor_costs bdver2_cost
> *** 1121,1128 ****
>     4,			/* vec_align_load_cost. */
>     4,			/* vec_unalign_load_cost. */
>     4,			/* vec_store_cost. */
> !   2,			/* cond_taken_branch_cost. */
> !   1,			/* cond_not_taken_branch_cost. */
>   };
>
>
> --- 1121,1128 ----
>     4,			/* vec_align_load_cost. */
>     4,			/* vec_unalign_load_cost. */
>     4,			/* vec_store_cost. */
> !   4,			/* cond_taken_branch_cost. */
> !   2,			/* cond_not_taken_branch_cost. */
>   };
>
>
> *************** struct processor_costs bdver3_cost = {
> *** 1208,1215 ****
>     4,			/* vec_align_load_cost. */
>     4,			/* vec_unalign_load_cost. */
>     4,			/* vec_store_cost. */
> !   2,			/* cond_taken_branch_cost. */
> !   1,			/* cond_not_taken_branch_cost. */
>   };
>
>   /* BDVER4 has optimized REP instruction for medium sized blocks, but for
> --- 1208,1215 ----
>     4,			/* vec_align_load_cost. */
>     4,			/* vec_unalign_load_cost. */
>     4,			/* vec_store_cost. */
> !   4,			/* cond_taken_branch_cost. */
> !   2,			/* cond_not_taken_branch_cost. */
>   };
>
>   /* BDVER4 has optimized REP instruction for medium sized blocks, but for
> *************** struct processor_costs bdver4_cost = {
> *** 1294,1301 ****
>     4,			/* vec_align_load_cost. */
>     4,			/* vec_unalign_load_cost. */
>     4,			/* vec_store_cost. */
> !   2,			/* cond_taken_branch_cost. */
> !   1,			/* cond_not_taken_branch_cost. */
>   };
>
>   /* BTVER1 has optimized REP instruction for medium sized blocks, but for
> --- 1294,1301 ----
>     4,			/* vec_align_load_cost. */
>     4,			/* vec_unalign_load_cost. */
>     4,			/* vec_store_cost. */
> !   4,			/* cond_taken_branch_cost. */
> !   2,			/* cond_not_taken_branch_cost. */
>   };
>
>   /* BTVER1 has optimized REP instruction for medium sized blocks, but for