Date: Tue, 07 Apr 2015 08:19:00 -0000
From: Richard Biener
To: gcc-patches@gcc.gnu.org
Cc: ubizjak@gmail.com
Subject: [PATCH] Fix bdverN vector cost of cond_[not_]taken_branch_cost

They are suspiciously low (compared to, say, scalar_stmt_cost). With them, and with the fix for the vectorizer cost model to properly account for scalar stmt costs (and thus correctly deal with odd costs such as those bdverN have), we regress 252.eon because we consider a loop that is vectorized and peeled for alignment profitable, which it clearly isn't.

Bootstrap and regtest running on x86_64-unknown-linux-gnu. I've tested with all b[dt]verN -march values and the slp-pr56812.cc testcase (yes, we've run into a similar issue earlier). I've also put the patch on our SPEC tester to look for fallout.

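To make the effect of the branch costs concrete, here is a minimal sketch of a break-even computation. It is a toy model, not the vectorizer's actual cost formula: struct toy_costs, toy_min_profitable_iters, the assumption that the peeled loop pays two taken and two not-taken guard branches, and every cost value other than the 2/1 and 4/2 branch costs from this patch are made up for illustration.

```c
#include <assert.h>

/* Hypothetical, heavily simplified cost record.  The field names echo
   the comments in processor_costs, but this is not GCC's struct.  */
struct toy_costs {
  int scalar_stmt_cost;
  int vec_stmt_cost;
  int cond_taken_branch_cost;
  int cond_not_taken_branch_cost;
};

/* Return the smallest iteration count (searched up to 4096) for which
   the toy model rates the vectorized, peeled loop cheaper than the
   scalar loop, or -1 if there is none.  STMTS is the number of
   statements per iteration, VF the vectorization factor and
   PEEL_ITERS the number of scalar iterations peeled for alignment.  */
int
toy_min_profitable_iters (const struct toy_costs *c, int stmts, int vf,
                          int peel_iters)
{
  assert (stmts > 0 && vf > 0 && peel_iters >= 0);

  int scalar_iter = stmts * c->scalar_stmt_cost;
  int vector_iter = stmts * c->vec_stmt_cost;

  /* Assumption: one taken and one not-taken guard branch each for the
     alignment prologue and the scalar epilogue.  */
  int branch_overhead = 2 * c->cond_taken_branch_cost
                        + 2 * c->cond_not_taken_branch_cost;

  for (int n = peel_iters; n <= 4096; n++)
    {
      int vec_iters = (n - peel_iters) / vf;
      int epilogue_iters = (n - peel_iters) % vf;
      int scalar_total = n * scalar_iter;
      int vector_total = (peel_iters + epilogue_iters) * scalar_iter
                         + vec_iters * vector_iter + branch_overhead;
      if (vector_total < scalar_total)
        return n;
    }
  return -1;
}
```

With the made-up values scalar_stmt_cost = 2, vec_stmt_cost = 6, two statements per iteration, a vectorization factor of 4 and three peeled iterations, the toy model breaks even at 11 iterations with branch costs 2/1 and at 19 iterations with branch costs 4/2. In this model, low branch costs let short loops look profitable even though the peeling overhead is barely amortized.
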
It really looks like the costs were derived by some automatic search of the parameter space, thus "optimizing" for bugs in the vectorizer cost model that have meanwhile been fixed (scalar stmt cost == 6 but scalar load/store cost == 4!?). It is not a good idea to put in parameters that you can't make sense of from an architectural point of view (yes, taken/not-taken branch are somewhat bogus kinds; I'd like to change that to correctly predicted / wrongly predicted for GCC 6).

Ok for trunk and 4.9 branch?

Thanks,
Richard.

2015-04-07  Richard Biener

        PR target/65660
        * config/i386/i386.c (bdver1_cost): Double cond_taken_branch_cost
        and cond_not_taken_branch_cost to 4 and 2.
        (bdver2_cost): Likewise.
        (bdver3_cost): Likewise.
        (bdver4_cost): Likewise.

Index: gcc/config/i386/i386.c
===================================================================
*** gcc/config/i386/i386.c (revision 221888)
--- gcc/config/i386/i386.c (working copy)
*************** const struct processor_costs bdver1_cost
*** 1025,1032 ****
    4, /* vec_align_load_cost.  */
    4, /* vec_unalign_load_cost.  */
    4, /* vec_store_cost.  */
!   2, /* cond_taken_branch_cost.  */
!   1, /* cond_not_taken_branch_cost.  */
  };

  /* BDVER2 has optimized REP instruction for medium sized blocks, but for
--- 1025,1032 ----
    4, /* vec_align_load_cost.  */
    4, /* vec_unalign_load_cost.  */
    4, /* vec_store_cost.  */
!   4, /* cond_taken_branch_cost.  */
!   2, /* cond_not_taken_branch_cost.  */
  };

  /* BDVER2 has optimized REP instruction for medium sized blocks, but for
*************** const struct processor_costs bdver2_cost
*** 1121,1128 ****
    4, /* vec_align_load_cost.  */
    4, /* vec_unalign_load_cost.  */
    4, /* vec_store_cost.  */
!   2, /* cond_taken_branch_cost.  */
!   1, /* cond_not_taken_branch_cost.  */
  };
--- 1121,1128 ----
    4, /* vec_align_load_cost.  */
    4, /* vec_unalign_load_cost.  */
    4, /* vec_store_cost.  */
!   4, /* cond_taken_branch_cost.  */
!   2, /* cond_not_taken_branch_cost.  */
  };
*************** struct processor_costs bdver3_cost = {
*** 1208,1215 ****
    4, /* vec_align_load_cost.  */
    4, /* vec_unalign_load_cost.  */
    4, /* vec_store_cost.  */
!   2, /* cond_taken_branch_cost.  */
!   1, /* cond_not_taken_branch_cost.  */
  };

  /* BDVER4 has optimized REP instruction for medium sized blocks, but for
--- 1208,1215 ----
    4, /* vec_align_load_cost.  */
    4, /* vec_unalign_load_cost.  */
    4, /* vec_store_cost.  */
!   4, /* cond_taken_branch_cost.  */
!   2, /* cond_not_taken_branch_cost.  */
  };

  /* BDVER4 has optimized REP instruction for medium sized blocks, but for
*************** struct processor_costs bdver4_cost = {
*** 1294,1301 ****
    4, /* vec_align_load_cost.  */
    4, /* vec_unalign_load_cost.  */
    4, /* vec_store_cost.  */
!   2, /* cond_taken_branch_cost.  */
!   1, /* cond_not_taken_branch_cost.  */
  };

  /* BTVER1 has optimized REP instruction for medium sized blocks, but for
--- 1294,1301 ----
    4, /* vec_align_load_cost.  */
    4, /* vec_unalign_load_cost.  */
    4, /* vec_store_cost.  */
!   4, /* cond_taken_branch_cost.  */
!   2, /* cond_not_taken_branch_cost.  */
  };

  /* BTVER1 has optimized REP instruction for medium sized blocks, but for