On Mon, Jun 19, 2017 at 2:00 PM, Andrew Pinski wrote: > On Wed, Jun 7, 2017 at 10:16 AM, James Greenhalgh > wrote: >> On Fri, Dec 30, 2016 at 10:05:26PM -0800, Andrew Pinski wrote: >>> Hi, >>> Currently for the following function: >>> int f(int a, int b) >>> { >>> return a + (b <<7); >>> } >>> >>> GCC produces: >>> add w0, w0, w1, lsl 7 >>> But for ThunderX 1, it is better if the instruction was split allowing >>> better scheduling to happen in most cases, the latency is the same. I >>> get a small improvement in coremarks, ~1%. >>> >>> Currently the code does not take into account Arith_shift even though >>> the comment: >>> /* Strip any extend, leave shifts behind as we will >>> cost them through mult_cost. */ >>> Say it does not strip out the shift, aarch64_strip_extend does and has >>> always has since the back-end was added to GCC. >>> >>> Once I fixed the code around aarch64_strip_extend, I got a regression >>> for ThunderX 1 as some shifts/extends (left shifts <=4 and/or zero >>> extends) are considered free so I needed to add a new tuning flag. >>> >>> Note I will get an even more improvement for ThunderX 2 CN99XX, but I >>> have not measured it yet as I have not made the change to >>> aarch64-cost-tables.h yet as I am waiting for approval of the renaming >>> patch first before submitting any of the cost table changes. Also I >>> noticed this problem with this tuning first and then looked back at >>> what I needed to do for ThunderX 1. >>> >>> OK? Bootstrapped and tested on aarch64-linux-gnu without any >>> regressions (both with and without --with-cpu=thunderx). >> >> This is mostly OK, but I don't like the name "easy"_shift_extend. Cheap >> or free seems better. I have some other minor points below. > > > Ok, that seems like a good idea. I used easy since that was the > wording our hardware folks had came up with. I am changing the > comments to make clearer when this flag should be used. > I should a new patch out by the end of today. Due to the LSE ICE which I reported in the other thread, it took me longer to send out a new patch. Anyways here is the updated patch with the changes requested. OK? Bootstrapped and tested on aarch64-linux-gnu with no regressions. Thanks, Andrew Pinski * config/aarch64/aarch64-cost-tables.h (thunderx_extra_costs): Increment Arith_shift and Arith_shift_reg by 1. * config/aarch64/aarch64-tuning-flags.def (cheap_shift_extend): New tuning flag. * config/aarch64/aarch64.c (thunderx_tunings): Enable AARCH64_EXTRA_TUNE_CHEAP_SHIFT_EXTEND. (aarch64_strip_extend): Add new argument and test for it. (aarch64_cheap_mult_shift_p): New function. (aarch64_rtx_mult_cost): Call aarch64_cheap_mult_shift_p and don't add a cost if it is true. Update calls to aarch64_strip_extend. (aarch64_rtx_costs): Update calls to aarch64_strip_extend. > > Thanks, > Andrew > > >> >>> Index: config/aarch64/aarch64-tuning-flags.def >>> =================================================================== >>> --- config/aarch64/aarch64-tuning-flags.def (revision 243974) >>> +++ config/aarch64/aarch64-tuning-flags.def (working copy) >>> @@ -35,4 +35,8 @@ two load/stores are not at least 8 byte >>> pairs. */ >>> AARCH64_EXTRA_TUNING_OPTION ("slow_unaligned_ldpw", SLOW_UNALIGNED_LDPW) >>> >>> +/* Logical shift left <=4 with/without zero extend are considered easy >>> + extended, also zero extends without the shift. */ >> >> >> I'm struggling to parse this comment. "also zero extends without the shift" >> is what is getting me. I'm also not certain I follow when I should set this >> flag. If all shifts are cheap/free on my platform, should I set this flag? >> >>> +AARCH64_EXTRA_TUNING_OPTION ("easy_shift_extend", EASY_SHIFT_EXTEND) >>> + >>> #undef AARCH64_EXTRA_TUNING_OPTION >> >> >>> + >>> +/* Return true iff X is an easy shift without a sign extend. */ >>> + >> >> Again I don't like calling <= 4 "easy", it feels imprecise. >> >> Thanks, >> James >>