Hi Jeff, On 29/07/15 23:38, Jeff Law wrote: > On 07/29/2015 07:49 AM, Kyrill Tkachov wrote: >> Hi all, >> >> This patch improves RTL if-conversion on sequences that perform a >> conditional select on integer constants. >> Most of the smart logic to do that already exists in the >> noce_try_store_flag_constants function. >> However, currently that function is tried after noce_try_cmove. >> noce_try_cmove is a simple catch-all function that just loads the two >> immediates and performs a conditional >> select between them. It returns true and then the caller >> noce_process_if_block doesn't try any other transformations, >> completely skipping the more aggressive transformations that >> noce_try_store_flag_constants allows! >> >> Calling noce_try_store_flag_constants before noce_try_cmove allows for >> the smarter if-conversion transformations >> to be used. An example that occurs a lot in the gcc code itself is for >> the C code: >> int >> foo (int a, int b) >> { >> return ((a & (1 << 25)) ? 5 : 4); >> } >> >> i.e. test a bit in a and return 5 or 4. Currently on aarch64 this >> generates the naive: >> and w2, w0, 33554432 // mask away all bits except bit 25 >> mov w1, 4 >> cmp w2, wzr >> mov w0, 5 >> csel w0, w0, w1, ne >> >> >> whereas with this patch this can be transformed into the much better: >> ubfx x0, x0, 25, 1 // extract bit 25 >> add w0, w0, 4 > I suspect the PA would benefit from this as well, probably several > other architectures with reasonable bitfield extraction capabilities. This helps on arm as well but isn't enabled there at the moment because this function is gated on !have_conditional_execution (). We can look at enabling this for conditional execution as well as a separate piece of work. > >> Another issue I encountered is that the code that claims to perform the >> transformation: >> /* if (test) x = 3; else x = 4; >> => x = 3 + (test == 0); */ >> >> doesn't seem to do exactly that in all cases. In fact for that case it >> will try something like: >> x = 4 - (test == 0) >> which is suboptimal for targets like aarch64 which have a conditional >> increment operation. > I vaguely recall targets that don't have const - X insns, but do have X > + const style insns. And more generally I think we're better off > generating the PLUS version. Now that I think of it, aarch64 does have the CSETM instruction that conditionally sets a register to 0 or -1, but I didn't see it being utilised as frequently as I'd like to. Anyway, I do prefer generating the PLUS version when possible too, which is what this patch tries to do. > > >> This patch tweaks that code to always try to generate an addition of the >> condition rather than >> a subtraction. >> >> Anyway, for code: >> int >> fooinc (int x) >> { >> return x ? 1025 : 1026; >> } >> >> we currently generate: >> mov w2, 1025 >> mov w1, 1026 >> cmp w0, wzr >> csel w0, w2, w1, ne >> >> whereas with this patch we will generate: >> cmp w0, wzr >> cset w0, eq >> add w0, w0, 1025 >> >> Bootstrapped and tested on arm, aarch64, x86_64. >> Ok for trunk? >> >> Thanks, >> Kyrill >> >> P.S. noce_try_store_flag_constants is currently gated on >> !targetm.have_conditional_execution () but I don't see >> any reason to restrict it on targets with conditional execution. For >> example, I think the first example above >> would show a benefit on arm if it was enabled there. But that can be a >> separate investigation. >> >> 2015-07-29 Kyrylo Tkachov >> >> * ifcvt.c (noce_try_store_flag_constants): Reverse when diff is >> -STORE_FLAG and condition is reversable. Prefer to add to the >> flag value. >> (noce_process_if_block): Try noce_try_store_flag_constants before >> noce_try_cmove. >> >> 2015-07-29 Kyrylo Tkachov >> >> * gcc.target/aarch64/csel_bfx_1.c: New test. >> * gcc.target/aarch64/csel_imms_inc_1.c: Likewise. >> >> ifcvt-csel-imms.patch >> >> >> commit 0164ef164483bdf0b2f73e267e2ff1df7800dd6d >> Author: Kyrylo Tkachov >> Date: Tue Jul 28 14:59:46 2015 +0100 >> >> [RTL-ifcvt] Improve conditional increment ops on immediates >> >> diff --git a/gcc/ifcvt.c b/gcc/ifcvt.c >> index a57d78c..80d0285 100644 >> --- a/gcc/ifcvt.c >> +++ b/gcc/ifcvt.c >> @@ -1222,7 +1222,7 @@ noce_try_store_flag_constants (struct noce_if_info *if_info) >> >> reversep = 0; >> if (diff == STORE_FLAG_VALUE || diff == -STORE_FLAG_VALUE) >> - normalize = 0; >> + normalize = 0, reversep = (diff == -STORE_FLAG_VALUE) && can_reverse; > We generally avoid using a ',' operator like this. However, I can see > that you're just following existing convention in that code. So I won't > object. Ok, I've respun the patch to convert this part to use a more conventional style and also took the liberty of converting reversep and can_reverse into bool variables. > > > >> else if (ifalse == 0 && exact_log2 (itrue) >= 0 >> && (STORE_FLAG_VALUE == 1 >> || if_info->branch_cost >= 2)) >> @@ -1261,10 +1261,13 @@ noce_try_store_flag_constants (struct noce_if_info *if_info) >> => x = 3 + (test == 0); */ >> if (diff == STORE_FLAG_VALUE || diff == -STORE_FLAG_VALUE) >> { >> - target = expand_simple_binop (mode, >> - (diff == STORE_FLAG_VALUE >> - ? PLUS : MINUS), >> - gen_int_mode (ifalse, mode), target, >> + rtx_code code = reversep ? PLUS : >> + (diff == STORE_FLAG_VALUE ? PLUS >> + : MINUS); >> + HOST_WIDE_INT to_add = reversep ? MIN (ifalse, itrue) : ifalse; >> + >> + target = expand_simple_binop (mode, code, >> + gen_int_mode (to_add, mode), target, >> if_info->x, 0, OPTAB_WIDEN); >> } > Note that STORE_FLAG_VALUE can potentially be any value. Most targets > use "1", but others use -1 (m68k for example, there are others). I'm > concerned that the reversep ? MIN (ifalse, itrue) won't do what we want > on targets such as the m68k. > > The logic here is also somewhat convoluted. When reversep is true, we > always use PLUS, but is that actually correct? Normally we'd compute > the code, then invert the code. ie, if we normally want PLUS, then > we've flip to MINUS and vice-versa. I agree that it's hard to follow. In this respin I have explicitly laid out the different combinations of diff and STORE_FLAG_VALUE. It is more verbose now, but hopefully it's easier to follow. > > >> @@ -3120,13 +3123,14 @@ noce_process_if_block (struct noce_if_info *if_info) >> goto success; >> if (noce_try_abs (if_info)) >> goto success; >> + if (!targetm.have_conditional_execution () >> + && noce_try_store_flag_constants (if_info)) >> + goto success; >> if (HAVE_conditional_move >> && noce_try_cmove (if_info)) >> goto success; >> if (! targetm.have_conditional_execution ()) >> { >> - if (noce_try_store_flag_constants (if_info)) >> - goto success; >> if (noce_try_addcc (if_info)) >> goto success; >> if (noce_try_store_flag_mask (if_info)) > This part seems fine and ought to be able to go forward independently. > Consider this hunk and its associated testcase approved if you want to > test it independently and get it installed while we iterate on the other > hunks, This respin includes that hunk, however if this patch goes through any more respins then I'll separate it out into a separate patch. This version has been bootstrapped and tested on x86_64 and aarch64. Thanks for the feedback! Kyrill 2015-07-30 Kyrylo Tkachov * ifcvt.c (noce_try_store_flag_constants): Make logic of the case when diff == STORE_FLAG_VALUE or diff == -STORE_FLAG_VALUE more explicit. Prefer to add the flag whenever possible. (noce_process_if_block): Try noce_try_store_flag_constants before noce_try_cmove. 2015-07-30 Kyrylo Tkachov * gcc.target/aarch64/csel_bfx_1.c: New test. * gcc.target/aarch64/csel_imms_inc_1.c: Likewise. > > jeff > >