* Re: [3/3] Fix PR78120, in ifcvt/rtlanal/i386. @ 2016-11-23 19:45 Uros Bizjak 0 siblings, 0 replies; 3+ messages in thread From: Uros Bizjak @ 2016-11-23 19:45 UTC (permalink / raw) To: gcc-patches; +Cc: Bernd Schmidt > 3. ifcvt computes the sum of costs for the involved blocks, but only > makes a before/after comparison when optimizing for size. When > optimizing for speed, it uses max_seq_cost, which is an estimate > computed from BRANCH_COST, which in turn can be zero for predictable > branches on x86. Can you please correct the addition below? It makes me cry thinking that buggy function will be immortalized in the gcc testsuite... > +static inline u128 add_u128 (u128 a, u128 b) > +{ a.hi += b.hi; > + a.lo += b.lo; > + if (a.lo < b.lo) > + a.hi++; > + > + return a; Uros. ^ permalink raw reply [flat|nested] 3+ messages in thread
* [0/3] Fix PR71280, in ifcvt/rtlanal/i386. @ 2016-11-23 18:58 Bernd Schmidt 2016-11-23 19:03 ` [3/3] Fix PR78120, " Bernd Schmidt 0 siblings, 1 reply; 3+ messages in thread From: Bernd Schmidt @ 2016-11-23 18:58 UTC (permalink / raw) To: GCC Patches This is a small series of patches to fix various problems in cost calculations that together caused PR71280, a missed optimization opportunity. A summary of the problems: 1. I noticed comparisons between set_src_cost and set_rtx_cost seemed to be invalid. There seems to be no good reason that insn_rtx_cost shouldn't use the latter. It also makes the numbers comparable to the ones you get from seq_cost. 2. The i386 backend mishandles SET rtxs. If you have a fairly plain single-insn SET, you tend to get COSTS_N_INSNS (2) out of set_rtx_cost, because rtx_costs has a default of COSTS_N_INSNS (1) for a SET, and you get the cost of the src in addition to that. 3. ifcvt computes the sum of costs for the involved blocks, but only makes a before/after comparison when optimizing for size. When optimizing for speed, it uses max_seq_cost, which is an estimate computed from BRANCH_COST, which in turn can be zero for predictable branches on x86. It seems a little risky to tweak costs this late in the process, but all of these should be improvements so it would put us on a better footing for fixing performance issues. I'll leave it to the reviewer to decide whether we want this now or after gcc-7. The series was bootstrapped and tested on x86_64-linux. There's the following new guality fail: -PASS: gcc.dg/guality/pr54693-2.c -Os line 21 x == 10 - i -PASS: gcc.dg/guality/pr54693-2.c -Os line 21 y == 20 - 2 * i +FAIL: gcc.dg/guality/pr54693-2.c -Os line 21 x == 10 - i +FAIL: gcc.dg/guality/pr54693-2.c -Os line 21 y == 20 - 2 * i which appears to be caused by loss of debuginfo in ivopts: - # DEBUG x => (int) ((unsigned int) x_9(D) - (unsigned int) i_14) - # DEBUG y => (int) ((unsigned int) y_10(D) - (unsigned int) i_14 * 2) + # DEBUG x => NULL + # DEBUG y => NULL I'd claim this is out of scope for this patch series. So, ok? Bernd ^ permalink raw reply [flat|nested] 3+ messages in thread
* [3/3] Fix PR78120, in ifcvt/rtlanal/i386. 2016-11-23 18:58 [0/3] Fix PR71280, " Bernd Schmidt @ 2016-11-23 19:03 ` Bernd Schmidt 2016-11-23 19:38 ` Jeff Law 0 siblings, 1 reply; 3+ messages in thread From: Bernd Schmidt @ 2016-11-23 19:03 UTC (permalink / raw) To: GCC Patches [-- Attachment #1: Type: text/plain, Size: 502 bytes --] On 11/23/2016 07:57 PM, Bernd Schmidt wrote: > 3. ifcvt computes the sum of costs for the involved blocks, but only > makes a before/after comparison when optimizing for size. When > optimizing for speed, it uses max_seq_cost, which is an estimate > computed from BRANCH_COST, which in turn can be zero for predictable > branches on x86. This is the final patch and has the testcase. It also happens to be the least risky of the series so it could be applied on its own (without the test). Bernd [-- Attachment #2: 71280-3.diff --] [-- Type: text/x-patch, Size: 3388 bytes --] PR rtl-optimization/78120 * ifcvt.c (noce_conversion_profitable_p): Check original cost in all cases, and additionally test against max_seq_cost for speed optimization. (noce_process_if_block): Compute an estimate for the original cost when optimizing for speed, using the minimum of then and else block costs. PR rtl-optimization/78120 * gcc.target/i386/pr78120.c: New test. Index: gcc/ifcvt.c =================================================================== --- gcc/ifcvt.c (revision 242038) +++ gcc/ifcvt.c (working copy) @@ -812,8 +812,10 @@ struct noce_if_info we're optimizing for size. */ bool speed_p; - /* The combined cost of COND, JUMP and the costs for THEN_BB and - ELSE_BB. */ + /* An estimate of the original costs. When optimizing for size, this is the + combined cost of COND, JUMP and the costs for THEN_BB and ELSE_BB. + When optimizing for speed, we use the costs of COND plus the minimum of + the costs for THEN_BB and ELSE_BB, as computed in the next field. */ unsigned int original_cost; /* Maximum permissible cost for the unconditional sequence we should @@ -852,12 +857,12 @@ noce_conversion_profitable_p (rtx_insn * /* Cost up the new sequence. */ unsigned int cost = seq_cost (seq, speed_p); + if (cost <= if_info->original_cost) + return true; + /* When compiling for size, we can make a reasonably accurately guess - at the size growth. */ - if (!speed_p) - return cost <= if_info->original_cost; - else - return cost <= if_info->max_seq_cost; + at the size growth. When compiling for speed, use the maximum. */ + return speed_p && cost <= if_info->max_seq_cost; } /* Helper function for noce_try_store_flag*. */ @@ -3441,15 +3446,24 @@ noce_process_if_block (struct noce_if_in } } - if (! bb_valid_for_noce_process_p (then_bb, cond, &if_info->original_cost, + bool speed_p = optimize_bb_for_speed_p (test_bb); + unsigned int then_cost = 0, else_cost = 0; + if (!bb_valid_for_noce_process_p (then_bb, cond, &then_cost, &if_info->then_simple)) return false; if (else_bb - && ! bb_valid_for_noce_process_p (else_bb, cond, &if_info->original_cost, - &if_info->else_simple)) + && !bb_valid_for_noce_process_p (else_bb, cond, &else_cost, + &if_info->else_simple)) return false; + if (else_bb == NULL) + if_info->original_cost += then_cost; + else if (speed_p) + if_info->original_cost += MIN (then_cost, else_cost); + else + if_info->original_cost += then_cost + else_cost; + insn_a = last_active_insn (then_bb, FALSE); set_a = single_set (insn_a); gcc_assert (set_a); Index: gcc/testsuite/gcc.target/i386/pr78120.c =================================================================== --- gcc/testsuite/gcc.target/i386/pr78120.c (nonexistent) +++ gcc/testsuite/gcc.target/i386/pr78120.c (working copy) @@ -0,0 +1,27 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -mtune=generic" } */ +/* { dg-final { scan-assembler "adc" } } */ +/* { dg-final { scan-assembler-not "jmp" } } */ + +typedef unsigned long u64; + +typedef struct { + u64 hi, lo; +} u128; + +static inline u128 add_u128 (u128 a, u128 b) +{ + a.lo += b.lo; + if (a.lo < b.lo) + a.hi++; + + return a; +} + +extern u128 t1, t2, t3; + +void foo (void) +{ + t1 = add_u128 (t1, t2); + t1 = add_u128 (t1, t3); +} ^ permalink raw reply [flat|nested] 3+ messages in thread
* Re: [3/3] Fix PR78120, in ifcvt/rtlanal/i386. 2016-11-23 19:03 ` [3/3] Fix PR78120, " Bernd Schmidt @ 2016-11-23 19:38 ` Jeff Law 0 siblings, 0 replies; 3+ messages in thread From: Jeff Law @ 2016-11-23 19:38 UTC (permalink / raw) To: Bernd Schmidt, GCC Patches On 11/23/2016 12:02 PM, Bernd Schmidt wrote: > On 11/23/2016 07:57 PM, Bernd Schmidt wrote: >> 3. ifcvt computes the sum of costs for the involved blocks, but only >> makes a before/after comparison when optimizing for size. When >> optimizing for speed, it uses max_seq_cost, which is an estimate >> computed from BRANCH_COST, which in turn can be zero for predictable >> branches on x86. > > This is the final patch and has the testcase. It also happens to be the > least risky of the series so it could be applied on its own (without the > test). > > > Bernd > > > 71280-3.diff > > > PR rtl-optimization/78120 > * ifcvt.c (noce_conversion_profitable_p): Check original cost in all > cases, and additionally test against max_seq_cost for speed > optimization. > (noce_process_if_block): Compute an estimate for the original cost when > optimizing for speed, using the minimum of then and else block costs. > > PR rtl-optimization/78120 > * gcc.target/i386/pr78120.c: New test. Also OK. Obviously Uros has the call on the x86 target change. Stage the series in as you see fit given the dependencies. jeff ^ permalink raw reply [flat|nested] 3+ messages in thread
end of thread, other threads:[~2016-11-23 19:45 UTC | newest] Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2016-11-23 19:45 [3/3] Fix PR78120, in ifcvt/rtlanal/i386 Uros Bizjak -- strict thread matches above, loose matches on Subject: below -- 2016-11-23 18:58 [0/3] Fix PR71280, " Bernd Schmidt 2016-11-23 19:03 ` [3/3] Fix PR78120, " Bernd Schmidt 2016-11-23 19:38 ` Jeff Law
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).