Hi,

RTL "noce" ifcvt will currently give up if the branches it is trying to
make conditional are too complicated. One of the conditions for "too
complicated" is that the branch sets more than one value.

One common idiom that this misses is something like:

  int d = a[i];
  int e = b[i];
  if (d > e)
    std::swap (d, e);
  [...]

which currently generates something like:

  compare (d, e)
  branch.le L1
  tmp = d;
  d = e;
  e = tmp;
  L1:

In the case that this is an unpredictable branch, we can do better with:

  compare (d, e)
  d1 = if_then_else (le, d, e)
  e1 = if_then_else (le, e, d)
  d = d1
  e = e1

Register allocation will eliminate the two trailing unconditional
assignments, and we get a neater sequence.

This patch introduces this logic to the RTL if-conversion passes,
catching cases where a basic block does nothing other than multiple
SETs. This helps both with the std::swap idiom above, and with
pathological cases where tree passes create new basic blocks to resolve
PHI nodes, which contain only set instructions and end up unpredictable.

One big question I have with this patch is how I ought to write a
meaningful cost model. The one I've used seems like yet another misuse
of RTX costs, and another bit of stuff for targets to carefully balance:
if the relative costs of branches and conditional move instructions are
not carefully managed, you may unexpectedly enable or disable these
optimisations. This is probably acceptable, but I dislike adding more
and more gotchas to target costs, as I get bitten by them hard enough
as it is!

Elsewhere, the ifcvt cost usage is pretty lacking - essentially counting
the number of instructions which will be if-converted and comparing that
against the magic number "2". I could follow this lead and just count
the number of moves I would convert, then compare that to the branch
cost, but this feels... wrong. It also makes it pretty tough to choose a
"good" number for TARGET_BRANCH_COST.
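To make the intended transformation concrete, here is a minimal C++
sketch (not part of the patch; the function names are illustrative) of
the two forms. The branchless version computes both outputs from the
same comparison, mirroring the pair of if_then_else RTXes, with the
temporaries standing in for the d1/e1 registers that register
allocation would later coalesce away:

  #include <utility>

  // Branchy form: what the source writes.
  static void sort_pair_branchy (int &d, int &e)
  {
    if (d > e)
      std::swap (d, e);
  }

  // Branchless form: what the if-converted sequence computes.
  static void sort_pair_branchless (int &d, int &e)
  {
    bool le = !(d > e);   // compare (d, e)
    int d1 = le ? d : e;  // conditional select on the same comparison
    int e1 = le ? e : d;
    d = d1;               // trailing copies, eliminated by the
    e = e1;               // register allocator in the RTL version
  }

Both leave (d, e) sorted in ascending order; only the second is free of
a conditional branch.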
This isn't helped now that higher branch costs can mean pulling
expensive instructions in to the main execution stream.

I've picked a fairly straightforward cost model for this patch:
compare the cost of each conditional move, as calculated with
rtx_costs, against COSTS_N_INSNS (branch_cost). This essentially kills
the optimisation for any target with a conditional-move cost greater
than 1. Personally, I consider that a pretty horrible bug in this
patch - but I couldn't think of anything better to try.

As you might expect, this triggers all over the place when
TARGET_BRANCH_COST numbers are tuned high. In an AArch64 Spec2006
build, I saw 3.9% more CSEL operations with this patch and
TARGET_BRANCH_COST set to 4. Performance is also good on AArch64
across a range of microbenchmarks and larger workloads (after playing
with the branch costs). I didn't see any performance regression on
x86_64, as you would expect given that the cost models preclude x86_64
targets from ever hitting this optimisation.

Bootstrapped and tested on x86_64 and AArch64 with no issues, and
bootstrapped and tested with the cost model turned off, to gain some
confidence that we will continue to do the right thing should any
targets raise their branch costs and start using this code.

No testcase provided, as I currently don't know of targets with a high
enough branch cost to actually trigger the optimisation.

OK?

Thanks,
James

---
gcc/

2015-09-07  James Greenhalgh

	* ifcvt.c (bb_ok_for_noce_convert_multiple_sets): New.
	(noce_convert_multiple_sets): Likewise.
	(noce_process_if_block): Call them.