On Fri, Jun 03, 2016 at 12:39:42PM +0200, Richard Biener wrote:
> On Thu, Jun 2, 2016 at 6:53 PM, James Greenhalgh
> <james.greenhalgh@arm.com> wrote:
> >
> > Hi,
> >
> > This patch introduces a new target hook, to be used like BRANCH_COST but
> > with a guaranteed unit of measurement. We want this to break away from
> > the current ambiguous uses of BRANCH_COST.
> >
> > BRANCH_COST is used in ifcvt.c in two types of comparisons. One against
> > instruction counts - where it is used as the limit on the number of new
> > instructions we are permitted to generate. The other (after multiplying
> > by COSTS_N_INSNS (1)) directly against RTX costs.
> >
> > Of these, a comparison against RTX costs is the more easily understood
> > metric across the compiler, and the one I've pulled out to the new hook.
> > To keep things consistent for targets which don't migrate, this new hook
> > has a default value of BRANCH_COST * COSTS_N_INSNS (1).
> >
> > OK?
>
> How does the caller compute "predictable"?  There are some archs where
> an information on whether this is a forward or backward jump is more
> useful I guess.  Also at least for !speed_p the distance of the branch is
> important given not all targets support arbitrary branch offsets.

Just through a call to predictable_edge_p. It isn't perfect. My worry
with adding more details of the branch is that you end up with a nonsense
target implementation that tries way too hard to be clever. But, I don't
mind passing the edge through to the target hook, that way a target has
it if they want it. In this patch revision, I pass the edge through.

> I remember that at the last Cauldron we discussed to change things to
> compare costs of sequences of instructions rather than giving targets no
> context with just asking for single (sub-)insn rtx costs.

I've made better use of seq_cost in this respin. Bernd was right,
constructing dummy RTX just for costs, then discarding it, then
constructing the actual RTX for matching doesn't make sense as a pipeline.
Better just to construct the real sequence and use the cost of that.

In this patch revision, I started by removing the idea that this costs
a branch at all. It doesn't, the use of this hook is really a target
trying to limit if-convert to not end up pulling too much on to the
unconditional path. It seems better to expose that limit directly by
explicitly asking for the maximum cost of an unconditional sequence we
would create, and comparing against seq_cost of the new RTL. This saves
a target trying to figure out what is meant by a cost of a branch.

Having done that, I think I can see a clearer path to getting the
default hook implementation in shape. I've introduced two new params,
which give maximum costs for the generated sequence (one for a "predictable"
branch, one for "unpredictable") in the speed_p cases. I'm not expecting it
to be useful to give the user control in the case we are compiling for
size - whether this is a size win or not is independent of whether the
branch is predictable.

For the default implementation, if the parameters are not set, I just
multiply BRANCH_COST through by COSTS_N_INSNS (1) for size and
COSTS_N_INSNS (3) for speed. I know this is not ideal, but I'm still short
of ideas on how best to form the default implementation. This means we're
still potentially going to introduce performance regressions for targets
that don't provide an implementation of the new hook, or a default value
for the new parameters. It does mean we can keep the testsuite clean by
setting parameter values suitably high for all targets that have
conditional move instructions.

The new default causes some changes in generated conditional move sequences
for x86_64. Whether these changes are for the better or not I can't say.

This first patch introduces the two new parameters, and uses them in the
default implementation of the target hook.

Bootstrapped on x86_64 and aarch64 with no issues.

OK?

Thanks,
James

---
2016-06-21  James Greenhalgh  <james.greenhalgh@arm.com>

	* target.def (max_noce_ifcvt_seq_cost): New.
	* doc/tm.texi.in (TARGET_MAX_NOCE_IFCVT_SEQ_COST): Document it.
	* doc/tm.texi: Regenerate.
	* targhooks.h (default_max_noce_ifcvt_seq_cost): New.
	* targhooks.c (default_max_noce_ifcvt_seq_cost): New.
	* params.def (PARAM_MAX_RTL_IF_CONVERSION_PREDICTABLE_COST): New.
	(PARAM_MAX_RTL_IF_CONVERSION_UNPREDICTABLE_COST): Likewise.
	* doc/invoke.texi: Document new params.