public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [PATCH] Fix PR84986
@ 2018-03-20 10:35 Richard Biener
  2018-03-20 11:37 ` Jakub Jelinek
  2018-03-24 17:42 ` H.J. Lu
  0 siblings, 2 replies; 7+ messages in thread
From: Richard Biener @ 2018-03-20 10:35 UTC (permalink / raw)
  To: gcc-patches; +Cc: Jan Hubicka


With the x86 vectorizer cost-model rewrite we ended up costing
scalar conversions as nothing.  After my patch using the proper
target cost estimates for the scalar version this now exposes
underestimating scalar cost and thus no longer vectorizing
the testcase in this PR.  This fix is to restrict zero-costing
to sign-conversions, all other conversions are possibly
value-changing.  I guess some zero-extensions are free as well
but I didn't want to get too fancy as I'm not sure about
QImode -> SImode conversions for example since whether
that's free (can just use %eax instead of %ax) likely depends on 
context.

Bootstrap and regtest running on x86_64-unknown-linux-gnu.

OK?

Thanks,
Richard.

2018-03-20  Richard Biener  <rguenther@suse.de>

	PR target/84986
	* config/i386/i386.c (ix86_add_stmt_cost): Only cost
	sign-conversions as zero, fall back to standard scalar_stmt
	cost for the rest.

	* gcc.dg/vect/costmodel/x86_64/costmodel-pr84986.c: New testcase.

Index: gcc/config/i386/i386.c
===================================================================
--- gcc/config/i386/i386.c	(revision 258674)
+++ gcc/config/i386/i386.c	(working copy)
@@ -50462,7 +50462,11 @@ ix86_add_stmt_cost (void *data, int coun
 	  }
 	  break;
 	case NOP_EXPR:
-	  stmt_cost = 0;
+	  /* Only sign-conversions are free.  */
+	  if (tree_nop_conversion_p
+	        (TREE_TYPE (gimple_assign_lhs (stmt_info->stmt)),
+		 TREE_TYPE (gimple_assign_rhs1 (stmt_info->stmt))))
+	    stmt_cost = 0;
 	  break;
 
 	case BIT_IOR_EXPR:
Index: gcc/testsuite/gcc.dg/vect/costmodel/x86_64/costmodel-pr84986.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/costmodel/x86_64/costmodel-pr84986.c	(nonexistent)
+++ gcc/testsuite/gcc.dg/vect/costmodel/x86_64/costmodel-pr84986.c	(working copy)
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_long } */
+
+int N;
+long fn1(void) {
+  short i;
+  long a;
+  i = a = 0;
+  while (i < N)
+    a -= i++;
+  return a;
+}
+
+/* { dg-final { scan-tree-dump "vectorized 1 loops" "vect" } } */

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] Fix PR84986
  2018-03-20 11:37 ` Jakub Jelinek
@ 2018-03-20 10:50   ` Richard Biener
  2018-03-20 13:36     ` Jakub Jelinek
  2018-03-20 13:53     ` Jan Hubicka
  0 siblings, 2 replies; 7+ messages in thread
From: Richard Biener @ 2018-03-20 10:50 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: gcc-patches, Jan Hubicka

On Tue, 20 Mar 2018, Jakub Jelinek wrote:

> On Tue, Mar 20, 2018 at 11:08:12AM +0100, Richard Biener wrote:
> > With the x86 vectorizer cost-model rewrite we ended up costing
> > scalar conversions as nothing.  After my patch using the proper
> > target cost estimates for the scalar version this now exposes
> > underestimating scalar cost and thus no longer vectorizing
> > the testcase in this PR.  This fix is to restrict zero-costing
> > to sign-conversions, all other conversions are possibly
> > value-changing.  I guess some zero-extensions are free as well
> > but I didn't want to get too fancy as I'm not sure about
> > QImode -> SImode conversions for example since whether
> > that's free (can just use %eax instead of %ax) likely depends on 
> > context.
> 
> Aren't casts from integral to integral with smaller precision also always
> zero cost, at least for scalar code?  It just expands to using a SUBREG
> of whatever holds the larger value.

If the precision matches the mode maybe.  But I thought we try to
avoid using %al (HImode) or %ax (QImode) operands in the end?
So it doesn't matter what we expand to but the code-generation
in the end is what matters?

Btw, the patched behavior is now that of GCC 7 apart from the
sign-conversion case costing nothing.

If x86 maintainers want to get fancy they can do that but I'm not
too familiar with the code to tell.  The only case I'm reasonably
sure is zero-extension from SImode to DImode which should be
always free since moves to %eax are zero-extending to 64bits.

Btw, float <-> int conversions will run into the generic costing for
vector_stmt and all other conversions into generic vec_promote_demote 
costing.

An alternative patch would be to just cost sign-extensions
(and maybe non-mode-size ones of any kind).

Richard.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] Fix PR84986
  2018-03-20 13:36     ` Jakub Jelinek
@ 2018-03-20 11:09       ` Richard Biener
  0 siblings, 0 replies; 7+ messages in thread
From: Richard Biener @ 2018-03-20 11:09 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: gcc-patches, Jan Hubicka

On Tue, 20 Mar 2018, Jakub Jelinek wrote:

> On Tue, Mar 20, 2018 at 11:41:01AM +0100, Richard Biener wrote:
> > If the precision matches the mode maybe.  But I thought we try to
> > avoid using %al (HImode) or %ax (QImode) operands in the end?
> 
> Yes, but we try to do that only on the level of the emitted assembly,
> sometimes we simply emit a QImode or HImode instruction that does
> movl %eax, %ebx
> and similar.  That is also zero cost compared to what it would cost to do it
> in SImode.

Sure - but that doesn't actually truncate?

That said, even sign-conversions might cost sth like unsigned HImode
to signed HImode might end up doing a HI->SImode sign-extension to
have the reg in promoted form.

Richard.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] Fix PR84986
  2018-03-20 10:35 [PATCH] Fix PR84986 Richard Biener
@ 2018-03-20 11:37 ` Jakub Jelinek
  2018-03-20 10:50   ` Richard Biener
  2018-03-24 17:42 ` H.J. Lu
  1 sibling, 1 reply; 7+ messages in thread
From: Jakub Jelinek @ 2018-03-20 11:37 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc-patches, Jan Hubicka

On Tue, Mar 20, 2018 at 11:08:12AM +0100, Richard Biener wrote:
> With the x86 vectorizer cost-model rewrite we ended up costing
> scalar conversions as nothing.  After my patch using the proper
> target cost estimates for the scalar version this now exposes
> underestimating scalar cost and thus no longer vectorizing
> the testcase in this PR.  This fix is to restrict zero-costing
> to sign-conversions, all other conversions are possibly
> value-changing.  I guess some zero-extensions are free as well
> but I didn't want to get too fancy as I'm not sure about
> QImode -> SImode conversions for example since whether
> that's free (can just use %eax instead of %ax) likely depends on 
> context.

Aren't casts from integral to integral with smaller precision also always
zero cost, at least for scalar code?  It just expands to using a SUBREG
of whatever holds the larger value.

	Jakub

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] Fix PR84986
  2018-03-20 10:50   ` Richard Biener
@ 2018-03-20 13:36     ` Jakub Jelinek
  2018-03-20 11:09       ` Richard Biener
  2018-03-20 13:53     ` Jan Hubicka
  1 sibling, 1 reply; 7+ messages in thread
From: Jakub Jelinek @ 2018-03-20 13:36 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc-patches, Jan Hubicka

On Tue, Mar 20, 2018 at 11:41:01AM +0100, Richard Biener wrote:
> If the precision matches the mode maybe.  But I thought we try to
> avoid using %al (HImode) or %ax (QImode) operands in the end?

Yes, but we try to do that only on the level of the emitted assembly,
sometimes we simply emit a QImode or HImode instruction that does
movl %eax, %ebx
and similar.  That is also zero cost compared to what it would cost to do it
in SImode.

	Jakub

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] Fix PR84986
  2018-03-20 10:50   ` Richard Biener
  2018-03-20 13:36     ` Jakub Jelinek
@ 2018-03-20 13:53     ` Jan Hubicka
  1 sibling, 0 replies; 7+ messages in thread
From: Jan Hubicka @ 2018-03-20 13:53 UTC (permalink / raw)
  To: Richard Biener; +Cc: Jakub Jelinek, gcc-patches

The way i originally made the code was going through ix86_rtx_costs
and factoring out/simplifying logic that translates directly to trees.
With NOP_EXPR I probably only tought of subgregs and other free stuff
which truly_noop_* probably simulates quite well.

ix86_rtx_costs understand how to zero extend, sign extend, and fp
conversions, but for scalar code only. I do remember looking into some
testcases with noop conversions that needs to be vectorized, so
costing them zero did improve code, but hopefully noop test will pass
for those.

We may get fancier next stage1 or if regressions surfaces.
Patch is OK. Thanks for looking into this!

Honza

Dne 2018-03-20 05:41, Richard Biener napsal:
> On Tue, 20 Mar 2018, Jakub Jelinek wrote:
> 
>> On Tue, Mar 20, 2018 at 11:08:12AM +0100, Richard Biener wrote:
>> > With the x86 vectorizer cost-model rewrite we ended up costing
>> > scalar conversions as nothing.  After my patch using the proper
>> > target cost estimates for the scalar version this now exposes
>> > underestimating scalar cost and thus no longer vectorizing
>> > the testcase in this PR.  This fix is to restrict zero-costing
>> > to sign-conversions, all other conversions are possibly
>> > value-changing.  I guess some zero-extensions are free as well
>> > but I didn't want to get too fancy as I'm not sure about
>> > QImode -> SImode conversions for example since whether
>> > that's free (can just use %eax instead of %ax) likely depends on
>> > context.
>> 
>> Aren't casts from integral to integral with smaller precision also 
>> always
>> zero cost, at least for scalar code?  It just expands to using a 
>> SUBREG
>> of whatever holds the larger value.
> 
> If the precision matches the mode maybe.  But I thought we try to
> avoid using %al (HImode) or %ax (QImode) operands in the end?
> So it doesn't matter what we expand to but the code-generation
> in the end is what matters?
> 
> Btw, the patched behavior is now that of GCC 7 apart from the
> sign-conversion case costing nothing.
> 
> If x86 maintainers want to get fancy they can do that but I'm not
> too familiar with the code to tell.  The only case I'm reasonably
> sure is zero-extension from SImode to DImode which should be
> always free since moves to %eax are zero-extending to 64bits.
> 
> Btw, float <-> int conversions will run into the generic costing for
> vector_stmt and all other conversions into generic vec_promote_demote
> costing.
> 
> An alternative patch would be to just cost sign-extensions
> (and maybe non-mode-size ones of any kind).
> 
> Richard.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] Fix PR84986
  2018-03-20 10:35 [PATCH] Fix PR84986 Richard Biener
  2018-03-20 11:37 ` Jakub Jelinek
@ 2018-03-24 17:42 ` H.J. Lu
  1 sibling, 0 replies; 7+ messages in thread
From: H.J. Lu @ 2018-03-24 17:42 UTC (permalink / raw)
  To: Richard Biener; +Cc: GCC Patches, Jan Hubicka

On Tue, Mar 20, 2018 at 3:08 AM, Richard Biener <rguenther@suse.de> wrote:
>
> With the x86 vectorizer cost-model rewrite we ended up costing
> scalar conversions as nothing.  After my patch using the proper
> target cost estimates for the scalar version this now exposes
> underestimating scalar cost and thus no longer vectorizing
> the testcase in this PR.  This fix is to restrict zero-costing
> to sign-conversions, all other conversions are possibly
> value-changing.  I guess some zero-extensions are free as well
> but I didn't want to get too fancy as I'm not sure about
> QImode -> SImode conversions for example since whether
> that's free (can just use %eax instead of %ax) likely depends on
> context.
>
> Bootstrap and regtest running on x86_64-unknown-linux-gnu.
>
> OK?
>
> Thanks,
> Richard.
>
> 2018-03-20  Richard Biener  <rguenther@suse.de>
>
>         PR target/84986
>         * config/i386/i386.c (ix86_add_stmt_cost): Only cost
>         sign-conversions as zero, fall back to standard scalar_stmt
>         cost for the rest.
>
>         * gcc.dg/vect/costmodel/x86_64/costmodel-pr84986.c: New testcase.
>

The test failed for -mx32:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85066

-- 
H.J.

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2018-03-24 17:18 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-03-20 10:35 [PATCH] Fix PR84986 Richard Biener
2018-03-20 11:37 ` Jakub Jelinek
2018-03-20 10:50   ` Richard Biener
2018-03-20 13:36     ` Jakub Jelinek
2018-03-20 11:09       ` Richard Biener
2018-03-20 13:53     ` Jan Hubicka
2018-03-24 17:42 ` H.J. Lu

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).