public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [PATCH, ARM] stop changing signedness in PROMOTE_MODE
@ 2015-06-30  1:56 Jim Wilson
  2015-07-02  9:07 ` Richard Earnshaw
                   ` (2 more replies)
  0 siblings, 3 replies; 32+ messages in thread
From: Jim Wilson @ 2015-06-30  1:56 UTC (permalink / raw)
  To: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 3594 bytes --]

This is my suggested fix for PR 65932, which is a linux kernel
miscompile with gcc-5.1.

The problem here is caused by a chain of events.  The first is that
the relatively new eipa_sra pass creates fake parameters that behave
slightly differently than normal parameters.  The second is that the
optimizer creates phi nodes that copy local variables to fake
parameters and/or vice versa.  The third is that the ouf-of-ssa pass
assumes that it can emit simple move instructions for these phi nodes.
And the fourth is that the ARM port has a PROMOTE_MODE macro that
forces QImode and HImode to unsigned, but a
TARGET_PROMOTE_FUNCTION_MODE hook that does not.  So signed char and
short parameters have different in register representations than local
variables, and require a conversion when copying between them, a
conversion that the out-of-ssa pass can't easily emit.

Ultimately, I think this is a problem in the arm backend.  It should
not have a PROMOTE_MODE macro that is changing the sign of char and
short local variables.  I also think that we should merge the
PROMOTE_MODE macro with the TARGET_PROMOTE_FUNCTION_MODE hook to
prevent this from happening again.

I see four general problems with the current ARM PROMOTE_MODE definition.
1) Unsigned char is only faster for armv5 and earlier, before the sxtb
instruction was added.  It is a lose for armv6 and later.
2) Unsigned short was only faster for targets that don't support
unaligned accesses.  Support for these targets was removed a while
ago, and this PROMODE_MODE hunk should have been removed at the same
time.  It was accidentally left behind.
3) TARGET_PROMOTE_FUNCTION_MODE used to be a boolean hook, when it was
converted to a function, the PROMOTE_MODE code was copied without the
UNSIGNEDP changes.  Thus it is only an accident that
TARGET_PROMOTE_FUNCTION_MODE and PROMOTE_MODE disagree.  Changing
TARGET_PROMOTE_FUNCTION_MODE is an ABI change, so only PROMOTE_MODE
changes to resolve the difference are safe.
4) There is a general principle that you should only change signedness
in PROMOTE_MODE if the hardware forces it, as otherwise this results
in extra conversion instructions that make code slower.  The mips64
hardware for instance requires that 32-bit values be sign-extended
regardless of type, and instructions may trap if this is not true.
However, it has a set of 32-bit instructions that operate on these
values, and hence no conversions are required.  There is no similar
case on ARM. Thus the conversions are unnecessary and unwise.  This
can be seen in the testcases where gcc emits both a zero-extend and a
sign-extend inside a loop, as the sign-extend is required for a
compare, and the zero-extend is required by PROMOTE_MODE.

My change was tested with an arm bootstrap, make check, and SPEC
CPU2000 run.  The original poster verified that this gives a linux
kernel that boots correctly.

The PRMOTE_MODE change causes 3 testsuite testcases to fail.  These
are tests to verify that smulbb and/or smlabb are generated.
Eliminating the unnecessary sign conversions causes us to get better
code that doesn't include the smulbb and smlabb instructions.  I had
to modify the testcases to get them to emit the desired instructions.
With the testcase changes there are no additional testsuite failures,
though I'm concerned that these testcases with the changes may be
fragile, and future changes may break them again.

If there are ARM parts where smulbb/smlabb are faster than mul/mla,
then maybe we should try to add new patterns to get the instructions
emitted again for the unmodified testcases.

Jim

[-- Attachment #2: pr65932-3.patch --]
[-- Type: text/x-patch, Size: 2775 bytes --]

gcc/
2015-06-29  Jim Wilson  <jim.wilson@linaro.org>

	PR target/65932
	* config/arm/arm.h (PROMOTE_MODE): Don't set UNSIGNEDP for QImode and
	HImode.

gcc/testsuite/
2015-06-29  Jim Wilson  <jim.wilson@linaro.org>

	PR target/65932
	* gcc.target/arm/wmul-1.c (mac): Change a and b to int pointers.  Cast
	multiply operands to short.
	* gcc.target/arm/wmul-2.c (vec_mpy): Cast multiply result to short.
	* gcc.target/arm/wmul-3.c (mac): Cast multiply results to short.

Index: config/arm/arm.h
===================================================================
--- config/arm/arm.h	(revision 224672)
+++ config/arm/arm.h	(working copy)
@@ -523,16 +523,10 @@ extern int arm_arch_crc;
    type, but kept valid in the wider mode.  The signedness of the
    extension may differ from that of the type.  */
 
-/* It is far faster to zero extend chars than to sign extend them */
-
 #define PROMOTE_MODE(MODE, UNSIGNEDP, TYPE)	\
   if (GET_MODE_CLASS (MODE) == MODE_INT		\
       && GET_MODE_SIZE (MODE) < 4)      	\
     {						\
-      if (MODE == QImode)			\
-	UNSIGNEDP = 1;				\
-      else if (MODE == HImode)			\
-	UNSIGNEDP = 1;				\
       (MODE) = SImode;				\
     }
 
Index: testsuite/gcc.target/arm/wmul-1.c
===================================================================
--- testsuite/gcc.target/arm/wmul-1.c	(revision 224672)
+++ testsuite/gcc.target/arm/wmul-1.c	(working copy)
@@ -2,14 +2,14 @@
 /* { dg-require-effective-target arm_dsp } */
 /* { dg-options "-O1 -fexpensive-optimizations" } */
 
-int mac(const short *a, const short *b, int sqr, int *sum)
+int mac(const int *a, const int *b, int sqr, int *sum)
 {
   int i;
   int dotp = *sum;
 
   for (i = 0; i < 150; i++) {
-    dotp += b[i] * a[i];
-    sqr += b[i] * b[i];
+    dotp += (short)b[i] * (short)a[i];
+    sqr += (short)b[i] * (short)b[i];
   }
 
   *sum = dotp;
Index: testsuite/gcc.target/arm/wmul-2.c
===================================================================
--- testsuite/gcc.target/arm/wmul-2.c	(revision 224672)
+++ testsuite/gcc.target/arm/wmul-2.c	(working copy)
@@ -7,7 +7,7 @@ void vec_mpy(int y[], const short x[], s
  int i;
 
  for (i = 0; i < 150; i++)
-   y[i] += ((scaler * x[i]) >> 31);
+   y[i] += ((short)(scaler * x[i]) >> 31);
 }
 
 /* { dg-final { scan-assembler-times "smulbb" 1 } } */
Index: testsuite/gcc.target/arm/wmul-3.c
===================================================================
--- testsuite/gcc.target/arm/wmul-3.c	(revision 224672)
+++ testsuite/gcc.target/arm/wmul-3.c	(working copy)
@@ -8,8 +8,8 @@ int mac(const short *a, const short *b,
   int dotp = *sum;
 
   for (i = 0; i < 150; i++) {
-    dotp -= b[i] * a[i];
-    sqr -= b[i] * b[i];
+    dotp -= (short)(b[i] * a[i]);
+    sqr -= (short)(b[i] * b[i]);
   }
 
   *sum = dotp;

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH, ARM] stop changing signedness in PROMOTE_MODE
  2015-06-30  1:56 [PATCH, ARM] stop changing signedness in PROMOTE_MODE Jim Wilson
@ 2015-07-02  9:07 ` Richard Earnshaw
  2015-07-07 18:25   ` Jim Wilson
  2015-07-07 15:07 ` Jeff Law
  2016-02-04  8:58 ` Ramana Radhakrishnan
  2 siblings, 1 reply; 32+ messages in thread
From: Richard Earnshaw @ 2015-07-02  9:07 UTC (permalink / raw)
  To: Jim Wilson, gcc-patches

On 30/06/15 02:15, Jim Wilson wrote:
> This is my suggested fix for PR 65932, which is a linux kernel
> miscompile with gcc-5.1.
> 
> The problem here is caused by a chain of events.  The first is that
> the relatively new eipa_sra pass creates fake parameters that behave
> slightly differently than normal parameters.  The second is that the
> optimizer creates phi nodes that copy local variables to fake
> parameters and/or vice versa.  The third is that the ouf-of-ssa pass
> assumes that it can emit simple move instructions for these phi nodes.
> And the fourth is that the ARM port has a PROMOTE_MODE macro that
> forces QImode and HImode to unsigned, but a
> TARGET_PROMOTE_FUNCTION_MODE hook that does not.  So signed char and
> short parameters have different in register representations than local
> variables, and require a conversion when copying between them, a
> conversion that the out-of-ssa pass can't easily emit.
> 
> Ultimately, I think this is a problem in the arm backend.  It should
> not have a PROMOTE_MODE macro that is changing the sign of char and
> short local variables.  I also think that we should merge the
> PROMOTE_MODE macro with the TARGET_PROMOTE_FUNCTION_MODE hook to
> prevent this from happening again.
> 

The documentation for PROMOTE_MODE says:

For most machines, the macro definition does not change @var{unsignedp}.
However, some machines, have instructions that preferentially handle
either signed or unsigned quantities of certain modes.  For example, on
the DEC Alpha, 32-bit loads from memory and 32-bit add instructions
sign-extend the result to 64 bits.  On such machines, set
@var{unsignedp} according to which kind of extension is more efficient.

So it seems to me that the ARM backend is only doing what the
documentation says should work.

> I see four general problems with the current ARM PROMOTE_MODE definition.
> 1) Unsigned char is only faster for armv5 and earlier, before the sxtb
> instruction was added.  It is a lose for armv6 and later.

Not quite, ARM state still has more flexible addressing modes for
unsigned byte loads than for signed byte loads.  It's even worse with
thumb1 where some signed loads have no single-register addressing mode
(ie you have to copy zero into another register to use as an index
before doing the load).


R.

> 2) Unsigned short was only faster for targets that don't support
> unaligned accesses.  Support for these targets was removed a while
> ago, and this PROMODE_MODE hunk should have been removed at the same
> time.  It was accidentally left behind.
> 3) TARGET_PROMOTE_FUNCTION_MODE used to be a boolean hook, when it was
> converted to a function, the PROMOTE_MODE code was copied without the
> UNSIGNEDP changes.  Thus it is only an accident that
> TARGET_PROMOTE_FUNCTION_MODE and PROMOTE_MODE disagree.  Changing
> TARGET_PROMOTE_FUNCTION_MODE is an ABI change, so only PROMOTE_MODE
> changes to resolve the difference are safe.
> 4) There is a general principle that you should only change signedness
> in PROMOTE_MODE if the hardware forces it, as otherwise this results
> in extra conversion instructions that make code slower.  The mips64
> hardware for instance requires that 32-bit values be sign-extended
> regardless of type, and instructions may trap if this is not true.
> However, it has a set of 32-bit instructions that operate on these
> values, and hence no conversions are required.  There is no similar
> case on ARM. Thus the conversions are unnecessary and unwise.  This
> can be seen in the testcases where gcc emits both a zero-extend and a
> sign-extend inside a loop, as the sign-extend is required for a
> compare, and the zero-extend is required by PROMOTE_MODE.
> 
> My change was tested with an arm bootstrap, make check, and SPEC
> CPU2000 run.  The original poster verified that this gives a linux
> kernel that boots correctly.
> 
> The PRMOTE_MODE change causes 3 testsuite testcases to fail.  These
> are tests to verify that smulbb and/or smlabb are generated.
> Eliminating the unnecessary sign conversions causes us to get better
> code that doesn't include the smulbb and smlabb instructions.  I had
> to modify the testcases to get them to emit the desired instructions.
> With the testcase changes there are no additional testsuite failures,
> though I'm concerned that these testcases with the changes may be
> fragile, and future changes may break them again.
> 
> If there are ARM parts where smulbb/smlabb are faster than mul/mla,
> then maybe we should try to add new patterns to get the instructions
> emitted again for the unmodified testcases.
> 
> Jim
> 
> 
> pr65932-3.patch
> 
> 
> gcc/
> 2015-06-29  Jim Wilson  <jim.wilson@linaro.org>
> 
> 	PR target/65932
> 	* config/arm/arm.h (PROMOTE_MODE): Don't set UNSIGNEDP for QImode and
> 	HImode.
> 
> gcc/testsuite/
> 2015-06-29  Jim Wilson  <jim.wilson@linaro.org>
> 
> 	PR target/65932
> 	* gcc.target/arm/wmul-1.c (mac): Change a and b to int pointers.  Cast
> 	multiply operands to short.
> 	* gcc.target/arm/wmul-2.c (vec_mpy): Cast multiply result to short.
> 	* gcc.target/arm/wmul-3.c (mac): Cast multiply results to short.
> 
> Index: config/arm/arm.h
> ===================================================================
> --- config/arm/arm.h	(revision 224672)
> +++ config/arm/arm.h	(working copy)
> @@ -523,16 +523,10 @@ extern int arm_arch_crc;
>     type, but kept valid in the wider mode.  The signedness of the
>     extension may differ from that of the type.  */
>  
> -/* It is far faster to zero extend chars than to sign extend them */
> -
>  #define PROMOTE_MODE(MODE, UNSIGNEDP, TYPE)	\
>    if (GET_MODE_CLASS (MODE) == MODE_INT		\
>        && GET_MODE_SIZE (MODE) < 4)      	\
>      {						\
> -      if (MODE == QImode)			\
> -	UNSIGNEDP = 1;				\
> -      else if (MODE == HImode)			\
> -	UNSIGNEDP = 1;				\
>        (MODE) = SImode;				\
>      }
>  
> Index: testsuite/gcc.target/arm/wmul-1.c
> ===================================================================
> --- testsuite/gcc.target/arm/wmul-1.c	(revision 224672)
> +++ testsuite/gcc.target/arm/wmul-1.c	(working copy)
> @@ -2,14 +2,14 @@
>  /* { dg-require-effective-target arm_dsp } */
>  /* { dg-options "-O1 -fexpensive-optimizations" } */
>  
> -int mac(const short *a, const short *b, int sqr, int *sum)
> +int mac(const int *a, const int *b, int sqr, int *sum)
>  {
>    int i;
>    int dotp = *sum;
>  
>    for (i = 0; i < 150; i++) {
> -    dotp += b[i] * a[i];
> -    sqr += b[i] * b[i];
> +    dotp += (short)b[i] * (short)a[i];
> +    sqr += (short)b[i] * (short)b[i];
>    }
>  
>    *sum = dotp;
> Index: testsuite/gcc.target/arm/wmul-2.c
> ===================================================================
> --- testsuite/gcc.target/arm/wmul-2.c	(revision 224672)
> +++ testsuite/gcc.target/arm/wmul-2.c	(working copy)
> @@ -7,7 +7,7 @@ void vec_mpy(int y[], const short x[], s
>   int i;
>  
>   for (i = 0; i < 150; i++)
> -   y[i] += ((scaler * x[i]) >> 31);
> +   y[i] += ((short)(scaler * x[i]) >> 31);
>  }
>  
>  /* { dg-final { scan-assembler-times "smulbb" 1 } } */
> Index: testsuite/gcc.target/arm/wmul-3.c
> ===================================================================
> --- testsuite/gcc.target/arm/wmul-3.c	(revision 224672)
> +++ testsuite/gcc.target/arm/wmul-3.c	(working copy)
> @@ -8,8 +8,8 @@ int mac(const short *a, const short *b,
>    int dotp = *sum;
>  
>    for (i = 0; i < 150; i++) {
> -    dotp -= b[i] * a[i];
> -    sqr -= b[i] * b[i];
> +    dotp -= (short)(b[i] * a[i]);
> +    sqr -= (short)(b[i] * b[i]);
>    }
>  
>    *sum = dotp;
> 

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH, ARM] stop changing signedness in PROMOTE_MODE
  2015-06-30  1:56 [PATCH, ARM] stop changing signedness in PROMOTE_MODE Jim Wilson
  2015-07-02  9:07 ` Richard Earnshaw
@ 2015-07-07 15:07 ` Jeff Law
  2015-07-07 16:29   ` Jim Wilson
  2016-02-04  8:58 ` Ramana Radhakrishnan
  2 siblings, 1 reply; 32+ messages in thread
From: Jeff Law @ 2015-07-07 15:07 UTC (permalink / raw)
  To: Jim Wilson, gcc-patches

On 06/29/2015 07:15 PM, Jim Wilson wrote:
> This is my suggested fix for PR 65932, which is a linux kernel
> miscompile with gcc-5.1.
>
> The problem here is caused by a chain of events.  The first is that
> the relatively new eipa_sra pass creates fake parameters that behave
> slightly differently than normal parameters.  The second is that the
> optimizer creates phi nodes that copy local variables to fake
> parameters and/or vice versa.  The third is that the ouf-of-ssa pass
> assumes that it can emit simple move instructions for these phi nodes.
> And the fourth is that the ARM port has a PROMOTE_MODE macro that
> forces QImode and HImode to unsigned, but a
> TARGET_PROMOTE_FUNCTION_MODE hook that does not.  So signed char and
> short parameters have different in register representations than local
> variables, and require a conversion when copying between them, a
> conversion that the out-of-ssa pass can't easily emit.
So if these "copies" require a  conversion, then isn't it fundamentally 
wrong to have a PHI node which copies between them?  That would seem to 
implicate the eipa_sra pass as needing to be aware of these promotions 
and avoid having these objects with different representations appearing 
on the lhs/rhs of a PHI node.

Jeff

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH, ARM] stop changing signedness in PROMOTE_MODE
  2015-07-07 15:07 ` Jeff Law
@ 2015-07-07 16:29   ` Jim Wilson
  2015-07-07 21:35     ` Richard Biener
  2015-07-08 22:54     ` Jeff Law
  0 siblings, 2 replies; 32+ messages in thread
From: Jim Wilson @ 2015-07-07 16:29 UTC (permalink / raw)
  To: Jeff Law; +Cc: gcc-patches

On Tue, Jul 7, 2015 at 8:07 AM, Jeff Law <law@redhat.com> wrote:
> On 06/29/2015 07:15 PM, Jim Wilson wrote:
> So if these "copies" require a  conversion, then isn't it fundamentally
> wrong to have a PHI node which copies between them?  That would seem to
> implicate the eipa_sra pass as needing to be aware of these promotions and
> avoid having these objects with different representations appearing on the
> lhs/rhs of a PHI node.

My years at Cisco didn't give me a chance to work on the SSA passes,
so I don't know much about how they work.  But looking at this, I see
that PHI nodes are eventually handed by emit_partition_copy in
tree-outof-ssa.c, which calls convert_to_mode, so it appears that
conversions between different (closely related?) types are OK in a PHI
node.  The problem in this case is that we have the exact same type
for the src and dest.  The only difference is that the ARM forces
sign-extension for signed sub-word parameters and zero-extension for
signed sub-word locals.  Thus to detect the need for a conversion, you
have to have the decls, and we don't have them here.  There is also
the problem that all of the SUBREG_PROMOTED_* stuff is in expand_expr
and friends, which aren't used by the cfglayout/tree-outof-ssa.c code
for PHI nodes.  So we need to copy some of the SUBREG_PROMOTED_*
handling into cfglyout/tree-outof-ssa, or modify them to call
expand_expr for PHI nodes, and I haven't had any luck getting that to
work yet. I still need to learn more about the code to figure out if
this is possible.

I also think that the ARM handling of PROMOTE_MODE is wrong.  Treating
a signed sub-word and unsigned can lead to inefficient code.  This is
part of the problem is much easier for me to fix.  It may be hard to
convince ARM maintainers that it should be changed though.  I need
more time to work on this too.

I haven't looked at trying to forbid the optimizer from creating PHI
nodes connecting parameters to locals.  That just sounds like a
strange thing to forbid, and seems likely to result in worse code by
disabling too many optimizations.  But maybe it is the right solution
if neither of the other two options work.  This should only be done
when PROMOTE_MODE disagrees with TARGET_PROMOTE_FUNCTION_MODE, forcing
the copy to require a conversion.

Jim

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH, ARM] stop changing signedness in PROMOTE_MODE
  2015-07-02  9:07 ` Richard Earnshaw
@ 2015-07-07 18:25   ` Jim Wilson
  0 siblings, 0 replies; 32+ messages in thread
From: Jim Wilson @ 2015-07-07 18:25 UTC (permalink / raw)
  To: Richard Earnshaw; +Cc: gcc-patches

On Thu, Jul 2, 2015 at 2:07 AM, Richard Earnshaw
<Richard.Earnshaw@foss.arm.com> wrote:
> Not quite, ARM state still has more flexible addressing modes for
> unsigned byte loads than for signed byte loads.  It's even worse with
> thumb1 where some signed loads have no single-register addressing mode
> (ie you have to copy zero into another register to use as an index
> before doing the load).

I wasn't aware of the load address problem.  That was something I
hadn't considered, and will have to look at that.  Load is just one
instruction though.  For most other instructions, a zero-extend
results in less efficient code, because it then forces a sign-extend
before a signed operation.  The fact that parameters and locals are
handled differently which requires conversions when copying between
them results in more inefficient code.  And changing
TARGET_PROMOTE_FUNCTION_MODE is an ABI change, and hence would be
unwise, so changing PROMOTE_MODE is the safer option.

Consider this testcase
extern signed short gs;

short
sub (void)
{
  signed short s = gs;
  int i;

  for (i = 0; i < 10; i++)
    {
      s += 1;
      if (s > 10) break;
    }

  return s;
}

The inner loop ends up as
.L3:
adds r3, r3, #1
mov r0, r1
uxth r3, r3
sxth r2, r3
cmp r2, #10
bgt .L8
cmp r2, r1
bne .L3
bx lr

We need the sign-extension for the compare.  We need the
zero-extension for the loop carried dependency.  We have two
extensions in every loop iteration, plus some extra register usage and
register movement.  We get better code for this example if we aren't
forcing signed shorts to be zero-extended via PROMOTE_MODE.

The lack of a reg+immediate address mode for ldrs[bh] in thumb1 does
look like a problem though.  But this means the difference between
generating
    movs r2, #0
    ldrsh r3, [r3, r2]
with my patch, or
    ldrh r3, [r3]
    lsls r2, r3, #16
    asrs r2, r2, #16
without my patch.  It isn't clear which sequence is better.  The
sign-extends in the second sequence can sometimes be optimized away,
and sometimes they can't be optimized away.  Similarly, in the first
sequence, loading zero into a reg can sometimes be optimized, and
sometimes it can't.  There is also no guarantee that you get the first
sequence with the patch or the second sequence without the patch.
There is a splitter for ldrsh, so you can get the second pattern
sometimes with the patch.  Similarly, it might be possible to get the
first pattern without the patch in some cases, though I don't have one
at the moment.

Jim

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH, ARM] stop changing signedness in PROMOTE_MODE
  2015-07-07 16:29   ` Jim Wilson
@ 2015-07-07 21:35     ` Richard Biener
  2015-07-10 15:46       ` Jim Wilson
  2015-07-08 22:54     ` Jeff Law
  1 sibling, 1 reply; 32+ messages in thread
From: Richard Biener @ 2015-07-07 21:35 UTC (permalink / raw)
  To: Jim Wilson, Jeff Law; +Cc: gcc-patches

On July 7, 2015 6:29:21 PM GMT+02:00, Jim Wilson <jim.wilson@linaro.org> wrote:
>On Tue, Jul 7, 2015 at 8:07 AM, Jeff Law <law@redhat.com> wrote:
>> On 06/29/2015 07:15 PM, Jim Wilson wrote:
>> So if these "copies" require a  conversion, then isn't it
>fundamentally
>> wrong to have a PHI node which copies between them?  That would seem
>to
>> implicate the eipa_sra pass as needing to be aware of these
>promotions and
>> avoid having these objects with different representations appearing
>on the
>> lhs/rhs of a PHI node.
>
>My years at Cisco didn't give me a chance to work on the SSA passes,
>so I don't know much about how they work.  But looking at this, I see
>that PHI nodes are eventually handed by emit_partition_copy in
>tree-outof-ssa.c, which calls convert_to_mode, so it appears that
>conversions between different (closely related?) types are OK in a PHI
>node.  The problem in this case is that we have the exact same type
>for the src and dest.  The only difference is that the ARM forces
>sign-extension for signed sub-word parameters and zero-extension for
>signed sub-word locals.  Thus to detect the need for a conversion, you
>have to have the decls, and we don't have them here.  There is also
>the problem that all of the SUBREG_PROMOTED_* stuff is in expand_expr
>and friends, which aren't used by the cfglayout/tree-outof-ssa.c code
>for PHI nodes.  So we need to copy some of the SUBREG_PROMOTED_*
>handling into cfglyout/tree-outof-ssa, or modify them to call
>expand_expr for PHI nodes, and I haven't had any luck getting that to
>work yet. I still need to learn more about the code to figure out if
>this is possible.

It probably is.  The decks for the parameter based SSA names are available, for the PHI destination there might be no decl.

>I also think that the ARM handling of PROMOTE_MODE is wrong.  Treating
>a signed sub-word and unsigned can lead to inefficient code.  This is
>part of the problem is much easier for me to fix.  It may be hard to
>convince ARM maintainers that it should be changed though.  I need
>more time to work on this too.
>
>I haven't looked at trying to forbid the optimizer from creating PHI
>nodes connecting parameters to locals.  That just sounds like a
>strange thing to forbid, and seems likely to result in worse code by
>disabling too many optimizations.  But maybe it is the right solution
>if neither of the other two options work.  This should only be done
>when PROMOTE_MODE disagrees with TARGET_PROMOTE_FUNCTION_MODE, forcing
>the copy to require a conversion.

I don't think disallowing such PHI nodes is the right thing to do.  I'd rather expose the TARGET_PROMOTE_FUNCTION_MODE effect earlier by modifying the parameter types during, say, gimplification.

Richard.

>Jim


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH, ARM] stop changing signedness in PROMOTE_MODE
  2015-07-07 16:29   ` Jim Wilson
  2015-07-07 21:35     ` Richard Biener
@ 2015-07-08 22:54     ` Jeff Law
  2015-07-10 15:35       ` Jim Wilson
  1 sibling, 1 reply; 32+ messages in thread
From: Jeff Law @ 2015-07-08 22:54 UTC (permalink / raw)
  To: Jim Wilson; +Cc: gcc-patches

On 07/07/2015 10:29 AM, Jim Wilson wrote:
>
> My years at Cisco didn't give me a chance to work on the SSA passes,
> so I don't know much about how they work.  But looking at this, I see
> that PHI nodes are eventually handed by emit_partition_copy in
> tree-outof-ssa.c, which calls convert_to_mode, so it appears that
> conversions between different (closely related?) types are OK in a PHI
> node.
It'd be real interesting to see if that call ever does any real 
conversions -- ie, is it ever called with types that are not useless 
type conversions (useless_type_conversion_p has essentially become the 
definition of the gimple type system, for better or worse).


This is critically important as various parts of the compiler will take 
a degenerate PHI node and propagate the RHS of the PHI into the uses of 
the LHS of the PHI -- without doing any conversions.

So if you had something like

X = PHI (Y);

Where X is a sub-word local and Y is a sub-word parameter, the 
propagators will blindly replace uses of X with Y, which may be wrong if 
I understand your next paragraph correctly.




   The problem in this case is that we have the exact same type
> for the src and dest.  The only difference is that the ARM forces
> sign-extension for signed sub-word parameters and zero-extension for
> signed sub-word locals.  Thus to detect the need for a conversion, you
> have to have the decls, and we don't have them here.  There is also
> the problem that all of the SUBREG_PROMOTED_* stuff is in expand_expr
> and friends, which aren't used by the cfglayout/tree-outof-ssa.c code
> for PHI nodes.  So we need to copy some of the SUBREG_PROMOTED_*
> handling into cfglyout/tree-outof-ssa, or modify them to call
> expand_expr for PHI nodes, and I haven't had any luck getting that to
> work yet. I still need to learn more about the code to figure out if
> this is possible.
And this would seem to imply that those objects can't be set/referenced 
in the same PHI because of the zero vs sign extension issue.

>
> I haven't looked at trying to forbid the optimizer from creating PHI
> nodes connecting parameters to locals.  That just sounds like a
> strange thing to forbid, and seems likely to result in worse code by
> disabling too many optimizations.  But maybe it is the right solution
> if neither of the other two options work.  This should only be done
> when PROMOTE_MODE disagrees with TARGET_PROMOTE_FUNCTION_MODE, forcing
> the copy to require a conversion.
It's less about connecting parameters to locals and more about ensuring 
that objects connected are close enough in their types that one could be 
substituted for the other.

jeff

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH, ARM] stop changing signedness in PROMOTE_MODE
  2015-07-08 22:54     ` Jeff Law
@ 2015-07-10 15:35       ` Jim Wilson
  0 siblings, 0 replies; 32+ messages in thread
From: Jim Wilson @ 2015-07-10 15:35 UTC (permalink / raw)
  To: Jeff Law; +Cc: gcc-patches

On Wed, Jul 8, 2015 at 3:54 PM, Jeff Law <law@redhat.com> wrote:
> On 07/07/2015 10:29 AM, Jim Wilson wrote:
> This is critically important as various parts of the compiler will take a
> degenerate PHI node and propagate the RHS of the PHI into the uses of the
> LHS of the PHI -- without doing any conversions.

I think this is OK, because tree-outof-ssa does send code in basic
blocks through expand_expr, which will emit conversions if necessary.
it is only the conversion of PHI nodes to RTL that is the problem, as
it doesn't use expand_expr, and hence doesn't get the
SUBREG_PROMOTED_P conversions.

Jim

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH, ARM] stop changing signedness in PROMOTE_MODE
  2015-07-07 21:35     ` Richard Biener
@ 2015-07-10 15:46       ` Jim Wilson
  2015-07-13  8:19         ` Richard Biener
  0 siblings, 1 reply; 32+ messages in thread
From: Jim Wilson @ 2015-07-10 15:46 UTC (permalink / raw)
  To: Richard Biener; +Cc: Jeff Law, gcc-patches

[-- Attachment #1: Type: text/plain, Size: 858 bytes --]

On Tue, Jul 7, 2015 at 2:35 PM, Richard Biener
<richard.guenther@gmail.com> wrote:
> On July 7, 2015 6:29:21 PM GMT+02:00, Jim Wilson <jim.wilson@linaro.org> wrote:
>>signed sub-word locals.  Thus to detect the need for a conversion, you
>>have to have the decls, and we don't have them here.  There is also
>
> It probably is.  The decks for the parameter based SSA names are available, for the PHI destination there might be no decl.

I tried looking again, and found the decls.  I'm able to get correct
code for my testcase with the attached patch to force the conversion.
It is rather inelegant, but I think I can cache the values I need to
make this simpler and cleaner.  I still don't have decls from
insert_part_to_rtx_on_edge and insert_rtx_to_part_on_edge, but it
looks like those are for breaking cycles, and hence might not need
conversions.

Jim

[-- Attachment #2: pr65932-4.patch --]
[-- Type: text/x-patch, Size: 2926 bytes --]

Index: tree-outof-ssa.c
===================================================================
--- tree-outof-ssa.c	(revision 225477)
+++ tree-outof-ssa.c	(working copy)
@@ -230,11 +230,32 @@ set_location_for_edge (edge e)
    SRC/DEST might be BLKmode memory locations SIZEEXP is a tree from
    which we deduce the size to copy in that case.  */
 
-static inline rtx_insn *
-emit_partition_copy (rtx dest, rtx src, int unsignedsrcp, tree sizeexp)
+rtx_insn *
+emit_partition_copy (rtx dest, rtx src, int unsignedsrcp, tree sizeexp,
+		     tree var2 ATTRIBUTE_UNUSED)
 {
   start_sequence ();
 
+  /* If var2 is set, then sizeexp is the src decl and var2 is the dest decl.  */
+  if (var2)
+    {
+      tree src_var = (TREE_CODE (sizeexp) == SSA_NAME
+		      ? SSA_NAME_VAR (sizeexp) : sizeexp);
+      tree dest_var = (TREE_CODE (var2) == SSA_NAME
+		       ? SSA_NAME_VAR (var2) : var2);
+      int src_unsignedp = TYPE_UNSIGNED (TREE_TYPE (src_var));
+      int dest_unsignedp = TYPE_UNSIGNED (TREE_TYPE (dest_var));
+      machine_mode src_mode = promote_decl_mode (src_var, &src_unsignedp);
+      machine_mode dest_mode = promote_decl_mode (dest_var, &dest_unsignedp);
+      if (src_unsignedp != dest_unsignedp
+	  && src_mode != DECL_MODE (src_var)
+	  && dest_mode != DECL_MODE (dest_var))
+	{
+	  src = gen_lowpart_common (DECL_MODE (src_var), src);
+	  unsignedsrcp = dest_unsignedp;
+	}
+    }
+
   if (GET_MODE (src) != VOIDmode && GET_MODE (src) != GET_MODE (dest))
     src = convert_to_mode (GET_MODE (dest), src, unsignedsrcp);
   if (GET_MODE (src) == BLKmode)
@@ -256,7 +277,7 @@ emit_partition_copy (rtx dest, rtx src,
 static void
 insert_partition_copy_on_edge (edge e, int dest, int src, source_location locus)
 {
-  tree var;
+  tree var, var2;
   if (dump_file && (dump_flags & TDF_DETAILS))
     {
       fprintf (dump_file,
@@ -276,10 +297,11 @@ insert_partition_copy_on_edge (edge e, i
     set_curr_insn_location (locus);
 
   var = partition_to_var (SA.map, src);
+  var2 = partition_to_var (SA.map, dest);
   rtx_insn *seq = emit_partition_copy (copy_rtx (SA.partition_to_pseudo[dest]),
 				       copy_rtx (SA.partition_to_pseudo[src]),
 				       TYPE_UNSIGNED (TREE_TYPE (var)),
-				       var);
+				       var, var2);
 
   insert_insn_on_edge (seq, e);
 }
@@ -373,7 +395,8 @@ insert_rtx_to_part_on_edge (edge e, int
      involved), so it doesn't matter.  */
   rtx_insn *seq = emit_partition_copy (copy_rtx (SA.partition_to_pseudo[dest]),
 				       src, unsignedsrcp,
-				       partition_to_var (SA.map, dest));
+				       partition_to_var (SA.map, dest), 0);
+
 
   insert_insn_on_edge (seq, e);
 }
@@ -406,7 +429,7 @@ insert_part_to_rtx_on_edge (edge e, rtx
   rtx_insn *seq = emit_partition_copy (dest,
 				       copy_rtx (SA.partition_to_pseudo[src]),
 				       TYPE_UNSIGNED (TREE_TYPE (var)),
-				       var);
+				       var, 0);
 
   insert_insn_on_edge (seq, e);
 }

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH, ARM] stop changing signedness in PROMOTE_MODE
  2015-07-10 15:46       ` Jim Wilson
@ 2015-07-13  8:19         ` Richard Biener
  2015-07-13 15:29           ` Michael Matz
  0 siblings, 1 reply; 32+ messages in thread
From: Richard Biener @ 2015-07-13  8:19 UTC (permalink / raw)
  To: Jim Wilson, Michael Matz; +Cc: Jeff Law, gcc-patches

On Fri, Jul 10, 2015 at 5:46 PM, Jim Wilson <jim.wilson@linaro.org> wrote:
> On Tue, Jul 7, 2015 at 2:35 PM, Richard Biener
> <richard.guenther@gmail.com> wrote:
>> On July 7, 2015 6:29:21 PM GMT+02:00, Jim Wilson <jim.wilson@linaro.org> wrote:
>>>signed sub-word locals.  Thus to detect the need for a conversion, you
>>>have to have the decls, and we don't have them here.  There is also
>>
>> It probably is.  The decks for the parameter based SSA names are available, for the PHI destination there might be no decl.
>
> I tried looking again, and found the decls.  I'm able to get correct
> code for my testcase with the attached patch to force the conversion.
> It is rather inelegant, but I think I can cache the values I need to
> make this simpler and cleaner.  I still don't have decls from
> insert_part_to_rtx_on_edge and insert_rtx_to_part_on_edge, but it
> looks like those are for breaking cycles, and hence might not need
> conversions.

Yes, that looks like a defect.  CCing Micha who wrote this code

Richard.

> Jim

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH, ARM] stop changing signedness in PROMOTE_MODE
  2015-07-13  8:19         ` Richard Biener
@ 2015-07-13 15:29           ` Michael Matz
  2015-07-13 15:35             ` H.J. Lu
  2015-07-14 16:38             ` Richard Earnshaw
  0 siblings, 2 replies; 32+ messages in thread
From: Michael Matz @ 2015-07-13 15:29 UTC (permalink / raw)
  To: Richard Biener; +Cc: Jim Wilson, Jeff Law, gcc-patches

Hi,

On Mon, 13 Jul 2015, Richard Biener wrote:

> On Fri, Jul 10, 2015 at 5:46 PM, Jim Wilson <jim.wilson@linaro.org> wrote:
> > On Tue, Jul 7, 2015 at 2:35 PM, Richard Biener
> > <richard.guenther@gmail.com> wrote:
> >> On July 7, 2015 6:29:21 PM GMT+02:00, Jim Wilson <jim.wilson@linaro.org> wrote:
> >>>signed sub-word locals.  Thus to detect the need for a conversion, you
> >>>have to have the decls, and we don't have them here.  There is also
> >>
> >> It probably is.  The decks for the parameter based SSA names are available, for the PHI destination there might be no decl.
> >
> > I tried looking again, and found the decls.  I'm able to get correct
> > code for my testcase with the attached patch to force the conversion.
> > It is rather inelegant, but I think I can cache the values I need to
> > make this simpler and cleaner.  I still don't have decls from
> > insert_part_to_rtx_on_edge and insert_rtx_to_part_on_edge, but it
> > looks like those are for breaking cycles, and hence might not need
> > conversions.
> 
> Yes, that looks like a defect.  CCing Micha who wrote this code

I think it's a backend bug that parameters and locals are extended 
differently.  The code in tree-outof-ssa was written with the assumption 
that the modes of RTL objects might be different (larger) than the tree 
types suggest, but that they be _consistent_, i.e. that the particular 
extension depends on only the types, not on the tree-type of the decl.

I think the above assumption does make sense because it's also a 
fundamental assumption in the whole gimple pipeline, types matter, not the 
objects (or better, we slowly but surely work towards this).  Hence such 
mismatches should either not exist (changing the backend), or should be 
exposed explicitely during gimplification already.  The latter is a large 
change, though.

I think dealing with this situation in outof-ssa is a hack and I don't 
like it.  It would extend the ugliness of different modes for same types 
even more, and that's something we should (gradually) move away from.


Ciao,
Michael.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH, ARM] stop changing signedness in PROMOTE_MODE
  2015-07-13 15:29           ` Michael Matz
@ 2015-07-13 15:35             ` H.J. Lu
  2015-07-14 16:38             ` Richard Earnshaw
  1 sibling, 0 replies; 32+ messages in thread
From: H.J. Lu @ 2015-07-13 15:35 UTC (permalink / raw)
  To: Michael Matz; +Cc: Richard Biener, Jim Wilson, Jeff Law, gcc-patches

On Mon, Jul 13, 2015 at 8:29 AM, Michael Matz <matz@suse.de> wrote:
> Hi,
>
> On Mon, 13 Jul 2015, Richard Biener wrote:
>
>> On Fri, Jul 10, 2015 at 5:46 PM, Jim Wilson <jim.wilson@linaro.org> wrote:
>> > On Tue, Jul 7, 2015 at 2:35 PM, Richard Biener
>> > <richard.guenther@gmail.com> wrote:
>> >> On July 7, 2015 6:29:21 PM GMT+02:00, Jim Wilson <jim.wilson@linaro.org> wrote:
>> >>>signed sub-word locals.  Thus to detect the need for a conversion, you
>> >>>have to have the decls, and we don't have them here.  There is also
>> >>
>> >> It probably is.  The decks for the parameter based SSA names are available, for the PHI destination there might be no decl.
>> >
>> > I tried looking again, and found the decls.  I'm able to get correct
>> > code for my testcase with the attached patch to force the conversion.
>> > It is rather inelegant, but I think I can cache the values I need to
>> > make this simpler and cleaner.  I still don't have decls from
>> > insert_part_to_rtx_on_edge and insert_rtx_to_part_on_edge, but it
>> > looks like those are for breaking cycles, and hence might not need
>> > conversions.
>>
>> Yes, that looks like a defect.  CCing Micha who wrote this code
>
> I think it's a backend bug that parameters and locals are extended
> differently.  The code in tree-outof-ssa was written with the assumption
> that the modes of RTL objects might be different (larger) than the tree
> types suggest, but that they be _consistent_, i.e. that the particular
> extension depends on only the types, not on the tree-type of the decl.
>
> I think the above assumption does make sense because it's also a
> fundamental assumption in the whole gimple pipeline, types matter, not the
> objects (or better, we slowly but surely work towards this).  Hence such
> mismatches should either not exist (changing the backend), or should be
> exposed explicitely during gimplification already.  The latter is a large
> change, though.
>
> I think dealing with this situation in outof-ssa is a hack and I don't
> like it.  It would extend the ugliness of different modes for same types
> even more, and that's something we should (gradually) move away from.
>

Different parts of GCC have their own ideas how parameters should
be promoted, which leads to subtle bugs like PR 64037.


-- 
H.J.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH, ARM] stop changing signedness in PROMOTE_MODE
  2015-07-13 15:29           ` Michael Matz
  2015-07-13 15:35             ` H.J. Lu
@ 2015-07-14 16:38             ` Richard Earnshaw
  2015-07-14 16:49               ` Richard Biener
                                 ` (2 more replies)
  1 sibling, 3 replies; 32+ messages in thread
From: Richard Earnshaw @ 2015-07-14 16:38 UTC (permalink / raw)
  To: Michael Matz, Richard Biener; +Cc: Jim Wilson, Jeff Law, gcc-patches

On 13/07/15 16:29, Michael Matz wrote:
> Hi,
> 
> On Mon, 13 Jul 2015, Richard Biener wrote:
> 
>> On Fri, Jul 10, 2015 at 5:46 PM, Jim Wilson <jim.wilson@linaro.org> wrote:
>>> On Tue, Jul 7, 2015 at 2:35 PM, Richard Biener
>>> <richard.guenther@gmail.com> wrote:
>>>> On July 7, 2015 6:29:21 PM GMT+02:00, Jim Wilson <jim.wilson@linaro.org> wrote:
>>>>> signed sub-word locals.  Thus to detect the need for a conversion, you
>>>>> have to have the decls, and we don't have them here.  There is also
>>>>
>>>> It probably is.  The decks for the parameter based SSA names are available, for the PHI destination there might be no decl.
>>>
>>> I tried looking again, and found the decls.  I'm able to get correct
>>> code for my testcase with the attached patch to force the conversion.
>>> It is rather inelegant, but I think I can cache the values I need to
>>> make this simpler and cleaner.  I still don't have decls from
>>> insert_part_to_rtx_on_edge and insert_rtx_to_part_on_edge, but it
>>> looks like those are for breaking cycles, and hence might not need
>>> conversions.
>>
>> Yes, that looks like a defect.  CCing Micha who wrote this code
> 
> I think it's a backend bug that parameters and locals are extended 
> differently.  The code in tree-outof-ssa was written with the assumption 
> that the modes of RTL objects might be different (larger) than the tree 
> types suggest, but that they be _consistent_, i.e. that the particular 
> extension depends on only the types, not on the tree-type of the decl.
> 

We went through this a couple of weeks back.  The backend documentation
for PROMOTE_MODE says:

" For most machines, the macro definition does not change @var{unsignedp}.
However, some machines, have instructions that preferentially handle
either signed or unsigned quantities of certain modes.  For example, on
the DEC Alpha, 32-bit loads from memory and 32-bit add instructions
sign-extend the result to 64 bits.  On such machines, set
@var{unsignedp} according to which kind of extension is more efficient."

So clearly it anticipates that all permitted extensions should work, and
in particular it makes no mention of this having to match some
abi-mandated promotions.  That makes this a MI bug not a target one.

R.


> I think the above assumption does make sense because it's also a 
> fundamental assumption in the whole gimple pipeline, types matter, not the 
> objects (or better, we slowly but surely work towards this).  Hence such 
> mismatches should either not exist (changing the backend), or should be 
> exposed explicitely during gimplification already.  The latter is a large 
> change, though.
> 
> I think dealing with this situation in outof-ssa is a hack and I don't 
> like it.  It would extend the ugliness of different modes for same types 
> even more, and that's something we should (gradually) move away from.
> 
> 
> Ciao,
> Michael.
> 

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH, ARM] stop changing signedness in PROMOTE_MODE
  2015-07-14 16:38             ` Richard Earnshaw
@ 2015-07-14 16:49               ` Richard Biener
  2015-07-14 17:07               ` Jim Wilson
  2015-07-15 13:04               ` Michael Matz
  2 siblings, 0 replies; 32+ messages in thread
From: Richard Biener @ 2015-07-14 16:49 UTC (permalink / raw)
  To: Richard Earnshaw, Michael Matz; +Cc: Jim Wilson, Jeff Law, gcc-patches

On July 14, 2015 6:13:13 PM GMT+02:00, Richard Earnshaw <Richard.Earnshaw@foss.arm.com> wrote:
>On 13/07/15 16:29, Michael Matz wrote:
>> Hi,
>> 
>> On Mon, 13 Jul 2015, Richard Biener wrote:
>> 
>>> On Fri, Jul 10, 2015 at 5:46 PM, Jim Wilson <jim.wilson@linaro.org>
>wrote:
>>>> On Tue, Jul 7, 2015 at 2:35 PM, Richard Biener
>>>> <richard.guenther@gmail.com> wrote:
>>>>> On July 7, 2015 6:29:21 PM GMT+02:00, Jim Wilson
><jim.wilson@linaro.org> wrote:
>>>>>> signed sub-word locals.  Thus to detect the need for a
>conversion, you
>>>>>> have to have the decls, and we don't have them here.  There is
>also
>>>>>
>>>>> It probably is.  The decks for the parameter based SSA names are
>available, for the PHI destination there might be no decl.
>>>>
>>>> I tried looking again, and found the decls.  I'm able to get
>correct
>>>> code for my testcase with the attached patch to force the
>conversion.
>>>> It is rather inelegant, but I think I can cache the values I need
>to
>>>> make this simpler and cleaner.  I still don't have decls from
>>>> insert_part_to_rtx_on_edge and insert_rtx_to_part_on_edge, but it
>>>> looks like those are for breaking cycles, and hence might not need
>>>> conversions.
>>>
>>> Yes, that looks like a defect.  CCing Micha who wrote this code
>> 
>> I think it's a backend bug that parameters and locals are extended 
>> differently.  The code in tree-outof-ssa was written with the
>assumption 
>> that the modes of RTL objects might be different (larger) than the
>tree 
>> types suggest, but that they be _consistent_, i.e. that the
>particular 
>> extension depends on only the types, not on the tree-type of the
>decl.
>> 
>
>We went through this a couple of weeks back.  The backend documentation
>for PROMOTE_MODE says:
>
>" For most machines, the macro definition does not change
>@var{unsignedp}.
>However, some machines, have instructions that preferentially handle
>either signed or unsigned quantities of certain modes.  For example, on
>the DEC Alpha, 32-bit loads from memory and 32-bit add instructions
>sign-extend the result to 64 bits.  On such machines, set
>@var{unsignedp} according to which kind of extension is more
>efficient."
>
>So clearly it anticipates that all permitted extensions should work,
>and
>in particular it makes no mention of this having to match some
>abi-mandated promotions.  That makes this a MI bug not a target one.

We could also decide that it is a documentation defect.  Are there any other targets with this inconsistency?

FWIW I'd prefer to expose the promoted incoming decls after gimplification. Independent on any inconsistency.

Richard.

>R.
>
>
>> I think the above assumption does make sense because it's also a 
>> fundamental assumption in the whole gimple pipeline, types matter,
>not the 
>> objects (or better, we slowly but surely work towards this).  Hence
>such 
>> mismatches should either not exist (changing the backend), or should
>be 
>> exposed explicitely during gimplification already.  The latter is a
>large 
>> change, though.
>> 
>> I think dealing with this situation in outof-ssa is a hack and I
>don't 
>> like it.  It would extend the ugliness of different modes for same
>types 
>> even more, and that's something we should (gradually) move away from.
>> 
>> 
>> Ciao,
>> Michael.
>> 


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH, ARM] stop changing signedness in PROMOTE_MODE
  2015-07-14 16:38             ` Richard Earnshaw
  2015-07-14 16:49               ` Richard Biener
@ 2015-07-14 17:07               ` Jim Wilson
  2015-07-14 17:23                 ` Richard Biener
  2015-07-15 13:25                 ` Michael Matz
  2015-07-15 13:04               ` Michael Matz
  2 siblings, 2 replies; 32+ messages in thread
From: Jim Wilson @ 2015-07-14 17:07 UTC (permalink / raw)
  To: Richard Earnshaw; +Cc: Michael Matz, Richard Biener, Jeff Law, gcc-patches

On Tue, Jul 14, 2015 at 9:13 AM, Richard Earnshaw
<Richard.Earnshaw@foss.arm.com> wrote:
> We went through this a couple of weeks back.  The backend documentation
> for PROMOTE_MODE says:

I disagree that this is a fully resolved issue.  I see clear problems
with how the ARM port uses PROMOTE_MODE.

> " For most machines, the macro definition does not change @var{unsignedp}.
> However, some machines, have instructions that preferentially handle
> either signed or unsigned quantities of certain modes.  For example, on
> the DEC Alpha, 32-bit loads from memory and 32-bit add instructions
> sign-extend the result to 64 bits.  On such machines, set
> @var{unsignedp} according to which kind of extension is more efficient."

The Alpha case is different than the ARM case.  The Alpha only changes
sign for 32-bit values in 64-bit registers.  The alpha happens to have
a nice set of instructions that operates on 32-bit values, that
accepts these wrong-signed values and handle them correctly.  Thus on
the alpha, it appears that there are no extra zero/sign extends
required, and everything is the same speed or faster with the wrong
sign.  The MIPS port works the same way.  This is not true on the arm
though.  The arm port is changing sign on 8 and 16 bit value, but does
not have instructions that directly operate on 8 and 16 bit values.
This requires the compiler to emit extra zero/sign extend instructions
that affect the performance of the code.  Other than the thumb1 8-bit
load, and the pre-armv6 use of andi for 8-bit zero-extend, I haven't
seen anything that is faster in the wrong sign, and there are a number
of operations that are actually slower because of the extra zero/sign
extends required by the arm PROMOTE_MODE definition.  I've given some
examples.

I have since noticed that the relatively new pass to optimize away
unnecessary zero/sign extensions is actually fixing some of the damage
caused by the ARM PROMOTE_MODE definition.  But there are still some
cases that won't get fixed, as per my earlier examples.  It is better
to emit the fast code at the beginning than to rely on an optimization
pass to convert the slow code into fast code.  So I think the ARM
PROMOTE_MODE definition still needs to be fixed.

> So clearly it anticipates that all permitted extensions should work, and
> in particular it makes no mention of this having to match some
> abi-mandated promotions.  That makes this a MI bug not a target one.

If you look at older gcc releases, like gcc-2.95.3, you will see that
there is only PROMOTE_MODE and a boolean PROMOTE_FUNCTION_ARGS which
says whether PROMOTE_MODE is applied to function arguments or not.
You can't have different zero/sign extensions for parameters and
locals in gcc-2.95.3.  The fact that you can have it today is more an
accident than a deliberate choice.  When PROMOTE_FUNCTION_ARGS was
hookized, it was made into a separate function, and for the ARM port,
the code for PROMOTE_MODE was not correctly copied into the new hook,
resulting in the accidental difference for parameter and local
zero/sign promotion for ARM.  The PROMOTE_MODE docs were written back
when different sign/zero extensions for parms/locals weren't possible,
and hence did not consider the case.

Now that we do have the problem, we can't fix it without an ARM port
ABI change, which is undesirable, so we may have to fix it with a MI
change.

There were two MI changes suggested, one was fixing the out-of-ssa
pass to handle SUBREG_PROMOTED_P promotions.  The other was to
disallow creating PHI nodes between parms and locals.  I haven't had a
chance to try implementing the second one yet; I hope to work on that
today.

Jim

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH, ARM] stop changing signedness in PROMOTE_MODE
  2015-07-14 17:07               ` Jim Wilson
@ 2015-07-14 17:23                 ` Richard Biener
  2015-07-15 13:25                 ` Michael Matz
  1 sibling, 0 replies; 32+ messages in thread
From: Richard Biener @ 2015-07-14 17:23 UTC (permalink / raw)
  To: Jim Wilson, Richard Earnshaw; +Cc: Michael Matz, Jeff Law, gcc-patches

On July 14, 2015 6:49:18 PM GMT+02:00, Jim Wilson <jim.wilson@linaro.org> wrote:
>On Tue, Jul 14, 2015 at 9:13 AM, Richard Earnshaw
><Richard.Earnshaw@foss.arm.com> wrote:
>> We went through this a couple of weeks back.  The backend
>documentation
>> for PROMOTE_MODE says:
>
>I disagree that this is a fully resolved issue.  I see clear problems
>with how the ARM port uses PROMOTE_MODE.
>
>> " For most machines, the macro definition does not change
>@var{unsignedp}.
>> However, some machines, have instructions that preferentially handle
>> either signed or unsigned quantities of certain modes.  For example,
>on
>> the DEC Alpha, 32-bit loads from memory and 32-bit add instructions
>> sign-extend the result to 64 bits.  On such machines, set
>> @var{unsignedp} according to which kind of extension is more
>efficient."
>
>The Alpha case is different than the ARM case.  The Alpha only changes
>sign for 32-bit values in 64-bit registers.  The alpha happens to have
>a nice set of instructions that operates on 32-bit values, that
>accepts these wrong-signed values and handle them correctly.  Thus on
>the alpha, it appears that there are no extra zero/sign extends
>required, and everything is the same speed or faster with the wrong
>sign.  The MIPS port works the same way.  This is not true on the arm
>though.  The arm port is changing sign on 8 and 16 bit value, but does
>not have instructions that directly operate on 8 and 16 bit values.
>This requires the compiler to emit extra zero/sign extend instructions
>that affect the performance of the code.  Other than the thumb1 8-bit
>load, and the pre-armv6 use of andi for 8-bit zero-extend, I haven't
>seen anything that is faster in the wrong sign, and there are a number
>of operations that are actually slower because of the extra zero/sign
>extends required by the arm PROMOTE_MODE definition.  I've given some
>examples.
>
>I have since noticed that the relatively new pass to optimize away
>unnecessary zero/sign extensions is actually fixing some of the damage
>caused by the ARM PROMOTE_MODE definition.  But there are still some
>cases that won't get fixed, as per my earlier examples.  It is better
>to emit the fast code at the beginning than to rely on an optimization
>pass to convert the slow code into fast code.  So I think the ARM
>PROMOTE_MODE definition still needs to be fixed.
>
>> So clearly it anticipates that all permitted extensions should work,
>and
>> in particular it makes no mention of this having to match some
>> abi-mandated promotions.  That makes this a MI bug not a target one.
>
>If you look at older gcc releases, like gcc-2.95.3, you will see that
>there is only PROMOTE_MODE and a boolean PROMOTE_FUNCTION_ARGS which
>says whether PROMOTE_MODE is applied to function arguments or not.
>You can't have different zero/sign extensions for parameters and
>locals in gcc-2.95.3.  The fact that you can have it today is more an
>accident than a deliberate choice.  When PROMOTE_FUNCTION_ARGS was
>hookized, it was made into a separate function, and for the ARM port,
>the code for PROMOTE_MODE was not correctly copied into the new hook,
>resulting in the accidental difference for parameter and local
>zero/sign promotion for ARM.  The PROMOTE_MODE docs were written back
>when different sign/zero extensions for parms/locals weren't possible,
>and hence did not consider the case.
>
>Now that we do have the problem, we can't fix it without an ARM port
>ABI change, which is undesirable, so we may have to fix it with a MI
>change.
>
>There were two MI changes suggested, one was fixing the out-of-ssa
>pass to handle SUBREG_PROMOTED_P promotions.  The other was to
>disallow creating PHI nodes between parms and locals.  I haven't had a
>chance to try implementing the second one yet; I hope to work on that
>today.

I will definitely veto such restriction.

Richard.

>Jim


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH, ARM] stop changing signedness in PROMOTE_MODE
  2015-07-14 16:38             ` Richard Earnshaw
  2015-07-14 16:49               ` Richard Biener
  2015-07-14 17:07               ` Jim Wilson
@ 2015-07-15 13:04               ` Michael Matz
  2 siblings, 0 replies; 32+ messages in thread
From: Michael Matz @ 2015-07-15 13:04 UTC (permalink / raw)
  To: Richard Earnshaw; +Cc: Richard Biener, Jim Wilson, Jeff Law, gcc-patches

Hi,

On Tue, 14 Jul 2015, Richard Earnshaw wrote:

> > I think it's a backend bug that parameters and locals are extended 
> > differently.  The code in tree-outof-ssa was written with the 
> > assumption that the modes of RTL objects might be different (larger) 
> > than the tree types suggest, but that they be _consistent_, i.e. that 
> > the particular extension depends on only the types, not on the 
> > tree-type of the decl.
> 
> We went through this a couple of weeks back.  The backend documentation 
> for PROMOTE_MODE says:
> 
> " For most machines, the macro definition does not change 
> @var{unsignedp}. However, some machines, have instructions that 
> preferentially handle either signed or unsigned quantities of certain 
> modes.  For example, on the DEC Alpha, 32-bit loads from memory and 
> 32-bit add instructions sign-extend the result to 64 bits.  On such 
> machines, set @var{unsignedp} according to which kind of extension is 
> more efficient."

We aren't talking about what PROMOTE_MODE is or is not allowed, but about 
an inconsistency between what PROMOTE_MODE does and what 
target.promote_function_mode does.  This is quite clearly a historic 
accident in the ARM port, namely that they extend parameters and locals in 
opposite ways, which originally wasn't even possible to describe in GCC.

> So clearly it anticipates that all permitted extensions should work, and 
> in particular it makes no mention of this having to match some 
> abi-mandated promotions.  That makes this a MI bug not a target one.

All extensions do work just fine, except when the LHS and RHS of a copy 
instruction with the same types require different extension types 
depending on if the rhs is a PARM_DECL or not.  It's also fundamental in 
gimple that copies can't change signedness, and this insonsistency breaks 
that assumption.  Jim says that this is actually harmless in reality, 
because even a copy propagation will at most replace a use of a VAR_DECL 
with a PARM_DECL, and at the point of such use the other promotion would 
be added by expand.  While the latter is true, I haven't really convinced 
myself that this really leads to correct code in all cases.


Ciao,
Michael.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH, ARM] stop changing signedness in PROMOTE_MODE
  2015-07-14 17:07               ` Jim Wilson
  2015-07-14 17:23                 ` Richard Biener
@ 2015-07-15 13:25                 ` Michael Matz
  2015-07-15 16:01                   ` Jim Wilson
  1 sibling, 1 reply; 32+ messages in thread
From: Michael Matz @ 2015-07-15 13:25 UTC (permalink / raw)
  To: Jim Wilson; +Cc: Richard Earnshaw, Richard Biener, Jeff Law, gcc-patches

Hi,

On Tue, 14 Jul 2015, Jim Wilson wrote:

> Now that we do have the problem, we can't fix it without an ARM port ABI 
> change, which is undesirable, so we may have to fix it with a MI change.

What's the ABI implication of fixing the inconsistency?

> There were two MI changes suggested, one was fixing the out-of-ssa pass 
> to handle SUBREG_PROMOTED_P promotions.  The other was to disallow 
> creating PHI nodes between parms and locals.  I haven't had a chance to 
> try implementing the second one yet; I hope to work on that today.

Don't bother with the latter, it doesn't have a chance of being accepted.

If the terrible hack in outof-ssa really will be necessary (and I really 
really hope it won't) then I think I prefer the approach you partly tried 
in comment #12 of PR 65932 already.  Let partition_to_pseudo[] refer to 
the promoted subreg and deal with that situation in emit_partition_copy; 
I'd then hope that the unsignedsrcp parameter could go away (unfortunately 
the sizeexp will have to stay).


Ciao,
Michael.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH, ARM] stop changing signedness in PROMOTE_MODE
  2015-07-15 13:25                 ` Michael Matz
@ 2015-07-15 16:01                   ` Jim Wilson
  2015-07-16  9:40                     ` Richard Earnshaw
  0 siblings, 1 reply; 32+ messages in thread
From: Jim Wilson @ 2015-07-15 16:01 UTC (permalink / raw)
  To: Michael Matz; +Cc: Richard Earnshaw, Richard Biener, Jeff Law, gcc-patches

On Wed, Jul 15, 2015 at 6:04 AM, Michael Matz <matz@suse.de> wrote:
> Hi,
>
> On Tue, 14 Jul 2015, Jim Wilson wrote:
>
>> Now that we do have the problem, we can't fix it without an ARM port ABI
>> change, which is undesirable, so we may have to fix it with a MI change.
>
> What's the ABI implication of fixing the inconsistency?

Currently signed chars and signed shorts are passed sign-extended.  If
we make TARGET_PROMOTE_FUNCTION_MODE work the same as PROMOTE_MODE,
then they will be passed zero-extended.

Given the testcase:

int sub (int) __attribute__ ((noinline));
int sub2 (signed char) __attribute__ ((noinline));
int sub (int i) { return sub2 (i); }
int sub2 (signed char c) { return c & 0xff; }

Currently sub will do a char sign-extend to convert the int to signed
char, and sub2 will do a char zero-extend for the and.  With the
change, sub will do a char zero-extend to convert the int to unsigned
char, and sub2 will do nothing.  If you compile sub without the change
and sub2 with the change, then you lose the and operation and get a
sign-extended char at the end.

>> There were two MI changes suggested, one was fixing the out-of-ssa pass
>> to handle SUBREG_PROMOTED_P promotions.  The other was to disallow
>> creating PHI nodes between parms and locals.  I haven't had a chance to
>> try implementing the second one yet; I hope to work on that today.
>
> Don't bother with the latter, it doesn't have a chance of being accepted.

I tried looking at it anyways, as I need to learn more about this
stuff.  It didn't seem feasible without changing a lot of optimization
passes which doesn't seem reasonable.

> If the terrible hack in outof-ssa really will be necessary (and I really
> really hope it won't) then I think I prefer the approach you partly tried
> in comment #12 of PR 65932 already.  Let partition_to_pseudo[] refer to
> the promoted subreg and deal with that situation in emit_partition_copy;
> I'd then hope that the unsignedsrcp parameter could go away (unfortunately
> the sizeexp will have to stay).

Yes, I think that is a cleaner way to do it, but I had trouble getting
that to work as I don't know enough about the code yet.  Doing it
directly in emit_partition_copy was easier, just to prove it could
work.  I can go back and try to make this work again.

Jim

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH, ARM] stop changing signedness in PROMOTE_MODE
  2015-07-15 16:01                   ` Jim Wilson
@ 2015-07-16  9:40                     ` Richard Earnshaw
  2015-07-16 15:02                       ` Michael Matz
  0 siblings, 1 reply; 32+ messages in thread
From: Richard Earnshaw @ 2015-07-16  9:40 UTC (permalink / raw)
  To: Jim Wilson, Michael Matz; +Cc: Richard Biener, Jeff Law, gcc-patches

On 15/07/15 16:54, Jim Wilson wrote:
> On Wed, Jul 15, 2015 at 6:04 AM, Michael Matz <matz@suse.de> wrote:
>> Hi,
>>
>> On Tue, 14 Jul 2015, Jim Wilson wrote:
>>
>>> Now that we do have the problem, we can't fix it without an ARM port ABI
>>> change, which is undesirable, so we may have to fix it with a MI change.
>>
>> What's the ABI implication of fixing the inconsistency?
> 

I think that's the wrong question.  We wouldn't change the ABI to fix an
internal problem in GCC.  So the real question is what's the performance
impact of changing PROMOTE_MODE to be the same as the ABI requirements?

There's no easy answer to that since we'd have to consider all the
different architecture variants we currently have to support.  At the
very least we'd have to think about ARMv3 (no signed byte or half-word
loads), ARMv4 (signed byte and half-word loads), ARMv4t (thumb1) and
ARMv7 (thumb2).

R.

> Currently signed chars and signed shorts are passed sign-extended.  If
> we make TARGET_PROMOTE_FUNCTION_MODE work the same as PROMOTE_MODE,
> then they will be passed zero-extended.
> 
> Given the testcase:
> 
> int sub (int) __attribute__ ((noinline));
> int sub2 (signed char) __attribute__ ((noinline));
> int sub (int i) { return sub2 (i); }
> int sub2 (signed char c) { return c & 0xff; }
> 
> Currently sub will do a char sign-extend to convert the int to signed
> char, and sub2 will do a char zero-extend for the and.  With the
> change, sub will do a char zero-extend to convert the int to unsigned
> char, and sub2 will do nothing.  If you compile sub without the change
> and sub2 with the change, then you lose the and operation and get a
> sign-extended char at the end.
> 
>>> There were two MI changes suggested, one was fixing the out-of-ssa pass
>>> to handle SUBREG_PROMOTED_P promotions.  The other was to disallow
>>> creating PHI nodes between parms and locals.  I haven't had a chance to
>>> try implementing the second one yet; I hope to work on that today.
>>
>> Don't bother with the latter, it doesn't have a chance of being accepted.
> 
> I tried looking at it anyways, as I need to learn more about this
> stuff.  It didn't seem feasible without changing a lot of optimization
> passes which doesn't seem reasonable.
> 
>> If the terrible hack in outof-ssa really will be necessary (and I really
>> really hope it won't) then I think I prefer the approach you partly tried
>> in comment #12 of PR 65932 already.  Let partition_to_pseudo[] refer to
>> the promoted subreg and deal with that situation in emit_partition_copy;
>> I'd then hope that the unsignedsrcp parameter could go away (unfortunately
>> the sizeexp will have to stay).
> 
> Yes, I think that is a cleaner way to do it, but I had trouble getting
> that to work as I don't know enough about the code yet.  Doing it
> directly in emit_partition_copy was easier, just to prove it could
> work.  I can go back and try to make this work again.
> 
> Jim
> 

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH, ARM] stop changing signedness in PROMOTE_MODE
  2015-07-16  9:40                     ` Richard Earnshaw
@ 2015-07-16 15:02                       ` Michael Matz
  2015-07-16 15:20                         ` Richard Earnshaw
  0 siblings, 1 reply; 32+ messages in thread
From: Michael Matz @ 2015-07-16 15:02 UTC (permalink / raw)
  To: Richard Earnshaw; +Cc: Jim Wilson, Richard Biener, Jeff Law, gcc-patches

Hi,

On Thu, 16 Jul 2015, Richard Earnshaw wrote:

> >>> Now that we do have the problem, we can't fix it without an ARM port 
> >>> ABI change, which is undesirable, so we may have to fix it with a MI 
> >>> change.
> >>
> >> What's the ABI implication of fixing the inconsistency?
> > 
> 
> I think that's the wrong question.  We wouldn't change the ABI to fix an 
> internal problem in GCC.  So the real question is what's the performance 
> impact of changing PROMOTE_MODE to be the same as the ABI requirements?

Perhaps, I really only wanted to get a feeling what type of changes 
in code generation would result with the flip.  I wonder why this ABI 
implication was no problem back when PROMOTE_MODE and 
target.promote_function_mode were seperated and the inconsistency 
introduced.


Ciao,
Michael.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH, ARM] stop changing signedness in PROMOTE_MODE
  2015-07-16 15:02                       ` Michael Matz
@ 2015-07-16 15:20                         ` Richard Earnshaw
  0 siblings, 0 replies; 32+ messages in thread
From: Richard Earnshaw @ 2015-07-16 15:20 UTC (permalink / raw)
  To: Michael Matz; +Cc: Jim Wilson, Richard Biener, Jeff Law, gcc-patches

On 16/07/15 16:00, Michael Matz wrote:
> Hi,
> 
> On Thu, 16 Jul 2015, Richard Earnshaw wrote:
> 
>>>>> Now that we do have the problem, we can't fix it without an ARM port 
>>>>> ABI change, which is undesirable, so we may have to fix it with a MI 
>>>>> change.
>>>>
>>>> What's the ABI implication of fixing the inconsistency?
>>>
>>
>> I think that's the wrong question.  We wouldn't change the ABI to fix an 
>> internal problem in GCC.  So the real question is what's the performance 
>> impact of changing PROMOTE_MODE to be the same as the ABI requirements?
> 
> Perhaps, I really only wanted to get a feeling what type of changes 
> in code generation would result with the flip.  I wonder why this ABI 
> implication was no problem back when PROMOTE_MODE and 
> target.promote_function_mode were seperated and the inconsistency 
> introduced.
> 
> 
> Ciao,
> Michael.
> 

Promote_function_mode requirements were new with the EABI.  However, I
think it's probably only recent changes in the mid-end that have exposed
the problem.

R.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH, ARM] stop changing signedness in PROMOTE_MODE
  2015-06-30  1:56 [PATCH, ARM] stop changing signedness in PROMOTE_MODE Jim Wilson
  2015-07-02  9:07 ` Richard Earnshaw
  2015-07-07 15:07 ` Jeff Law
@ 2016-02-04  8:58 ` Ramana Radhakrishnan
  2016-02-15 11:32   ` Kyrill Tkachov
  2 siblings, 1 reply; 32+ messages in thread
From: Ramana Radhakrishnan @ 2016-02-04  8:58 UTC (permalink / raw)
  To: Jim Wilson; +Cc: gcc-patches

On Tue, Jun 30, 2015 at 2:15 AM, Jim Wilson <jim.wilson@linaro.org> wrote:
> This is my suggested fix for PR 65932, which is a linux kernel
> miscompile with gcc-5.1.
>
> The problem here is caused by a chain of events.  The first is that
> the relatively new eipa_sra pass creates fake parameters that behave
> slightly differently than normal parameters.  The second is that the
> optimizer creates phi nodes that copy local variables to fake
> parameters and/or vice versa.  The third is that the ouf-of-ssa pass
> assumes that it can emit simple move instructions for these phi nodes.
> And the fourth is that the ARM port has a PROMOTE_MODE macro that
> forces QImode and HImode to unsigned, but a
> TARGET_PROMOTE_FUNCTION_MODE hook that does not.  So signed char and
> short parameters have different in register representations than local
> variables, and require a conversion when copying between them, a
> conversion that the out-of-ssa pass can't easily emit.
>
> Ultimately, I think this is a problem in the arm backend.  It should
> not have a PROMOTE_MODE macro that is changing the sign of char and
> short local variables.  I also think that we should merge the
> PROMOTE_MODE macro with the TARGET_PROMOTE_FUNCTION_MODE hook to
> prevent this from happening again.
>
> I see four general problems with the current ARM PROMOTE_MODE definition.
> 1) Unsigned char is only faster for armv5 and earlier, before the sxtb
> instruction was added.  It is a lose for armv6 and later.
> 2) Unsigned short was only faster for targets that don't support
> unaligned accesses.  Support for these targets was removed a while
> ago, and this PROMODE_MODE hunk should have been removed at the same
> time.  It was accidentally left behind.
> 3) TARGET_PROMOTE_FUNCTION_MODE used to be a boolean hook, when it was
> converted to a function, the PROMOTE_MODE code was copied without the
> UNSIGNEDP changes.  Thus it is only an accident that
> TARGET_PROMOTE_FUNCTION_MODE and PROMOTE_MODE disagree.  Changing
> TARGET_PROMOTE_FUNCTION_MODE is an ABI change, so only PROMOTE_MODE
> changes to resolve the difference are safe.
> 4) There is a general principle that you should only change signedness
> in PROMOTE_MODE if the hardware forces it, as otherwise this results
> in extra conversion instructions that make code slower.  The mips64
> hardware for instance requires that 32-bit values be sign-extended
> regardless of type, and instructions may trap if this is not true.
> However, it has a set of 32-bit instructions that operate on these
> values, and hence no conversions are required.  There is no similar
> case on ARM. Thus the conversions are unnecessary and unwise.  This
> can be seen in the testcases where gcc emits both a zero-extend and a
> sign-extend inside a loop, as the sign-extend is required for a
> compare, and the zero-extend is required by PROMOTE_MODE.

Given Kyrill's testing with the patch and the reasonably detailed
check of the effects of code generation changes - The arm.h hunk is ok
- I do think we should make this explicit in the documentation that
TARGET_PROMOTE_MODE and TARGET_PROMOTE_FUNCTION_MODE should agree and
better still maybe put in a checking assert for the same in the
mid-end but that could be the subject of a follow-up patch.

Ok to apply just the arm.h hunk as I think Kyrill has taken care of
the testsuite fallout separately.


regards
Ramana




>
> My change was tested with an arm bootstrap, make check, and SPEC
> CPU2000 run.  The original poster verified that this gives a linux
> kernel that boots correctly.
>
> The PRMOTE_MODE change causes 3 testsuite testcases to fail.  These
> are tests to verify that smulbb and/or smlabb are generated.
> Eliminating the unnecessary sign conversions causes us to get better
> code that doesn't include the smulbb and smlabb instructions.  I had
> to modify the testcases to get them to emit the desired instructions.
> With the testcase changes there are no additional testsuite failures,
> though I'm concerned that these testcases with the changes may be
> fragile, and future changes may break them again.



>
> If there are ARM parts where smulbb/smlabb are faster than mul/mla,
> then maybe we should try to add new patterns to get the instructions
> emitted again for the unmodified testcases.
>
> Jim

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH, ARM] stop changing signedness in PROMOTE_MODE
  2016-02-04  8:58 ` Ramana Radhakrishnan
@ 2016-02-15 11:32   ` Kyrill Tkachov
  2016-02-16 10:44     ` Ramana Radhakrishnan
  2016-02-17 10:03     ` Christophe Lyon
  0 siblings, 2 replies; 32+ messages in thread
From: Kyrill Tkachov @ 2016-02-15 11:32 UTC (permalink / raw)
  To: Ramana Radhakrishnan, Jim Wilson; +Cc: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 5632 bytes --]


On 04/02/16 08:58, Ramana Radhakrishnan wrote:
> On Tue, Jun 30, 2015 at 2:15 AM, Jim Wilson <jim.wilson@linaro.org> wrote:
>> This is my suggested fix for PR 65932, which is a linux kernel
>> miscompile with gcc-5.1.
>>
>> The problem here is caused by a chain of events.  The first is that
>> the relatively new eipa_sra pass creates fake parameters that behave
>> slightly differently than normal parameters.  The second is that the
>> optimizer creates phi nodes that copy local variables to fake
>> parameters and/or vice versa.  The third is that the ouf-of-ssa pass
>> assumes that it can emit simple move instructions for these phi nodes.
>> And the fourth is that the ARM port has a PROMOTE_MODE macro that
>> forces QImode and HImode to unsigned, but a
>> TARGET_PROMOTE_FUNCTION_MODE hook that does not.  So signed char and
>> short parameters have different in register representations than local
>> variables, and require a conversion when copying between them, a
>> conversion that the out-of-ssa pass can't easily emit.
>>
>> Ultimately, I think this is a problem in the arm backend.  It should
>> not have a PROMOTE_MODE macro that is changing the sign of char and
>> short local variables.  I also think that we should merge the
>> PROMOTE_MODE macro with the TARGET_PROMOTE_FUNCTION_MODE hook to
>> prevent this from happening again.
>>
>> I see four general problems with the current ARM PROMOTE_MODE definition.
>> 1) Unsigned char is only faster for armv5 and earlier, before the sxtb
>> instruction was added.  It is a lose for armv6 and later.
>> 2) Unsigned short was only faster for targets that don't support
>> unaligned accesses.  Support for these targets was removed a while
>> ago, and this PROMODE_MODE hunk should have been removed at the same
>> time.  It was accidentally left behind.
>> 3) TARGET_PROMOTE_FUNCTION_MODE used to be a boolean hook, when it was
>> converted to a function, the PROMOTE_MODE code was copied without the
>> UNSIGNEDP changes.  Thus it is only an accident that
>> TARGET_PROMOTE_FUNCTION_MODE and PROMOTE_MODE disagree.  Changing
>> TARGET_PROMOTE_FUNCTION_MODE is an ABI change, so only PROMOTE_MODE
>> changes to resolve the difference are safe.
>> 4) There is a general principle that you should only change signedness
>> in PROMOTE_MODE if the hardware forces it, as otherwise this results
>> in extra conversion instructions that make code slower.  The mips64
>> hardware for instance requires that 32-bit values be sign-extended
>> regardless of type, and instructions may trap if this is not true.
>> However, it has a set of 32-bit instructions that operate on these
>> values, and hence no conversions are required.  There is no similar
>> case on ARM. Thus the conversions are unnecessary and unwise.  This
>> can be seen in the testcases where gcc emits both a zero-extend and a
>> sign-extend inside a loop, as the sign-extend is required for a
>> compare, and the zero-extend is required by PROMOTE_MODE.
> Given Kyrill's testing with the patch and the reasonably detailed
> check of the effects of code generation changes - The arm.h hunk is ok
> - I do think we should make this explicit in the documentation that
> TARGET_PROMOTE_MODE and TARGET_PROMOTE_FUNCTION_MODE should agree and
> better still maybe put in a checking assert for the same in the
> mid-end but that could be the subject of a follow-up patch.
>
> Ok to apply just the arm.h hunk as I think Kyrill has taken care of
> the testsuite fallout separately.
Hi all,

I'd like to backport the arm.h from this ( r233130) to the GCC 5
branch. As the CSE patch from my series had some fallout on x86_64
due to a deficiency in the AVX patterns that is too invasive to fix
at this stage (and presumably backport), I'd like to just backport
this arm.h fix and adjust the tests to XFAIL the fallout that comes
with not applying the CSE patch. The attached patch does that.

The code quality fallout on code outside the testsuite is not
that gread. The SPEC benchmarks are not affected by not applying
the CSE change, and only a single sequence in a popular embedded benchmark
shows some degradation for -mtune=cortex-a9 in the same way as the
wmul-1.c and wmul-2.c tests.

I think that's a fair tradeoff for fixing the wrong code bug on that branch.

Ok to backport r233130 and the attached testsuite patch to the GCC 5 branch?

Thanks,
Kyrill

2016-02-15  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>

     PR target/65932
     * gcc.target/arm/wmul-1.c: Add -mtune=cortex-a9 to dg-options.
     xfail the scan-assembler test.
     * gcc.target/arm/wmul-2.c: Likewise.
     * gcc.target/arm/wmul-3.c: Simplify test to generate a single smulbb.



>
> regards
> Ramana
>
>
>
>
>> My change was tested with an arm bootstrap, make check, and SPEC
>> CPU2000 run.  The original poster verified that this gives a linux
>> kernel that boots correctly.
>>
>> The PRMOTE_MODE change causes 3 testsuite testcases to fail.  These
>> are tests to verify that smulbb and/or smlabb are generated.
>> Eliminating the unnecessary sign conversions causes us to get better
>> code that doesn't include the smulbb and smlabb instructions.  I had
>> to modify the testcases to get them to emit the desired instructions.
>> With the testcase changes there are no additional testsuite failures,
>> though I'm concerned that these testcases with the changes may be
>> fragile, and future changes may break them again.
>
>
>> If there are ARM parts where smulbb/smlabb are faster than mul/mla,
>> then maybe we should try to add new patterns to get the instructions
>> emitted again for the unmodified testcases.
>>
>> Jim


[-- Attachment #2: arm-tests-5.patch --]
[-- Type: text/x-patch, Size: 2405 bytes --]

diff --git a/gcc/testsuite/gcc.target/arm/wmul-1.c b/gcc/testsuite/gcc.target/arm/wmul-1.c
index ddddd509fe645ea98877753773e7bcf9b6787897..ce14769c570537f16f022c8cde35843ab0695f74 100644
--- a/gcc/testsuite/gcc.target/arm/wmul-1.c
+++ b/gcc/testsuite/gcc.target/arm/wmul-1.c
@@ -1,6 +1,6 @@
 /* { dg-do compile } */
 /* { dg-require-effective-target arm_dsp } */
-/* { dg-options "-O1 -fexpensive-optimizations" } */
+/* { dg-options "-O1 -fexpensive-optimizations -mtune=cortex-a9" } */
 
 int mac(const short *a, const short *b, int sqr, int *sum)
 {
@@ -16,4 +16,4 @@ int mac(const short *a, const short *b, int sqr, int *sum)
   return sqr;
 }
 
-/* { dg-final { scan-assembler-times "smlabb" 2 } } */
+/* { dg-final { scan-assembler-times "smlabb" 2  { xfail *-*-* } } } */
diff --git a/gcc/testsuite/gcc.target/arm/wmul-2.c b/gcc/testsuite/gcc.target/arm/wmul-2.c
index 2ea55f9fbe12f74f38754cb72be791fd6e9495f4..a74d81b195b210650fc1bc1da598ac7c1fe82ac2 100644
--- a/gcc/testsuite/gcc.target/arm/wmul-2.c
+++ b/gcc/testsuite/gcc.target/arm/wmul-2.c
@@ -1,6 +1,6 @@
 /* { dg-do compile } */
 /* { dg-require-effective-target arm_dsp } */
-/* { dg-options "-O1 -fexpensive-optimizations" } */
+/* { dg-options "-O1 -fexpensive-optimizations -mtune=cortex-a9" } */
 
 void vec_mpy(int y[], const short x[], short scaler)
 {
@@ -10,4 +10,4 @@ void vec_mpy(int y[], const short x[], short scaler)
    y[i] += ((scaler * x[i]) >> 31);
 }
 
-/* { dg-final { scan-assembler-times "smulbb" 1 } } */
+/* { dg-final { scan-assembler-times "smulbb" 1 { xfail *-*-* } } } */
diff --git a/gcc/testsuite/gcc.target/arm/wmul-3.c b/gcc/testsuite/gcc.target/arm/wmul-3.c
index 144b553082e6158701639f05929987de01e7125a..87eba740142a80a1dc1979b4e79d9272a839e7b2 100644
--- a/gcc/testsuite/gcc.target/arm/wmul-3.c
+++ b/gcc/testsuite/gcc.target/arm/wmul-3.c
@@ -1,19 +1,11 @@
 /* { dg-do compile } */
 /* { dg-require-effective-target arm_dsp } */
-/* { dg-options "-O1 -fexpensive-optimizations" } */
+/* { dg-options "-O" } */
 
-int mac(const short *a, const short *b, int sqr, int *sum)
+int
+foo (int a, int b)
 {
-  int i;
-  int dotp = *sum;
-
-  for (i = 0; i < 150; i++) {
-    dotp -= b[i] * a[i];
-    sqr -= b[i] * b[i];
-  }
-
-  *sum = dotp;
-  return sqr;
+  return (short) a * (short) b;
 }
 
-/* { dg-final { scan-assembler-times "smulbb" 2 } } */
+/* { dg-final { scan-assembler-times "smulbb" 1 } } */

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH, ARM] stop changing signedness in PROMOTE_MODE
  2016-02-15 11:32   ` Kyrill Tkachov
@ 2016-02-16 10:44     ` Ramana Radhakrishnan
  2016-02-17 10:03     ` Christophe Lyon
  1 sibling, 0 replies; 32+ messages in thread
From: Ramana Radhakrishnan @ 2016-02-16 10:44 UTC (permalink / raw)
  To: Kyrill Tkachov; +Cc: Jim Wilson, gcc-patches

On Mon, Feb 15, 2016 at 11:32 AM, Kyrill Tkachov
<kyrylo.tkachov@foss.arm.com> wrote:
>
> On 04/02/16 08:58, Ramana Radhakrishnan wrote:
>>
>> On Tue, Jun 30, 2015 at 2:15 AM, Jim Wilson <jim.wilson@linaro.org> wrote:
>>>
>>> This is my suggested fix for PR 65932, which is a linux kernel
>>> miscompile with gcc-5.1.
>>>
>>> The problem here is caused by a chain of events.  The first is that
>>> the relatively new eipa_sra pass creates fake parameters that behave
>>> slightly differently than normal parameters.  The second is that the
>>> optimizer creates phi nodes that copy local variables to fake
>>> parameters and/or vice versa.  The third is that the ouf-of-ssa pass
>>> assumes that it can emit simple move instructions for these phi nodes.
>>> And the fourth is that the ARM port has a PROMOTE_MODE macro that
>>> forces QImode and HImode to unsigned, but a
>>> TARGET_PROMOTE_FUNCTION_MODE hook that does not.  So signed char and
>>> short parameters have different in register representations than local
>>> variables, and require a conversion when copying between them, a
>>> conversion that the out-of-ssa pass can't easily emit.
>>>
>>> Ultimately, I think this is a problem in the arm backend.  It should
>>> not have a PROMOTE_MODE macro that is changing the sign of char and
>>> short local variables.  I also think that we should merge the
>>> PROMOTE_MODE macro with the TARGET_PROMOTE_FUNCTION_MODE hook to
>>> prevent this from happening again.
>>>
>>> I see four general problems with the current ARM PROMOTE_MODE definition.
>>> 1) Unsigned char is only faster for armv5 and earlier, before the sxtb
>>> instruction was added.  It is a lose for armv6 and later.
>>> 2) Unsigned short was only faster for targets that don't support
>>> unaligned accesses.  Support for these targets was removed a while
>>> ago, and this PROMODE_MODE hunk should have been removed at the same
>>> time.  It was accidentally left behind.
>>> 3) TARGET_PROMOTE_FUNCTION_MODE used to be a boolean hook, when it was
>>> converted to a function, the PROMOTE_MODE code was copied without the
>>> UNSIGNEDP changes.  Thus it is only an accident that
>>> TARGET_PROMOTE_FUNCTION_MODE and PROMOTE_MODE disagree.  Changing
>>> TARGET_PROMOTE_FUNCTION_MODE is an ABI change, so only PROMOTE_MODE
>>> changes to resolve the difference are safe.
>>> 4) There is a general principle that you should only change signedness
>>> in PROMOTE_MODE if the hardware forces it, as otherwise this results
>>> in extra conversion instructions that make code slower.  The mips64
>>> hardware for instance requires that 32-bit values be sign-extended
>>> regardless of type, and instructions may trap if this is not true.
>>> However, it has a set of 32-bit instructions that operate on these
>>> values, and hence no conversions are required.  There is no similar
>>> case on ARM. Thus the conversions are unnecessary and unwise.  This
>>> can be seen in the testcases where gcc emits both a zero-extend and a
>>> sign-extend inside a loop, as the sign-extend is required for a
>>> compare, and the zero-extend is required by PROMOTE_MODE.
>>
>> Given Kyrill's testing with the patch and the reasonably detailed
>> check of the effects of code generation changes - The arm.h hunk is ok
>> - I do think we should make this explicit in the documentation that
>> TARGET_PROMOTE_MODE and TARGET_PROMOTE_FUNCTION_MODE should agree and
>> better still maybe put in a checking assert for the same in the
>> mid-end but that could be the subject of a follow-up patch.
>>
>> Ok to apply just the arm.h hunk as I think Kyrill has taken care of
>> the testsuite fallout separately.
>
> Hi all,
>
> I'd like to backport the arm.h from this ( r233130) to the GCC 5
> branch. As the CSE patch from my series had some fallout on x86_64
> due to a deficiency in the AVX patterns that is too invasive to fix
> at this stage (and presumably backport), I'd like to just backport
> this arm.h fix and adjust the tests to XFAIL the fallout that comes
> with not applying the CSE patch. The attached patch does that.

I would hope that someone looks to fix the AVX port for GCC 7  - as
the CSE patch is something that is useful on ARM for performance in
real terms and could have benefits elsewhere.

OK to apply if no regressions - thanks for pushing this through.

Thanks,
Ramana

>
> The code quality fallout on code outside the testsuite is not
> that gread. The SPEC benchmarks are not affected by not applying
> the CSE change, and only a single sequence in a popular embedded benchmark
> shows some degradation for -mtune=cortex-a9 in the same way as the
> wmul-1.c and wmul-2.c tests.
>
> I think that's a fair tradeoff for fixing the wrong code bug on that branch.
>
> Ok to backport r233130 and the attached testsuite patch to the GCC 5 branch?
>
> Thanks,
> Kyrill
>
> 2016-02-15  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>
>
>     PR target/65932
>     * gcc.target/arm/wmul-1.c: Add -mtune=cortex-a9 to dg-options.
>     xfail the scan-assembler test.
>     * gcc.target/arm/wmul-2.c: Likewise.
>     * gcc.target/arm/wmul-3.c: Simplify test to generate a single smulbb.
>
>
>
>
>>
>> regards
>> Ramana
>>
>>
>>
>>
>>> My change was tested with an arm bootstrap, make check, and SPEC
>>> CPU2000 run.  The original poster verified that this gives a linux
>>> kernel that boots correctly.
>>>
>>> The PRMOTE_MODE change causes 3 testsuite testcases to fail.  These
>>> are tests to verify that smulbb and/or smlabb are generated.
>>> Eliminating the unnecessary sign conversions causes us to get better
>>> code that doesn't include the smulbb and smlabb instructions.  I had
>>> to modify the testcases to get them to emit the desired instructions.
>>> With the testcase changes there are no additional testsuite failures,
>>> though I'm concerned that these testcases with the changes may be
>>> fragile, and future changes may break them again.
>>
>>
>>
>>> If there are ARM parts where smulbb/smlabb are faster than mul/mla,
>>> then maybe we should try to add new patterns to get the instructions
>>> emitted again for the unmodified testcases.
>>>
>>> Jim
>
>

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH, ARM] stop changing signedness in PROMOTE_MODE
  2016-02-15 11:32   ` Kyrill Tkachov
  2016-02-16 10:44     ` Ramana Radhakrishnan
@ 2016-02-17 10:03     ` Christophe Lyon
  2016-02-17 10:05       ` Kyrill Tkachov
  1 sibling, 1 reply; 32+ messages in thread
From: Christophe Lyon @ 2016-02-17 10:03 UTC (permalink / raw)
  To: Kyrill Tkachov; +Cc: Ramana Radhakrishnan, Jim Wilson, gcc-patches

On 15 February 2016 at 12:32, Kyrill Tkachov
<kyrylo.tkachov@foss.arm.com> wrote:
>
> On 04/02/16 08:58, Ramana Radhakrishnan wrote:
>>
>> On Tue, Jun 30, 2015 at 2:15 AM, Jim Wilson <jim.wilson@linaro.org> wrote:
>>>
>>> This is my suggested fix for PR 65932, which is a linux kernel
>>> miscompile with gcc-5.1.
>>>
>>> The problem here is caused by a chain of events.  The first is that
>>> the relatively new eipa_sra pass creates fake parameters that behave
>>> slightly differently than normal parameters.  The second is that the
>>> optimizer creates phi nodes that copy local variables to fake
>>> parameters and/or vice versa.  The third is that the ouf-of-ssa pass
>>> assumes that it can emit simple move instructions for these phi nodes.
>>> And the fourth is that the ARM port has a PROMOTE_MODE macro that
>>> forces QImode and HImode to unsigned, but a
>>> TARGET_PROMOTE_FUNCTION_MODE hook that does not.  So signed char and
>>> short parameters have different in register representations than local
>>> variables, and require a conversion when copying between them, a
>>> conversion that the out-of-ssa pass can't easily emit.
>>>
>>> Ultimately, I think this is a problem in the arm backend.  It should
>>> not have a PROMOTE_MODE macro that is changing the sign of char and
>>> short local variables.  I also think that we should merge the
>>> PROMOTE_MODE macro with the TARGET_PROMOTE_FUNCTION_MODE hook to
>>> prevent this from happening again.
>>>
>>> I see four general problems with the current ARM PROMOTE_MODE definition.
>>> 1) Unsigned char is only faster for armv5 and earlier, before the sxtb
>>> instruction was added.  It is a lose for armv6 and later.
>>> 2) Unsigned short was only faster for targets that don't support
>>> unaligned accesses.  Support for these targets was removed a while
>>> ago, and this PROMODE_MODE hunk should have been removed at the same
>>> time.  It was accidentally left behind.
>>> 3) TARGET_PROMOTE_FUNCTION_MODE used to be a boolean hook, when it was
>>> converted to a function, the PROMOTE_MODE code was copied without the
>>> UNSIGNEDP changes.  Thus it is only an accident that
>>> TARGET_PROMOTE_FUNCTION_MODE and PROMOTE_MODE disagree.  Changing
>>> TARGET_PROMOTE_FUNCTION_MODE is an ABI change, so only PROMOTE_MODE
>>> changes to resolve the difference are safe.
>>> 4) There is a general principle that you should only change signedness
>>> in PROMOTE_MODE if the hardware forces it, as otherwise this results
>>> in extra conversion instructions that make code slower.  The mips64
>>> hardware for instance requires that 32-bit values be sign-extended
>>> regardless of type, and instructions may trap if this is not true.
>>> However, it has a set of 32-bit instructions that operate on these
>>> values, and hence no conversions are required.  There is no similar
>>> case on ARM. Thus the conversions are unnecessary and unwise.  This
>>> can be seen in the testcases where gcc emits both a zero-extend and a
>>> sign-extend inside a loop, as the sign-extend is required for a
>>> compare, and the zero-extend is required by PROMOTE_MODE.
>>
>> Given Kyrill's testing with the patch and the reasonably detailed
>> check of the effects of code generation changes - The arm.h hunk is ok
>> - I do think we should make this explicit in the documentation that
>> TARGET_PROMOTE_MODE and TARGET_PROMOTE_FUNCTION_MODE should agree and
>> better still maybe put in a checking assert for the same in the
>> mid-end but that could be the subject of a follow-up patch.
>>
>> Ok to apply just the arm.h hunk as I think Kyrill has taken care of
>> the testsuite fallout separately.
>
> Hi all,
>
> I'd like to backport the arm.h from this ( r233130) to the GCC 5
> branch. As the CSE patch from my series had some fallout on x86_64
> due to a deficiency in the AVX patterns that is too invasive to fix
> at this stage (and presumably backport), I'd like to just backport
> this arm.h fix and adjust the tests to XFAIL the fallout that comes
> with not applying the CSE patch. The attached patch does that.
>
> The code quality fallout on code outside the testsuite is not
> that gread. The SPEC benchmarks are not affected by not applying
> the CSE change, and only a single sequence in a popular embedded benchmark
> shows some degradation for -mtune=cortex-a9 in the same way as the
> wmul-1.c and wmul-2.c tests.
>
> I think that's a fair tradeoff for fixing the wrong code bug on that branch.
>
> Ok to backport r233130 and the attached testsuite patch to the GCC 5 branch?
>
> Thanks,
> Kyrill
>
> 2016-02-15  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>
>
>     PR target/65932
>     * gcc.target/arm/wmul-1.c: Add -mtune=cortex-a9 to dg-options.
>     xfail the scan-assembler test.
>     * gcc.target/arm/wmul-2.c: Likewise.
>     * gcc.target/arm/wmul-3.c: Simplify test to generate a single smulbb.
>
>
Hi Kyrill,

I've noticed that wmul-3 still fails on the gcc-5 branch when forcing GCC
configuration to:
--with-cpu cortex-a5 --with-fpu vfpv3-d16-fp16
(target arm-none-linux-gnueabihf)

The generated code is:
        sxth    r0, r0
        sxth    r1, r1
        mul     r0, r1, r0
instead of
        smulbb  r0, r1, r0
on trunk.

I guess we don't worry?

>
>
>>
>> regards
>> Ramana
>>
>>
>>
>>
>>> My change was tested with an arm bootstrap, make check, and SPEC
>>> CPU2000 run.  The original poster verified that this gives a linux
>>> kernel that boots correctly.
>>>
>>> The PRMOTE_MODE change causes 3 testsuite testcases to fail.  These
>>> are tests to verify that smulbb and/or smlabb are generated.
>>> Eliminating the unnecessary sign conversions causes us to get better
>>> code that doesn't include the smulbb and smlabb instructions.  I had
>>> to modify the testcases to get them to emit the desired instructions.
>>> With the testcase changes there are no additional testsuite failures,
>>> though I'm concerned that these testcases with the changes may be
>>> fragile, and future changes may break them again.
>>
>>
>>
>>> If there are ARM parts where smulbb/smlabb are faster than mul/mla,
>>> then maybe we should try to add new patterns to get the instructions
>>> emitted again for the unmodified testcases.
>>>
>>> Jim
>
>

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH, ARM] stop changing signedness in PROMOTE_MODE
  2016-02-17 10:03     ` Christophe Lyon
@ 2016-02-17 10:05       ` Kyrill Tkachov
  2016-02-17 10:20         ` Christophe Lyon
  0 siblings, 1 reply; 32+ messages in thread
From: Kyrill Tkachov @ 2016-02-17 10:05 UTC (permalink / raw)
  To: Christophe Lyon; +Cc: Ramana Radhakrishnan, Jim Wilson, gcc-patches


On 17/02/16 10:03, Christophe Lyon wrote:
> On 15 February 2016 at 12:32, Kyrill Tkachov
> <kyrylo.tkachov@foss.arm.com> wrote:
>> On 04/02/16 08:58, Ramana Radhakrishnan wrote:
>>> On Tue, Jun 30, 2015 at 2:15 AM, Jim Wilson <jim.wilson@linaro.org> wrote:
>>>> This is my suggested fix for PR 65932, which is a linux kernel
>>>> miscompile with gcc-5.1.
>>>>
>>>> The problem here is caused by a chain of events.  The first is that
>>>> the relatively new eipa_sra pass creates fake parameters that behave
>>>> slightly differently than normal parameters.  The second is that the
>>>> optimizer creates phi nodes that copy local variables to fake
>>>> parameters and/or vice versa.  The third is that the ouf-of-ssa pass
>>>> assumes that it can emit simple move instructions for these phi nodes.
>>>> And the fourth is that the ARM port has a PROMOTE_MODE macro that
>>>> forces QImode and HImode to unsigned, but a
>>>> TARGET_PROMOTE_FUNCTION_MODE hook that does not.  So signed char and
>>>> short parameters have different in register representations than local
>>>> variables, and require a conversion when copying between them, a
>>>> conversion that the out-of-ssa pass can't easily emit.
>>>>
>>>> Ultimately, I think this is a problem in the arm backend.  It should
>>>> not have a PROMOTE_MODE macro that is changing the sign of char and
>>>> short local variables.  I also think that we should merge the
>>>> PROMOTE_MODE macro with the TARGET_PROMOTE_FUNCTION_MODE hook to
>>>> prevent this from happening again.
>>>>
>>>> I see four general problems with the current ARM PROMOTE_MODE definition.
>>>> 1) Unsigned char is only faster for armv5 and earlier, before the sxtb
>>>> instruction was added.  It is a lose for armv6 and later.
>>>> 2) Unsigned short was only faster for targets that don't support
>>>> unaligned accesses.  Support for these targets was removed a while
>>>> ago, and this PROMODE_MODE hunk should have been removed at the same
>>>> time.  It was accidentally left behind.
>>>> 3) TARGET_PROMOTE_FUNCTION_MODE used to be a boolean hook, when it was
>>>> converted to a function, the PROMOTE_MODE code was copied without the
>>>> UNSIGNEDP changes.  Thus it is only an accident that
>>>> TARGET_PROMOTE_FUNCTION_MODE and PROMOTE_MODE disagree.  Changing
>>>> TARGET_PROMOTE_FUNCTION_MODE is an ABI change, so only PROMOTE_MODE
>>>> changes to resolve the difference are safe.
>>>> 4) There is a general principle that you should only change signedness
>>>> in PROMOTE_MODE if the hardware forces it, as otherwise this results
>>>> in extra conversion instructions that make code slower.  The mips64
>>>> hardware for instance requires that 32-bit values be sign-extended
>>>> regardless of type, and instructions may trap if this is not true.
>>>> However, it has a set of 32-bit instructions that operate on these
>>>> values, and hence no conversions are required.  There is no similar
>>>> case on ARM. Thus the conversions are unnecessary and unwise.  This
>>>> can be seen in the testcases where gcc emits both a zero-extend and a
>>>> sign-extend inside a loop, as the sign-extend is required for a
>>>> compare, and the zero-extend is required by PROMOTE_MODE.
>>> Given Kyrill's testing with the patch and the reasonably detailed
>>> check of the effects of code generation changes - The arm.h hunk is ok
>>> - I do think we should make this explicit in the documentation that
>>> TARGET_PROMOTE_MODE and TARGET_PROMOTE_FUNCTION_MODE should agree and
>>> better still maybe put in a checking assert for the same in the
>>> mid-end but that could be the subject of a follow-up patch.
>>>
>>> Ok to apply just the arm.h hunk as I think Kyrill has taken care of
>>> the testsuite fallout separately.
>> Hi all,
>>
>> I'd like to backport the arm.h from this ( r233130) to the GCC 5
>> branch. As the CSE patch from my series had some fallout on x86_64
>> due to a deficiency in the AVX patterns that is too invasive to fix
>> at this stage (and presumably backport), I'd like to just backport
>> this arm.h fix and adjust the tests to XFAIL the fallout that comes
>> with not applying the CSE patch. The attached patch does that.
>>
>> The code quality fallout on code outside the testsuite is not
>> that gread. The SPEC benchmarks are not affected by not applying
>> the CSE change, and only a single sequence in a popular embedded benchmark
>> shows some degradation for -mtune=cortex-a9 in the same way as the
>> wmul-1.c and wmul-2.c tests.
>>
>> I think that's a fair tradeoff for fixing the wrong code bug on that branch.
>>
>> Ok to backport r233130 and the attached testsuite patch to the GCC 5 branch?
>>
>> Thanks,
>> Kyrill
>>
>> 2016-02-15  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>
>>
>>      PR target/65932
>>      * gcc.target/arm/wmul-1.c: Add -mtune=cortex-a9 to dg-options.
>>      xfail the scan-assembler test.
>>      * gcc.target/arm/wmul-2.c: Likewise.
>>      * gcc.target/arm/wmul-3.c: Simplify test to generate a single smulbb.
>>
>>
> Hi Kyrill,
>
> I've noticed that wmul-3 still fails on the gcc-5 branch when forcing GCC
> configuration to:
> --with-cpu cortex-a5 --with-fpu vfpv3-d16-fp16
> (target arm-none-linux-gnueabihf)
>
> The generated code is:
>          sxth    r0, r0
>          sxth    r1, r1
>          mul     r0, r1, r0
> instead of
>          smulbb  r0, r1, r0
> on trunk.
>
> I guess we don't worry?

Hi Christophe,
Hmmm, I suspect we might want to backport https://gcc.gnu.org/ml/gcc-patches/2016-01/msg01714.html
to fix backend the costing logic of smulbb.
Could you please try that patch to see if it helps?

Thanks,
Kyrill

>>
>>> regards
>>> Ramana
>>>
>>>
>>>
>>>
>>>> My change was tested with an arm bootstrap, make check, and SPEC
>>>> CPU2000 run.  The original poster verified that this gives a linux
>>>> kernel that boots correctly.
>>>>
>>>> The PRMOTE_MODE change causes 3 testsuite testcases to fail.  These
>>>> are tests to verify that smulbb and/or smlabb are generated.
>>>> Eliminating the unnecessary sign conversions causes us to get better
>>>> code that doesn't include the smulbb and smlabb instructions.  I had
>>>> to modify the testcases to get them to emit the desired instructions.
>>>> With the testcase changes there are no additional testsuite failures,
>>>> though I'm concerned that these testcases with the changes may be
>>>> fragile, and future changes may break them again.
>>>
>>>
>>>> If there are ARM parts where smulbb/smlabb are faster than mul/mla,
>>>> then maybe we should try to add new patterns to get the instructions
>>>> emitted again for the unmodified testcases.
>>>>
>>>> Jim
>>

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH, ARM] stop changing signedness in PROMOTE_MODE
  2016-02-17 10:05       ` Kyrill Tkachov
@ 2016-02-17 10:20         ` Christophe Lyon
  2016-02-17 10:22           ` Kyrill Tkachov
  2016-03-07  4:43           ` Ramana Radhakrishnan
  0 siblings, 2 replies; 32+ messages in thread
From: Christophe Lyon @ 2016-02-17 10:20 UTC (permalink / raw)
  To: Kyrill Tkachov; +Cc: Ramana Radhakrishnan, Jim Wilson, gcc-patches

[-- Attachment #1: Type: text/plain, Size: 7082 bytes --]

On 17 February 2016 at 11:05, Kyrill Tkachov
<kyrylo.tkachov@foss.arm.com> wrote:
>
> On 17/02/16 10:03, Christophe Lyon wrote:
>>
>> On 15 February 2016 at 12:32, Kyrill Tkachov
>> <kyrylo.tkachov@foss.arm.com> wrote:
>>>
>>> On 04/02/16 08:58, Ramana Radhakrishnan wrote:
>>>>
>>>> On Tue, Jun 30, 2015 at 2:15 AM, Jim Wilson <jim.wilson@linaro.org>
>>>> wrote:
>>>>>
>>>>> This is my suggested fix for PR 65932, which is a linux kernel
>>>>> miscompile with gcc-5.1.
>>>>>
>>>>> The problem here is caused by a chain of events.  The first is that
>>>>> the relatively new eipa_sra pass creates fake parameters that behave
>>>>> slightly differently than normal parameters.  The second is that the
>>>>> optimizer creates phi nodes that copy local variables to fake
>>>>> parameters and/or vice versa.  The third is that the ouf-of-ssa pass
>>>>> assumes that it can emit simple move instructions for these phi nodes.
>>>>> And the fourth is that the ARM port has a PROMOTE_MODE macro that
>>>>> forces QImode and HImode to unsigned, but a
>>>>> TARGET_PROMOTE_FUNCTION_MODE hook that does not.  So signed char and
>>>>> short parameters have different in register representations than local
>>>>> variables, and require a conversion when copying between them, a
>>>>> conversion that the out-of-ssa pass can't easily emit.
>>>>>
>>>>> Ultimately, I think this is a problem in the arm backend.  It should
>>>>> not have a PROMOTE_MODE macro that is changing the sign of char and
>>>>> short local variables.  I also think that we should merge the
>>>>> PROMOTE_MODE macro with the TARGET_PROMOTE_FUNCTION_MODE hook to
>>>>> prevent this from happening again.
>>>>>
>>>>> I see four general problems with the current ARM PROMOTE_MODE
>>>>> definition.
>>>>> 1) Unsigned char is only faster for armv5 and earlier, before the sxtb
>>>>> instruction was added.  It is a lose for armv6 and later.
>>>>> 2) Unsigned short was only faster for targets that don't support
>>>>> unaligned accesses.  Support for these targets was removed a while
>>>>> ago, and this PROMODE_MODE hunk should have been removed at the same
>>>>> time.  It was accidentally left behind.
>>>>> 3) TARGET_PROMOTE_FUNCTION_MODE used to be a boolean hook, when it was
>>>>> converted to a function, the PROMOTE_MODE code was copied without the
>>>>> UNSIGNEDP changes.  Thus it is only an accident that
>>>>> TARGET_PROMOTE_FUNCTION_MODE and PROMOTE_MODE disagree.  Changing
>>>>> TARGET_PROMOTE_FUNCTION_MODE is an ABI change, so only PROMOTE_MODE
>>>>> changes to resolve the difference are safe.
>>>>> 4) There is a general principle that you should only change signedness
>>>>> in PROMOTE_MODE if the hardware forces it, as otherwise this results
>>>>> in extra conversion instructions that make code slower.  The mips64
>>>>> hardware for instance requires that 32-bit values be sign-extended
>>>>> regardless of type, and instructions may trap if this is not true.
>>>>> However, it has a set of 32-bit instructions that operate on these
>>>>> values, and hence no conversions are required.  There is no similar
>>>>> case on ARM. Thus the conversions are unnecessary and unwise.  This
>>>>> can be seen in the testcases where gcc emits both a zero-extend and a
>>>>> sign-extend inside a loop, as the sign-extend is required for a
>>>>> compare, and the zero-extend is required by PROMOTE_MODE.
>>>>
>>>> Given Kyrill's testing with the patch and the reasonably detailed
>>>> check of the effects of code generation changes - The arm.h hunk is ok
>>>> - I do think we should make this explicit in the documentation that
>>>> TARGET_PROMOTE_MODE and TARGET_PROMOTE_FUNCTION_MODE should agree and
>>>> better still maybe put in a checking assert for the same in the
>>>> mid-end but that could be the subject of a follow-up patch.
>>>>
>>>> Ok to apply just the arm.h hunk as I think Kyrill has taken care of
>>>> the testsuite fallout separately.
>>>
>>> Hi all,
>>>
>>> I'd like to backport the arm.h from this ( r233130) to the GCC 5
>>> branch. As the CSE patch from my series had some fallout on x86_64
>>> due to a deficiency in the AVX patterns that is too invasive to fix
>>> at this stage (and presumably backport), I'd like to just backport
>>> this arm.h fix and adjust the tests to XFAIL the fallout that comes
>>> with not applying the CSE patch. The attached patch does that.
>>>
>>> The code quality fallout on code outside the testsuite is not
>>> that gread. The SPEC benchmarks are not affected by not applying
>>> the CSE change, and only a single sequence in a popular embedded
>>> benchmark
>>> shows some degradation for -mtune=cortex-a9 in the same way as the
>>> wmul-1.c and wmul-2.c tests.
>>>
>>> I think that's a fair tradeoff for fixing the wrong code bug on that
>>> branch.
>>>
>>> Ok to backport r233130 and the attached testsuite patch to the GCC 5
>>> branch?
>>>
>>> Thanks,
>>> Kyrill
>>>
>>> 2016-02-15  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>
>>>
>>>      PR target/65932
>>>      * gcc.target/arm/wmul-1.c: Add -mtune=cortex-a9 to dg-options.
>>>      xfail the scan-assembler test.
>>>      * gcc.target/arm/wmul-2.c: Likewise.
>>>      * gcc.target/arm/wmul-3.c: Simplify test to generate a single
>>> smulbb.
>>>
>>>
>> Hi Kyrill,
>>
>> I've noticed that wmul-3 still fails on the gcc-5 branch when forcing GCC
>> configuration to:
>> --with-cpu cortex-a5 --with-fpu vfpv3-d16-fp16
>> (target arm-none-linux-gnueabihf)
>>
>> The generated code is:
>>          sxth    r0, r0
>>          sxth    r1, r1
>>          mul     r0, r1, r0
>> instead of
>>          smulbb  r0, r1, r0
>> on trunk.
>>
>> I guess we don't worry?
>
>
> Hi Christophe,
> Hmmm, I suspect we might want to backport
> https://gcc.gnu.org/ml/gcc-patches/2016-01/msg01714.html
> to fix backend the costing logic of smulbb.
> Could you please try that patch to see if it helps?
>

Ha indeed, with the attached patch, we now generate smulbb.
I didn't run a full make check though.

OK with a suitable ChangeLog entry?

Christophe.

> Thanks,
> Kyrill
>
>
>>>
>>>> regards
>>>> Ramana
>>>>
>>>>
>>>>
>>>>
>>>>> My change was tested with an arm bootstrap, make check, and SPEC
>>>>> CPU2000 run.  The original poster verified that this gives a linux
>>>>> kernel that boots correctly.
>>>>>
>>>>> The PRMOTE_MODE change causes 3 testsuite testcases to fail.  These
>>>>> are tests to verify that smulbb and/or smlabb are generated.
>>>>> Eliminating the unnecessary sign conversions causes us to get better
>>>>> code that doesn't include the smulbb and smlabb instructions.  I had
>>>>> to modify the testcases to get them to emit the desired instructions.
>>>>> With the testcase changes there are no additional testsuite failures,
>>>>> though I'm concerned that these testcases with the changes may be
>>>>> fragile, and future changes may break them again.
>>>>
>>>>
>>>>
>>>>> If there are ARM parts where smulbb/smlabb are faster than mul/mla,
>>>>> then maybe we should try to add new patterns to get the instructions
>>>>> emitted again for the unmodified testcases.
>>>>>
>>>>> Jim
>>>
>>>
>

[-- Attachment #2: smulbb.patch --]
[-- Type: text/x-diff, Size: 626 bytes --]

Index: gcc/config/arm/arm.c
===================================================================
--- gcc/config/arm/arm.c	(revision 233484)
+++ gcc/config/arm/arm.c	(working copy)
@@ -10306,8 +10306,10 @@
 	      /* SMUL[TB][TB].  */
 	      if (speed_p)
 		*cost += extra_cost->mult[0].extend;
-	      *cost += (rtx_cost (XEXP (x, 0), SIGN_EXTEND, 0, speed_p)
-			+ rtx_cost (XEXP (x, 1), SIGN_EXTEND, 0, speed_p));
+	      *cost += rtx_cost (XEXP (XEXP (x, 0), 0),
+				 SIGN_EXTEND, 0, speed_p);
+	      *cost += rtx_cost (XEXP (XEXP (x, 1), 0),
+				 SIGN_EXTEND, 1, speed_p);
 	      return true;
 	    }
 	  if (speed_p)

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH, ARM] stop changing signedness in PROMOTE_MODE
  2016-02-17 10:20         ` Christophe Lyon
@ 2016-02-17 10:22           ` Kyrill Tkachov
  2016-02-18 10:16             ` Christophe Lyon
  2016-03-07  4:43           ` Ramana Radhakrishnan
  1 sibling, 1 reply; 32+ messages in thread
From: Kyrill Tkachov @ 2016-02-17 10:22 UTC (permalink / raw)
  To: Christophe Lyon; +Cc: Ramana Radhakrishnan, Jim Wilson, gcc-patches


On 17/02/16 10:20, Christophe Lyon wrote:
> On 17 February 2016 at 11:05, Kyrill Tkachov
> <kyrylo.tkachov@foss.arm.com> wrote:
>> On 17/02/16 10:03, Christophe Lyon wrote:
>>> On 15 February 2016 at 12:32, Kyrill Tkachov
>>> <kyrylo.tkachov@foss.arm.com> wrote:
>>>> On 04/02/16 08:58, Ramana Radhakrishnan wrote:
>>>>> On Tue, Jun 30, 2015 at 2:15 AM, Jim Wilson <jim.wilson@linaro.org>
>>>>> wrote:
>>>>>> This is my suggested fix for PR 65932, which is a linux kernel
>>>>>> miscompile with gcc-5.1.
>>>>>>
>>>>>> The problem here is caused by a chain of events.  The first is that
>>>>>> the relatively new eipa_sra pass creates fake parameters that behave
>>>>>> slightly differently than normal parameters.  The second is that the
>>>>>> optimizer creates phi nodes that copy local variables to fake
>>>>>> parameters and/or vice versa.  The third is that the ouf-of-ssa pass
>>>>>> assumes that it can emit simple move instructions for these phi nodes.
>>>>>> And the fourth is that the ARM port has a PROMOTE_MODE macro that
>>>>>> forces QImode and HImode to unsigned, but a
>>>>>> TARGET_PROMOTE_FUNCTION_MODE hook that does not.  So signed char and
>>>>>> short parameters have different in register representations than local
>>>>>> variables, and require a conversion when copying between them, a
>>>>>> conversion that the out-of-ssa pass can't easily emit.
>>>>>>
>>>>>> Ultimately, I think this is a problem in the arm backend.  It should
>>>>>> not have a PROMOTE_MODE macro that is changing the sign of char and
>>>>>> short local variables.  I also think that we should merge the
>>>>>> PROMOTE_MODE macro with the TARGET_PROMOTE_FUNCTION_MODE hook to
>>>>>> prevent this from happening again.
>>>>>>
>>>>>> I see four general problems with the current ARM PROMOTE_MODE
>>>>>> definition.
>>>>>> 1) Unsigned char is only faster for armv5 and earlier, before the sxtb
>>>>>> instruction was added.  It is a lose for armv6 and later.
>>>>>> 2) Unsigned short was only faster for targets that don't support
>>>>>> unaligned accesses.  Support for these targets was removed a while
>>>>>> ago, and this PROMODE_MODE hunk should have been removed at the same
>>>>>> time.  It was accidentally left behind.
>>>>>> 3) TARGET_PROMOTE_FUNCTION_MODE used to be a boolean hook, when it was
>>>>>> converted to a function, the PROMOTE_MODE code was copied without the
>>>>>> UNSIGNEDP changes.  Thus it is only an accident that
>>>>>> TARGET_PROMOTE_FUNCTION_MODE and PROMOTE_MODE disagree.  Changing
>>>>>> TARGET_PROMOTE_FUNCTION_MODE is an ABI change, so only PROMOTE_MODE
>>>>>> changes to resolve the difference are safe.
>>>>>> 4) There is a general principle that you should only change signedness
>>>>>> in PROMOTE_MODE if the hardware forces it, as otherwise this results
>>>>>> in extra conversion instructions that make code slower.  The mips64
>>>>>> hardware for instance requires that 32-bit values be sign-extended
>>>>>> regardless of type, and instructions may trap if this is not true.
>>>>>> However, it has a set of 32-bit instructions that operate on these
>>>>>> values, and hence no conversions are required.  There is no similar
>>>>>> case on ARM. Thus the conversions are unnecessary and unwise.  This
>>>>>> can be seen in the testcases where gcc emits both a zero-extend and a
>>>>>> sign-extend inside a loop, as the sign-extend is required for a
>>>>>> compare, and the zero-extend is required by PROMOTE_MODE.
>>>>> Given Kyrill's testing with the patch and the reasonably detailed
>>>>> check of the effects of code generation changes - The arm.h hunk is ok
>>>>> - I do think we should make this explicit in the documentation that
>>>>> TARGET_PROMOTE_MODE and TARGET_PROMOTE_FUNCTION_MODE should agree and
>>>>> better still maybe put in a checking assert for the same in the
>>>>> mid-end but that could be the subject of a follow-up patch.
>>>>>
>>>>> Ok to apply just the arm.h hunk as I think Kyrill has taken care of
>>>>> the testsuite fallout separately.
>>>> Hi all,
>>>>
>>>> I'd like to backport the arm.h from this ( r233130) to the GCC 5
>>>> branch. As the CSE patch from my series had some fallout on x86_64
>>>> due to a deficiency in the AVX patterns that is too invasive to fix
>>>> at this stage (and presumably backport), I'd like to just backport
>>>> this arm.h fix and adjust the tests to XFAIL the fallout that comes
>>>> with not applying the CSE patch. The attached patch does that.
>>>>
>>>> The code quality fallout on code outside the testsuite is not
>>>> that gread. The SPEC benchmarks are not affected by not applying
>>>> the CSE change, and only a single sequence in a popular embedded
>>>> benchmark
>>>> shows some degradation for -mtune=cortex-a9 in the same way as the
>>>> wmul-1.c and wmul-2.c tests.
>>>>
>>>> I think that's a fair tradeoff for fixing the wrong code bug on that
>>>> branch.
>>>>
>>>> Ok to backport r233130 and the attached testsuite patch to the GCC 5
>>>> branch?
>>>>
>>>> Thanks,
>>>> Kyrill
>>>>
>>>> 2016-02-15  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>
>>>>
>>>>       PR target/65932
>>>>       * gcc.target/arm/wmul-1.c: Add -mtune=cortex-a9 to dg-options.
>>>>       xfail the scan-assembler test.
>>>>       * gcc.target/arm/wmul-2.c: Likewise.
>>>>       * gcc.target/arm/wmul-3.c: Simplify test to generate a single
>>>> smulbb.
>>>>
>>>>
>>> Hi Kyrill,
>>>
>>> I've noticed that wmul-3 still fails on the gcc-5 branch when forcing GCC
>>> configuration to:
>>> --with-cpu cortex-a5 --with-fpu vfpv3-d16-fp16
>>> (target arm-none-linux-gnueabihf)
>>>
>>> The generated code is:
>>>           sxth    r0, r0
>>>           sxth    r1, r1
>>>           mul     r0, r1, r0
>>> instead of
>>>           smulbb  r0, r1, r0
>>> on trunk.
>>>
>>> I guess we don't worry?
>>
>> Hi Christophe,
>> Hmmm, I suspect we might want to backport
>> https://gcc.gnu.org/ml/gcc-patches/2016-01/msg01714.html
>> to fix backend the costing logic of smulbb.
>> Could you please try that patch to see if it helps?
>>
> Ha indeed, with the attached patch, we now generate smulbb.
> I didn't run a full make check though.

Thanks for checking.

> OK with a suitable ChangeLog entry?

Can you please do a full test run when you get the chance?
Since the patch is mine we'll need an ok from another arm maintainer ;)

Thanks,
Kyrill

>
> Christophe.
>
>> Thanks,
>> Kyrill
>>
>>
>>>>> regards
>>>>> Ramana
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>> My change was tested with an arm bootstrap, make check, and SPEC
>>>>>> CPU2000 run.  The original poster verified that this gives a linux
>>>>>> kernel that boots correctly.
>>>>>>
>>>>>> The PRMOTE_MODE change causes 3 testsuite testcases to fail.  These
>>>>>> are tests to verify that smulbb and/or smlabb are generated.
>>>>>> Eliminating the unnecessary sign conversions causes us to get better
>>>>>> code that doesn't include the smulbb and smlabb instructions.  I had
>>>>>> to modify the testcases to get them to emit the desired instructions.
>>>>>> With the testcase changes there are no additional testsuite failures,
>>>>>> though I'm concerned that these testcases with the changes may be
>>>>>> fragile, and future changes may break them again.
>>>>>
>>>>>
>>>>>> If there are ARM parts where smulbb/smlabb are faster than mul/mla,
>>>>>> then maybe we should try to add new patterns to get the instructions
>>>>>> emitted again for the unmodified testcases.
>>>>>>
>>>>>> Jim
>>>>

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH, ARM] stop changing signedness in PROMOTE_MODE
  2016-02-17 10:22           ` Kyrill Tkachov
@ 2016-02-18 10:16             ` Christophe Lyon
  0 siblings, 0 replies; 32+ messages in thread
From: Christophe Lyon @ 2016-02-18 10:16 UTC (permalink / raw)
  To: Kyrill Tkachov; +Cc: Ramana Radhakrishnan, Jim Wilson, gcc-patches

On 17 February 2016 at 11:22, Kyrill Tkachov
<kyrylo.tkachov@foss.arm.com> wrote:
>
> On 17/02/16 10:20, Christophe Lyon wrote:
>>
>> On 17 February 2016 at 11:05, Kyrill Tkachov
>> <kyrylo.tkachov@foss.arm.com> wrote:
>>>
>>> On 17/02/16 10:03, Christophe Lyon wrote:
>>>>
>>>> On 15 February 2016 at 12:32, Kyrill Tkachov
>>>> <kyrylo.tkachov@foss.arm.com> wrote:
>>>>>
>>>>> On 04/02/16 08:58, Ramana Radhakrishnan wrote:
>>>>>>
>>>>>> On Tue, Jun 30, 2015 at 2:15 AM, Jim Wilson <jim.wilson@linaro.org>
>>>>>> wrote:
>>>>>>>
>>>>>>> This is my suggested fix for PR 65932, which is a linux kernel
>>>>>>> miscompile with gcc-5.1.
>>>>>>>
>>>>>>> The problem here is caused by a chain of events.  The first is that
>>>>>>> the relatively new eipa_sra pass creates fake parameters that behave
>>>>>>> slightly differently than normal parameters.  The second is that the
>>>>>>> optimizer creates phi nodes that copy local variables to fake
>>>>>>> parameters and/or vice versa.  The third is that the ouf-of-ssa pass
>>>>>>> assumes that it can emit simple move instructions for these phi
>>>>>>> nodes.
>>>>>>> And the fourth is that the ARM port has a PROMOTE_MODE macro that
>>>>>>> forces QImode and HImode to unsigned, but a
>>>>>>> TARGET_PROMOTE_FUNCTION_MODE hook that does not.  So signed char and
>>>>>>> short parameters have different in register representations than
>>>>>>> local
>>>>>>> variables, and require a conversion when copying between them, a
>>>>>>> conversion that the out-of-ssa pass can't easily emit.
>>>>>>>
>>>>>>> Ultimately, I think this is a problem in the arm backend.  It should
>>>>>>> not have a PROMOTE_MODE macro that is changing the sign of char and
>>>>>>> short local variables.  I also think that we should merge the
>>>>>>> PROMOTE_MODE macro with the TARGET_PROMOTE_FUNCTION_MODE hook to
>>>>>>> prevent this from happening again.
>>>>>>>
>>>>>>> I see four general problems with the current ARM PROMOTE_MODE
>>>>>>> definition.
>>>>>>> 1) Unsigned char is only faster for armv5 and earlier, before the
>>>>>>> sxtb
>>>>>>> instruction was added.  It is a lose for armv6 and later.
>>>>>>> 2) Unsigned short was only faster for targets that don't support
>>>>>>> unaligned accesses.  Support for these targets was removed a while
>>>>>>> ago, and this PROMODE_MODE hunk should have been removed at the same
>>>>>>> time.  It was accidentally left behind.
>>>>>>> 3) TARGET_PROMOTE_FUNCTION_MODE used to be a boolean hook, when it
>>>>>>> was
>>>>>>> converted to a function, the PROMOTE_MODE code was copied without the
>>>>>>> UNSIGNEDP changes.  Thus it is only an accident that
>>>>>>> TARGET_PROMOTE_FUNCTION_MODE and PROMOTE_MODE disagree.  Changing
>>>>>>> TARGET_PROMOTE_FUNCTION_MODE is an ABI change, so only PROMOTE_MODE
>>>>>>> changes to resolve the difference are safe.
>>>>>>> 4) There is a general principle that you should only change
>>>>>>> signedness
>>>>>>> in PROMOTE_MODE if the hardware forces it, as otherwise this results
>>>>>>> in extra conversion instructions that make code slower.  The mips64
>>>>>>> hardware for instance requires that 32-bit values be sign-extended
>>>>>>> regardless of type, and instructions may trap if this is not true.
>>>>>>> However, it has a set of 32-bit instructions that operate on these
>>>>>>> values, and hence no conversions are required.  There is no similar
>>>>>>> case on ARM. Thus the conversions are unnecessary and unwise.  This
>>>>>>> can be seen in the testcases where gcc emits both a zero-extend and a
>>>>>>> sign-extend inside a loop, as the sign-extend is required for a
>>>>>>> compare, and the zero-extend is required by PROMOTE_MODE.
>>>>>>
>>>>>> Given Kyrill's testing with the patch and the reasonably detailed
>>>>>> check of the effects of code generation changes - The arm.h hunk is ok
>>>>>> - I do think we should make this explicit in the documentation that
>>>>>> TARGET_PROMOTE_MODE and TARGET_PROMOTE_FUNCTION_MODE should agree and
>>>>>> better still maybe put in a checking assert for the same in the
>>>>>> mid-end but that could be the subject of a follow-up patch.
>>>>>>
>>>>>> Ok to apply just the arm.h hunk as I think Kyrill has taken care of
>>>>>> the testsuite fallout separately.
>>>>>
>>>>> Hi all,
>>>>>
>>>>> I'd like to backport the arm.h from this ( r233130) to the GCC 5
>>>>> branch. As the CSE patch from my series had some fallout on x86_64
>>>>> due to a deficiency in the AVX patterns that is too invasive to fix
>>>>> at this stage (and presumably backport), I'd like to just backport
>>>>> this arm.h fix and adjust the tests to XFAIL the fallout that comes
>>>>> with not applying the CSE patch. The attached patch does that.
>>>>>
>>>>> The code quality fallout on code outside the testsuite is not
>>>>> that gread. The SPEC benchmarks are not affected by not applying
>>>>> the CSE change, and only a single sequence in a popular embedded
>>>>> benchmark
>>>>> shows some degradation for -mtune=cortex-a9 in the same way as the
>>>>> wmul-1.c and wmul-2.c tests.
>>>>>
>>>>> I think that's a fair tradeoff for fixing the wrong code bug on that
>>>>> branch.
>>>>>
>>>>> Ok to backport r233130 and the attached testsuite patch to the GCC 5
>>>>> branch?
>>>>>
>>>>> Thanks,
>>>>> Kyrill
>>>>>
>>>>> 2016-02-15  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>
>>>>>
>>>>>       PR target/65932
>>>>>       * gcc.target/arm/wmul-1.c: Add -mtune=cortex-a9 to dg-options.
>>>>>       xfail the scan-assembler test.
>>>>>       * gcc.target/arm/wmul-2.c: Likewise.
>>>>>       * gcc.target/arm/wmul-3.c: Simplify test to generate a single
>>>>> smulbb.
>>>>>
>>>>>
>>>> Hi Kyrill,
>>>>
>>>> I've noticed that wmul-3 still fails on the gcc-5 branch when forcing
>>>> GCC
>>>> configuration to:
>>>> --with-cpu cortex-a5 --with-fpu vfpv3-d16-fp16
>>>> (target arm-none-linux-gnueabihf)
>>>>
>>>> The generated code is:
>>>>           sxth    r0, r0
>>>>           sxth    r1, r1
>>>>           mul     r0, r1, r0
>>>> instead of
>>>>           smulbb  r0, r1, r0
>>>> on trunk.
>>>>
>>>> I guess we don't worry?
>>>
>>>
>>> Hi Christophe,
>>> Hmmm, I suspect we might want to backport
>>> https://gcc.gnu.org/ml/gcc-patches/2016-01/msg01714.html
>>> to fix backend the costing logic of smulbb.
>>> Could you please try that patch to see if it helps?
>>>
>> Ha indeed, with the attached patch, we now generate smulbb.
>> I didn't run a full make check though.
>
>
> Thanks for checking.
>
>> OK with a suitable ChangeLog entry?
>
>
> Can you please do a full test run when you get the chance?

Done, I noticed no other difference.

> Since the patch is mine we'll need an ok from another arm maintainer ;)
OK :)

> Thanks,
> Kyrill
>
>
>>
>> Christophe.
>>
>>> Thanks,
>>> Kyrill
>>>
>>>
>>>>>> regards
>>>>>> Ramana
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>> My change was tested with an arm bootstrap, make check, and SPEC
>>>>>>> CPU2000 run.  The original poster verified that this gives a linux
>>>>>>> kernel that boots correctly.
>>>>>>>
>>>>>>> The PRMOTE_MODE change causes 3 testsuite testcases to fail.  These
>>>>>>> are tests to verify that smulbb and/or smlabb are generated.
>>>>>>> Eliminating the unnecessary sign conversions causes us to get better
>>>>>>> code that doesn't include the smulbb and smlabb instructions.  I had
>>>>>>> to modify the testcases to get them to emit the desired instructions.
>>>>>>> With the testcase changes there are no additional testsuite failures,
>>>>>>> though I'm concerned that these testcases with the changes may be
>>>>>>> fragile, and future changes may break them again.
>>>>>>
>>>>>>
>>>>>>
>>>>>>> If there are ARM parts where smulbb/smlabb are faster than mul/mla,
>>>>>>> then maybe we should try to add new patterns to get the instructions
>>>>>>> emitted again for the unmodified testcases.
>>>>>>>
>>>>>>> Jim
>>>>>
>>>>>
>

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH, ARM] stop changing signedness in PROMOTE_MODE
  2016-02-17 10:20         ` Christophe Lyon
  2016-02-17 10:22           ` Kyrill Tkachov
@ 2016-03-07  4:43           ` Ramana Radhakrishnan
  2016-03-07 12:55             ` Christophe Lyon
  1 sibling, 1 reply; 32+ messages in thread
From: Ramana Radhakrishnan @ 2016-03-07  4:43 UTC (permalink / raw)
  To: Christophe Lyon; +Cc: Kyrill Tkachov, Jim Wilson, gcc-patches

On Wed, Feb 17, 2016 at 10:20 AM, Christophe Lyon
<christophe.lyon@linaro.org> wrote:
> On 17 February 2016 at 11:05, Kyrill Tkachov
> <kyrylo.tkachov@foss.arm.com> wrote:
>>
>> On 17/02/16 10:03, Christophe Lyon wrote:
>>>
>>> On 15 February 2016 at 12:32, Kyrill Tkachov
>>> <kyrylo.tkachov@foss.arm.com> wrote:
>>>>
>>>> On 04/02/16 08:58, Ramana Radhakrishnan wrote:
>>>>>
>>>>> On Tue, Jun 30, 2015 at 2:15 AM, Jim Wilson <jim.wilson@linaro.org>
>>>>> wrote:
>>>>>>
>>>>>> This is my suggested fix for PR 65932, which is a linux kernel
>>>>>> miscompile with gcc-5.1.
>>>>>>
>>>>>> The problem here is caused by a chain of events.  The first is that
>>>>>> the relatively new eipa_sra pass creates fake parameters that behave
>>>>>> slightly differently than normal parameters.  The second is that the
>>>>>> optimizer creates phi nodes that copy local variables to fake
>>>>>> parameters and/or vice versa.  The third is that the ouf-of-ssa pass
>>>>>> assumes that it can emit simple move instructions for these phi nodes.
>>>>>> And the fourth is that the ARM port has a PROMOTE_MODE macro that
>>>>>> forces QImode and HImode to unsigned, but a
>>>>>> TARGET_PROMOTE_FUNCTION_MODE hook that does not.  So signed char and
>>>>>> short parameters have different in register representations than local
>>>>>> variables, and require a conversion when copying between them, a
>>>>>> conversion that the out-of-ssa pass can't easily emit.
>>>>>>
>>>>>> Ultimately, I think this is a problem in the arm backend.  It should
>>>>>> not have a PROMOTE_MODE macro that is changing the sign of char and
>>>>>> short local variables.  I also think that we should merge the
>>>>>> PROMOTE_MODE macro with the TARGET_PROMOTE_FUNCTION_MODE hook to
>>>>>> prevent this from happening again.
>>>>>>
>>>>>> I see four general problems with the current ARM PROMOTE_MODE
>>>>>> definition.
>>>>>> 1) Unsigned char is only faster for armv5 and earlier, before the sxtb
>>>>>> instruction was added.  It is a lose for armv6 and later.
>>>>>> 2) Unsigned short was only faster for targets that don't support
>>>>>> unaligned accesses.  Support for these targets was removed a while
>>>>>> ago, and this PROMODE_MODE hunk should have been removed at the same
>>>>>> time.  It was accidentally left behind.
>>>>>> 3) TARGET_PROMOTE_FUNCTION_MODE used to be a boolean hook, when it was
>>>>>> converted to a function, the PROMOTE_MODE code was copied without the
>>>>>> UNSIGNEDP changes.  Thus it is only an accident that
>>>>>> TARGET_PROMOTE_FUNCTION_MODE and PROMOTE_MODE disagree.  Changing
>>>>>> TARGET_PROMOTE_FUNCTION_MODE is an ABI change, so only PROMOTE_MODE
>>>>>> changes to resolve the difference are safe.
>>>>>> 4) There is a general principle that you should only change signedness
>>>>>> in PROMOTE_MODE if the hardware forces it, as otherwise this results
>>>>>> in extra conversion instructions that make code slower.  The mips64
>>>>>> hardware for instance requires that 32-bit values be sign-extended
>>>>>> regardless of type, and instructions may trap if this is not true.
>>>>>> However, it has a set of 32-bit instructions that operate on these
>>>>>> values, and hence no conversions are required.  There is no similar
>>>>>> case on ARM. Thus the conversions are unnecessary and unwise.  This
>>>>>> can be seen in the testcases where gcc emits both a zero-extend and a
>>>>>> sign-extend inside a loop, as the sign-extend is required for a
>>>>>> compare, and the zero-extend is required by PROMOTE_MODE.
>>>>>
>>>>> Given Kyrill's testing with the patch and the reasonably detailed
>>>>> check of the effects of code generation changes - The arm.h hunk is ok
>>>>> - I do think we should make this explicit in the documentation that
>>>>> TARGET_PROMOTE_MODE and TARGET_PROMOTE_FUNCTION_MODE should agree and
>>>>> better still maybe put in a checking assert for the same in the
>>>>> mid-end but that could be the subject of a follow-up patch.
>>>>>
>>>>> Ok to apply just the arm.h hunk as I think Kyrill has taken care of
>>>>> the testsuite fallout separately.
>>>>
>>>> Hi all,
>>>>
>>>> I'd like to backport the arm.h from this ( r233130) to the GCC 5
>>>> branch. As the CSE patch from my series had some fallout on x86_64
>>>> due to a deficiency in the AVX patterns that is too invasive to fix
>>>> at this stage (and presumably backport), I'd like to just backport
>>>> this arm.h fix and adjust the tests to XFAIL the fallout that comes
>>>> with not applying the CSE patch. The attached patch does that.
>>>>
>>>> The code quality fallout on code outside the testsuite is not
>>>> that gread. The SPEC benchmarks are not affected by not applying
>>>> the CSE change, and only a single sequence in a popular embedded
>>>> benchmark
>>>> shows some degradation for -mtune=cortex-a9 in the same way as the
>>>> wmul-1.c and wmul-2.c tests.
>>>>
>>>> I think that's a fair tradeoff for fixing the wrong code bug on that
>>>> branch.
>>>>
>>>> Ok to backport r233130 and the attached testsuite patch to the GCC 5
>>>> branch?
>>>>
>>>> Thanks,
>>>> Kyrill
>>>>
>>>> 2016-02-15  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>
>>>>
>>>>      PR target/65932
>>>>      * gcc.target/arm/wmul-1.c: Add -mtune=cortex-a9 to dg-options.
>>>>      xfail the scan-assembler test.
>>>>      * gcc.target/arm/wmul-2.c: Likewise.
>>>>      * gcc.target/arm/wmul-3.c: Simplify test to generate a single
>>>> smulbb.
>>>>
>>>>
>>> Hi Kyrill,
>>>
>>> I've noticed that wmul-3 still fails on the gcc-5 branch when forcing GCC
>>> configuration to:
>>> --with-cpu cortex-a5 --with-fpu vfpv3-d16-fp16
>>> (target arm-none-linux-gnueabihf)
>>>
>>> The generated code is:
>>>          sxth    r0, r0
>>>          sxth    r1, r1
>>>          mul     r0, r1, r0
>>> instead of
>>>          smulbb  r0, r1, r0
>>> on trunk.
>>>
>>> I guess we don't worry?
>>
>>
>> Hi Christophe,
>> Hmmm, I suspect we might want to backport
>> https://gcc.gnu.org/ml/gcc-patches/2016-01/msg01714.html
>> to fix backend the costing logic of smulbb.
>> Could you please try that patch to see if it helps?
>>
>
> Ha indeed, with the attached patch, we now generate smulbb.
> I didn't run a full make check though.
>
> OK with a suitable ChangeLog entry?

Ok with a suitable Changelog entry and if no regressions.

Ramana

>
> Christophe.
>
>> Thanks,
>> Kyrill
>>
>>
>>>>
>>>>> regards
>>>>> Ramana
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>> My change was tested with an arm bootstrap, make check, and SPEC
>>>>>> CPU2000 run.  The original poster verified that this gives a linux
>>>>>> kernel that boots correctly.
>>>>>>
>>>>>> The PRMOTE_MODE change causes 3 testsuite testcases to fail.  These
>>>>>> are tests to verify that smulbb and/or smlabb are generated.
>>>>>> Eliminating the unnecessary sign conversions causes us to get better
>>>>>> code that doesn't include the smulbb and smlabb instructions.  I had
>>>>>> to modify the testcases to get them to emit the desired instructions.
>>>>>> With the testcase changes there are no additional testsuite failures,
>>>>>> though I'm concerned that these testcases with the changes may be
>>>>>> fragile, and future changes may break them again.
>>>>>
>>>>>
>>>>>
>>>>>> If there are ARM parts where smulbb/smlabb are faster than mul/mla,
>>>>>> then maybe we should try to add new patterns to get the instructions
>>>>>> emitted again for the unmodified testcases.
>>>>>>
>>>>>> Jim
>>>>
>>>>
>>

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH, ARM] stop changing signedness in PROMOTE_MODE
  2016-03-07  4:43           ` Ramana Radhakrishnan
@ 2016-03-07 12:55             ` Christophe Lyon
  0 siblings, 0 replies; 32+ messages in thread
From: Christophe Lyon @ 2016-03-07 12:55 UTC (permalink / raw)
  To: Ramana Radhakrishnan; +Cc: Kyrill Tkachov, Jim Wilson, gcc-patches

On 7 March 2016 at 05:43, Ramana Radhakrishnan
<ramana.gcc@googlemail.com> wrote:
> On Wed, Feb 17, 2016 at 10:20 AM, Christophe Lyon
> <christophe.lyon@linaro.org> wrote:
>> On 17 February 2016 at 11:05, Kyrill Tkachov
>> <kyrylo.tkachov@foss.arm.com> wrote:
>>>
>>> On 17/02/16 10:03, Christophe Lyon wrote:
>>>>
>>>> On 15 February 2016 at 12:32, Kyrill Tkachov
>>>> <kyrylo.tkachov@foss.arm.com> wrote:
>>>>>
>>>>> On 04/02/16 08:58, Ramana Radhakrishnan wrote:
>>>>>>
>>>>>> On Tue, Jun 30, 2015 at 2:15 AM, Jim Wilson <jim.wilson@linaro.org>
>>>>>> wrote:
>>>>>>>
>>>>>>> This is my suggested fix for PR 65932, which is a linux kernel
>>>>>>> miscompile with gcc-5.1.
>>>>>>>
>>>>>>> The problem here is caused by a chain of events.  The first is that
>>>>>>> the relatively new eipa_sra pass creates fake parameters that behave
>>>>>>> slightly differently than normal parameters.  The second is that the
>>>>>>> optimizer creates phi nodes that copy local variables to fake
>>>>>>> parameters and/or vice versa.  The third is that the ouf-of-ssa pass
>>>>>>> assumes that it can emit simple move instructions for these phi nodes.
>>>>>>> And the fourth is that the ARM port has a PROMOTE_MODE macro that
>>>>>>> forces QImode and HImode to unsigned, but a
>>>>>>> TARGET_PROMOTE_FUNCTION_MODE hook that does not.  So signed char and
>>>>>>> short parameters have different in register representations than local
>>>>>>> variables, and require a conversion when copying between them, a
>>>>>>> conversion that the out-of-ssa pass can't easily emit.
>>>>>>>
>>>>>>> Ultimately, I think this is a problem in the arm backend.  It should
>>>>>>> not have a PROMOTE_MODE macro that is changing the sign of char and
>>>>>>> short local variables.  I also think that we should merge the
>>>>>>> PROMOTE_MODE macro with the TARGET_PROMOTE_FUNCTION_MODE hook to
>>>>>>> prevent this from happening again.
>>>>>>>
>>>>>>> I see four general problems with the current ARM PROMOTE_MODE
>>>>>>> definition.
>>>>>>> 1) Unsigned char is only faster for armv5 and earlier, before the sxtb
>>>>>>> instruction was added.  It is a lose for armv6 and later.
>>>>>>> 2) Unsigned short was only faster for targets that don't support
>>>>>>> unaligned accesses.  Support for these targets was removed a while
>>>>>>> ago, and this PROMODE_MODE hunk should have been removed at the same
>>>>>>> time.  It was accidentally left behind.
>>>>>>> 3) TARGET_PROMOTE_FUNCTION_MODE used to be a boolean hook, when it was
>>>>>>> converted to a function, the PROMOTE_MODE code was copied without the
>>>>>>> UNSIGNEDP changes.  Thus it is only an accident that
>>>>>>> TARGET_PROMOTE_FUNCTION_MODE and PROMOTE_MODE disagree.  Changing
>>>>>>> TARGET_PROMOTE_FUNCTION_MODE is an ABI change, so only PROMOTE_MODE
>>>>>>> changes to resolve the difference are safe.
>>>>>>> 4) There is a general principle that you should only change signedness
>>>>>>> in PROMOTE_MODE if the hardware forces it, as otherwise this results
>>>>>>> in extra conversion instructions that make code slower.  The mips64
>>>>>>> hardware for instance requires that 32-bit values be sign-extended
>>>>>>> regardless of type, and instructions may trap if this is not true.
>>>>>>> However, it has a set of 32-bit instructions that operate on these
>>>>>>> values, and hence no conversions are required.  There is no similar
>>>>>>> case on ARM. Thus the conversions are unnecessary and unwise.  This
>>>>>>> can be seen in the testcases where gcc emits both a zero-extend and a
>>>>>>> sign-extend inside a loop, as the sign-extend is required for a
>>>>>>> compare, and the zero-extend is required by PROMOTE_MODE.
>>>>>>
>>>>>> Given Kyrill's testing with the patch and the reasonably detailed
>>>>>> check of the effects of code generation changes - The arm.h hunk is ok
>>>>>> - I do think we should make this explicit in the documentation that
>>>>>> TARGET_PROMOTE_MODE and TARGET_PROMOTE_FUNCTION_MODE should agree and
>>>>>> better still maybe put in a checking assert for the same in the
>>>>>> mid-end but that could be the subject of a follow-up patch.
>>>>>>
>>>>>> Ok to apply just the arm.h hunk as I think Kyrill has taken care of
>>>>>> the testsuite fallout separately.
>>>>>
>>>>> Hi all,
>>>>>
>>>>> I'd like to backport the arm.h from this ( r233130) to the GCC 5
>>>>> branch. As the CSE patch from my series had some fallout on x86_64
>>>>> due to a deficiency in the AVX patterns that is too invasive to fix
>>>>> at this stage (and presumably backport), I'd like to just backport
>>>>> this arm.h fix and adjust the tests to XFAIL the fallout that comes
>>>>> with not applying the CSE patch. The attached patch does that.
>>>>>
>>>>> The code quality fallout on code outside the testsuite is not
>>>>> that gread. The SPEC benchmarks are not affected by not applying
>>>>> the CSE change, and only a single sequence in a popular embedded
>>>>> benchmark
>>>>> shows some degradation for -mtune=cortex-a9 in the same way as the
>>>>> wmul-1.c and wmul-2.c tests.
>>>>>
>>>>> I think that's a fair tradeoff for fixing the wrong code bug on that
>>>>> branch.
>>>>>
>>>>> Ok to backport r233130 and the attached testsuite patch to the GCC 5
>>>>> branch?
>>>>>
>>>>> Thanks,
>>>>> Kyrill
>>>>>
>>>>> 2016-02-15  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>
>>>>>
>>>>>      PR target/65932
>>>>>      * gcc.target/arm/wmul-1.c: Add -mtune=cortex-a9 to dg-options.
>>>>>      xfail the scan-assembler test.
>>>>>      * gcc.target/arm/wmul-2.c: Likewise.
>>>>>      * gcc.target/arm/wmul-3.c: Simplify test to generate a single
>>>>> smulbb.
>>>>>
>>>>>
>>>> Hi Kyrill,
>>>>
>>>> I've noticed that wmul-3 still fails on the gcc-5 branch when forcing GCC
>>>> configuration to:
>>>> --with-cpu cortex-a5 --with-fpu vfpv3-d16-fp16
>>>> (target arm-none-linux-gnueabihf)
>>>>
>>>> The generated code is:
>>>>          sxth    r0, r0
>>>>          sxth    r1, r1
>>>>          mul     r0, r1, r0
>>>> instead of
>>>>          smulbb  r0, r1, r0
>>>> on trunk.
>>>>
>>>> I guess we don't worry?
>>>
>>>
>>> Hi Christophe,
>>> Hmmm, I suspect we might want to backport
>>> https://gcc.gnu.org/ml/gcc-patches/2016-01/msg01714.html
>>> to fix backend the costing logic of smulbb.
>>> Could you please try that patch to see if it helps?
>>>
>>
>> Ha indeed, with the attached patch, we now generate smulbb.
>> I didn't run a full make check though.
>>
>> OK with a suitable ChangeLog entry?
>
> Ok with a suitable Changelog entry and if no regressions.
>
Thanks, committed as r2340169, with no regression:
http://people.linaro.org/~christophe.lyon/cross-validation/gcc/gcc-5-branch/234019/report-build-info.html

> Ramana
>
>>
>> Christophe.
>>
>>> Thanks,
>>> Kyrill
>>>
>>>
>>>>>
>>>>>> regards
>>>>>> Ramana
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>> My change was tested with an arm bootstrap, make check, and SPEC
>>>>>>> CPU2000 run.  The original poster verified that this gives a linux
>>>>>>> kernel that boots correctly.
>>>>>>>
>>>>>>> The PRMOTE_MODE change causes 3 testsuite testcases to fail.  These
>>>>>>> are tests to verify that smulbb and/or smlabb are generated.
>>>>>>> Eliminating the unnecessary sign conversions causes us to get better
>>>>>>> code that doesn't include the smulbb and smlabb instructions.  I had
>>>>>>> to modify the testcases to get them to emit the desired instructions.
>>>>>>> With the testcase changes there are no additional testsuite failures,
>>>>>>> though I'm concerned that these testcases with the changes may be
>>>>>>> fragile, and future changes may break them again.
>>>>>>
>>>>>>
>>>>>>
>>>>>>> If there are ARM parts where smulbb/smlabb are faster than mul/mla,
>>>>>>> then maybe we should try to add new patterns to get the instructions
>>>>>>> emitted again for the unmodified testcases.
>>>>>>>
>>>>>>> Jim
>>>>>
>>>>>
>>>

^ permalink raw reply	[flat|nested] 32+ messages in thread

end of thread, other threads:[~2016-03-07 12:55 UTC | newest]

Thread overview: 32+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-06-30  1:56 [PATCH, ARM] stop changing signedness in PROMOTE_MODE Jim Wilson
2015-07-02  9:07 ` Richard Earnshaw
2015-07-07 18:25   ` Jim Wilson
2015-07-07 15:07 ` Jeff Law
2015-07-07 16:29   ` Jim Wilson
2015-07-07 21:35     ` Richard Biener
2015-07-10 15:46       ` Jim Wilson
2015-07-13  8:19         ` Richard Biener
2015-07-13 15:29           ` Michael Matz
2015-07-13 15:35             ` H.J. Lu
2015-07-14 16:38             ` Richard Earnshaw
2015-07-14 16:49               ` Richard Biener
2015-07-14 17:07               ` Jim Wilson
2015-07-14 17:23                 ` Richard Biener
2015-07-15 13:25                 ` Michael Matz
2015-07-15 16:01                   ` Jim Wilson
2015-07-16  9:40                     ` Richard Earnshaw
2015-07-16 15:02                       ` Michael Matz
2015-07-16 15:20                         ` Richard Earnshaw
2015-07-15 13:04               ` Michael Matz
2015-07-08 22:54     ` Jeff Law
2015-07-10 15:35       ` Jim Wilson
2016-02-04  8:58 ` Ramana Radhakrishnan
2016-02-15 11:32   ` Kyrill Tkachov
2016-02-16 10:44     ` Ramana Radhakrishnan
2016-02-17 10:03     ` Christophe Lyon
2016-02-17 10:05       ` Kyrill Tkachov
2016-02-17 10:20         ` Christophe Lyon
2016-02-17 10:22           ` Kyrill Tkachov
2016-02-18 10:16             ` Christophe Lyon
2016-03-07  4:43           ` Ramana Radhakrishnan
2016-03-07 12:55             ` Christophe Lyon

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).