public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [PATCH][AARCH64] Fix for PR86901
@ 2020-02-05  0:01 Modi Mo via gcc-patches
  2020-02-05 15:00 ` Wilco Dijkstra
  0 siblings, 1 reply; 9+ messages in thread
From: Modi Mo via gcc-patches @ 2020-02-05  0:01 UTC (permalink / raw)
  To: gcc-patches; +Cc: Wilco.Dijkstra, pinskia

[-- Attachment #1: Type: text/plain, Size: 1230 bytes --]

Hi,

Adding support for extv<mode> and extzv<mode> on aarch64 as described in PR86901. I also changed extract_bit_field_using_extv to use gen_lowpart_if_possible instead of gen_lowpart directly. Using gen_lowpart directly will fail with an ICE in building libgcc when the compiler fails to successfully do so whereas gen_lowpart_if_possible will bail out of matching this pattern gracefully.

I'm looking through past mails and https://gcc.gnu.org/contribute.html which details testing bootstrap. I'm building a cross-compiler (x64_aarch64) and the instructions don't address that scenario. The GCC cross-build is green and there's no regressions on the C/C++ tests (The go/fortran etc. look like they need additional infrastructure built on my side to work). Is there a workflow for cross-builds or should I aim to get an aarch64 native machine for full validation?


ChangeLog:
2020-02-03  Di Mo  <modimo@microsoft.com>

gcc/
	* config/aarch64/aarch64.md: Add define_expand for extv<mode> and extzv<mode>.
	* expmed.c (extract_bit_field_using_extv): Change gen_lowpart to gen_lowpart_if_possible to avoid compiler assert building libgcc.
testsuite/
	* gcc.target/aarch64/pr86901.c: Add new test.


Best,
Modi

[-- Attachment #2: pr86901.patch --]
[-- Type: application/octet-stream, Size: 2828 bytes --]

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 5a1894063a1e..6bc73b5f8e2a 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -5491,6 +5491,40 @@
   [(set_attr "type" "bfm")]
 )
 
+;; Bitfield extract and zero extend (extzv)
+(define_expand "extzv<mode>"
+  [(set (match_operand:GPI 0 "register_operand")
+  (zero_extract:GPI (match_operand:GPI 1 "register_operand")
+                         (match_operand 2 "const_int_operand")
+                         (match_operand 3 "const_int_operand")))]
+  ""
+{
+  unsigned HOST_WIDE_INT width = UINTVAL (operands[2]);
+  unsigned HOST_WIDE_INT pos = UINTVAL (operands[3]);
+  rtx value = operands[1];
+
+  if (width == 0 || (pos + width) > GET_MODE_BITSIZE (<MODE>mode))
+    FAIL;
+  operands[1] = force_reg (<MODE>mode, value);
+})
+
+;; Bitfield extract and sign extend (extv)
+(define_expand "extv<mode>"
+  [(set (match_operand:GPI 0 "register_operand")
+  (sign_extract:GPI (match_operand:GPI 1 "register_operand")
+                         (match_operand 2 "const_int_operand")
+                         (match_operand 3 "const_int_operand")))]
+  ""
+{
+  unsigned HOST_WIDE_INT width = UINTVAL (operands[2]);
+  unsigned HOST_WIDE_INT pos = UINTVAL (operands[3]);
+  rtx value = operands[1];
+
+  if (width == 0 || (pos + width) > GET_MODE_BITSIZE (<MODE>mode))
+    FAIL;
+  operands[1] = force_reg (<MODE>mode, value);
+})
+
 ;;  Match a bfi instruction where the shift of OP3 means that we are
 ;;  actually copying the least significant bits of OP3 into OP0 by way
 ;;  of the AND masks and the IOR instruction.  A similar instruction
diff --git a/gcc/expmed.c b/gcc/expmed.c
index d7f8e9a5d767..926f4e8f1eaf 100644
--- a/gcc/expmed.c
+++ b/gcc/expmed.c
@@ -1546,7 +1546,7 @@ extract_bit_field_using_extv (const extraction_insn *extv, rtx op0,
       if (REG_P (target)
 	  && TRULY_NOOP_TRUNCATION_MODES_P (GET_MODE (target), ext_mode))
 	{
-	  target = gen_lowpart (ext_mode, target);
+	  target = gen_lowpart_if_possible (ext_mode, target);
 	  if (partial_subreg_p (GET_MODE (spec_target), ext_mode))
 	    spec_target_subreg = target;
 	}
diff --git a/gcc/testsuite/gcc.target/aarch64/pr86901.c b/gcc/testsuite/gcc.target/aarch64/pr86901.c
new file mode 100644
index 000000000000..5dd6fdf75f44
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/pr86901.c
@@ -0,0 +1,28 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+typedef unsigned int uint32_t;
+
+float g (float);
+
+static inline uint32_t
+top12 (float x)
+{
+  union
+  {
+    float f;
+    uint32_t i;
+  } u = {x};
+  return (u.i >> 20) & 0x7ff;
+}
+
+void
+f2 (float y, float *p)
+{
+  if (__builtin_expect (top12 (y) < top12 (1.0), 1))
+    *p = y * y;
+  else
+    g (y);
+}
+
+/* { dg-final { scan-assembler "fmov\tw" } } */

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH][AARCH64] Fix for PR86901
  2020-02-05  0:01 [PATCH][AARCH64] Fix for PR86901 Modi Mo via gcc-patches
@ 2020-02-05 15:00 ` Wilco Dijkstra
  2020-02-07  2:12   ` Modi Mo via gcc-patches
  0 siblings, 1 reply; 9+ messages in thread
From: Wilco Dijkstra @ 2020-02-05 15:00 UTC (permalink / raw)
  To: Modi Mo, gcc-patches; +Cc: pinskia

Hi Modi,

Thanks for your patch! 

> Adding support for extv<mode> and extzv<mode> on aarch64 as described in PR86901. I also changed
> extract_bit_field_using_extv to use gen_lowpart_if_possible instead of gen_lowpart directly. Using
> gen_lowpart directly will fail with an ICE in building libgcc when the compiler fails to successfully do so 
> whereas gen_lowpart_if_possible will bail out of matching this pattern gracefully.

I did a quick bootstrap, this shows several failures like:

gcc/builtins.c:9427:1: error: unrecognizable insn:
 9427 | }
      | ^
(insn 212 211 213 24 (set (reg:SI 207)
        (zero_extract:SI (reg:SI 206)
            (const_int 26 [0x1a])
            (const_int 6 [0x6]))) "/gcc/builtins.c":9413:44 -1
     (nil))

The issue here is that 26+6 = 32 and that's not a valid ubfx encoding. Currently
cases like this are split into a right shift in aarch64.md around line 5569:

;; When the bit position and width add up to 32 we can use a W-reg LSR
;; instruction taking advantage of the implicit zero-extension of the X-reg.
(define_split
  [(set (match_operand:DI 0 "register_operand")
        (zero_extract:DI (match_operand:DI 1 "register_operand")
                         (match_operand 2
                           "aarch64_simd_shift_imm_offset_di")
                         (match_operand 3
                           "aarch64_simd_shift_imm_di")))]
  "IN_RANGE (INTVAL (operands[2]) + INTVAL (operands[3]), 1,
             GET_MODE_BITSIZE (DImode) - 1)
   && (INTVAL (operands[2]) + INTVAL (operands[3]))
       == GET_MODE_BITSIZE (SImode)"
  [(set (match_dup 0)
        (zero_extend:DI (lshiftrt:SI (match_dup 4) (match_dup 3))))]
  {
    operands[4] = gen_lowpart (SImode, operands[1]);
  }

However that only supports DImode, not SImode, so it needs to be changed to
be more general using GPI.

Your new extv patterns should replace the magic patterns above it:

;; -------------------------------------------------------------------
;; Bitfields
;; -------------------------------------------------------------------

(define_expand "<optab>"

These are the current extv/extzv patterns, but without a mode. They are no longer
used when we start using the new ones.

Note you can write <optab><mode> to combine the extzv and extz patterns.
But please add a comment mentioning the pattern names so they are easy to find!

Besides a bootstrap it is always useful to compile a large body of code with your change
(say SPEC2006/2017) and check for differences in at least codesize. If there is an increase
in instruction count then there may be more issues that need to be resolved.

> I'm looking through past mails and https://gcc.gnu.org/contribute.html which details testing bootstrap.
> I'm building a cross-compiler (x64_aarch64) and the instructions don't address that scenario. The GCC 
> cross-build is green and there's no regressions on the C/C++ tests (The go/fortran etc. look like they 
> need additional infrastructure built on my side to work). Is there a workflow for cross-builds or should I 
> aim to get an aarch64 native machine for full validation?

I find it easiest to develop on a many-core AArch64 server so you get much faster builds,
bootstraps and regression tests. Cross compilers are mostly useful if you want to test big-endian
or new architecture features which are not yet supported in hardware. You don't normally need
to test Go/Fortran/ADA etc unless your patch does something that would directly affect them.

Finally do you have a copyright assignment with the FSF?

Cheers,
Wilco

^ permalink raw reply	[flat|nested] 9+ messages in thread

* RE: [PATCH][AARCH64] Fix for PR86901
  2020-02-05 15:00 ` Wilco Dijkstra
@ 2020-02-07  2:12   ` Modi Mo via gcc-patches
  2020-02-07  9:52     ` Richard Earnshaw (lists)
  0 siblings, 1 reply; 9+ messages in thread
From: Modi Mo via gcc-patches @ 2020-02-07  2:12 UTC (permalink / raw)
  To: Wilco Dijkstra, gcc-patches; +Cc: pinskia

[-- Attachment #1: Type: text/plain, Size: 5737 bytes --]

> I did a quick bootstrap, this shows several failures like:
> 
> gcc/builtins.c:9427:1: error: unrecognizable insn:
>  9427 | }
>       | ^
> (insn 212 211 213 24 (set (reg:SI 207)
>         (zero_extract:SI (reg:SI 206)
>             (const_int 26 [0x1a])
>             (const_int 6 [0x6]))) "/gcc/builtins.c":9413:44 -1
>      (nil))
> 
> The issue here is that 26+6 = 32 and that's not a valid ubfx encoding.
> Currently cases like this are split into a right shift in aarch64.md around line
> 5569:

Appreciate you taking a look and the validation. I've gotten access to an aarch64 server and the bootstrap demonstrated the issue you saw. This was caused by my re-definition of the pattern to:
+  if (width == 0 || (pos + width) > GET_MODE_BITSIZE (<MODE>mode))
+    FAIL;

Which meant for SImode only a sum of >32 bit actually triggers the fail condition for the define_expand whereas the existing define_insn fails on >=32 bit. I looked into the architecture reference manual and the bits are available for ubfx/sbfx for that type of encoding and the documentation says you can use [lsb, 32-lsb] for SImode as a legal pair. Checking with the GNU assembler it does accept a sum of 32 but transforms it into a LSR:

Assembly file:
ubfx	w0, w0, 24, 8

Disassembly of section .text:

0000000000000000 <f>:
   0:   53187c00        lsr     w0, w0, #24

Similarly with the 64 bit version it'll become a 64 bit LSR. Certainly other assemblers could trip over, I've attached a new patch that allows this encoding and bootstrap + testing c/c++ testsuite looks good. I'll defer to you if it's better to explicitly do the transformation in GCC.

> ;; When the bit position and width add up to 32 we can use a W-reg LSR ;;
> instruction taking advantage of the implicit zero-extension of the X-reg.
> (define_split
>   [(set (match_operand:DI 0 "register_operand")
>         (zero_extract:DI (match_operand:DI 1 "register_operand")
>                          (match_operand 2
>                            "aarch64_simd_shift_imm_offset_di")
>                          (match_operand 3
>                            "aarch64_simd_shift_imm_di")))]
>   "IN_RANGE (INTVAL (operands[2]) + INTVAL (operands[3]), 1,
>              GET_MODE_BITSIZE (DImode) - 1)
>    && (INTVAL (operands[2]) + INTVAL (operands[3]))
>        == GET_MODE_BITSIZE (SImode)"
>   [(set (match_dup 0)
>         (zero_extend:DI (lshiftrt:SI (match_dup 4) (match_dup 3))))]
>   {
>     operands[4] = gen_lowpart (SImode, operands[1]);
>   }
> 
> However that only supports DImode, not SImode, so it needs to be changed
> to be more general using GPI.
> 
> Your new extv patterns should replace the magic patterns above it:

With the previous discovery that a sum of 32/64 will trigger LSR in the assembler I was curious what would happen if I remove this pattern. Turns out, you will end up with a UBFX x0, x0, 24, 8 compared to a LSR w0, w0, 24 in the test case associated with this change (gcc.target/aarch64/ubfx_lsr_1.c) which doesn't get transformed into an LSR by the assembler since it's in 64 bit mode. So this pattern still has value but I don't think it's necessary to extend it to DI since that'll automatically get turned into a LSR by the assembler as I previously mentioned.


> ;; -------------------------------------------------------------------
> ;; Bitfields
> ;; -------------------------------------------------------------------
> 
> (define_expand "<optab>"
> 
> These are the current extv/extzv patterns, but without a mode. They are no
> longer used when we start using the new ones.
> 
> Note you can write <optab><mode> to combine the extzv and extz patterns.
> But please add a comment mentioning the pattern names so they are easy to
> find!

Good call here, made this change in the new patch. I did see the define_insn of these guys below it but somehow missed that they were expanded just above. I believe, from my understanding of GCC, that the matching pattern below the first line is what constrains <optab> into just extv/extsv from the long list of iterators it belongs to. Still, I see that there's constrained iterators elsewhere like: 

;; Optab prefix for sign/zero-extending operations
(define_code_attr su_optab [(sign_extend "") (zero_extend "u")

I added a comment in this patch before the pattern. Thoughts on defining another constrained version to make it clearer (in addition or in lieu of the comment)?

> Besides a bootstrap it is always useful to compile a large body of code with
> your change (say SPEC2006/2017) and check for differences in at least
> codesize. If there is an increase in instruction count then there may be more
> issues that need to be resolved.

Sounds good. I'll get those setup and running and will report back on findings. What's the preferred way to measure codesize? I'm assuming by default the code pages are aligned so smaller differences would need to trip over the boundary to actually show up. 
 
> I find it easiest to develop on a many-core AArch64 server so you get much
> faster builds, bootstraps and regression tests. Cross compilers are mostly
> useful if you want to test big-endian or new architecture features which are
> not yet supported in hardware. You don't normally need to test
> Go/Fortran/ADA etc unless your patch does something that would directly
> affect them.

Good to know. Sounds like our current testing with C/C++ testsuite is alright for most changes.

> Finally do you have a copyright assignment with the FSF?

Lawyers are working on that. Since I have SPEC codesize to go through the licensing may be ready before all analysis is done for the patch.

Modi


[-- Attachment #2: pr86901.patch --]
[-- Type: application/octet-stream, Size: 2637 bytes --]

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 4f5898185f5..46c1b870aaf 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -5514,21 +5514,21 @@
 ;; Bitfields
 ;; -------------------------------------------------------------------
 
-(define_expand "<optab>"
-  [(set (match_operand:DI 0 "register_operand")
-	(ANY_EXTRACT:DI (match_operand:DI 1 "register_operand")
+;; Defines extsv, extv patterns
+(define_expand "<optab><mode>"
+  [(set (match_operand:GPI 0 "register_operand" "=r")
+	(ANY_EXTRACT:GPI (match_operand:GPI 1 "register_operand")
 			(match_operand 2
-			  "aarch64_simd_shift_imm_offset_di")
-			(match_operand 3 "aarch64_simd_shift_imm_di")))]
+			  "aarch64_simd_shift_imm_offset_<mode>")
+			(match_operand 3 "aarch64_simd_shift_imm_<mode>")))]
   ""
   {
     if (!IN_RANGE (INTVAL (operands[2]) + INTVAL (operands[3]),
-		   1, GET_MODE_BITSIZE (DImode) - 1))
+		   1, GET_MODE_BITSIZE (<MODE>mode)))
      FAIL;
   }
 )
 
-
 (define_insn "*<optab><mode>"
   [(set (match_operand:GPI 0 "register_operand" "=r")
 	(ANY_EXTRACT:GPI (match_operand:GPI 1 "register_operand" "r")
@@ -5537,7 +5537,7 @@
 			 (match_operand 3
 			   "aarch64_simd_shift_imm_<mode>" "n")))]
   "IN_RANGE (INTVAL (operands[2]) + INTVAL (operands[3]),
-	     1, GET_MODE_BITSIZE (<MODE>mode) - 1)"
+	     1, GET_MODE_BITSIZE (<MODE>mode))"
   "<su>bfx\\t%<w>0, %<w>1, %3, %2"
   [(set_attr "type" "bfx")]
 )
diff --git a/gcc/expmed.c b/gcc/expmed.c
index d7f8e9a5d767..926f4e8f1eaf 100644
--- a/gcc/expmed.c
+++ b/gcc/expmed.c
@@ -1546,7 +1546,7 @@ extract_bit_field_using_extv (const extraction_insn *extv, rtx op0,
       if (REG_P (target)
 	  && TRULY_NOOP_TRUNCATION_MODES_P (GET_MODE (target), ext_mode))
 	{
-	  target = gen_lowpart (ext_mode, target);
+	  target = gen_lowpart_if_possible (ext_mode, target);
 	  if (partial_subreg_p (GET_MODE (spec_target), ext_mode))
 	    spec_target_subreg = target;
 	}
diff --git a/gcc/testsuite/gcc.target/aarch64/pr86901.c b/gcc/testsuite/gcc.target/aarch64/pr86901.c
new file mode 100644
index 000000000000..5dd6fdf75f44
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/pr86901.c
@@ -0,0 +1,28 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+typedef unsigned int uint32_t;
+
+float g (float);
+
+static inline uint32_t
+top12 (float x)
+{
+  union
+  {
+    float f;
+    uint32_t i;
+  } u = {x};
+  return (u.i >> 20) & 0x7ff;
+}
+
+void
+f2 (float y, float *p)
+{
+  if (__builtin_expect (top12 (y) < top12 (1.0), 1))
+    *p = y * y;
+  else
+    g (y);
+}
+
+/* { dg-final { scan-assembler "fmov\tw" } } */

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH][AARCH64] Fix for PR86901
  2020-02-07  2:12   ` Modi Mo via gcc-patches
@ 2020-02-07  9:52     ` Richard Earnshaw (lists)
  2020-02-07 14:06       ` Wilco Dijkstra
  0 siblings, 1 reply; 9+ messages in thread
From: Richard Earnshaw (lists) @ 2020-02-07  9:52 UTC (permalink / raw)
  To: Modi Mo, Wilco Dijkstra, gcc-patches; +Cc: pinskia

On 07/02/2020 02:12, Modi Mo via gcc-patches wrote:
>> I did a quick bootstrap, this shows several failures like:
>>
>> gcc/builtins.c:9427:1: error: unrecognizable insn:
>>   9427 | }
>>        | ^
>> (insn 212 211 213 24 (set (reg:SI 207)
>>          (zero_extract:SI (reg:SI 206)
>>              (const_int 26 [0x1a])
>>              (const_int 6 [0x6]))) "/gcc/builtins.c":9413:44 -1
>>       (nil))
>>
>> The issue here is that 26+6 = 32 and that's not a valid ubfx encoding.
>> Currently cases like this are split into a right shift in aarch64.md around line
>> 5569:
> 
> Appreciate you taking a look and the validation. I've gotten access to an aarch64 server and the bootstrap demonstrated the issue you saw. This was caused by my re-definition of the pattern to:
> +  if (width == 0 || (pos + width) > GET_MODE_BITSIZE (<MODE>mode))
> +    FAIL;
> 
> Which meant for SImode only a sum of >32 bit actually triggers the fail condition for the define_expand whereas the existing define_insn fails on >=32 bit. I looked into the architecture reference manual and the bits are available for ubfx/sbfx for that type of encoding and the documentation says you can use [lsb, 32-lsb] for SImode as a legal pair. Checking with the GNU assembler it does accept a sum of 32 but transforms it into a LSR:
> 
> Assembly file:
> ubfx	w0, w0, 24, 8
> 
> Disassembly of section .text:
> 
> 0000000000000000 <f>:
>     0:   53187c00        lsr     w0, w0, #24

This is how LSR is implemented in AArch64, as an extract.  So the 
disassembler is showing the canonical representation.

However, inside the compiler we really want to represent this as a 
shift.  Why?  Because there are cases where we can then merge a shift 
into other operations when we can't merge the more general extract 
operation.  For example, we can merge an LSR into a subsequent 'AND' 
operation, but can't merge a more general extract into an AND. 
Essentially, we want the compiler to describe this in the canonical 
shift form rather than the extract form.

Ideally this would be handled inside the mid-end expansion of an 
extract, but in the absence of that I think this is best done inside the 
extv expansion so that we never end up with a real extract in that case.


> 
> Similarly with the 64 bit version it'll become a 64 bit LSR. Certainly other assemblers could trip over, I've attached a new patch that allows this encoding and bootstrap + testing c/c++ testsuite looks good. I'll defer to you if it's better to explicitly do the transformation in GCC.
> 
>> ;; When the bit position and width add up to 32 we can use a W-reg LSR ;;
>> instruction taking advantage of the implicit zero-extension of the X-reg.
>> (define_split
>>    [(set (match_operand:DI 0 "register_operand")
>>          (zero_extract:DI (match_operand:DI 1 "register_operand")
>>                           (match_operand 2
>>                             "aarch64_simd_shift_imm_offset_di")
>>                           (match_operand 3
>>                             "aarch64_simd_shift_imm_di")))]
>>    "IN_RANGE (INTVAL (operands[2]) + INTVAL (operands[3]), 1,
>>               GET_MODE_BITSIZE (DImode) - 1)
>>     && (INTVAL (operands[2]) + INTVAL (operands[3]))
>>         == GET_MODE_BITSIZE (SImode)"
>>    [(set (match_dup 0)
>>          (zero_extend:DI (lshiftrt:SI (match_dup 4) (match_dup 3))))]
>>    {
>>      operands[4] = gen_lowpart (SImode, operands[1]);
>>    }
>>
>> However that only supports DImode, not SImode, so it needs to be changed
>> to be more general using GPI.
>>
>> Your new extv patterns should replace the magic patterns above it:
> 
> With the previous discovery that a sum of 32/64 will trigger LSR in the assembler I was curious what would happen if I remove this pattern. Turns out, you will end up with a UBFX x0, x0, 24, 8 compared to a LSR w0, w0, 24 in the test case associated with this change (gcc.target/aarch64/ubfx_lsr_1.c) which doesn't get transformed into an LSR by the assembler since it's in 64 bit mode. So this pattern still has value but I don't think it's necessary to extend it to DI since that'll automatically get turned into a LSR by the assembler as I previously mentioned.
> 
> 
>> ;; -------------------------------------------------------------------
>> ;; Bitfields
>> ;; -------------------------------------------------------------------
>>
>> (define_expand "<optab>"
>>
>> These are the current extv/extzv patterns, but without a mode. They are no
>> longer used when we start using the new ones.
>>
>> Note you can write <optab><mode> to combine the extzv and extz patterns.
>> But please add a comment mentioning the pattern names so they are easy to
>> find!
> 
> Good call here, made this change in the new patch. I did see the define_insn of these guys below it but somehow missed that they were expanded just above. I believe, from my understanding of GCC, that the matching pattern below the first line is what constrains <optab> into just extv/extsv from the long list of iterators it belongs to. Still, I see that there's constrained iterators elsewhere like:
> 
> ;; Optab prefix for sign/zero-extending operations
> (define_code_attr su_optab [(sign_extend "") (zero_extend "u")
> 
> I added a comment in this patch before the pattern. Thoughts on defining another constrained version to make it clearer (in addition or in lieu of the comment)?
> 
>> Besides a bootstrap it is always useful to compile a large body of code with
>> your change (say SPEC2006/2017) and check for differences in at least
>> codesize. If there is an increase in instruction count then there may be more
>> issues that need to be resolved.
> 
> Sounds good. I'll get those setup and running and will report back on findings. What's the preferred way to measure codesize? I'm assuming by default the code pages are aligned so smaller differences would need to trip over the boundary to actually show up.
>   
>> I find it easiest to develop on a many-core AArch64 server so you get much
>> faster builds, bootstraps and regression tests. Cross compilers are mostly
>> useful if you want to test big-endian or new architecture features which are
>> not yet supported in hardware. You don't normally need to test
>> Go/Fortran/ADA etc unless your patch does something that would directly
>> affect them.
> 
> Good to know. Sounds like our current testing with C/C++ testsuite is alright for most changes.
> 
>> Finally do you have a copyright assignment with the FSF?
> 
> Lawyers are working on that. Since I have SPEC codesize to go through the licensing may be ready before all analysis is done for the patch.
> 
> Modi
> 

R.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH][AARCH64] Fix for PR86901
  2020-02-07  9:52     ` Richard Earnshaw (lists)
@ 2020-02-07 14:06       ` Wilco Dijkstra
  2020-02-21 23:17         ` Modi Mo via gcc-patches
  0 siblings, 1 reply; 9+ messages in thread
From: Wilco Dijkstra @ 2020-02-07 14:06 UTC (permalink / raw)
  To: Richard Earnshaw, Modi Mo, gcc-patches; +Cc: pinskia

Hi,

Richard wrote:
> However, inside the compiler we really want to represent this as a 
>shift.
...
> Ideally this would be handled inside the mid-end expansion of an 
> extract, but in the absence of that I think this is best done inside the 
> extv expansion so that we never end up with a real extract in that case.

Yes the mid-end could be improved - it turns out it is due to expansion of
bitfields, all variations of (x & mask) >> N are optimized into shifts early on.

However it turns out Combine can already transform these zero/sign_extends
into shifts, so we do end up with good code. With the latest patch I get:

typedef struct { int x : 6, y : 6, z : 20; } X;

int f (int x, X *p) { return x + p->z; }

	ldr	w1, [x1]
	add	w0, w0, w1, asr 12
	ret

So this case looks alright.

> Sounds good. I'll get those setup and running and will report back on findings. What's
> the preferred way to measure codesize? I'm assuming by default the code pages are 
> aligned so smaller differences would need to trip over the boundary to actually show up.

You can use the size command on the binaries:

>size /bin/ls
   text	   data	    bss	    dec	    hex	filename
 107271	   2024	   3472	 112767	  1b87f	/bin/ls

As you can see it shows the text size in bytes. It is not rounded up to a page, so it is an 
accurate measure of the codesize. Generally -O2 size is most useful to check (since that
is what most applications build with), but -Ofast -flto can be useful as well (the global 
inlining means you get instruction combinations which appear less often with -O2).

Cheers,
Wilco

^ permalink raw reply	[flat|nested] 9+ messages in thread

* RE: [PATCH][AARCH64] Fix for PR86901
  2020-02-07 14:06       ` Wilco Dijkstra
@ 2020-02-21 23:17         ` Modi Mo via gcc-patches
  2020-03-03 15:54           ` Wilco Dijkstra
  2020-03-03 16:04           ` Wilco Dijkstra
  0 siblings, 2 replies; 9+ messages in thread
From: Modi Mo via gcc-patches @ 2020-02-21 23:17 UTC (permalink / raw)
  To: Wilco Dijkstra, Richard Earnshaw, gcc-patches; +Cc: pinskia

[-- Attachment #1: Type: text/plain, Size: 2335 bytes --]

> > Sounds good. I'll get those setup and running and will report back on
> > findings. What's the preferred way to measure codesize? I'm assuming
> > by default the code pages are aligned so smaller differences would need to trip
> over the boundary to actually show up.
> 
> You can use the size command on the binaries:
> 
> >size /bin/ls
>    text	   data	    bss	    dec	    hex	filename
>  107271	   2024	   3472	 112767	  1b87f	/bin/ls
> 
> As you can see it shows the text size in bytes. It is not rounded up to a page, so it
> is an accurate measure of the codesize. Generally -O2 size is most useful to
> check (since that is what most applications build with), but -Ofast -flto can be
> useful as well (the global inlining means you get instruction combinations which
> appear less often with -O2).
> 
> Cheers,
> Wilco
Alrighty, I've got spec 2017 and spec 2006 setup and building. Using default configurations so -O2 in spec2006 and -O3 in spec2017. Testing the patch as last sent showed a 1% code size regression in spec 2017 perlbench which turned out to be a missing pattern for tbnz and all its variants:

(define_insn "*tb<optab><mode>1"
  [(set (pc) (if_then_else
	      (EQL (zero_extract:DI (match_operand:GPI 0 "register_operand" "r")  <--- only matches against zero_extract:DI
				    (const_int 1)
				    (match_operand 1
				      "aarch64_simd_shift_imm_<mode>" "n"))

The zero extract now matching against other modes would generate a test + branch rather than the combined instruction which led to the code size regression. I've updated the patch so that tbnz etc. matches GPI and that brings code size down to <0.2% in spec2017 and <0.4% in spec2006.

Spec results are attached for reference.

@Wilco I've gotten instruction on my side to set up an individual contributor's license for the time being. Can you send me the necessary documents to make that happen? Thanks!

ChangeLog:
2020-02-21  Di Mo  <modimo@microsoft.com>

gcc/
	* config/aarch64/aarch64.md: Add GPI modes to extsv/extv patterns. Allow tb<optab><mode>1 pattern to match against zero_extract:GPI.
	* expmed.c (extract_bit_field_using_extv): Change gen_lowpart to gen_lowpart_if_possible to avoid compiler assert building libgcc.
testsuite/
	* gcc.target/aarch64/pr86901.c: Add new test.

Modi

[-- Attachment #2: pr86901.patch --]
[-- Type: application/octet-stream, Size: 2962 bytes --]

diff --git a/config/aarch64/aarch64.md b/config/aarch64/aarch64.md
index 4f5898185f5..46c1b870aaf 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -786,7 +786,7 @@
 
 (define_insn "*tb<optab><mode>1"
   [(set (pc) (if_then_else
-	      (EQL (zero_extract:DI (match_operand:GPI 0 "register_operand" "r")
+	      (EQL (zero_extract:GPI (match_operand:GPI 0 "register_operand" "r")
 				    (const_int 1)
 				    (match_operand 1
 				      "aarch64_simd_shift_imm_<mode>" "n"))
@@ -5514,21 +5514,21 @@
 ;; Bitfields
 ;; -------------------------------------------------------------------
 
-(define_expand "<optab>"
-  [(set (match_operand:DI 0 "register_operand")
-	(ANY_EXTRACT:DI (match_operand:DI 1 "register_operand")
+;; Defines extsv, extv patterns
+(define_expand "<optab><mode>"
+  [(set (match_operand:GPI 0 "register_operand" "=r")
+	(ANY_EXTRACT:GPI (match_operand:GPI 1 "register_operand")
 			(match_operand 2
-			  "aarch64_simd_shift_imm_offset_di")
-			(match_operand 3 "aarch64_simd_shift_imm_di")))]
+			  "aarch64_simd_shift_imm_offset_<mode>")
+			(match_operand 3 "aarch64_simd_shift_imm_<mode>")))]
   ""
   {
     if (!IN_RANGE (INTVAL (operands[2]) + INTVAL (operands[3]),
-		   1, GET_MODE_BITSIZE (DImode) - 1))
+		   1, GET_MODE_BITSIZE (<MODE>mode)))
      FAIL;
   }
 )
 
-
 (define_insn "*<optab><mode>"
   [(set (match_operand:GPI 0 "register_operand" "=r")
 	(ANY_EXTRACT:GPI (match_operand:GPI 1 "register_operand" "r")
@@ -5537,7 +5537,7 @@
 			 (match_operand 3
 			   "aarch64_simd_shift_imm_<mode>" "n")))]
   "IN_RANGE (INTVAL (operands[2]) + INTVAL (operands[3]),
-	     1, GET_MODE_BITSIZE (<MODE>mode) - 1)"
+	     1, GET_MODE_BITSIZE (<MODE>mode))"
   "<su>bfx\\t%<w>0, %<w>1, %3, %2"
   [(set_attr "type" "bfx")]
 )
diff --git a/gcc/expmed.c b/gcc/expmed.c
index d7f8e9a5d767..926f4e8f1eaf 100644
--- a/gcc/expmed.c
+++ b/gcc/expmed.c
@@ -1546,7 +1546,7 @@ extract_bit_field_using_extv (const extraction_insn *extv, rtx op0,
       if (REG_P (target)
 	  && TRULY_NOOP_TRUNCATION_MODES_P (GET_MODE (target), ext_mode))
 	{
-	  target = gen_lowpart (ext_mode, target);
+	  target = gen_lowpart_if_possible (ext_mode, target);
 	  if (partial_subreg_p (GET_MODE (spec_target), ext_mode))
 	    spec_target_subreg = target;
 	}
diff --git a/gcc/testsuite/gcc.target/aarch64/pr86901.c b/gcc/testsuite/gcc.target/aarch64/pr86901.c
new file mode 100644
index 000000000000..5dd6fdf75f44
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/pr86901.c
@@ -0,0 +1,28 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+typedef unsigned int uint32_t;
+
+float g (float);
+
+static inline uint32_t
+top12 (float x)
+{
+  union
+  {
+    float f;
+    uint32_t i;
+  } u = {x};
+  return (u.i >> 20) & 0x7ff;
+}
+
+void
+f2 (float y, float *p)
+{
+  if (__builtin_expect (top12 (y) < top12 (1.0), 1))
+    *p = y * y;
+  else
+    g (y);
+}
+
+/* { dg-final { scan-assembler "fmov\tw" } } */

[-- Attachment #3: spec2006 --]
[-- Type: application/octet-stream, Size: 2912 bytes --]

   text	   data	    bss	filename
1084394	20456	12616	benchspec/CPU2006/400.perlbench/exe/perlbench_base.base
1084858	20456	12616	benchspec/CPU2006/400.perlbench/exe/perlbench_base.test
58872	3800	4312	benchspec/CPU2006/401.bzip2/exe/bzip2_base.base
58872	3800	4312	benchspec/CPU2006/401.bzip2/exe/bzip2_base.test
3111150	7600	749024	benchspec/CPU2006/403.gcc/exe/gcc_base.base
3111510	7600	749024	benchspec/CPU2006/403.gcc/exe/gcc_base.test
12395	728	11904	benchspec/CPU2006/429.mcf/exe/mcf_base.base
12395	728	11904	benchspec/CPU2006/429.mcf/exe/mcf_base.test
110942	1513	37144	benchspec/CPU2006/433.milc/exe/milc_base.base
110942	1513	37144	benchspec/CPU2006/433.milc/exe/milc_base.test
221478	1024	496	benchspec/CPU2006/444.namd/exe/namd_base.base
222398	1024	496	benchspec/CPU2006/444.namd/exe/namd_base.test
1533494	1985324	2328704	benchspec/CPU2006/445.gobmk/exe/gobmk_base.base
1533526	1985324	2328704	benchspec/CPU2006/445.gobmk/exe/gobmk_base.test
2787753	3208	3512	benchspec/CPU2006/447.dealII/exe/dealII_base.base
2788536	3208	3512	benchspec/CPU2006/447.dealII/exe/dealII_base.test
382123	2408	1584	benchspec/CPU2006/450.soplex/exe/soplex_base.base
382051	2408	1584	benchspec/CPU2006/450.soplex/exe/soplex_base.test
824472	12944	161232	benchspec/CPU2006/453.povray/exe/povray_base.base
825784	12944	161232	benchspec/CPU2006/453.povray/exe/povray_base.test
271536	5680	81904	benchspec/CPU2006/456.hmmer/exe/hmmer_base.base
271552	5680	81904	benchspec/CPU2006/456.hmmer/exe/hmmer_base.test
130864	1984	2576192	benchspec/CPU2006/458.sjeng/exe/sjeng_base.base
130864	1984	2576192	benchspec/CPU2006/458.sjeng/exe/sjeng_base.test
38405	772	96	benchspec/CPU2006/462.libquantum/exe/libquantum_base.base
38397	772	96	benchspec/CPU2006/462.libquantum/exe/libquantum_base.test
485915	11108	370856	benchspec/CPU2006/464.h264ref/exe/h264ref_base.base
485883	11108	370856	benchspec/CPU2006/464.h264ref/exe/h264ref_base.test
11440	712	24	benchspec/CPU2006/470.lbm/exe/lbm_base.base
11440	712	24	benchspec/CPU2006/470.lbm/exe/lbm_base.test
621357	9560	14440	benchspec/CPU2006/471.omnetpp/exe/omnetpp_base.base
621405	9560	14440	benchspec/CPU2006/471.omnetpp/exe/omnetpp_base.test
37927	908	5144	benchspec/CPU2006/473.astar/exe/astar_base.base
37927	908	5144	benchspec/CPU2006/473.astar/exe/astar_base.test
171269	3500	32880	benchspec/CPU2006/482.sphinx3/exe/sphinx_livepretend_base.base
171269	3500	32880	benchspec/CPU2006/482.sphinx3/exe/sphinx_livepretend_base.test
3950983	140104	11360	benchspec/CPU2006/483.xalancbmk/exe/Xalan_base.base
3952407	140104	11360	benchspec/CPU2006/483.xalancbmk/exe/Xalan_base.test
2120	600	8	benchspec/CPU2006/998.specrand/exe/specrand_base.base
2120	600	8	benchspec/CPU2006/998.specrand/exe/specrand_base.test
2120	600	8	benchspec/CPU2006/999.specrand/exe/specrand_base.base
2120	600	8	benchspec/CPU2006/999.specrand/exe/specrand_base.test

[-- Attachment #4: spec2017 --]
[-- Type: application/octet-stream, Size: 3456 bytes --]

   text	   data	    bss	filename
2379985	8520	8896	benchspec/CPU/500.perlbench_r/exe/perlbench_r_base.Baseline-64
2380673	8520	8896	benchspec/CPU/500.perlbench_r/exe/perlbench_r_base.test-64
9992763	33832	1146480	benchspec/CPU/502.gcc_r/exe/cpugcc_r_base.Baseline-64
9993491	33832	1146480	benchspec/CPU/502.gcc_r/exe/cpugcc_r_base.test-64
29671	744	720	benchspec/CPU/505.mcf_r/exe/mcf_r_base.Baseline-64
29671	744	720	benchspec/CPU/505.mcf_r/exe/mcf_r_base.test-64
729385	984	960	benchspec/CPU/508.namd_r/exe/namd_r_base.Baseline-64
730617	984	960	benchspec/CPU/508.namd_r/exe/namd_r_base.test-64
8791120	8016	16128	benchspec/CPU/510.parest_r/exe/parest_r_base.Baseline-64
8793944	8016	16128	benchspec/CPU/510.parest_r/exe/parest_r_base.test-64
22536	816	24	benchspec/CPU/511.povray_r/exe/imagevalidate_511_base.Baseline-64
22528	816	24	benchspec/CPU/511.povray_r/exe/imagevalidate_511_base.test-64
992982	32264	188784	benchspec/CPU/511.povray_r/exe/povray_r_base.Baseline-64
994654	32264	188784	benchspec/CPU/511.povray_r/exe/povray_r_base.test-64
14568	712	24	benchspec/CPU/519.lbm_r/exe/lbm_r_base.Baseline-64
14568	712	24	benchspec/CPU/519.lbm_r/exe/lbm_r_base.test-64
2206053	8968	46472	benchspec/CPU/520.omnetpp_r/exe/omnetpp_r_base.Baseline-64
2206933	8968	46472	benchspec/CPU/520.omnetpp_r/exe/omnetpp_r_base.test-64
5654485	142560	14752	benchspec/CPU/523.xalancbmk_r/exe/cpuxalan_r_base.Baseline-64
5656441	142560	14752	benchspec/CPU/523.xalancbmk_r/exe/cpuxalan_r_base.test-64
22840	792	24	benchspec/CPU/525.x264_r/exe/imagevalidate_525_base.Baseline-64
22832	792	24	benchspec/CPU/525.x264_r/exe/imagevalidate_525_base.test-64
580687	3312	13104	benchspec/CPU/525.x264_r/exe/ldecod_r_base.Baseline-64
580895	3312	13104	benchspec/CPU/525.x264_r/exe/ldecod_r_base.test-64
654788	6000	29440	benchspec/CPU/525.x264_r/exe/x264_r_base.Baseline-64
654908	6000	29440	benchspec/CPU/525.x264_r/exe/x264_r_base.test-64
10991284	4842724	417288	benchspec/CPU/526.blender_r/exe/blender_r_base.Baseline-64
10991652	4842724	417288	benchspec/CPU/526.blender_r/exe/blender_r_base.test-64
22536	816	24	benchspec/CPU/526.blender_r/exe/imagevalidate_526_base.Baseline-64
22528	816	24	benchspec/CPU/526.blender_r/exe/imagevalidate_526_base.test-64
90464	816	12138272	benchspec/CPU/531.deepsjeng_r/exe/deepsjeng_r_base.Baseline-64
90608	816	12138272	benchspec/CPU/531.deepsjeng_r/exe/deepsjeng_r_base.test-64
22507	784	24	benchspec/CPU/538.imagick_r/exe/imagevalidate_538_base.Baseline-64
22499	784	24	benchspec/CPU/538.imagick_r/exe/imagevalidate_538_base.test-64
2198219	10820	5520	benchspec/CPU/538.imagick_r/exe/imagick_r_base.Baseline-64
2197491	10820	5520	benchspec/CPU/538.imagick_r/exe/imagick_r_base.test-64
205038	8568	30016	benchspec/CPU/541.leela_r/exe/leela_r_base.Baseline-64
205046	8568	30016	benchspec/CPU/541.leela_r/exe/leela_r_base.test-64
205519	3560	381944	benchspec/CPU/544.nab_r/exe/nab_r_base.Baseline-64
205463	3560	381944	benchspec/CPU/544.nab_r/exe/nab_r_base.test-64
204894	908	17592	benchspec/CPU/557.xz_r/exe/xz_r_base.Baseline-64
204918	908	17592	benchspec/CPU/557.xz_r/exe/xz_r_base.test-64
4173	604	5008	benchspec/CPU/997.specrand_fr/exe/specrand_fr_base.Baseline-64
4173	604	5008	benchspec/CPU/997.specrand_fr/exe/specrand_fr_base.test-64
4173	604	5008	benchspec/CPU/999.specrand_ir/exe/specrand_ir_base.Baseline-64
4173	604	5008	benchspec/CPU/999.specrand_ir/exe/specrand_ir_base.test-64

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH][AARCH64] Fix for PR86901
  2020-02-21 23:17         ` Modi Mo via gcc-patches
@ 2020-03-03 15:54           ` Wilco Dijkstra
  2020-03-03 16:04           ` Wilco Dijkstra
  1 sibling, 0 replies; 9+ messages in thread
From: Wilco Dijkstra @ 2020-03-03 15:54 UTC (permalink / raw)
  To: Modi Mo, Richard Earnshaw, gcc-patches; +Cc: pinskia

Hi Modi,

> The zero extract now matching against other modes would generate a test + branch rather
> than the combined instruction which led to the code size regression. I've updated the patch
> so that tbnz etc. matches GPI and that brings code size down to <0.2% in spec2017 and <0.4% in spec2006.

That's looking better indeed. I notice there are still differences, eg. tbz/tbnz counts are
significantly different in perlbench, with ~350 missed cases overall (mostly tbz reg, #7).

There are also more uses of uxtw, ubfiz, sbfiz - for example 


@Wilco I've gotten instruction on my side to set up an individual contributor's license for the time being. Can you send me the necessary documents to make that happen? Thanks!

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH][AARCH64] Fix for PR86901
  2020-02-21 23:17         ` Modi Mo via gcc-patches
  2020-03-03 15:54           ` Wilco Dijkstra
@ 2020-03-03 16:04           ` Wilco Dijkstra
  2020-04-30 23:15             ` Modi Mo
  1 sibling, 1 reply; 9+ messages in thread
From: Wilco Dijkstra @ 2020-03-03 16:04 UTC (permalink / raw)
  To: Modi Mo, Richard Earnshaw, gcc-patches; +Cc: pinskia

Hi Modi,

> The zero extract now matching against other modes would generate a test + branch rather
> than the combined instruction which led to the code size regression. I've updated the patch
> so that tbnz etc. matches GPI and that brings code size down to <0.2% in spec2017 and <0.4% in spec2006.

That's looking better indeed. I notice there are still differences, eg. tbz/tbnz counts are
significantly different in perlbench, with ~350 missed cases overall (mostly tbz reg, #7).

There are also more uses of uxtw, ubfiz, sbfiz - for example I see cases like this in namd:

  42c7dc:       13007400        sbfx    w0, w0, #0, #30
  42c7e0:       937c7c00        sbfiz   x0, x0, #4, #32

So it would be a good idea to check any benchmarks where there is still a non-trivial
codesize difference. You can get a quick idea what is happening by grepping for
instructions like this:

grep -c sbfiz out1.txt out2.txt
out1.txt:872
out2.txt:934

grep -c tbnz out1.txt out2.txt
out1.txt:5189
out2.txt:4989

> Can you send me the necessary documents to make that happen? Thanks!

That's something you need to sort out with the fsf. There is a mailing list for this:
assign@gnu.org.

Cheers,
Wilco

^ permalink raw reply	[flat|nested] 9+ messages in thread

* RE: [PATCH][AARCH64] Fix for PR86901
  2020-03-03 16:04           ` Wilco Dijkstra
@ 2020-04-30 23:15             ` Modi Mo
  0 siblings, 0 replies; 9+ messages in thread
From: Modi Mo @ 2020-04-30 23:15 UTC (permalink / raw)
  To: Wilco Dijkstra, Richard Earnshaw, gcc-patches

[-- Attachment #1: Type: text/plain, Size: 4077 bytes --]

Hey all-apologies for the long delay. Haven't had time until recently to look into this further.
>>> The zero extract now matching against other modes would generate a
>>> test + branch rather than the combined instruction which led to the
>>> code size regression. I've updated the patch so that tbnz etc. matches GPI and
>>that brings code size down to <0.2% in spec2017 and <0.4% in spec2006.
>>
>>That's looking better indeed. I notice there are still differences, eg. tbz/tbnz
>>counts are significantly different in perlbench, with ~350 missed cases overall
>>(mostly tbz reg, #7).
>>
>>There are also more uses of uxtw, ubfiz, sbfiz - for example I see cases like this
>>in namd:
>>
>>  42c7dc:       13007400        sbfx    w0, w0, #0, #30
>>  42c7e0:       937c7c00        sbfiz   x0, x0, #4, #32
>>
>>So it would be a good idea to check any benchmarks where there is still a non-
>>trivial codesize difference. You can get a quick idea what is happening by
>>grepping for instructions like this:
>>
>>grep -c sbfiz out1.txt out2.txt
>>out1.txt:872
>>out2.txt:934
>>
>>grep -c tbnz out1.txt out2.txt
>>out1.txt:5189
>>out2.txt:4989

That's really good insight Wilco! I took a look at the tbnz/tbz case in perl and we lose matching against this because allowing SI mode on extv/extzv causes subst in combine.c to generate:

(lshiftrt:SI (reg:SI 107 [ _16 ])
    (const_int 7 [0x7]))
(nil)

Instead of:

(and:DI (lshiftrt:DI (subreg:DI (reg:SI 107 [ _16 ]) 0)
        (const_int 7 [0x7]))
    (const_int 1 [0x1]))

The latter case is picked up in make_compound_operation_int to transform into a zero_extract while the new case is left alone. A lshiftrt generally can't be reduced down to a bit-test but in this case it can because we have zero_bit information on it. Given that, looking around try_combine it seems like the best place to detect this pattern is in the 2nd chance code after the first failure of recog_for_combine which I've done in this patch. I think this is the place to put this fix given changing subst/make_compound_operation_int leads to significantly more diffs.

After this change the total number of tbnz/tbz lines up near identical to the baseline which is good and overall size within .1% on spec 2017 and spec 2006. However, looking further at ubfiz there's a pretty large increase in certain benchmarks. I looked into spec 2017/blender and we fail to combine this pattern:

Trying 653 -> 654:
  653: r512:SI=r94:SI 0>>0x8
      REG_DEAD r94:SI
  654: r513:DI=zero_extend(r512:SI)
      REG_DEAD r512:SI
Failed to match this instruction:
(set (reg:DI 513)
    (zero_extract:DI (reg:SI 94 [ bswapdst_4 ])
        (const_int 8 [0x8])
        (const_int 8 [0x8])))

Where previously we combined it like this:

Trying 653 -> 654:
  653: r512:SI=r94:SI 0>>0x8
      REG_DEAD r94:SI
  654: r513:DI=zero_extend(r512:SI)
      REG_DEAD r512:SI
Successfully matched this instruction:
(set (reg:DI 513)
    (zero_extract:DI (subreg:DI (reg:SI 94 [ bswapdst_4 ]) 0) // subreg used
        (const_int 8 [0x8])
        (const_int 8 [0x8])))

Here's where I'm at an impasse. The code that generates the modes in get_best_reg_extraction_insn looks at the inner mode of SI now that extzvsi is valid and generates a non-subreg use. However, the MD pattern is looking for all modes being DI or SI not a mix. I think a fix could be done to canonicalize these extracts to the same mode but am unsure if in general a mode mismatched extract RTX is valid which would make this a fairly large change. 

Latest patch with fix for tbnz/tbz is attached alongside the numbers for SPEC and instruction count for SPEC 2017 are attached for reference.

>>> Can you send me the necessary documents to make that happen? Thanks!
>>
>>That's something you need to sort out with the fsf. There is a mailing list for this:
>>mailto:assign@gnu.org.

I haven't had any response from my previous mail there. Should I add one of you to the CC or mail someone specifically to get traction?

Best,
Modi

[-- Attachment #2: pr86901.patch --]
[-- Type: application/octet-stream, Size: 4504 bytes --]

diff --git a/gcc/combine.c b/gcc/combine.c
index cff76cd3303..fdb03f5d305 100644
--- a/gcc/combine.c
+++ b/gcc/combine.c
@@ -4166,6 +4166,40 @@ try_combine (rtx_insn *i3, rtx_insn *i2, rtx_insn *i1, rtx_insn *i0,
 	}
     }
 
+  /* Check for a case where we generate an IF_THEN_ELSE that depends on a
+     logical right shift.  If we have nonzero_bit information that indicates the
+     shift will ultimately lead down to a single bit generate the equivalent
+     zero_extract pattern to see if the target has an appropriate instruction
+     that can handle this.  */
+
+  else if (insn_code_number < 0 && asm_noperands (newpat) < 0
+	   && GET_CODE (newpat) == SET
+	   && GET_CODE (XEXP (newpat, 1)) == IF_THEN_ELSE)
+    {
+      rtx cond, true_rtx, false_rtx, new_rtx;
+
+      cond = if_then_else_cond (XEXP (newpat, 1), &true_rtx,
+				&false_rtx);
+      if (GET_CODE (cond) == LSHIFTRT)
+      {
+	  unsigned HOST_WIDE_INT nonzero
+	    = nonzero_bits (XEXP (cond, 0), GET_MODE (XEXP (cond, 0)));
+	  unsigned HOST_WIDE_INT shift_amount = UINTVAL (XEXP (cond, 1));
+
+	  if (nonzero >> shift_amount == 1)
+	    {
+	      new_rtx = make_extraction (GET_MODE (cond), XEXP (cond, 0),
+					 shift_amount, 0, 1,
+					 1, 0, 1);
+
+	      newpat = simplify_replace_rtx (newpat, cond, new_rtx);
+
+	      insn_code_number = recog_for_combine (&newpat, i3, &new_i3_notes);
+	    }
+
+      }
+    }
+
   /* If it still isn't recognized, fail and change things back the way they
      were.  */
   if ((insn_code_number < 0
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index c7c4d1dd519..552dc712304 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -897,7 +897,8 @@
 
 (define_insn "*tb<optab><mode>1"
   [(set (pc) (if_then_else
-	      (EQL (zero_extract:DI (match_operand:GPI 0 "register_operand" "r")
+	      (EQL (zero_extract:GPI
+				    (match_operand:GPI 0 "register_operand" "r")
 				    (const_int 1)
 				    (match_operand 1
 				      "aarch64_simd_shift_imm_<mode>" "n"))
@@ -5530,21 +5531,21 @@
 ;; Bitfields
 ;; -------------------------------------------------------------------
 
-(define_expand "<optab>"
-  [(set (match_operand:DI 0 "register_operand")
-	(ANY_EXTRACT:DI (match_operand:DI 1 "register_operand")
+;; Defines extsv, extv patterns
+(define_expand "<optab><mode>"
+  [(set (match_operand:GPI 0 "register_operand" "=r")
+	(ANY_EXTRACT:GPI (match_operand:GPI 1 "register_operand")
 			(match_operand 2
-			  "aarch64_simd_shift_imm_offset_di")
-			(match_operand 3 "aarch64_simd_shift_imm_di")))]
+			  "aarch64_simd_shift_imm_offset_<mode>")
+			(match_operand 3 "aarch64_simd_shift_imm_<mode>")))]
   ""
   {
     if (!IN_RANGE (INTVAL (operands[2]) + INTVAL (operands[3]),
-		   1, GET_MODE_BITSIZE (DImode) - 1))
+		   1, GET_MODE_BITSIZE (<MODE>mode)))
      FAIL;
   }
 )
 
-
 (define_insn "*<optab><mode>"
   [(set (match_operand:GPI 0 "register_operand" "=r")
 	(ANY_EXTRACT:GPI (match_operand:GPI 1 "register_operand" "r")
@@ -5553,7 +5554,7 @@
 			 (match_operand 3
 			   "aarch64_simd_shift_imm_<mode>" "n")))]
   "IN_RANGE (INTVAL (operands[2]) + INTVAL (operands[3]),
-	     1, GET_MODE_BITSIZE (<MODE>mode) - 1)"
+	     1, GET_MODE_BITSIZE (<MODE>mode))"
   "<su>bfx\\t%<w>0, %<w>1, %3, %2"
   [(set_attr "type" "bfx")]
 )
diff --git a/gcc/expmed.c b/gcc/expmed.c
index e7c03fbf92c..f39ab04ced2 100644
--- a/gcc/expmed.c
+++ b/gcc/expmed.c
@@ -1564,7 +1564,7 @@ extract_bit_field_using_extv (const extraction_insn *extv, rtx op0,
       if (REG_P (target)
 	  && TRULY_NOOP_TRUNCATION_MODES_P (GET_MODE (target), ext_mode))
 	{
-	  target = gen_lowpart (ext_mode, target);
+	  target = gen_lowpart_if_possible (ext_mode, target);
 	  if (partial_subreg_p (GET_MODE (spec_target), ext_mode))
 	    spec_target_subreg = target;
 	}
diff --git a/gcc/testsuite/gcc.target/aarch64/pr86901.c b/gcc/testsuite/gcc.target/aarch64/pr86901.c
new file mode 100644
index 00000000000..5dd6fdf75f4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/pr86901.c
@@ -0,0 +1,28 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+typedef unsigned int uint32_t;
+
+float g (float);
+
+static inline uint32_t
+top12 (float x)
+{
+  union
+  {
+    float f;
+    uint32_t i;
+  } u = {x};
+  return (u.i >> 20) & 0x7ff;
+}
+
+void
+f2 (float y, float *p)
+{
+  if (__builtin_expect (top12 (y) < top12 (1.0), 1))
+    *p = y * y;
+  else
+    g (y);
+}
+
+/* { dg-final { scan-assembler "fmov\tw" } } */

[-- Attachment #3: spec2006_v2.txt --]
[-- Type: text/plain, Size: 3965 bytes --]

base							diff							% increase
text	data	bss	total	filename			text	data	bss	total	filename			
1038264	243802	12472	1294538	base/benchspec/CPU2006/400.perlbench/exe/perlbench_base.gcc9-base			1038344	243714	12472	1294530	diff/benchspec/CPU2006/400.perlbench/exe/perlbench_base.gcc9-diff			0.01%
72024	16030	4328	92382	base/benchspec/CPU2006/401.bzip2/exe/bzip2_base.gcc9-base			72016	16030	4328	92374	diff/benchspec/CPU2006/401.bzip2/exe/bzip2_base.gcc9-diff			-0.01%
2651976	816398	749792	4218166	base/benchspec/CPU2006/403.gcc/exe/gcc_base.gcc9-base			2652096	816558	749792	4218446	diff/benchspec/CPU2006/403.gcc/exe/gcc_base.gcc9-diff			0.00%
8232	4803	11912	24947	base/benchspec/CPU2006/429.mcf/exe/mcf_base.gcc9-base			8232	4803	11912	24947	diff/benchspec/CPU2006/429.mcf/exe/mcf_base.gcc9-diff			0.00%
105912	31228	37176	174316	base/benchspec/CPU2006/433.milc/exe/milc_base.gcc9-base			105904	31228	37176	174308	diff/benchspec/CPU2006/433.milc/exe/milc_base.gcc9-diff			-0.01%
205752	25512	496	231760	base/benchspec/CPU2006/444.namd/exe/namd_base.gcc9-base			204792	25288	496	230576	diff/benchspec/CPU2006/444.namd/exe/namd_base.gcc9-diff			-0.47%
757960	2917634	2328968	6004562	base/benchspec/CPU2006/445.gobmk/exe/gobmk_base.gcc9-base			757952	2917634	2328968	6004554	diff/benchspec/CPU2006/445.gobmk/exe/gobmk_base.gcc9-diff			0.00%
2180808	880678	3528	3065014	base/benchspec/CPU2006/447.dealII/exe/dealII_base.gcc9-base			2181248	880670	3528	3065446	diff/benchspec/CPU2006/447.dealII/exe/dealII_base.gcc9-diff			0.02%
345832	79234	1584	426650	base/benchspec/CPU2006/450.soplex/exe/soplex_base.gcc9-base			345936	79234	1584	426754	diff/benchspec/CPU2006/450.soplex/exe/soplex_base.gcc9-diff			0.03%
779768	225837	161496	1167101	base/benchspec/CPU2006/453.povray/exe/povray_base.gcc9-base			779904	225965	161496	1167365	diff/benchspec/CPU2006/453.povray/exe/povray_base.gcc9-diff			0.02%
249528	69146	81944	400618	base/benchspec/CPU2006/456.hmmer/exe/hmmer_base.gcc9-base			249536	69146	81944	400626	diff/benchspec/CPU2006/456.hmmer/exe/hmmer_base.gcc9-diff			0.00%
125080	38096	2576288	2739464	base/benchspec/CPU2006/458.sjeng/exe/sjeng_base.gcc9-base			125080	38096	2576288	2739464	diff/benchspec/CPU2006/458.sjeng/exe/sjeng_base.gcc9-diff			0.00%
30452	11213	96	41761	base/benchspec/CPU2006/462.libquantum/exe/libquantum_base.gcc9-base			30420	11221	96	41737	diff/benchspec/CPU2006/462.libquantum/exe/libquantum_base.gcc9-diff			-0.11%
567320	127533	371064	1065917	base/benchspec/CPU2006/464.h264ref/exe/h264ref_base.gcc9-base			567304	127533	371064	1065901	diff/benchspec/CPU2006/464.h264ref/exe/h264ref_base.gcc9-diff			0.00%
8376	4584	24	12984	base/benchspec/CPU2006/470.lbm/exe/lbm_base.gcc9-base			8368	4584	24	12976	diff/benchspec/CPU2006/470.lbm/exe/lbm_base.gcc9-diff			-0.10%
484536	216255	14528	715319	base/benchspec/CPU2006/471.omnetpp/exe/omnetpp_base.gcc9-base			484528	216255	14528	715311	diff/benchspec/CPU2006/471.omnetpp/exe/omnetpp_base.gcc9-diff			0.00%
33256	10075	5152	48483	base/benchspec/CPU2006/473.astar/exe/astar_base.gcc9-base			33248	10075	5152	48475	diff/benchspec/CPU2006/473.astar/exe/astar_base.gcc9-diff			-0.02%
155608	53889	32888	242385	base/benchspec/CPU2006/482.sphinx3/exe/sphinx_livepretend_base.gcc9-base			155600	53889	32888	242377	diff/benchspec/CPU2006/482.sphinx3/exe/sphinx_livepretend_base.gcc9-diff			-0.01%
2930312	1698488	11544	4640344	base/benchspec/CPU2006/483.xalancbmk/exe/Xalan_base.gcc9-base			2930304	1698488	11544	4640336	diff/benchspec/CPU2006/483.xalancbmk/exe/Xalan_base.gcc9-diff			0.00%
1016	1728	8	2752	base/benchspec/CPU2006/998.specrand/exe/specrand_base.gcc9-base			1016	1728	8	2752	diff/benchspec/CPU2006/998.specrand/exe/specrand_base.gcc9-diff			0.00%
1016	1728	8	2752	base/benchspec/CPU2006/999.specrand/exe/specrand_base.gcc9-base			1016	1728	8	2752	diff/benchspec/CPU2006/999.specrand/exe/specrand_base.gcc9-diff			0.00%
12733028							12732844							0.00%

[-- Attachment #4: spec2017_insn_count.txt --]
[-- Type: text/plain, Size: 2447 bytes --]

	tbnz			tbz			sbfiz			ubfiz			uxtw			sxtw		
	base	diff	diff-base	base	diff	diff-base	base	diff	diff-base	base	diff	diff-base	base	diff	diff-base	base	diff	diff-base
diff/benchspec/CPU/500.perlbench_r/exe/perlbench_r_base.diff-64	4394	4396	2	4442	4444	2	215	215	0	322	323	1	328	328	0	3217	3234	17
diff/benchspec/CPU/502.gcc_r/exe/cpugcc_r_base.diff-64	7755	7833	78	6645	6728	83	1235	1236	1	2801	3018	217	10271	10271	0	9770	9794	24
diff/benchspec/CPU/505.mcf_r/exe/mcf_r_base.diff-64	19	19	0	12	12	0	1	1	0	0	0	0	0	0	0	25	25	0
diff/benchspec/CPU/508.namd_r/exe/namd_r_base.diff-64	148	148	0	65	65	0	724	786	62	3302	3475	173	735	735	0	5057	5121	64
diff/benchspec/CPU/510.parest_r/exe/parest_r_base.diff-64	1295	1295	0	1447	1448	1	1694	1694	0	5775	5899	124	7367	7368	1	11068	11072	4
diff/benchspec/CPU/511.povray_r/exe/imagevalidate_511_base.diff-64	5	5	0	8	8	0	3	3	0	25	27	2	2	2	0	13	13	0
diff/benchspec/CPU/511.povray_r/exe/povray_r_base.diff-64	312	312	0	438	438	0	245	245	0	140	154	14	165	165	0	982	987	5
diff/benchspec/CPU/519.lbm_r/exe/lbm_r_base.diff-64	3	3	0	3	3	0	5	5	0	0	0	0	0	0	0	8	8	0
diff/benchspec/CPU/520.omnetpp_r/exe/omnetpp_r_base.diff-64	690	690	0	361	361	0	336	336	0	26	26	0	179	179	0	1216	1216	0
diff/benchspec/CPU/523.xalancbmk_r/exe/cpuxalan_r_base.diff-64	446	451	5	905	905	0	138	138	0	1108	1249	141	3946	3946	0	893	893	0
diff/benchspec/CPU/526.blender_r/exe/blender_r_base.diff-64	6750	6761	11	7849	7855	6	2125	2125	0	813	952	139	1341	1340	-1	8339	8358	19
diff/benchspec/CPU/526.blender_r/exe/imagevalidate_526_base.diff-64	5	5	0	8	8	0	3	3	0	25	27	2	2	2	0	13	13	0
diff/benchspec/CPU/531.deepsjeng_r/exe/deepsjeng_r_base.diff-64	43	43	0	13	13	0	18	18	0	19	23	4	9	9	0	414	424	10
diff/benchspec/CPU/538.imagick_r/exe/imagevalidate_538_base.diff-64	5	5	0	8	8	0	3	3	0	25	27	2	2	2	0	13	13	0
diff/benchspec/CPU/538.imagick_r/exe/imagick_r_base.diff-64	340	340	0	497	497	0	2	2	0	682	682	0	75	75	0	157	158	1
diff/benchspec/CPU/541.leela_r/exe/leela_r_base.diff-64	24	24	0	26	26	0	13	13	0	6	10	4	44	44	0	595	595	0
diff/benchspec/CPU/544.nab_r/exe/nab_r_base.diff-64	56	56	0	70	70	0	92	92	0	36	47	11	10	10	0	454	454	0
diff/benchspec/CPU/557.xz_r/exe/xz_r_base.diff-64	22	22	0	9	9	0	1	1	0	115	118	3	251	251	0	28	28	0
diff/benchspec/CPU/997.specrand_fr/exe/specrand_fr_base.diff-64	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	4	4	0
diff/benchspec/CPU/999.specrand_ir/exe/specrand_ir_base.diff-64	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	4	4	0

[-- Attachment #5: spec2017_v2.txt --]
[-- Type: text/plain, Size: 3716 bytes --]

base						diff						% increase
text	data	bss	total	filename		text	data	bss	total	filename		
1775240	561457	8760	2345457	base/benchspec/CPU/500.perlbench_r/exe/perlbench_r_base.base-64		1775432	561305	8760	2345497	diff/benchspec/CPU/500.perlbench_r/exe/perlbench_r_base.diff-64		0.01%
7755224	2191963	1146680	11093867	base/benchspec/CPU/502.gcc_r/exe/cpugcc_r_base.base-64		7754488	2192051	1146680	11093219	diff/benchspec/CPU/502.gcc_r/exe/cpugcc_r_base.diff-64		-0.01%
23064	6391	728	30183	base/benchspec/CPU/505.mcf_r/exe/mcf_r_base.base-64		23064	6391	728	30183	diff/benchspec/CPU/505.mcf_r/exe/mcf_r_base.diff-64		0.00%
677176	30046	960	708182	base/benchspec/CPU/508.namd_r/exe/namd_r_base.base-64		677816	30046	960	708822	diff/benchspec/CPU/508.namd_r/exe/namd_r_base.diff-64		0.09%
7082456	1704125	16128	8802709	base/benchspec/CPU/510.parest_r/exe/parest_r_base.base-64		7084584	1704145	16128	8804857	diff/benchspec/CPU/510.parest_r/exe/parest_r_base.diff-64		0.03%
14072	9141	24	23237	base/benchspec/CPU/511.povray_r/exe/imagevalidate_511_base.base-64		14072	9141	24	23237	diff/benchspec/CPU/511.povray_r/exe/imagevalidate_511_base.diff-64		0.00%
789000	246142	188784	1223926	base/benchspec/CPU/511.povray_r/exe/povray_r_base.base-64		789032	246134	188784	1223950	diff/benchspec/CPU/511.povray_r/exe/povray_r_base.diff-64		0.00%
10648	4744	24	15416	base/benchspec/CPU/519.lbm_r/exe/lbm_r_base.base-64		10640	4744	24	15408	diff/benchspec/CPU/519.lbm_r/exe/lbm_r_base.diff-64		-0.08%
1505352	744909	46536	2296797	base/benchspec/CPU/520.omnetpp_r/exe/omnetpp_r_base.base-64		1505344	744925	46536	2296805	diff/benchspec/CPU/520.omnetpp_r/exe/omnetpp_r_base.diff-64		0.00%
3992744	1933641	14752	5941137	base/benchspec/CPU/523.xalancbmk_r/exe/cpuxalan_r_base.base-64		3992736	1933641	14752	5941129	diff/benchspec/CPU/523.xalancbmk_r/exe/cpuxalan_r_base.diff-64		0.00%
7994116	7841208	417344	16252668	base/benchspec/CPU/526.blender_r/exe/blender_r_base.base-64		7994276	7841400	417344	16253020	diff/benchspec/CPU/526.blender_r/exe/blender_r_base.diff-64		0.00%
14072	9141	24	23237	base/benchspec/CPU/526.blender_r/exe/imagevalidate_526_base.base-64		14072	9141	24	23237	diff/benchspec/CPU/526.blender_r/exe/imagevalidate_526_base.diff-64		0.00%
76264	16488	12138272	12231024	base/benchspec/CPU/531.deepsjeng_r/exe/deepsjeng_r_base.base-64		76296	16488	12138272	12231056	diff/benchspec/CPU/531.deepsjeng_r/exe/deepsjeng_r_base.diff-64		0.04%
14072	9080	24	23176	base/benchspec/CPU/538.imagick_r/exe/imagevalidate_538_base.base-64		14072	9080	24	23176	diff/benchspec/CPU/538.imagick_r/exe/imagevalidate_538_base.diff-64		0.00%
1772984	408975	5520	2187479	base/benchspec/CPU/538.imagick_r/exe/imagick_r_base.base-64		1772152	409119	5520	2186791	diff/benchspec/CPU/538.imagick_r/exe/imagick_r_base.diff-64		-0.05%
174616	46829	30032	251477	base/benchspec/CPU/541.leela_r/exe/leela_r_base.base-64		174624	46829	30032	251485	diff/benchspec/CPU/541.leela_r/exe/leela_r_base.diff-64		0.00%
166616	39807	381952	588375	base/benchspec/CPU/544.nab_r/exe/nab_r_base.base-64		166616	39807	381952	588375	diff/benchspec/CPU/544.nab_r/exe/nab_r_base.diff-64		0.00%
131672	69274	17576	218522	base/benchspec/CPU/557.xz_r/exe/xz_r_base.base-64		131680	69274	17576	218530	diff/benchspec/CPU/557.xz_r/exe/xz_r_base.diff-64		0.01%
2440	2361	5008	9809	base/benchspec/CPU/997.specrand_fr/exe/specrand_fr_base.base-64		2440	2361	5008	9809	diff/benchspec/CPU/997.specrand_fr/exe/specrand_fr_base.diff-64		0.00%
2440	2361	5008	9809	base/benchspec/CPU/999.specrand_ir/exe/specrand_ir_base.base-64		2440	2361	5008	9809	diff/benchspec/CPU/999.specrand_ir/exe/specrand_ir_base.diff-64		0.00%
33974268						33975876						0.00%

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2020-04-30 23:15 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-02-05  0:01 [PATCH][AARCH64] Fix for PR86901 Modi Mo via gcc-patches
2020-02-05 15:00 ` Wilco Dijkstra
2020-02-07  2:12   ` Modi Mo via gcc-patches
2020-02-07  9:52     ` Richard Earnshaw (lists)
2020-02-07 14:06       ` Wilco Dijkstra
2020-02-21 23:17         ` Modi Mo via gcc-patches
2020-03-03 15:54           ` Wilco Dijkstra
2020-03-03 16:04           ` Wilco Dijkstra
2020-04-30 23:15             ` Modi Mo

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).