* [PATCH][AARCH64] Fix for PR86901
@ 2020-02-05 0:01 Modi Mo via gcc-patches
From: Modi Mo via gcc-patches @ 2020-02-05 0:01 UTC (permalink / raw)
To: gcc-patches; +Cc: Wilco.Dijkstra, pinskia
Hi,
This patch adds support for the extv<mode> and extzv<mode> patterns on aarch64, as described in PR86901. I also changed extract_bit_field_using_extv to use gen_lowpart_if_possible instead of gen_lowpart: calling gen_lowpart directly ICEs while building libgcc when the lowpart cannot be taken, whereas gen_lowpart_if_possible bails out of matching this pattern gracefully.
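For readers less familiar with these optabs, here is a small, hypothetical C model of the semantics the extzv/extv expanders describe (the function names are illustrative, not GCC internals): extract `width` bits of `x` starting at bit `pos`, zero- or sign-extending the result.

```c
#include <stdint.h>

/* Model of zero_extract (what extzv expands): take WIDTH bits of X
   starting at bit POS and zero-extend them.  WIDTH == 32 yields an
   all-ones mask via unsigned wraparound.  */
uint32_t
zero_extract32 (uint32_t x, unsigned width, unsigned pos)
{
  uint32_t mask = (width < 32 ? (1u << width) : 0u) - 1u;
  return (x >> pos) & mask;
}

/* Model of sign_extract (what extv expands): same field, sign-extended.
   Shift the field to the top of the register, then arithmetic-shift it
   back down.  Relies on arithmetic right shift of a negative int32_t,
   which GCC guarantees.  Assumes width + pos <= 32, the same condition
   the expanders enforce with FAIL.  */
int32_t
sign_extract32 (uint32_t x, unsigned width, unsigned pos)
{
  return (int32_t) (x << (32 - width - pos)) >> (32 - width);
}
```

On AArch64 these two operations map onto ubfx and sbfx respectively, which is what the new define_expands let the middle end reach directly.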
I'm looking through past mails and https://gcc.gnu.org/contribute.html, which details testing via bootstrap. I'm building a cross-compiler (x64 to aarch64) and the instructions don't address that scenario. The GCC cross-build is green and there are no regressions in the C/C++ tests (Go/Fortran etc. look like they need additional infrastructure on my side to work). Is there a workflow for cross-builds, or should I aim to get an aarch64 native machine for full validation?
ChangeLog:
2020-02-03 Di Mo <modimo@microsoft.com>
gcc/
* config/aarch64/aarch64.md: Add define_expand for extv<mode> and extzv<mode>.
* expmed.c (extract_bit_field_using_extv): Change gen_lowpart to gen_lowpart_if_possible to avoid compiler assert building libgcc.
testsuite/
* gcc.target/aarch64/pr86901.c: Add new test.
Best,
Modi
[-- Attachment #2: pr86901.patch --]
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 5a1894063a1e..6bc73b5f8e2a 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -5491,6 +5491,40 @@
[(set_attr "type" "bfm")]
)
+;; Bitfield extract and zero extend (extzv)
+(define_expand "extzv<mode>"
+ [(set (match_operand:GPI 0 "register_operand")
+ (zero_extract:GPI (match_operand:GPI 1 "register_operand")
+ (match_operand 2 "const_int_operand")
+ (match_operand 3 "const_int_operand")))]
+ ""
+{
+ unsigned HOST_WIDE_INT width = UINTVAL (operands[2]);
+ unsigned HOST_WIDE_INT pos = UINTVAL (operands[3]);
+ rtx value = operands[1];
+
+ if (width == 0 || (pos + width) > GET_MODE_BITSIZE (<MODE>mode))
+ FAIL;
+ operands[1] = force_reg (<MODE>mode, value);
+})
+
+;; Bitfield extract and sign extend (extv)
+(define_expand "extv<mode>"
+ [(set (match_operand:GPI 0 "register_operand")
+ (sign_extract:GPI (match_operand:GPI 1 "register_operand")
+ (match_operand 2 "const_int_operand")
+ (match_operand 3 "const_int_operand")))]
+ ""
+{
+ unsigned HOST_WIDE_INT width = UINTVAL (operands[2]);
+ unsigned HOST_WIDE_INT pos = UINTVAL (operands[3]);
+ rtx value = operands[1];
+
+ if (width == 0 || (pos + width) > GET_MODE_BITSIZE (<MODE>mode))
+ FAIL;
+ operands[1] = force_reg (<MODE>mode, value);
+})
+
;; Match a bfi instruction where the shift of OP3 means that we are
;; actually copying the least significant bits of OP3 into OP0 by way
;; of the AND masks and the IOR instruction. A similar instruction
diff --git a/gcc/expmed.c b/gcc/expmed.c
index d7f8e9a5d767..926f4e8f1eaf 100644
--- a/gcc/expmed.c
+++ b/gcc/expmed.c
@@ -1546,7 +1546,7 @@ extract_bit_field_using_extv (const extraction_insn *extv, rtx op0,
if (REG_P (target)
&& TRULY_NOOP_TRUNCATION_MODES_P (GET_MODE (target), ext_mode))
{
- target = gen_lowpart (ext_mode, target);
+ target = gen_lowpart_if_possible (ext_mode, target);
if (partial_subreg_p (GET_MODE (spec_target), ext_mode))
spec_target_subreg = target;
}
diff --git a/gcc/testsuite/gcc.target/aarch64/pr86901.c b/gcc/testsuite/gcc.target/aarch64/pr86901.c
new file mode 100644
index 000000000000..5dd6fdf75f44
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/pr86901.c
@@ -0,0 +1,28 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+typedef unsigned int uint32_t;
+
+float g (float);
+
+static inline uint32_t
+top12 (float x)
+{
+ union
+ {
+ float f;
+ uint32_t i;
+ } u = {x};
+ return (u.i >> 20) & 0x7ff;
+}
+
+void
+f2 (float y, float *p)
+{
+ if (__builtin_expect (top12 (y) < top12 (1.0), 1))
+ *p = y * y;
+ else
+ g (y);
+}
+
+/* { dg-final { scan-assembler "fmov\tw" } } */
* Re: [PATCH][AARCH64] Fix for PR86901
From: Wilco Dijkstra @ 2020-02-05 15:00 UTC (permalink / raw)
To: Modi Mo, gcc-patches; +Cc: pinskia
Hi Modi,
Thanks for your patch!
> Adding support for extv<mode> and extzv<mode> on aarch64 as described in PR86901. I also changed
> extract_bit_field_using_extv to use gen_lowpart_if_possible instead of gen_lowpart directly. Using
> gen_lowpart directly will fail with an ICE in building libgcc when the compiler fails to successfully do so
> whereas gen_lowpart_if_possible will bail out of matching this pattern gracefully.
I did a quick bootstrap, this shows several failures like:
gcc/builtins.c:9427:1: error: unrecognizable insn:
9427 | }
| ^
(insn 212 211 213 24 (set (reg:SI 207)
(zero_extract:SI (reg:SI 206)
(const_int 26 [0x1a])
(const_int 6 [0x6]))) "/gcc/builtins.c":9413:44 -1
(nil))
The issue here is that 26+6 = 32 and that's not a valid ubfx encoding. Currently
cases like this are split into a right shift in aarch64.md around line 5569:
;; When the bit position and width add up to 32 we can use a W-reg LSR
;; instruction taking advantage of the implicit zero-extension of the X-reg.
(define_split
[(set (match_operand:DI 0 "register_operand")
(zero_extract:DI (match_operand:DI 1 "register_operand")
(match_operand 2
"aarch64_simd_shift_imm_offset_di")
(match_operand 3
"aarch64_simd_shift_imm_di")))]
"IN_RANGE (INTVAL (operands[2]) + INTVAL (operands[3]), 1,
GET_MODE_BITSIZE (DImode) - 1)
&& (INTVAL (operands[2]) + INTVAL (operands[3]))
== GET_MODE_BITSIZE (SImode)"
[(set (match_dup 0)
(zero_extend:DI (lshiftrt:SI (match_dup 4) (match_dup 3))))]
{
operands[4] = gen_lowpart (SImode, operands[1]);
}
However that only supports DImode, not SImode, so it needs to be generalized
using the GPI iterator.
Your new extv patterns should replace the magic patterns above it:
;; -------------------------------------------------------------------
;; Bitfields
;; -------------------------------------------------------------------
(define_expand "<optab>"
These are the current extv/extzv patterns, but without a mode. They are no longer
used when we start using the new ones.
Note you can write <optab><mode> to combine the extzv and extv patterns.
But please add a comment mentioning the pattern names so they are easy to find!
Besides a bootstrap it is always useful to compile a large body of code with your change
(say SPEC2006/2017) and check for differences in at least codesize. If there is an increase
in instruction count then there may be more issues that need to be resolved.
> I'm looking through past mails and https://gcc.gnu.org/contribute.html which details testing bootstrap.
> I'm building a cross-compiler (x64_aarch64) and the instructions don't address that scenario. The GCC
> cross-build is green and there's no regressions on the C/C++ tests (The go/fortran etc. look like they
> need additional infrastructure built on my side to work). Is there a workflow for cross-builds or should I
> aim to get an aarch64 native machine for full validation?
I find it easiest to develop on a many-core AArch64 server so you get much faster builds,
bootstraps and regression tests. Cross compilers are mostly useful if you want to test big-endian
or new architecture features which are not yet supported in hardware. You don't normally need
to test Go/Fortran/Ada etc. unless your patch does something that would directly affect them.
Finally do you have a copyright assignment with the FSF?
Cheers,
Wilco
* RE: [PATCH][AARCH64] Fix for PR86901
From: Modi Mo via gcc-patches @ 2020-02-07 2:12 UTC (permalink / raw)
To: Wilco Dijkstra, gcc-patches; +Cc: pinskia
> I did a quick bootstrap, this shows several failures like:
>
> gcc/builtins.c:9427:1: error: unrecognizable insn:
> 9427 | }
> | ^
> (insn 212 211 213 24 (set (reg:SI 207)
> (zero_extract:SI (reg:SI 206)
> (const_int 26 [0x1a])
> (const_int 6 [0x6]))) "/gcc/builtins.c":9413:44 -1
> (nil))
>
> The issue here is that 26+6 = 32 and that's not a valid ubfx encoding.
> Currently cases like this are split into a right shift in aarch64.md around line
> 5569:
Appreciate you taking a look and the validation. I've gotten access to an aarch64 server, and the bootstrap demonstrated the issue you saw. It was caused by my re-definition of the pattern to:
+ if (width == 0 || (pos + width) > GET_MODE_BITSIZE (<MODE>mode))
+ FAIL;
For SImode this means that only a sum greater than 32 bits triggers the FAIL condition in the define_expand, whereas the existing define_insn rejects sums of 32 bits or more. I looked into the Architecture Reference Manual: the ubfx/sbfx encodings have room for this case, and the documentation lists [lsb, 32-lsb] as a legal pair for SImode. Checking with the GNU assembler, it does accept a sum of 32 but transforms it into an LSR:
Assembly file:
ubfx w0, w0, 24, 8
Disassembly of section .text:
0000000000000000 <f>:
0: 53187c00 lsr w0, w0, #24
Similarly, the 64-bit version becomes a 64-bit LSR. Other assemblers could certainly trip over this. I've attached a new patch that allows this encoding; bootstrap plus the C/C++ testsuite look good. I'll defer to you on whether it's better to do the transformation explicitly in GCC.
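The equivalence behind the assembler's rewrite can be checked directly in C (function names hypothetical): when lsb + width equals the register width, the extract's mask is redundant and the operation degenerates to a plain logical shift right.

```c
#include <stdint.h>

/* ubfx w0, w0, 24, 8: extract the 8 bits starting at bit 24.  */
uint32_t
ubfx_24_8 (uint32_t x)
{
  return (x >> 24) & 0xffu;
}

/* lsr w0, w0, #24: for a 32-bit register the shift already leaves only
   8 significant bits, so the mask above is a no-op.  This is why the
   assembler canonicalises the ubfx form to lsr when lsb + width == 32.  */
uint32_t
lsr_24 (uint32_t x)
{
  return x >> 24;
}
```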
> ;; When the bit position and width add up to 32 we can use a W-reg LSR ;;
> instruction taking advantage of the implicit zero-extension of the X-reg.
> (define_split
> [(set (match_operand:DI 0 "register_operand")
> (zero_extract:DI (match_operand:DI 1 "register_operand")
> (match_operand 2
> "aarch64_simd_shift_imm_offset_di")
> (match_operand 3
> "aarch64_simd_shift_imm_di")))]
> "IN_RANGE (INTVAL (operands[2]) + INTVAL (operands[3]), 1,
> GET_MODE_BITSIZE (DImode) - 1)
> && (INTVAL (operands[2]) + INTVAL (operands[3]))
> == GET_MODE_BITSIZE (SImode)"
> [(set (match_dup 0)
> (zero_extend:DI (lshiftrt:SI (match_dup 4) (match_dup 3))))]
> {
> operands[4] = gen_lowpart (SImode, operands[1]);
> }
>
> However that only supports DImode, not SImode, so it needs to be changed
> to be more general using GPI.
>
> Your new extv patterns should replace the magic patterns above it:
With the previous discovery that a sum of 32/64 triggers an LSR in the assembler, I was curious what would happen if I removed this pattern. It turns out you end up with a UBFX x0, x0, 24, 8 instead of an LSR w0, w0, 24 in the test case associated with this change (gcc.target/aarch64/ubfx_lsr_1.c), and the assembler does not transform that into an LSR since it is in 64-bit mode. So this pattern still has value, but I don't think it needs to be generalized to SImode, since that case automatically gets turned into an LSR by the assembler as mentioned above.
> ;; -------------------------------------------------------------------
> ;; Bitfields
> ;; -------------------------------------------------------------------
>
> (define_expand "<optab>"
>
> These are the current extv/extzv patterns, but without a mode. They are no
> longer used when we start using the new ones.
>
> Note you can write <optab><mode> to combine the extzv and extz patterns.
> But please add a comment mentioning the pattern names so they are easy to
> find!
Good call here; made this change in the new patch. I did see the define_insn versions of these below, but somehow missed that they were expanded just above. From my understanding of GCC, the matching pattern below the first line is what constrains <optab> to just extv/extzv out of the long list of iterators it belongs to. Still, I see that there are constrained iterators elsewhere, like:
;; Optab prefix for sign/zero-extending operations
(define_code_attr su_optab [(sign_extend "") (zero_extend "u")
I added a comment before the pattern in this patch. Thoughts on defining another constrained version to make it clearer (in addition to, or in lieu of, the comment)?
> Besides a bootstrap it is always useful to compile a large body of code with
> your change (say SPEC2006/2017) and check for differences in at least
> codesize. If there is an increase in instruction count then there may be more
> issues that need to be resolved.
Sounds good. I'll get those set up and running and will report back on findings. What's the preferred way to measure code size? I'm assuming the code pages are aligned by default, so smaller differences would need to trip over a page boundary to actually show up.
> I find it easiest to develop on a many-core AArch64 server so you get much
> faster builds, bootstraps and regression tests. Cross compilers are mostly
> useful if you want to test big-endian or new architecture features which are
> not yet supported in hardware. You don't normally need to test
> Go/Fortran/ADA etc unless your patch does something that would directly
> affect them.
Good to know. Sounds like our current testing with the C/C++ testsuite is adequate for most changes.
> Finally do you have a copyright assignment with the FSF?
Lawyers are working on that. Since I still have the SPEC code-size runs to go through, the licensing may be ready before all analysis is done for the patch.
Modi
[-- Attachment #2: pr86901.patch --]
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 4f5898185f5..46c1b870aaf 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -5514,21 +5514,21 @@
;; Bitfields
;; -------------------------------------------------------------------
-(define_expand "<optab>"
- [(set (match_operand:DI 0 "register_operand")
- (ANY_EXTRACT:DI (match_operand:DI 1 "register_operand")
+;; Defines the extzv<mode>, extv<mode> patterns
+(define_expand "<optab><mode>"
+ [(set (match_operand:GPI 0 "register_operand" "=r")
+ (ANY_EXTRACT:GPI (match_operand:GPI 1 "register_operand")
(match_operand 2
- "aarch64_simd_shift_imm_offset_di")
- (match_operand 3 "aarch64_simd_shift_imm_di")))]
+ "aarch64_simd_shift_imm_offset_<mode>")
+ (match_operand 3 "aarch64_simd_shift_imm_<mode>")))]
""
{
if (!IN_RANGE (INTVAL (operands[2]) + INTVAL (operands[3]),
- 1, GET_MODE_BITSIZE (DImode) - 1))
+ 1, GET_MODE_BITSIZE (<MODE>mode)))
FAIL;
}
)
-
(define_insn "*<optab><mode>"
[(set (match_operand:GPI 0 "register_operand" "=r")
(ANY_EXTRACT:GPI (match_operand:GPI 1 "register_operand" "r")
@@ -5537,7 +5537,7 @@
(match_operand 3
"aarch64_simd_shift_imm_<mode>" "n")))]
"IN_RANGE (INTVAL (operands[2]) + INTVAL (operands[3]),
- 1, GET_MODE_BITSIZE (<MODE>mode) - 1)"
+ 1, GET_MODE_BITSIZE (<MODE>mode))"
"<su>bfx\\t%<w>0, %<w>1, %3, %2"
[(set_attr "type" "bfx")]
)
diff --git a/gcc/expmed.c b/gcc/expmed.c
index d7f8e9a5d767..926f4e8f1eaf 100644
--- a/gcc/expmed.c
+++ b/gcc/expmed.c
@@ -1546,7 +1546,7 @@ extract_bit_field_using_extv (const extraction_insn *extv, rtx op0,
if (REG_P (target)
&& TRULY_NOOP_TRUNCATION_MODES_P (GET_MODE (target), ext_mode))
{
- target = gen_lowpart (ext_mode, target);
+ target = gen_lowpart_if_possible (ext_mode, target);
if (partial_subreg_p (GET_MODE (spec_target), ext_mode))
spec_target_subreg = target;
}
diff --git a/gcc/testsuite/gcc.target/aarch64/pr86901.c b/gcc/testsuite/gcc.target/aarch64/pr86901.c
new file mode 100644
index 000000000000..5dd6fdf75f44
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/pr86901.c
@@ -0,0 +1,28 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+typedef unsigned int uint32_t;
+
+float g (float);
+
+static inline uint32_t
+top12 (float x)
+{
+ union
+ {
+ float f;
+ uint32_t i;
+ } u = {x};
+ return (u.i >> 20) & 0x7ff;
+}
+
+void
+f2 (float y, float *p)
+{
+ if (__builtin_expect (top12 (y) < top12 (1.0), 1))
+ *p = y * y;
+ else
+ g (y);
+}
+
+/* { dg-final { scan-assembler "fmov\tw" } } */
* Re: [PATCH][AARCH64] Fix for PR86901
From: Richard Earnshaw (lists) @ 2020-02-07 9:52 UTC (permalink / raw)
To: Modi Mo, Wilco Dijkstra, gcc-patches; +Cc: pinskia
On 07/02/2020 02:12, Modi Mo via gcc-patches wrote:
>> I did a quick bootstrap, this shows several failures like:
>>
>> gcc/builtins.c:9427:1: error: unrecognizable insn:
>> 9427 | }
>> | ^
>> (insn 212 211 213 24 (set (reg:SI 207)
>> (zero_extract:SI (reg:SI 206)
>> (const_int 26 [0x1a])
>> (const_int 6 [0x6]))) "/gcc/builtins.c":9413:44 -1
>> (nil))
>>
>> The issue here is that 26+6 = 32 and that's not a valid ubfx encoding.
>> Currently cases like this are split into a right shift in aarch64.md around line
>> 5569:
>
> Appreciate you taking a look and the validation. I've gotten access to an aarch64 server and the bootstrap demonstrated the issue you saw. This was caused by my re-definition of the pattern to:
> + if (width == 0 || (pos + width) > GET_MODE_BITSIZE (<MODE>mode))
> + FAIL;
>
> Which meant for SImode only a sum of >32 bit actually triggers the fail condition for the define_expand whereas the existing define_insn fails on >=32 bit. I looked into the architecture reference manual and the bits are available for ubfx/sbfx for that type of encoding and the documentation says you can use [lsb, 32-lsb] for SImode as a legal pair. Checking with the GNU assembler it does accept a sum of 32 but transforms it into a LSR:
>
> Assembly file:
> ubfx w0, w0, 24, 8
>
> Disassembly of section .text:
>
> 0000000000000000 <f>:
> 0: 53187c00 lsr w0, w0, #24
This is how LSR is implemented in AArch64, as an extract. So the
disassembler is showing the canonical representation.
However, inside the compiler we really want to represent this as a
shift. Why? Because there are cases where we can then merge a shift
into other operations when we can't merge the more general extract
operation. For example, we can merge an LSR into a subsequent 'AND'
operation, but can't merge a more general extract into an AND.
Essentially, we want the compiler to describe this in the canonical
shift form rather than the extract form.
Ideally this would be handled inside the mid-end expansion of an
extract, but in the absence of that I think this is best done inside the
extv expansion so that we never end up with a real extract in that case.
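Richard's point about merging can be illustrated with a C sketch (function name hypothetical): AArch64's AND instruction accepts a shifted register operand, so if the compiler keeps the operation in shift form it can fold the shift into the AND, whereas a result already committed to a ubfx extract cannot be combined that way.

```c
#include <stdint.h>

/* Because the right shift stays a shift in the RTL, AArch64 can emit a
   single "and w0, w1, w0, lsr 20" here: AND takes a shifted register
   operand.  Had the shift been represented as a zero_extract (ubfx),
   the AND would need a separate instruction.  */
uint32_t
shift_into_and (uint32_t x, uint32_t y)
{
  return (x >> 20) & y;
}
```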
>
> Similarly with the 64 bit version it'll become a 64 bit LSR. Certainly other assemblers could trip over, I've attached a new patch that allows this encoding and bootstrap + testing c/c++ testsuite looks good. I'll defer to you if it's better to explicitly do the transformation in GCC.
>
>> ;; When the bit position and width add up to 32 we can use a W-reg LSR ;;
>> instruction taking advantage of the implicit zero-extension of the X-reg.
>> (define_split
>> [(set (match_operand:DI 0 "register_operand")
>> (zero_extract:DI (match_operand:DI 1 "register_operand")
>> (match_operand 2
>> "aarch64_simd_shift_imm_offset_di")
>> (match_operand 3
>> "aarch64_simd_shift_imm_di")))]
>> "IN_RANGE (INTVAL (operands[2]) + INTVAL (operands[3]), 1,
>> GET_MODE_BITSIZE (DImode) - 1)
>> && (INTVAL (operands[2]) + INTVAL (operands[3]))
>> == GET_MODE_BITSIZE (SImode)"
>> [(set (match_dup 0)
>> (zero_extend:DI (lshiftrt:SI (match_dup 4) (match_dup 3))))]
>> {
>> operands[4] = gen_lowpart (SImode, operands[1]);
>> }
>>
>> However that only supports DImode, not SImode, so it needs to be changed
>> to be more general using GPI.
>>
>> Your new extv patterns should replace the magic patterns above it:
>
> With the previous discovery that a sum of 32/64 will trigger LSR in the assembler I was curious what would happen if I remove this pattern. Turns out, you will end up with a UBFX x0, x0, 24, 8 compared to a LSR w0, w0, 24 in the test case associated with this change (gcc.target/aarch64/ubfx_lsr_1.c) which doesn't get transformed into an LSR by the assembler since it's in 64 bit mode. So this pattern still has value but I don't think it's necessary to extend it to DI since that'll automatically get turned into a LSR by the assembler as I previously mentioned.
>
>
>> ;; -------------------------------------------------------------------
>> ;; Bitfields
>> ;; -------------------------------------------------------------------
>>
>> (define_expand "<optab>"
>>
>> These are the current extv/extzv patterns, but without a mode. They are no
>> longer used when we start using the new ones.
>>
>> Note you can write <optab><mode> to combine the extzv and extz patterns.
>> But please add a comment mentioning the pattern names so they are easy to
>> find!
>
> Good call here, made this change in the new patch. I did see the define_insn of these guys below it but somehow missed that they were expanded just above. I believe, from my understanding of GCC, that the matching pattern below the first line is what constrains <optab> into just extv/extsv from the long list of iterators it belongs to. Still, I see that there's constrained iterators elsewhere like:
>
> ;; Optab prefix for sign/zero-extending operations
> (define_code_attr su_optab [(sign_extend "") (zero_extend "u")
>
> I added a comment in this patch before the pattern. Thoughts on defining another constrained version to make it clearer (in addition or in lieu of the comment)?
>
>> Besides a bootstrap it is always useful to compile a large body of code with
>> your change (say SPEC2006/2017) and check for differences in at least
>> codesize. If there is an increase in instruction count then there may be more
>> issues that need to be resolved.
>
> Sounds good. I'll get those setup and running and will report back on findings. What's the preferred way to measure codesize? I'm assuming by default the code pages are aligned so smaller differences would need to trip over the boundary to actually show up.
>
>> I find it easiest to develop on a many-core AArch64 server so you get much
>> faster builds, bootstraps and regression tests. Cross compilers are mostly
>> useful if you want to test big-endian or new architecture features which are
>> not yet supported in hardware. You don't normally need to test
>> Go/Fortran/ADA etc unless your patch does something that would directly
>> affect them.
>
> Good to know. Sounds like our current testing with C/C++ testsuite is alright for most changes.
>
>> Finally do you have a copyright assignment with the FSF?
>
> Lawyers are working on that. Since I have SPEC codesize to go through the licensing may be ready before all analysis is done for the patch.
>
> Modi
>
R.
* Re: [PATCH][AARCH64] Fix for PR86901
From: Wilco Dijkstra @ 2020-02-07 14:06 UTC (permalink / raw)
To: Richard Earnshaw, Modi Mo, gcc-patches; +Cc: pinskia
Hi,
Richard wrote:
> However, inside the compiler we really want to represent this as a
>shift.
...
> Ideally this would be handled inside the mid-end expansion of an
> extract, but in the absence of that I think this is best done inside the
> extv expansion so that we never end up with a real extract in that case.
Yes, the mid-end could be improved. It turns out this is due to the expansion of
bitfields: all variations of (x & mask) >> N are optimized into shifts early on.
However, combine can already transform these zero/sign_extends
into shifts, so we do end up with good code. With the latest patch I get:
typedef struct { int x : 6, y : 6, z : 20; } X;
int f (int x, X *p) { return x + p->z; }
ldr w1, [x1]
add w0, w0, w1, asr 12
ret
So this case looks alright.
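The combine result above can be cross-checked against a plain-C model of the load-plus-shift sequence. This assumes the usual AArch64/little-endian bit-field layout where `x` is allocated the low bits, so `z` occupies bits 12..31 of the word.

```c
#include <stdint.h>
#include <string.h>

typedef struct { int x : 6, y : 6, z : 20; } X;

/* The source form from the example.  */
int
f (int x, X *p)
{
  return x + p->z;
}

/* What combine produces: one word load ("ldr w1, [x1]") and a fused
   arithmetic shift ("add w0, w0, w1, asr 12").  Assumes z sits in
   bits 12..31 of the 4-byte struct, per the AArch64 psABI layout.  */
int
f_shift (int x, const X *p)
{
  int32_t w;
  memcpy (&w, p, sizeof w);  /* the ldr */
  return x + (w >> 12);      /* the add with asr 12 */
}
```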
> Sounds good. I'll get those setup and running and will report back on findings. What's
> the preferred way to measure codesize? I'm assuming by default the code pages are
> aligned so smaller differences would need to trip over the boundary to actually show up.
You can use the size command on the binaries:
>size /bin/ls
text data bss dec hex filename
107271 2024 3472 112767 1b87f /bin/ls
As you can see it shows the text size in bytes. It is not rounded up to a page, so it is an
accurate measure of the codesize. Generally -O2 size is most useful to check (since that
is what most applications build with), but -Ofast -flto can be useful as well (the global
inlining means you get instruction combinations which appear less often with -O2).
Cheers,
Wilco
* RE: [PATCH][AARCH64] Fix for PR86901
From: Modi Mo via gcc-patches @ 2020-02-21 23:17 UTC (permalink / raw)
To: Wilco Dijkstra, Richard Earnshaw, gcc-patches; +Cc: pinskia
> > Sounds good. I'll get those setup and running and will report back on
> > findings. What's the preferred way to measure codesize? I'm assuming
> > by default the code pages are aligned so smaller differences would need to trip
> over the boundary to actually show up.
>
> You can use the size command on the binaries:
>
> >size /bin/ls
> text data bss dec hex filename
> 107271 2024 3472 112767 1b87f /bin/ls
>
> As you can see it shows the text size in bytes. It is not rounded up to a page, so it
> is an accurate measure of the codesize. Generally -O2 size is most useful to
> check (since that is what most applications build with), but -Ofast -flto can be
> useful as well (the global inlining means you get instruction combinations which
> appear less often with -O2).
>
> Cheers,
> Wilco
Alrighty, I've got SPEC 2017 and SPEC 2006 set up and building, using the default configurations, so -O2 in SPEC 2006 and -O3 in SPEC 2017. Testing the patch as last sent showed a 1% code-size regression in SPEC 2017 perlbench, which turned out to be caused by a missing pattern for tbnz and all its variants:
(define_insn "*tb<optab><mode>1"
[(set (pc) (if_then_else
(EQL (zero_extract:DI (match_operand:GPI 0 "register_operand" "r") <--- only matches against zero_extract:DI
(const_int 1)
(match_operand 1
"aarch64_simd_shift_imm_<mode>" "n"))
With the zero_extract now matching other modes, these cases generated a test + branch rather than the combined instruction, which led to the code-size regression. I've updated the patch so that tbnz etc. match GPI, and that brings the code-size delta down to <0.2% in SPEC 2017 and <0.4% in SPEC 2006.
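For context, a minimal C reproducer of the kind of code that depends on the tb pattern (hypothetical, not the perlbench source): a single-bit test that should compile to one tbnz/tbz rather than a separate extract plus cbnz.

```c
#include <stdint.h>

/* A single-bit test on a 32-bit value: the one-bit zero_extract:SI that
   should match the tb<optab><mode>1 pattern and become tbnz/tbz when
   branched on.  Before the fix the pattern only accepted
   zero_extract:DI, so SImode tests cost an extra instruction.  */
int
bit_set_p (uint32_t flags, unsigned bit)
{
  return (flags >> bit) & 1u;
}
```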
Spec results are attached for reference.
@Wilco I've been instructed on my side to set up an individual contributor's license agreement for the time being. Can you send me the necessary documents to make that happen? Thanks!
ChangeLog:
2020-02-21 Di Mo <modimo@microsoft.com>
gcc/
* config/aarch64/aarch64.md: Add GPI modes to extzv/extv patterns. Allow tb<optab><mode>1 pattern to match against zero_extract:GPI.
* expmed.c (extract_bit_field_using_extv): Change gen_lowpart to gen_lowpart_if_possible to avoid compiler assert building libgcc.
testsuite/
* gcc.target/aarch64/pr86901.c: Add new test.
Modi
[-- Attachment #2: pr86901.patch --]
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 4f5898185f5..46c1b870aaf 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -786,7 +786,7 @@
(define_insn "*tb<optab><mode>1"
[(set (pc) (if_then_else
- (EQL (zero_extract:DI (match_operand:GPI 0 "register_operand" "r")
+ (EQL (zero_extract:GPI (match_operand:GPI 0 "register_operand" "r")
(const_int 1)
(match_operand 1
"aarch64_simd_shift_imm_<mode>" "n"))
@@ -5514,21 +5514,21 @@
;; Bitfields
;; -------------------------------------------------------------------
-(define_expand "<optab>"
- [(set (match_operand:DI 0 "register_operand")
- (ANY_EXTRACT:DI (match_operand:DI 1 "register_operand")
+;; Defines the extzv<mode>, extv<mode> patterns
+(define_expand "<optab><mode>"
+ [(set (match_operand:GPI 0 "register_operand" "=r")
+ (ANY_EXTRACT:GPI (match_operand:GPI 1 "register_operand")
(match_operand 2
- "aarch64_simd_shift_imm_offset_di")
- (match_operand 3 "aarch64_simd_shift_imm_di")))]
+ "aarch64_simd_shift_imm_offset_<mode>")
+ (match_operand 3 "aarch64_simd_shift_imm_<mode>")))]
""
{
if (!IN_RANGE (INTVAL (operands[2]) + INTVAL (operands[3]),
- 1, GET_MODE_BITSIZE (DImode) - 1))
+ 1, GET_MODE_BITSIZE (<MODE>mode)))
FAIL;
}
)
-
(define_insn "*<optab><mode>"
[(set (match_operand:GPI 0 "register_operand" "=r")
(ANY_EXTRACT:GPI (match_operand:GPI 1 "register_operand" "r")
@@ -5537,7 +5537,7 @@
(match_operand 3
"aarch64_simd_shift_imm_<mode>" "n")))]
"IN_RANGE (INTVAL (operands[2]) + INTVAL (operands[3]),
- 1, GET_MODE_BITSIZE (<MODE>mode) - 1)"
+ 1, GET_MODE_BITSIZE (<MODE>mode))"
"<su>bfx\\t%<w>0, %<w>1, %3, %2"
[(set_attr "type" "bfx")]
)
diff --git a/gcc/expmed.c b/gcc/expmed.c
index d7f8e9a5d767..926f4e8f1eaf 100644
--- a/gcc/expmed.c
+++ b/gcc/expmed.c
@@ -1546,7 +1546,7 @@ extract_bit_field_using_extv (const extraction_insn *extv, rtx op0,
if (REG_P (target)
&& TRULY_NOOP_TRUNCATION_MODES_P (GET_MODE (target), ext_mode))
{
- target = gen_lowpart (ext_mode, target);
+ target = gen_lowpart_if_possible (ext_mode, target);
if (partial_subreg_p (GET_MODE (spec_target), ext_mode))
spec_target_subreg = target;
}
diff --git a/gcc/testsuite/gcc.target/aarch64/pr86901.c b/gcc/testsuite/gcc.target/aarch64/pr86901.c
new file mode 100644
index 000000000000..5dd6fdf75f44
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/pr86901.c
@@ -0,0 +1,28 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+typedef unsigned int uint32_t;
+
+float g (float);
+
+static inline uint32_t
+top12 (float x)
+{
+ union
+ {
+ float f;
+ uint32_t i;
+ } u = {x};
+ return (u.i >> 20) & 0x7ff;
+}
+
+void
+f2 (float y, float *p)
+{
+ if (__builtin_expect (top12 (y) < top12 (1.0), 1))
+ *p = y * y;
+ else
+ g (y);
+}
+
+/* { dg-final { scan-assembler "fmov\tw" } } */
[-- Attachment #3: spec2006 --]
text data bss filename
1084394 20456 12616 benchspec/CPU2006/400.perlbench/exe/perlbench_base.base
1084858 20456 12616 benchspec/CPU2006/400.perlbench/exe/perlbench_base.test
58872 3800 4312 benchspec/CPU2006/401.bzip2/exe/bzip2_base.base
58872 3800 4312 benchspec/CPU2006/401.bzip2/exe/bzip2_base.test
3111150 7600 749024 benchspec/CPU2006/403.gcc/exe/gcc_base.base
3111510 7600 749024 benchspec/CPU2006/403.gcc/exe/gcc_base.test
12395 728 11904 benchspec/CPU2006/429.mcf/exe/mcf_base.base
12395 728 11904 benchspec/CPU2006/429.mcf/exe/mcf_base.test
110942 1513 37144 benchspec/CPU2006/433.milc/exe/milc_base.base
110942 1513 37144 benchspec/CPU2006/433.milc/exe/milc_base.test
221478 1024 496 benchspec/CPU2006/444.namd/exe/namd_base.base
222398 1024 496 benchspec/CPU2006/444.namd/exe/namd_base.test
1533494 1985324 2328704 benchspec/CPU2006/445.gobmk/exe/gobmk_base.base
1533526 1985324 2328704 benchspec/CPU2006/445.gobmk/exe/gobmk_base.test
2787753 3208 3512 benchspec/CPU2006/447.dealII/exe/dealII_base.base
2788536 3208 3512 benchspec/CPU2006/447.dealII/exe/dealII_base.test
382123 2408 1584 benchspec/CPU2006/450.soplex/exe/soplex_base.base
382051 2408 1584 benchspec/CPU2006/450.soplex/exe/soplex_base.test
824472 12944 161232 benchspec/CPU2006/453.povray/exe/povray_base.base
825784 12944 161232 benchspec/CPU2006/453.povray/exe/povray_base.test
271536 5680 81904 benchspec/CPU2006/456.hmmer/exe/hmmer_base.base
271552 5680 81904 benchspec/CPU2006/456.hmmer/exe/hmmer_base.test
130864 1984 2576192 benchspec/CPU2006/458.sjeng/exe/sjeng_base.base
130864 1984 2576192 benchspec/CPU2006/458.sjeng/exe/sjeng_base.test
38405 772 96 benchspec/CPU2006/462.libquantum/exe/libquantum_base.base
38397 772 96 benchspec/CPU2006/462.libquantum/exe/libquantum_base.test
485915 11108 370856 benchspec/CPU2006/464.h264ref/exe/h264ref_base.base
485883 11108 370856 benchspec/CPU2006/464.h264ref/exe/h264ref_base.test
11440 712 24 benchspec/CPU2006/470.lbm/exe/lbm_base.base
11440 712 24 benchspec/CPU2006/470.lbm/exe/lbm_base.test
621357 9560 14440 benchspec/CPU2006/471.omnetpp/exe/omnetpp_base.base
621405 9560 14440 benchspec/CPU2006/471.omnetpp/exe/omnetpp_base.test
37927 908 5144 benchspec/CPU2006/473.astar/exe/astar_base.base
37927 908 5144 benchspec/CPU2006/473.astar/exe/astar_base.test
171269 3500 32880 benchspec/CPU2006/482.sphinx3/exe/sphinx_livepretend_base.base
171269 3500 32880 benchspec/CPU2006/482.sphinx3/exe/sphinx_livepretend_base.test
3950983 140104 11360 benchspec/CPU2006/483.xalancbmk/exe/Xalan_base.base
3952407 140104 11360 benchspec/CPU2006/483.xalancbmk/exe/Xalan_base.test
2120 600 8 benchspec/CPU2006/998.specrand/exe/specrand_base.base
2120 600 8 benchspec/CPU2006/998.specrand/exe/specrand_base.test
2120 600 8 benchspec/CPU2006/999.specrand/exe/specrand_base.base
2120 600 8 benchspec/CPU2006/999.specrand/exe/specrand_base.test
[-- Attachment #4: spec2017 --]
[-- Type: application/octet-stream, Size: 3456 bytes --]
text data bss filename
2379985 8520 8896 benchspec/CPU/500.perlbench_r/exe/perlbench_r_base.Baseline-64
2380673 8520 8896 benchspec/CPU/500.perlbench_r/exe/perlbench_r_base.test-64
9992763 33832 1146480 benchspec/CPU/502.gcc_r/exe/cpugcc_r_base.Baseline-64
9993491 33832 1146480 benchspec/CPU/502.gcc_r/exe/cpugcc_r_base.test-64
29671 744 720 benchspec/CPU/505.mcf_r/exe/mcf_r_base.Baseline-64
29671 744 720 benchspec/CPU/505.mcf_r/exe/mcf_r_base.test-64
729385 984 960 benchspec/CPU/508.namd_r/exe/namd_r_base.Baseline-64
730617 984 960 benchspec/CPU/508.namd_r/exe/namd_r_base.test-64
8791120 8016 16128 benchspec/CPU/510.parest_r/exe/parest_r_base.Baseline-64
8793944 8016 16128 benchspec/CPU/510.parest_r/exe/parest_r_base.test-64
22536 816 24 benchspec/CPU/511.povray_r/exe/imagevalidate_511_base.Baseline-64
22528 816 24 benchspec/CPU/511.povray_r/exe/imagevalidate_511_base.test-64
992982 32264 188784 benchspec/CPU/511.povray_r/exe/povray_r_base.Baseline-64
994654 32264 188784 benchspec/CPU/511.povray_r/exe/povray_r_base.test-64
14568 712 24 benchspec/CPU/519.lbm_r/exe/lbm_r_base.Baseline-64
14568 712 24 benchspec/CPU/519.lbm_r/exe/lbm_r_base.test-64
2206053 8968 46472 benchspec/CPU/520.omnetpp_r/exe/omnetpp_r_base.Baseline-64
2206933 8968 46472 benchspec/CPU/520.omnetpp_r/exe/omnetpp_r_base.test-64
5654485 142560 14752 benchspec/CPU/523.xalancbmk_r/exe/cpuxalan_r_base.Baseline-64
5656441 142560 14752 benchspec/CPU/523.xalancbmk_r/exe/cpuxalan_r_base.test-64
22840 792 24 benchspec/CPU/525.x264_r/exe/imagevalidate_525_base.Baseline-64
22832 792 24 benchspec/CPU/525.x264_r/exe/imagevalidate_525_base.test-64
580687 3312 13104 benchspec/CPU/525.x264_r/exe/ldecod_r_base.Baseline-64
580895 3312 13104 benchspec/CPU/525.x264_r/exe/ldecod_r_base.test-64
654788 6000 29440 benchspec/CPU/525.x264_r/exe/x264_r_base.Baseline-64
654908 6000 29440 benchspec/CPU/525.x264_r/exe/x264_r_base.test-64
10991284 4842724 417288 benchspec/CPU/526.blender_r/exe/blender_r_base.Baseline-64
10991652 4842724 417288 benchspec/CPU/526.blender_r/exe/blender_r_base.test-64
22536 816 24 benchspec/CPU/526.blender_r/exe/imagevalidate_526_base.Baseline-64
22528 816 24 benchspec/CPU/526.blender_r/exe/imagevalidate_526_base.test-64
90464 816 12138272 benchspec/CPU/531.deepsjeng_r/exe/deepsjeng_r_base.Baseline-64
90608 816 12138272 benchspec/CPU/531.deepsjeng_r/exe/deepsjeng_r_base.test-64
22507 784 24 benchspec/CPU/538.imagick_r/exe/imagevalidate_538_base.Baseline-64
22499 784 24 benchspec/CPU/538.imagick_r/exe/imagevalidate_538_base.test-64
2198219 10820 5520 benchspec/CPU/538.imagick_r/exe/imagick_r_base.Baseline-64
2197491 10820 5520 benchspec/CPU/538.imagick_r/exe/imagick_r_base.test-64
205038 8568 30016 benchspec/CPU/541.leela_r/exe/leela_r_base.Baseline-64
205046 8568 30016 benchspec/CPU/541.leela_r/exe/leela_r_base.test-64
205519 3560 381944 benchspec/CPU/544.nab_r/exe/nab_r_base.Baseline-64
205463 3560 381944 benchspec/CPU/544.nab_r/exe/nab_r_base.test-64
204894 908 17592 benchspec/CPU/557.xz_r/exe/xz_r_base.Baseline-64
204918 908 17592 benchspec/CPU/557.xz_r/exe/xz_r_base.test-64
4173 604 5008 benchspec/CPU/997.specrand_fr/exe/specrand_fr_base.Baseline-64
4173 604 5008 benchspec/CPU/997.specrand_fr/exe/specrand_fr_base.test-64
4173 604 5008 benchspec/CPU/999.specrand_ir/exe/specrand_ir_base.Baseline-64
4173 604 5008 benchspec/CPU/999.specrand_ir/exe/specrand_ir_base.test-64
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH][AARCH64] Fix for PR86901
2020-02-21 23:17 ` Modi Mo via gcc-patches
@ 2020-03-03 15:54 ` Wilco Dijkstra
2020-03-03 16:04 ` Wilco Dijkstra
1 sibling, 0 replies; 9+ messages in thread
From: Wilco Dijkstra @ 2020-03-03 15:54 UTC (permalink / raw)
To: Modi Mo, Richard Earnshaw, gcc-patches; +Cc: pinskia
Hi Modi,
> The zero extract now matching against other modes would generate a test + branch rather
> than the combined instruction which led to the code size regression. I've updated the patch
> so that tbnz etc. matches GPI and that brings code size down to <0.2% in spec2017 and <0.4% in spec2006.
That's looking better indeed. I notice there are still differences, e.g. tbz/tbnz counts are
significantly different in perlbench, with ~350 missed cases overall (mostly tbz reg, #7).
There are also more uses of uxtw, ubfiz, sbfiz - for example
@Wilco, I've been instructed on my side to set up an individual contributor's license for the time being. Can you send me the necessary documents to make that happen? Thanks!
* Re: [PATCH][AARCH64] Fix for PR86901
2020-02-21 23:17 ` Modi Mo via gcc-patches
2020-03-03 15:54 ` Wilco Dijkstra
@ 2020-03-03 16:04 ` Wilco Dijkstra
2020-04-30 23:15 ` Modi Mo
1 sibling, 1 reply; 9+ messages in thread
From: Wilco Dijkstra @ 2020-03-03 16:04 UTC (permalink / raw)
To: Modi Mo, Richard Earnshaw, gcc-patches; +Cc: pinskia
Hi Modi,
> The zero extract now matching against other modes would generate a test + branch rather
> than the combined instruction which led to the code size regression. I've updated the patch
> so that tbnz etc. matches GPI and that brings code size down to <0.2% in spec2017 and <0.4% in spec2006.
That's looking better indeed. I notice there are still differences, e.g. tbz/tbnz counts are
significantly different in perlbench, with ~350 missed cases overall (mostly tbz reg, #7).
There are also more uses of uxtw, ubfiz, sbfiz - for example I see cases like this in namd:
42c7dc: 13007400 sbfx w0, w0, #0, #30
42c7e0: 937c7c00 sbfiz x0, x0, #4, #32
So it would be a good idea to check any benchmarks where there is still a non-trivial
codesize difference. You can get a quick idea what is happening by grepping for
instructions like this:
grep -c sbfiz out1.txt out2.txt
out1.txt:872
out2.txt:934
grep -c tbnz out1.txt out2.txt
out1.txt:5189
out2.txt:4989
> Can you send me the necessary documents to make that happen? Thanks!
That's something you need to sort out with the FSF. There is a mailing list for this:
assign@gnu.org.
Cheers,
Wilco
* RE: [PATCH][AARCH64] Fix for PR86901
2020-03-03 16:04 ` Wilco Dijkstra
@ 2020-04-30 23:15 ` Modi Mo
0 siblings, 0 replies; 9+ messages in thread
From: Modi Mo @ 2020-04-30 23:15 UTC (permalink / raw)
To: Wilco Dijkstra, Richard Earnshaw, gcc-patches
[-- Attachment #1: Type: text/plain, Size: 4077 bytes --]
Hey all, apologies for the long delay. I haven't had time until recently to look into this further.
>>> The zero extract now matching against other modes would generate a
>>> test + branch rather than the combined instruction which led to the
>>> code size regression. I've updated the patch so that tbnz etc. matches GPI and
>>that brings code size down to <0.2% in spec2017 and <0.4% in spec2006.
>>
>>That's looking better indeed. I notice there are still differences, eg. tbz/tbnz
>>counts are significantly different in perlbench, with ~350 missed cases overall
>>(mostly tbz reg, #7).
>>
>>There are also more uses of uxtw, ubfiz, sbfiz - for example I see cases like this
>>in namd:
>>
>> 42c7dc: 13007400 sbfx w0, w0, #0, #30
>> 42c7e0: 937c7c00 sbfiz x0, x0, #4, #32
>>
>>So it would be a good idea to check any benchmarks where there is still a non-
>>trivial codesize difference. You can get a quick idea what is happening by
>>grepping for instructions like this:
>>
>>grep -c sbfiz out1.txt out2.txt
>>out1.txt:872
>>out2.txt:934
>>
>>grep -c tbnz out1.txt out2.txt
>>out1.txt:5189
>>out2.txt:4989
That's really good insight, Wilco! I took a look at the tbnz/tbz case in perlbench, and we lose matching against this pattern because allowing SI mode on extv/extzv causes subst in combine.c to generate:
(lshiftrt:SI (reg:SI 107 [ _16 ])
    (const_int 7 [0x7]))
(nil)
Instead of:
(and:DI (lshiftrt:DI (subreg:DI (reg:SI 107 [ _16 ]) 0)
        (const_int 7 [0x7]))
    (const_int 1 [0x1]))
The latter case is picked up in make_compound_operation_int and transformed into a zero_extract, while the new case is left alone. An lshiftrt generally can't be reduced to a bit test, but in this case it can, because we have nonzero_bits information on the operand. Given that, looking around try_combine, the best place to detect this pattern seems to be the second-chance code after the first failure of recog_for_combine, which is what I've done in this patch. I think this is the right place for the fix, since changing subst/make_compound_operation_int leads to significantly more diffs.
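For intuition, the equivalence the fix relies on can be sketched in plain C (my own illustration, not code from the patch; the function names are hypothetical): when nonzero_bits shows that bit `shift` is the only bit of the operand that can be set, the bare right shift already yields 0 or 1, so branching on it is the same as branching on a one-bit zero_extract, which is the form that matches tbz/tbnz on AArch64.

```c
#include <stdint.h>

/* Hypothetical sketch: if nonzero_bits(x) >> shift == 1, the two
   conditions below are equivalent for every reachable value of x.  */

uint32_t lshiftrt_cond (uint32_t x, unsigned shift)
{
  return x >> shift;            /* the bare LSHIFTRT condition combine sees */
}

uint32_t zero_extract_1bit (uint32_t x, unsigned pos)
{
  return (x >> pos) & 1;        /* one-bit zero_extract at position pos */
}
```

The intent of the patch is that, after recog_for_combine first fails on the IF_THEN_ELSE form, combine retries with the zero_extract form whenever nonzero_bits proves the equivalence.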
After this change the total number of tbnz/tbz instructions lines up nearly identically with the baseline, which is good, and overall size is within 0.1% on SPEC 2017 and SPEC 2006. However, looking further at ubfiz, there's a pretty large increase in certain benchmarks. I looked into SPEC 2017/blender, where we fail to combine this pattern:
Trying 653 -> 654:
  653: r512:SI=r94:SI 0>>0x8
      REG_DEAD r94:SI
  654: r513:DI=zero_extend(r512:SI)
      REG_DEAD r512:SI
Failed to match this instruction:
(set (reg:DI 513)
    (zero_extract:DI (reg:SI 94 [ bswapdst_4 ])
        (const_int 8 [0x8])
        (const_int 8 [0x8])))
Where previously we combined it like this:
Trying 653 -> 654:
  653: r512:SI=r94:SI 0>>0x8
      REG_DEAD r94:SI
  654: r513:DI=zero_extend(r512:SI)
      REG_DEAD r512:SI
Successfully matched this instruction:
(set (reg:DI 513)
    (zero_extract:DI (subreg:DI (reg:SI 94 [ bswapdst_4 ]) 0) // subreg used
        (const_int 8 [0x8])
        (const_int 8 [0x8])))
Here's where I'm at an impasse. Now that extzvsi is valid, the code that generates the modes in get_best_reg_extraction_insn looks at the inner mode of SI and generates a non-subreg use. However, the MD pattern expects all modes to be DI or SI, not a mix. A fix could canonicalize these extracts to the same mode, but I'm unsure whether a mode-mismatched extract RTX is valid in general, which would make this a fairly large change.
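To make the failed 653 -> 654 combination above concrete, here is a hypothetical C reconstruction of the shape involved (not actual blender source; the function name is mine): an 8-bit field at position 8 of a 32-bit value, widened to 64 bits. With the old subreg:DI form this matched a single ubfx; with the mixed SI/DI zero_extract it no longer does.

```c
#include <stdint.h>

/* Hypothetical C equivalent of the RTL above: zero_extract of size 8 at
   position 8 from a 32-bit source, zero-extended into a 64-bit result.
   Ideally this compiles to one ubfx on AArch64.  */

uint64_t extract_byte1 (uint32_t x)
{
  return (x >> 8) & 0xff;      /* shift 653 folded into zero_extend 654 */
}
```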
The latest patch with the tbnz/tbz fix is attached, along with the SPEC size numbers and the SPEC 2017 instruction counts for reference.
>>> Can you send me the necessary documents to make that happen? Thanks!
>>
>>That's something you need to sort out with the fsf. There is a mailing list for this:
>>assign@gnu.org.
I haven't had any response from my previous mail there. Should I add one of you to the CC or mail someone specifically to get traction?
Best,
Modi
[-- Attachment #2: pr86901.patch --]
[-- Type: application/octet-stream, Size: 4504 bytes --]
diff --git a/gcc/combine.c b/gcc/combine.c
index cff76cd3303..fdb03f5d305 100644
--- a/gcc/combine.c
+++ b/gcc/combine.c
@@ -4166,6 +4166,40 @@ try_combine (rtx_insn *i3, rtx_insn *i2, rtx_insn *i1, rtx_insn *i0,
}
}
+  /* Check for a case where we generate an IF_THEN_ELSE that depends on a
+     logical right shift.  If we have nonzero_bits information that indicates
+     the shift will ultimately lead down to a single bit, generate the
+     equivalent zero_extract pattern to see if the target has an appropriate
+     instruction that can handle this.  */
+
+  else if (insn_code_number < 0 && asm_noperands (newpat) < 0
+           && GET_CODE (newpat) == SET
+           && GET_CODE (XEXP (newpat, 1)) == IF_THEN_ELSE)
+    {
+      rtx cond, true_rtx, false_rtx, new_rtx;
+
+      cond = if_then_else_cond (XEXP (newpat, 1), &true_rtx,
+                                &false_rtx);
+      if (GET_CODE (cond) == LSHIFTRT)
+        {
+          unsigned HOST_WIDE_INT nonzero
+            = nonzero_bits (XEXP (cond, 0), GET_MODE (XEXP (cond, 0)));
+          unsigned HOST_WIDE_INT shift_amount = UINTVAL (XEXP (cond, 1));
+
+          if (nonzero >> shift_amount == 1)
+            {
+              new_rtx = make_extraction (GET_MODE (cond), XEXP (cond, 0),
+                                         shift_amount, 0, 1,
+                                         1, 0, 1);
+
+              newpat = simplify_replace_rtx (newpat, cond, new_rtx);
+
+              insn_code_number = recog_for_combine (&newpat, i3, &new_i3_notes);
+            }
+
+        }
+    }
+
/* If it still isn't recognized, fail and change things back the way they
were. */
if ((insn_code_number < 0
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index c7c4d1dd519..552dc712304 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -897,7 +897,8 @@
 (define_insn "*tb<optab><mode>1"
   [(set (pc) (if_then_else
-              (EQL (zero_extract:DI (match_operand:GPI 0 "register_operand" "r")
+              (EQL (zero_extract:GPI
+                     (match_operand:GPI 0 "register_operand" "r")
                     (const_int 1)
                     (match_operand 1
                       "aarch64_simd_shift_imm_<mode>" "n"))
@@ -5530,21 +5531,21 @@
;; Bitfields
;; -------------------------------------------------------------------
-(define_expand "<optab>"
-  [(set (match_operand:DI 0 "register_operand")
-        (ANY_EXTRACT:DI (match_operand:DI 1 "register_operand")
+;; Defines extzv<mode>, extv<mode> patterns
+(define_expand "<optab><mode>"
+  [(set (match_operand:GPI 0 "register_operand" "=r")
+        (ANY_EXTRACT:GPI (match_operand:GPI 1 "register_operand")
                         (match_operand 2
-                          "aarch64_simd_shift_imm_offset_di")
-                        (match_operand 3 "aarch64_simd_shift_imm_di")))]
+                          "aarch64_simd_shift_imm_offset_<mode>")
+                        (match_operand 3 "aarch64_simd_shift_imm_<mode>")))]
   ""
   {
     if (!IN_RANGE (INTVAL (operands[2]) + INTVAL (operands[3]),
-                   1, GET_MODE_BITSIZE (DImode) - 1))
+                   1, GET_MODE_BITSIZE (<MODE>mode)))
       FAIL;
   }
 )
-
(define_insn "*<optab><mode>"
[(set (match_operand:GPI 0 "register_operand" "=r")
(ANY_EXTRACT:GPI (match_operand:GPI 1 "register_operand" "r")
@@ -5553,7 +5554,7 @@
(match_operand 3
"aarch64_simd_shift_imm_<mode>" "n")))]
"IN_RANGE (INTVAL (operands[2]) + INTVAL (operands[3]),
- 1, GET_MODE_BITSIZE (<MODE>mode) - 1)"
+ 1, GET_MODE_BITSIZE (<MODE>mode))"
"<su>bfx\\t%<w>0, %<w>1, %3, %2"
[(set_attr "type" "bfx")]
)
diff --git a/gcc/expmed.c b/gcc/expmed.c
index e7c03fbf92c..f39ab04ced2 100644
--- a/gcc/expmed.c
+++ b/gcc/expmed.c
@@ -1564,7 +1564,7 @@ extract_bit_field_using_extv (const extraction_insn *extv, rtx op0,
   if (REG_P (target)
       && TRULY_NOOP_TRUNCATION_MODES_P (GET_MODE (target), ext_mode))
     {
-      target = gen_lowpart (ext_mode, target);
+      target = gen_lowpart_if_possible (ext_mode, target);
       if (partial_subreg_p (GET_MODE (spec_target), ext_mode))
         spec_target_subreg = target;
     }
diff --git a/gcc/testsuite/gcc.target/aarch64/pr86901.c b/gcc/testsuite/gcc.target/aarch64/pr86901.c
new file mode 100644
index 00000000000..5dd6fdf75f4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/pr86901.c
@@ -0,0 +1,28 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+typedef unsigned int uint32_t;
+
+float g (float);
+
+static inline uint32_t
+top12 (float x)
+{
+  union
+  {
+    float f;
+    uint32_t i;
+  } u = {x};
+  return (u.i >> 20) & 0x7ff;
+}
+
+void
+f2 (float y, float *p)
+{
+  if (__builtin_expect (top12 (y) < top12 (1.0), 1))
+    *p = y * y;
+  else
+    g (y);
+}
+
+/* { dg-final { scan-assembler "fmov\tw" } } */
[-- Attachment #3: spec2006_v2.txt --]
[-- Type: text/plain, Size: 3965 bytes --]
base diff % increase
text data bss total filename text data bss total filename
1038264 243802 12472 1294538 base/benchspec/CPU2006/400.perlbench/exe/perlbench_base.gcc9-base 1038344 243714 12472 1294530 diff/benchspec/CPU2006/400.perlbench/exe/perlbench_base.gcc9-diff 0.01%
72024 16030 4328 92382 base/benchspec/CPU2006/401.bzip2/exe/bzip2_base.gcc9-base 72016 16030 4328 92374 diff/benchspec/CPU2006/401.bzip2/exe/bzip2_base.gcc9-diff -0.01%
2651976 816398 749792 4218166 base/benchspec/CPU2006/403.gcc/exe/gcc_base.gcc9-base 2652096 816558 749792 4218446 diff/benchspec/CPU2006/403.gcc/exe/gcc_base.gcc9-diff 0.00%
8232 4803 11912 24947 base/benchspec/CPU2006/429.mcf/exe/mcf_base.gcc9-base 8232 4803 11912 24947 diff/benchspec/CPU2006/429.mcf/exe/mcf_base.gcc9-diff 0.00%
105912 31228 37176 174316 base/benchspec/CPU2006/433.milc/exe/milc_base.gcc9-base 105904 31228 37176 174308 diff/benchspec/CPU2006/433.milc/exe/milc_base.gcc9-diff -0.01%
205752 25512 496 231760 base/benchspec/CPU2006/444.namd/exe/namd_base.gcc9-base 204792 25288 496 230576 diff/benchspec/CPU2006/444.namd/exe/namd_base.gcc9-diff -0.47%
757960 2917634 2328968 6004562 base/benchspec/CPU2006/445.gobmk/exe/gobmk_base.gcc9-base 757952 2917634 2328968 6004554 diff/benchspec/CPU2006/445.gobmk/exe/gobmk_base.gcc9-diff 0.00%
2180808 880678 3528 3065014 base/benchspec/CPU2006/447.dealII/exe/dealII_base.gcc9-base 2181248 880670 3528 3065446 diff/benchspec/CPU2006/447.dealII/exe/dealII_base.gcc9-diff 0.02%
345832 79234 1584 426650 base/benchspec/CPU2006/450.soplex/exe/soplex_base.gcc9-base 345936 79234 1584 426754 diff/benchspec/CPU2006/450.soplex/exe/soplex_base.gcc9-diff 0.03%
779768 225837 161496 1167101 base/benchspec/CPU2006/453.povray/exe/povray_base.gcc9-base 779904 225965 161496 1167365 diff/benchspec/CPU2006/453.povray/exe/povray_base.gcc9-diff 0.02%
249528 69146 81944 400618 base/benchspec/CPU2006/456.hmmer/exe/hmmer_base.gcc9-base 249536 69146 81944 400626 diff/benchspec/CPU2006/456.hmmer/exe/hmmer_base.gcc9-diff 0.00%
125080 38096 2576288 2739464 base/benchspec/CPU2006/458.sjeng/exe/sjeng_base.gcc9-base 125080 38096 2576288 2739464 diff/benchspec/CPU2006/458.sjeng/exe/sjeng_base.gcc9-diff 0.00%
30452 11213 96 41761 base/benchspec/CPU2006/462.libquantum/exe/libquantum_base.gcc9-base 30420 11221 96 41737 diff/benchspec/CPU2006/462.libquantum/exe/libquantum_base.gcc9-diff -0.11%
567320 127533 371064 1065917 base/benchspec/CPU2006/464.h264ref/exe/h264ref_base.gcc9-base 567304 127533 371064 1065901 diff/benchspec/CPU2006/464.h264ref/exe/h264ref_base.gcc9-diff 0.00%
8376 4584 24 12984 base/benchspec/CPU2006/470.lbm/exe/lbm_base.gcc9-base 8368 4584 24 12976 diff/benchspec/CPU2006/470.lbm/exe/lbm_base.gcc9-diff -0.10%
484536 216255 14528 715319 base/benchspec/CPU2006/471.omnetpp/exe/omnetpp_base.gcc9-base 484528 216255 14528 715311 diff/benchspec/CPU2006/471.omnetpp/exe/omnetpp_base.gcc9-diff 0.00%
33256 10075 5152 48483 base/benchspec/CPU2006/473.astar/exe/astar_base.gcc9-base 33248 10075 5152 48475 diff/benchspec/CPU2006/473.astar/exe/astar_base.gcc9-diff -0.02%
155608 53889 32888 242385 base/benchspec/CPU2006/482.sphinx3/exe/sphinx_livepretend_base.gcc9-base 155600 53889 32888 242377 diff/benchspec/CPU2006/482.sphinx3/exe/sphinx_livepretend_base.gcc9-diff -0.01%
2930312 1698488 11544 4640344 base/benchspec/CPU2006/483.xalancbmk/exe/Xalan_base.gcc9-base 2930304 1698488 11544 4640336 diff/benchspec/CPU2006/483.xalancbmk/exe/Xalan_base.gcc9-diff 0.00%
1016 1728 8 2752 base/benchspec/CPU2006/998.specrand/exe/specrand_base.gcc9-base 1016 1728 8 2752 diff/benchspec/CPU2006/998.specrand/exe/specrand_base.gcc9-diff 0.00%
1016 1728 8 2752 base/benchspec/CPU2006/999.specrand/exe/specrand_base.gcc9-base 1016 1728 8 2752 diff/benchspec/CPU2006/999.specrand/exe/specrand_base.gcc9-diff 0.00%
12733028 12732844 0.00%
[-- Attachment #4: spec2017_insn_count.txt --]
[-- Type: text/plain, Size: 2447 bytes --]
tbnz tbz sbfiz ubfiz uxtw sxtw
base diff diff-base base diff diff-base base diff diff-base base diff diff-base base diff diff-base base diff diff-base
diff/benchspec/CPU/500.perlbench_r/exe/perlbench_r_base.diff-64 4394 4396 2 4442 4444 2 215 215 0 322 323 1 328 328 0 3217 3234 17
diff/benchspec/CPU/502.gcc_r/exe/cpugcc_r_base.diff-64 7755 7833 78 6645 6728 83 1235 1236 1 2801 3018 217 10271 10271 0 9770 9794 24
diff/benchspec/CPU/505.mcf_r/exe/mcf_r_base.diff-64 19 19 0 12 12 0 1 1 0 0 0 0 0 0 0 25 25 0
diff/benchspec/CPU/508.namd_r/exe/namd_r_base.diff-64 148 148 0 65 65 0 724 786 62 3302 3475 173 735 735 0 5057 5121 64
diff/benchspec/CPU/510.parest_r/exe/parest_r_base.diff-64 1295 1295 0 1447 1448 1 1694 1694 0 5775 5899 124 7367 7368 1 11068 11072 4
diff/benchspec/CPU/511.povray_r/exe/imagevalidate_511_base.diff-64 5 5 0 8 8 0 3 3 0 25 27 2 2 2 0 13 13 0
diff/benchspec/CPU/511.povray_r/exe/povray_r_base.diff-64 312 312 0 438 438 0 245 245 0 140 154 14 165 165 0 982 987 5
diff/benchspec/CPU/519.lbm_r/exe/lbm_r_base.diff-64 3 3 0 3 3 0 5 5 0 0 0 0 0 0 0 8 8 0
diff/benchspec/CPU/520.omnetpp_r/exe/omnetpp_r_base.diff-64 690 690 0 361 361 0 336 336 0 26 26 0 179 179 0 1216 1216 0
diff/benchspec/CPU/523.xalancbmk_r/exe/cpuxalan_r_base.diff-64 446 451 5 905 905 0 138 138 0 1108 1249 141 3946 3946 0 893 893 0
diff/benchspec/CPU/526.blender_r/exe/blender_r_base.diff-64 6750 6761 11 7849 7855 6 2125 2125 0 813 952 139 1341 1340 -1 8339 8358 19
diff/benchspec/CPU/526.blender_r/exe/imagevalidate_526_base.diff-64 5 5 0 8 8 0 3 3 0 25 27 2 2 2 0 13 13 0
diff/benchspec/CPU/531.deepsjeng_r/exe/deepsjeng_r_base.diff-64 43 43 0 13 13 0 18 18 0 19 23 4 9 9 0 414 424 10
diff/benchspec/CPU/538.imagick_r/exe/imagevalidate_538_base.diff-64 5 5 0 8 8 0 3 3 0 25 27 2 2 2 0 13 13 0
diff/benchspec/CPU/538.imagick_r/exe/imagick_r_base.diff-64 340 340 0 497 497 0 2 2 0 682 682 0 75 75 0 157 158 1
diff/benchspec/CPU/541.leela_r/exe/leela_r_base.diff-64 24 24 0 26 26 0 13 13 0 6 10 4 44 44 0 595 595 0
diff/benchspec/CPU/544.nab_r/exe/nab_r_base.diff-64 56 56 0 70 70 0 92 92 0 36 47 11 10 10 0 454 454 0
diff/benchspec/CPU/557.xz_r/exe/xz_r_base.diff-64 22 22 0 9 9 0 1 1 0 115 118 3 251 251 0 28 28 0
diff/benchspec/CPU/997.specrand_fr/exe/specrand_fr_base.diff-64 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 4 4 0
diff/benchspec/CPU/999.specrand_ir/exe/specrand_ir_base.diff-64 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 4 4 0
[-- Attachment #5: spec2017_v2.txt --]
[-- Type: text/plain, Size: 3716 bytes --]
base diff % increase
text data bss total filename text data bss total filename
1775240 561457 8760 2345457 base/benchspec/CPU/500.perlbench_r/exe/perlbench_r_base.base-64 1775432 561305 8760 2345497 diff/benchspec/CPU/500.perlbench_r/exe/perlbench_r_base.diff-64 0.01%
7755224 2191963 1146680 11093867 base/benchspec/CPU/502.gcc_r/exe/cpugcc_r_base.base-64 7754488 2192051 1146680 11093219 diff/benchspec/CPU/502.gcc_r/exe/cpugcc_r_base.diff-64 -0.01%
23064 6391 728 30183 base/benchspec/CPU/505.mcf_r/exe/mcf_r_base.base-64 23064 6391 728 30183 diff/benchspec/CPU/505.mcf_r/exe/mcf_r_base.diff-64 0.00%
677176 30046 960 708182 base/benchspec/CPU/508.namd_r/exe/namd_r_base.base-64 677816 30046 960 708822 diff/benchspec/CPU/508.namd_r/exe/namd_r_base.diff-64 0.09%
7082456 1704125 16128 8802709 base/benchspec/CPU/510.parest_r/exe/parest_r_base.base-64 7084584 1704145 16128 8804857 diff/benchspec/CPU/510.parest_r/exe/parest_r_base.diff-64 0.03%
14072 9141 24 23237 base/benchspec/CPU/511.povray_r/exe/imagevalidate_511_base.base-64 14072 9141 24 23237 diff/benchspec/CPU/511.povray_r/exe/imagevalidate_511_base.diff-64 0.00%
789000 246142 188784 1223926 base/benchspec/CPU/511.povray_r/exe/povray_r_base.base-64 789032 246134 188784 1223950 diff/benchspec/CPU/511.povray_r/exe/povray_r_base.diff-64 0.00%
10648 4744 24 15416 base/benchspec/CPU/519.lbm_r/exe/lbm_r_base.base-64 10640 4744 24 15408 diff/benchspec/CPU/519.lbm_r/exe/lbm_r_base.diff-64 -0.08%
1505352 744909 46536 2296797 base/benchspec/CPU/520.omnetpp_r/exe/omnetpp_r_base.base-64 1505344 744925 46536 2296805 diff/benchspec/CPU/520.omnetpp_r/exe/omnetpp_r_base.diff-64 0.00%
3992744 1933641 14752 5941137 base/benchspec/CPU/523.xalancbmk_r/exe/cpuxalan_r_base.base-64 3992736 1933641 14752 5941129 diff/benchspec/CPU/523.xalancbmk_r/exe/cpuxalan_r_base.diff-64 0.00%
7994116 7841208 417344 16252668 base/benchspec/CPU/526.blender_r/exe/blender_r_base.base-64 7994276 7841400 417344 16253020 diff/benchspec/CPU/526.blender_r/exe/blender_r_base.diff-64 0.00%
14072 9141 24 23237 base/benchspec/CPU/526.blender_r/exe/imagevalidate_526_base.base-64 14072 9141 24 23237 diff/benchspec/CPU/526.blender_r/exe/imagevalidate_526_base.diff-64 0.00%
76264 16488 12138272 12231024 base/benchspec/CPU/531.deepsjeng_r/exe/deepsjeng_r_base.base-64 76296 16488 12138272 12231056 diff/benchspec/CPU/531.deepsjeng_r/exe/deepsjeng_r_base.diff-64 0.04%
14072 9080 24 23176 base/benchspec/CPU/538.imagick_r/exe/imagevalidate_538_base.base-64 14072 9080 24 23176 diff/benchspec/CPU/538.imagick_r/exe/imagevalidate_538_base.diff-64 0.00%
1772984 408975 5520 2187479 base/benchspec/CPU/538.imagick_r/exe/imagick_r_base.base-64 1772152 409119 5520 2186791 diff/benchspec/CPU/538.imagick_r/exe/imagick_r_base.diff-64 -0.05%
174616 46829 30032 251477 base/benchspec/CPU/541.leela_r/exe/leela_r_base.base-64 174624 46829 30032 251485 diff/benchspec/CPU/541.leela_r/exe/leela_r_base.diff-64 0.00%
166616 39807 381952 588375 base/benchspec/CPU/544.nab_r/exe/nab_r_base.base-64 166616 39807 381952 588375 diff/benchspec/CPU/544.nab_r/exe/nab_r_base.diff-64 0.00%
131672 69274 17576 218522 base/benchspec/CPU/557.xz_r/exe/xz_r_base.base-64 131680 69274 17576 218530 diff/benchspec/CPU/557.xz_r/exe/xz_r_base.diff-64 0.01%
2440 2361 5008 9809 base/benchspec/CPU/997.specrand_fr/exe/specrand_fr_base.base-64 2440 2361 5008 9809 diff/benchspec/CPU/997.specrand_fr/exe/specrand_fr_base.diff-64 0.00%
2440 2361 5008 9809 base/benchspec/CPU/999.specrand_ir/exe/specrand_ir_base.base-64 2440 2361 5008 9809 diff/benchspec/CPU/999.specrand_ir/exe/specrand_ir_base.diff-64 0.00%
33974268 33975876 0.00%
end of thread, other threads:[~2020-04-30 23:15 UTC | newest]
Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-02-05 0:01 [PATCH][AARCH64] Fix for PR86901 Modi Mo via gcc-patches
2020-02-05 15:00 ` Wilco Dijkstra
2020-02-07 2:12 ` Modi Mo via gcc-patches
2020-02-07 9:52 ` Richard Earnshaw (lists)
2020-02-07 14:06 ` Wilco Dijkstra
2020-02-21 23:17 ` Modi Mo via gcc-patches
2020-03-03 15:54 ` Wilco Dijkstra
2020-03-03 16:04 ` Wilco Dijkstra
2020-04-30 23:15 ` Modi Mo