Re: [PATCH][AARCH64] Fix for PR86901 - Richard Earnshaw (lists)

public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed

From: "Richard Earnshaw (lists)" <Richard.Earnshaw@arm.com>
To: Modi Mo <modimo@microsoft.com>,
	Wilco Dijkstra <Wilco.Dijkstra@arm.com>,
	"gcc-patches@gcc.gnu.org" <gcc-patches@gcc.gnu.org>
Cc: "pinskia@gmail.com" <pinskia@gmail.com>
Subject: Re: [PATCH][AARCH64] Fix for PR86901
Date: Fri, 07 Feb 2020 09:52:00 -0000	[thread overview]
Message-ID: <3e830159-b19e-6373-bd4a-41368474978d@arm.com> (raw)
In-Reply-To: <BY5PR21MB1508637166D2F202909421C9CC1C0@BY5PR21MB1508.namprd21.prod.outlook.com>

On 07/02/2020 02:12, Modi Mo via gcc-patches wrote:
>> I did a quick bootstrap, this shows several failures like:
>>
>> gcc/builtins.c:9427:1: error: unrecognizable insn:
>>   9427 | }
>>        | ^
>> (insn 212 211 213 24 (set (reg:SI 207)
>>          (zero_extract:SI (reg:SI 206)
>>              (const_int 26 [0x1a])
>>              (const_int 6 [0x6]))) "/gcc/builtins.c":9413:44 -1
>>       (nil))
>>
>> The issue here is that 26+6 = 32 and that's not a valid ubfx encoding.
>> Currently cases like this are split into a right shift in aarch64.md around line
>> 5569:
> 
> Appreciate you taking a look and the validation. I've gotten access to an aarch64 server and the bootstrap demonstrated the issue you saw. This was caused by my re-definition of the pattern to:
> +  if (width == 0 || (pos + width) > GET_MODE_BITSIZE (<MODE>mode))
> +    FAIL;
> 
> Which meant for SImode only a sum of >32 bit actually triggers the fail condition for the define_expand whereas the existing define_insn fails on >=32 bit. I looked into the architecture reference manual and the bits are available for ubfx/sbfx for that type of encoding and the documentation says you can use [lsb, 32-lsb] for SImode as a legal pair. Checking with the GNU assembler it does accept a sum of 32 but transforms it into a LSR:
> 
> Assembly file:
> ubfx	w0, w0, 24, 8
> 
> Disassembly of section .text:
> 
> 0000000000000000 <f>:
>     0:   53187c00        lsr     w0, w0, #24

This is how LSR is implemented in AArch64, as an extract.  So the 
disassembler is showing the canonical representation.

However, inside the compiler we really want to represent this as a 
shift.  Why?  Because there are cases where we can then merge a shift 
into other operations when we can't merge the more general extract 
operation.  For example, we can merge an LSR into a subsequent 'AND' 
operation, but can't merge a more general extract into an AND. 
Essentially, we want the compiler to describe this in the canonical 
shift form rather than the extract form.

Ideally this would be handled inside the mid-end expansion of an 
extract, but in the absence of that I think this is best done inside the 
extv expansion so that we never end up with a real extract in that case.


> 
> Similarly with the 64 bit version it'll become a 64 bit LSR. Certainly other assemblers could trip over, I've attached a new patch that allows this encoding and bootstrap + testing c/c++ testsuite looks good. I'll defer to you if it's better to explicitly do the transformation in GCC.
> 
>> ;; When the bit position and width add up to 32 we can use a W-reg LSR ;;
>> instruction taking advantage of the implicit zero-extension of the X-reg.
>> (define_split
>>    [(set (match_operand:DI 0 "register_operand")
>>          (zero_extract:DI (match_operand:DI 1 "register_operand")
>>                           (match_operand 2
>>                             "aarch64_simd_shift_imm_offset_di")
>>                           (match_operand 3
>>                             "aarch64_simd_shift_imm_di")))]
>>    "IN_RANGE (INTVAL (operands[2]) + INTVAL (operands[3]), 1,
>>               GET_MODE_BITSIZE (DImode) - 1)
>>     && (INTVAL (operands[2]) + INTVAL (operands[3]))
>>         == GET_MODE_BITSIZE (SImode)"
>>    [(set (match_dup 0)
>>          (zero_extend:DI (lshiftrt:SI (match_dup 4) (match_dup 3))))]
>>    {
>>      operands[4] = gen_lowpart (SImode, operands[1]);
>>    }
>>
>> However that only supports DImode, not SImode, so it needs to be changed
>> to be more general using GPI.
>>
>> Your new extv patterns should replace the magic patterns above it:
> 
> With the previous discovery that a sum of 32/64 will trigger LSR in the assembler I was curious what would happen if I remove this pattern. Turns out, you will end up with a UBFX x0, x0, 24, 8 compared to a LSR w0, w0, 24 in the test case associated with this change (gcc.target/aarch64/ubfx_lsr_1.c) which doesn't get transformed into an LSR by the assembler since it's in 64 bit mode. So this pattern still has value but I don't think it's necessary to extend it to DI since that'll automatically get turned into a LSR by the assembler as I previously mentioned.
> 
> 
>> ;; -------------------------------------------------------------------
>> ;; Bitfields
>> ;; -------------------------------------------------------------------
>>
>> (define_expand "<optab>"
>>
>> These are the current extv/extzv patterns, but without a mode. They are no
>> longer used when we start using the new ones.
>>
>> Note you can write <optab><mode> to combine the extzv and extz patterns.
>> But please add a comment mentioning the pattern names so they are easy to
>> find!
> 
> Good call here, made this change in the new patch. I did see the define_insn of these guys below it but somehow missed that they were expanded just above. I believe, from my understanding of GCC, that the matching pattern below the first line is what constrains <optab> into just extv/extsv from the long list of iterators it belongs to. Still, I see that there's constrained iterators elsewhere like:
> 
> ;; Optab prefix for sign/zero-extending operations
> (define_code_attr su_optab [(sign_extend "") (zero_extend "u")
> 
> I added a comment in this patch before the pattern. Thoughts on defining another constrained version to make it clearer (in addition or in lieu of the comment)?
> 
>> Besides a bootstrap it is always useful to compile a large body of code with
>> your change (say SPEC2006/2017) and check for differences in at least
>> codesize. If there is an increase in instruction count then there may be more
>> issues that need to be resolved.
> 
> Sounds good. I'll get those setup and running and will report back on findings. What's the preferred way to measure codesize? I'm assuming by default the code pages are aligned so smaller differences would need to trip over the boundary to actually show up.
>   
>> I find it easiest to develop on a many-core AArch64 server so you get much
>> faster builds, bootstraps and regression tests. Cross compilers are mostly
>> useful if you want to test big-endian or new architecture features which are
>> not yet supported in hardware. You don't normally need to test
>> Go/Fortran/ADA etc unless your patch does something that would directly
>> affect them.
> 
> Good to know. Sounds like our current testing with C/C++ testsuite is alright for most changes.
> 
>> Finally do you have a copyright assignment with the FSF?
> 
> Lawyers are working on that. Since I have SPEC codesize to go through the licensing may be ready before all analysis is done for the patch.
> 
> Modi
> 

R.

next prev parent reply	other threads:[~2020-02-07  9:52 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-02-05  0:01 Modi Mo via gcc-patches
2020-02-05 15:00 ` Wilco Dijkstra
2020-02-07  2:12   ` Modi Mo via gcc-patches
2020-02-07  9:52     ` Richard Earnshaw (lists) [this message]
2020-02-07 14:06       ` Wilco Dijkstra
2020-02-21 23:17         ` Modi Mo via gcc-patches
2020-03-03 15:54           ` Wilco Dijkstra
2020-03-03 16:04           ` Wilco Dijkstra
2020-04-30 23:15             ` Modi Mo

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=3e830159-b19e-6373-bd4a-41368474978d@arm.com \
    --to=richard.earnshaw@arm.com \
    --cc=Wilco.Dijkstra@arm.com \
    --cc=gcc-patches@gcc.gnu.org \
    --cc=modimo@microsoft.com \
    --cc=pinskia@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).