public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
From: "H.J. Lu" <hjl.tools@gmail.com>
To: James Greenhalgh <james.greenhalgh@arm.com>
Cc: Kyrill Tkachov <kyrylo.tkachov@foss.arm.com>,
	GCC Patches <gcc-patches@gcc.gnu.org>,
		Marcus Shawcroft <marcus.shawcroft@arm.com>,
	Richard Earnshaw <Richard.Earnshaw@arm.com>
Subject: Re: [PATCH][AArch64] PR target/69613: Return zero TARGET_SHIFT_TRUNCATION_MASK when SHIFT_COUNT_TRUNCATED is false
Date: Fri, 26 Feb 2016 21:59:00 -0000	[thread overview]
Message-ID: <CAMe9rOrxiTdLMNT4t_UyVtqwL+dqhdaeA_cCFCYFdvHEQzVcPQ@mail.gmail.com> (raw)
In-Reply-To: <CAMe9rOoVpo-zn1cpbZygT_SCGPdm2Uivwm-eLjWsobDMBsYU2w@mail.gmail.com>

[-- Attachment #1: Type: text/plain, Size: 5738 bytes --]

On Fri, Feb 26, 2016 at 1:44 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
> On Fri, Feb 26, 2016 at 7:50 AM, James Greenhalgh
> <james.greenhalgh@arm.com> wrote:
>> On Thu, Feb 25, 2016 at 09:25:45AM +0000, Kyrill Tkachov wrote:
>>> Hi all,
>>>
>>> In this wrong-code PR we get bad code when synthesising a TImode right shift
>>> by variable amount using DImode shifts during expand.
>>>
>>> The expand_double_word_shift function expands two paths: one where the
>>> variable amount is greater than GET_MODE_BITSIZE (DImode) (word_mode for aarch64) at runtime
>>> and one where it's less than that, and performs a conditional select between the two in the end.
>>>
>>> The second path is the one that goes wrong here.
>>> The algorithm it uses for calculating a TImode shift when the variable shift amount is <64 is:
>>>
>>>
>>>   ------DImode-----   ------DImode----- ------DImode-----   ------DImode-----
>>> { |__ target-hi ___| |___ target-lo ___| } = { |___ source-hi ___| |___ source-lo ___| } >> var_amnt
>>>
>>> 1) carry = source-hi << 1
>>> 2) tmp = ~var_amnt // either that or BITS_PER_WORD - 1 - var_amnt depending on shift_truncation_mask
>>> 3) carry = carry << tmp // net effect is that carry is source-hi shifted left by BITS_PER_WORD - var_amnt
>>> 4) target-lo = source-lo >>u var_amnt //unsigned shift.
>>> 5) target-lo = target-lo | carry
>>> 6) target-hi = source-hi >> var_amnt
>>>
>>> where the two DImode words source-hi and source-lo are the two words of the
>>> source TImode value and var_amnt is the register holding the variable shift
>>> amount. This is performed by the expand_subword_shift function in optabs.c.
>>>
>>> Step 2) is valid only if the target truncates the shift amount by the width
>>> of the type its shifting, that is SHIFT_COUNT_TRUNCATED is true and
>>> TARGET_SHIFT_TRUNCATION_MASK is 63 in this case.
>>>
>>> Step 3) is the one that goes wrong.  On aarch64 a left shift is usually
>>> implemented using the LSL instruction but we also have alternatives that can
>>> use the SIMD registers and the USHL instruction.  In this case we end up
>>> using the USHL instruction. However, although the USHL instruction does
>>> a DImode shift, it doesn't truncate the shift amount by 64, but rather by
>>> 255. Furthermore, the shift amount is interpreted as a 2's complement signed
>>> integer and if it's negative performs a right shift.  This is in contrast
>>> with the normal LSL instruction which just performs an unsigned shift by an
>>> amount truncated by 64.
>>>
>>> Now, on aarch64 SHIFT_COUNT_TRUNCATED is defined as !TARGET_SIMD, so we don't
>>> assume shift amounts are truncated unless we know we can only ever use the
>>> LSL instructions for variable shifts.
>>>
>>> However, we still return 63 as the TARGET_SHIFT_TRUNCATION_MASK for DImode,
>>> so expand_subword_shift assumes that since the mask is non-zero it's a valid
>>> shift truncation mask.  The documentation for TARGET_SHIFT_TRUNCATION_MASK
>>> says:
>>> " The default implementation of this function returns
>>>   @code{GET_MODE_BITSIZE (@var{mode}) - 1} if @code{SHIFT_COUNT_TRUNCATED}
>>>   and 0 otherwise."
>>>
>>> So since for TARGET_SIMD we cannot guarantee that all shifts truncate their
>>> amount, we should be returning 0 in TARGET_SHIFT_TRUNCATION_MASK when
>>> SHIFT_COUNT_TRUNCATED is false.  This is what the patch does, and with it
>>> step 2) becomes:
>>> 2) tmp = BITS_PER_WORD - 1 - var_amnt
>>> which behaves correctly on aarch64.
>>>
>>> Unfortunately, the testcase from the PR has very recently gone latent on
>>> trunk because it depends on register allocation picking a particular
>>> alternative from the *aarch64_ashl_sisd_or_int_<mode>3 pattern in aarch64.md.
>>> I tried to do some inline asm tricks to get it to force the correct
>>> constraints but was unsuccessful. Nevertheless I've included the testcase in
>>> the patch. I suppose it's better than nothing.
>>
>> Thanks for the great write-up. This level of detail is very helpful for
>> review.
>>
>> OK for trunk.
>>
>> Thanks,
>> James
>>
>>>
>>> Bootstrapped and tested on aarch64.
>>>
>>> Ok for trunk?
>>>
>>> Thanks,
>>> Kyrill
>>>
>>>
>>> 2016-02-25  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>
>>>
>>>     PR target/69613
>>>     * config/aarch64/aarch64.c (aarch64_shift_truncation_mask):
>>>     Return 0 if !SHIFT_COUNT_TRUNCATED
>>>
>>> 2016-02-25  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>
>>>
>>>     PR target/69613
>>>     * gcc.dg/torture/pr69613.c: New test.
>>
>>> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
>>> index d0d15b4feee70a5ca6af8dd16c7667cbcd844dbf..2e69e345545e591d1de76c2d175aac476e6e1107 100644
>>> --- a/gcc/config/aarch64/aarch64.c
>>> +++ b/gcc/config/aarch64/aarch64.c
>>> @@ -11171,7 +11171,8 @@ static unsigned HOST_WIDE_INT
>>>  aarch64_shift_truncation_mask (machine_mode mode)
>>>  {
>>>    return
>>> -    (aarch64_vector_mode_supported_p (mode)
>>> +    (!SHIFT_COUNT_TRUNCATED
>>> +     || aarch64_vector_mode_supported_p (mode)
>>>       || aarch64_vect_struct_mode_p (mode)) ? 0 : (GET_MODE_BITSIZE (mode) - 1);
>>>  }
>>>
>>> diff --git a/gcc/testsuite/gcc.dg/torture/pr69613.c b/gcc/testsuite/gcc.dg/torture/pr69613.c
>>> new file mode 100644
>>> index 0000000000000000000000000000000000000000..44f2b0cc91ac4b12c4d255b608f95bc8bf016923
>>> --- /dev/null
>>> +++ b/gcc/testsuite/gcc.dg/torture/pr69613.c
>>> @@ -0,0 +1,40 @@
>>> +/* PR target/69613.  */
>>> +/* { dg-do run { target int128 } } */
>>> +/* { dg-additional-options "-mavx" { target { i?86-*-* x86_64-*-* } } } */
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>
> This is incorrect.  You need to check AVX run-time to run
> it on x86.
>

Something like this?


-- 
H.J.

[-- Attachment #2: pr69613.patch --]
[-- Type: text/x-patch, Size: 612 bytes --]

diff --git a/gcc/testsuite/gcc.dg/torture/pr69613.c b/gcc/testsuite/gcc.dg/torture/pr69613.c
index 44f2b0c..99375f1 100644
--- a/gcc/testsuite/gcc.dg/torture/pr69613.c
+++ b/gcc/testsuite/gcc.dg/torture/pr69613.c
@@ -1,6 +1,7 @@
 /* PR target/69613.  */
 /* { dg-do run { target int128 } } */
-/* { dg-additional-options "-mavx" { target { i?86-*-* x86_64-*-* } } } */
+/* { dg-additional-options "-Wno-psabi" { target { i?86-*-* x86_64-*-* } } } */
+/* { dg-additional-options "-mavx" { target avx_runtime } } */
 
 typedef unsigned short u16;
 typedef unsigned short v32u16 __attribute__ ((vector_size (32)));

      reply	other threads:[~2016-02-26 21:59 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-02-25  9:25 Kyrill Tkachov
2016-02-26 15:50 ` James Greenhalgh
2016-02-26 21:44   ` H.J. Lu
2016-02-26 21:59     ` H.J. Lu [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAMe9rOrxiTdLMNT4t_UyVtqwL+dqhdaeA_cCFCYFdvHEQzVcPQ@mail.gmail.com \
    --to=hjl.tools@gmail.com \
    --cc=Richard.Earnshaw@arm.com \
    --cc=gcc-patches@gcc.gnu.org \
    --cc=james.greenhalgh@arm.com \
    --cc=kyrylo.tkachov@foss.arm.com \
    --cc=marcus.shawcroft@arm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).