From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-patches-return-422309-listarch-gcc-patches=gcc.gnu.org@gcc.gnu.org>
Received: (qmail 7613 invoked by alias); 26 Feb 2016 21:44:37 -0000
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc-patches.gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-help@gcc.gnu.org>
Sender: gcc-patches-owner@gcc.gnu.org
Received: (qmail 7579 invoked by uid 89); 26 Feb 2016 21:44:34 -0000
Authentication-Results: sourceware.org; auth=none
X-Virus-Found: No
X-Spam-SWARE-Status: No, score=-2.4 required=5.0 tests=AWL,BAYES_00,FREEMAIL_FROM,RCVD_IN_DNSWL_LOW,SPF_PASS autolearn=ham version=3.3.2 spammy=james.greenhalgh@arm.com, sk:james.g, jamesgreenhalgharmcom, U*james.greenhalgh
X-HELO: mail-qg0-f46.google.com
Received: from mail-qg0-f46.google.com (HELO mail-qg0-f46.google.com) (209.85.192.46) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with (AES128-GCM-SHA256 encrypted) ESMTPS; Fri, 26 Feb 2016 21:44:33 +0000
Received: by mail-qg0-f46.google.com with SMTP id y9so76857819qgd.3        for <gcc-patches@gcc.gnu.org>; Fri, 26 Feb 2016 13:44:32 -0800 (PST)
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;        d=1e100.net; s=20130820;        h=x-gm-message-state:mime-version:in-reply-to:references:date         :message-id:subject:from:to:cc;        bh=qUdy3h1kcTIO1QS2pJb6BYTvCLmbGoz+7Tgnh/zC/ZQ=;        b=FTD+UYKOMVqpuoFpOrFIji6cqj/iJP8/18cNy0lcpNXZE9mbMZg/FwEPoqaUKOOfcs         izV8WTllFQrxBQ9KtUb3FIDA26lh/ksiwf3p2R/ptY2HEHHU1nVvzOjKivGqndNLHh4U         ohEfEWvOBowkyZxIjd7v9ZHwjHnKJZ+NpS/TIbzRgmCu8dpkksBCEurvekSx9ELNvzJc         GoQEDezHGQa50gPQ7fvKw+wzqLGqvT14OfihLYZePItvKUDEifmPyow6suLIgS6nwdcT         wqU4Rkv1mfGLoB7C7NQBhIDZY8B8oDNqnng1JGj29LGLeZQkcr68sNwM6AJ/hsVZhEX1         kpEQ==
X-Gm-Message-State: AD7BkJKX0A3t6qgN4h5BOyv7/BNF1na3mGWo2ZrlQ5EtSX7NHVrOKlOvOiraB85thXvkdnw7D1hRgRnMpuXoPg==
MIME-Version: 1.0
X-Received: by 10.140.246.196 with SMTP id r187mr5130700qhc.47.1456523071067; Fri, 26 Feb 2016 13:44:31 -0800 (PST)
Received: by 10.55.15.199 with HTTP; Fri, 26 Feb 2016 13:44:31 -0800 (PST)
In-Reply-To: <20160226155034.GD40219@arm.com>
References: <56CEC899.5020701@foss.arm.com>	<20160226155034.GD40219@arm.com>
Date: Fri, 26 Feb 2016 21:44:00 -0000
Message-ID: <CAMe9rOoVpo-zn1cpbZygT_SCGPdm2Uivwm-eLjWsobDMBsYU2w@mail.gmail.com>
Subject: Re: [PATCH][AArch64] PR target/69613: Return zero TARGET_SHIFT_TRUNCATION_MASK when SHIFT_COUNT_TRUNCATED is false
From: "H.J. Lu" <hjl.tools@gmail.com>
To: James Greenhalgh <james.greenhalgh@arm.com>
Cc: Kyrill Tkachov <kyrylo.tkachov@foss.arm.com>, GCC Patches <gcc-patches@gcc.gnu.org>, 	Marcus Shawcroft <marcus.shawcroft@arm.com>, Richard Earnshaw <Richard.Earnshaw@arm.com>
Content-Type: text/plain; charset=UTF-8
X-IsSubscribed: yes
X-SW-Source: 2016-02/txt/msg01860.txt.bz2

On Fri, Feb 26, 2016 at 7:50 AM, James Greenhalgh
<james.greenhalgh@arm.com> wrote:
> On Thu, Feb 25, 2016 at 09:25:45AM +0000, Kyrill Tkachov wrote:
>> Hi all,
>>
>> In this wrong-code PR we get bad code when synthesising a TImode right shift
>> by variable amount using DImode shifts during expand.
>>
>> The expand_double_word_shift function expands two paths: one where the
>> variable amount is greater than GET_MODE_BITSIZE (DImode) (word_mode for aarch64) at runtime
>> and one where it's less than that, and performs a conditional select between the two in the end.
>>
>> The second path is the one that goes wrong here.
>> The algorithm it uses for calculating a TImode shift when the variable shift amount is <64 is:
>>
>>
>>   ------DImode-----   ------DImode----- ------DImode-----   ------DImode-----
>> { |__ target-hi ___| |___ target-lo ___| } = { |___ source-hi ___| |___ source-lo ___| } >> var_amnt
>>
>> 1) carry = source-hi << 1
>> 2) tmp = ~var_amnt // either that or BITS_PER_WORD - 1 - var_amnt depending on shift_truncation_mask
>> 3) carry = carry << tmp // net effect is that carry is source-hi shifted left by BITS_PER_WORD - var_amnt
>> 4) target-lo = source-lo >>u var_amnt //unsigned shift.
>> 5) target-lo = target-lo | carry
>> 6) target-hi = source-hi >> var_amnt
>>
>> where the two DImode words source-hi and source-lo are the two words of the
>> source TImode value and var_amnt is the register holding the variable shift
>> amount. This is performed by the expand_subword_shift function in optabs.c.
>>
>> Step 2) is valid only if the target truncates the shift amount by the width
>> of the type its shifting, that is SHIFT_COUNT_TRUNCATED is true and
>> TARGET_SHIFT_TRUNCATION_MASK is 63 in this case.
>>
>> Step 3) is the one that goes wrong.  On aarch64 a left shift is usually
>> implemented using the LSL instruction but we also have alternatives that can
>> use the SIMD registers and the USHL instruction.  In this case we end up
>> using the USHL instruction. However, although the USHL instruction does
>> a DImode shift, it doesn't truncate the shift amount by 64, but rather by
>> 255. Furthermore, the shift amount is interpreted as a 2's complement signed
>> integer and if it's negative performs a right shift.  This is in contrast
>> with the normal LSL instruction which just performs an unsigned shift by an
>> amount truncated by 64.
>>
>> Now, on aarch64 SHIFT_COUNT_TRUNCATED is defined as !TARGET_SIMD, so we don't
>> assume shift amounts are truncated unless we know we can only ever use the
>> LSL instructions for variable shifts.
>>
>> However, we still return 63 as the TARGET_SHIFT_TRUNCATION_MASK for DImode,
>> so expand_subword_shift assumes that since the mask is non-zero it's a valid
>> shift truncation mask.  The documentation for TARGET_SHIFT_TRUNCATION_MASK
>> says:
>> " The default implementation of this function returns
>>   @code{GET_MODE_BITSIZE (@var{mode}) - 1} if @code{SHIFT_COUNT_TRUNCATED}
>>   and 0 otherwise."
>>
>> So since for TARGET_SIMD we cannot guarantee that all shifts truncate their
>> amount, we should be returning 0 in TARGET_SHIFT_TRUNCATION_MASK when
>> SHIFT_COUNT_TRUNCATED is false.  This is what the patch does, and with it
>> step 2) becomes:
>> 2) tmp = BITS_PER_WORD - 1 - var_amnt
>> which behaves correctly on aarch64.
>>
>> Unfortunately, the testcase from the PR has very recently gone latent on
>> trunk because it depends on register allocation picking a particular
>> alternative from the *aarch64_ashl_sisd_or_int_<mode>3 pattern in aarch64.md.
>> I tried to do some inline asm tricks to get it to force the correct
>> constraints but was unsuccessful. Nevertheless I've included the testcase in
>> the patch. I suppose it's better than nothing.
>
> Thanks for the great write-up. This level of detail is very helpful for
> review.
>
> OK for trunk.
>
> Thanks,
> James
>
>>
>> Bootstrapped and tested on aarch64.
>>
>> Ok for trunk?
>>
>> Thanks,
>> Kyrill
>>
>>
>> 2016-02-25  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>
>>
>>     PR target/69613
>>     * config/aarch64/aarch64.c (aarch64_shift_truncation_mask):
>>     Return 0 if !SHIFT_COUNT_TRUNCATED
>>
>> 2016-02-25  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>
>>
>>     PR target/69613
>>     * gcc.dg/torture/pr69613.c: New test.
>
>> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
>> index d0d15b4feee70a5ca6af8dd16c7667cbcd844dbf..2e69e345545e591d1de76c2d175aac476e6e1107 100644
>> --- a/gcc/config/aarch64/aarch64.c
>> +++ b/gcc/config/aarch64/aarch64.c
>> @@ -11171,7 +11171,8 @@ static unsigned HOST_WIDE_INT
>>  aarch64_shift_truncation_mask (machine_mode mode)
>>  {
>>    return
>> -    (aarch64_vector_mode_supported_p (mode)
>> +    (!SHIFT_COUNT_TRUNCATED
>> +     || aarch64_vector_mode_supported_p (mode)
>>       || aarch64_vect_struct_mode_p (mode)) ? 0 : (GET_MODE_BITSIZE (mode) - 1);
>>  }
>>
>> diff --git a/gcc/testsuite/gcc.dg/torture/pr69613.c b/gcc/testsuite/gcc.dg/torture/pr69613.c
>> new file mode 100644
>> index 0000000000000000000000000000000000000000..44f2b0cc91ac4b12c4d255b608f95bc8bf016923
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.dg/torture/pr69613.c
>> @@ -0,0 +1,40 @@
>> +/* PR target/69613.  */
>> +/* { dg-do run { target int128 } } */
>> +/* { dg-additional-options "-mavx" { target { i?86-*-* x86_64-*-* } } } */
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

This is incorrect.  You need to check AVX run-time to run
it on x86.

-- 
H.J.