From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-patches-return-410606-listarch-gcc-patches=gcc.gnu.org@gcc.gnu.org>
Received: (qmail 76761 invoked by alias); 20 Oct 2015 07:54:13 -0000
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc-patches.gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-help@gcc.gnu.org>
Sender: gcc-patches-owner@gcc.gnu.org
Received: (qmail 76747 invoked by uid 89); 20 Oct 2015 07:54:12 -0000
Authentication-Results: sourceware.org; auth=none
X-Virus-Found: No
X-Spam-SWARE-Status: No, score=-2.4 required=5.0 tests=AWL,BAYES_00,RCVD_IN_DNSWL_LOW,SPF_PASS autolearn=ham version=3.3.2
X-HELO: mail-pa0-f52.google.com
Received: from mail-pa0-f52.google.com (HELO mail-pa0-f52.google.com) (209.85.220.52) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with (AES128-GCM-SHA256 encrypted) ESMTPS; Tue, 20 Oct 2015 07:54:08 +0000
Received: by pabrc13 with SMTP id rc13so13792694pab.0        for <gcc-patches@gcc.gnu.org>; Tue, 20 Oct 2015 00:54:06 -0700 (PDT)
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;        d=1e100.net; s=20130820;        h=x-gm-message-state:message-id:date:from:organization:user-agent         :mime-version:to:subject:references:in-reply-to:content-type         :content-transfer-encoding;        bh=rFlC1FrdD9oxij5I+VqEVMQC4EmzKNyZqXfSNxUEErU=;        b=f+DHBz/pUrDvcbgwfvo1e8iVtpETWb2JDXwIsrPwN5oIp0FsKnw98yWZ4kKOSO+U4X         QB9hsc26nuiMmHIU2gZpheWFOa3c0RA+w5w+aoVFhpv3Fg+/aoPF3bk9iEPX1c3YKyt0         rr2+MO/2v5FhQL6hKUvQpvJhf+apVE/jNrSG/kS+aTo0uSIndP3lCKujJ6ILIyAJY9EN         BWdakXr/HzHEw52TiuEp4qMkOLRnD7d5b1+nhwEZEzk641a9XDxzBj94EtrAfqngyqPr         p6ADyjkaC2205cHSzlg88v2uauk6OetUe3yylkoub2DSBXPgMK3PO6BwcYtkaZsLCEkP         m61g==
X-Gm-Message-State: ALoCoQl8N0R4LOiauP4NeYv3HDPlJ+4N3M091JR8rB51FkQP1/ZMulcGVyWzYMR8dU4BEO7vAPMR
X-Received: by 10.68.92.99 with SMTP id cl3mr2414270pbb.60.1445327646861;        Tue, 20 Oct 2015 00:54:06 -0700 (PDT)
Received: from [192.168.1.14] (ip70-176-202-128.ph.ph.cox.net. [70.176.202.128])        by smtp.googlemail.com with ESMTPSA id ir5sm2108379pbc.13.2015.10.20.00.54.05        (version=TLSv1/SSLv3 cipher=OTHER);        Tue, 20 Oct 2015 00:54:06 -0700 (PDT)
Message-ID: <5625F31B.5060203@linaro.org>
Date: Tue, 20 Oct 2015 08:11:00 -0000
From: Michael Collison <michael.collison@linaro.org>
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.4.0
MIME-Version: 1.0
To: Kyrill Tkachov <kyrylo.tkachov@arm.com>,  GCC Patches <gcc-patches@gcc.gnu.org>, Ramana Radhakrishnan <Ramana.Radhakrishnan@arm.com>
Subject: Re: [ARM] Use vector wide add for mixed-mode adds
References: <5601E9B9.5060600@linaro.org> <560267B4.5070809@arm.com> <560D0567.40207@linaro.org> <56164D45.8040408@arm.com>
In-Reply-To: <56164D45.8040408@arm.com>
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit
X-SW-Source: 2015-10/txt/msg01827.txt.bz2

Hi Kyrill,

Since your email I have done the following:

1. Added the ENDIAN_LANE_N to the define_expand patterns for big endian 
targets. The big endian patches produced no change in the test results. 
I still have several execution failures with targeting big endian with 
lto enabled.

2. I diff'd the rtl dumps from a big endian compiler with lto enabled 
and disabled. I also examined the assembly language and there no 
differences except for the .ascii directives.

I want to ask a question about existing patterns in neon.md that utilize 
the vec_select and all the lanes as my example does: Why are the 
following pattern not matched if the target is big endian?


(define_insn "neon_vec_unpack<US>_lo_<mode>"
   [(set (match_operand:<V_unpack> 0 "register_operand" "=w")
         (SE:<V_unpack> (vec_select:<V_HALF>
               (match_operand:VU 1 "register_operand" "w")
               (match_operand:VU 2 "vect_par_constant_low" ""))))]
   "TARGET_NEON && !BYTES_BIG_ENDIAN"
   "vmovl.<US><V_sz_elem> %q0, %e1"
   [(set_attr "type" "neon_shift_imm_long")]
)

(define_insn "neon_vec_unpack<US>_hi_<mode>"
   [(set (match_operand:<V_unpack> 0 "register_operand" "=w")
         (SE:<V_unpack> (vec_select:<V_HALF>
               (match_operand:VU 1 "register_operand" "w")
               (match_operand:VU 2 "vect_par_constant_high" ""))))]
   "TARGET_NEON && !BYTES_BIG_ENDIAN"
   "vmovl.<US><V_sz_elem> %q0, %f1"
   [(set_attr "type" "neon_shift_imm_long")]

These patterns are similar to the new patterns I am adding and I am 
wondering if my patterns should exclude BYTES_BIG_ENDIAN?

On 10/08/2015 04:02 AM, Kyrill Tkachov wrote:
> Hi Michael,
>
> On 01/10/15 11:05, Michael Collison wrote:
>> Kyrill,
>>
>> I have modified the patch to address your comments. I also modified
>> check_effective_target_vect_widen_sum_hi_to_si_pattern in
>> target-supports.exp to
>> indicate that arm neon supports vector widen sum of HImode to SImode.
>> This resolved
>> several test suite failures.
>>
>> Successfully tested on arm-none-eabi, arm-none-linux-gnueabihf. I have
>> four related execution failure
>> tests on armeb-non-linux-gnueabihf with -flto only.
>>
>> gcc.dg/vect/vect-outer-4f.c -flto -ffat-lto-objects execution test
>> gcc.dg/vect/vect-outer-4g.c -flto -ffat-lto-objects execution test
>> gcc.dg/vect/vect-outer-4k.c -flto -ffat-lto-objects execution test
>> gcc.dg/vect/vect-outer-4l.c -flto -ffat-lto-objects execution test
>
> We'd want to get to the bottom of these before committing.
> Does codegen before and after the patch show anything?
> When it comes to big-endian and NEON, the fiddly parts are
> usually lane numbers. Do you need to select the proper lanes with
> ENDIAN_LANE_N like Charles in his patch at:
> https://gcc.gnu.org/ml/gcc-patches/2015-10/msg00656.html?
>
> Thanks,
> Kyrill
>
>>
>> I am debugging but have not tracked down the root cause yet. Feedback?
>>
>> 2015-07-22  Michael Collison <michael.collison@linaro.org>
>>
>>       * config/arm/neon.md (widen_<us>sum<mode>): New patterns
>>       where mode is VQI to improve mixed mode vectorization.
>>       * config/arm/neon.md 
>> (vec_sel_widen_ssum_lo<VQI:mode><VW:mode>3): New
>>       define_insn to match low half of signed vaddw.
>>       * config/arm/neon.md 
>> (vec_sel_widen_ssum_hi<VQI:mode><VW:mode>3): New
>>       define_insn to match high half of signed vaddw.
>>       * config/arm/neon.md 
>> (vec_sel_widen_usum_lo<VQI:mode><VW:mode>3): New
>>       define_insn to match low half of unsigned vaddw.
>>       * config/arm/neon.md 
>> (vec_sel_widen_usum_hi<VQI:mode><VW:mode>3): New
>>       define_insn to match high half of unsigned vaddw.
>>       * testsuite/gcc.target/arm/neon-vaddws16.c: New test.
>>       * testsuite/gcc.target/arm/neon-vaddws32.c: New test.
>>       * testsuite/gcc.target/arm/neon-vaddwu16.c: New test.
>>       * testsuite/gcc.target/arm/neon-vaddwu32.c: New test.
>>       * testsuite/gcc.target/arm/neon-vaddwu8.c: New test.
>>       * testsuite/lib/target-supports.exp
>>       (check_effective_target_vect_widen_sum_hi_to_si_pattern): Indicate
>>       that arm neon support vector widen sum of HImode TO SImode.
>
> Note that the testsuite changes should have their own ChangeLog entry
> with the paths there starting relative to gcc/testsuite/
>
>>
>> On 09/23/2015 01:49 AM, Kyrill Tkachov wrote:
>>> Hi Michael,
>>>
>>> On 23/09/15 00:52, Michael Collison wrote:
>>>> This is a modified version of the previous patch that removes the
>>>> documentation and read-md.c fixes. These patches have been submitted
>>>> separately and approved.
>>>>
>>>> This patch is designed to address code that was not being 
>>>> vectorized due
>>>> to missing widening patterns in the ARM backend. Code such as:
>>>>
>>>> int t6(int len, void * dummy, short * __restrict x)
>>>> {
>>>>      len = len & ~31;
>>>>      int result = 0;
>>>>      __asm volatile ("");
>>>>      for (int i = 0; i < len; i++)
>>>>        result += x[i];
>>>>      return result;
>>>> }
>>>>
>>>> Validated on arm-none-eabi, arm-none-linux-gnueabi,
>>>> arm-none-linux-gnueabihf, and armeb-none-linux-gnueabihf.
>>>>
>>>> 2015-09-22  Michael Collison <michael.collison@linaro.org>
>>>>
>>>>        * config/arm/neon.md (widen_<us>sum<mode>): New patterns
>>>>        where mode is VQI to improve mixed mode add vectorization.
>>>>
>>> Please list all the new define_expands and define_insns
>>> in the changelog. Also, please add an ChangeLog entry for
>>> the testsuite additions.
>>>
>>> The approach looks ok to me with a few comments on some
>>> parts of the patch itself.
>>>
>>>
>>> +(define_insn "vec_sel_widen_ssum_hi<VQI:mode><VW:mode>3"
>>> +  [(set (match_operand:<VW:V_widen> 0 "s_register_operand" "=w")
>>> +    (plus:<VW:V_widen> (sign_extend:<VW:V_widen> (vec_select:VW
>>> (match_operand:VQI 1 "s_register_operand" "%w")
>>> +                           (match_operand:VQI 2
>>> "vect_par_constant_high" "")))
>>> +                (match_operand:<VW:V_widen> 3 "s_register_operand"
>>> "0")))]
>>> +  "TARGET_NEON"
>>> +  "vaddw.<V_s_elem>\t%q0, %q3, %f1"
>>> +  [(set_attr "type" "neon_add_widen")
>>> +  (set_attr "length" "8")]
>>> +)
>>>
>>>
>>> This is a single instruction, and it has a length of 4, so no need to
>>> override the length attribute.
>>> Same with the other define_insns in this patch.
>>>
>>>
>>> diff --git a/gcc/testsuite/gcc.target/arm/neon-vaddws16.c
>>> b/gcc/testsuite/gcc.target/arm/neon-vaddws16.c
>>> new file mode 100644
>>> index 0000000..ed10669
>>> --- /dev/null
>>> +++ b/gcc/testsuite/gcc.target/arm/neon-vaddws16.c
>>> @@ -0,0 +1,21 @@
>>> +/* { dg-do compile } */
>>> +/* { dg-require-effective-target arm_neon_hw } */
>>>
>>> The arm_neon_hw check is usually used when you want to run the tests.
>>> Since this is a compile-only tests you just need arm_neon_ok.
>>>
>>>   +/* { dg-add-options arm_neon_ok } */
>>> +/* { dg-options "-O3" } */
>>> +
>>> +
>>> +int
>>> +t6(int len, void * dummy, short * __restrict x)
>>> +{
>>> +  len = len & ~31;
>>> +  int result = 0;
>>> +  __asm volatile ("");
>>> +  for (int i = 0; i < len; i++)
>>> +    result += x[i];
>>> +  return result;
>>> +}
>>> +
>>> +/* { dg-final { scan-assembler "vaddw\.s16" } } */
>>> +
>>> +
>>> +
>>>
>>> Stray trailing newlines. Similar comments for the other testcases.
>>>
>>> Thanks,
>>> Kyrill
>>>
>

-- 
Michael Collison
Linaro Toolchain Working Group
michael.collison@linaro.org