From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 76761 invoked by alias); 20 Oct 2015 07:54:13 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Received: (qmail 76747 invoked by uid 89); 20 Oct 2015 07:54:12 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-2.4 required=5.0 tests=AWL,BAYES_00,RCVD_IN_DNSWL_LOW,SPF_PASS autolearn=ham version=3.3.2 X-HELO: mail-pa0-f52.google.com Received: from mail-pa0-f52.google.com (HELO mail-pa0-f52.google.com) (209.85.220.52) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with (AES128-GCM-SHA256 encrypted) ESMTPS; Tue, 20 Oct 2015 07:54:08 +0000 Received: by pabrc13 with SMTP id rc13so13792694pab.0 for ; Tue, 20 Oct 2015 00:54:06 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:message-id:date:from:organization:user-agent :mime-version:to:subject:references:in-reply-to:content-type :content-transfer-encoding; bh=rFlC1FrdD9oxij5I+VqEVMQC4EmzKNyZqXfSNxUEErU=; b=f+DHBz/pUrDvcbgwfvo1e8iVtpETWb2JDXwIsrPwN5oIp0FsKnw98yWZ4kKOSO+U4X QB9hsc26nuiMmHIU2gZpheWFOa3c0RA+w5w+aoVFhpv3Fg+/aoPF3bk9iEPX1c3YKyt0 rr2+MO/2v5FhQL6hKUvQpvJhf+apVE/jNrSG/kS+aTo0uSIndP3lCKujJ6ILIyAJY9EN BWdakXr/HzHEw52TiuEp4qMkOLRnD7d5b1+nhwEZEzk641a9XDxzBj94EtrAfqngyqPr p6ADyjkaC2205cHSzlg88v2uauk6OetUe3yylkoub2DSBXPgMK3PO6BwcYtkaZsLCEkP m61g== X-Gm-Message-State: ALoCoQl8N0R4LOiauP4NeYv3HDPlJ+4N3M091JR8rB51FkQP1/ZMulcGVyWzYMR8dU4BEO7vAPMR X-Received: by 10.68.92.99 with SMTP id cl3mr2414270pbb.60.1445327646861; Tue, 20 Oct 2015 00:54:06 -0700 (PDT) Received: from [192.168.1.14] (ip70-176-202-128.ph.ph.cox.net. [70.176.202.128]) by smtp.googlemail.com with ESMTPSA id ir5sm2108379pbc.13.2015.10.20.00.54.05 (version=TLSv1/SSLv3 cipher=OTHER); Tue, 20 Oct 2015 00:54:06 -0700 (PDT) Message-ID: <5625F31B.5060203@linaro.org> Date: Tue, 20 Oct 2015 08:11:00 -0000 From: Michael Collison User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.4.0 MIME-Version: 1.0 To: Kyrill Tkachov , GCC Patches , Ramana Radhakrishnan Subject: Re: [ARM] Use vector wide add for mixed-mode adds References: <5601E9B9.5060600@linaro.org> <560267B4.5070809@arm.com> <560D0567.40207@linaro.org> <56164D45.8040408@arm.com> In-Reply-To: <56164D45.8040408@arm.com> Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 7bit X-SW-Source: 2015-10/txt/msg01827.txt.bz2 Hi Kyrill, Since your email I have done the following: 1. Added the ENDIAN_LANE_N to the define_expand patterns for big endian targets. The big endian patches produced no change in the test results. I still have several execution failures with targeting big endian with lto enabled. 2. I diff'd the rtl dumps from a big endian compiler with lto enabled and disabled. I also examined the assembly language and there no differences except for the .ascii directives. I want to ask a question about existing patterns in neon.md that utilize the vec_select and all the lanes as my example does: Why are the following pattern not matched if the target is big endian? (define_insn "neon_vec_unpack_lo_" [(set (match_operand: 0 "register_operand" "=w") (SE: (vec_select: (match_operand:VU 1 "register_operand" "w") (match_operand:VU 2 "vect_par_constant_low" ""))))] "TARGET_NEON && !BYTES_BIG_ENDIAN" "vmovl. %q0, %e1" [(set_attr "type" "neon_shift_imm_long")] ) (define_insn "neon_vec_unpack_hi_" [(set (match_operand: 0 "register_operand" "=w") (SE: (vec_select: (match_operand:VU 1 "register_operand" "w") (match_operand:VU 2 "vect_par_constant_high" ""))))] "TARGET_NEON && !BYTES_BIG_ENDIAN" "vmovl. %q0, %f1" [(set_attr "type" "neon_shift_imm_long")] These patterns are similar to the new patterns I am adding and I am wondering if my patterns should exclude BYTES_BIG_ENDIAN? On 10/08/2015 04:02 AM, Kyrill Tkachov wrote: > Hi Michael, > > On 01/10/15 11:05, Michael Collison wrote: >> Kyrill, >> >> I have modified the patch to address your comments. I also modified >> check_effective_target_vect_widen_sum_hi_to_si_pattern in >> target-supports.exp to >> indicate that arm neon supports vector widen sum of HImode to SImode. >> This resolved >> several test suite failures. >> >> Successfully tested on arm-none-eabi, arm-none-linux-gnueabihf. I have >> four related execution failure >> tests on armeb-non-linux-gnueabihf with -flto only. >> >> gcc.dg/vect/vect-outer-4f.c -flto -ffat-lto-objects execution test >> gcc.dg/vect/vect-outer-4g.c -flto -ffat-lto-objects execution test >> gcc.dg/vect/vect-outer-4k.c -flto -ffat-lto-objects execution test >> gcc.dg/vect/vect-outer-4l.c -flto -ffat-lto-objects execution test > > We'd want to get to the bottom of these before committing. > Does codegen before and after the patch show anything? > When it comes to big-endian and NEON, the fiddly parts are > usually lane numbers. Do you need to select the proper lanes with > ENDIAN_LANE_N like Charles in his patch at: > https://gcc.gnu.org/ml/gcc-patches/2015-10/msg00656.html? > > Thanks, > Kyrill > >> >> I am debugging but have not tracked down the root cause yet. Feedback? >> >> 2015-07-22 Michael Collison >> >> * config/arm/neon.md (widen_sum): New patterns >> where mode is VQI to improve mixed mode vectorization. >> * config/arm/neon.md >> (vec_sel_widen_ssum_lo3): New >> define_insn to match low half of signed vaddw. >> * config/arm/neon.md >> (vec_sel_widen_ssum_hi3): New >> define_insn to match high half of signed vaddw. >> * config/arm/neon.md >> (vec_sel_widen_usum_lo3): New >> define_insn to match low half of unsigned vaddw. >> * config/arm/neon.md >> (vec_sel_widen_usum_hi3): New >> define_insn to match high half of unsigned vaddw. >> * testsuite/gcc.target/arm/neon-vaddws16.c: New test. >> * testsuite/gcc.target/arm/neon-vaddws32.c: New test. >> * testsuite/gcc.target/arm/neon-vaddwu16.c: New test. >> * testsuite/gcc.target/arm/neon-vaddwu32.c: New test. >> * testsuite/gcc.target/arm/neon-vaddwu8.c: New test. >> * testsuite/lib/target-supports.exp >> (check_effective_target_vect_widen_sum_hi_to_si_pattern): Indicate >> that arm neon support vector widen sum of HImode TO SImode. > > Note that the testsuite changes should have their own ChangeLog entry > with the paths there starting relative to gcc/testsuite/ > >> >> On 09/23/2015 01:49 AM, Kyrill Tkachov wrote: >>> Hi Michael, >>> >>> On 23/09/15 00:52, Michael Collison wrote: >>>> This is a modified version of the previous patch that removes the >>>> documentation and read-md.c fixes. These patches have been submitted >>>> separately and approved. >>>> >>>> This patch is designed to address code that was not being >>>> vectorized due >>>> to missing widening patterns in the ARM backend. Code such as: >>>> >>>> int t6(int len, void * dummy, short * __restrict x) >>>> { >>>> len = len & ~31; >>>> int result = 0; >>>> __asm volatile (""); >>>> for (int i = 0; i < len; i++) >>>> result += x[i]; >>>> return result; >>>> } >>>> >>>> Validated on arm-none-eabi, arm-none-linux-gnueabi, >>>> arm-none-linux-gnueabihf, and armeb-none-linux-gnueabihf. >>>> >>>> 2015-09-22 Michael Collison >>>> >>>> * config/arm/neon.md (widen_sum): New patterns >>>> where mode is VQI to improve mixed mode add vectorization. >>>> >>> Please list all the new define_expands and define_insns >>> in the changelog. Also, please add an ChangeLog entry for >>> the testsuite additions. >>> >>> The approach looks ok to me with a few comments on some >>> parts of the patch itself. >>> >>> >>> +(define_insn "vec_sel_widen_ssum_hi3" >>> + [(set (match_operand: 0 "s_register_operand" "=w") >>> + (plus: (sign_extend: (vec_select:VW >>> (match_operand:VQI 1 "s_register_operand" "%w") >>> + (match_operand:VQI 2 >>> "vect_par_constant_high" ""))) >>> + (match_operand: 3 "s_register_operand" >>> "0")))] >>> + "TARGET_NEON" >>> + "vaddw.\t%q0, %q3, %f1" >>> + [(set_attr "type" "neon_add_widen") >>> + (set_attr "length" "8")] >>> +) >>> >>> >>> This is a single instruction, and it has a length of 4, so no need to >>> override the length attribute. >>> Same with the other define_insns in this patch. >>> >>> >>> diff --git a/gcc/testsuite/gcc.target/arm/neon-vaddws16.c >>> b/gcc/testsuite/gcc.target/arm/neon-vaddws16.c >>> new file mode 100644 >>> index 0000000..ed10669 >>> --- /dev/null >>> +++ b/gcc/testsuite/gcc.target/arm/neon-vaddws16.c >>> @@ -0,0 +1,21 @@ >>> +/* { dg-do compile } */ >>> +/* { dg-require-effective-target arm_neon_hw } */ >>> >>> The arm_neon_hw check is usually used when you want to run the tests. >>> Since this is a compile-only tests you just need arm_neon_ok. >>> >>> +/* { dg-add-options arm_neon_ok } */ >>> +/* { dg-options "-O3" } */ >>> + >>> + >>> +int >>> +t6(int len, void * dummy, short * __restrict x) >>> +{ >>> + len = len & ~31; >>> + int result = 0; >>> + __asm volatile (""); >>> + for (int i = 0; i < len; i++) >>> + result += x[i]; >>> + return result; >>> +} >>> + >>> +/* { dg-final { scan-assembler "vaddw\.s16" } } */ >>> + >>> + >>> + >>> >>> Stray trailing newlines. Similar comments for the other testcases. >>> >>> Thanks, >>> Kyrill >>> > -- Michael Collison Linaro Toolchain Working Group michael.collison@linaro.org