From: Kyrill Tkachov
To: GCC Patches
CC: Ramana Radhakrishnan, Richard Earnshaw
Subject: [PATCH][ARM] Rewrite vc NEON patterns to use RTL operations rather than UNSPECs
Date: Wed, 04 Feb 2015 12:12:00 -0000
Message-ID: <54D20CB2.4070200@arm.com>

Hi all,

This patch improves the vc patterns in neon.md to use proper RTL
operations rather than UNSPECs.  It is done in a similar way to the
analogous aarch64 operations, i.e. vceq is expressed as
(neg (eq (...) (...)))
since we want to write all 1s to the result element when 'eq' holds and
0s otherwise (see the short sketch below).

The catch is that the floating-point comparisons can only be expanded to
the RTL codes when -funsafe-math-optimizations is given; they must
continue to use the UNSPECs otherwise.  For this I've created a
define_expand that generates the correct RTL depending on
-funsafe-math-optimizations and two define_insns to match the result:
one using the RTL codes and one using UNSPECs.

I've also compressed some of the patterns together using iterators for
the [eq gt ge le lt] cases.

NOTE: for le and lt, before this patch we would never generate
'vclt. dm, dn, dp' instructions, only 'vclt. dm, dn, #0'.
With this patch we can now generate 'vclt. dm, dn, dp' assembly.
According to the ARM ARM this is just a pseudo-instruction that maps to
vcgt with the operands swapped around.  I've confirmed that gas supports
this.

The vcage and vcagt patterns are rewritten to use the form:
(neg ( (abs (...)) (abs (...))))
and condensed together using iterators as well.

Bootstrapped and tested on arm-none-linux-gnueabihf, made sure that the
advanced-simd-intrinsics testsuite is passing (it did catch some bugs
during development of this patch) and tried out other NEON intrinsics
codebases.

The test gcc.target/arm/neon/pr51534.c now generates 'vclt. dn, dm, #0'
instructions where appropriate instead of the previous vmov of #0 into a
temp and then a 'vcgt. dn, temp, dm'.
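(For reference, a minimal C-level sketch of the lane-wise semantics that
the (neg (eq ...)) representation models; this is purely illustrative,
not part of the patch, and the helper name is made up.  Each lane's
comparison yields 0 or 1, and negating that value gives the all-zeros or
all-ones mask that the corresponding vceq instruction writes to the
result element.)

  #include <stdint.h>

  /* Illustrative only: emulate a vceq.i32-style lane-wise compare.
     res[i] is all ones (0xFFFFFFFF) when a[i] == b[i] and all zeros
     otherwise, i.e. the negation of the 0/1 comparison result.  */
  static void
  emulate_vceq_i32 (const int32_t *a, const int32_t *b,
                    uint32_t *res, int lanes)
  {
    for (int i = 0; i < lanes; i++)
      res[i] = -(uint32_t) (a[i] == b[i]);
  }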
I think that is correct behaviour since the test was trying to make sure
that we didn't generate a .u-typed comparison with #0, which is what the
PR was talking about (from what I can gather).

What do people think of this approach?
I'm proposing this for next stage1, of course.

Thanks,
Kyrill

2015-02-04  Kyrylo Tkachov

    * config/arm/iterators.md (GTGE, GTUGEU, COMPARISONS): New code
    iterators.
    (cmp_op, cmp_type): New code attributes.
    (NEON_VCMP, NEON_VACMP): New int iterators.
    (cmp_op_unsp): New int attribute.
    * config/arm/neon.md (neon_vc): New define_expand.
    (neon_vceq): Delete.
    (neon_vc_insn): New pattern.
    (neon_vc_insn_unspec): Likewise.
    (neon_vcgeu): Delete.
    (neon_vcle): Likewise.
    (neon_vclt): Likewise.
    (neon_vcage): Likewise.
    (neon_vcagt): Likewise.
    (neon_vca): New define_expand.
    (neon_vca_insn): New pattern.
    (neon_vca_insn_unspec): Likewise.

2015-02-04  Kyrylo Tkachov

    * gcc.target/arm/neon/pr51534.c: Update vcg* scan-assembler
    patterns to look for vcl* where appropriate.

Attachment: arm-neon-refactor.patch

diff --git a/gcc/config/arm/iterators.md b/gcc/config/arm/iterators.md
index f7f8ab7..66f3f4d 100644
--- a/gcc/config/arm/iterators.md
+++ b/gcc/config/arm/iterators.md
@@ -181,6 +181,15 @@ (define_mode_iterator VPF [V8QI V16QI V2SF V4SF])
 ;; compare a second time.
 (define_code_iterator LTUGEU [ltu geu])

+;; The signed gt, ge comparisons
+(define_code_iterator GTGE [gt ge])
+
+;; The unsigned gt, ge comparisons
+(define_code_iterator GTUGEU [gtu geu])
+
+;; Comparisons for vc
+(define_code_iterator COMPARISONS [eq gt ge le lt])
+
 ;; A list of ...
 (define_code_iterator ior_xor [ior xor])

@@ -214,6 +223,11 @@ (define_code_attr t2_binop0
 (define_code_attr arith_shift_insn [(plus "add") (minus "rsb") (ior "orr")
                                     (xor "eor") (and "and")])

+(define_code_attr cmp_op [(eq "eq") (gt "gt") (ge "ge") (lt "lt") (le "le")
+                          (gtu "gt") (geu "ge")])
+
+(define_code_attr cmp_type [(eq "i") (gt "s") (ge "s") (lt "s") (le "s")])
+
 ;;----------------------------------------------------------------------------
 ;; Int iterators
 ;;----------------------------------------------------------------------------
@@ -221,6 +235,10 @@ (define_code_attr arith_shift_insn
 (define_int_iterator VRINT [UNSPEC_VRINTZ UNSPEC_VRINTP UNSPEC_VRINTM
                             UNSPEC_VRINTR UNSPEC_VRINTX UNSPEC_VRINTA])

+(define_int_iterator NEON_VCMP [UNSPEC_VCEQ UNSPEC_VCGT UNSPEC_VCGE UNSPEC_VCLT UNSPEC_VCLE])
+
+(define_int_iterator NEON_VACMP [UNSPEC_VCAGE UNSPEC_VCAGT])
+
 (define_int_iterator VCVT [UNSPEC_VRINTP UNSPEC_VRINTM UNSPEC_VRINTA])

 (define_int_iterator NEON_VRINT [UNSPEC_NVRINTP UNSPEC_NVRINTZ UNSPEC_NVRINTM
@@ -677,6 +695,11 @@ (define_int_attr sup [

 ])

+(define_int_attr cmp_op_unsp [(UNSPEC_VCEQ "eq") (UNSPEC_VCGT "gt")
+                              (UNSPEC_VCGE "ge") (UNSPEC_VCLE "le")
+                              (UNSPEC_VCLT "lt") (UNSPEC_VCAGE "ge")
+                              (UNSPEC_VCAGT "gt")])
+
 (define_int_attr r [
   (UNSPEC_VRHADD_S "r") (UNSPEC_VRHADD_U "r")
   (UNSPEC_VHADD_S "") (UNSPEC_VHADD_U "")
diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md
index 63c327e..445df2a 100644
--- a/gcc/config/arm/neon.md
+++ b/gcc/config/arm/neon.md
@@ -2200,134 +2200,140 @@ (define_insn "neon_vsubhn"
   [(set_attr "type" "neon_sub_halve_narrow_q")]
 )

-(define_insn "neon_vceq"
-  [(set (match_operand: 0 "s_register_operand" "=w,w")
-        (unspec:
-          [(match_operand:VDQW 1 "s_register_operand" "w,w")
-           (match_operand:VDQW 2 "reg_or_zero_operand" "w,Dz")]
-          UNSPEC_VCEQ))]
+;; These may expand to an UNSPEC pattern when a floating point mode is used
+;; without unsafe math optimizations.
+(define_expand "neon_vc"
+  [(match_operand: 0 "s_register_operand" "=w,w")
+   (neg:
+     (COMPARISONS:VDQW (match_operand:VDQW 1 "s_register_operand" "w,w")
+                       (match_operand:VDQW 2 "reg_or_zero_operand" "w,Dz")))]
   "TARGET_NEON"
-  "@
-  vceq.\t%0, %1, %2
-  vceq.\t%0, %1, #0"
-  [(set (attr "type")
-      (if_then_else (match_test "")
-                    (const_string "neon_fp_compare_s")
-                    (if_then_else (match_operand 2 "zero_operand")
-                      (const_string "neon_compare_zero")
-                      (const_string "neon_compare"))))]
+  {
+    /* For FP comparisons use UNSPECS unless -funsafe-math-optimizations
+       are enabled.  */
+    if (GET_MODE_CLASS (mode) == MODE_VECTOR_FLOAT
+        && !flag_unsafe_math_optimizations)
+      {
+        /* We don't just emit a gen_neon_vc_insn_unspec because
+           we define gen_neon_vceq_insn_unspec only for float modes
+           whereas this expander iterates over the integer modes as well,
+           but we will never expand to UNSPECs for the integer
+           comparisons.  */
+        switch (mode)
+          {
+          case V2SFmode:
+            emit_insn (gen_neon_vcv2sf_insn_unspec (operands[0],
+                                                    operands[1],
+                                                    operands[2]));
+            break;
+          case V4SFmode:
+            emit_insn (gen_neon_vcv4sf_insn_unspec (operands[0],
+                                                    operands[1],
+                                                    operands[2]));
+            break;
+          default:
+            gcc_unreachable ();
+          }
+      }
+    else
+      emit_insn (gen_neon_vc_insn (operands[0],
+                                   operands[1],
+                                   operands[2]));
+    DONE;
+  }
 )

-(define_insn "neon_vcge"
+(define_insn "neon_vc_insn"
   [(set (match_operand: 0 "s_register_operand" "=w,w")
-        (unspec:
-          [(match_operand:VDQW 1 "s_register_operand" "w,w")
-           (match_operand:VDQW 2 "reg_or_zero_operand" "w,Dz")]
-          UNSPEC_VCGE))]
-  "TARGET_NEON"
-  "@
-  vcge.\t%0, %1, %2
-  vcge.\t%0, %1, #0"
+        (neg:
+          (COMPARISONS:
+            (match_operand:VDQW 1 "s_register_operand" "w,w")
+            (match_operand:VDQW 2 "reg_or_zero_operand" "w,Dz"))))]
+  "TARGET_NEON && !(GET_MODE_CLASS (mode) == MODE_VECTOR_FLOAT
+                    && !flag_unsafe_math_optimizations)"
+  {
+    char pattern[100];
+    sprintf (pattern, "vc.%s%%#\t%%0,"
+             " %%1, %s",
+             GET_MODE_CLASS (mode) == MODE_VECTOR_FLOAT
+             ? "f" : "",
+             which_alternative == 0
+             ? "%2" : "#0");
+    output_asm_insn (pattern, operands);
+    return "";
+  }
   [(set (attr "type")
-      (if_then_else (match_test "")
-                    (const_string "neon_fp_compare_s")
-                    (if_then_else (match_operand 2 "zero_operand")
+        (if_then_else (match_operand 2 "zero_operand")
                       (const_string "neon_compare_zero")
-                      (const_string "neon_compare"))))]
-)
-
-(define_insn "neon_vcgeu"
-  [(set (match_operand: 0 "s_register_operand" "=w")
-        (unspec:
-          [(match_operand:VDQIW 1 "s_register_operand" "w")
-           (match_operand:VDQIW 2 "s_register_operand" "w")]
-          UNSPEC_VCGEU))]
-  "TARGET_NEON"
-  "vcge.u%#\t%0, %1, %2"
-  [(set_attr "type" "neon_compare")]
+                      (const_string "neon_compare")))]
 )

-(define_insn "neon_vcgt"
+(define_insn "neon_vc_insn_unspec"
   [(set (match_operand: 0 "s_register_operand" "=w,w")
         (unspec:
-          [(match_operand:VDQW 1 "s_register_operand" "w,w")
-           (match_operand:VDQW 2 "reg_or_zero_operand" "w,Dz")]
-          UNSPEC_VCGT))]
+          [(match_operand:VCVTF 1 "s_register_operand" "w,w")
+           (match_operand:VCVTF 2 "reg_or_zero_operand" "w,Dz")]
+          NEON_VCMP))]
   "TARGET_NEON"
-  "@
-  vcgt.\t%0, %1, %2
-  vcgt.\t%0, %1, #0"
-  [(set (attr "type")
-      (if_then_else (match_test "")
-                    (const_string "neon_fp_compare_s")
-                    (if_then_else (match_operand 2 "zero_operand")
-                      (const_string "neon_compare_zero")
-                      (const_string "neon_compare"))))]
+  {
+    char pattern[100];
+    sprintf (pattern, "vc.f%%#\t%%0,"
+             " %%1, %s",
+             which_alternative == 0
+             ? "%2" : "#0");
+    output_asm_insn (pattern, operands);
+    return "";
+  }
+  [(set_attr "type" "neon_fp_compare_s")]
 )

-(define_insn "neon_vcgtu"
+(define_insn "neon_vcu"
   [(set (match_operand: 0 "s_register_operand" "=w")
-        (unspec:
-          [(match_operand:VDQIW 1 "s_register_operand" "w")
-           (match_operand:VDQIW 2 "s_register_operand" "w")]
-          UNSPEC_VCGTU))]
+        (neg:
+          (GTUGEU:
+            (match_operand:VDQIW 1 "s_register_operand" "w")
+            (match_operand:VDQIW 2 "s_register_operand" "w"))))]
   "TARGET_NEON"
-  "vcgt.u%#\t%0, %1, %2"
+  "vc.u%#\t%0, %1, %2"
   [(set_attr "type" "neon_compare")]
 )

-;; VCLE and VCLT only support comparisons with immediate zero (register
-;; variants are VCGE and VCGT with operands reversed).
-
-(define_insn "neon_vcle"
-  [(set (match_operand: 0 "s_register_operand" "=w")
-        (unspec:
-          [(match_operand:VDQW 1 "s_register_operand" "w")
-           (match_operand:VDQW 2 "zero_operand" "Dz")]
-          UNSPEC_VCLE))]
+(define_expand "neon_vca"
+  [(set (match_operand: 0 "s_register_operand")
+        (neg:
+          (GTGE:
+            (abs:VCVTF (match_operand:VCVTF 1 "s_register_operand"))
+            (abs:VCVTF (match_operand:VCVTF 2 "s_register_operand")))))]
   "TARGET_NEON"
-  "vcle.\t%0, %1, #0"
-  [(set (attr "type")
-      (if_then_else (match_test "")
-                    (const_string "neon_fp_compare_s")
-                    (if_then_else (match_operand 2 "zero_operand")
-                      (const_string "neon_compare_zero")
-                      (const_string "neon_compare"))))]
-)
-
-(define_insn "neon_vclt"
-  [(set (match_operand: 0 "s_register_operand" "=w")
-        (unspec:
-          [(match_operand:VDQW 1 "s_register_operand" "w")
-           (match_operand:VDQW 2 "zero_operand" "Dz")]
-          UNSPEC_VCLT))]
-  "TARGET_NEON"
-  "vclt.\t%0, %1, #0"
-  [(set (attr "type")
-      (if_then_else (match_test "")
-                    (const_string "neon_fp_compare_s")
-                    (if_then_else (match_operand 2 "zero_operand")
-                      (const_string "neon_compare_zero")
-                      (const_string "neon_compare"))))]
+  {
+    if (flag_unsafe_math_optimizations)
+      emit_insn (gen_neon_vca_insn (operands[0], operands[1],
+                                    operands[2]));
+    else
+      emit_insn (gen_neon_vca_insn_unspec (operands[0],
+                                           operands[1],
+                                           operands[2]));
+    DONE;
+  }
 )

-(define_insn "neon_vcage"
+(define_insn "neon_vca_insn"
   [(set (match_operand: 0 "s_register_operand" "=w")
-        (unspec: [(match_operand:VCVTF 1 "s_register_operand" "w")
-                  (match_operand:VCVTF 2 "s_register_operand" "w")]
-                 UNSPEC_VCAGE))]
-  "TARGET_NEON"
-  "vacge.\t%0, %1, %2"
+        (neg:
+          (GTGE:
+            (abs:VCVTF (match_operand:VCVTF 1 "s_register_operand" "w"))
+            (abs:VCVTF (match_operand:VCVTF 2 "s_register_operand" "w")))))]
+  "TARGET_NEON && flag_unsafe_math_optimizations"
+  "vac.\t%0, %1, %2"
   [(set_attr "type" "neon_fp_compare_s")]
 )

-(define_insn "neon_vcagt"
+(define_insn "neon_vca_insn_unspec"
   [(set (match_operand: 0 "s_register_operand" "=w")
         (unspec: [(match_operand:VCVTF 1 "s_register_operand" "w")
                   (match_operand:VCVTF 2 "s_register_operand" "w")]
-                 UNSPEC_VCAGT))]
+                 NEON_VACMP))]
   "TARGET_NEON"
-  "vacgt.\t%0, %1, %2"
+  "vac.\t%0, %1, %2"
   [(set_attr "type" "neon_fp_compare_s")]
 )

diff --git a/gcc/testsuite/gcc.target/arm/neon/pr51534.c b/gcc/testsuite/gcc.target/arm/neon/pr51534.c
index 71cbb05..074bbd4 100644
--- a/gcc/testsuite/gcc.target/arm/neon/pr51534.c
+++ b/gcc/testsuite/gcc.target/arm/neon/pr51534.c
@@ -58,18 +58,18 @@ GEN_COND_TESTS(vceq)
 /* { dg-final { scan-assembler-times "vcge\.u16\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+" 2 } } */
 /* { dg-final { scan-assembler "vcge\.s32\[ \]+\[qQ\]\[0-9\]+, #0" } } */
 /* { dg-final { scan-assembler-times "vcge\.u32\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+" 2 } } */
-/* { dg-final { scan-assembler "vcgt\.s8\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+" } } */
-/* { dg-final { scan-assembler "vcgt\.s16\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+" } } */
-/* { dg-final { scan-assembler "vcgt\.s32\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+" } } */
-/* { dg-final { scan-assembler "vcgt\.s8\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+" } } */
-/* { dg-final { scan-assembler "vcgt\.s16\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+" } } */
-/* { dg-final { scan-assembler "vcgt\.s32\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+" } } */
-/* { dg-final { scan-assembler "vcge\.s8\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+" } } */
-/* { dg-final { scan-assembler "vcge\.s16\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+" } } */
-/* { dg-final { scan-assembler "vcge\.s32\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, \[dD\]\[0-9\]+" } } */
-/* { dg-final { scan-assembler "vcge\.s8\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+" } } */
-/* { dg-final { scan-assembler "vcge\.s16\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+" } } */
-/* { dg-final { scan-assembler "vcge\.s32\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+" } } */
+/* { dg-final { scan-assembler "vclt\.s8\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, #0" } } */
+/* { dg-final { scan-assembler "vclt\.s16\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, #0" } } */
+/* { dg-final { scan-assembler "vclt\.s32\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, #0" } } */
+/* { dg-final { scan-assembler "vclt\.s8\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, #0" } } */
+/* { dg-final { scan-assembler "vclt\.s16\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, #0" } } */
+/* { dg-final { scan-assembler "vclt\.s32\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, #0" } } */
+/* { dg-final { scan-assembler "vcle\.s8\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, #0" } } */
+/* { dg-final { scan-assembler "vcle\.s16\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, #0" } } */
+/* { dg-final { scan-assembler "vcle\.s32\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, #0" } } */
+/* { dg-final { scan-assembler "vcle\.s8\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, #0" } } */
+/* { dg-final { scan-assembler "vcle\.s16\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, #0" } } */
+/* { dg-final { scan-assembler "vcle\.s32\[ \]+\[qQ\]\[0-9\]+, \[qQ\]\[0-9\]+, #0" } } */
 /* { dg-final { scan-assembler-times "vceq\.i8\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, #0" 2 } } */
 /* { dg-final { scan-assembler-times "vceq\.i16\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, #0" 2 } } */
 /* { dg-final { scan-assembler-times "vceq\.i32\[ \]+\[dD\]\[0-9\]+, \[dD\]\[0-9\]+, #0" 2 } } */