Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Subject: Re: [PATCH][ARM] Cleanup 64-bit multiplies
To: Wilco Dijkstra, GCC Patches, Richard Earnshaw
Cc: nd
From: Kyrill Tkachov
Message-ID: <64f47d61-a838-fae9-246d-1f22e80e8dd1@foss.arm.com>
Date: Wed, 18 Sep 2019 16:30:00 -0000
X-SW-Source: 2019-09/txt/msg01111.txt.bz2

Hi Wilco,

On 9/9/19 6:08 PM, Wilco Dijkstra wrote:
> ping
>
>
> Cleanup 64-bit multiplies.  Combine the expanders using iterators.
>  Merge the signed/unsigned multiplies as well as the pre-Armv6 and Armv6
>  variants.  Split DImode operands early into parallel sets inside the
>  MULL/MLAL instructions - this improves register allocation and avoids
>  subreg issues due to other DImode operations splitting early.
>

Hmm... quite a lot going on in this patch. Perhaps breaking it into a series would have been easier.

But I think I untangled it all and it looks like a good improvement.

Ok.

Thanks,

Kyrill

>  Bootstrap OK on armhf, regress passes.
>
>  ChangeLog:
>  2019-09-03  Wilco Dijkstra
>
>          * config/arm/arm.md (maddsidi4): Remove expander.
>          (mulsidi3adddi): Remove pattern.
>          (mulsidi3adddi_v6): Likewise.
>          (mulsidi3_nov6): Likewise.
>          (mulsidi3_v6): Likewise.
>          (umulsidi3): Remove expander.
>          (umulsidi3_nov6): Remove pattern.
>          (umulsidi3_v6): Likewise.
>          (umulsidi3adddi): Likewise.
>          (umulsidi3adddi_v6): Likewise.
>          (<Us>mulsidi3): Add combined expander.
>          (<Us>maddsidi4): Likewise.
>          (<US>mull): Add combined umull and smull pattern.
>          (<US>mlal): Likewise.
>          * config/arm/iterators.md (Us): Add new iterator.
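
For readers following along, here is a rough C sketch of what "splitting the DImode operands into parallel sets" means arithmetically: each MULL/MLAL is described as two 32-bit results (low word, high word) rather than one 64-bit value. The helper names and the struct are made up for illustration and are not part of GCC; the sketch only mirrors the arithmetic and says nothing about register allocation.

```c
#include <stdint.h>

/* Illustrative helpers only; these names do not exist in GCC.  */
struct halves { uint32_t lo, hi; };

/* UMULL-style split: low and high words of the unsigned 64-bit product.  */
static struct halves umull_halves (uint32_t a, uint32_t b)
{
  uint64_t wide = (uint64_t) a * b;
  struct halves r = { (uint32_t) wide, (uint32_t) (wide >> 32) };
  return r;
}

/* SMULL-style split: same low word, high word taken from the signed
   product.  */
static struct halves smull_halves (int32_t a, int32_t b)
{
  uint64_t wide = (uint64_t) ((int64_t) a * b);
  struct halves r = { (uint32_t) wide, (uint32_t) (wide >> 32) };
  return r;
}

/* UMLAL-style split: accumulate into (acc.hi:acc.lo) one word at a time.
   The carry out of the low word is recovered by adding the zero-extended
   low accumulator to the wide product before taking the top 32 bits.  */
static struct halves umlal_halves (struct halves acc, uint32_t a, uint32_t b)
{
  uint64_t wide = (uint64_t) a * b + acc.lo;  /* cannot overflow 64 bits */
  struct halves r = { (uint32_t) wide, (uint32_t) (wide >> 32) + acc.hi };
  return r;
}
```

The low word is the same for the signed and unsigned multiply, which is presumably why the first set in the combined pattern needs no extension and only the high-word set uses the signedness iterator.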
>  --
>  diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
>  index 1ab203810bf143927a8afa0d00d82537cd7c75ed..c1fea4abdbccedbbbed9a25cab133de5cacb1afb 100644
>  --- a/gcc/config/arm/arm.md
>  +++ b/gcc/config/arm/arm.md
>  @@ -1636,144 +1636,80 @@ (define_insn "*mls"
>      (set_attr "predicable" "yes")]
>   )
>
>  -(define_expand "maddsidi4"
>  -  [(set (match_operand:DI 0 "s_register_operand")
>  -       (plus:DI
>  -        (mult:DI
>  -         (sign_extend:DI (match_operand:SI 1 "s_register_operand"))
>  -         (sign_extend:DI (match_operand:SI 2 "s_register_operand")))
>  -        (match_operand:DI 3 "s_register_operand")))]
>  -  "TARGET_32BIT"
>  -  "")
>  -
>  -(define_insn "*mulsidi3adddi"
>  -  [(set (match_operand:DI 0 "s_register_operand" "=&r")
>  -       (plus:DI
>  -        (mult:DI
>  -         (sign_extend:DI (match_operand:SI 2 "s_register_operand" "%r"))
>  -         (sign_extend:DI (match_operand:SI 3 "s_register_operand" "r")))
>  -        (match_operand:DI 1 "s_register_operand" "0")))]
>  -  "TARGET_32BIT && !arm_arch6"
>  -  "smlal%?\\t%Q0, %R0, %3, %2"
>  -  [(set_attr "type" "smlal")
>  -   (set_attr "predicable" "yes")]
>  -)
>  -
>  -(define_insn "*mulsidi3adddi_v6"
>  -  [(set (match_operand:DI 0 "s_register_operand" "=r")
>  -       (plus:DI
>  -        (mult:DI
>  -         (sign_extend:DI (match_operand:SI 2 "s_register_operand" "r"))
>  -         (sign_extend:DI (match_operand:SI 3 "s_register_operand" "r")))
>  -        (match_operand:DI 1 "s_register_operand" "0")))]
>  -  "TARGET_32BIT && arm_arch6"
>  -  "smlal%?\\t%Q0, %R0, %3, %2"
>  -  [(set_attr "type" "smlal")
>  -   (set_attr "predicable" "yes")]
>  -)
>  -
>   ;; 32x32->64 widening multiply.
>  -;; As with mulsi3, the only difference between the v3-5 and v6+
>  -;; versions of these patterns is the requirement that the output not
>  -;; overlap the inputs, but that still means we have to have a named
>  -;; expander and two different starred insns.
>  +;; The only difference between the v3-5 and v6+ versions is the requirement
>  +;; that the output does not overlap with either input.
>
>  -(define_expand "mulsidi3"
>  +(define_expand "<Us>mulsidi3"
>     [(set (match_operand:DI 0 "s_register_operand")
>           (mult:DI
>  -        (sign_extend:DI (match_operand:SI 1 "s_register_operand"))
>  -        (sign_extend:DI (match_operand:SI 2 "s_register_operand"))))]
>  +        (SE:DI (match_operand:SI 1 "s_register_operand"))
>  +        (SE:DI (match_operand:SI 2 "s_register_operand"))))]
>     "TARGET_32BIT"
>  -  ""
>  -)
>  -
>  -(define_insn "*mulsidi3_nov6"
>  -  [(set (match_operand:DI 0 "s_register_operand" "=&r")
>  -       (mult:DI
>  -        (sign_extend:DI (match_operand:SI 1 "s_register_operand" "%r"))
>  -        (sign_extend:DI (match_operand:SI 2 "s_register_operand" "r"))))]
>  -  "TARGET_32BIT && !arm_arch6"
>  -  "smull%?\\t%Q0, %R0, %1, %2"
>  -  [(set_attr "type" "smull")
>  -   (set_attr "predicable" "yes")]
>  -)
>  -
>  -(define_insn "*mulsidi3_v6"
>  -  [(set (match_operand:DI 0 "s_register_operand" "=r")
>  -       (mult:DI
>  -        (sign_extend:DI (match_operand:SI 1 "s_register_operand" "r"))
>  -        (sign_extend:DI (match_operand:SI 2 "s_register_operand" "r"))))]
>  -  "TARGET_32BIT && arm_arch6"
>  -  "smull%?\\t%Q0, %R0, %1, %2"
>  -  [(set_attr "type" "smull")
>  -   (set_attr "predicable" "yes")]
>  +  {
>  +      emit_insn (gen_<US>mull (gen_lowpart (SImode, operands[0]),
>  +                              gen_highpart (SImode, operands[0]),
>  +                              operands[1], operands[2]));
>  +      DONE;
>  +  }
>   )
>
>  -(define_expand "umulsidi3"
>  -  [(set (match_operand:DI 0 "s_register_operand")
>  -       (mult:DI
>  -        (zero_extend:DI (match_operand:SI 1 "s_register_operand"))
>  -        (zero_extend:DI (match_operand:SI 2 "s_register_operand"))))]
>  +(define_insn "<US>mull"
>  +  [(set (match_operand:SI 0 "s_register_operand" "=r,&r")
>  +       (mult:SI
>  +        (match_operand:SI 2 "s_register_operand" "%r,r")
>  +        (match_operand:SI 3 "s_register_operand" "r,r")))
>  +   (set (match_operand:SI 1 "s_register_operand" "=r,&r")
>  +       (truncate:SI
>  +        (lshiftrt:DI
>  +         (mult:DI (SE:DI (match_dup 2)) (SE:DI (match_dup 3)))
>  +         (const_int 32))))]
>     "TARGET_32BIT"
>  -  ""
>  -)
>  -
>  -(define_insn "*umulsidi3_nov6"
>  -  [(set (match_operand:DI 0 "s_register_operand" "=&r")
>  -       (mult:DI
>  -        (zero_extend:DI (match_operand:SI 1 "s_register_operand" "%r"))
>  -        (zero_extend:DI (match_operand:SI 2 "s_register_operand" "r"))))]
>  -  "TARGET_32BIT && !arm_arch6"
>  -  "umull%?\\t%Q0, %R0, %1, %2"
>  -  [(set_attr "type" "umull")
>  -   (set_attr "predicable" "yes")]
>  -)
>  -
>  -(define_insn "*umulsidi3_v6"
>  -  [(set (match_operand:DI 0 "s_register_operand" "=r")
>  -       (mult:DI
>  -        (zero_extend:DI (match_operand:SI 1 "s_register_operand" "r"))
>  -        (zero_extend:DI (match_operand:SI 2 "s_register_operand" "r"))))]
>  -  "TARGET_32BIT && arm_arch6"
>  -  "umull%?\\t%Q0, %R0, %1, %2"
>  +  "<US>mull%?\\t%0, %1, %2, %3"
>     [(set_attr "type" "umull")
>  -   (set_attr "predicable" "yes")]
>  +   (set_attr "predicable" "yes")
>  +   (set_attr "arch" "v6,nov6")]
>   )
>
>  -(define_expand "umaddsidi4"
>  +(define_expand "<Us>maddsidi4"
>     [(set (match_operand:DI 0 "s_register_operand")
>           (plus:DI
>            (mult:DI
>  -         (zero_extend:DI (match_operand:SI 1 "s_register_operand"))
>  -         (zero_extend:DI (match_operand:SI 2 "s_register_operand")))
>  +         (SE:DI (match_operand:SI 1 "s_register_operand"))
>  +         (SE:DI (match_operand:SI 2 "s_register_operand")))
>            (match_operand:DI 3 "s_register_operand")))]
>     "TARGET_32BIT"
>  -  "")
>  -
>  -(define_insn "*umulsidi3adddi"
>  -  [(set (match_operand:DI 0 "s_register_operand" "=&r")
>  -       (plus:DI
>  -        (mult:DI
>  -         (zero_extend:DI (match_operand:SI 2 "s_register_operand" "%r"))
>  -         (zero_extend:DI (match_operand:SI 3 "s_register_operand" "r")))
>  -        (match_operand:DI 1 "s_register_operand" "0")))]
>  -  "TARGET_32BIT && !arm_arch6"
>  -  "umlal%?\\t%Q0, %R0, %3, %2"
>  -  [(set_attr "type" "umlal")
>  -   (set_attr "predicable" "yes")]
>  +  {
>  +      emit_insn (gen_<US>mlal (gen_lowpart (SImode, operands[0]),
>  +                              gen_lowpart (SImode, operands[3]),
>  +                              gen_highpart (SImode, operands[0]),
>  +                              gen_highpart (SImode, operands[3]),
>  +                              operands[1], operands[2]));
>  +      DONE;
>  +  }
>   )
>
>  -(define_insn "*umulsidi3adddi_v6"
>  -  [(set (match_operand:DI 0 "s_register_operand" "=r")
>  -       (plus:DI
>  -        (mult:DI
>  -         (zero_extend:DI (match_operand:SI 2 "s_register_operand" "r"))
>  -         (zero_extend:DI (match_operand:SI 3 "s_register_operand" "r")))
>  -        (match_operand:DI 1 "s_register_operand" "0")))]
>  -  "TARGET_32BIT && arm_arch6"
>  -  "umlal%?\\t%Q0, %R0, %3, %2"
>  +(define_insn "<US>mlal"
>  +  [(set (match_operand:SI 0 "s_register_operand" "=r,&r")
>  +       (plus:SI
>  +        (mult:SI
>  +         (SE:DI (match_operand:SI 4 "s_register_operand" "%r,r"))
>  +         (SE:DI (match_operand:SI 5 "s_register_operand" "r,r")))
>  +        (match_operand:SI 1 "s_register_operand" "0,0")))
>  +   (set (match_operand:SI 2 "s_register_operand" "=r,&r")
>  +       (plus:SI
>  +        (truncate:SI
>  +         (lshiftrt:DI
>  +          (plus:DI
>  +           (mult:DI (SE:DI (match_dup 4)) (SE:DI (match_dup 5)))
>  +           (zero_extend:DI (match_dup 1)))
>  +          (const_int 32)))
>  +        (match_operand:SI 3 "s_register_operand" "2,2")))]
>  +  "TARGET_32BIT"
>  +  "<US>mlal%?\\t%0, %2, %4, %5"
>     [(set_attr "type" "umlal")
>  -   (set_attr "predicable" "yes")]
>  +   (set_attr "predicable" "yes")
>  +   (set_attr "arch" "v6,nov6")]
>   )
>
>   (define_expand "mulsi3_highpart"
>  diff --git a/gcc/config/arm/iterators.md b/gcc/config/arm/iterators.md
>  index fa6f0c0529d5364b1e1df705cb1029868578e38c..c29897a3b70d342b025c72b8c032bb3bb831040f 100644
>  --- a/gcc/config/arm/iterators.md
>  +++ b/gcc/config/arm/iterators.md
>  @@ -796,6 +796,7 @@ (define_code_attr optab [(ltu "ltu") (geu "geu")])
>
>   ;; Assembler mnemonics for signedness of widening operations.
>   (define_code_attr US [(sign_extend "s") (zero_extend "u")])
>  +(define_code_attr Us [(sign_extend "") (zero_extend "u")])
>
>   ;; Signedness suffix for float->fixed conversions. Empty for signed
>   ;; conversion.