From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-qt1-x82c.google.com (mail-qt1-x82c.google.com [IPv6:2607:f8b0:4864:20::82c]) by sourceware.org (Postfix) with ESMTPS id 53EDD385E44C for ; Sat, 15 Jan 2022 08:29:17 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 53EDD385E44C Received: by mail-qt1-x82c.google.com with SMTP id q14so13200450qtx.10 for ; Sat, 15 Jan 2022 00:29:17 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=vOytAea9oiw4uDxVXHDCOCxb5MtNu5QW9go8y0mJcOw=; b=RL13HGdejjV9FXg676yIuFHf0oTktUOmXudzj36V3qCp+pI+wDRSGkUlZ3mm5dySOT vHUa9rqtQbmHhVhTX5bEB7EOEaf3ce6v6RWwuH1sPWUGn9GDcwcaSqC8M3WdkT0mBGMm zwQHSroz2Cj6s5ahkX9jvJa7RFG3vCUjbNvBQXlmhkvqRkQ7i9vwyWfqqoKyvTS38hFI zif68iy5XmPmL7IHjkJ6OU8A2a/gWJDnV6d8KCXmqS1m/qYB8APtg1s/AMO+LoQUJquA hSafQw6bkmFU/9rjDT0lm1GOjJaLPNVrtI+pQ7aplepb1Cf6j9ueLjuLhZtbIFNh7Bvc 36Tw== X-Gm-Message-State: AOAM532ExYMgaLo1TrtmYJX/SsAqy23W5K8USEj7/FCbKqALKH3tEBQ8 rQ/DmcF4MzTWiRmuuPeyrnU4k8e5NUpnwgKxIbXkMXvwx4BFRw== X-Google-Smtp-Source: ABdhPJyXXpnbW+jw91rNNmmu/IvQm0RHOcJ9fLj/NqxQvhmL3BMEg4FcqltrZ/8rxUwE/nw+0d8/AroxQWyS5mek2k4= X-Received: by 2002:ac8:5e4a:: with SMTP id i10mr10485502qtx.569.1642235356779; Sat, 15 Jan 2022 00:29:16 -0800 (PST) MIME-Version: 1.0 References: <20220114225607.GP2646553@tucnak> In-Reply-To: <20220114225607.GP2646553@tucnak> From: Uros Bizjak Date: Sat, 15 Jan 2022 09:29:05 +0100 Message-ID: Subject: Re: [PATCH] widening_mul, i386: Improve spaceship expansion on x86 [PR103973] To: Jakub Jelinek Cc: Richard Biener , "gcc-patches@gcc.gnu.org" Content-Type: text/plain; charset="UTF-8" X-Spam-Status: No, score=-3.3 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, FREEMAIL_FROM, KAM_SHORT, RCVD_IN_DNSWL_NONE, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.4 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 15 Jan 2022 08:29:19 -0000 On Fri, Jan 14, 2022 at 11:56 PM Jakub Jelinek wrote: > > Hi! > > C++20: > #include > auto cmp4way(double a, double b) > { > return a <=> b; > } > expands to: > ucomisd %xmm1, %xmm0 > jp .L8 > movl $0, %eax > jne .L8 > .L2: > ret > .p2align 4,,10 > .p2align 3 > .L8: > comisd %xmm0, %xmm1 > movl $-1, %eax > ja .L2 > ucomisd %xmm1, %xmm0 > setbe %al > addl $1, %eax > ret > That is 3 comparisons of the same operands. > The following patch improves it to just one comparison: > comisd %xmm1, %xmm0 > jp .L4 > seta %al > movl $0, %edx > leal -1(%rax,%rax), %eax > cmove %edx, %eax > ret > .L4: > movl $2, %eax > ret > While a <=> b expands to a == b ? 0 : a < b ? -1 : a > b ? 1 : 2 > where the first comparison is equality and this shouldn't raise > exceptions on qNaN operands, if the operands aren't equal (which > includes unordered cases), then it immediately performs < or > > comparison and that raises exceptions even on qNaNs, so we can just > perform a single comparison that raises exceptions on qNaN. > As the 4 different cases are encoded as > ZF CF PF > 1 1 1 a unordered b > 0 0 0 a > b > 0 1 0 a < b > 1 0 0 a == b > we can emit optimal sequence of comparions, first jp > for the unordered case, then je for the == case and finally jb > for the < case. > > The patch pattern recognizes spaceship-like comparisons during > widening_mul if the spaceship optab is implemented, and replaces > thoose comparisons with comparisons of .SPACESHIP ifn which returns > -1/0/1/2 based on the comparison. This seems to work well both for the > case of just returning the -1/0/1/2 (when we have just a common > successor with a PHI) or when the different cases are handled with > various other basic blocks. The testcases cover both of those cases, > the latter with different function calls in those. > > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? > > 2022-01-14 Jakub Jelinek > > PR target/103973 > * tree-cfg.h (cond_only_block_p): Declare. > * tree-ssa-phiopt.c (cond_only_block_p): Move function to ... > * tree-cfg.c (cond_only_block_p): ... here. No longer static. > * optabs.def (spaceship_optab): New optab. > * internal-fn.def (SPACESHIP): New internal function. > * internal-fn.h (expand_SPACESHIP): Declare. > * internal-fn.c (expand_PHI): Formatting fix. > (expand_SPACESHIP): New function. > * tree-ssa-math-opts.c (optimize_spaceship): New function. > (math_opts_dom_walker::after_dom_children): Use it. > * config/i386/i386.md (spaceship3): New define_expand. > * config/i386/i386-protos.h (ix86_expand_fp_spaceship): Declare. > * config/i386/i386-expand.c (ix86_expand_fp_spaceship): New function. > * doc/md.texi (spaceship@var{m}3): Document. > > * gcc.target/i386/pr103973-1.c: New test. > * gcc.target/i386/pr103973-2.c: New test. > * gcc.target/i386/pr103973-3.c: New test. > * gcc.target/i386/pr103973-4.c: New test. > * g++.target/i386/pr103973-1.C: New test. > * g++.target/i386/pr103973-2.C: New test. > * g++.target/i386/pr103973-3.C: New test. > * g++.target/i386/pr103973-4.C: New test. >--- gcc/config/i386/i386.md.jj 2022-01-14 11:51:34.432384170 +0100 > +++ gcc/config/i386/i386.md 2022-01-14 18:22:41.140906449 +0100 > @@ -23886,6 +23886,18 @@ (define_insn "hreset" > [(set_attr "type" "other") > (set_attr "length" "4")]) > > +;; Spaceship optimization > +(define_expand "spaceship3" > + [(match_operand:SI 0 "register_operand") > + (match_operand:MODEF 1 "cmp_fp_expander_operand") > + (match_operand:MODEF 2 "cmp_fp_expander_operand")] > + "(TARGET_80387 || (SSE_FLOAT_MODE_P (mode) && TARGET_SSE_MATH)) > + && TARGET_CMOVE && TARGET_IEEE_FP" Is there a reason that this pattern is limited to TARGET_IEEE_FP? During the expansion in ix86_expand_fp_spaceship, we can just skip jump on unordered, while ix86_expand_fp_compare will emit the correct comparison mode depending on TARGET_IEEE_FP. > +{ > + ix86_expand_fp_spaceship (operands[0], operands[1], operands[2]); > + DONE; > +}) > + > (include "mmx.md") > (include "sse.md") > (include "sync.md") > --- gcc/config/i386/i386-expand.c.jj 2022-01-14 11:51:34.429384213 +0100 > +++ gcc/config/i386/i386-expand.c 2022-01-14 21:02:09.446437810 +0100 > @@ -2879,6 +2879,46 @@ ix86_expand_setcc (rtx dest, enum rtx_co > emit_insn (gen_rtx_SET (dest, ret)); > } > > +/* Expand floating point op0 <=> op1 if NaNs are honored. */ > + > +void > +ix86_expand_fp_spaceship (rtx dest, rtx op0, rtx op1) > +{ > + gcc_checking_assert (ix86_fp_comparison_strategy (GT) == IX86_FPCMP_COMI); > + rtx gt = ix86_expand_fp_compare (GT, op0, op1); > + rtx l0 = gen_label_rtx (); > + rtx l1 = gen_label_rtx (); > + rtx l2 = gen_label_rtx (); > + rtx lend = gen_label_rtx (); > + rtx un = gen_rtx_fmt_ee (UNORDERED, VOIDmode, > + gen_rtx_REG (CCFPmode, FLAGS_REG), const0_rtx); > + rtx tmp = gen_rtx_IF_THEN_ELSE (VOIDmode, un, > + gen_rtx_LABEL_REF (VOIDmode, l2), pc_rtx); > + rtx_insn *jmp = emit_jump_insn (gen_rtx_SET (pc_rtx, tmp)); > + add_reg_br_prob_note (jmp, profile_probability:: very_unlikely ()); Please also add JUMP_LABEL to the insn. > + rtx eq = gen_rtx_fmt_ee (UNEQ, VOIDmode, > + gen_rtx_REG (CCFPmode, FLAGS_REG), const0_rtx); > + tmp = gen_rtx_IF_THEN_ELSE (VOIDmode, eq, > + gen_rtx_LABEL_REF (VOIDmode, l0), pc_rtx); > + jmp = emit_jump_insn (gen_rtx_SET (pc_rtx, tmp)); > + add_reg_br_prob_note (jmp, profile_probability::unlikely ()); > + tmp = gen_rtx_IF_THEN_ELSE (VOIDmode, gt, > + gen_rtx_LABEL_REF (VOIDmode, l1), pc_rtx); > + jmp = emit_jump_insn (gen_rtx_SET (pc_rtx, tmp)); > + add_reg_br_prob_note (jmp, profile_probability::even ()); > + emit_move_insn (dest, constm1_rtx); > + emit_jump (lend); > + emit_label (l0); and LABEL_NUSES label. > + emit_move_insn (dest, const0_rtx); > + emit_jump (lend); > + emit_label (l1); > + emit_move_insn (dest, const1_rtx); > + emit_jump (lend); > + emit_label (l2); > + emit_move_insn (dest, const2_rtx); > + emit_label (lend); > +} > + > /* Expand comparison setting or clearing carry flag. Return true when > successful and set pop for the operation. */ > static bool Uros.