public inbox for
 help / color / mirror / Atom feed
From: "Kewen.Lin" <>
To: Segher Boessenkool <>
Cc: GCC Patches <>,
	David Edelsohn <>,
	Peter Bergner <>,
	Michael Meissner <>
Subject: Re: [PATCH 1/2] rs6000: Emit vector fp comparison directly in rs6000_emit_vector_compare
Date: Thu, 17 Nov 2022 14:59:00 +0800	[thread overview]
Message-ID: <> (raw)
In-Reply-To: <>

Hi Segher,

Thanks for the comments!

on 2022/11/17 02:44, Segher Boessenkool wrote:
> Hi!
> On Wed, Nov 16, 2022 at 02:48:25PM +0800, Kewen.Lin wrote:
>> 	* config/rs6000/ (rs6000_emit_vector_compare_inner): Remove
>> 	float only comparison operators.
> Why?  Is that correct?  Your mail says nothing about this :-(
> Is there any testcase that covers this, and that shows things still
> generate the same code?

Sorry for the unclear description, I thought mistakenly that it's
probably straightforward.

With the change in this patch, all 14 vector float comparison operators
would be handled early in rs6000_emit_vector_compare.

For unordered/ordered/ltgt/uneq, the new way is exactly the same
as what we do in rs6000_emit_vector_compare_inner, it means there is
no chance to get into rs6000_emit_vector_compare_inner with any of them.
For eq/ge/gt, it's the same too, but they are shared with vector integer
comparison, I just left them alone here.  Just noticed we can remove ge
safely too as it's guarded with !MODE_VECTOR_INT.

For ne/ungt/unlt/unge/unle, rs6000_emit_vector_compare changes the code
with reverse_condition_maybe_unordered and invert the result, it's the
same as what we have in

; unge(a,b) = ~lt(a,b)
; unle(a,b) = ~gt(a,b)
; ne(a,b)   = ~eq(a,b)
; ungt(a,b) = ~le(a,b)
; unlt(a,b) = ~ge(a,b)

Then eq/ge/gt on the right side would match the cases that were mentioned
above.  So we just need to focus on lt and le then.

For lt, rs6000_emit_vector_compare swaps operands and the operator to gt,
it's the same as what we have in

; lt(a,b)   = gt(b,a)

, and further matches the case mentioned above.

As to le, rs6000_emit_vector_compare tries to split it into lt IOR eq,
and further handle lt recursively, that is:
   le = lt(a,b) || eq(a,b)
      = gt(b,a) || eq(a,b)

actually this is worse than what supports:

; le(a,b)   = ge(b,a)

In short, the function rs6000_emit_vector_compare_inner is only called by
twice in rs6000_emit_vector_compare, there is no chance to enter
rs6000_emit_vector_compare_inner with codes unordered/ordered/ltgt/uneq
any more, I think it's safe to make the change in function
rs6000_emit_vector_compare_inner.  Besides, the proposed way to handle
vector float comparison can improve slightly for UNGT and LE handlings.

I constructed a test case, compiled with option -O2 -ftree-vectorize
-fno-vect-cost-model on ppc64le, which goes into this function
rs6000_emit_vector_compare with all 14 vector float comparison codes,
the assembly of most functions doesn't change after this patch,
excepting for test_UNGT_{float,double} and test_LE_{float,double}.

one example from 

          lxvx 12,3,9
          lxvx 11,4,9
          xvcmpgtsp 0,11,12
          xvcmpeqsp 12,12,11
          xxlor 0,0,12
          xxlandc 0,32,0
          stxvx 0,5,9
          addi 9,9,16
          bdnz .L77


after: (good to be unrolled)

          lxvx 0,4,10
          lxvx 12,3,10
          addi 9,10,16
          lxvx 11,3,9
          xvcmpgesp 12,0,12
          lxvx 0,4,9
          xvcmpgesp 0,0,11
          xxlandc 12,32,12
          stxvx 12,5,10
          addi 10,10,32
          xxlandc 0,32,0
          stxvx 0,5,9
          bdnz .L77

$ cat test.h

#define UNORD(a, b) (__builtin_isunordered ((a), (b)))
#define ORD(a, b) (!__builtin_isunordered ((a), (b)))
#define LTGT(a, b) (__builtin_islessgreater ((a), (b)))
#define UNEQ(a, b) (!__builtin_islessgreater ((a), (b)))
#define UNGT(a, b) (!__builtin_islessequal ((a), (b)))
#define UNGE(a, b) (!__builtin_isless ((a), (b)))
#define UNLT(a, b) (!__builtin_isgreaterequal ((a), (b)))
#define UNLE(a, b) (!__builtin_isgreater ((a), (b)))
#define GT(a, b) (((a) > (b)))
#define GE(a, b) (((a) >= (b)))
#define LT(a, b) (((a) < (b)))
#define LE(a, b) (((a) <= (b)))
#define EQ(a, b) (((a) == (b)))
#define NE(a, b) (((a) != (b)))

#define TEST_VECT(NAME, TYPE)                                                  \
  __attribute__ ((noipa)) void test_##NAME##_##TYPE (TYPE *x, TYPE *y,         \
                                                     int *res, int n)          \
  {                                                                            \
    for (int i = 0; i < n; i++)                                                \
      res[i] = NAME (x[i], y[i]);                                              \

$ cat test.c

#include "test.h"

#define TEST(TYPE)                                                             \
  TEST_VECT (UNORD, TYPE)                                                      \
  TEST_VECT (ORD, TYPE)                                                        \
  TEST_VECT (LTGT, TYPE)                                                       \
  TEST_VECT (UNEQ, TYPE)                                                       \
  TEST_VECT (UNGT, TYPE)                                                       \
  TEST_VECT (UNGE, TYPE)                                                       \
  TEST_VECT (UNLT, TYPE)                                                       \
  TEST_VECT (UNLE, TYPE)                                                       \
  TEST_VECT (GT, TYPE)                                                         \
  TEST_VECT (GE, TYPE)                                                         \
  TEST_VECT (LT, TYPE)                                                         \
  TEST_VECT (LE, TYPE)                                                         \
  TEST_VECT (EQ, TYPE)                                                         \

TEST (float)
TEST (double)

Maybe it's good to add one test case with function test_{UNGT,LE}_{float,double}
and scan not xvcmp{gt,eq}[sd]p.

With the above explanation, does this patch look good to you?


  reply	other threads:[~2022-11-17  6:59 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-11-16  6:48 Kewen.Lin
2022-11-16  6:51 ` [PATCH 2/2] rs6000: Refine integer comparison handlings " Kewen.Lin
2022-11-16 18:58   ` Segher Boessenkool
2022-11-17  7:52     ` Kewen.Lin
2022-11-18 15:18       ` Segher Boessenkool
2022-11-16 18:44 ` [PATCH 1/2] rs6000: Emit vector fp comparison directly " Segher Boessenkool
2022-11-17  6:59   ` Kewen.Lin [this message]
2022-11-18 15:10     ` Segher Boessenkool
2022-11-21  2:01       ` Kewen.Lin
2022-11-27 18:16         ` Segher Boessenkool

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \ \ \ \ \ \ \ \

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).