[Bug c++/106910] New: roundss not vectorized

public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed

* [Bug c++/106910] New: roundss not vectorized
@ 2022-09-12 16:15 pilarlatiesa at gmail dot com
  2022-09-12 16:35 ` [Bug target/106910] " pinskia at gcc dot gnu.org
                   ` (12 more replies)
  0 siblings, 13 replies; 14+ messages in thread
From: pilarlatiesa at gmail dot com @ 2022-09-12 16:15 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106910

            Bug ID: 106910
           Summary: roundss not vectorized
           Product: gcc
           Version: 13.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c++
          Assignee: unassigned at gcc dot gnu.org
          Reporter: pilarlatiesa at gmail dot com
  Target Milestone: ---

GCC 7 and newer optimize `std::floor(float)` into `vroundss` with -O2 and
-march=skylake, which is great.

However, I can see in Compiler Explorer that the following example:

```
#include <cmath>

struct TVec { float x, y; };

struct TKey { int i, j; };

class TDom
{
private:
  static int Floor(float const x)
    { return static_cast<int>(std::floor(x)); }

public:
  TKey CalcKey(TVec const &) const;
};

TKey TDom::CalcKey(TVec const &r) const
  { return {Floor(r.x), Floor(r.y)}; }
```

produces:

```

vxorps      %xmm1, %xmm1, %xmm1
vroundss    $9, (%rsi), %xmm1, %xmm0
vroundss    $9, 4(%rsi), %xmm1, %xmm1
vunpcklps   %xmm1, %xmm0, %xmm0
vcvttps2dq  %xmm0, %xmm2
vmovq       %xmm2, %rax
ret
```

Couldn’t the pair of `vroundss` have been merged into a single `vroundps`?

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [Bug target/106910] roundss not vectorized
  2022-09-12 16:15 [Bug c++/106910] New: roundss not vectorized pilarlatiesa at gmail dot com
@ 2022-09-12 16:35 ` pinskia at gcc dot gnu.org
  2022-09-13  6:31 ` rguenth at gcc dot gnu.org
                   ` (11 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: pinskia at gcc dot gnu.org @ 2022-09-12 16:35 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106910

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Severity|normal                      |enhancement
     Ever confirmed|0                           |1
             Status|UNCONFIRMED                 |NEW
           Keywords|                            |missed-optimization
   Last reconfirmed|                            |2022-09-12

--- Comment #1 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
On x86_64:
/app/example.cpp:19:35: missed:   function is not vectorizable.
/opt/compiler-explorer/gcc-trunk-20220912/include/c++/13.0.0/cmath:261:28:
missed:   not vectorized: relevant stmt not supported: _9 = __builtin_floorf
(_1);

While on aarch64, GCC can handle the SLP just fine:

  vect__1.7_12 = MEM <const vector(2) float> [(float *)r_4(D)];
  vect__9.8_13 = .FLOOR (vect__1.7_12);
  vect__10.9_14 = (vector(2) int) vect__9.8_13;
  MEM <vector(2) int> [(int *)&D.25164] = vect__10.9_14;

So this is a target backend improvement that is needed.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [Bug target/106910] roundss not vectorized
  2022-09-12 16:15 [Bug c++/106910] New: roundss not vectorized pilarlatiesa at gmail dot com
  2022-09-12 16:35 ` [Bug target/106910] " pinskia at gcc dot gnu.org
@ 2022-09-13  6:31 ` rguenth at gcc dot gnu.org
  2022-09-13  7:30 ` crazylht at gmail dot com
                   ` (10 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: rguenth at gcc dot gnu.org @ 2022-09-13  6:31 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106910

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Blocks|                            |53947
                 CC|                            |crazylht at gmail dot com
             Target|x86_64                      |x86_64-*-*

--- Comment #2 from Richard Biener <rguenth at gcc dot gnu.org> ---
Probably missing patterns for V2SFmode here.  Hmm, we don't seem to have any
vector mode patterns here but possibly rely on ix86_builtin_vectorized_function
which indeed doesn't have any V2SFmode support.

The vectorizer would go the direct internal fn way for those, querying the
floor optab but the x86 backend only has scalar modes supported for the
rounding optabs.

The backend should modernize itself, get rid of the
ix86_builtin_vectorized_function parts for those functions and instead rely
on define_expands with vector modes.


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53947
[Bug 53947] [meta-bug] vectorizer missed-optimizations

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [Bug target/106910] roundss not vectorized
  2022-09-12 16:15 [Bug c++/106910] New: roundss not vectorized pilarlatiesa at gmail dot com
  2022-09-12 16:35 ` [Bug target/106910] " pinskia at gcc dot gnu.org
  2022-09-13  6:31 ` rguenth at gcc dot gnu.org
@ 2022-09-13  7:30 ` crazylht at gmail dot com
  2022-09-15  9:35 ` crazylht at gmail dot com
                   ` (9 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: crazylht at gmail dot com @ 2022-09-13  7:30 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106910

--- Comment #3 from Hongtao.liu <crazylht at gmail dot com> ---

> The backend should modernize itself, get rid of the
> ix86_builtin_vectorized_function parts for those functions and instead rely
> on define_expands with vector modes.

Indeed, let me do it.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [Bug target/106910] roundss not vectorized
  2022-09-12 16:15 [Bug c++/106910] New: roundss not vectorized pilarlatiesa at gmail dot com
                   ` (2 preceding siblings ...)
  2022-09-13  7:30 ` crazylht at gmail dot com
@ 2022-09-15  9:35 ` crazylht at gmail dot com
  2022-09-15 11:40 ` rguenther at suse dot de
                   ` (8 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: crazylht at gmail dot com @ 2022-09-15  9:35 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106910

--- Comment #4 from Hongtao.liu <crazylht at gmail dot com> ---

> The vectorizer would go the direct internal fn way for those, querying the
> floor optab but the x86 backend only has scalar modes supported for the
> rounding optabs.
For CFN_BUILT_IN_ICEIL, the modifier is narrow, and currently vectorizable_call
require op_in and op_out to be simple_integer_narrowing, which means it fails
to go the direct internal fn way.

---------cut from vectorizable_call-----------
  tree_code convert_code = ERROR_MARK;
  if (cfn != CFN_LAST
      && (modifier == NONE
          || (modifier == NARROW
              && simple_integer_narrowing (vectype_out, vectype_in,
                                           &convert_code))))
    ifn = vectorizable_internal_function (cfn, callee, vectype_out,
                                          vectype_in);
-----------cut end----------------------

Similar for CFN_BUILT_IN_LCEIL under 32-bit target.
For 64-bit target CFN_BUILT_IN_LCEIL, CFN_BUILT_IN_LLCEIL will go the direct
internal fn way, add lceilmn2 expanders works.

Not sure whether vectorizable_call should be extended or just leave
CFN_BUILT_IN_ICEIL/IFLOOR/IRINT/IROUND part in
ix86_builtin_vectorized_function.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [Bug target/106910] roundss not vectorized
  2022-09-12 16:15 [Bug c++/106910] New: roundss not vectorized pilarlatiesa at gmail dot com
                   ` (3 preceding siblings ...)
  2022-09-15  9:35 ` crazylht at gmail dot com
@ 2022-09-15 11:40 ` rguenther at suse dot de
  2022-09-16  7:45 ` cvs-commit at gcc dot gnu.org
                   ` (7 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: rguenther at suse dot de @ 2022-09-15 11:40 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106910

--- Comment #5 from rguenther at suse dot de <rguenther at suse dot de> ---
On Thu, 15 Sep 2022, crazylht at gmail dot com wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106910
> 
> --- Comment #4 from Hongtao.liu <crazylht at gmail dot com> ---
> 
> > The vectorizer would go the direct internal fn way for those, querying the
> > floor optab but the x86 backend only has scalar modes supported for the
> > rounding optabs.
> For CFN_BUILT_IN_ICEIL, the modifier is narrow, and currently vectorizable_call
> require op_in and op_out to be simple_integer_narrowing, which means it fails
> to go the direct internal fn way.
> 
> ---------cut from vectorizable_call-----------
>   tree_code convert_code = ERROR_MARK;
>   if (cfn != CFN_LAST
>       && (modifier == NONE
>           || (modifier == NARROW
>               && simple_integer_narrowing (vectype_out, vectype_in,
>                                            &convert_code))))
>     ifn = vectorizable_internal_function (cfn, callee, vectype_out,
>                                           vectype_in);
> -----------cut end----------------------
> 
> Similar for CFN_BUILT_IN_LCEIL under 32-bit target.
> For 64-bit target CFN_BUILT_IN_LCEIL, CFN_BUILT_IN_LLCEIL will go the direct
> internal fn way, add lceilmn2 expanders works.
> 
> Not sure whether vectorizable_call should be extended or just leave
> CFN_BUILT_IN_ICEIL/IFLOOR/IRINT/IROUND part in
> ix86_builtin_vectorized_function.

I think this is a known issue (we may even have a bugreport) so I'd
leave handling of those cases in ix86_builtin_vectorized_function.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [Bug target/106910] roundss not vectorized
  2022-09-12 16:15 [Bug c++/106910] New: roundss not vectorized pilarlatiesa at gmail dot com
                   ` (4 preceding siblings ...)
  2022-09-15 11:40 ` rguenther at suse dot de
@ 2022-09-16  7:45 ` cvs-commit at gcc dot gnu.org
  2022-09-20  6:54 ` cvs-commit at gcc dot gnu.org
                   ` (6 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2022-09-16  7:45 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106910

--- Comment #6 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by hongtao Liu <liuhongt@gcc.gnu.org>:

https://gcc.gnu.org/g:3e8c4b925a9825fdb8c81f47b621f63108894362

commit r13-2694-g3e8c4b925a9825fdb8c81f47b621f63108894362
Author: liuhongt <hongtao.liu@intel.com>
Date:   Thu Sep 15 18:43:16 2022 +0800

    Modernize ix86_builtin_vectorized_function with corresponding expanders.

    For ifloor/lfloor/iceil/lceil/irint/lrint/iround/lround when size of
    in_mode is not equal out_mode, vectorizer doesn't go to internal fn
    way,still left that part in the ix86_builtin_vectorized_function.

    Remove others builtins and add corresponding expanders.

    gcc/ChangeLog:

            PR target/106910
            * config/i386/i386-builtins.cc
            (ix86_builtin_vectorized_function): Modernized with
            corresponding expanders.
            * config/i386/sse.md (lrint<mode><sseintvecmodelower>2): New
            expander.
            (floor<mode>2): Ditto.
            (lfloor<mode><sseintvecmodelower>2): Ditto.
            (ceil<mode>2): Ditto.
            (lceil<mode><sseintvecmodelower>2): Ditto.
            (btrunc<mode>2): Ditto.
            (lround<mode><sseintvecmodelower>2): Ditto.
            (exp2<mode>2): Ditto.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [Bug target/106910] roundss not vectorized
  2022-09-12 16:15 [Bug c++/106910] New: roundss not vectorized pilarlatiesa at gmail dot com
                   ` (5 preceding siblings ...)
  2022-09-16  7:45 ` cvs-commit at gcc dot gnu.org
@ 2022-09-20  6:54 ` cvs-commit at gcc dot gnu.org
  2022-09-20  7:00 ` crazylht at gmail dot com
                   ` (5 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2022-09-20  6:54 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106910

--- Comment #7 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by hongtao Liu <liuhongt@gcc.gnu.org>:

https://gcc.gnu.org/g:d0c73b6c85677e6755b60fa02d79a5c5e1a8eacd

commit r13-2730-gd0c73b6c85677e6755b60fa02d79a5c5e1a8eacd
Author: liuhongt <hongtao.liu@intel.com>
Date:   Fri Sep 16 14:28:34 2022 +0800

    Support 64-bit vectorization for single-precision floating rounding
operation.

    Here's list the patch supported.
    rint/nearbyint/ceil/floor/trunc/lrint/lceil/lfloor/round/lround.

    gcc/ChangeLog:

            PR target/106910
            * config/i386/mmx.md (nearbyintv2sf2): New expander.
            (rintv2sf2): Ditto.
            (ceilv2sf2): Ditto.
            (lceilv2sfv2si2): Ditto.
            (floorv2sf2): Ditto.
            (lfloorv2sfv2si2): Ditto.
            (btruncv2sf2): Ditto.
            (lrintv2sfv2si2): Ditto.
            (roundv2sf2): Ditto.
            (lroundv2sfv2si2): Ditto.
            (*mmx_roundv2sf2): New define_insn.

    gcc/testsuite/ChangeLog:

            * gcc.target/i386/pr106910-1.c: New test.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [Bug target/106910] roundss not vectorized
  2022-09-12 16:15 [Bug c++/106910] New: roundss not vectorized pilarlatiesa at gmail dot com
                   ` (6 preceding siblings ...)
  2022-09-20  6:54 ` cvs-commit at gcc dot gnu.org
@ 2022-09-20  7:00 ` crazylht at gmail dot com
  2022-09-21 10:44 ` pilarlatiesa at gmail dot com
                   ` (4 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: crazylht at gmail dot com @ 2022-09-20  7:00 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106910

--- Comment #8 from Hongtao.liu <crazylht at gmail dot com> ---
Fixed in GCC13.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [Bug target/106910] roundss not vectorized
  2022-09-12 16:15 [Bug c++/106910] New: roundss not vectorized pilarlatiesa at gmail dot com
                   ` (7 preceding siblings ...)
  2022-09-20  7:00 ` crazylht at gmail dot com
@ 2022-09-21 10:44 ` pilarlatiesa at gmail dot com
  2022-09-22  1:09 ` crazylht at gmail dot com
                   ` (3 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: pilarlatiesa at gmail dot com @ 2022-09-21 10:44 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106910

--- Comment #9 from Pilar Latiesa <pilarlatiesa at gmail dot com> ---
Thans so much for the fix.

I have tested last night (20220920) snapshot and I can confirm the testcase is
vectorized.

However, for the vectorization to kick in I had to use -fno-trapping-math. I
assume that's intentional. Could you please explain why it is necessary?

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [Bug target/106910] roundss not vectorized
  2022-09-12 16:15 [Bug c++/106910] New: roundss not vectorized pilarlatiesa at gmail dot com
                   ` (8 preceding siblings ...)
  2022-09-21 10:44 ` pilarlatiesa at gmail dot com
@ 2022-09-22  1:09 ` crazylht at gmail dot com
  2022-09-22  1:27 ` crazylht at gmail dot com
                   ` (2 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: crazylht at gmail dot com @ 2022-09-22  1:09 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106910

--- Comment #10 from Hongtao.liu <crazylht at gmail dot com> ---
Because the round insn does not trap on denormals.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [Bug target/106910] roundss not vectorized
  2022-09-12 16:15 [Bug c++/106910] New: roundss not vectorized pilarlatiesa at gmail dot com
                   ` (9 preceding siblings ...)
  2022-09-22  1:09 ` crazylht at gmail dot com
@ 2022-09-22  1:27 ` crazylht at gmail dot com
  2022-09-22  7:45 ` pilarlatiesa at gmail dot com
  2022-09-22  7:52 ` crazylht at gmail dot com
  12 siblings, 0 replies; 14+ messages in thread
From: crazylht at gmail dot com @ 2022-09-22  1:27 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106910

--- Comment #11 from Hongtao.liu <crazylht at gmail dot com> ---
(In reply to Hongtao.liu from comment #10)
> Because the round insn does not trap on denormals.

Note for scalar version, gcc generates roundss $9, %xmm0, %xmm0 which doesn't
raise exception because it's allowed by the below option.
-----
ffp-int-builtin-inexact
Common Var(flag_fp_int_builtin_inexact) Init(1) Optimization
Allow built-in functions ceil, floor, round, trunc to raise \"inexact\"
exceptions.
-----

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [Bug target/106910] roundss not vectorized
  2022-09-12 16:15 [Bug c++/106910] New: roundss not vectorized pilarlatiesa at gmail dot com
                   ` (10 preceding siblings ...)
  2022-09-22  1:27 ` crazylht at gmail dot com
@ 2022-09-22  7:45 ` pilarlatiesa at gmail dot com
  2022-09-22  7:52 ` crazylht at gmail dot com
  12 siblings, 0 replies; 14+ messages in thread
From: pilarlatiesa at gmail dot com @ 2022-09-22  7:45 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106910

--- Comment #12 from Pilar Latiesa <pilarlatiesa at gmail dot com> ---
Wouldn't it make sense for scalar and vector versions to be affected by the
same options?

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [Bug target/106910] roundss not vectorized
  2022-09-12 16:15 [Bug c++/106910] New: roundss not vectorized pilarlatiesa at gmail dot com
                   ` (11 preceding siblings ...)
  2022-09-22  7:45 ` pilarlatiesa at gmail dot com
@ 2022-09-22  7:52 ` crazylht at gmail dot com
  12 siblings, 0 replies; 14+ messages in thread
From: crazylht at gmail dot com @ 2022-09-22  7:52 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106910

--- Comment #13 from Hongtao.liu <crazylht at gmail dot com> ---
(In reply to Pilar Latiesa from comment #12)
> Wouldn't it make sense for scalar and vector versions to be affected by the
> same options?

I guess the reason is we don't have __builtin_floorf for vector version, and
that option is only apply to direct call for builtin function.

I'm not sure should we apply that to internal fn either(to make vector version
behave the same as scalar version).

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2022-09-22  7:52 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-09-12 16:15 [Bug c++/106910] New: roundss not vectorized pilarlatiesa at gmail dot com
2022-09-12 16:35 ` [Bug target/106910] " pinskia at gcc dot gnu.org
2022-09-13  6:31 ` rguenth at gcc dot gnu.org
2022-09-13  7:30 ` crazylht at gmail dot com
2022-09-15  9:35 ` crazylht at gmail dot com
2022-09-15 11:40 ` rguenther at suse dot de
2022-09-16  7:45 ` cvs-commit at gcc dot gnu.org
2022-09-20  6:54 ` cvs-commit at gcc dot gnu.org
2022-09-20  7:00 ` crazylht at gmail dot com
2022-09-21 10:44 ` pilarlatiesa at gmail dot com
2022-09-22  1:09 ` crazylht at gmail dot com
2022-09-22  1:27 ` crazylht at gmail dot com
2022-09-22  7:45 ` pilarlatiesa at gmail dot com
2022-09-22  7:52 ` crazylht at gmail dot com

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).