public inbox for gcc-bugs@sourceware.org
* [Bug tree-optimization/92645] Hand written vector code is 450 times slower when compiled with GCC compared to Clang
       [not found] <bug-92645-4@http.gcc.gnu.org/bugzilla/>
@ 2020-04-23 13:26 ` rguenth at gcc dot gnu.org
  2021-01-13 10:45 ` rguenth at gcc dot gnu.org
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 6+ messages in thread
From: rguenth at gcc dot gnu.org @ 2020-04-23 13:26 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92645

--- Comment #23 from Richard Biener <rguenth at gcc dot gnu.org> ---
So the issue is that we are doing both too little and too much: the half-way
early optimizations confuse us later.  Another such opportunity might be:

  short unsigned int _950;
  _950 = BIT_FIELD_REF <_58, 16, 240>;
  _253 = (unsigned char) _950;

where this is the only use of _950.  It might be tempting to "optimize"
this into

  _253 = BIT_FIELD_REF <_58, 8, 240>;

forwprop does similar transforms for loads of complex and vector types (the
above is not a load, but the transform would extend to loads as well).
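
As a rough illustration of why the two forms above yield the same value, here
is a minimal C sketch.  It assumes a little-endian target, where bit offset 240
corresponds to byte 30 of the 256-bit value.  The type vec256 and the functions
make_iota, extract_then_truncate and extract_narrow are hypothetical names made
up for this example; they are not GCC internals.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the 256-bit value _58: 32 raw bytes. */
typedef struct { unsigned char bytes[32]; } vec256;
static_assert(sizeof(vec256) == 32, "vec256 must be exactly 256 bits");

/* Helper for experiments (not from the thread): byte i holds the value i. */
vec256 make_iota(void)
{
    vec256 v;
    for (int i = 0; i < 32; i++)
        v.bytes[i] = (unsigned char) i;
    return v;
}

/* Two-step form: 16-bit extract at bit offset 240, then truncation to
   8 bits, mirroring _950 and _253 in the dump. */
unsigned char extract_then_truncate(vec256 v)
{
    uint16_t wide;
    memcpy(&wide, v.bytes + 240 / 8, sizeof wide);
    return (unsigned char) wide;
}

/* One-step form: 8-bit extract at the same bit offset.  This matches the
   two-step form only under the little-endian assumption stated above. */
unsigned char extract_narrow(vec256 v)
{
    return v.bytes[240 / 8];
}
```

On a little-endian host both functions return byte 30 of the input, which is
why the narrower extract looks like a harmless simplification.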

* [Bug tree-optimization/92645] Hand written vector code is 450 times slower when compiled with GCC compared to Clang
From: rguenth at gcc dot gnu.org @ 2021-01-13 10:45 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92645

--- Comment #24 from Richard Biener <rguenth at gcc dot gnu.org> ---
Created attachment 49958
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=49958&action=edit
unincluded GCC source

The GCC source no longer compiles due to missing changes in the x86 intrinsic
includes in the preprocessed source:

...
/aux/hubicka/trunk-install2/lib/gcc/x86_64-pc-linux-gnu/10.0.0/include/avx512vlbwintrin.h: In function 'void _mm_mask_cvtsepi16_storeu_epi8(void*, __mmask8, __m128i)':
/aux/hubicka/trunk-install2/lib/gcc/x86_64-pc-linux-gnu/10.0.0/include/avx512vlbwintrin.h:258:38: error: cannot convert '__v8qi*' to 'long long unsigned int*'
<built-in>: note:   initializing argument 1 of 'void __builtin_ia32_pmovswb128mem_mask(long long unsigned int*, __vector(8) short int, unsigned char)'
In file included from /aux/hubicka/trunk-install2/lib/gcc/x86_64-pc-linux-gnu/10.0.0/include/immintrin.h:69,
                 from /aux/hubicka/firefox-2019-2/gfx/skia/skia/src/opts/SkOpts_
...

I attached unincluded source that can be compiled with trunk and GCC 10
when using -march=haswell.

* [Bug tree-optimization/92645] Hand written vector code is 450 times slower when compiled with GCC compared to Clang
From: cvs-commit at gcc dot gnu.org @ 2021-01-13 12:38 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92645

--- Comment #25 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Richard Biener <rguenth@gcc.gnu.org>:

https://gcc.gnu.org/g:3ddc18251a821fe69d6229abbf83d77284d2340a

commit r11-6644-g3ddc18251a821fe69d6229abbf83d77284d2340a
Author: Richard Biener <rguenther@suse.de>
Date:   Wed Jan 13 12:40:01 2021 +0100

    tree-optimization/92645 - improve SLP with existing vectors

    This improves SLP discovery in the face of existing vectors allowing
    punning of the vector shape (or even punning from an integer type).
    For punning from integer types this does not yet handle lane zero
    extraction being represented as conversion rather than BIT_FIELD_REF.

    2021-01-13  Richard Biener  <rguenther@suse.de>

            PR tree-optimization/92645
            * tree-vect-slp.c (vect_build_slp_tree_1): Relax supported
            BIT_FIELD_REF argument.
            (vect_build_slp_tree_2): Record the desired vector type
            on the external vector def.
            (vectorizable_slp_permutation): Handle required punning
            of existing vector defs.

            * gcc.target/i386/pr92645-6.c: New testcase.
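
To make "SLP with existing vectors" more concrete, the following is a small
sketch of the general source shape: scalar, lane-by-lane statements whose
operands are lanes of vector values that already exist.  It uses the GCC/Clang
vector_size extension.  The names v4si, add_lanes and add_lanes_demo are
assumptions for this example; this is not the content of pr92645-6.c, it does
not show the punning case, and whether a given GCC turns it into a single
vector add depends on version, target and flags.

```c
#include <assert.h>

/* GCC/Clang vector extension: four int lanes packed into 16 bytes. */
typedef int v4si __attribute__((vector_size(16)));
static_assert(sizeof(v4si) == 4 * sizeof(int), "expected four int lanes");

/* Lane-by-lane scalar code over existing vectors.  Each statement reads one
   lane of a and b; together the four statements form a group that could be
   expressed as one vector operation. */
void add_lanes(v4si a, v4si b, int out[4])
{
    out[0] = a[0] + b[0];
    out[1] = a[1] + b[1];
    out[2] = a[2] + b[2];
    out[3] = a[3] + b[3];
}

/* Small driver (hypothetical): returns one lane of the result for fixed
   inputs, so the behaviour is easy to check. */
int add_lanes_demo(int lane)
{
    v4si a = { 1, 2, 3, 4 };
    v4si b = { 10, 20, 30, 40 };
    int out[4];
    add_lanes(a, b, out);
    return out[lane];
}
```

Comparing the assembly for add_lanes at -O2 between compiler versions is one
way to see whether the lane reads were matched back to the original vectors.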

* [Bug tree-optimization/92645] Hand written vector code is 450 times slower when compiled with GCC compared to Clang
From: cvs-commit at gcc dot gnu.org @ 2021-01-13 13:51 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92645

--- Comment #26 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Richard Biener <rguenth@gcc.gnu.org>:

https://gcc.gnu.org/g:285fa338b06b804e72997c4d876ecf08a9c083af

commit r11-6649-g285fa338b06b804e72997c4d876ecf08a9c083af
Author: Richard Biener <rguenther@suse.de>
Date:   Wed Jan 13 13:48:31 2021 +0100

    tree-optimization/92645 - avoid harmful early BIT_FIELD_REF canonicalization

    This avoids canonicalizing BIT_FIELD_REF <T1> (a, <sz>, 0) to
    (T1)a on integer typed a.  This confuses the vectorizer SLP matching.

    With this delayed to after vector lowering the testcase in PR92645
    from Skia is now finally optimized to reasonable assembly.

    2021-01-13  Richard Biener  <rguenther@suse.de>

            PR tree-optimization/92645
            * match.pd (BIT_FIELD_REF to conversion): Delay canonicalization
            until after vector lowering.

            * gcc.target/i386/pr92645-7.c: New testcase.
            * gcc.dg/tree-ssa/ssa-fre-54.c: Adjust.
            * gcc.dg/pr69047.c: Likewise.
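
The equivalence behind that canonicalization, and the irregularity it creates,
can be sketched in plain C.  A 64-bit integer is treated as four 16-bit lanes:
lane 0 can be written as a plain conversion, while the other lanes need an
explicit extract.  The function names below are hypothetical and the shifts
only model what the two GIMPLE forms compute; they are not how GCC represents
them.

```c
#include <assert.h>
#include <stdint.h>

static_assert(sizeof(uint64_t) == 4 * sizeof(uint16_t), "four 16-bit lanes");

/* Lane 0 written as a conversion, analogous to (T1)a. */
uint16_t lane0_as_conversion(uint64_t a)
{
    return (uint16_t) a;
}

/* Uniform form for any lane, analogous to an extract of 16 bits at bit
   offset 16 * lane (counting from the least significant bit). */
uint16_t lane_as_extract(uint64_t a, unsigned lane)
{
    return (uint16_t) (a >> (16 * lane));
}
```

Both forms agree for lane 0, but a group of four lane reads then mixes one
conversion with three extracts, which is the kind of non-uniform pattern the
commit message says confuses SLP matching.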

* [Bug tree-optimization/92645] Hand written vector code is 450 times slower when compiled with GCC compared to Clang
From: rguenth at gcc dot gnu.org @ 2021-01-13 13:51 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92645

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
         Resolution|---                         |FIXED

--- Comment #27 from Richard Biener <rguenth at gcc dot gnu.org> ---
Fixed.

* [Bug tree-optimization/92645] Hand written vector code is 450 times slower when compiled with GCC compared to Clang
From: pinskia at gcc dot gnu.org @ 2023-05-22 20:51 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92645

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|---                         |11.0
