public inbox for gcc-bugs@sourceware.org
* [Bug tree-optimization/54346] New: combine permutations
From: glisse at gcc dot gnu.org @ 2012-08-21 14:02 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54346

             Bug #: 54346
           Summary: combine permutations
    Classification: Unclassified
           Product: gcc
           Version: 4.8.0
            Status: UNCONFIRMED
          Severity: enhancement
          Priority: P3
         Component: tree-optimization
        AssignedTo: unassigned@gcc.gnu.org
        ReportedBy: glisse@gcc.gnu.org


Hello,

when we have two VEC_PERM_EXPR with constant mask, where one is the only user
of the result of the other one, it would be good to compose/merge them into a
single VEC_PERM_EXPR. However, it is too hard for backends to always generate
optimal code for shuffles, so we want to do the optimization only if we know it
actually helps. Currently this means when the composed permutation is the
identity. In the future, it could mean asking the backend.
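
A minimal sketch of the identity test described above, in plain C rather than on GIMPLE. It assumes both permutations take a single input vector of n lanes; the helper names compose_masks and is_identity_mask are hypothetical and are not GCC functions.

```c
#include <stdbool.h>
#include <stddef.h>

/* Compose two single-input permutation masks of n lanes.
   If c[i] = v[first[i]] and d[i] = c[second[i]], then
   d[i] = v[first[second[i]]].  Indices are reduced modulo n,
   as __builtin_shuffle does in its single-operand form.  */
static void
compose_masks (const unsigned *first, const unsigned *second,
               unsigned *out, size_t n)
{
  for (size_t i = 0; i < n; i++)
    out[i] = first[second[i] % n] % n;
}

/* Return true if the mask selects lane i for every output lane i,
   i.e. the permutation leaves the vector unchanged.  */
static bool
is_identity_mask (const unsigned *mask, size_t n)
{
  for (size_t i = 0; i < n; i++)
    if (mask[i] % n != i)
      return false;
  return true;
}
```

For example, a rotation by one lane {1, 2, 3, 0} followed by a rotation by three lanes {3, 0, 1, 2} composes to the identity, so both shuffles could be dropped.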

See the conversation that started at:
http://gcc.gnu.org/ml/gcc-patches/2012-08/msg00676.html

and around this message for cost hooks (which could also help the vectorizer):
http://gcc.gnu.org/ml/gcc-patches/2012-08/msg00973.html

Related bug is http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43147 but that one
is about RTL (unless x86 eventually follows ARM and decides to implement _mm_*
functions in terms of __builtin_shuffle).


* [Bug tree-optimization/54346] combine permutations
From: pinskia at gcc dot gnu.org @ 2021-08-25  7:26 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54346

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |pinskia at gcc dot gnu.org

--- Comment #2 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
*** Bug 102056 has been marked as a duplicate of this bug. ***

* [Bug tree-optimization/54346] combine permutations
From: pinskia at gcc dot gnu.org @ 2021-08-25  9:02 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54346

--- Comment #3 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Here is an example; the two functions below should compile to the same code:
typedef int v4si __attribute__((vector_size (16)));

v4si
foo (v4si a, v4si b)
{
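    /* c = { a[1], b[0], a[2], b[3] }: indices 0-3 select from a, 4-7 from b.  */
    v4si c = __builtin_shuffle (a, b, __extension__ (v4si) {1, 4, 2, 7});
    /* d = { c[3], c[2], c[0], c[1] } = { b[3], a[2], a[1], b[0] }.  */
    v4si d = __builtin_shuffle (c, __extension__ (v4si) { 3, 2, 0, 1 });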
    return d;
}

typedef int v4si __attribute__((vector_size (16)));

v4si
foo1 (v4si a, v4si b)
{
    v4si c = __builtin_shuffle (a, b, __extension__ (v4si){ 7, 2, 1, 4 });
    return c;
}
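
The equivalence of foo and foo1 can be checked with a scalar model of the two shuffles. This is only a sketch in portable C: shuffle2, shuffle1 and foo_matches_foo1 are hypothetical helpers that mimic the documented index rules of __builtin_shuffle (modulo 2*N with two inputs, modulo N with one), not the builtin itself.

```c
#include <stddef.h>

#define LANES 4u

/* Scalar model of __builtin_shuffle (a, b, mask): indices are taken
   modulo 2*LANES; 0..LANES-1 select from a, the rest from b.  */
static void
shuffle2 (const int *a, const int *b, const unsigned *mask, int *out)
{
  for (size_t i = 0; i < LANES; i++)
    {
      unsigned idx = mask[i] % (2 * LANES);
      out[i] = idx < LANES ? a[idx] : b[idx - LANES];
    }
}

/* Scalar model of __builtin_shuffle (v, mask): indices are taken
   modulo LANES.  out must not alias v.  */
static void
shuffle1 (const int *v, const unsigned *mask, int *out)
{
  for (size_t i = 0; i < LANES; i++)
    out[i] = v[mask[i] % LANES];
}

/* Return 1 if the two-step shuffle of foo and the single shuffle of
   foo1 give the same lanes for the inputs a and b, 0 otherwise.  */
static int
foo_matches_foo1 (const int *a, const int *b)
{
  static const unsigned m0[LANES] = {1, 4, 2, 7};
  static const unsigned m1[LANES] = {3, 2, 0, 1};
  static const unsigned merged[LANES] = {7, 2, 1, 4};
  int c[LANES], d[LANES], e[LANES];

  shuffle2 (a, b, m0, c);      /* first shuffle in foo */
  shuffle1 (c, m1, d);         /* second shuffle in foo */
  shuffle2 (a, b, merged, e);  /* the single shuffle in foo1 */
  for (size_t i = 0; i < LANES; i++)
    if (d[i] != e[i])
      return 0;
  return 1;
}
```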

* [Bug tree-optimization/54346] combine permutations
From: crazylht at gmail dot com @ 2022-06-07  9:10 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54346

--- Comment #4 from Hongtao.liu <crazylht at gmail dot com> ---
Now that more x86 shuffle intrinsics are folded into GIMPLE VEC_PERM_EXPR, I
guess we need some GIMPLE-level pattern matching to simplify successive
VEC_PERM_EXPRs.

* [Bug tree-optimization/54346] combine permutations
From: cvs-commit at gcc dot gnu.org @ 2022-10-11  6:12 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54346

--- Comment #5 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by hongtao Liu <liuhongt@gcc.gnu.org>:

https://gcc.gnu.org/g:b88adba751da635c6f0c353c5bc51bbe2ecf4c89

commit r13-3212-gb88adba751da635c6f0c353c5bc51bbe2ecf4c89
Author: Liwei Xu <liwei.xu@intel.com>
Date:   Fri Sep 23 13:46:02 2022 +0800

    Optimize nested permutation to single VEC_PERM_EXPR [PR54346]

            This patch implements the optimization in PR 54346, which merges

            c = VEC_PERM_EXPR <a, b, VCST0>;
            d = VEC_PERM_EXPR <c, c, VCST1>;
                    to
            d = VEC_PERM_EXPR <a, b, NEW_VCST>;

            Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
            tree-ssa/forwprop-19.c fails, but I'm not sure whether it
            is OK to remove it.

    gcc/ChangeLog:

            PR tree-optimization/54346
            * match.pd: Merge the index of VCST then generates the new
            vec_perm.

    gcc/testsuite/ChangeLog:

            * gcc.dg/pr54346.c: New test.

    Co-authored-by: liuhongt <hongtao.liu@intel.com>
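
A sketch of how NEW_VCST in the commit message above can be derived from VCST0 and VCST1, written as plain C on index arrays. This is not the match.pd code; merge_nested_selectors is a hypothetical helper. It relies on the fact that both inputs of the outer VEC_PERM_EXPR are c, so an outer index only matters modulo the lane count n, while the inner indices address the 2*n lanes of a followed by b.

```c
#include <stddef.h>

/* Hypothetical helper: compute the selector of the merged permutation.
   Given  c = VEC_PERM_EXPR <a, b, vcst0>  and
          d = VEC_PERM_EXPR <c, c, vcst1>,
   lane i of d is c[vcst1[i] mod n], which in turn is lane
   vcst0[vcst1[i] mod n] of the concatenation of a and b.  */
static void
merge_nested_selectors (const unsigned *vcst0, const unsigned *vcst1,
                        unsigned *new_vcst, size_t n)
{
  for (size_t i = 0; i < n; i++)
    new_vcst[i] = vcst0[vcst1[i] % n] % (2 * n);
}
```

With vcst0 = {1, 4, 2, 7} and vcst1 = {3, 2, 0, 1}, the values from comment #3, this yields {7, 2, 1, 4}.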

* [Bug tree-optimization/54346] combine permutations
From: glisse at gcc dot gnu.org @ 2022-10-11  6:21 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54346

--- Comment #6 from Marc Glisse <glisse at gcc dot gnu.org> ---
The log says that this breaks tree-ssa/forwprop-19.c, but I don't see any xfail
or anything. Does it only fail because gimple-simplify leaves some dead code
around, so you could update the test to scan the next DCE pass dump instead of
forwprop1? Or are we missing a transformation that just detects a VEC_PERM_EXPR
with an identity permutation?

* [Bug tree-optimization/54346] combine permutations
From: crazylht at gmail dot com @ 2022-10-11  6:36 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54346

--- Comment #7 from Hongtao.liu <crazylht at gmail dot com> ---
(In reply to Marc Glisse from comment #6)
> The log says that this breaks tree-ssa/forwprop-19.c, but I don't see any
> xfail or anything. Does it only fail because gimple-simplify leaves some
> dead code around, so you could update the test to scan the next DCE pass
> dump instead of forwprop1? Or are we missing a transformation that just
> detects a VEC_PERM_EXPR with an identity permutation?

Oops, I didn't notice this. I will add an incremental patch to handle the
identical index (forwprop-19.c) scenario, sorry.

* [Bug tree-optimization/54346] combine permutations
From: cvs-commit at gcc dot gnu.org @ 2022-10-21  7:17 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54346

--- Comment #8 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Jakub Jelinek <jakub@gcc.gnu.org>:

https://gcc.gnu.org/g:fa553ff26d96f6fecaa8f1b00649cfdc6cda5f5a

commit r13-3430-gfa553ff26d96f6fecaa8f1b00649cfdc6cda5f5a
Author: Jakub Jelinek <jakub@redhat.com>
Date:   Fri Oct 21 09:16:44 2022 +0200

    match.pd: Fix up gcc.dg/pr54346.c on i686-linux [PR54346]

    The pr54346.c testcase FAILs on i686-linux (without -msse*) for multiple
    reasons.  One is the trivial missing -Wno-psabi which the following patch
    adds, but that isn't enough.  The thing is that without native vector
    support, we have VEC_PERM_EXPRs in the IL and are actually considering
    the nested VEC_PERM_EXPRs into one VEC_PERM_EXPR optimization, but punt
    because can_vec_perm_const_p (result_mode, op_mode, sel2, false) is false.

    Such a test makes sense to prevent "optimizing" two VEC_PERM_EXPRs
    that can be handled by the backend natively into one VEC_PERM_EXPR
    that can't be handled.  But if both of the original VEC_PERM_EXPRs
    can't be handled natively either, having just one VEC_PERM_EXPR that will
    be lowered by generic vec lowering is IMHO still better than 2.
    The same holds if we trade one VEC_PERM_EXPR that can't be handled plus
    one that can for a single one that can't be handled.

    Also, removing the testcase's executable permissions...

    2022-10-21  Jakub Jelinek  <jakub@redhat.com>

            PR tree-optimization/54346
            * match.pd ((vec_perm (vec_perm@0 @1 @2 VECTOR_CST) @0
            VECTOR_CST)):
            Optimize nested VEC_PERM_EXPRs even if target can't handle the
            new one provided we don't increase number of VEC_PERM_EXPRs the
            target can't handle.

            * gcc.dg/pr54346.c: Add -Wno-psabi to dg-options.
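
The rule described in this commit message can be restated as a small predicate. This is only a sketch of the stated reasoning, not the condition as written in match.pd: should_merge_perms and its three boolean parameters are hypothetical, standing in for the results of the target's constant-permutation query on each selector, and the inner VEC_PERM_EXPR is assumed to have no other users.

```c
#include <stdbool.h>

/* Hypothetical predicate: merge two nested VEC_PERM_EXPRs unless doing
   so would increase the number of permutations the target cannot
   handle natively.  */
static bool
should_merge_perms (bool target_handles_first,
                    bool target_handles_second,
                    bool target_handles_merged)
{
  /* One natively supported permutation instead of two is always fine.  */
  if (target_handles_merged)
    return true;

  /* The merged permutation will be lowered generically.  That is still
     no worse than before if at least one of the originals also needed
     generic lowering; punt only when both originals were native.  */
  return !target_handles_first || !target_handles_second;
}
```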

* [Bug tree-optimization/54346] combine permutations
From: crazylht at gmail dot com @ 2022-10-25  5:53 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54346

--- Comment #9 from Hongtao.liu <crazylht at gmail dot com> ---
(In reply to Hongtao.liu from comment #7)
> (In reply to Marc Glisse from comment #6)
> > The log says that this breaks tree-ssa/forwprop-19.c, but I don't see any
> > xfail or anything. Does it only fail because gimple-simplify leaves some
> > dead code around, so you could update the test to scan the next DCE pass
> > dump instead of forwprop1? Or are we missing a transformation that just
> > detects a VEC_PERM_EXPR with an identity permutation?
> 
> Oops, I didn't notice this. I will add an incremental patch to handle the
> identical index (forwprop-19.c) scenario, sorry.

It's fixed.
