From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <hjl.tools@gmail.com>
MIME-Version: 1.0
References: <op1o7789-5ns6-q56-6on9-948o6o2q83r1@fhfr.qr>
In-Reply-To: <op1o7789-5ns6-q56-6on9-948o6o2q83r1@fhfr.qr>
From: "H.J. Lu" <hjl.tools@gmail.com>
Date: Tue, 21 Sep 2021 07:55:15 -0700
Message-ID: <CAMe9rOoiX0bJ3_pZOkOC0V-u_H09FjkEtW6tYrrZAAMHL2dMLQ@mail.gmail.com>
Subject: Re: [PATCH] Allow different vector types for stmt groups
To: Richard Biener <rguenther@suse.de>, Hongtao Liu <crazylht@gmail.com>
Cc: GCC Patches <gcc-patches@gcc.gnu.org>,
 Richard Sandiford <richard.sandiford@arm.com>
Content-Type: text/plain; charset="UTF-8"

On Mon, Sep 20, 2021 at 5:15 AM Richard Biener via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
> This allows vectorization (in practice non-loop vectorization) to
> have a stmt participate in different vector type vectorizations.
> It allows us to remove vect_update_shared_vectype and replace it
> by pushing/popping STMT_VINFO_VECTYPE from SLP_TREE_VECTYPE around
> vect_analyze_stmt and vect_transform_stmt.
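
For readers following along, the push/pop described above could be modeled roughly as below. This is only a toy sketch, not the patch's code: the toy_* structs and functions are hypothetical stand-ins for stmt_vec_info, slp_tree and the analyze/transform entry points, and the vector type is reduced to a bit width.

```c
/* Toy model: one stmt shared by several SLP nodes, each node carrying
   its own vector type.  All names here are hypothetical.  */
struct toy_stmt { int vectype_bits; };            /* stand-in for STMT_VINFO_VECTYPE */
struct toy_slp_node {
  struct toy_stmt *representative;                /* shared stmt */
  int vectype_bits;                               /* stand-in for SLP_TREE_VECTYPE */
};

typedef int (*toy_stmt_fn) (struct toy_stmt *);

/* Install the node's vector type on the stmt, run FN, then restore the
   previous value so other nodes sharing the stmt are unaffected.  */
static int
toy_with_node_vectype (struct toy_slp_node *node, toy_stmt_fn fn)
{
  struct toy_stmt *stmt = node->representative;
  int saved = stmt->vectype_bits;                 /* push */
  stmt->vectype_bits = node->vectype_bits;
  int result = fn (stmt);
  stmt->vectype_bits = saved;                     /* pop */
  return result;
}

/* Example callback: report which vector type the stmt currently sees.  */
static int
toy_report_bits (struct toy_stmt *stmt)
{
  return stmt->vectype_bits;
}
```

Since nothing is stored permanently on the stmt, the same stmt can be visited once as part of a 128-bit group and once as part of a 64-bit group.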
>
> For data-ref the situation is a bit more complicated since we
> analyze alignment info with a specific vector type in mind which
> doesn't play well when that changes.
>
> So the bulk of the change is passing down the actual vector type
> used for a vectorized access to the various accessors of alignment
> info, first and foremost dr_misalignment but also aligned_access_p,
> known_alignment_for_access_p, vect_known_alignment_in_bytes and
> vect_supportable_dr_alignment.  I took the liberty of replacing the
> ALL_CAPS macro accessors with lower-case function calls.
>
> The actual behavioral changes are in dr_misalignment, which now
> factors in the negative step adjustment and also handles alignment
> queries for a vector type with bigger alignment requirements than
> what we analyzed (or are able to analyze).
>
> vect_slp_analyze_node_alignment makes use of this: upon receiving
> a vector type with a bigger alignment requirement it re-analyzes the DR
> with respect to it, but keeps an older, more precise result if possible.
> In this context it might be possible to do the analysis just once and,
> instead of analyzing with respect to a specific desired alignment,
> look for the biggest alignment for which we can compute a known
> misalignment.
>
> The ChangeLog covers the functional changes but not the bulk of the
> churn from the alignment accessor API changes; I hope that's acceptable.
>
> Bootstrapped and tested on x86_64-unknown-linux-gnu, testing on SPEC
> CPU 2017 in progress (for stats and correctness).
>
> Any comments?
>
> Thanks,
> Richard.
>
> 2021-09-17  Richard Biener  <rguenther@suse.de>
>
>         PR tree-optimization/97351
>         PR tree-optimization/97352
>         PR tree-optimization/82426
>         * tree-vectorizer.h (dr_misalignment): Add vector type
>         argument.
>         (aligned_access_p): Likewise.
>         (known_alignment_for_access_p): Likewise.
>         (vect_supportable_dr_alignment): Likewise.
>         (vect_known_alignment_in_bytes): Likewise.  Refactor.
>         (DR_MISALIGNMENT): Remove.
>         (vect_update_shared_vectype): Likewise.
>         * tree-vect-data-refs.c (dr_misalignment): Refactor, handle
>         a vector type with larger alignment requirement and apply
>         the negative step adjustment here.
>         (vect_calculate_target_alignment): Remove.
>         (vect_compute_data_ref_alignment): Get explicit vector type
>         argument, do not apply a negative step alignment adjustment
>         here.
>         (vect_slp_analyze_node_alignment): Re-analyze alignment
>         when we re-visit the DR with a bigger desired alignment but
>         keep more precise results from smaller alignments.
>         * tree-vect-slp.c (vect_update_shared_vectype): Remove.
>         (vect_slp_analyze_node_operations_1): Do not update the
>         shared vector type on stmts.
>         * tree-vect-stmts.c (vect_analyze_stmt): Push/pop the
>         vector type of an SLP node to the representative stmt-info.
>         (vect_transform_stmt): Likewise.
>
>         * gcc.target/i386/vect-pr82426.c: New testcase.
>         * gcc.target/i386/vect-pr97352.c: Likewise.
> ---
>  gcc/testsuite/gcc.target/i386/vect-pr82426.c |  32 +++
>  gcc/testsuite/gcc.target/i386/vect-pr97352.c |  22 ++
>  gcc/tree-vect-data-refs.c                    | 217 ++++++++++---------
>  gcc/tree-vect-slp.c                          |  59 -----
>  gcc/tree-vect-stmts.c                        |  77 ++++---
>  gcc/tree-vectorizer.h                        |  32 ++-
>  6 files changed, 231 insertions(+), 208 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/vect-pr82426.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/vect-pr97352.c
>
> diff --git a/gcc/testsuite/gcc.target/i386/vect-pr82426.c b/gcc/testsuite/gcc.target/i386/vect-pr82426.c
> new file mode 100644
> index 00000000000..741a1d14d36
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/vect-pr82426.c
> @@ -0,0 +1,32 @@
> +/* i?86 does not have V2SF, x32 does though.  */
> +/* { dg-do compile { target { lp64 || x32 } } } */

It should be target { ! ia32 }
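
With that change the directive would read as follows. Only the selector changes; on x86, { ! ia32 } should cover both the lp64 and x32 cases that the original selector spells out:

```c
/* { dg-do compile { target { ! ia32 } } } */
```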

> +/* ???  With AVX512 we only realize one FMA opportunity.  */

Hongtao, is AVX512 missing 64-bit vector support?

> +/* { dg-options "-O3 -mavx -mfma -mno-avx512f" } */
> +
> +struct Matrix
> +{
> +  float m11;
> +  float m12;
> +  float m21;
> +  float m22;
> +  float dx;
> +  float dy;
> +};
> +
> +struct Matrix multiply(const struct Matrix *a, const struct Matrix *b)
> +{
> +  struct Matrix out;
> +  out.m11 = a->m11*b->m11 + a->m12*b->m21;
> +  out.m12 = a->m11*b->m12 + a->m12*b->m22;
> +  out.m21 = a->m21*b->m11 + a->m22*b->m21;
> +  out.m22 = a->m21*b->m12 + a->m22*b->m22;
> +
> +  out.dx = a->dx*b->m11  + a->dy*b->m21 + b->dx;
> +  out.dy = a->dx*b->m12  + a->dy*b->m22 + b->dy;
> +  return out;
> +}
> +
> +/* The whole kernel should be vectorized with V4SF and V2SF operations.  */
> +/* { dg-final { scan-assembler-times "vadd" 1 } } */
> +/* { dg-final { scan-assembler-times "vmul" 2 } } */
> +/* { dg-final { scan-assembler-times "vfma" 2 } } */

-- 
H.J.