From: Martin Jambor
To: Richard Biener
Cc: gcc-patches@gcc.gnu.org
Subject: Re: [PATCH] Allow different vector types for stmt groups
In-Reply-To: <9o1699p0-40no-582-6rr2-o4n9snrq282@fhfr.qr>
References: <839254rr-2sns-3p81-1026-2qqnrn13q35q@fhfr.qr> <9o1699p0-40no-582-6rr2-o4n9snrq282@fhfr.qr>
Date: Wed, 13 Oct 2021 19:03:16 +0200

Hi,

On Mon, Sep 27 2021, Richard Biener via Gcc-patches wrote:
> [...]
>
> The following is what I have pushed after re-bootstrapping and testing
> on x86_64-unknown-linux-gnu.
>
> Richard.
>
> From fc335f9fde40d7a20a1a6e38fd6f842ed93a039e Mon Sep 17 00:00:00 2001
> From: Richard Biener
> Date: Wed, 18 Nov 2020 09:36:57 +0100
> Subject: [PATCH] Allow different vector types for stmt groups
> To: gcc-patches@gcc.gnu.org
>
> This allows vectorization (in practice non-loop vectorization) to
> have a stmt participate in different vector type vectorizations.
> It allows us to remove vect_update_shared_vectype and replace it
> by pushing/popping STMT_VINFO_VECTYPE from SLP_TREE_VECTYPE around
> vect_analyze_stmt and vect_transform_stmt.
>
> For data-ref the situation is a bit more complicated since we
> analyze alignment info with a specific vector type in mind which
> doesn't play well when that changes.
>
> So the bulk of the change is passing down the actual vector type
> used for a vectorized access to the various accessors of alignment
> info, first and foremost dr_misalignment but also aligned_access_p,
> known_alignment_for_access_p, vect_known_alignment_in_bytes and
> vect_supportable_dr_alignment.  I took the liberty to replace
> ALL_CAPS macro accessors with the lower-case function invocations.
>
> The actual changes to the behavior are in dr_misalignment which now
> is the place factoring in the negative step adjustment as well as
> handling alignment queries for a vector type with bigger alignment
> requirements than what we can (or have) analyze(d).
>
> vect_slp_analyze_node_alignment makes use of this and upon receiving
> a vector type with a bigger alignment desire re-analyzes the DR
> with respect to it but keeps an older more precise result if possible.
> In this context it might be possible to do the analysis just once
> but instead of analyzing with respect to a specific desired alignment
> look for the biggest alignment we can compute a not unknown alignment.
>
> The ChangeLog includes the functional changes but not the bulk due
> to the alignment accessor API changes - I hope that's something good.
>
> 2021-09-17  Richard Biener
>
>         PR tree-optimization/97351
>         PR tree-optimization/97352
>         PR tree-optimization/82426
>         * tree-vectorizer.h (dr_misalignment): Add vector type
>         argument.
>         (aligned_access_p): Likewise.
>         (known_alignment_for_access_p): Likewise.
>         (vect_supportable_dr_alignment): Likewise.
>         (vect_known_alignment_in_bytes): Likewise.  Refactor.
>         (DR_MISALIGNMENT): Remove.
>         (vect_update_shared_vectype): Likewise.
>         * tree-vect-data-refs.c (dr_misalignment): Refactor, handle
>         a vector type with larger alignment requirement and apply
>         the negative step adjustment here.
>         (vect_calculate_target_alignment): Remove.
>         (vect_compute_data_ref_alignment): Get explicit vector type
>         argument, do not apply a negative step alignment adjustment
>         here.
>         (vect_slp_analyze_node_alignment): Re-analyze alignment
>         when we re-visit the DR with a bigger desired alignment but
>         keep more precise results from smaller alignments.
>         * tree-vect-slp.c (vect_update_shared_vectype): Remove.
>         (vect_slp_analyze_node_operations_1): Do not update the
>         shared vector type on stmts.
>         * tree-vect-stmts.c (vect_analyze_stmt): Push/pop the
>         vector type of an SLP node to the representative stmt-info.
>         (vect_transform_stmt): Likewise.

I have bisected a 10% performance regression of the SPEC 2006 FP
433.milc benchmark on AMD zen2, when compiled with -Ofast -march=native
-flto, to this commit.  See also:

  https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=412.70.0&plot.1=289.70.0&

I am not sure if a bugzilla bug is in order because I can reproduce the
regression neither on an AMD zen3 machine nor on Intel CascadeLake,
because of the history of the benchmark performance and because I know
milc can be sensitive to conditions outside our control.  And the list
of dependencies of PR 26163 is long enough as it is.

OTOH, the regression reproduces reliably for me.

Some relevant perf data:

BEFORE:

# Samples: 585K of event 'cycles:u'
# Event count (approx.): 472738682838
#
# Overhead       Samples  Command          Shared Object           Symbol
# ........  ............  ...............  ......................  .........................................
#
    24.59%        140397  milc_peak.mine-  milc_peak.mine-lto-nat  [.] u_shift_fermion
    18.47%        105497  milc_peak.mine-  milc_peak.mine-lto-nat  [.] add_force_to_mom
    15.97%         96343  milc_peak.mine-  milc_peak.mine-lto-nat  [.] mult_su3_na
    15.29%         90027  milc_peak.mine-  milc_peak.mine-lto-nat  [.] mult_su3_nn
     5.55%         35114  milc_peak.mine-  milc_peak.mine-lto-nat  [.] path_product
     4.75%         27693  milc_peak.mine-  milc_peak.mine-lto-nat  [.] compute_gen_staple
     2.76%         16109  milc_peak.mine-  milc_peak.mine-lto-nat  [.] mult_su3_an
     2.42%         14255  milc_peak.mine-  milc_peak.mine-lto-nat  [.] imp_gauge_force.constprop.0
     2.02%         11561  milc_peak.mine-  milc_peak.mine-lto-nat  [.] mult_adj_su3_mat_4vec

AFTER:

# Samples: 634K of event 'cycles:u'
# Event count (approx.): 513635733685
#
# Overhead       Samples  Command          Shared Object           Symbol
# ........  ............  ...............  ......................  .........................................
#
    24.04%        149010  milc_peak.mine-  milc_peak.mine-lto-nat  [.] add_force_to_mom
    23.76%        147370  milc_peak.mine-  milc_peak.mine-lto-nat  [.] u_shift_fermion
    14.19%         90929  milc_peak.mine-  milc_peak.mine-lto-nat  [.] mult_su3_nn
    14.14%         92912  milc_peak.mine-  milc_peak.mine-lto-nat  [.] mult_su3_na
     4.90%         33846  milc_peak.mine-  milc_peak.mine-lto-nat  [.] path_product
     3.89%         24621  milc_peak.mine-  milc_peak.mine-lto-nat  [.] mult_su3_an
     3.62%         22831  milc_peak.mine-  milc_peak.mine-lto-nat  [.] compute_gen_staple
     2.05%         13215  milc_peak.mine-  milc_peak.mine-lto-nat  [.] imp_gauge_force.constprop.0

Martin
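
For what it is worth, the overhead percentages in the two profiles are
relative to different totals, so they are easier to compare once they
are converted to approximate cycle counts.  The sketch below does only
that arithmetic.  It assumes that each overhead figure is a fraction of
the reported "Event count (approx.)" of its own run, and the helper
names (estimated_cycles, cycle_deltas) are made up for illustration.
Symbols that appear in only one of the two listings are skipped.

```python
# Totals and per-symbol overheads copied from the two perf reports above.
BEFORE_TOTAL = 472_738_682_838  # "Event count (approx.)" of the BEFORE run
AFTER_TOTAL = 513_635_733_685   # "Event count (approx.)" of the AFTER run

BEFORE_PCT = {
    "u_shift_fermion": 24.59,
    "add_force_to_mom": 18.47,
    "mult_su3_na": 15.97,
    "mult_su3_nn": 15.29,
    "path_product": 5.55,
    "compute_gen_staple": 4.75,
    "mult_su3_an": 2.76,
    "imp_gauge_force.constprop.0": 2.42,
    "mult_adj_su3_mat_4vec": 2.02,
}

AFTER_PCT = {
    "add_force_to_mom": 24.04,
    "u_shift_fermion": 23.76,
    "mult_su3_nn": 14.19,
    "mult_su3_na": 14.14,
    "path_product": 4.90,
    "mult_su3_an": 3.89,
    "compute_gen_staple": 3.62,
    "imp_gauge_force.constprop.0": 2.05,
}


def estimated_cycles(total, pct_by_symbol):
    """Convert overhead percentages into approximate cycle counts."""
    return {sym: total * pct / 100.0 for sym, pct in pct_by_symbol.items()}


def cycle_deltas(before, after):
    """Return after - before for symbols present in both profiles."""
    return {sym: after[sym] - before[sym] for sym in before if sym in after}


before_cycles = estimated_cycles(BEFORE_TOTAL, BEFORE_PCT)
after_cycles = estimated_cycles(AFTER_TOTAL, AFTER_PCT)
deltas = cycle_deltas(before_cycles, after_cycles)

# Overall change in the sampled cycle count between the two runs.
print(f"total: {AFTER_TOTAL / BEFORE_TOTAL - 1:+.2%}")

# Per-symbol change, largest increase first.
for sym, delta in sorted(deltas.items(), key=lambda kv: kv[1], reverse=True):
    change = delta / before_cycles[sym]
    print(f"{sym:30s} {delta / 1e9:+8.2f} Gcycles  ({change:+.1%})")
```

Read this way, most of the additional cycles appear to land in
add_force_to_mom, with mult_su3_an a distant second, though the numbers
are only as precise as the rounded percentages in the reports.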