Date: Wed, 14 Jun 2023 09:46:18 +0000 (UTC)
From: Richard Biener
To: gcc-patches@gcc.gnu.org
cc: richard.sandiford@arm.com
Subject: [PATCH] [RFC] main loop masked vectorization with --param vect-partial-vector-usage=1
Currently vect_determine_partial_vectors_and_peeling will decide to apply full masking to the main loop despite --param vect-partial-vector-usage=1 when the currently analyzed vector mode results in a vectorization factor that's bigger than the number of scalar iterations. That's undesirable for targets where a vector mode can handle both partial vector and non-partial vector vectorization. I understand that for AARCH64 we have SVE and NEON, but SVE can only do partial vector and NEON only non-partial vector vectorization, plus the target chooses to let cost comparison decide the vector mode to use.

For x86 and the upcoming AVX512 partial vector support the story is different: the target chooses the first (and largest) vector mode that can successfully be used for vectorization. But that means with --param vect-partial-vector-usage=1 we will always choose AVX512 with partial vectors for the main loop even if, for example, V4SI would be a perfect fit with full vectors and no required epilog!

The following tries to find the appropriate condition for this. I suppose simply refusing to set LOOP_VINFO_USING_PARTIAL_VECTORS_P on the main loop when --param vect-partial-vector-usage=1 is in effect would hurt AARCH64?

Incidentally, looking up the docs for vect-partial-vector-usage suggests that it's not supposed to control epilog vectorization but instead "1 allows partial vector loads and stores if vectorization removes the need for the code to iterate". That's probably OK in the end, but if there's a fixed-size vector mode that allows the same thing without using masking, that would be better.
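As a rough illustration of the branch being changed, the following standalone C++ sketch models which loop ends up with partial vectors before and after this patch. The struct and function names are hypothetical; the real code reads these flags through LOOP_VINFO_* accessors on a loop_vec_info. It also assumes that the first operand of the condition, which the hunk does not show, is the param_vect_partial_vector_usage == 1 test, and that partial vectors are otherwise usable for the loop.

```cpp
#include <cassert>

// Hypothetical, simplified inputs to the decision.  In GCC these come from
// --param vect-partial-vector-usage and LOOP_VINFO_* accessors.
struct LoopInfo {
  int partial_vector_usage;          // 0, 1 or 2
  bool is_epilogue;                  // models LOOP_VINFO_EPILOGUE_P
  bool niters_known_smaller_than_vf; // models vect_known_niters_smaller_than_vf
  bool vf_is_constant;               // false for VLA (e.g. SVE) vectorization
};

// Which loop uses partial vectors (masking).
struct Decision {
  bool epilogue_masked;  // models LOOP_VINFO_EPIL_USING_PARTIAL_VECTORS_P
  bool this_loop_masked; // models LOOP_VINFO_USING_PARTIAL_VECTORS_P
};

// Model of the code before the patch.
Decision decide_before(const LoopInfo &l) {
  assert(l.partial_vector_usage >= 0 && l.partial_vector_usage <= 2);
  Decision d{false, false};
  if (l.partial_vector_usage == 1 && !l.is_epilogue
      && !l.niters_known_smaller_than_vf)
    d.epilogue_masked = true;
  else
    d.this_loop_masked = true;
  return d;
}

// Model of the code with the proposed patch: a main loop with a constant VF
// and usage == 1 no longer falls through to full masking.
Decision decide_after(const LoopInfo &l) {
  assert(l.partial_vector_usage >= 0 && l.partial_vector_usage <= 2);
  Decision d{false, false};
  if (l.partial_vector_usage == 1 && !l.is_epilogue
      && !l.niters_known_smaller_than_vf)
    d.epilogue_masked = true;
  else if (!(!l.is_epilogue && l.partial_vector_usage == 1
             && l.vf_is_constant))
    d.this_loop_masked = true;
  return d;
}
```

With usage 1, a main loop whose known trip count is below a constant VF (the AVX512 case above) is masked before the patch and not masked at all after it, while a variable-length (SVE-style) main loop and any epilogue loop still get masking.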
I wonder if we should special-case known niter (bounds) somehow when analyzing the vector modes and override the target's sorting? Maybe we want a new --param, in addition to vect-epilogues-nomask and vect-partial-vector-usage, to say we want masked epilogues?

	* tree-vect-loop.cc (vect_determine_partial_vectors_and_peeling):
	For non-VLA vectorization interpret param_vect_partial_vector_usage == 1
	as only applying to epilogues.
---
 gcc/tree-vect-loop.cc | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index 9be66b8fbc5..9323aa572d4 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -2478,7 +2478,15 @@ vect_determine_partial_vectors_and_peeling (loop_vec_info loop_vinfo,
           && !LOOP_VINFO_EPILOGUE_P (loop_vinfo)
           && !vect_known_niters_smaller_than_vf (loop_vinfo))
         LOOP_VINFO_EPIL_USING_PARTIAL_VECTORS_P (loop_vinfo) = true;
-      else
+      /* Avoid using a large fixed size vectorization mode with masking
+         for the main loop when we were asked to only use masking for
+         the epilog.
+         ??? Ideally we'd start analysis with a better sized mode,
+         the param_vect_partial_vector_usage == 2 case suffers from
+         this as well.  But there's a catch-22.  */
+      else if (!(!LOOP_VINFO_EPILOGUE_P (loop_vinfo)
+                 && param_vect_partial_vector_usage == 1
+                 && LOOP_VINFO_VECT_FACTOR (loop_vinfo).is_constant ()))
         LOOP_VINFO_USING_PARTIAL_VECTORS_P (loop_vinfo) = true;
     }
 
-- 
2.35.3
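The question of special-casing known trip counts can be sketched as a mode-selection rule. The following C++ is purely hypothetical and is not what GCC or any target currently does: it contrasts first-fit selection in the target's largest-first order with a rule that, for a known trip count, prefers the first mode whose VF divides it exactly. Mode names and VFs assume 32-bit integer elements on x86 (V16SI for AVX512, V8SI for AVX2, V4SI for SSE).

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical candidate vector mode and the VF it gives for the loop.
struct Mode {
  std::string name;
  unsigned vf;
};

// First-fit in the target's (largest-first) order.  With partial vectors
// allowed every mode can be used, so this is simply the first entry.
const Mode &pick_first_fit(const std::vector<Mode> &modes) {
  assert(!modes.empty());
  return modes.front();
}

// Hypothetical known-niters rule: prefer the first mode whose VF divides the
// trip count exactly (full vectors, no masking, no epilog); otherwise fall
// back to first-fit.
const Mode &pick_for_known_niters(const std::vector<Mode> &modes,
                                  unsigned niters) {
  assert(!modes.empty());
  for (const Mode &m : modes)
    if (m.vf != 0 && m.vf <= niters && niters % m.vf == 0)
      return m;
  return pick_first_fit(modes);
}
```

For a loop of 4 iterations, first-fit picks V16SI, which would need masking, while the known-niters rule picks V4SI with full vectors and no epilog. Falling back to first-fit when no VF divides the trip count keeps the existing behaviour for awkward or unknown counts; a real implementation would also have to weigh costs.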