From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by sourceware.org (Postfix) with ESMTP id 528C03858422 for ; Tue, 15 Aug 2023 09:05:07 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 528C03858422 Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=arm.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=arm.com Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 5DBFF1063; Tue, 15 Aug 2023 02:05:49 -0700 (PDT) Received: from localhost (e121540-lin.manchester.arm.com [10.32.110.72]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 2E37D3F762; Tue, 15 Aug 2023 02:05:06 -0700 (PDT) From: Richard Sandiford To: Richard Biener Mail-Followup-To: Richard Biener ,"Kewen.Lin" , Stefan Schulze Frielinghaus , "juzhe.zhong\@rivai.ai" , Robin Dapp , GCC Patches , richard.sandiford@arm.com Cc: "Kewen.Lin" , Stefan Schulze Frielinghaus , "juzhe.zhong\@rivai.ai" , Robin Dapp , GCC Patches Subject: Re: [PATCH V4] VECT: Support loop len control on EXTRACT_LAST vectorization References: <20230811142448.2913622-1-juzhe.zhong@rivai.ai> <9eb77e11-ef32-992e-71ca-780950d7cf74@linux.ibm.com> <785b260f-1854-f461-be75-cdbfbcadb9bf@gmail.com> <8a9c4ab0-81e5-b516-470c-03dc48b8f337@linux.ibm.com> <3b7fafe9-fcc9-6ce7-f86d-5d4bdad72e3f@linux.ibm.com> Date: Tue, 15 Aug 2023 10:05:04 +0100 In-Reply-To: (Richard Biener's message of "Tue, 15 Aug 2023 08:55:10 +0000 (UTC)") Message-ID: User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/26.3 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain X-Spam-Status: No, score=-25.8 required=5.0 tests=BAYES_00,GIT_PATCH_0,KAM_DMARC_NONE,KAM_DMARC_STATUS,KAM_LAZY_DOMAIN_SECURITY,KAM_SHORT,SPF_HELO_NONE,SPF_NONE,TXREP,WEIRD_PORT autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: Richard Biener writes: > On Tue, 15 Aug 2023, Richard Sandiford wrote: > >> Richard Biener writes: >> > On Tue, 15 Aug 2023, Kewen.Lin wrote: >> > >> >> Hi Stefan, >> >> >> >> on 2023/8/15 02:51, Stefan Schulze Frielinghaus wrote: >> >> > Hi everyone, >> >> > >> >> > I have bootstrapped and regtested the patch below on s390. For the >> >> > 64-bit target I do not see any changes regarding the testsuite. For the >> >> > 31-bit target I see the following failures: >> >> > >> >> > FAIL: gcc.dg/vect/no-scevccp-outer-14.c (internal compiler error: in require, at machmode.h:313) >> >> > FAIL: gcc.dg/vect/no-scevccp-outer-14.c (test for excess errors) >> >> > FAIL: gcc.dg/vect/pr50451.c (internal compiler error: in require, at machmode.h:313) >> >> > FAIL: gcc.dg/vect/pr50451.c (test for excess errors) >> >> > FAIL: gcc.dg/vect/pr50451.c -flto -ffat-lto-objects (internal compiler error: in require, at machmode.h:313) >> >> > FAIL: gcc.dg/vect/pr50451.c -flto -ffat-lto-objects (test for excess errors) >> >> > FAIL: gcc.dg/vect/pr53773.c (internal compiler error: in require, at machmode.h:313) >> >> > FAIL: gcc.dg/vect/pr53773.c (test for excess errors) >> >> > FAIL: gcc.dg/vect/pr53773.c -flto -ffat-lto-objects (internal compiler error: in require, at machmode.h:313) >> >> > FAIL: gcc.dg/vect/pr53773.c -flto -ffat-lto-objects (test for excess errors) >> >> > FAIL: gcc.dg/vect/pr71407.c (internal compiler error: in require, at machmode.h:313) >> >> > FAIL: gcc.dg/vect/pr71407.c (test for excess errors) >> >> > FAIL: gcc.dg/vect/pr71407.c -flto -ffat-lto-objects (internal compiler error: in require, at machmode.h:313) >> >> > FAIL: gcc.dg/vect/pr71407.c -flto -ffat-lto-objects (test for excess errors) >> >> > FAIL: gcc.dg/vect/pr71416-1.c (internal compiler error: in require, at machmode.h:313) >> >> > FAIL: gcc.dg/vect/pr71416-1.c (test for excess errors) >> >> > FAIL: gcc.dg/vect/pr71416-1.c -flto -ffat-lto-objects (internal compiler error: in require, at machmode.h:313) >> >> > FAIL: gcc.dg/vect/pr71416-1.c -flto -ffat-lto-objects (test for excess errors) >> >> > FAIL: gcc.dg/vect/pr94443.c (internal compiler error: in require, at machmode.h:313) >> >> > FAIL: gcc.dg/vect/pr94443.c (test for excess errors) >> >> > FAIL: gcc.dg/vect/pr94443.c -flto -ffat-lto-objects (internal compiler error: in require, at machmode.h:313) >> >> > FAIL: gcc.dg/vect/pr94443.c -flto -ffat-lto-objects (test for excess errors) >> >> > FAIL: gcc.dg/vect/pr97558.c (internal compiler error: in require, at machmode.h:313) >> >> > FAIL: gcc.dg/vect/pr97558.c (test for excess errors) >> >> > FAIL: gcc.dg/vect/pr97558.c -flto -ffat-lto-objects (internal compiler error: in require, at machmode.h:313) >> >> > FAIL: gcc.dg/vect/pr97558.c -flto -ffat-lto-objects (test for excess errors) >> >> > FAIL: gcc.dg/vect/vect-reduc-pattern-3.c -flto -ffat-lto-objects (internal compiler error: in require, at machmode.h:313) >> >> > FAIL: gcc.dg/vect/vect-reduc-pattern-3.c -flto -ffat-lto-objects (test for excess errors) >> >> > UNRESOLVED: gcc.dg/vect/no-scevccp-outer-14.c compilation failed to produce executable >> >> > UNRESOLVED: gcc.dg/vect/pr53773.c -flto -ffat-lto-objects scan-tree-dump-times optimized "\\* 10" 2 >> >> > UNRESOLVED: gcc.dg/vect/pr53773.c scan-tree-dump-times optimized "\\* 10" 2 >> >> > UNRESOLVED: gcc.dg/vect/pr71416-1.c -flto -ffat-lto-objects compilation failed to produce executable >> >> > UNRESOLVED: gcc.dg/vect/pr71416-1.c compilation failed to produce executable >> >> > UNRESOLVED: gcc.dg/vect/vect-reduc-pattern-3.c -flto -ffat-lto-objects compilation failed to produce executable >> >> > >> >> > I've randomely picked pr50451.c and ran gcc against it which results in: >> >> > >> >> > during GIMPLE pass: vect >> >> > dump file: pr50451.c.174t.vect >> >> > /gcc-verify-workdir/patched/src/gcc/testsuite/gcc.dg/vect/pr50451.c: In function ?foo?: >> >> > /gcc-verify-workdir/patched/src/gcc/testsuite/gcc.dg/vect/pr50451.c:5:1: internal compiler error: in require, at machmode.h:313 >> >> > 0x1265d21 opt_mode::require() const >> >> > /gcc-verify-workdir/patched/src/gcc/machmode.h:313 >> >> > 0x1d7e4e9 opt_mode::require() const >> >> > /gcc-verify-workdir/patched/src/gcc/vec.h:955 >> >> > 0x1d7e4e9 vect_verify_loop_lens >> >> > /gcc-verify-workdir/patched/src/gcc/tree-vect-loop.cc:1471 >> >> > 0x1da29ab vect_analyze_loop_2 >> >> > /gcc-verify-workdir/patched/src/gcc/tree-vect-loop.cc:2929 >> >> > 0x1da40c7 vect_analyze_loop_1 >> >> > /gcc-verify-workdir/patched/src/gcc/tree-vect-loop.cc:3330 >> >> > 0x1da499d vect_analyze_loop(loop*, vec_info_shared*) >> >> > /gcc-verify-workdir/patched/src/gcc/tree-vect-loop.cc:3484 >> >> > 0x1deed27 try_vectorize_loop_1 >> >> > /gcc-verify-workdir/patched/src/gcc/tree-vectorizer.cc:1064 >> >> > 0x1deed27 try_vectorize_loop >> >> > /gcc-verify-workdir/patched/src/gcc/tree-vectorizer.cc:1180 >> >> > 0x1def5c1 execute >> >> > /gcc-verify-workdir/patched/src/gcc/tree-vectorizer.cc:1296 >> >> > Please submit a full bug report, with preprocessed source (by using -freport-bug). >> >> > Please include the complete backtrace with any bug report. >> >> > See for instructions. >> >> > >> >> >> >> It looks like s390 supports variable index vec_extract at -m31 but >> >> no vector with length. It seems we need to further check the vector >> >> with length capability, with something like: >> >> >> >> diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc >> >> index 5ae9f69c7eb..ef754467baf 100644 >> >> --- a/gcc/tree-vect-loop.cc >> >> +++ b/gcc/tree-vect-loop.cc >> >> @@ -10327,10 +10327,11 @@ vectorizable_live_operation (vec_info *vinfo, stmt_vec_info stmt_info, >> >> vect_record_loop_mask (loop_vinfo, >> >> &LOOP_VINFO_MASKS (loop_vinfo), >> >> 1, vectype, NULL); >> >> - else if (can_vec_extract_var_idx_p ( >> >> + else if (get_len_load_store_mode (TYPE_MODE (vectype), true) >> >> + .exists () >> >> + && can_vec_extract_var_idx_p ( >> >> TYPE_MODE (vectype), TYPE_MODE (TREE_TYPE (vectype)))) >> >> - vect_record_loop_len (loop_vinfo, >> >> - &LOOP_VINFO_LENS (loop_vinfo), >> >> + vect_record_loop_len (loop_vinfo, &LOOP_VINFO_LENS (loop_vinfo), >> >> 1, vectype, 1); >> >> else >> >> { >> >> >> >> sigh, the formatting looks odd. >> > >> > I think the error is in vect_verify_loop_lens which assumes that >> > when we record _any_ length related op the target has to support >> > both len_load and len_store. Now that we have many other _len >> > functions that's certainly not true. >> > >> > Instead a vect_verify_loop_lens-local "fix" would be to not use >> > .require () but instead when !.exists () simply return false. >> > That would still effectively require both len-load and len-store >> > for any -len predicated loop, but at least avoid the ICE. >> >> Yeah, agree that would be the simplest workaround. But I think >> instead we should require vectorizable_load and vectorizable_store >> to record the bias that they want to use (perhaps in a hash_set?). >> Then vect_verify_loop_lens can return false if the set has more >> than one element. It can use a bias of 0 if the set is empty. > > But with all the other _LEN fns now also having a bias, never > quering it but using the one from len_{load,store} having > that supported looks like a requirement? OTOH if we would require > querying it for each used _LEN fn then supporting multiple > different biases wouldn't be an issue either? Yeah, the OTOH is what I think we should do. The bias stuff dates back to when loads and stores were the only LEN_* operations. The reason it checks both loads and stores is to ensure consistency between "all" length operations. If instead we took the position that LEN_LOAD determines the biases for everything, we wouldn't need to check LEN_STORE biases. But the biases need to be consistent because they're built into the IVs. In principle we could create a different LEN input for each bias, but that's pointless when no target needs it. Thanks, Richard