From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <4166b06c-7713-2d4a-3c86-54e99f4a9f53@linux.ibm.com>
Date: Fri, 12 Aug 2022 12:00:45 +0200
From: Robin Dapp
Subject: [PATCH] s390: Add -munroll-only-small-loops.
To: GCC Patches
List-Id: Gcc-patches mailing list
Content-Type: text/plain; charset=UTF-8

Hi,

inspired by Power we also introduce -munroll-only-small-loops.  This
implies activating -funroll-loops and -munroll-only-small-loops at -O2
and above.

Bootstrapped and regtested.  This introduces one regression in
gcc.dg/sms-compare-debug-1.c but currently dumps for sms are broken as
well.  The difference is in the location of some INSN_DELETED notes so I
would consider this a minor issue.

Is it OK?

Regards
 Robin

gcc/ChangeLog:

	* common/config/s390/s390-common.cc: Enable -funroll-loops and
	-munroll-only-small-loops for OPT_LEVELS_2_PLUS_SPEED_ONLY.
	* config/s390/s390.cc (s390_loop_unroll_adjust): Do not unroll
	loops larger than 12 instructions.
	(s390_override_options_after_change): Set unroll options.
	(s390_option_override_internal): Likewise.
	* config/s390/s390.opt: Document munroll-only-small-loops.

gcc/testsuite/ChangeLog:

	* gcc.target/s390/vector/vec-copysign.c: Do not unroll.
	* gcc.target/s390/zvector/autovec-double-quiet-uneq.c: Ditto.
	* gcc.target/s390/zvector/autovec-double-signaling-ltgt.c: Ditto.
	* gcc.target/s390/zvector/autovec-float-quiet-uneq.c: Ditto.
	* gcc.target/s390/zvector/autovec-float-signaling-ltgt.c: Ditto.
---
 gcc/common/config/s390/s390-common.cc         |  5 +++
 gcc/config/s390/s390.cc                       | 31 +++++++++++++++++++
 gcc/config/s390/s390.opt                      |  4 +++
 .../gcc.target/s390/vector/vec-copysign.c     |  2 +-
 .../s390/zvector/autovec-double-quiet-uneq.c  |  2 +-
 .../zvector/autovec-double-signaling-ltgt.c   |  2 +-
 .../s390/zvector/autovec-float-quiet-uneq.c   |  2 +-
 .../zvector/autovec-float-signaling-ltgt.c    |  2 +-
 8 files changed, 45 insertions(+), 5 deletions(-)

diff --git a/gcc/common/config/s390/s390-common.cc b/gcc/common/config/s390/s390-common.cc
index 72a5ef47eaac..be3e6f201429 100644
--- a/gcc/common/config/s390/s390-common.cc
+++ b/gcc/common/config/s390/s390-common.cc
@@ -64,6 +64,11 @@ static const struct default_options s390_option_optimization_table[] =
   /* Enable -fsched-pressure by default when optimizing.  */
   { OPT_LEVELS_1_PLUS, OPT_fsched_pressure, NULL, 1 },
 
+  /* Enable -munroll-only-small-loops with -funroll-loops to unroll small
+     loops at -O2 and above by default.  */
+  { OPT_LEVELS_2_PLUS_SPEED_ONLY, OPT_funroll_loops, NULL, 1 },
+  { OPT_LEVELS_2_PLUS_SPEED_ONLY, OPT_munroll_only_small_loops, NULL, 1 },
+
   /* ??? There are apparently still problems with -fcaller-saves.  */
   { OPT_LEVELS_ALL, OPT_fcaller_saves, NULL, 0 },
 
diff --git a/gcc/config/s390/s390.cc b/gcc/config/s390/s390.cc
index 5644600edf3d..ef38fbe68c84 100644
--- a/gcc/config/s390/s390.cc
+++ b/gcc/config/s390/s390.cc
@@ -15457,6 +15457,21 @@ s390_loop_unroll_adjust (unsigned nunroll, struct loop *loop)
   if (s390_tune < PROCESSOR_2097_Z10)
     return nunroll;
 
+  if (unroll_only_small_loops)
+    {
+      /* Only unroll loops smaller than or equal to 12 insns.  */
+      const unsigned int small_threshold = 12;
+
+      if (loop->ninsns > small_threshold)
+	return 0;
+
+      /* ???: Make this dependent on the type of registers in
+	 the loop.  Increase the limit for vector registers.  */
+      const unsigned int max_insns = optimize >= 3 ? 36 : 24;
+
+      nunroll = MIN (nunroll, max_insns / loop->ninsns);
+    }
+
   /* Count the number of memory references within the loop body.  */
   bbs = get_loop_body (loop);
   subrtx_iterator::array_type array;
@@ -15531,6 +15546,19 @@ static void
 s390_override_options_after_change (void)
 {
   s390_default_align (&global_options);
+
+  /* Explicit -funroll-loops turns -munroll-only-small-loops off.  */
+  if ((OPTION_SET_P (flag_unroll_loops) && flag_unroll_loops)
+      || (OPTION_SET_P (flag_unroll_all_loops)
+	  && flag_unroll_all_loops))
+    {
+      if (!OPTION_SET_P (unroll_only_small_loops))
+	unroll_only_small_loops = 0;
+      if (!OPTION_SET_P (flag_cunroll_grow_size))
+	flag_cunroll_grow_size = 1;
+    }
+  else if (!OPTION_SET_P (flag_cunroll_grow_size))
+    flag_cunroll_grow_size = flag_peel_loops || optimize >= 3;
 }
 
 static void
@@ -15740,6 +15768,9 @@ s390_option_override_internal (struct gcc_options *opts,
   /* Set the default alignment.  */
   s390_default_align (opts);
 
+  /* Set unroll options.  */
+  s390_override_options_after_change ();
+
   /* Call target specific restore function to do post-init work.  At the moment,
      this just sets opts->x_s390_cost_pointer.  */
   s390_function_specific_restore (opts, opts_set, NULL);
diff --git a/gcc/config/s390/s390.opt b/gcc/config/s390/s390.opt
index 9e8d3bfd404c..c375b9c5f729 100644
--- a/gcc/config/s390/s390.opt
+++ b/gcc/config/s390/s390.opt
@@ -321,3 +321,7 @@ and the default behavior is to emit separate multiplication and addition
 instructions for long doubles in vector registers, because measurements show
 that this improves performance.  This option allows overriding it for testing
 purposes.
+
+munroll-only-small-loops
+Target Undocumented Var(unroll_only_small_loops) Init(0) Save
+; Use conservative small loop unrolling.
diff --git a/gcc/testsuite/gcc.target/s390/vector/vec-copysign.c b/gcc/testsuite/gcc.target/s390/vector/vec-copysign.c
index 64c6970c23e2..b723ceb13be9 100644
--- a/gcc/testsuite/gcc.target/s390/vector/vec-copysign.c
+++ b/gcc/testsuite/gcc.target/s390/vector/vec-copysign.c
@@ -1,5 +1,5 @@
 /* { dg-do compile { target { s390*-*-* } } } */
-/* { dg-options "-O2 -ftree-vectorize -mzarch" } */
+/* { dg-options "-O2 -ftree-vectorize -mzarch -fno-unroll-loops" } */
 /* { dg-final { scan-assembler-times "vgmg" 1 } } */
 /* { dg-final { scan-assembler-times "vgmf" 1 } } */
 /* { dg-final { scan-assembler-times "vsel" 2 } } */
diff --git a/gcc/testsuite/gcc.target/s390/zvector/autovec-double-quiet-uneq.c b/gcc/testsuite/gcc.target/s390/zvector/autovec-double-quiet-uneq.c
index 7c9b20fd2e0f..8948be28ed5d 100644
--- a/gcc/testsuite/gcc.target/s390/zvector/autovec-double-quiet-uneq.c
+++ b/gcc/testsuite/gcc.target/s390/zvector/autovec-double-quiet-uneq.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O3 -march=z13 -mzvector -mzarch" } */
+/* { dg-options "-O3 -march=z13 -mzvector -mzarch -fno-unroll-loops" } */
 
 #include "autovec.h"
 
diff --git a/gcc/testsuite/gcc.target/s390/zvector/autovec-double-signaling-ltgt.c b/gcc/testsuite/gcc.target/s390/zvector/autovec-double-signaling-ltgt.c
index 9dfae8f2f7e7..9417b0c4838f 100644
--- a/gcc/testsuite/gcc.target/s390/zvector/autovec-double-signaling-ltgt.c
+++ b/gcc/testsuite/gcc.target/s390/zvector/autovec-double-signaling-ltgt.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O3 -march=z14 -mzvector -mzarch" } */
+/* { dg-options "-O3 -march=z14 -mzvector -mzarch -fno-unroll-loops" } */
 
 #include "autovec.h"
 
diff --git a/gcc/testsuite/gcc.target/s390/zvector/autovec-float-quiet-uneq.c b/gcc/testsuite/gcc.target/s390/zvector/autovec-float-quiet-uneq.c
index 5ab9337880d0..0a2aca0d5dd3 100644
--- a/gcc/testsuite/gcc.target/s390/zvector/autovec-float-quiet-uneq.c
+++ b/gcc/testsuite/gcc.target/s390/zvector/autovec-float-quiet-uneq.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O3 -march=z14 -mzvector -mzarch" } */
+/* { dg-options "-O3 -march=z14 -mzvector -mzarch -fno-unroll-loops" } */
 
 #include "autovec.h"
 
diff --git a/gcc/testsuite/gcc.target/s390/zvector/autovec-float-signaling-ltgt.c b/gcc/testsuite/gcc.target/s390/zvector/autovec-float-signaling-ltgt.c
index c34cf0916087..15e61b70b0bd 100644
--- a/gcc/testsuite/gcc.target/s390/zvector/autovec-float-signaling-ltgt.c
+++ b/gcc/testsuite/gcc.target/s390/zvector/autovec-float-signaling-ltgt.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O3 -march=z14 -mzvector -mzarch" } */
+/* { dg-options "-O3 -march=z14 -mzvector -mzarch -fno-unroll-loops" } */
 
 #include "autovec.h"
 
-- 
2.31.1
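
The cap added to s390_loop_unroll_adjust can be modelled in isolation to
see what unroll factors it yields.  The following is only a sketch, not
part of the patch: small_loop_unroll_cap and min_uint are hypothetical
names, plain integers stand in for GCC's struct loop and the optimize
variable, and the s390_tune check and the memory-reference counting that
surround the new code in s390.cc are left out.

```c
#include <assert.h>

/* Hypothetical helper; the patch itself uses GCC's MIN macro.  */
static unsigned int
min_uint (unsigned int a, unsigned int b)
{
  return a < b ? a : b;
}

/* Stand-alone model of the block guarded by unroll_only_small_loops.
   NUNROLL is the unroll factor proposed by the caller, NINSNS the number
   of insns in the loop body and OPT_LEVEL the -O level.  All three
   parameter names are assumptions made for this sketch.  */
unsigned int
small_loop_unroll_cap (unsigned int nunroll, unsigned int ninsns,
                       int opt_level)
{
  /* Same constants as in the patch.  */
  const unsigned int small_threshold = 12;
  const unsigned int max_insns = opt_level >= 3 ? 36 : 24;

  /* The sketch requires a non-empty loop body to avoid dividing by 0.  */
  assert (ninsns > 0);

  /* Loops with more than 12 insns get a factor of 0, as in the patch.  */
  if (ninsns > small_threshold)
    return 0;

  /* Keep the unrolled body within max_insns insns.  */
  return min_uint (nunroll, max_insns / ninsns);
}
```

With a proposed factor of 8, a 4-insn loop is capped at 6 at -O2
(24 / 4) and keeps 8 at -O3 (36 / 4 = 9), a 12-insn loop gets 2 at -O2
and 3 at -O3, and a 13-insn loop gets 0.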