From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from server.nextmovesoftware.com (server.nextmovesoftware.com [162.254.253.69]) by sourceware.org (Postfix) with ESMTPS id A77C7385802E for ; Mon, 15 Nov 2021 13:03:08 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org A77C7385802E Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=nextmovesoftware.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=nextmovesoftware.com DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=nextmovesoftware.com; s=default; h=Content-Type:MIME-Version:Message-ID: Date:Subject:To:From:Sender:Reply-To:Cc:Content-Transfer-Encoding:Content-ID: Content-Description:Resent-Date:Resent-From:Resent-Sender:Resent-To:Resent-Cc :Resent-Message-ID:In-Reply-To:References:List-Id:List-Help:List-Unsubscribe: List-Subscribe:List-Post:List-Owner:List-Archive; bh=kznyj6x8ekTGAAyGsexgfhIWm50ZR3gPxGIZQQpVuhc=; b=fh4yyDEw1+ZSe5pcM1owa0vAIB Dh+aUclJtt1wpMs5GcnBgHXu/1S+jzwGR+4TsB0PpgCouevd5vzHmQ/5R97wnERaPEuMRQcsHIppZ 09TtZKCFQDHpMvCVi1mRZj5ZeLQ6U/XOc4uE2aARGxnkgB+0XkVQ+F7uBHqGIwdq82LM3tczomSru GJad45l75dEfCLpSJlQYxWV29Xo/gaIP/cxyBdLngkcHhME7Xm9XqpbWedR+QDUlJjo1XbM7yK8SM r9OAioYc1Ny6kJFIpbY8GmagLLdiwTUN79ItanniUnH4kSSgBHdyCewraePFYehLMNR2fZpQb4rNB IyLua5bA==; Received: from host81-157-90-156.range81-157.btcentralplus.com ([81.157.90.156]:60533 helo=Dell) by server.nextmovesoftware.com with esmtpsa (TLS1.2) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1mmbdb-0001Vf-WD for gcc-patches@gcc.gnu.org; Mon, 15 Nov 2021 08:03:08 -0500 From: "Roger Sayle" To: "'GCC Patches'" Subject: [PATCH] ivopts: Improve code generated for very simple loops. Date: Mon, 15 Nov 2021 13:03:06 -0000 Message-ID: <079f01d7da21$22c264a0$68472de0$@nextmovesoftware.com> MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="----=_NextPart_000_07A0_01D7DA21.22C4D5A0" X-Mailer: Microsoft Outlook 16.0 Thread-Index: AdfaIAt/QkYz07oNRkiibY/KdyeBOg== Content-Language: en-gb X-AntiAbuse: This header was added to track abuse, please include it with any abuse report X-AntiAbuse: Primary Hostname - server.nextmovesoftware.com X-AntiAbuse: Original Domain - gcc.gnu.org X-AntiAbuse: Originator/Caller UID/GID - [47 12] / [47 12] X-AntiAbuse: Sender Address Domain - nextmovesoftware.com X-Get-Message-Sender-Via: server.nextmovesoftware.com: authenticated_id: roger@nextmovesoftware.com X-Authenticated-Sender: server.nextmovesoftware.com: roger@nextmovesoftware.com X-Source: X-Source-Args: X-Source-Dir: X-Spam-Status: No, score=-12.0 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, KAM_SHORT, SCC_5_SHORT_WORD_LINES, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.4 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 15 Nov 2021 13:03:10 -0000 This is a multipart message in MIME format. ------=_NextPart_000_07A0_01D7DA21.22C4D5A0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit This patch tidies up the code that GCC generates for simple loops, by selecting/generating a simpler loop bound expression in ivopts. The original motivation came from looking at the following loop (from gcc.target/i386/pr90178.c) int *find_ptr (int* mem, int sz, int val) { for (int i = 0; i < sz; i++) if (mem[i] == val) return &mem[i]; return 0; } which GCC currently compiles to: find_ptr: movq %rdi, %rax testl %esi, %esi jle .L4 leal -1(%rsi), %ecx leaq 4(%rdi,%rcx,4), %rcx jmp .L3 .L7: addq $4, %rax cmpq %rcx, %rax je .L4 .L3: cmpl %edx, (%rax) jne .L7 ret .L4: xorl %eax, %eax ret Notice the relatively complex leal/leaq instructions, that result from ivopts using the following expression for the loop bound: inv_expr 2: ((unsigned long) ((unsigned int) sz_8(D) + 4294967295) * 4 + (unsigned long) mem_9(D)) + 4 which results from NITERS being (unsigned int) sz_8(D) + 4294967295, i.e. (sz - 1), and the logic in cand_value_at determining the bound as BASE + NITERS*STEP at the start of the final iteration and as BASE + NITERS*STEP + STEP at the end of the final iteration. Ideally, we'd like the middle-end optimizers to simplify BASE + NITERS*STEP + STEP as BASE + (NITERS+1)*STEP, especially when NITERS already has the form BOUND-1, but with type conversions and possible overflow to worry about, the above "inv_expr 2" is the best that can be done by fold (without additional context information). This patch improves ivopts' cand_value_at by instead of using just the tree expression for NITERS, passing the data structure that explains how that expression was derived. This allows us to peek under the surface to check that NITERS+1 doesn't overflow, and in this patch to use the SSA_NAME already holding the required value. In the motivating loop above, inv_expr 2 now becomes: (unsigned long) sz_8(D) * 4 + (unsigned long) mem_9(D) And as a result, on x86_64 we now generate: find_ptr: movq %rdi, %rax testl %esi, %esi jle .L4 movslq %esi, %rsi leaq (%rdi,%rsi,4), %rcx jmp .L3 .L7: addq $4, %rax cmpq %rcx, %rax je .L4 .L3: cmpl %edx, (%rax) jne .L7 ret .L4: xorl %eax, %eax ret This improvement required one minor tweak to GCC's testsuite for gcc.dg/wrapped-binop-simplify.c, where we again generate better code, and therefore no longer find as many optimization opportunities in later passes (vrp2). Previously: void v1 (unsigned long *in, unsigned long *out, unsigned int n) { int i; for (i = 0; i < n; i++) { out[i] = in[i]; } } on x86_64 generated: v1: testl %edx, %edx je .L1 movl %edx, %edx xorl %eax, %eax .L3: movq (%rdi,%rax,8), %rcx movq %rcx, (%rsi,%rax,8) addq $1, %rax cmpq %rax, %rdx jne .L3 .L1: ret and now instead generates: v1: testl %edx, %edx je .L1 movl %edx, %edx xorl %eax, %eax leaq 0(,%rdx,8), %rcx .L3: movq (%rdi,%rax), %rdx movq %rdx, (%rsi,%rax) addq $8, %rax cmpq %rax, %rcx jne .L3 .L1: ret This patch has been tested on x86_64-pc-linux-gnu with a make bootstrap and make -k check with no new failures. Ok for mainline? 2021-11-15 Roger Sayle gcc/ChangeLog * tree-ssa-loop-ivopts.c (cand_value_at): Take a class tree_niter_desc* argument instead of just a tree for NITER. If we require the iv candidate value at the end of the final loop iteration, try using the original loop bound as the NITER for sufficiently simple loops. (may_eliminate_iv): Update (only) call to cand_value_at. gcc/testsuite * gcc.dg/wrapped-binop-simplify.c: Update expected test result. Thanks in advance, Roger -- ------=_NextPart_000_07A0_01D7DA21.22C4D5A0 Content-Type: text/plain; name="patchn.txt" Content-Transfer-Encoding: quoted-printable Content-Disposition: attachment; filename="patchn.txt" diff --git a/gcc/tree-ssa-loop-ivopts.c b/gcc/tree-ssa-loop-ivopts.c=0A= index 4a498ab..cc81196 100644=0A= --- a/gcc/tree-ssa-loop-ivopts.c=0A= +++ b/gcc/tree-ssa-loop-ivopts.c=0A= @@ -5034,28 +5034,48 @@ determine_group_iv_cost_address (struct = ivopts_data *data,=0A= return !sum_cost.infinite_cost_p ();=0A= }=0A= =0A= -/* Computes value of candidate CAND at position AT in iteration NITER, = and=0A= - stores it to VAL. */=0A= +/* Computes value of candidate CAND at position AT in iteration = DESC->NITER,=0A= + and stores it to VAL. */=0A= =0A= static void=0A= -cand_value_at (class loop *loop, struct iv_cand *cand, gimple *at, tree = niter,=0A= - aff_tree *val)=0A= +cand_value_at (class loop *loop, struct iv_cand *cand, gimple *at,=0A= + class tree_niter_desc *desc, aff_tree *val)=0A= {=0A= aff_tree step, delta, nit;=0A= struct iv *iv =3D cand->iv;=0A= tree type =3D TREE_TYPE (iv->base);=0A= + tree niter =3D desc->niter;=0A= + bool after_adjust =3D stmt_after_increment (loop, cand, at);=0A= tree steptype;=0A= +=0A= if (POINTER_TYPE_P (type))=0A= steptype =3D sizetype;=0A= else=0A= steptype =3D unsigned_type_for (type);=0A= =0A= + /* If AFTER_ADJUST is required, the code below generates the = equivalent=0A= + * of BASE + NITER * STEP + STEP, when ideally we'd prefer the = expression=0A= + * BASE + (NITER + 1) * STEP, especially when NITER is often of the = form=0A= + * SSA_NAME - 1. Unfortunately, guaranteeing that adding 1 to NITER=0A= + * doesn't overflow is tricky, so we peek inside the TREE_NITER_DESC=0A= + * class for common idioms that we know are safe. */=0A= + if (after_adjust=0A= + && desc->control.no_overflow=0A= + && integer_onep (desc->control.step)=0A= + && integer_onep (desc->control.base)=0A= + && desc->cmp =3D=3D LT_EXPR=0A= + && TREE_CODE (desc->bound) =3D=3D SSA_NAME)=0A= + {=0A= + niter =3D desc->bound;=0A= + after_adjust =3D false;=0A= + }=0A= +=0A= tree_to_aff_combination (iv->step, TREE_TYPE (iv->step), &step);=0A= aff_combination_convert (&step, steptype);=0A= tree_to_aff_combination (niter, TREE_TYPE (niter), &nit);=0A= aff_combination_convert (&nit, steptype);=0A= aff_combination_mult (&nit, &step, &delta);=0A= - if (stmt_after_increment (loop, cand, at))=0A= + if (after_adjust)=0A= aff_combination_add (&delta, &step);=0A= =0A= tree_to_aff_combination (iv->base, type, val);=0A= @@ -5406,7 +5426,7 @@ may_eliminate_iv (struct ivopts_data *data,=0A= return true;=0A= }=0A= =0A= - cand_value_at (loop, cand, use->stmt, desc->niter, &bnd);=0A= + cand_value_at (loop, cand, use->stmt, desc, &bnd);=0A= =0A= *bound =3D fold_convert (TREE_TYPE (cand->iv->base),=0A= aff_combination_to_tree (&bnd));=0A= diff --git a/gcc/testsuite/gcc.dg/wrapped-binop-simplify.c = b/gcc/testsuite/gcc.dg/wrapped-binop-simplify.c=0A= index a5d953b..706eed8 100644=0A= --- a/gcc/testsuite/gcc.dg/wrapped-binop-simplify.c=0A= +++ b/gcc/testsuite/gcc.dg/wrapped-binop-simplify.c=0A= @@ -1,6 +1,6 @@=0A= /* { dg-do compile { target { { i?86-*-* x86_64-*-* s390*-*-* } && lp64 = } } } */=0A= /* { dg-options "-O2 -fdump-tree-vrp2-details" } */=0A= -/* { dg-final { scan-tree-dump-times "gimple_simplified to" 4 "vrp2" } = } */=0A= +/* { dg-final { scan-tree-dump-times "gimple_simplified to" 1 "vrp2" } = } */=0A= =0A= void v1 (unsigned long *in, unsigned long *out, unsigned int n)=0A= {=0A= ------=_NextPart_000_07A0_01D7DA21.22C4D5A0--