From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 82756 invoked by alias); 22 Feb 2018 07:42:50 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Received: (qmail 82747 invoked by uid 89); 22 Feb 2018 07:42:49 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-16.9 required=5.0 tests=BAYES_00,GIT_PATCH_1,GIT_PATCH_2,GIT_PATCH_3,SPF_PASS,T_RP_MATCHES_RCVD autolearn=ham version=3.3.2 spammy=nn, H*MI:41B8, H*M:41B8 X-HELO: mx2.suse.de Received: from mx2.suse.de (HELO mx2.suse.de) (195.135.220.15) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Thu, 22 Feb 2018 07:42:47 +0000 Received: from relay2.suse.de (charybdis-ext.suse.de [195.135.220.254]) by mx2.suse.de (Postfix) with ESMTP id 32B13ACC3; Thu, 22 Feb 2018 07:42:45 +0000 (UTC) Date: Thu, 22 Feb 2018 07:42:00 -0000 User-Agent: K-9 Mail for Android In-Reply-To: <20180221222836.GS5867@tucnak> References: <20180221222836.GS5867@tucnak> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable Subject: Re: [PATCH] Fix store merging (PR tree-optimization/84503) To: Jakub Jelinek ,Jeff Law CC: gcc-patches@gcc.gnu.org From: Richard Biener Message-ID: X-SW-Source: 2018-02/txt/msg01261.txt.bz2 On February 21, 2018 11:28:36 PM GMT+01:00, Jakub Jelinek wrote: >Hi! > >The long comment above the new check_no_overlap function below should >hopefully explain the problem. coalesce_immediate_stores is splitting >the >stores from the same base into groups that we are going to optimize >individually; for each successfully merged group we emit new stores at >the >location of the last store from that group. And >coalesce_immediate_stores >processes stores ordered primarily by increasing bit position and only >secondarily by the original ordering in the basic block. >On the following testcases, we have the unfortunate ordering of: > MEM[(long long int *)p_28] =3D 0; > MEM[(long long int *)p_28 + 8B] =3D 0; > MEM[(long long int *)p_28 + 16B] =3D 0; > MEM[(long long int *)p_28 + 24B] =3D 0; > _129 =3D (int) _130; > MEM[(int *)p_28 + 8B] =3D _129; > MEM[(int *)p_28].a =3D -1; >We already have > MEM[(long long int *)p_28] =3D 0; > MEM[(int *)p_28].a =3D -1; >in the first group (which is fine) and the following 2 stores to >consider >are > MEM[(long long int *)p_28 + 8B] =3D 0; >and > MEM[(int *)p_28 + 8B] =3D _129; >We add the first store to the group, which has the effect of emitting >the >merged stores at the location of former MEM[(int *)p_28].a =3D -1; >store. Then we process MEM[(int *)p_28 + 8B] =3D _129; (which was >handled >just as potential bswap possibility and as it overlaps with previous >and can't be merged with next either, it will be its own group and thus >reuse the original stmt; the net effect is that we've reordered the >MEM[(long long int *)p_28 + 8B] =3D 0; store from before the >MEM[(int *)p_28 + 8B] =3D _129; store to after it, but those two do >alias. > >Instead of detecting this situation late, where we could just throw >away the >whole group (which would mean we couldn't merge even the >MEM[(long long int *)p_28] =3D 0; and MEM[(int *)p_28].a =3D -1; stores) >this patch attempts to check before calling merge_into whether it will >not overlap with later stores we won't be able to emit in the same >group >and if it does, it terminates the group right away. > >I had to fix also the merged_store_group::merge_into width computation, >it was mistakenly just counting sum of the bits we've added to the >group, >but we really need the distance between the first bit in the group and >last >bit, including any gaps in between, but exclusing gaps before the start >and >after the end (still within the bitregion). > >Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? OK.=20 Richard.=20 >For 7.x I think we'll need something along the lines of PR82916 >instead, >while the same testcase fails there, we only handle stores with lhs >being >a constant and so the problem is that we aren't terminating the chain >when >we should on the variable store. > >2018-02-21 Jakub Jelinek > > PR tree-optimization/84503 > * gimple-ssa-store-merging.c (merged_store_group::merge_into): Compute > width as info->bitpos + info->bitsize - start. > (merged_store_group::merge_overlapping): Simplify width computation. > (check_no_overlap): New function. > (imm_store_chain_info::try_coalesce_bswap): Compute expected > start + width and last_order of the group, fail if check_no_overlap > fails. > (imm_store_chain_info::coalesce_immediate_stores): Don't merge info > to group if check_no_overlap fails. > > * gcc.dg/pr84503-1.c: New test. > * gcc.dg/pr84503-2.c: New test. > >--- gcc/gimple-ssa-store-merging.c.jj 2018-01-16 09:52:26.230235131 >+0100 >+++ gcc/gimple-ssa-store-merging.c 2018-02-21 20:13:45.239129599 +0100 >@@ -1891,12 +1891,11 @@ merged_store_group::do_merge (store_imme > void > merged_store_group::merge_into (store_immediate_info *info) > { >- unsigned HOST_WIDE_INT wid =3D info->bitsize; >/* Make sure we're inserting in the position we think we're inserting.=20 >*/ > gcc_assert (info->bitpos >=3D start + width > && info->bitregion_start <=3D bitregion_end); >=20 >- width +=3D wid; >+ width =3D info->bitpos + info->bitsize - start; > do_merge (info); > } >=20 >@@ -1909,7 +1908,7 @@ merged_store_group::merge_overlapping (s > { > /* If the store extends the size of the group, extend the width. */ > if (info->bitpos + info->bitsize > start + width) >- width +=3D info->bitpos + info->bitsize - (start + width); >+ width =3D info->bitpos + info->bitsize - start; >=20 > do_merge (info); > } >@@ -2304,6 +2303,55 @@ gather_bswap_load_refs (vec *refs, > } > } >=20 >+/* Check if there are any stores in M_STORE_INFO after index I >+ (where M_STORE_INFO must be sorted by sort_by_bitpos) that overlap >+ a potential group ending with END that have their order >+ smaller than LAST_ORDER. RHS_CODE is the kind of store in the >+ group. Return true if there are no such stores. >+ Consider: >+ MEM[(long long int *)p_28] =3D 0; >+ MEM[(long long int *)p_28 + 8B] =3D 0; >+ MEM[(long long int *)p_28 + 16B] =3D 0; >+ MEM[(long long int *)p_28 + 24B] =3D 0; >+ _129 =3D (int) _130; >+ MEM[(int *)p_28 + 8B] =3D _129; >+ MEM[(int *)p_28].a =3D -1; >+ We already have >+ MEM[(long long int *)p_28] =3D 0; >+ MEM[(int *)p_28].a =3D -1; >+ stmts in the current group and need to consider if it is safe to >+ add MEM[(long long int *)p_28 + 8B] =3D 0; store into the same group. >+ There is an overlap between that store and the MEM[(int *)p_28 + >8B] =3D _129; >+ store though, so if we add the MEM[(long long int *)p_28 + 8B] =3D 0; >+ into the group and merging of those 3 stores is successful, merged >+ stmts will be emitted at the latest store from that group, i.e. >+ LAST_ORDER, which is the MEM[(int *)p_28].a =3D -1; store. >+ The MEM[(int *)p_28 + 8B] =3D _129; store that originally follows >+ the MEM[(long long int *)p_28 + 8B] =3D 0; would now be before it, >+ so we need to refuse merging MEM[(long long int *)p_28 + 8B] =3D 0; >+ into the group. That way it will be its own store group and will >+ not be touched. If RHS_CODE is INTEGER_CST and there are >overlapping >+ INTEGER_CST stores, those are mergeable using merge_overlapping, >+ so don't return false for those. */ >+ >+static bool >+check_no_overlap (vec m_store_info, unsigned >int i, >+ enum tree_code rhs_code, unsigned int last_order, >+ unsigned HOST_WIDE_INT end) >+{ >+ unsigned int len =3D m_store_info.length (); >+ for (++i; i < len; ++i) >+ { >+ store_immediate_info *info =3D m_store_info[i]; >+ if (info->bitpos >=3D end) >+ break; >+ if (info->order < last_order >+ && (rhs_code !=3D INTEGER_CST || info->rhs_code !=3D INTEGER_CST)) >+ return false; >+ } >+ return true; >+} >+ > /* Return true if m_store_info[first] and at least one following store > form a group which store try_size bitsize value which is byte swapped > from a memory load or some value, or identity from some value. >@@ -2375,6 +2423,7 @@ imm_store_chain_info::try_coalesce_bswap > unsigned int last_order =3D merged_store->last_order; > gimple *first_stmt =3D merged_store->first_stmt; > gimple *last_stmt =3D merged_store->last_stmt; >+ unsigned HOST_WIDE_INT end =3D merged_store->start + >merged_store->width; > store_immediate_info *infof =3D m_store_info[first]; >=20 > for (unsigned int i =3D first; i <=3D last; ++i) >@@ -2413,25 +2462,23 @@ imm_store_chain_info::try_coalesce_bswap > } > else > { >- if (n.base_addr) >+ if (n.base_addr && n.vuse !=3D this_n.vuse) > { >- if (n.vuse !=3D this_n.vuse) >- { >- if (vuse_store =3D=3D 0) >- return false; >- vuse_store =3D 1; >- } >- if (info->order > last_order) >- { >- last_order =3D info->order; >- last_stmt =3D info->stmt; >- } >- else if (info->order < first_order) >- { >- first_order =3D info->order; >- first_stmt =3D info->stmt; >- } >+ if (vuse_store =3D=3D 0) >+ return false; >+ vuse_store =3D 1; > } >+ if (info->order > last_order) >+ { >+ last_order =3D info->order; >+ last_stmt =3D info->stmt; >+ } >+ else if (info->order < first_order) >+ { >+ first_order =3D info->order; >+ first_stmt =3D info->stmt; >+ } >+ end =3D MAX (end, info->bitpos + info->bitsize); >=20 > ins_stmt =3D perform_symbolic_merge (ins_stmt, &n, info->ins_stmt, > &this_n, &n); >@@ -2452,6 +2499,9 @@ imm_store_chain_info::try_coalesce_bswap > if (n.base_addr =3D=3D NULL_TREE && !is_gimple_val (n.src)) > return false; >=20 >+ if (!check_no_overlap (m_store_info, last, LROTATE_EXPR, last_order, >end)) >+ return false; >+ > /* Don't handle memory copy this way if normal non-bswap processing > would handle it too. */ > if (n.n =3D=3D cmpnop && (unsigned) n.n_ops =3D=3D last - first + 1) >@@ -2633,7 +2683,13 @@ imm_store_chain_info::coalesce_immediate > : !info->ops[0].base_addr) > && (infof->ops[1].base_addr > ? compatible_load_p (merged_store, info, base_addr, 1) >- : !info->ops[1].base_addr)) >+ : !info->ops[1].base_addr) >+ && check_no_overlap (m_store_info, i, info->rhs_code, >+ MAX (merged_store->last_order, >+ info->order), >+ MAX (merged_store->start >+ + merged_store->width, >+ info->bitpos + info->bitsize))) > { > merged_store->merge_into (info); > continue; >--- gcc/testsuite/gcc.dg/pr84503-1.c.jj 2018-02-21 20:36:33.474226039 >+0100 >+++ gcc/testsuite/gcc.dg/pr84503-1.c 2018-02-21 20:20:53.144165116 >+0100 >@@ -0,0 +1,68 @@ >+/* PR tree-optimization/84503 */ >+/* { dg-do run } */ >+/* { dg-options "-O3" } */ >+ >+typedef __SIZE_TYPE__ size_t; >+typedef __UINTPTR_TYPE__ uintptr_t; >+ >+struct S { int a; unsigned short b; int c, d, e; long f, g, h; int i, >j; }; >+static struct S *k; >+static size_t l =3D 0; >+int m; >+ >+static int >+bar (void) >+{ >+ unsigned i; >+ int j; >+ if (k[0].c =3D=3D 0) >+ { >+ ++m; >+ size_t n =3D l * 2; >+ struct S *o; >+ o =3D (struct S *) __builtin_realloc (k, sizeof (struct S) * n); >+ if (!o) >+ __builtin_exit (0); >+ k =3D o; >+ for (i =3D l; i < n; i++) >+ { >+ void *p =3D (void *) &k[i]; >+ int q =3D 0; >+ size_t r =3D sizeof (struct S); >+ if ((((uintptr_t) p) % __alignof__ (long)) =3D=3D 0 >+ && r % sizeof (long) =3D=3D 0) >+ { >+ long __attribute__ ((may_alias)) *s =3D (long *) p; >+ long *t =3D (long *) ((char *) s + r); >+ while (s < t) >+ *s++ =3D 0; >+ } >+ else >+ __builtin_memset (p, q, r); >+ k[i].c =3D i + 1; >+ k[i].a =3D -1; >+ } >+ k[n - 1].c =3D 0; >+ k[0].c =3D l; >+ l =3D n; >+ } >+ j =3D k[0].c; >+ k[0].c =3D k[j].c; >+ return j; >+} >+ >+int >+main () >+{ >+ k =3D (struct S *) __builtin_malloc (sizeof (struct S)); >+ if (!k) >+ __builtin_exit (0); >+ __builtin_memset (k, '\0', sizeof (struct S)); >+ k->a =3D -1; >+ l =3D 1; >+ for (int i =3D 0; i < 15; ++i) >+ bar (); >+ if (m !=3D 4) >+ __builtin_abort (); >+ return 0; >+} >--- gcc/testsuite/gcc.dg/pr84503-2.c.jj 2018-02-21 20:36:41.874226585 >+0100 >+++ gcc/testsuite/gcc.dg/pr84503-2.c 2018-02-21 20:36:59.875227757 >+0100 >@@ -0,0 +1,5 @@ >+/* PR tree-optimization/84503 */ >+/* { dg-do run } */ >+/* { dg-options "-O3 -fno-tree-vectorize -fno-ivopts" } */ >+ >+#include "pr84503-1.c" > > Jakub