From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 82737 invoked by alias); 24 Jun 2015 16:50:16 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Received: (qmail 82725 invoked by uid 89); 24 Jun 2015 16:50:16 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=1.0 required=5.0 tests=AWL,BAYES_50,KAM_LAZY_DOMAIN_SECURITY,RP_MATCHES_RCVD,UNSUBSCRIBE_BODY autolearn=no version=3.3.2 X-HELO: foss.arm.com Received: from foss.arm.com (HELO foss.arm.com) (217.140.101.70) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Wed, 24 Jun 2015 16:50:11 +0000 Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.72.51.249]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 7D08F2A; Wed, 24 Jun 2015 09:50:32 -0700 (PDT) Received: from [10.2.206.27] (e105545-lin.cambridge.arm.com [10.2.206.27]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 4E9EB3F23A; Wed, 24 Jun 2015 09:50:08 -0700 (PDT) Message-ID: <558ADFC2.1070204@foss.arm.com> Date: Wed, 24 Jun 2015 16:51:00 -0000 From: Ramana Radhakrishnan User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.6.0 MIME-Version: 1.0 To: gcc-patches@gcc.gnu.org CC: s.pop@samsung.com, a.skolknik@samsung.com Subject: Re: fix PR46029: reimplement if conversion of loads and stores References: <20150612205047.GA27819@cc00.spa.sarc.sas> In-Reply-To: <20150612205047.GA27819@cc00.spa.sarc.sas> Content-Type: text/plain; charset=windows-1252; format=flowed Content-Transfer-Encoding: 7bit X-IsSubscribed: yes X-SW-Source: 2015-06/txt/msg01722.txt.bz2 On 12/06/15 21:50, Abe Skolnik wrote: > Hi everybody! > > In the current implementation of if conversion, loads and stores are > if-converted in a thread-unsafe way: > > * loads were always executed, even when they should have not been. > Some source code could be rendered invalid due to null pointers > that were OK in the original program because they were never > dereferenced. > > * writes were if-converted via load/maybe-modify/store, which > renders some code multithreading-unsafe. > > This patch reimplements if-conversion of loads and stores in a safe > way using a scratchpad allocated by the compiler on the stack: > > * loads are done through an indirection, reading either the correct > data from the correct source [if the condition is true] or reading > from the scratchpad and later ignoring this read result [if the > condition is false]. > > * writes are also done through an indirection, writing either to the > correct destination [if the condition is true] or to the > scratchpad [if the condition is false]. > > Vectorization of "if-cvt-stores-vect-ifcvt-18.c" disabled because the > old if-conversion resulted in unsafe code that could fail under > multithreading even though the as-written code _was_ thread-safe. > > Passed regression testing and bootstrap on amd64-linux. > Is this OK to commit to trunk? I can't approve or reject but this certainly looks like an improvement compared to where we are as we get rid of the data races. The only gotcha I can think with this approach is that this introduces false dependencies that would cause "unnecessary" write-after-write hazards with the writes to the scratchpad when you unroll the loop - but that's not necessarily worse than where we are today. Some fun stats from a previous Friday afternoon poke at this without doing any benchmarking as such. In a bootstrap with BOOT_CFLAGS="-O2 -ftree-loop-if-convert-stores" and one without it, I see about 12.20% more csel's on an AArch64 bootstrap (goes from 7898 -> 8862) vs plain old -O2. And I did see the one case in libquantum get sorted with this, though the performance results were funny let's say (+5% in one case, -1.5% on another core), I haven't analysed it deeply yet but it does look interesting. regards Ramana > > Regards, > > Abe > > > > > 2015-06-12 Sebastian Pop > Abe Skolnik > > PR tree-optimization/46029 > * tree-data-ref.c (struct data_ref_loc_d): Moved... > (get_references_in_stmt): Exported. > * tree-data-ref.h (struct data_ref_loc_d): ... here. > (get_references_in_stmt): Declared. > > * doc/invoke.texi (-ftree-loop-if-convert-stores): Update description. > * tree-if-conv.c (struct ifc_dr): Removed. > (IFC_DR): Removed. > (DR_WRITTEN_AT_LEAST_ONCE): Removed. > (DR_RW_UNCONDITIONALLY): Removed. > (memrefs_read_or_written_unconditionally): Removed. > (write_memrefs_written_at_least_once): Removed. > (ifcvt_could_trap_p): Does not take refs parameter anymore. > (ifcvt_memrefs_wont_trap): Removed. > (has_non_addressable_refs): New. > (if_convertible_gimple_assign_stmt_p): Call has_non_addressable_refs. > Removed use of refs. > (if_convertible_stmt_p): Removed use of refs. > (if_convertible_gimple_assign_stmt_p): Same. > (if_convertible_loop_p_1): Removed use of refs. Remove initialization > of dr->aux, DR_WRITTEN_AT_LEAST_ONCE, and DR_RW_UNCONDITIONALLY. > (insert_address_of): New. > (create_scratchpad): New. > (create_indirect_cond_expr): New. > (predicate_mem_writes): Call create_indirect_cond_expr. Take an extra > parameter for scratch_pad. > (combine_blocks): Same. > (tree_if_conversion): Same. > > testsuite/ > * g++.dg/tree-ssa/ifc-pr46029.C: New. > * gcc.dg/tree-ssa/ifc-5.c: Make it exactly like the FFmpeg kernel. > * gcc.dg/tree-ssa/ifc-8.c: New. > * gcc.dg/tree-ssa/ifc-9.c: New. > * gcc.dg/tree-ssa/ifc-10.c: New. > * gcc.dg/tree-ssa/ifc-11.c: New. > * gcc.dg/tree-ssa/ifc-12.c: New. > * gcc.dg/vect/if-cvt-stores-vect-ifcvt-18.c: Disabled. > * gcc.dg/vect/if-cvt-stores-vect-ifcvt-19.c: New. > --- > gcc/ChangeLog | 28 ++ > gcc/doc/invoke.texi | 18 +- > gcc/testsuite/g++.dg/tree-ssa/ifc-pr46029.C | 76 ++++ > gcc/testsuite/gcc.dg/tree-ssa/ifc-10.c | 17 + > gcc/testsuite/gcc.dg/tree-ssa/ifc-11.c | 16 + > gcc/testsuite/gcc.dg/tree-ssa/ifc-12.c | 13 + > gcc/testsuite/gcc.dg/tree-ssa/ifc-5.c | 19 +- > gcc/testsuite/gcc.dg/tree-ssa/ifc-8.c | 29 ++ > gcc/testsuite/gcc.dg/tree-ssa/ifc-9.c | 17 + > .../gcc.dg/vect/if-cvt-stores-vect-ifcvt-18.c | 10 +- > .../gcc.dg/vect/if-cvt-stores-vect-ifcvt-19.c | 46 +++ > gcc/tree-data-ref.c | 13 +- > gcc/tree-data-ref.h | 14 + > gcc/tree-if-conv.c | 392 +++++++++------------ > 14 files changed, 460 insertions(+), 248 deletions(-) > create mode 100644 gcc/testsuite/g++.dg/tree-ssa/ifc-pr46029.C > create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ifc-10.c > create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ifc-11.c > create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ifc-12.c > create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ifc-8.c > create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ifc-9.c > create mode 100644 gcc/testsuite/gcc.dg/vect/if-cvt-stores-vect-ifcvt-19.c > > diff --git a/gcc/ChangeLog b/gcc/ChangeLog > index 3dec6b1..70af07c 100644 > --- a/gcc/ChangeLog > +++ b/gcc/ChangeLog > @@ -1,3 +1,31 @@ > +2015-05-18 Sebastian Pop > + > + PR tree-optimization/46029 > + * doc/invoke.texi (-ftree-loop-if-convert-stores): Update description. > + * tree-if-conv.c (has_unaligned_memory_refs): New. > + (if_convertible_gimple_assign_stmt_p): Call has_unaligned_memory_refs. > + (create_scratchpad): New. > + (create_indirect_cond_expr): New. > + (predicate_mem_writes): Call create_indirect_cond_expr. Take an extra > + parameter for scratch_pad. > + (combine_blocks): Same. > + (tree_if_conversion): Same. > + (main_tree_if_conversion): Pass to tree_if_conversion a pointer to > + scratch_pad. > + (struct ifc_dr): Removed. > + (IFC_DR): Removed. > + (DR_WRITTEN_AT_LEAST_ONCE): Removed. > + (DR_RW_UNCONDITIONALLY): Removed. > + (memrefs_read_or_written_unconditionally): Removed. > + (write_memrefs_written_at_least_once): Removed. > + (ifcvt_memrefs_wont_trap): Removed. > + (ifcvt_could_trap_p): Does not take refs parameter anymore. > + (if_convertible_gimple_assign_stmt_p): Same. > + (if_convertible_stmt_p): Same. > + (if_convertible_loop_p_1): Remove initialization of dr->aux, > + DR_WRITTEN_AT_LEAST_ONCE, and DR_RW_UNCONDITIONALLY. > + (if_convertible_loop_p): Remove deallocation of the same. > + > 2015-05-15 Marc Glisse > > PR tree-optimization/64454 > diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi > index 117b5d9..c0e27e8 100644 > --- a/gcc/doc/invoke.texi > +++ b/gcc/doc/invoke.texi > @@ -8815,20 +8815,26 @@ if vectorization is enabled. > @item -ftree-loop-if-convert-stores > @opindex ftree-loop-if-convert-stores > Attempt to also if-convert conditional jumps containing memory writes. > -This transformation can be unsafe for multi-threaded programs as it > -transforms conditional memory writes into unconditional memory writes. > For example, > @smallexample > for (i = 0; i < N; i++) > if (cond) > - A[i] = expr; > + A[i] = B[i] + 2; > @end smallexample > is transformed to > @smallexample > -for (i = 0; i < N; i++) > - A[i] = cond ? expr : A[i]; > +void *scratchpad = alloca (64); > +for (i = 0; i < N; i++) @{ > + a = cond ? &A[i] : scratchpad; > + b = cond ? &B[i] : scratchpad; > + *a = *b + 2; > +@} > @end smallexample > -potentially producing data races. > +The compiler allocates a scratchpad memory on the stack for each > +function in which the if-conversion of memory stores or reads > +happened. This scratchpad memory is used during the part of the > +computation that is discarded, i.e., when the condition is evaluated > +to false. > > @item -ftree-loop-distribution > @opindex ftree-loop-distribution > diff --git a/gcc/testsuite/g++.dg/tree-ssa/ifc-pr46029.C b/gcc/testsuite/g++.dg/tree-ssa/ifc-pr46029.C > new file mode 100644 > index 0000000..2a54bdb > --- /dev/null > +++ b/gcc/testsuite/g++.dg/tree-ssa/ifc-pr46029.C > @@ -0,0 +1,76 @@ > +// { dg-do run } > +/* { dg-options "-O -ftree-loop-if-convert-stores" } */ > + > +namespace > +{ > + struct rb_tree_node_ > + { > + rb_tree_node_ ():m_p_left (0), m_p_parent (0), m_metadata (0) > + { > + } > + unsigned &get_metadata () > + { > + return m_metadata; > + } > + rb_tree_node_ *m_p_left; > + rb_tree_node_ *m_p_parent; > + unsigned m_metadata; > + }; > + > + struct bin_search_tree_const_node_it_ > + { > + bin_search_tree_const_node_it_ (rb_tree_node_ * p_nd):m_p_nd (p_nd) > + { > + } > + unsigned &get_metadata () > + { > + return m_p_nd->get_metadata (); > + } > + bin_search_tree_const_node_it_ get_l_child () > + { > + return bin_search_tree_const_node_it_ (m_p_nd->m_p_left); > + } > + > + rb_tree_node_ *m_p_nd; > + }; > + > + struct bin_search_tree_no_data_ > + { > + typedef rb_tree_node_ *node_pointer; > + bin_search_tree_no_data_ ():m_p_head (new rb_tree_node_ ()) > + { > + } > + void insert_imp_empty (int r_value) > + { > + rb_tree_node_ *p_new_node = new rb_tree_node_ (); > + m_p_head->m_p_parent = p_new_node; > + p_new_node->m_p_parent = m_p_head; > + update_to_top (m_p_head->m_p_parent); > + } > + void apply_update (bin_search_tree_const_node_it_ nd_it) > + { > + unsigned > + l_max_endpoint > + = > + (nd_it.get_l_child ().m_p_nd == > + 0) ? 0 : nd_it.get_l_child ().get_metadata (); > + nd_it.get_metadata () = l_max_endpoint; > + } > + void update_to_top (node_pointer p_nd) > + { > + while (p_nd != m_p_head) > + { > + apply_update (p_nd); > + p_nd = p_nd->m_p_parent; > + } > + } > + > + rb_tree_node_ * m_p_head; > + }; > +} > + > +int main () > +{ > + bin_search_tree_no_data_ ().insert_imp_empty (0); > + return 0; > +} > diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ifc-10.c b/gcc/testsuite/gcc.dg/tree-ssa/ifc-10.c > new file mode 100644 > index 0000000..3dcc202 > --- /dev/null > +++ b/gcc/testsuite/gcc.dg/tree-ssa/ifc-10.c > @@ -0,0 +1,17 @@ > +/* { dg-do compile } */ > +/* { dg-options "-c -O3 -ftree-loop-if-convert-stores -fdump-tree-ifcvt-details -fdump-tree-ifcvt-stats" { target *-*-* } } */ > + > +/* middle-end/PR65947 */ > + > +int a[32]; > + > +int main(int argc, char **argv) > +{ > + int res = 3; > + for (int i = 0; i < 32; i++) > + if (a[i]) res = 7; > + return res; > +} > + > +/* { dg-final { scan-tree-dump-times "Applying if-conversion" 1 "ifcvt" } } */ > +/* { dg-final { cleanup-tree-dump "ifcvt" } } */ > diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ifc-11.c b/gcc/testsuite/gcc.dg/tree-ssa/ifc-11.c > new file mode 100644 > index 0000000..2943c04 > --- /dev/null > +++ b/gcc/testsuite/gcc.dg/tree-ssa/ifc-11.c > @@ -0,0 +1,16 @@ > +/* { dg-do compile } */ > +/* { dg-options "-c -O3 -ftree-loop-if-convert-stores -fdump-tree-ifcvt-details -fdump-tree-ifcvt-stats" { target *-*-* } } */ > + > +int read(int *r, int *c) > +{ > + int res = 42; > + int i; > + for (i = 0; i < 32; i++) > + if (c[i]) > + res = *r; > + > + return res; > +} > + > +/* { dg-final { scan-tree-dump-times "Applying if-conversion" 1 "ifcvt" } } */ > +/* { dg-final { cleanup-tree-dump "ifcvt" } } */ > diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ifc-12.c b/gcc/testsuite/gcc.dg/tree-ssa/ifc-12.c > new file mode 100644 > index 0000000..01acf0b > --- /dev/null > +++ b/gcc/testsuite/gcc.dg/tree-ssa/ifc-12.c > @@ -0,0 +1,13 @@ > +/* { dg-do compile } */ > +/* { dg-options "-c -O3 -ftree-loop-if-convert-stores -fdump-tree-ifcvt-details -fdump-tree-ifcvt-stats" { target *-*-* } } */ > + > +void write(int *r, int *c) > +{ > + int i; > + for (i = 0; i < 32; i++) > + if (c[i]) > + *r = c[i]; > +} > + > +/* { dg-final { scan-tree-dump-times "Applying if-conversion" 1 "ifcvt" } } */ > +/* { dg-final { cleanup-tree-dump "ifcvt" } } */ > diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ifc-5.c b/gcc/testsuite/gcc.dg/tree-ssa/ifc-5.c > index a9cc816..4be2cdb 100644 > --- a/gcc/testsuite/gcc.dg/tree-ssa/ifc-5.c > +++ b/gcc/testsuite/gcc.dg/tree-ssa/ifc-5.c > @@ -1,5 +1,5 @@ > /* { dg-do compile } */ > -/* { dg-options "-c -O2 -ftree-vectorize -fdump-tree-ifcvt-stats" { target *-*-* } } */ > +/* { dg-options "-c -O2 -ftree-vectorize -ftree-loop-if-convert-stores -fdump-tree-ifcvt-stats" { target *-*-* } } */ > > void > dct_unquantize_h263_inter_c (short *block, int n, int qscale, int nCoeffs) > @@ -12,11 +12,18 @@ dct_unquantize_h263_inter_c (short *block, int n, int qscale, int nCoeffs) > for (i = 0; i <= nCoeffs; i++) > { > level = block[i]; > - if (level < 0) > - level = level * qmul - qadd; > - else > - level = level * qmul + qadd; > - block[i] = level; > + if (level) > + { > + if (level < 0) > + { > + level = level * qmul - qadd; > + } > + else > + { > + level = level * qmul + qadd; > + } > + block[i] = level; > + } > } > } > > diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ifc-8.c b/gcc/testsuite/gcc.dg/tree-ssa/ifc-8.c > new file mode 100644 > index 0000000..d7cf279 > --- /dev/null > +++ b/gcc/testsuite/gcc.dg/tree-ssa/ifc-8.c > @@ -0,0 +1,29 @@ > +/* { dg-do compile } */ > +/* { dg-options "-c -O2 -ftree-vectorize" { target *-*-* } } */ > + > +typedef union tree_node *tree; > +struct tree_common > +{ > + unsigned volatile_flag : 1; > + unsigned unsigned_flag : 1; > +}; > +struct tree_type > +{ > + tree next_variant; > + tree main_variant; > +}; > +union tree_node > +{ > + struct tree_common common; > + struct tree_type type; > +}; > +void finish_enum (tree enumtype) > +{ > + tree tem; > + for (tem = ((enumtype)->type.main_variant); tem; tem = ((tem)->type.next_variant)) > + { > + if (tem == enumtype) > + continue; > + ((tem)->common.unsigned_flag) = ((enumtype)->common.unsigned_flag); > + } > +} > diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ifc-9.c b/gcc/testsuite/gcc.dg/tree-ssa/ifc-9.c > new file mode 100644 > index 0000000..6d82bb4 > --- /dev/null > +++ b/gcc/testsuite/gcc.dg/tree-ssa/ifc-9.c > @@ -0,0 +1,17 @@ > +/* { dg-do compile } */ > +/* { dg-options "-c -O2 -ftree-vectorize -ftree-loop-if-convert-stores -fdump-tree-ifcvt-stats" { target *-*-* } } */ > + > +/* middle-end/PR65947 */ > + > +int a[32]; > + > +int main(int argc, char **argv) > +{ > + int res = 3; > + for (int i = 0; i < 32; i++) > + if (a[i]) res = 7; > + return res; > +} > + > +/* { dg-final { scan-tree-dump-times "Applying if-conversion" 1 "ifcvt" } } */ > +/* { dg-final { cleanup-tree-dump "ifcvt" } } */ > diff --git a/gcc/testsuite/gcc.dg/vect/if-cvt-stores-vect-ifcvt-18.c b/gcc/testsuite/gcc.dg/vect/if-cvt-stores-vect-ifcvt-18.c > index cdf687a..7bc652b 100644 > --- a/gcc/testsuite/gcc.dg/vect/if-cvt-stores-vect-ifcvt-18.c > +++ b/gcc/testsuite/gcc.dg/vect/if-cvt-stores-vect-ifcvt-18.c > @@ -65,5 +65,13 @@ main (void) > return 0; > } > > -/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { xfail { { vect_no_align && { ! vect_hw_misalign } } || { ! vect_strided2 } } } } } */ > +/* "foo()" is not vectorized FOR NOW because gather-load > + cannot handle conditional gather loads as of June 2015. > + > + The old way of if-converting loads was unsafe because > + it resulted in thread-unsafe code where the as-written code was OK. > + The old if conversion did not contain indirection in the loads, > + so it was simpler, therefor the vectorizer was able to pick it up. */ > + > +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 0 "vect" { xfail { { vect_no_align && { ! vect_hw_misalign } } || { ! vect_strided2 } } } } } */ > /* { dg-final { cleanup-tree-dump "vect" } } */ > diff --git a/gcc/testsuite/gcc.dg/vect/if-cvt-stores-vect-ifcvt-19.c b/gcc/testsuite/gcc.dg/vect/if-cvt-stores-vect-ifcvt-19.c > new file mode 100644 > index 0000000..6331d5a > --- /dev/null > +++ b/gcc/testsuite/gcc.dg/vect/if-cvt-stores-vect-ifcvt-19.c > @@ -0,0 +1,46 @@ > +/* { dg-do compile } */ > +/* { dg-require-effective-target vect_int } */ > + > +#include > +#include "tree-vect.h" > + > +#define N 50 > + > +typedef struct { > + short a; > + short b; > +} data; > + > +data in1[N], in2[N], out[N]; > + > +__attribute__ ((noinline)) void > +foo () > +{ > + int i; > + short c, d; > + > + for (i = 0; i < N; i++) > + { > + c = in1[i].b; > + d = in2[i].b; > + > + if (c >= d) > + { > + out[i].b = 11; > + out[i].a = d + 5; > + } > + else > + { > + out[i].b = d - 12; > + out[i].a = d; > + } > + } > +} > + > +/* It is safe to if-convert and vectorize the above loop because > + out[i].a and out[i].b are always written to. > + This is the same as "if-cvt-stores-vect-ifcvt-18.c" > + except there are no array loads inside the "if" here. */ > + > +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { xfail { { vect_no_align && { ! vect_hw_misalign } } || { ! vect_strided2 } } } } } */ > +/* { dg-final { cleanup-tree-dump "vect" } } */ > diff --git a/gcc/tree-data-ref.c b/gcc/tree-data-ref.c > index 3d1d5f8..eb1c559 100644 > --- a/gcc/tree-data-ref.c > +++ b/gcc/tree-data-ref.c > @@ -4393,22 +4393,11 @@ compute_all_dependences (vec datarefs, > return true; > } > > -/* Describes a location of a memory reference. */ > - > -typedef struct data_ref_loc_d > -{ > - /* The memory reference. */ > - tree ref; > - > - /* True if the memory reference is read. */ > - bool is_read; > -} data_ref_loc; > - > > /* Stores the locations of memory references in STMT to REFERENCES. Returns > true if STMT clobbers memory, false otherwise. */ > > -static bool > +bool > get_references_in_stmt (gimple stmt, vec *references) > { > bool clobbers_memory = false; > diff --git a/gcc/tree-data-ref.h b/gcc/tree-data-ref.h > index edb3b56..06a0f1a 100644 > --- a/gcc/tree-data-ref.h > +++ b/gcc/tree-data-ref.h > @@ -557,4 +557,18 @@ lambda_matrix_new (int m, int n, struct obstack *lambda_obstack) > return mat; > } > > +/* Describes a location of a memory reference. */ > + > +typedef struct data_ref_loc_d > +{ > + /* The memory reference. */ > + tree ref; > + > + /* True if the memory reference is read. */ > + bool is_read; > +} data_ref_loc; > + > +bool > +get_references_in_stmt (gimple /* stmt */, vec* /* references */); > + > #endif /* GCC_TREE_DATA_REF_H */ > diff --git a/gcc/tree-if-conv.c b/gcc/tree-if-conv.c > index 49ff458..1272a01 100644 > --- a/gcc/tree-if-conv.c > +++ b/gcc/tree-if-conv.c > @@ -19,7 +19,7 @@ along with GCC; see the file COPYING3. If not see > . */ > > /* This pass implements a tree level if-conversion of loops. Its > - initial goal is to help the vectorizer to vectorize loops with > + initial goal is to help the vectorizer to vectorize loops with > conditions. > > A short description of if-conversion: > @@ -606,188 +606,39 @@ if_convertible_phi_p (struct loop *loop, basic_block bb, gphi *phi, > return true; > } > > -/* Records the status of a data reference. This struct is attached to > - each DR->aux field. */ > - > -struct ifc_dr { > - /* -1 when not initialized, 0 when false, 1 when true. */ > - int written_at_least_once; > - > - /* -1 when not initialized, 0 when false, 1 when true. */ > - int rw_unconditionally; > -}; > - > -#define IFC_DR(DR) ((struct ifc_dr *) (DR)->aux) > -#define DR_WRITTEN_AT_LEAST_ONCE(DR) (IFC_DR (DR)->written_at_least_once) > -#define DR_RW_UNCONDITIONALLY(DR) (IFC_DR (DR)->rw_unconditionally) > - > -/* Returns true when the memory references of STMT are read or written > - unconditionally. In other words, this function returns true when > - for every data reference A in STMT there exist other accesses to > - a data reference with the same base with predicates that add up (OR-up) to > - the true predicate: this ensures that the data reference A is touched > - (read or written) on every iteration of the if-converted loop. */ > +/* Wrapper around gimple_could_trap_p refined for the needs of the > + if-conversion. */ > > static bool > -memrefs_read_or_written_unconditionally (gimple stmt, > - vec drs) > +ifcvt_could_trap_p (gimple stmt) > { > - int i, j; > - data_reference_p a, b; > - tree ca = bb_predicate (gimple_bb (stmt)); > - > - for (i = 0; drs.iterate (i, &a); i++) > - if (DR_STMT (a) == stmt) > - { > - bool found = false; > - int x = DR_RW_UNCONDITIONALLY (a); > - > - if (x == 0) > - return false; > - > - if (x == 1) > - continue; > - > - for (j = 0; drs.iterate (j, &b); j++) > - { > - tree ref_base_a = DR_REF (a); > - tree ref_base_b = DR_REF (b); > - > - if (DR_STMT (b) == stmt) > - continue; > - > - while (TREE_CODE (ref_base_a) == COMPONENT_REF > - || TREE_CODE (ref_base_a) == IMAGPART_EXPR > - || TREE_CODE (ref_base_a) == REALPART_EXPR) > - ref_base_a = TREE_OPERAND (ref_base_a, 0); > - > - while (TREE_CODE (ref_base_b) == COMPONENT_REF > - || TREE_CODE (ref_base_b) == IMAGPART_EXPR > - || TREE_CODE (ref_base_b) == REALPART_EXPR) > - ref_base_b = TREE_OPERAND (ref_base_b, 0); > - > - if (!operand_equal_p (ref_base_a, ref_base_b, 0)) > - { > - tree cb = bb_predicate (gimple_bb (DR_STMT (b))); > - > - if (DR_RW_UNCONDITIONALLY (b) == 1 > - || is_true_predicate (cb) > - || is_true_predicate (ca > - = fold_or_predicates (EXPR_LOCATION (cb), ca, cb))) > - { > - DR_RW_UNCONDITIONALLY (a) = 1; > - DR_RW_UNCONDITIONALLY (b) = 1; > - found = true; > - break; > - } > - } > - } > - > - if (!found) > - { > - DR_RW_UNCONDITIONALLY (a) = 0; > - return false; > - } > - } > + if (gimple_vuse (stmt) > + && !gimple_could_trap_p_1 (stmt, false, false)) > + return false; > > - return true; > + return gimple_could_trap_p (stmt); > } > > -/* Returns true when the memory references of STMT are unconditionally > - written. In other words, this function returns true when for every > - data reference A written in STMT, there exist other writes to the > - same data reference with predicates that add up (OR-up) to the true > - predicate: this ensures that the data reference A is written on > - every iteration of the if-converted loop. */ > - > +/* Returns true when stmt contains a data reference. */ > static bool > -write_memrefs_written_at_least_once (gimple stmt, > - vec drs) > +has_non_addressable_refs (gimple stmt) > { > - int i, j; > - data_reference_p a, b; > - tree ca = bb_predicate (gimple_bb (stmt)); > - > - for (i = 0; drs.iterate (i, &a); i++) > - if (DR_STMT (a) == stmt > - && DR_IS_WRITE (a)) > - { > - bool found = false; > - int x = DR_WRITTEN_AT_LEAST_ONCE (a); > - > - if (x == 0) > - return false; > - > - if (x == 1) > - continue; > - > - for (j = 0; drs.iterate (j, &b); j++) > - if (DR_STMT (b) != stmt > - && DR_IS_WRITE (b) > - && same_data_refs_base_objects (a, b)) > - { > - tree cb = bb_predicate (gimple_bb (DR_STMT (b))); > + vec* refs; > + vec_alloc (refs, 3); > > - if (DR_WRITTEN_AT_LEAST_ONCE (b) == 1 > - || is_true_predicate (cb) > - || is_true_predicate (ca = fold_or_predicates (EXPR_LOCATION (cb), > - ca, cb))) > - { > - DR_WRITTEN_AT_LEAST_ONCE (a) = 1; > - DR_WRITTEN_AT_LEAST_ONCE (b) = 1; > - found = true; > - break; > - } > - } > + bool res = get_references_in_stmt (stmt, refs); > + unsigned i; > + data_ref_loc *ref; > > - if (!found) > - { > - DR_WRITTEN_AT_LEAST_ONCE (a) = 0; > - return false; > - } > + FOR_EACH_VEC_ELT (*refs, i, ref) > + if (may_be_nonaddressable_p (ref->ref)) > + { > + res = true; > + break; > } > > - return true; > -} > - > -/* Return true when the memory references of STMT won't trap in the > - if-converted code. There are two things that we have to check for: > - > - - writes to memory occur to writable memory: if-conversion of > - memory writes transforms the conditional memory writes into > - unconditional writes, i.e. "if (cond) A[i] = foo" is transformed > - into "A[i] = cond ? foo : A[i]", and as the write to memory may not > - be executed at all in the original code, it may be a readonly > - memory. To check that A is not const-qualified, we check that > - there exists at least an unconditional write to A in the current > - function. > - > - - reads or writes to memory are valid memory accesses for every > - iteration. To check that the memory accesses are correctly formed > - and that we are allowed to read and write in these locations, we > - check that the memory accesses to be if-converted occur at every > - iteration unconditionally. */ > - > -static bool > -ifcvt_memrefs_wont_trap (gimple stmt, vec refs) > -{ > - return write_memrefs_written_at_least_once (stmt, refs) > - && memrefs_read_or_written_unconditionally (stmt, refs); > -} > - > -/* Wrapper around gimple_could_trap_p refined for the needs of the > - if-conversion. Try to prove that the memory accesses of STMT could > - not trap in the innermost loop containing STMT. */ > - > -static bool > -ifcvt_could_trap_p (gimple stmt, vec refs) > -{ > - if (gimple_vuse (stmt) > - && !gimple_could_trap_p_1 (stmt, false, false) > - && ifcvt_memrefs_wont_trap (stmt, refs)) > - return false; > - > - return gimple_could_trap_p (stmt); > + vec_free (refs); > + return res; > } > > /* Return true if STMT could be converted into a masked load or store > @@ -849,7 +700,6 @@ ifcvt_can_use_mask_load_store (gimple stmt) > > static bool > if_convertible_gimple_assign_stmt_p (gimple stmt, > - vec refs, > bool *any_mask_load_store) > { > tree lhs = gimple_assign_lhs (stmt); > @@ -883,7 +733,7 @@ if_convertible_gimple_assign_stmt_p (gimple stmt, > > if (flag_tree_loop_if_convert_stores) > { > - if (ifcvt_could_trap_p (stmt, refs)) > + if (ifcvt_could_trap_p (stmt)) > { > if (ifcvt_can_use_mask_load_store (stmt)) > { > @@ -892,9 +742,17 @@ if_convertible_gimple_assign_stmt_p (gimple stmt, > return true; > } > if (dump_file && (dump_flags & TDF_DETAILS)) > - fprintf (dump_file, "tree could trap...\n"); > + fprintf (dump_file, "tree could trap\n"); > return false; > } > + > + if (has_non_addressable_refs (stmt)) > + { > + if (dump_file && (dump_flags & TDF_DETAILS)) > + fprintf (dump_file, "has non-addressable memory references\n"); > + return false; > + } > + > return true; > } > > @@ -942,7 +800,7 @@ if_convertible_gimple_assign_stmt_p (gimple stmt, > - it is builtins call. */ > > static bool > -if_convertible_stmt_p (gimple stmt, vec refs, > +if_convertible_stmt_p (gimple stmt, > bool *any_mask_load_store) > { > switch (gimple_code (stmt)) > @@ -953,7 +811,7 @@ if_convertible_stmt_p (gimple stmt, vec refs, > return true; > > case GIMPLE_ASSIGN: > - return if_convertible_gimple_assign_stmt_p (stmt, refs, > + return if_convertible_gimple_assign_stmt_p (stmt, > any_mask_load_store); > > case GIMPLE_CALL: > @@ -1173,8 +1031,8 @@ get_loop_body_in_if_conv_order (const struct loop *loop) > predicate_bbs first allocates the predicates of the basic blocks. > These fields are then initialized with the tree expressions > representing the predicates under which a basic block is executed > - in the LOOP. As the loop->header is executed at each iteration, it > - has the "true" predicate. Other statements executed under a > + in the LOOP. As the loop->header is executed at each iteration, > + it has the "true" predicate. Other statements executed under a > condition are predicated with that condition, for example > > | if (x) > @@ -1322,17 +1180,7 @@ if_convertible_loop_p_1 (struct loop *loop, > } > > if (flag_tree_loop_if_convert_stores) > - { > - data_reference_p dr; > - > - for (i = 0; refs->iterate (i, &dr); i++) > - { > - dr->aux = XNEW (struct ifc_dr); > - DR_WRITTEN_AT_LEAST_ONCE (dr) = -1; > - DR_RW_UNCONDITIONALLY (dr) = -1; > - } > - predicate_bbs (loop); > - } > + predicate_bbs (loop); > > for (i = 0; i < loop->num_nodes; i++) > { > @@ -1342,7 +1190,7 @@ if_convertible_loop_p_1 (struct loop *loop, > /* Check the if-convertibility of statements in predicated BBs. */ > if (!dominated_by_p (CDI_DOMINATORS, loop->latch, bb)) > for (itr = gsi_start_bb (bb); !gsi_end_p (itr); gsi_next (&itr)) > - if (!if_convertible_stmt_p (gsi_stmt (itr), *refs, > + if (!if_convertible_stmt_p (gsi_stmt (itr), > any_mask_load_store)) > return false; > } > @@ -1988,6 +1836,82 @@ mask_exists (int size, vec vec) > return -1; > } > > +/* Inserts at position GSI a statement "ADDRESS_OF_AI = &AI;" and > + returns the ADDRESS_OF_AI. */ > + > +static tree > +insert_address_of (tree ai, gimple_stmt_iterator *gsi) > +{ > + tree addr_expr = build_fold_addr_expr (ai); > + tree address_of_ai = create_tmp_reg (TREE_TYPE (addr_expr), "_ifc_"); > + gimple addr_stmt = gimple_build_assign (address_of_ai, addr_expr); > + > + address_of_ai = make_ssa_name (address_of_ai, addr_stmt); > + gimple_assign_set_lhs (addr_stmt, address_of_ai); > + SSA_NAME_DEF_STMT (address_of_ai) = addr_stmt; > + update_stmt (addr_stmt); > + gsi_insert_before (gsi, addr_stmt, GSI_SAME_STMT); > + > + return address_of_ai; > +} > + > +/* Insert at the beginning of the first basic block of the current > + function the allocation on the stack of N_BYTES of memory and > + return a pointer to this scratchpad memory. */ > + > +static tree > +create_scratchpad (unsigned int n_bytes) > +{ > + basic_block bb = single_succ ( ENTRY_BLOCK_PTR_FOR_FN(cfun) ); > + gimple_stmt_iterator gsi = gsi_after_labels (bb); > + tree x = build_int_cst (integer_type_node, n_bytes - 1); > + tree elt_type = char_type_node; > + tree array_type = build_array_type (elt_type, build_index_type (x)); > + tree base = create_tmp_reg (array_type, "scratch_pad"); > + tree a0 = build4 (ARRAY_REF, elt_type, base, integer_zero_node, NULL_TREE, > + NULL_TREE); > + > + return insert_address_of (a0, &gsi); > +} > + > +/* Returns a memory reference to the pointer defined by the > + conditional expression: pointer = cond ? &A[i] : scratch_pad; and > + inserts this code at GSI. */ > + > +static tree > +create_indirect_cond_expr (tree ai, tree cond, tree *scratch_pad, > + gimple_stmt_iterator *gsi, bool swap) > +{ > + tree cond_expr; > + tree pointer; > + gimple pointer_stmt; > + > + /* address_of_ai = &A[i]; */ > + tree address_of_ai = insert_address_of (ai, gsi); > + > + /* Allocate the scratch pad only once per function. */ > + if (!*scratch_pad) > + *scratch_pad = create_scratchpad (64); > + > + /* pointer = cond ? address_of_ai : scratch_pad; */ > + pointer = create_tmp_reg (TREE_TYPE (address_of_ai), "_ifc_"); > + > + cond_expr = build3 (COND_EXPR, TREE_TYPE (address_of_ai), > + unshare_expr (cond), > + swap ? *scratch_pad : address_of_ai, > + swap ? address_of_ai : *scratch_pad); > + > + pointer_stmt = gimple_build_assign (pointer, cond_expr); > + pointer = make_ssa_name (pointer, pointer_stmt); > + gimple_assign_set_lhs (pointer_stmt, pointer); > + SSA_NAME_DEF_STMT (pointer) = pointer_stmt; > + update_stmt (pointer_stmt); > + gsi_insert_before (gsi, pointer_stmt, GSI_SAME_STMT); > + > + return build2 (MEM_REF, TREE_TYPE (ai), pointer, > + build_int_cst (reference_alias_ptr_type (ai), 0)); > +} > + > /* Predicate each write to memory in LOOP. > > This function transforms control flow constructs containing memory > @@ -1997,12 +1921,21 @@ mask_exists (int size, vec vec) > | if (cond) > | A[i] = expr; > > - into the following form that does not contain control flow: > + into the following form: > > - | for (i = 0; i < N; i++) > - | A[i] = cond ? expr : A[i]; > + | void *scratch_pad = alloca (MAX_TYPE_SIZE_IN_BYTES); > + | > + | for (i = 0; i < N; i++) { > + | p = cond ? &A[i] : scratch_pad; > + | *p = expr; > + | } > > - The original CFG looks like this: > + SCRATCH_PAD is allocated on the stack for each function once and it is > + large enough to contain any kind of scalar assignment or read. All > + values read or written to SCRATCH_PAD are not used in the computation. > + > + In a more detailed way, the if-conversion of memory writes works > + like this, supposing that the original CFG looks like this: > > | bb_0 > | i = 0 > @@ -2052,10 +1985,12 @@ mask_exists (int size, vec vec) > | goto bb_1 > | end_bb_4 > > - predicate_mem_writes is then predicating the memory write as follows: > + predicate_mem_writes is then allocating SCRATCH_PAD in the basic block > + preceding the loop header, and is predicating the memory write: > > | bb_0 > | i = 0 > + | scratch_pad = alloca (MAX_TYPE_SIZE_IN_BYTES); > | end_bb_0 > | > | bb_1 > @@ -2063,12 +1998,14 @@ mask_exists (int size, vec vec) > | end_bb_1 > | > | bb_2 > + | cond = some_computation; > | if (cond) goto bb_3 else goto bb_4 > | end_bb_2 > | > | bb_3 > | cond = some_computation; > - | A[i] = cond ? expr : A[i]; > + | p = cond ? &A[i] : scratch_pad; > + | *p = expr; > | goto bb_4 > | end_bb_3 > | > @@ -2081,12 +2018,14 @@ mask_exists (int size, vec vec) > > | bb_0 > | i = 0 > + | scratch_pad = alloca (MAX_TYPE_SIZE_IN_BYTES); > | if (i < N) goto bb_5 else goto bb_1 > | end_bb_0 > | > | bb_1 > | cond = some_computation; > - | A[i] = cond ? expr : A[i]; > + | p = cond ? &A[i] : scratch_pad; > + | *p = expr; > | if (i < N) goto bb_5 else goto bb_4 > | end_bb_1 > | > @@ -2096,7 +2035,7 @@ mask_exists (int size, vec vec) > */ > > static void > -predicate_mem_writes (loop_p loop) > +predicate_mem_writes (loop_p loop, tree *scratch_pad) > { > unsigned int i, orig_loop_num_nodes = loop->num_nodes; > auto_vec vect_sizes; > @@ -2179,24 +2118,29 @@ predicate_mem_writes (loop_p loop) > } > else if (gimple_vdef (stmt)) > { > - tree lhs = gimple_assign_lhs (stmt); > - tree rhs = gimple_assign_rhs1 (stmt); > - tree type = TREE_TYPE (lhs); > > - lhs = ifc_temp_var (type, unshare_expr (lhs), &gsi); > - rhs = ifc_temp_var (type, unshare_expr (rhs), &gsi); > - if (swap) > - { > - tree tem = lhs; > - lhs = rhs; > - rhs = tem; > - } > - cond = force_gimple_operand_gsi_1 (&gsi, unshare_expr (cond), > - is_gimple_condexpr, NULL_TREE, > - true, GSI_SAME_STMT); > - rhs = fold_build_cond_expr (type, unshare_expr (cond), rhs, lhs); > - gimple_assign_set_rhs1 (stmt, ifc_temp_var (type, rhs, &gsi)); > + /* A[i] = x; */ > + tree ai = gimple_assign_lhs (stmt); > + > + /* pointer = cond ? &A[i] : scratch_pad; */ > + tree star_pointer = create_indirect_cond_expr (ai, cond, > + scratch_pad, &gsi, swap); > + /* *pointer = x; */ > + gimple_assign_set_lhs (stmt, star_pointer); > update_stmt (stmt); > + > + } > + else if (gimple_vuse (stmt)) > + { > + /* x = A[i]; */ > + tree ai = gimple_assign_rhs1 (stmt); > + > + /* pointer = cond ? &A[i] : scratch_pad; */ > + tree star_pointer = create_indirect_cond_expr (ai, cond, > + scratch_pad, &gsi, swap); > + /* x = *pointer; */ > + gimple_assign_set_rhs1 (stmt, star_pointer); > + update_stmt (stmt); > } > } > } > @@ -2247,7 +2191,7 @@ remove_conditions_and_labels (loop_p loop) > blocks. Replace PHI nodes with conditional modify expressions. */ > > static void > -combine_blocks (struct loop *loop, bool any_mask_load_store) > +combine_blocks (struct loop *loop, bool any_mask_load_store, tree *scratch_pad) > { > basic_block bb, exit_bb, merge_target_bb; > unsigned int orig_loop_num_nodes = loop->num_nodes; > @@ -2261,7 +2205,7 @@ combine_blocks (struct loop *loop, bool any_mask_load_store) > predicate_all_scalar_phis (loop); > > if (flag_tree_loop_if_convert_stores || any_mask_load_store) > - predicate_mem_writes (loop); > + predicate_mem_writes (loop, scratch_pad); > > /* Merge basic blocks: first remove all the edges in the loop, > except for those from the exit block. */ > @@ -2523,7 +2467,7 @@ ifcvt_walk_pattern_tree (tree var, vec *defuse_list, > return; > } > > -/* Returns true if STMT can be a root of bool pattern apllied > +/* Returns true if STMT can be a root of bool pattern applied > by vectorizer. */ > > static bool > @@ -2553,7 +2497,7 @@ stmt_is_root_of_bool_pattern (gimple stmt) > return false; > } > > -/* Traverse all statements in BB which correspondent to loop header to > +/* Traverse all statements in BB which correspond to loop header to > find out all statements which can start bool pattern applied by > vectorizer and convert multiple uses in it to conform pattern > restrictions. Such case can occur if the same predicate is used both > @@ -2634,7 +2578,7 @@ ifcvt_local_dce (basic_block bb) > gimple_set_plf (phi, GF_PLF_2, true); > worklist.safe_push (phi); > } > - /* Consider load/store statemnts, CALL and COND as live. */ > + /* Consider load/store statements, CALL and COND as live. */ > for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi)) > { > stmt = gsi_stmt (gsi); > @@ -2716,13 +2660,13 @@ ifcvt_local_dce (basic_block bb) > changed. */ > > static unsigned int > -tree_if_conversion (struct loop *loop) > +tree_if_conversion (struct loop *loop, tree *scratch_pad) > { > unsigned int todo = 0; > ifc_bbs = NULL; > bool any_mask_load_store = false; > > - /* Set-up aggressive if-conversion for loops marked with simd pragma. */ > + /* Set up aggressive if-conversion for loops marked with simd pragma. */ > aggressive_if_conv = loop->force_vectorize; > /* Check either outer loop was marked with simd pragma. */ > if (!aggressive_if_conv) > @@ -2751,7 +2695,7 @@ tree_if_conversion (struct loop *loop) > /* Now all statements are if-convertible. Combine all the basic > blocks into one huge basic block doing the if-conversion > on-the-fly. */ > - combine_blocks (loop, any_mask_load_store); > + combine_blocks (loop, any_mask_load_store, scratch_pad); > > /* Delete dead predicate computations and repair tree correspondent > to bool pattern to delete multiple uses of preidcates. */ > @@ -2832,12 +2776,14 @@ pass_if_conversion::execute (function *fun) > if (number_of_loops (fun) <= 1) > return 0; > > + tree scratch_pad = NULL_TREE; > + > FOR_EACH_LOOP (loop, 0) > if (flag_tree_loop_if_convert == 1 > || flag_tree_loop_if_convert_stores == 1 > || ((flag_tree_loop_vectorize || loop->force_vectorize) > && !loop->dont_vectorize)) > - todo |= tree_if_conversion (loop); > + todo |= tree_if_conversion (loop, &scratch_pad); > > #ifdef ENABLE_CHECKING > { >