From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 117200 invoked by alias); 19 Nov 2015 00:35:52 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Received: (qmail 117182 invoked by uid 89); 19 Nov 2015 00:35:50 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-2.3 required=5.0 tests=AWL,BAYES_00,RP_MATCHES_RCVD,SPF_PASS autolearn=ham version=3.3.2 X-HELO: fencepost.gnu.org Received: from fencepost.gnu.org (HELO fencepost.gnu.org) (208.118.235.10) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with (AES128-SHA encrypted) ESMTPS; Thu, 19 Nov 2015 00:35:49 +0000 Received: from eggs.gnu.org ([2001:4830:134:3::10]:40907) by fencepost.gnu.org with esmtps (TLS1.0:RSA_AES_256_CBC_SHA1:256) (Exim 4.82) (envelope-from ) id 1ZzDCU-0006JO-VN for gcc-patches@gnu.org; Wed, 18 Nov 2015 19:35:47 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1ZzDCQ-0003l9-G6 for gcc-patches@gnu.org; Wed, 18 Nov 2015 19:35:46 -0500 Received: from relay1.mentorg.com ([192.94.38.131]:59534) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1ZzDCQ-0003ku-7a for gcc-patches@gnu.org; Wed, 18 Nov 2015 19:35:42 -0500 Received: from nat-ies.mentorg.com ([192.94.31.2] helo=SVR-IES-FEM-01.mgc.mentorg.com) by relay1.mentorg.com with esmtp id 1ZzDCM-0003IW-42 from Tom_deVries@mentor.com ; Wed, 18 Nov 2015 16:35:38 -0800 Received: from [127.0.0.1] (137.202.0.76) by SVR-IES-FEM-01.mgc.mentorg.com (137.202.0.104) with Microsoft SMTP Server id 14.3.224.2; Thu, 19 Nov 2015 00:35:36 +0000 Subject: Re: [PATCH, 10/16] Add pass_oacc_kernels pass group in passes.def To: Richard Biener References: <5640BD31.2060602@mentor.com> <5640FB07.6010008@mentor.com> <5649C41A.40403@mentor.com> <564A64B3.7080305@mentor.com> <564B3F69.50600@mentor.com> CC: Richard Biener , "gcc-patches@gnu.org" , Jakub Jelinek From: Tom de Vries Message-ID: <564D1930.8040104@mentor.com> Date: Thu, 19 Nov 2015 00:35:00 -0000 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.3.0 MIME-Version: 1.0 In-Reply-To: <564B3F69.50600@mentor.com> Content-Type: text/plain; charset="utf-8"; format=flowed Content-Transfer-Encoding: 7bit X-detected-operating-system: by eggs.gnu.org: Windows NT kernel [generic] [fuzzy] X-Received-From: 192.94.38.131 X-SW-Source: 2015-11/txt/msg02296.txt.bz2 On 17/11/15 15:53, Tom de Vries wrote: >> And the above LIM example >> is none for why you need two LIM passes... > > Indeed. I'm planning a separate reply to explain in more detail the need > for the two pass_lims. I. I managed to get rid of the two pass_lims for the motivating example that I used until now (goacc/kernels-double-reduction.c). I found that by adding a pass_dominator instance after pass_ch, I could get rid of the second pass_lim (and pass_copyprop as well). But... then I wrote a counter example (goacc/kernels-double-reduction-n.c), and I'm back at two pass_lims (and two pass_dominators). Also I've split the pass group into a bit before and after pass_fre. So, the current pass group looks like: ... NEXT_PASS (pass_build_ealias); /* Pass group that runs when the function is an offloaded function containing oacc kernels loops. Part 1. */ NEXT_PASS (pass_oacc_kernels); PUSH_INSERT_PASSES_WITHIN (pass_oacc_kernels) /* We need pass_ch here, because pass_lim has no effect on exit-first loops (PR65442). Ideally we want to remove both this pass instantiation, and the reverse transformation transform_to_exit_first_loop_alt, which is done in pass_parallelize_loops_oacc_kernels. */ NEXT_PASS (pass_ch); POP_INSERT_PASSES () NEXT_PASS (pass_fre); /* Pass group that runs when the function is an offloaded function containing oacc kernels loops. Part 2. */ NEXT_PASS (pass_oacc_kernels2); PUSH_INSERT_PASSES_WITHIN (pass_oacc_kernels2) /* We use pass_lim to rewrite in-memory iteration and reduction variable accesses in loops into local variables accesses. */ NEXT_PASS (pass_lim); NEXT_PASS (pass_dominator, false /* may_peel_loop_headers_p */); NEXT_PASS (pass_lim); NEXT_PASS (pass_dominator, false /* may_peel_loop_headers_p */); NEXT_PASS (pass_dce); NEXT_PASS (pass_parallelize_loops_oacc_kernels); NEXT_PASS (pass_expand_omp_ssa); POP_INSERT_PASSES () NEXT_PASS (pass_merge_phi); ... II. The motivating test-case kernels-double-reduction-n.c: ... #include #define N 500 unsigned int a[N][N]; void __attribute__((noinline,noclone)) foo (unsigned int n) { int i, j; unsigned int sum = 1; #pragma acc kernels copyin (a[0:n]) copy (sum) { for (i = 0; i < n; ++i) for (j = 0; j < n; ++j) sum += a[i][j]; } if (sum != 5001) abort (); } ... III. Before first pass_lim. Note no phis on inner or outer loop header for iteration varables or reduction variable: ... : _5 = *.omp_data_i_4(D).i; *_5 = 0; _44 = *.omp_data_i_4(D).n; _45 = *_44; if (_45 != 0) goto ; else goto ; : outer loop header _12 = *.omp_data_i_4(D).j; *_12 = 0; if (_45 != 0) goto ; else goto ; : inner loop header, latch _19 = *.omp_data_i_4(D).a; _21 = *_5; _23 = *_12; _24 = *_19[_21][_23]; _25 = *.omp_data_i_4(D).sum; sum.0_26 = *_25; sum.1_27 = _24 + sum.0_26; *_25 = sum.1_27; _33 = _23 + 1; *_12 = _33; j.2_16 = (unsigned int) _33; if (j.2_16 < _45) goto ; else goto ; : outer loop latch _36 = *_5; _38 = _36 + 1; *_5 = _38; i.3_9 = (unsigned int) _38; if (i.3_9 < _45) goto ; else goto ; : return; ... IV. After first pass_lim/pass_dom pair. Note there are phis on the inner loop header for the reduction and the iteration variable, but not on the outer loop header: ... : _5 = *.omp_data_i_4(D).i; *_5 = 0; _44 = *.omp_data_i_4(D).n; _45 = *_44; if (_45 != 0) goto ; else goto ; : _12 = *.omp_data_i_4(D).j; _19 = *.omp_data_i_4(D).a; D__lsm.10_50 = *_12; D__lsm.11_51 = 0; _25 = *.omp_data_i_4(D).sum; : outer loop header D__lsm.10_20 = 0; D__lsm.11_22 = 1; _21 = *_5; D__lsm.12_28 = *_25; D__lsm.13_30 = 0; goto ; : inner loop header, latch # D__lsm.10_47 = PHI <0(5), _33(7)> # D__lsm.12_49 = PHI _23 = D__lsm.10_47; _24 = *_19[_21][D__lsm.10_47]; sum.0_26 = D__lsm.12_49; sum.1_27 = _24 + D__lsm.12_49; D__lsm.12_31 = sum.1_27; D__lsm.13_32 = 1; _33 = D__lsm.10_47 + 1; D__lsm.10_14 = _33; D__lsm.11_15 = 1; j.2_16 = (unsigned int) _33; if (j.2_16 < _45) goto ; else goto ; : outer loop latch # D__lsm.10_35 = PHI <_33(7)> # D__lsm.11_37 = PHI <1(7)> # D__lsm.12_7 = PHI # D__lsm.13_8 = PHI <1(7)> *_25 = sum.1_27; _36 = *_5; _38 = _36 + 1; *_5 = _38; i.3_9 = (unsigned int) _38; if (i.3_9 < _45) goto ; else goto ; : # D__lsm.10_10 = PHI <_33(8)> # D__lsm.11_11 = PHI <1(8)> *_12 = _33; goto ; : return; ... V. After second pass_lim/pass_dom pair. Note there are phis on the inner and outer loop header for the reduction and the iteration variables: ... : _5 = *.omp_data_i_4(D).i; *_5 = 0; _44 = *.omp_data_i_4(D).n; _45 = *_44; if (_45 != 0) goto ; else goto ; : _12 = *.omp_data_i_4(D).j; _19 = *.omp_data_i_4(D).a; D__lsm.10_50 = *_12; D__lsm.11_51 = 0; _25 = *.omp_data_i_4(D).sum; D__lsm.14_40 = 0; D__lsm.15_2 = 0; D__lsm.16_1 = *_25; D__lsm.17_46 = 0; : outer loop header # D__lsm.14_13 = PHI <0(4), _38(8)> # D__lsm.16_34 = PHI D__lsm.10_20 = 0; D__lsm.11_22 = 1; _21 = D__lsm.14_13; D__lsm.12_28 = D__lsm.16_34; D__lsm.13_30 = 0; goto ; : inner loop header, latch # D__lsm.10_47 = PHI <0(5), _33(7)> # D__lsm.12_49 = PHI _23 = D__lsm.10_47; _24 = *_19[D__lsm.14_13][D__lsm.10_47]; sum.0_26 = D__lsm.12_49; sum.1_27 = _24 + D__lsm.12_49; D__lsm.12_31 = sum.1_27; D__lsm.13_32 = 1; _33 = D__lsm.10_47 + 1; D__lsm.10_14 = _33; D__lsm.11_15 = 1; j.2_16 = (unsigned int) _33; if (j.2_16 < _45) goto ; else goto ; : outer loop latch # D__lsm.10_35 = PHI <_33(7)> # D__lsm.11_37 = PHI <1(7)> # D__lsm.12_7 = PHI # D__lsm.13_8 = PHI <1(7)> # sum.1_48 = PHI # _53 = PHI <_33(7)> D__lsm.16_56 = sum.1_27; D__lsm.17_57 = 1; _36 = D__lsm.14_13; _38 = D__lsm.14_13 + 1; D__lsm.14_58 = _38; D__lsm.15_59 = 1; i.3_9 = (unsigned int) _38; if (i.3_9 < _45) goto ; else goto ; : # D__lsm.10_10 = PHI <_33(8)> # D__lsm.11_11 = PHI <1(8)> # _43 = PHI <_33(8)> # D__lsm.16_62 = PHI # D__lsm.17_63 = PHI <1(8)> # D__lsm.14_64 = PHI <_38(8)> # D__lsm.15_65 = PHI <1(8)> *_5 = _38; *_25 = sum.1_27; *_12 = _33; goto ; : return; ... VI. After pass_dce, so before parloops-oacc-kernels: ... : _5 = *.omp_data_i_4(D).i; *_5 = 0; _44 = *.omp_data_i_4(D).n; _45 = *_44; if (_45 != 0) goto ; else goto ; : _12 = *.omp_data_i_4(D).j; _19 = *.omp_data_i_4(D).a; _25 = *.omp_data_i_4(D).sum; D__lsm.16_1 = *_25; : outer loop header # D__lsm.14_13 = PHI <0(4), _38(8)> # D__lsm.16_34 = PHI goto ; : inner loop header, latch # D__lsm.10_47 = PHI <0(5), _33(7)> # D__lsm.12_49 = PHI _24 = *_19[D__lsm.14_13][D__lsm.10_47]; sum.1_27 = _24 + D__lsm.12_49; _33 = D__lsm.10_47 + 1; j.2_16 = (unsigned int) _33; if (j.2_16 < _45) goto ; else goto ; : outer loop latch _38 = D__lsm.14_13 + 1; i.3_9 = (unsigned int) _38; if (i.3_9 < _45) goto ; else goto ; : *_5 = _38; *_25 = sum.1_27; *_12 = _33; goto ; : return; ... Thanks, - Tom