From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 23023 invoked by alias); 3 Jul 2019 03:20:26 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Received: (qmail 23015 invoked by uid 89); 3 Jul 2019 03:20:26 -0000 Authentication-Results: sourceware.org; auth=none X-Spam-SWARE-Status: No, score=-9.0 required=5.0 tests=AWL,BAYES_00,GIT_PATCH_1,RCVD_IN_DNSWL_LOW,SPF_PASS autolearn=ham version=3.3.1 spammy= X-HELO: mx0a-001b2d01.pphosted.com Received: from mx0a-001b2d01.pphosted.com (HELO mx0a-001b2d01.pphosted.com) (148.163.156.1) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Wed, 03 Jul 2019 03:20:24 +0000 Received: from pps.filterd (m0098393.ppops.net [127.0.0.1]) by mx0a-001b2d01.pphosted.com (8.16.0.27/8.16.0.27) with SMTP id x633HWEX042017 for ; Tue, 2 Jul 2019 23:20:22 -0400 Received: from e06smtp07.uk.ibm.com (e06smtp07.uk.ibm.com [195.75.94.103]) by mx0a-001b2d01.pphosted.com with ESMTP id 2tgfkx8jt2-1 (version=TLSv1.2 cipher=AES256-GCM-SHA384 bits=256 verify=NOT) for ; Tue, 02 Jul 2019 23:20:22 -0400 Received: from localhost by e06smtp07.uk.ibm.com with IBM ESMTP SMTP Gateway: Authorized Use Only! Violators will be prosecuted for from ; Wed, 3 Jul 2019 04:20:20 +0100 Received: from b06cxnps4074.portsmouth.uk.ibm.com (9.149.109.196) by e06smtp07.uk.ibm.com (192.168.101.137) with IBM ESMTP SMTP Gateway: Authorized Use Only! Violators will be prosecuted; (version=TLSv1/SSLv3 cipher=AES256-GCM-SHA384 bits=256/256) Wed, 3 Jul 2019 04:20:18 +0100 Received: from d06av22.portsmouth.uk.ibm.com (d06av22.portsmouth.uk.ibm.com [9.149.105.58]) by b06cxnps4074.portsmouth.uk.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id x633KHbo12910696 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Wed, 3 Jul 2019 03:20:17 GMT Received: from d06av22.portsmouth.uk.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id EA9CA4C052; Wed, 3 Jul 2019 03:20:16 +0000 (GMT) Received: from d06av22.portsmouth.uk.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 3501D4C044; Wed, 3 Jul 2019 03:20:15 +0000 (GMT) Received: from kewenlins-mbp.cn.ibm.com (unknown [9.200.147.35]) by d06av22.portsmouth.uk.ibm.com (Postfix) with ESMTP; Wed, 3 Jul 2019 03:20:14 +0000 (GMT) Subject: Re: [PATCH V3] PR88497 - Extend reassoc for vector bit_field_ref To: Richard Biener Cc: GCC Patches , Bill Schmidt , Segher Boessenkool References: From: "Kewen.Lin" Date: Wed, 03 Jul 2019 03:20:00 -0000 User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.14; rv:60.0) Gecko/20100101 Thunderbird/60.7.2 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit x-cbid: 19070303-0028-0000-0000-0000037FD7F7 X-IBM-AV-DETECTION: SAVI=unused REMOTE=unused XFE=unused x-cbparentid: 19070303-0029-0000-0000-000024401447 Message-Id: <8d92a6c9-e338-e662-9eec-ef3059faba71@linux.ibm.com> X-IsSubscribed: yes X-SW-Source: 2019-07/txt/msg00203.txt.bz2 Hi Richard, Thanks very much for reviewing my patch. I'll update it as your comments. Before sending the next version, I've several questions embedded for further check. on 2019/7/2 下午8:43, Richard Biener wrote: > On Wed, 20 Mar 2019, Kewen.Lin wrote: > >> +/* { dg-require-effective-target vect_double } */ >> +/* { dg-require-effective-target powerpc_vsx_ok { target { powerpc*-*-* } } } */ >> +/* { dg-options "-O2 -ffast-math" } */ >> +/* { dg-options "-O2 -ffast-math -mvsx -fdump-tree-reassoc1" { target { powerpc*-*-* } } } */ > > Use > > /* { dg-additional-options "-mvsx" { target { powerpc... > > that saves duplicate typing. I guess that x86_64/i?86 also works > if you enable SSE2 - can you check that and do adjustments accordingly? > OK, I'll learn SSE2 and update it. >> +/* Hold the information of one specific VECTOR_TYPE SSA_NAME. >> + - offsets: for different BIT_FIELD_REF offsets accessing same VECTOR. >> + - ops_indexes: the index of vec ops* for each relavant BIT_FIELD_REF. */ >> +struct v_info >> +{ >> + auto_vec offsets; > > with SVE this probably needs to be poly_int64 or so > OK, will extend it to poly_int64 (can it be negative? or poly_uint64 better?) >> + auto_vec ops_indexes; >> +}; > > To have less allocations you could use a > > auto_vec, 32> elements; > > or even > > hash_map > > > > where you use .release() in the cleanup and .create (TYPE_VECTOR_SUBPARTS > (vector_type)) during collecting and then can use quick_push () > (ah, no - duplicates...). > Good idea! >> + >> +typedef struct v_info *v_info_ptr; >> + >> +/* Comparison function for qsort on unsigned BIT_FIELD_REF offsets. */ >> +static int >> +unsigned_cmp (const void *p_i, const void *p_j) >> +{ >> + if (*(const unsigned HOST_WIDE_INT *) p_i >> + >= *(const unsigned HOST_WIDE_INT *) p_j) >> + return 1; >> + else >> + return -1; > > That's an issue with some qsort implementations comparing > an element against itself. > > I think so you should add the equality case. > The equality case seems already involved in ">=". Do you mean that I need to make it equality case explicitly? Or return zero for "=="? like: const unsigned HOST_WIDE_INT val_i = *(const unsigned HOST_WIDE_INT *) p_i; const unsigned HOST_WIDE_INT val_j = *(const unsigned HOST_WIDE_INT *) p_j; if (val_i != val_j) return val_i > val_j? 1: -1; else return 0; >> + >> + TODO: >> + 1) The current implementation restrict all candidate VECTORs should have >> + the same VECTOR type, but it can be extended into different groups by >> + VECTOR types in future if any profitable cases found. >> + 2) The current implementation requires the whole VECTORs should be fully >> + covered, but it can be extended to support partial, checking adjacent >> + but not fill the whole, it may need some cost model to define the >> + boundary to do or not. >> + >> + tree elem_type = TREE_TYPE (vec_type); >> + unsigned HOST_WIDE_INT size = TREE_INT_CST_LOW (TYPE_SIZE (elem_type)); >> + if (size != TREE_INT_CST_LOW (op1)) > > if (!tree_int_cst_equal (TYPE_SIZE (elem_type), op1)) > > avoids some of the TREE_INT_CST_LOW we like to avoid. > > You miss a check for op2 % op1 being zero. Given you store a > HOST_WIDE_INT offset you also want to check for INTEGER_CST op1/op2 > (beware of SVE...). OK, thanks! For op2 % op1 == zero, I thought it's a must for granted, I'll fix it. I think it can be constructed in IR artificially, but since I have almost none knowledge on other targets vector support, I'm curious that does it exist in real world codes? btw, BIT_FIELD_REF in tree.def says the op1/op2 is constant, it looks need to be updated due to SVE? > > There's also a poly-int friendly multiple_p predicate so you could > have the initial checks SVE friendly but bail out on non-INTEGER_CST > offset later. > Got it, thanks! > > Since you are using a hashtable keyed off SSA name pointers code > generation will depend on host addresses here if you consider > there's one mismatching vector type that might become ref_vec > dependent on the order of elements in the hashtable. That's > a no-no. > > Even if doing it like above looks to possibly save compile-time > by avoiding qsort calls I think the appropriate thing to do is > to partition the vector candidates into sets of compatible > vectors (consider summing two v2df and two v4df vectors for example) > and then take out ones you disqualify because they do not cover > the vector in full. > You are absolutely right, the randomness of hashtable keys order could be a problem here. The partition part is TODO 1). Since Power has only one kind whole vector width now, doesn't have v2df and v4df co-existence, it's put into TODO. I will extend it in the next version of patch, add one more hashtable from vector type mode to v_info_map, feel free to correct me if you have any concerns. > I think whether the vector is fully covered can be done way cheaper > as well by using a sbitmap and clearing a bit whenever its > corresponding lane is in the vector (and bailing out if the bit > was already clear since you then got a duplicate). So start > with bitmap_ones () and at the end check bitmap_empty_p (). > I don't think you depend on the vector being sorted later? > Good idea, yes, it doesn't depend on sorted or not. >> + { >> + sum = build_and_add_sum (TREE_TYPE (ref_vec), sum_vec, tr, opcode); >> + info = *(v_info_map.get (tr)); >> + unsigned j; >> + FOR_EACH_VEC_ELT (info->ops_indexes, j, idx) >> + { >> + gimple *def = SSA_NAME_DEF_STMT ((*ops)[idx]->op); >> + gimple_set_visited (def, true); > > you set the visited flag to get the ops definition DCEd eventually, right? > Note this in a comment. > Yes, it's for that. Will add the comment, thanks!