From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 27776 invoked by alias); 24 Oct 2017 19:20:30 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Received: (qmail 26492 invoked by uid 89); 24 Oct 2017 19:20:29 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-0.9 required=5.0 tests=BAYES_00,KAM_LAZY_DOMAIN_SECURITY,RP_MATCHES_RCVD,SPF_HELO_PASS autolearn=no version=3.3.2 spammy= X-HELO: mx1.redhat.com Received: from mx1.redhat.com (HELO mx1.redhat.com) (209.132.183.28) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Tue, 24 Oct 2017 19:20:28 +0000 Received: from smtp.corp.redhat.com (int-mx05.intmail.prod.int.phx2.redhat.com [10.5.11.15]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 025DE13A55; Tue, 24 Oct 2017 19:20:27 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mx1.redhat.com 025DE13A55 Authentication-Results: ext-mx05.extmail.prod.ext.phx2.redhat.com; dmarc=none (p=none dis=none) header.from=redhat.com Authentication-Results: ext-mx05.extmail.prod.ext.phx2.redhat.com; spf=fail smtp.mailfrom=jakub@redhat.com Received: from tucnak.zalov.cz (ovpn-116-247.ams2.redhat.com [10.36.116.247]) by smtp.corp.redhat.com (Postfix) with ESMTPS id 73F6A5D6A2; Tue, 24 Oct 2017 19:20:26 +0000 (UTC) Received: from tucnak.zalov.cz (localhost [127.0.0.1]) by tucnak.zalov.cz (8.15.2/8.15.2) with ESMTP id v9OJKMwu001217; Tue, 24 Oct 2017 21:20:22 +0200 Received: (from jakub@localhost) by tucnak.zalov.cz (8.15.2/8.15.2/Submit) id v9OJKHBA001216; Tue, 24 Oct 2017 21:20:17 +0200 Date: Tue, 24 Oct 2017 19:36:00 -0000 From: Jakub Jelinek To: Alexander Monakov Cc: Thomas Schwinge , Martin Jambor , Cesar Philippidis , gcc-patches@gcc.gnu.org Subject: Re: [RFC PATCH] Coalesce host to device transfers in libgomp Message-ID: <20171024192017.GO14653@tucnak> Reply-To: Jakub Jelinek References: <20171024095527.GJ14653@tucnak> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.7.1 (2016-10-04) X-IsSubscribed: yes X-SW-Source: 2017-10/txt/msg01761.txt.bz2 On Tue, Oct 24, 2017 at 08:39:13PM +0300, Alexander Monakov wrote: > On Tue, 24 Oct 2017, Jakub Jelinek wrote: > > loop transfering the addresses or firstprivate_int values to the device > > - where we issued mapnum host2dev transfers each just pointer-sized > > when we could have just prepared all the pointers in an array and host2dev > > copy them all together. > > Can you please give an example OpenMP code? I thought such variables are > just fields of one omp_data_? struct that is copied all at once, but I guess > I'm misunderstanding. Basically anything with multiple mappings. void foo () { int a[10], b[10], c[10], d, e, f; struct S { int g[10]; } h; init (a, b, c, &d, &e, &f, &h); #pragma omp target map(to:a, b, c) firstprivate (d, e, f, h) use (a, b, c, d, e, f, h); } The above has mapnum 7, if none of this is mapped, then the current trunk will perform 3 host2dev 40 byte copies for the 3 arrays, 1 40 byte copy for the firstprivate h, and 7 pointer-sized copies for the addresses of the 3 arrays, one firstprivate struct and 3 ints encoded in pointers. As all the 4 40 byte allocations plus the 7 * pointer sized allocations are adjacent with no gaps, with the patch there will be a single host2dev transfer of 160+7*sizeof(void*) bytes. > > +struct gomp_map_cache > > +{ > > + void *buf; > > + struct target_mem_desc *tgt; > > + size_t *chunks; > > + long chunk_cnt; > > + long use_cnt; > > +}; > > Would really appreciate comments for meaning of fields here. Also, is the > struct properly named? From the patch description I understood it to be a > copy coalescing buffer, not a cache. I'll rename it and add comments. > > @@ -449,19 +531,34 @@ gomp_map_vars (struct gomp_device_descr > > size_t align = (size_t) 1 << (kind >> rshift); > > if (tgt_align < align) > > tgt_align = align; > > - tgt_size -= (uintptr_t) hostaddrs[first] > > - - (uintptr_t) hostaddrs[i]; > > + tgt_size -= (uintptr_t) hostaddrs[first] - cur_node.host_start; > > tgt_size = (tgt_size + align - 1) & ~(align - 1); > > - tgt_size += cur_node.host_end - (uintptr_t) hostaddrs[i]; > > + tgt_size += cur_node.host_end - cur_node.host_start; > > not_found_cnt += last - i; > > for (i = first; i <= last; i++) > > - tgt->list[i].key = NULL; > > + { > > + tgt->list[i].key = NULL; > > + switch (get_kind (short_mapkind, kinds, i) & typemask) > > + { > > + case GOMP_MAP_ALLOC: > > + case GOMP_MAP_FROM: > > + case GOMP_MAP_FORCE_ALLOC: > > + case GOMP_MAP_ALWAYS_FROM: > > + break; > > + default: > > + /* All the others copy data if newly allocated. */ > > + gomp_cache_add (&cache, tgt_size - cur_node.host_end > > + + (uintptr_t) hostaddrs[i], > > + sizes[i]); > > A similar switch needed to be duplicated below. Would it be appropriate to > pass the map kind to gomp_cache_add, or have a thin wrapper around it to have > checks for appropriate kinds in one place? No, I'd prefer to keep the logic out of gomp_cache_add, but can add an inline predicate whether kind writes to device. Jakub