From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 106934 invoked by alias); 9 Dec 2015 11:55:15 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Received: (qmail 106921 invoked by uid 89); 9 Dec 2015 11:55:15 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-1.8 required=5.0 tests=AWL,BAYES_00,SPF_HELO_PASS,T_RP_MATCHES_RCVD autolearn=ham version=3.3.2 X-HELO: mx1.redhat.com Received: from mx1.redhat.com (HELO mx1.redhat.com) (209.132.183.28) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with (AES256-GCM-SHA384 encrypted) ESMTPS; Wed, 09 Dec 2015 11:55:13 +0000 Received: from int-mx11.intmail.prod.int.phx2.redhat.com (int-mx11.intmail.prod.int.phx2.redhat.com [10.5.11.24]) by mx1.redhat.com (Postfix) with ESMTPS id B4E48219E for ; Wed, 9 Dec 2015 11:55:11 +0000 (UTC) Received: from tucnak.zalov.cz (ovpn-113-142.phx2.redhat.com [10.3.113.142]) by int-mx11.intmail.prod.int.phx2.redhat.com (8.14.4/8.14.4) with ESMTP id tB9BtAfq024411 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=NO) for ; Wed, 9 Dec 2015 06:55:11 -0500 Received: from tucnak.zalov.cz (localhost [127.0.0.1]) by tucnak.zalov.cz (8.15.2/8.15.2) with ESMTP id tB9Bt8oL011041 for ; Wed, 9 Dec 2015 12:55:08 +0100 Received: (from jakub@localhost) by tucnak.zalov.cz (8.15.2/8.15.2/Submit) id tB9Bt7vC011040 for gcc-patches@gcc.gnu.org; Wed, 9 Dec 2015 12:55:07 +0100 Resent-From: Jakub Jelinek Resent-Date: Wed, 9 Dec 2015 12:55:07 +0100 Resent-Message-ID: <20151209115507.GI22971@tucnak.redhat.com> Resent-To: gcc-patches@gcc.gnu.org Date: Wed, 09 Dec 2015 11:55:00 -0000 From: Jakub Jelinek To: GCC Patches Subject: Re: [hsa 2/10] Modifications to libgomp proper Message-ID: <20151209113536.GP5675@tucnak.redhat.com> Reply-To: Jakub Jelinek References: <20151207111758.GA24234@virgil.suse.cz> <20151207111957.GC24234@virgil.suse.cz> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20151207111957.GC24234@virgil.suse.cz> User-Agent: Mutt/1.5.23 (2014-03-12) X-IsSubscribed: yes X-SW-Source: 2015-12/txt/msg00984.txt.bz2 On Mon, Dec 07, 2015 at 12:19:57PM +0100, Martin Jambor wrote: > +/* Flag set when the subsequent element in the device-specific argument > + values. */ > +#define GOMP_TARGET_ARG_SUBSEQUENT_PARAM (1 << 7) > + > +/* Bitmask to apply to a target argument to find out the value identifier. */ > +#define GOMP_TARGET_ARG_ID_MASK (((1 << 8) - 1) << 8) > +/* Target argument index of NUM_TEAMS. */ > +#define GOMP_TARGET_ARG_NUM_TEAMS (1 << 8) > +/* Target argument index of THREAD_LIMIT. */ > +#define GOMP_TARGET_ARG_THREAD_LIMIT (2 << 8) I meant that these two would be just special, passed as the first two pointers in the array, without the markup. Because, otherwise you either need to use GOMP_TARGET_ARG_SUBSEQUENT_PARAM for these always, or for 32-bit arches and for 64-bit ones shift often at runtime. Having the markup even for them is perhaps cleaner, but less efficient, so if you really want to go that way, please make sure you handle it properly for 32-bit pointers architectures though. num_teams or thread_limit could be > 32767 or > 65535. > -static void > -gomp_target_fallback_firstprivate (void (*fn) (void *), size_t mapnum, > - void **hostaddrs, size_t *sizes, > - unsigned short *kinds) > +static void * > +gomp_target_unshare_firstprivate (size_t mapnum, void **hostaddrs, > + size_t *sizes, unsigned short *kinds) > { > size_t i, tgt_align = 0, tgt_size = 0; > char *tgt = NULL; > @@ -1281,7 +1282,7 @@ gomp_target_fallback_firstprivate (void (*fn) (void *), size_t mapnum, > } > if (tgt_align) > { > - tgt = gomp_alloca (tgt_size + tgt_align - 1); > + tgt = gomp_malloc (tgt_size + tgt_align - 1); I don't like using gomp_malloc here, either copy/paste the function, or create separate inline functions for the two loops, one for the first loop which returns you tgt_align and tgt_size, and another for the stuff after the allocation. Then you can use those two inline functions to implement both gomp_target_fallback_firstprivate which will use alloca, and gomp_target_unshare_firstprivate which will use gomp_malloc instead. > @@ -1356,6 +1377,11 @@ GOMP_target (int device, void (*fn) (void *), const void *unused, > and several arguments have been added: > FLAGS is a bitmask, see GOMP_TARGET_FLAG_* in gomp-constants.h. > DEPEND is array of dependencies, see GOMP_task for details. > + ARGS is a pointer to an array consisting of NUM_TEAMS, THREAD_LIMIT and a > + variable number of device-specific arguments, which always take two elements > + where the first specifies the type and the second the actual value. The > + last element of the array is a single NULL. Note, here you document NUM_TEAMS and THREAD_LIMIT as special values, not encoded. > @@ -1473,6 +1508,7 @@ GOMP_target_data (int device, const void *unused, size_t mapnum, > struct gomp_device_descr *devicep = resolve_device (device); > > if (devicep == NULL > + || (devicep->capabilities & GOMP_OFFLOAD_CAP_SHARED_MEM) > || !(devicep->capabilities & GOMP_OFFLOAD_CAP_OPENMP_400)) Would be nice to have some consistency in the order of capabilities checks. Usually you check SHARED_MEM after OPENMP_400, so perhaps do it this way here too. > @@ -1741,23 +1784,38 @@ gomp_target_task_fn (void *data) > > if (ttask->state == GOMP_TARGET_TASK_FINISHED) > { > - gomp_unmap_vars (ttask->tgt, true); > + if (ttask->tgt) > + gomp_unmap_vars (ttask->tgt, true); > return false; > } This doesn't make sense. For the GOMP_OFFLOAD_CAP_SHARED_MEM case, unless you want to run the free (ttask->firstprivate_copies); as a task, you shouldn't be queing the target task again for further execution, instead it should just be handled like a finished task at that point. The reason why for XeonPhi or PTX gomp_target_task_fn is run with GOMP_TARGET_TASK_FINISHED state is that gomp_unmap_vars can perform IO actions and doing it with the tasking lock held is highly undesirable. > > - void *fn_addr = gomp_get_target_fn_addr (devicep, ttask->fn); > - ttask->tgt > - = gomp_map_vars (devicep, ttask->mapnum, ttask->hostaddrs, NULL, > - ttask->sizes, ttask->kinds, true, > - GOMP_MAP_VARS_TARGET); > + bool shared_mem; > + if (devicep->capabilities & GOMP_OFFLOAD_CAP_SHARED_MEM) > + { > + shared_mem = true; > + ttask->tgt = NULL; > + ttask->firstprivate_copies > + = gomp_target_unshare_firstprivate (ttask->mapnum, ttask->hostaddrs, > + ttask->sizes, ttask->kinds); > + } > + else > + { > + shared_mem = false; > + ttask->tgt = gomp_map_vars (devicep, ttask->mapnum, ttask->hostaddrs, > + NULL, ttask->sizes, ttask->kinds, true, > + GOMP_MAP_VARS_TARGET); > + } > ttask->state = GOMP_TARGET_TASK_READY_TO_RUN; > > devicep->async_run_func (devicep->target_id, fn_addr, > - (void *) ttask->tgt->tgt_start, (void *) ttask); > + shared_mem ? ttask->hostaddrs > + : (void *) ttask->tgt->tgt_start, > + ttask->args, (void *) ttask); Replace the shared_mem bool variable with a void *arg or some other name and just set it to ttask->hostaddrs in the then case and ttask->tgt->tgt_start otherwise? > --- a/libgomp/task.c > +++ b/libgomp/task.c > @@ -581,6 +581,7 @@ GOMP_PLUGIN_target_task_completion (void *data) > gomp_mutex_unlock (&team->task_lock); > } > ttask->state = GOMP_TARGET_TASK_FINISHED; > + free (ttask->firstprivate_copies); > gomp_target_task_completion (team, task); > gomp_mutex_unlock (&team->task_lock); > } So, this function should have a special case for the SHARED_MEM case, handle it closely to say how GOMP_taskgroup_end handles the finish_cancelled: case. Just note that the target task is missing from certain queues at that point. Jakub