From: Jakub Jelinek
To: Alexander Monakov
Cc: gcc-patches@gcc.gnu.org, Dmitry Melnik
Subject: Re: [gomp4 00/14] NVPTX: further porting
Date: Wed, 21 Oct 2015 08:56:00 -0000
Message-ID: <20151021085415.GK478@tucnak.redhat.com>
In-Reply-To: <1445366076-16082-1-git-send-email-amonakov@ispras.ru>

On Tue, Oct 20, 2015 at 09:34:22PM +0300, Alexander Monakov wrote:
> I've opted not to use dynamic parallelism.  It increases the hardware
> requirement from sm_30 to sm_35, needs a library from CUDA Toolkit at link

I'll try to add the thread_limit/num_teams arguments to GOMP_target_41
soon (together with the target teams clause evaluation changes), so
sometimes you'll have that information at target time, but not always.
Using teams/thread preallocation when possible is fine with me, but I think
it is not always possible: you may not be able to see what number of teams
the teams construct will require or what thread_limit it will want, or
thread_limit may be unspecified and you have no idea how many threads will
be requested...
I think requiring sm_35 should not be a very big deal.

> time (libcudadevrt.a), and imposes overhead at run time.  The last point might

But if this is the case, that is a really serious issue.  Is that really
something that isn't available in a shared library?
E.g. with my distro GCC maintainer hat on, I'd really like to tweak the
libgomp PTX plugin so that it compiles against a stub cuda.h header and
doesn't link against libcuda*.so at all, but instead dlopens it, to avoid
hard dependencies on the non-free CUDA stuff and, more importantly, any link
time dependencies on it.  If libcudadevrt is not available as a shared
library, this of course wouldn't work.
Would be nice to talk to NVidia about this...

> libgomp.c/thread-limit-2.c: fails to link due to 'usleep' unavailable on
> NVPTX.  Note, the test does not run anything on the device because the target
> region has 'if (0)' clause.

As an optimization, perhaps we could avoid adding the "omp target entrypoint"
attribute for the body of an if(0) target region; that one always goes to
host fallback, so no offloaded code is needed.
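
To make the if(0) case concrete, here is a minimal sketch of such a region.
It is not taken from thread-limit-2.c; the function name region_ran_on_host
is made up for illustration.  The if (0) clause forces host fallback, so the
body never runs on the device, yet an offloading compiler would normally
still emit a device copy of it (which is where an unavailable function such
as usleep would fail to link).

```c
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical helper: returns nonzero if the target region body
   executed on the host.  */
int
region_ran_on_host (void)
{
  int on_host = 0;

  /* if (0) means this region always takes the host fallback path.  */
  #pragma omp target if (0) map(tofrom: on_host)
  {
#ifdef _OPENMP
    /* Nonzero when executing on the host (initial) device.  */
    on_host = omp_is_initial_device ();
#else
    /* Without OpenMP the pragma is ignored and this is plain host code.  */
    on_host = 1;
#endif
  }

  return on_host;
}
```

Whether built with or without -fopenmp, the function should report host
execution; the optimization suggested above would only change whether a
device version of the body gets generated at all.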

As for the other tests, always XFAILing them is undesirable; presumably we
could add a dejagnu target check for whether the default target goes to PTX
(if we don't have it already) and use that to xfail?  Of course that doesn't
help the thread-limit-2.c testcase.

	Jakub