Subject: Re: [patch] adjust default nvptx launch geometry for OpenACC offloaded regions
To: Tom de Vries
CC: "gcc-patches@gcc.gnu.org", Jakub Jelinek, Thomas Schwinge
From: Cesar Philippidis
Date: Mon, 02 Jul 2018 14:39:00 -0000

On 07/02/2018 07:14 AM, Tom de Vries wrote:
> On 06/21/2018 03:58 PM, Cesar Philippidis wrote:
>> On 06/20/2018 03:15 PM, Tom de Vries wrote:
>>> On 06/20/2018 11:59 PM, Cesar Philippidis wrote:
>>>> Now it follows the formula contained in
>>>> the "CUDA Occupancy Calculator" spreadsheet that's distributed with CUDA.
>>>
>>> Any reason we're not using the cuda runtime functions to get the
>>> occupancy (see PR85590 - [nvptx, libgomp, openacc] Use cuda runtime fns
>>> to determine launch configuration in nvptx ) ?
>>
>> There are two reasons:
>>
>> 1) cuda_occupancy.h depends on the CUDA runtime to extract the device
>> properties instead of the CUDA driver API. However, we can always
>> teach libgomp how to populate the cudaDeviceProp struct using the
>> driver API.
>>
>> 2) CUDA is not always present on the build host, and that's why
>> libgomp maintains its own cuda.h. So at the very least, this
>> functionality would be good to have in libgomp as a fallback
>> implementation;
>
> Libgomp maintains its own cuda.h to "allow building GCC with PTX
> offloading even without CUDA being installed" (
> https://gcc.gnu.org/ml/gcc-patches/2017-01/msg00980.html ).
>
> The libgomp nvptx plugin however uses the cuda driver API to launch
> kernels etc, so we can assume that's always available at launch time.
> And according to the "CUDA Pro Tip: Occupancy API Simplifies Launch
> Configuration", the occupancy API is also available in the driver API.

Thanks for the info. I was not aware that the CUDA driver API had a
thread occupancy calculator (it's described in section 4.18).

> What we cannot assume to be available is the occupancy API pre cuda-6.5.
> So it's fine to have a fallback for that (properly isolated in utility
> functions), but for cuda 6.5 and up we want to use the occupancy API.

That seems reasonable. I'll run some experiments with that. In the
meantime, would it be OK to make this fallback the default, then add
support for the driver occupancy calculator as a follow-up?

>> its not good to have program fail due to
>> insufficient hardware resources errors when it is avoidable.
>>
>
> Right, in fact there are two separate things you're trying to address
> here: launch failure and occupancy heuristic, so split the patch.

ACK. I'll split those changes into separate patches.

By the way, do you have any preferences on how to break up the nvptx
vector length changes for trunk submission? I was planning on breaking
it down into four components: generic ME changes, tests, nvptx
reductions and the rest. Those two nvptx components are large, so I'll
probably break them down into smaller patches, but I'm not sure if it's
worthwhile to make them independent of one another with the use of a
lot of stub functions.

Cesar
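
For reference, the arrangement discussed above (ask the driver's occupancy
API when it exists, otherwise use a fallback isolated in a utility function)
might be sketched roughly as below. This is only an illustration, not the
patch. The names launch_dims, occupancy_query_fn, example_query,
fallback_launch_dims and default_launch_dims are all hypothetical. The query
type is a simplified stand-in for the driver entry point
cuOccupancyMaxPotentialBlockSize, which takes more arguments. The fallback
arithmetic is a placeholder assumption, not the formula from the "CUDA
Occupancy Calculator" spreadsheet.

```c
#include <stddef.h>

/* Hypothetical launch geometry: number of thread blocks and threads
   per block.  */
struct launch_dims
{
  int gangs;
  int threads;
};

/* Hypothetical, simplified stand-in for a driver occupancy query.
   Returns 0 on success and stores the suggested grid and block sizes.  */
typedef int (*occupancy_query_fn) (int *min_grid_size, int *block_size);

/* Stand-in query reporting fixed values, for illustration only.  */
static int
example_query (int *min_grid_size, int *block_size)
{
  *min_grid_size = 160;
  *block_size = 256;
  return 0;
}

/* Fallback for drivers without an occupancy API.  Placeholder
   heuristic (an assumption): fill each multiprocessor with as many
   blocks as its thread limit allows, and always at least one.  */
static struct launch_dims
fallback_launch_dims (int num_sms, int max_threads_per_sm,
                      int threads_per_block)
{
  struct launch_dims d;
  int blocks_per_sm;

  if (threads_per_block < 1)
    threads_per_block = 1;

  blocks_per_sm = max_threads_per_sm / threads_per_block;
  if (blocks_per_sm < 1)
    blocks_per_sm = 1;

  d.gangs = num_sms * blocks_per_sm;
  d.threads = threads_per_block;
  return d;
}

/* Prefer the occupancy query when one is available and succeeds;
   otherwise use the fallback.  QUERY is NULL when the driver lacks
   the occupancy API.  */
static struct launch_dims
default_launch_dims (occupancy_query_fn query, int num_sms,
                     int max_threads_per_sm, int threads_per_block)
{
  if (query != NULL)
    {
      int grid = 0;
      int block = 0;

      if (query (&grid, &block) == 0 && grid > 0 && block > 0)
        {
          struct launch_dims d = { grid, block };
          return d;
        }
    }

  return fallback_launch_dims (num_sms, max_threads_per_sm,
                               threads_per_block);
}
```

Passing NULL for the query models a pre-6.5 driver. Keeping the fallback in
its own function means the driver path can be added later as a follow-up
without touching the callers.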