From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 74746 invoked by alias); 30 Mar 2018 14:46:02 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Received: (qmail 73075 invoked by uid 89); 30 Mar 2018 14:45:32 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-1.9 required=5.0 tests=BAYES_00,RCVD_IN_DNSWL_NONE,SPF_PASS,TIME_LIMIT_EXCEEDED autolearn=unavailable version=3.3.2 spammy= X-HELO: relay1.mentorg.com Received: from relay1.mentorg.com (HELO relay1.mentorg.com) (192.94.38.131) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Fri, 30 Mar 2018 14:45:18 +0000 Received: from nat-ies.mentorg.com ([192.94.31.2] helo=SVR-IES-MBX-04.mgc.mentorg.com) by relay1.mentorg.com with esmtps (TLSv1.2:ECDHE-RSA-AES256-SHA384:256) id 1f1vH9-0006tG-05 from Tom_deVries@mentor.com for gcc-patches@gcc.gnu.org; Fri, 30 Mar 2018 07:45:07 -0700 Received: from [172.30.72.232] (137.202.0.87) by SVR-IES-MBX-04.mgc.mentorg.com (139.181.222.4) with Microsoft SMTP Server (TLS) id 15.0.1320.4; Fri, 30 Mar 2018 15:45:03 +0100 Subject: Re: [og7] vector_length extension part 2: Generalize state propagation and synchronization From: Tom de Vries To: Cesar Philippidis References: <823cc381-8752-14df-d6e2-0203de5da2fb@codesourcery.com> <5f60a648-184d-6ada-7412-b931b087826e@mentor.com> CC: "gcc-patches@gcc.gnu.org" Message-ID: <6c698973-520b-26bc-ee83-c17077442b85@mentor.com> Date: Fri, 30 Mar 2018 14:48:00 -0000 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.6.0 MIME-Version: 1.0 In-Reply-To: <5f60a648-184d-6ada-7412-b931b087826e@mentor.com> Content-Type: text/plain; charset="utf-8"; format=flowed Content-Transfer-Encoding: 8bit X-ClientProxiedBy: svr-ies-mbx-01.mgc.mentorg.com (139.181.222.1) To SVR-IES-MBX-04.mgc.mentorg.com (139.181.222.4) X-SW-Source: 2018-03/txt/msg01582.txt.bz2 On 03/30/2018 03:07 AM, Tom de Vries wrote: > On 03/02/2018 05:55 PM, Cesar Philippidis wrote: >> As a follow up patch will show, the nvptx BE falls back to using >> vector_length = 32 when a vector loop is nested inside a worker loop. > > I disabled the fallback, and analyzed the vred2d-128.c illegal memory > access execution failure. > > I minimized that down to this ptx: > ... > .shared .align 8 .u8 __oacc_bcast[176]; > > { >   { >     .reg .u32 %x; >     mov.u32 %x,%tid.x; >     setp.ne.u32 %r86,%x,0; >   } > >   { >     .reg .u32 %tidy; >     .reg .u64 %t_bcast; >     .reg .u64 %y64; >     mov.u32 %tidy,%tid.y; >     cvt.u64.u32 %y64,%tidy; >     add.u64 %y64,%y64,1; >     cvta.shared.u64 %t_bcast,__oacc_bcast; >     mad.lo.u64 %r66,%y64,88,%t_bcast; >   } > >   @ %r86 bra $L28; >   st.u32 [%r66+80],0; >  $L28: >   ret; > } > ... > > The ptx is called with 2 workers and 128 vector_length. > > So, 2 workers mean %tid.y has values 0 and 1. > Then %y64 has values 1 and 2. > Then %r66 has values __oacc_bcast + (1 * 88) and __oacc_bcast + (2 * 88). > Then the st.u32 accesss __oacc_bcast + (1 * 88) + 80 and __oacc_bcast + > (2 * 88) + 80. > > So we're accessing memory at location 256, while the __oacc_bcast is > only 176 bytes big. > > I formulated this assert that AFAIU detects this situation in the compiler: > ... > @@ -1125,6 +1125,8 @@ nvptx_init_axis_predicate (FILE *file, int regno, > const char *name) >    fprintf (file, "\t}\n"); >  } > > +static int nvptx_mach_max_workers (); > + >  /* Emit code to initialize OpenACC worker broadcast and synchronization >     registers.  */ > > @@ -1148,6 +1150,7 @@ nvptx_init_oacc_workers (FILE *file) >                "// vector broadcast offset\n", >                REGNO (cfun->machine->bcast_partition), >                oacc_bcast_partition); > +      gcc_assert (oacc_bcast_partition * (nvptx_mach_max_workers () + > 1) <= oacc_bcast_size); >      } >    if (cfun->machine->sync_bar) >      fprintf (file, "\t\tadd.u32\t\t%%r%d, %%tidy, 1; " > ... > > The assert is not triggered when the fallback is used. I've tracked the problem down to: ... > - if (oacc_bcast_size < data.offset) > - oacc_bcast_size = data.offset; > + if (oacc_bcast_partition < data.offset) > + { > + int psize = data.offset; > + psize = (psize + oacc_bcast_align - 1) & ~(oacc_bcast_align - 1); > + oacc_bcast_partition = psize; > + oacc_bcast_size = psize * (nvptx_mach_max_workers () + 1); > + } ... We hit this if clause for a first compiled function, with num_workers(1). This sets oacc_bcast_partition and oacc_bcast_size as required for that functions. Then we hit this if clause for a second compiled function, with num_workers (2). We need oacc_bcast_size updated, but the 'oacc_bcast_partition < data.offset' is false, so the update doesn't happen. I managed to fix this by making the code unconditional, and using MAX to update oacc_bcast_partition and oacc_bcast_size. Thanks, - Tom