On 03/21/2018 10:10 AM, Tom de Vries wrote:
> On 03/02/2018 05:55 PM, Cesar Philippidis wrote:
>> In addition, nvptx_cta_sync and the corresponding nvptx_barsync insn,
>> have been extended to take a barrier ID and a thread count. The idea
>> here is to assign one barrier for each logical vector. Worker-single
>> synchronization is controlled by barrier 0. Therefore, the vector
>> barrier ID is set to tid.y+1 (because there's one vector unit per
>> worker) in nvptx_init_oacc_workers and placed into a register stored in
>> cfun->machine->sync_bar. If no workers are present, then the barrier ID
>> falls back to 0.
>
> I compiled a worker loop before and after the patch series, and observed
> this change:
> ...
> @@ -70,7 +71,7 @@
>   $L2:
>    // joining 2;
>   $L5:
> -  bar.sync 1;
> +  bar.sync 0;
>    // join 2;
>    ret;
>  }
> ...
>
> AFAICT from your explanation above, that change is intentional.
>
> Changing the code generation scheme for workers is fine, but obviously
> that should be a minimal, separate patch that we can bisect back to.

That sounds reasonable. I'll apply this patch to og7 once testing has
completed. While all of the functionality it introduces is unnecessary
without the vector length changes, at least it can be applied independently.

Cesar
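
P.S. For anyone following the thread, the barrier numbering described in the quoted text can be modelled in a few lines of C. This is only an illustrative sketch under stated assumptions: vector_barrier_id and its parameters (tid_y, has_workers) are hypothetical names and not GCC code. In the patch itself the value is computed in nvptx_init_oacc_workers and kept in the register recorded in cfun->machine->sync_bar.

```c
/* Hypothetical model of the barrier ID scheme described above.
   Not part of GCC; the names are invented for illustration only. */

/* Barrier 0 is used for worker-single synchronization. */
enum { WORKER_SINGLE_BARRIER = 0 };

/* Return the barrier ID used for vector synchronization.
   tid_y       - the worker index (tid.y in PTX terms)
   has_workers - nonzero if the offloaded region uses workers  */
static unsigned
vector_barrier_id (unsigned tid_y, int has_workers)
{
  /* One vector unit per worker, so each worker gets its own barrier,
     offset by one to keep barrier 0 free for worker-single sync.
     Without workers, fall back to barrier 0.  */
  return has_workers ? tid_y + 1 : WORKER_SINGLE_BARRIER;
}
```

With this model, worker 0 would synchronize its vector on barrier 1, worker 1 on barrier 2, and so on, while a region without workers would use barrier 0 throughout.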