On 03/22/2018 04:59 AM, Cesar Philippidis wrote:
> On 03/21/2018 10:10 AM, Tom de Vries wrote:
>> On 03/02/2018 05:55 PM, Cesar Philippidis wrote:
>>> In addition, nvptx_cta_sync and the corresponding nvptx_barsync insn,
>>> have been extended to take a barrier ID and a thread count. The idea
>>> here is to assign one barrier for each logical vector. Worker-single
>>> synchronization is controlled by barrier 0. Therefore, the vector
>>> barrier ID is set to tid.y+1 (because there's one vector unit per
>>> worker) in nvptx_init_oacc_workers and placed into a register stored in
>>> cfun->machine->sync_bar. If no workers are present, then the barrier ID
>>> falls back to 0.
>>
>> I compiled a worker loop before and after the patch series, and observed
>> this change:
>> ...
>> @@ -70,7 +71,7 @@
>>   $L2:
>>    // joining 2;
>>   $L5:
>> -  bar.sync 1;
>> +  bar.sync 0;
>>    // join 2;
>>    ret;
>>  }
>> ...
>>
>> AFAICT from your explanation above, that change is intentional.
>>
>> Changing the code generation scheme for workers is fine, but obviously
>> that should be a minimal, separate patch that we can bisect back to.
>
> That sounds reasonable. I'll apply this patch to og7 once testing has
> completed. While all of the functionality it introduces is unnecessary
> without the vector length changes, at least it can be applied independently.
>

In other words, the patch is not minimal.

Thanks,
- Tom