Hi Thomas, On 2024/3/15 7:24 PM, Thomas Schwinge wrote: > Hi Chung-Lin! > > I realized: please add "PR libgomp/92840" to the Git commit log, as your > changes are directly a continuation of my earlier changes. Okay, I'll remember to do that. ... > - if (n->refcount != REFCOUNT_INFINITY) > + if (n->refcount != REFCOUNT_INFINITY > + && n->refcount != REFCOUNT_ACC_MAP_DATA) > n->refcount--; > n->dynamic_refcount--; > } > > + /* Mappings created by 'acc_map_data' may only be deleted by > + 'acc_unmap_data'. */ > + if (n->refcount == REFCOUNT_ACC_MAP_DATA > + && n->dynamic_refcount == 0) > + n->dynamic_refcount = 1; > + > if (n->refcount == 0) > { > bool copyout = (kind == GOMP_MAP_FROM > > ..., which really should have the same semantics? No strong opinion on > which of the two variants you now chose. My guess is that breaking off the REFCOUNT_ACC_MAP_DATA case separately will be lighter on any branch predictors (faster performing overall), so I will stick with my version here. >>> >>> It's not clear to me why you need this handling -- instead of just >>> handling 'REFCOUNT_ACC_MAP_DATA' like 'REFCOUNT_INFINITY' here, that is, >>> early 'return'? >>> >>> Per my understanding, this code is for OpenACC only exercised for >>> structured data regions, and it seems strange (unnecessary?) to adjust >>> the 'dynamic_refcount' for these for 'acc_map_data'-mapped data? Or am I >>> missing anything? >> >> No, that is not true. It goes through almost everything through gomp_map_vars_existing/_internal. >> This is what happens when you acc_create/acc_copyin on a mapping created by acc_map_data. > > But I don't understand what you foresee breaking with the following (on > top of your v2): > > --- a/libgomp/target.c > +++ b/libgomp/target.c > @@ -476,14 +476,14 @@ gomp_free_device_memory (struct gomp_device_descr *devicep, void *devptr) > static inline void > gomp_increment_refcount (splay_tree_key k, htab_t *refcount_set) > { > - if (k == NULL || k->refcount == REFCOUNT_INFINITY) > + if (k == NULL > + || k->refcount == REFCOUNT_INFINITY > + || k->refcount == REFCOUNT_ACC_MAP_DATA) > return; > > uintptr_t *refcount_ptr = &k->refcount; > > - if (k->refcount == REFCOUNT_ACC_MAP_DATA) > - refcount_ptr = &k->dynamic_refcount; > - else if (REFCOUNT_STRUCTELEM_FIRST_P (k->refcount)) > + if (REFCOUNT_STRUCTELEM_FIRST_P (k->refcount)) > refcount_ptr = &k->structelem_refcount; ... > Can you please show a test case? I have re-tested the patch *without* the gomp_increment/decrement_refcount changes, and have these regressions (just to demonstrate what is affected): +FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/nested-1.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none -O0 execution test +FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/nested-1.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none -O2 execution test +FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/pr92854-1.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none -O0 execution test +FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/pr92854-1.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none -O2 execution test +FAIL: libgomp.oacc-c++/../libgomp.oacc-c-c++-common/nested-1.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none -O0 execution test +FAIL: libgomp.oacc-c++/../libgomp.oacc-c-c++-common/nested-1.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none -O2 execution test +FAIL: libgomp.oacc-c++/../libgomp.oacc-c-c++-common/pr92854-1.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none -O0 execution test +FAIL: libgomp.oacc-c++/../libgomp.oacc-c-c++-common/pr92854-1.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none -O2 execution test Now, I have also re-tested your version (aka, just break early and return when k->refcount == REFCOUNT_ACC_MAP_DATA) And for the record, that also works (no regressions). However, I strongly suggest we use my version here where we adjust the dynamic_refcount, simply because: *It is the whole point of this project item in OpenACC 2.7* The 2.7 spec articulated how increment/decrement interacts with acc_map_data/acc_unmap_data and this patch was supposed to make libgomp more conforming to it implementation-wise. (otherwise, no point in working on this at all, as there wasn't really anything behaviorally wrong about our implementation before) > I see we already have: > > if ((kinds[i] & 0xff) == GOMP_MAP_TO_PSET > && tgt->list_count == 0) > { > /* 'declare target'. */ > assert (n->refcount == REFCOUNT_INFINITY); > > I think I wanted to you to add: > > --- a/libgomp/oacc-mem.c > +++ b/libgomp/oacc-mem.c > @@ -1217,7 +1209,8 @@ goacc_enter_data_internal (struct gomp_device_descr *acc_dev, size_t mapnum, > processed = true; > } > else > - assert (n->refcount != REFCOUNT_INFINITY); > + assert (n->refcount != REFCOUNT_INFINITY > + && n->refcount != REFCOUNT_ACC_MAP_DATA); > > for (size_t j = 0; j < tgt->list_count; j++) > if (tgt->list[j].key == n) I have added this fragment to the patch, thanks. > > Please check these items, and then we're good to go. I have attached v3 of this patch, of course re-tested without regressions. If there are no objections I will commit this before end of week (maybe weekend) Thanks, Chung-Lin