* [OG12][committed] amdgcn: OpenMP low-latency allocator
@ 2023-02-16 18:06 Andrew Stubbs
2023-02-16 21:11 ` [og12] Un-break nvptx libgomp build (was: [OG12][committed] amdgcn: OpenMP low-latency allocator) Thomas Schwinge
2023-03-24 16:30 ` [OG12][committed] amdgcn: OpenMP low-latency allocator Thomas Schwinge
0 siblings, 2 replies; 4+ messages in thread
From: Andrew Stubbs @ 2023-02-16 18:06 UTC (permalink / raw)
To: gcc-patches
[-- Attachment #1: Type: text/plain, Size: 400 bytes --]
These patches implement an LDS memory allocator for OpenMP on AMD.
1. 230216-basic-allocator.patch
Separate the allocator from NVPTX so the code can be shared.
2. 230216-amd-low-lat.patch
Allocate the memory, adjust the default address space, and hook up the
allocator.
They will need to be integrated with the rest of the memory management
patch-stack when I repost that for mainline.
Andrew
[-- Attachment #2: 230216-basic-allocator.patch --]
[-- Type: text/plain, Size: 23962 bytes --]
nvptx, libgomp: Move the low-latency allocator code
There shouldn't be a functionality change; this is just so AMD can share
the code.
The new basic-allocator.c is designed to be included so it can be used as a
template multiple times and inlined.
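
To illustrate the intended instantiation pattern (a hypothetical "example" port; the real instantiations are the nvptx changes below and the GCN patch that follows), a target's allocator.c does roughly:

  /* Hypothetical port: instantiate the template under its own prefix.
     This generates __example_lowlat_init/alloc/calloc/free/realloc.  */
  #define BASIC_ALLOC_PREFIX __example_lowlat
  #define BASIC_ALLOC_YIELD   /* Optional target-specific spin hint.  */
  #include "../../basic-allocator.c"

and the target's start-up code then initializes the heap once, passing the
base address and size of its low-latency memory:

  __example_lowlat_init (heap_base, heap_size);
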
libgomp/ChangeLog:
* config/nvptx/allocator.c (BASIC_ALLOC_PREFIX): New define, and
include basic-allocator.c.
(__nvptx_lowlat_heap_root): Remove.
(heapdesc): Remove.
(nvptx_memspace_alloc): Move implementation to basic-allocator.c.
(nvptx_memspace_calloc): Likewise.
(nvptx_memspace_free): Likewise.
(nvptx_memspace_realloc): Likewise.
* config/nvptx/team.c (__nvptx_lowlat_heap_root): Remove.
(gomp_nvptx_main): Call __nvptx_lowlat_init.
* basic-allocator.c: New file.
diff --git a/libgomp/basic-allocator.c b/libgomp/basic-allocator.c
new file mode 100644
index 00000000000..94b99a89e0b
--- /dev/null
+++ b/libgomp/basic-allocator.c
@@ -0,0 +1,380 @@
+/* Copyright (C) 2023 Free Software Foundation, Inc.
+
+ This file is part of the GNU Offloading and Multi Processing Library
+ (libgomp).
+
+ Libgomp is free software; you can redistribute it and/or modify it
+ under the terms of the GNU General Public License as published by
+ the Free Software Foundation; either version 3, or (at your option)
+ any later version.
+
+ Libgomp is distributed in the hope that it will be useful, but WITHOUT ANY
+ WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
+ FOR A PARTICULAR PURPOSE. See the GNU General Public License for
+ more details.
+
+ Under Section 7 of GPL version 3, you are granted additional
+ permissions described in the GCC Runtime Library Exception, version
+ 3.1, as published by the Free Software Foundation.
+
+ You should have received a copy of the GNU General Public License and
+ a copy of the GCC Runtime Library Exception along with this program;
+ see the files COPYING3 and COPYING.RUNTIME respectively. If not, see
+ <http://www.gnu.org/licenses/>. */
+
+/* This is a basic "malloc" implementation intended for use with small,
+ low-latency memories.
+
+ To use this template, define BASIC_ALLOC_PREFIX, and then #include the
+ source file. The other configuration macros are optional.
+
+ The root heap descriptor is stored in the first bytes of the heap, and each
+ free chunk contains a similar descriptor for the next free chunk in the
+ chain.
+
+ The descriptor is two values: offset and size, which describe the
+ location of a chunk of memory available for allocation. The offset is
+ relative to the base of the heap. The special offset value 0xffffffff
+ indicates that the heap (free chain) is locked. The offset and size are
+ 32-bit values so the base alignment can be 8-bytes.
+
+ Memory is allocated to the first free chunk that fits. The free chain
+ is always stored in order of the offset to assist coalescing adjacent
+ chunks. */
+
+#include "libgomp.h"
+
+#ifndef BASIC_ALLOC_PREFIX
+#error "BASIC_ALLOC_PREFIX not defined."
+#endif
+
+#ifndef BASIC_ALLOC_YIELD
+#deine BASIC_ALLOC_YIELD
+#endif
+
+#define ALIGN(VAR) (((VAR) + 7) & ~7) /* 8-byte granularity. */
+
+#define fn1(prefix, name) prefix ## _ ## name
+#define fn(prefix, name) fn1 (prefix, name)
+#define basic_alloc_init fn(BASIC_ALLOC_PREFIX,init)
+#define basic_alloc_alloc fn(BASIC_ALLOC_PREFIX,alloc)
+#define basic_alloc_calloc fn(BASIC_ALLOC_PREFIX,calloc)
+#define basic_alloc_free fn(BASIC_ALLOC_PREFIX,free)
+#define basic_alloc_realloc fn(BASIC_ALLOC_PREFIX,realloc)
+
+typedef struct {
+ uint32_t offset;
+ uint32_t size;
+} heapdesc;
+
+void
+basic_alloc_init (char *heap, size_t limit)
+{
+ if (heap == NULL)
+ return;
+
+ /* Initialize the head of the free chain. */
+ heapdesc *root = (heapdesc*)heap;
+ root->offset = ALIGN(1);
+ root->size = limit - root->offset;
+
+ /* And terminate the chain. */
+ heapdesc *next = (heapdesc*)(heap + root->offset);
+ next->offset = 0;
+ next->size = 0;
+}
+
+static void *
+basic_alloc_alloc (char *heap, size_t size)
+{
+ if (heap == NULL)
+ return NULL;
+
+ /* Memory is allocated in N-byte granularity. */
+ size = ALIGN (size);
+
+ /* Acquire a lock on the low-latency heap. */
+ heapdesc root, *root_ptr = (heapdesc*)heap;
+ do
+ {
+ root.offset = __atomic_exchange_n (&root_ptr->offset, 0xffffffff,
+ MEMMODEL_ACQUIRE);
+ if (root.offset != 0xffffffff)
+ {
+ root.size = root_ptr->size;
+ break;
+ }
+ /* Spin. */
+ BASIC_ALLOC_YIELD;
+ }
+ while (1);
+
+ /* Walk the free chain. */
+ heapdesc chunk = root;
+ heapdesc *prev_chunkptr = NULL;
+ heapdesc *chunkptr = (heapdesc*)(heap + chunk.offset);
+ heapdesc onward_chain = *chunkptr;
+ while (chunk.size != 0 && (uint32_t)size > chunk.size)
+ {
+ chunk = onward_chain;
+ prev_chunkptr = chunkptr;
+ chunkptr = (heapdesc*)(heap + chunk.offset);
+ onward_chain = *chunkptr;
+ }
+
+ void *result = NULL;
+ if (chunk.size != 0)
+ {
+ /* Allocation successful. */
+ result = chunkptr;
+
+ /* Update the free chain. */
+ heapdesc stillfree = chunk;
+ stillfree.offset += size;
+ stillfree.size -= size;
+ heapdesc *stillfreeptr = (heapdesc*)(heap + stillfree.offset);
+
+ if (stillfree.size == 0)
+ /* The whole chunk was used. */
+ stillfree = onward_chain;
+ else
+ /* The chunk was split, so restore the onward chain. */
+ *stillfreeptr = onward_chain;
+
+ /* The previous free slot or root now points to stillfree. */
+ if (prev_chunkptr)
+ *prev_chunkptr = stillfree;
+ else
+ root = stillfree;
+ }
+
+ /* Update the free chain root and release the lock. */
+ root_ptr->size = root.size;
+ __atomic_store_n (&root_ptr->offset, root.offset, MEMMODEL_RELEASE);
+
+ return result;
+}
+
+static void *
+basic_alloc_calloc (char *heap, size_t size)
+{
+ /* Memory is allocated in N-byte granularity. */
+ size = ALIGN (size);
+
+ uint64_t *result = basic_alloc_alloc (heap, size);
+ if (result)
+ /* Inline memset in which we know size is a multiple of 8. */
+ for (unsigned i = 0; i < (unsigned)size/8; i++)
+ result[i] = 0;
+
+ return result;
+}
+
+static void
+basic_alloc_free (char *heap, void *addr, size_t size)
+{
+ /* Memory is allocated in N-byte granularity. */
+ size = ALIGN (size);
+
+ /* Acquire a lock on the low-latency heap. */
+ heapdesc root, *root_ptr = (heapdesc*)heap;
+ do
+ {
+ root.offset = __atomic_exchange_n (&root_ptr->offset, 0xffffffff,
+ MEMMODEL_ACQUIRE);
+ if (root.offset != 0xffffffff)
+ {
+ root.size = root_ptr->size;
+ break;
+ }
+ /* Spin. */
+ }
+ while (1);
+
+ /* Walk the free chain to find where to insert a new entry. */
+ heapdesc chunk = root, prev_chunk;
+ heapdesc *prev_chunkptr = NULL, *prevprev_chunkptr = NULL;
+ heapdesc *chunkptr = (heapdesc*)(heap + chunk.offset);
+ heapdesc onward_chain = *chunkptr;
+ while (chunk.size != 0 && addr > (void*)chunkptr)
+ {
+ prev_chunk = chunk;
+ chunk = onward_chain;
+ prevprev_chunkptr = prev_chunkptr;
+ prev_chunkptr = chunkptr;
+ chunkptr = (heapdesc*)(heap + chunk.offset);
+ onward_chain = *chunkptr;
+ }
+
+ /* Create the new chunk descriptor. */
+ heapdesc newfreechunk;
+ newfreechunk.offset = (uint32_t)((uintptr_t)addr - (uintptr_t)heap);
+ newfreechunk.size = (uint32_t)size;
+
+ /* Coalesce adjacent free chunks. */
+ if (newfreechunk.offset + size == chunk.offset)
+ {
+ /* Free chunk follows. */
+ newfreechunk.size += chunk.size;
+ chunk = onward_chain;
+ }
+ if (prev_chunkptr)
+ {
+ if (prev_chunk.offset + prev_chunk.size
+ == newfreechunk.offset)
+ {
+ /* Free chunk precedes. */
+ newfreechunk.offset = prev_chunk.offset;
+ newfreechunk.size += prev_chunk.size;
+ addr = heap + prev_chunk.offset;
+ prev_chunkptr = prevprev_chunkptr;
+ }
+ }
+
+ /* Update the free chain in the new and previous chunks. */
+ *(heapdesc*)addr = chunk;
+ if (prev_chunkptr)
+ *prev_chunkptr = newfreechunk;
+ else
+ root = newfreechunk;
+
+ /* Update the free chain root and release the lock. */
+ root_ptr->size = root.size;
+ __atomic_store_n (&root_ptr->offset, root.offset, MEMMODEL_RELEASE);
+
+}
+
+static void *
+basic_alloc_realloc (char *heap, void *addr, size_t oldsize,
+ size_t size)
+{
+ /* Memory is allocated in N-byte granularity. */
+ oldsize = ALIGN (oldsize);
+ size = ALIGN (size);
+
+ if (oldsize == size)
+ return addr;
+
+ /* Acquire a lock on the low-latency heap. */
+ heapdesc root, *root_ptr = (heapdesc*)heap;
+ do
+ {
+ root.offset = __atomic_exchange_n (&root_ptr->offset, 0xffffffff,
+ MEMMODEL_ACQUIRE);
+ if (root.offset != 0xffffffff)
+ {
+ root.size = root_ptr->size;
+ break;
+ }
+ /* Spin. */
+ }
+ while (1);
+
+ /* Walk the free chain. */
+ heapdesc chunk = root;
+ heapdesc *prev_chunkptr = NULL;
+ heapdesc *chunkptr = (heapdesc*)(heap + chunk.offset);
+ heapdesc onward_chain = *chunkptr;
+ while (chunk.size != 0 && (void*)chunkptr < addr)
+ {
+ chunk = onward_chain;
+ prev_chunkptr = chunkptr;
+ chunkptr = (heapdesc*)(heap + chunk.offset);
+ onward_chain = *chunkptr;
+ }
+
+ void *result = NULL;
+ if (size < oldsize)
+ {
+ /* The new allocation is smaller than the old; we can always
+ shrink an allocation in place. */
+ result = addr;
+
+ heapdesc *nowfreeptr = (heapdesc*)(addr + size);
+
+ /* Update the free chain. */
+ heapdesc nowfree;
+ nowfree.offset = (char*)nowfreeptr - heap;
+ nowfree.size = oldsize - size;
+
+ if (nowfree.offset + size == chunk.offset)
+ {
+ /* Coalesce following free chunk. */
+ nowfree.size += chunk.size;
+ *nowfreeptr = onward_chain;
+ }
+ else
+ *nowfreeptr = chunk;
+
+ /* The previous free slot or root now points to nowfree. */
+ if (prev_chunkptr)
+ *prev_chunkptr = nowfree;
+ else
+ root = nowfree;
+ }
+ else if (chunk.size != 0
+ && (char *)addr + oldsize == (char *)chunkptr
+ && chunk.size >= size-oldsize)
+ {
+ /* The new allocation is larger than the old, and we found a
+ large enough free block right after the existing block,
+ so we extend into that space. */
+ result = addr;
+
+ uint32_t delta = size-oldsize;
+
+ /* Update the free chain. */
+ heapdesc stillfree = chunk;
+ stillfree.offset += delta;
+ stillfree.size -= delta;
+ heapdesc *stillfreeptr = (heapdesc*)(heap + stillfree.offset);
+
+ if (stillfree.size == 0)
+ /* The whole chunk was used. */
+ stillfree = onward_chain;
+ else
+ /* The chunk was split, so restore the onward chain. */
+ *stillfreeptr = onward_chain;
+
+ /* The previous free slot or root now points to stillfree. */
+ if (prev_chunkptr)
+ *prev_chunkptr = stillfree;
+ else
+ root = stillfree;
+ }
+ /* Else realloc in-place has failed and result remains NULL. */
+
+ /* Update the free chain root and release the lock. */
+ root_ptr->size = root.size;
+ __atomic_store_n (&root_ptr->offset, root.offset, MEMMODEL_RELEASE);
+
+ if (result == NULL)
+ {
+ /* The allocation could not be extended in place, so we simply
+ allocate fresh memory and move the data. If we can't allocate
+ from low-latency memory then we leave the original alloaction
+ intact and return NULL.
+ We could do a fall-back to main memory, but we don't know what
+ the fall-back trait said to do. */
+ result = basic_alloc_alloc (heap, size);
+ if (result != NULL)
+ {
+ /* Inline memcpy in which we know oldsize is a multiple of 8. */
+ uint64_t *from = addr, *to = result;
+ for (unsigned i = 0; i < (unsigned)oldsize/8; i++)
+ to[i] = from[i];
+
+ basic_alloc_free (heap, addr, oldsize);
+ }
+ }
+
+ return result;
+}
+
+#undef ALIGN
+#undef fn1
+#undef fn
+#undef basic_alloc_init
+#undef basic_alloc_alloc
+#undef basic_alloc_free
+#undef basic_alloc_realloc
diff --git a/libgomp/config/nvptx/allocator.c b/libgomp/config/nvptx/allocator.c
index c1a73511623..7c2a7463bf7 100644
--- a/libgomp/config/nvptx/allocator.c
+++ b/libgomp/config/nvptx/allocator.c
@@ -44,20 +44,13 @@
#include "libgomp.h"
#include <stdlib.h>
+#define BASIC_ALLOC_PREFIX __nvptx_lowlat
+#include "../../basic-allocator.c"
+
/* There should be some .shared space reserved for us. There's no way to
express this magic extern sizeless array in C so use asm. */
asm (".extern .shared .u8 __nvptx_lowlat_pool[];\n");
-extern uint32_t __nvptx_lowlat_heap_root __attribute__((shared,nocommon));
-
-typedef union {
- uint32_t raw;
- struct {
- uint16_t offset;
- uint16_t size;
- } desc;
-} heapdesc;
-
static void *
nvptx_memspace_alloc (omp_memspace_handle_t memspace, size_t size)
{
@@ -66,64 +59,7 @@ nvptx_memspace_alloc (omp_memspace_handle_t memspace, size_t size)
char *shared_pool;
asm ("cvta.shared.u64\t%0, __nvptx_lowlat_pool;" : "=r"(shared_pool));
- /* Memory is allocated in 8-byte granularity. */
- size = (size + 7) & ~7;
-
- /* Acquire a lock on the low-latency heap. */
- heapdesc root;
- do
- {
- root.raw = __atomic_exchange_n (&__nvptx_lowlat_heap_root,
- 0xffffffff, MEMMODEL_ACQUIRE);
- if (root.raw != 0xffffffff)
- break;
- /* Spin. */
- }
- while (1);
-
- /* Walk the free chain. */
- heapdesc chunk = {root.raw};
- uint32_t *prev_chunkptr = NULL;
- uint32_t *chunkptr = (uint32_t*)(shared_pool + chunk.desc.offset);
- heapdesc onward_chain = {chunkptr[0]};
- while (chunk.desc.size != 0 && (uint32_t)size > chunk.desc.size)
- {
- chunk.raw = onward_chain.raw;
- prev_chunkptr = chunkptr;
- chunkptr = (uint32_t*)(shared_pool + chunk.desc.offset);
- onward_chain.raw = chunkptr[0];
- }
-
- void *result = NULL;
- if (chunk.desc.size != 0)
- {
- /* Allocation successful. */
- result = chunkptr;
-
- /* Update the free chain. */
- heapdesc stillfree = {chunk.raw};
- stillfree.desc.offset += size;
- stillfree.desc.size -= size;
- uint32_t *stillfreeptr = (uint32_t*)(shared_pool
- + stillfree.desc.offset);
-
- if (stillfree.desc.size == 0)
- /* The whole chunk was used. */
- stillfree.raw = onward_chain.raw;
- else
- /* The chunk was split, so restore the onward chain. */
- stillfreeptr[0] = onward_chain.raw;
-
- /* The previous free slot or root now points to stillfree. */
- if (prev_chunkptr)
- prev_chunkptr[0] = stillfree.raw;
- else
- root.raw = stillfree.raw;
- }
-
- /* Update the free chain root and release the lock. */
- __atomic_store_n (&__nvptx_lowlat_heap_root, root.raw, MEMMODEL_RELEASE);
- return result;
+ return __nvptx_lowlat_alloc (shared_pool, size);
}
else if (memspace == ompx_host_mem_space)
return NULL;
@@ -136,16 +72,10 @@ nvptx_memspace_calloc (omp_memspace_handle_t memspace, size_t size)
{
if (memspace == omp_low_lat_mem_space)
{
- /* Memory is allocated in 8-byte granularity. */
- size = (size + 7) & ~7;
-
- uint64_t *result = nvptx_memspace_alloc (memspace, size);
- if (result)
- /* Inline memset in which we know size is a multiple of 8. */
- for (unsigned i = 0; i < (unsigned)size/8; i++)
- result[i] = 0;
+ char *shared_pool;
+ asm ("cvta.shared.u64\t%0, __nvptx_lowlat_pool;" : "=r"(shared_pool));
- return result;
+ return __nvptx_lowlat_calloc (shared_pool, size);
}
else if (memspace == ompx_host_mem_space)
return NULL;
@@ -161,71 +91,7 @@ nvptx_memspace_free (omp_memspace_handle_t memspace, void *addr, size_t size)
char *shared_pool;
asm ("cvta.shared.u64\t%0, __nvptx_lowlat_pool;" : "=r"(shared_pool));
- /* Memory is allocated in 8-byte granularity. */
- size = (size + 7) & ~7;
-
- /* Acquire a lock on the low-latency heap. */
- heapdesc root;
- do
- {
- root.raw = __atomic_exchange_n (&__nvptx_lowlat_heap_root,
- 0xffffffff, MEMMODEL_ACQUIRE);
- if (root.raw != 0xffffffff)
- break;
- /* Spin. */
- }
- while (1);
-
- /* Walk the free chain to find where to insert a new entry. */
- heapdesc chunk = {root.raw}, prev_chunk;
- uint32_t *prev_chunkptr = NULL, *prevprev_chunkptr = NULL;
- uint32_t *chunkptr = (uint32_t*)(shared_pool + chunk.desc.offset);
- heapdesc onward_chain = {chunkptr[0]};
- while (chunk.desc.size != 0 && addr > (void*)chunkptr)
- {
- prev_chunk.raw = chunk.raw;
- chunk.raw = onward_chain.raw;
- prevprev_chunkptr = prev_chunkptr;
- prev_chunkptr = chunkptr;
- chunkptr = (uint32_t*)(shared_pool + chunk.desc.offset);
- onward_chain.raw = chunkptr[0];
- }
-
- /* Create the new chunk descriptor. */
- heapdesc newfreechunk;
- newfreechunk.desc.offset = (uint16_t)((uintptr_t)addr
- - (uintptr_t)shared_pool);
- newfreechunk.desc.size = (uint16_t)size;
-
- /* Coalesce adjacent free chunks. */
- if (newfreechunk.desc.offset + size == chunk.desc.offset)
- {
- /* Free chunk follows. */
- newfreechunk.desc.size += chunk.desc.size;
- chunk.raw = onward_chain.raw;
- }
- if (prev_chunkptr)
- {
- if (prev_chunk.desc.offset + prev_chunk.desc.size
- == newfreechunk.desc.offset)
- {
- /* Free chunk precedes. */
- newfreechunk.desc.offset = prev_chunk.desc.offset;
- newfreechunk.desc.size += prev_chunk.desc.size;
- addr = shared_pool + prev_chunk.desc.offset;
- prev_chunkptr = prevprev_chunkptr;
- }
- }
-
- /* Update the free chain in the new and previous chunks. */
- ((uint32_t*)addr)[0] = chunk.raw;
- if (prev_chunkptr)
- prev_chunkptr[0] = newfreechunk.raw;
- else
- root.raw = newfreechunk.raw;
-
- /* Update the free chain root and release the lock. */
- __atomic_store_n (&__nvptx_lowlat_heap_root, root.raw, MEMMODEL_RELEASE);
+ __nvptx_lowlat_free (shared_pool, addr, size);
}
else
free (addr);
@@ -240,123 +106,7 @@ nvptx_memspace_realloc (omp_memspace_handle_t memspace, void *addr,
char *shared_pool;
asm ("cvta.shared.u64\t%0, __nvptx_lowlat_pool;" : "=r"(shared_pool));
- /* Memory is allocated in 8-byte granularity. */
- oldsize = (oldsize + 7) & ~7;
- size = (size + 7) & ~7;
-
- if (oldsize == size)
- return addr;
-
- /* Acquire a lock on the low-latency heap. */
- heapdesc root;
- do
- {
- root.raw = __atomic_exchange_n (&__nvptx_lowlat_heap_root,
- 0xffffffff, MEMMODEL_ACQUIRE);
- if (root.raw != 0xffffffff)
- break;
- /* Spin. */
- }
- while (1);
-
- /* Walk the free chain. */
- heapdesc chunk = {root.raw};
- uint32_t *prev_chunkptr = NULL;
- uint32_t *chunkptr = (uint32_t*)(shared_pool + chunk.desc.offset);
- heapdesc onward_chain = {chunkptr[0]};
- while (chunk.desc.size != 0 && (void*)chunkptr < addr)
- {
- chunk.raw = onward_chain.raw;
- prev_chunkptr = chunkptr;
- chunkptr = (uint32_t*)(shared_pool + chunk.desc.offset);
- onward_chain.raw = chunkptr[0];
- }
-
- void *result = NULL;
- if (size < oldsize)
- {
- /* The new allocation is smaller than the old; we can always
- shrink an allocation in place. */
- result = addr;
-
- uint32_t *nowfreeptr = (uint32_t*)(addr + size);
-
- /* Update the free chain. */
- heapdesc nowfree;
- nowfree.desc.offset = (char*)nowfreeptr - shared_pool;
- nowfree.desc.size = oldsize - size;
-
- if (nowfree.desc.offset + size == chunk.desc.offset)
- {
- /* Coalesce following free chunk. */
- nowfree.desc.size += chunk.desc.size;
- nowfreeptr[0] = onward_chain.raw;
- }
- else
- nowfreeptr[0] = chunk.raw;
-
- /* The previous free slot or root now points to nowfree. */
- if (prev_chunkptr)
- prev_chunkptr[0] = nowfree.raw;
- else
- root.raw = nowfree.raw;
- }
- else if (chunk.desc.size != 0
- && (char *)addr + oldsize == (char *)chunkptr
- && chunk.desc.size >= size-oldsize)
- {
- /* The new allocation is larger than the old, and we found a
- large enough free block right after the existing block,
- so we extend into that space. */
- result = addr;
-
- uint16_t delta = size-oldsize;
-
- /* Update the free chain. */
- heapdesc stillfree = {chunk.raw};
- stillfree.desc.offset += delta;
- stillfree.desc.size -= delta;
- uint32_t *stillfreeptr = (uint32_t*)(shared_pool
- + stillfree.desc.offset);
-
- if (stillfree.desc.size == 0)
- /* The whole chunk was used. */
- stillfree.raw = onward_chain.raw;
- else
- /* The chunk was split, so restore the onward chain. */
- stillfreeptr[0] = onward_chain.raw;
-
- /* The previous free slot or root now points to stillfree. */
- if (prev_chunkptr)
- prev_chunkptr[0] = stillfree.raw;
- else
- root.raw = stillfree.raw;
- }
- /* Else realloc in-place has failed and result remains NULL. */
-
- /* Update the free chain root and release the lock. */
- __atomic_store_n (&__nvptx_lowlat_heap_root, root.raw, MEMMODEL_RELEASE);
-
- if (result == NULL)
- {
- /* The allocation could not be extended in place, so we simply
- allocate fresh memory and move the data. If we can't allocate
- from low-latency memory then we leave the original alloaction
- intact and return NULL.
- We could do a fall-back to main memory, but we don't know what
- the fall-back trait said to do. */
- result = nvptx_memspace_alloc (memspace, size);
- if (result != NULL)
- {
- /* Inline memcpy in which we know oldsize is a multiple of 8. */
- uint64_t *from = addr, *to = result;
- for (unsigned i = 0; i < (unsigned)oldsize/8; i++)
- to[i] = from[i];
-
- nvptx_memspace_free (memspace, addr, oldsize);
- }
- }
- return result;
+ return __nvptx_lowlat_realloc (shared_pool, addr, oldsize, size);
}
else if (memspace == ompx_host_mem_space)
return NULL;
diff --git a/libgomp/config/nvptx/team.c b/libgomp/config/nvptx/team.c
index 685610e00be..b30b8df178d 100644
--- a/libgomp/config/nvptx/team.c
+++ b/libgomp/config/nvptx/team.c
@@ -33,7 +33,6 @@
struct gomp_thread *nvptx_thrs __attribute__((shared,nocommon));
int __gomp_team_num __attribute__((shared,nocommon));
-uint32_t __nvptx_lowlat_heap_root __attribute__((shared,nocommon));
static void gomp_thread_start (struct gomp_thread_pool *);
@@ -41,6 +40,9 @@ static void gomp_thread_start (struct gomp_thread_pool *);
express this magic extern sizeless array in C so use asm. */
asm (".extern .shared .u8 __nvptx_lowlat_pool[];\n");
+/* Defined in basic-allocator.c via config/nvptx/allocator.c. */
+void __nvptx_lowlat_init (void *heap, size_t size);
+
/* This externally visible function handles target region entry. It
sets up a per-team thread pool and transfers control by calling FN (FN_DATA)
in the master thread or gomp_thread_start in other threads.
@@ -76,19 +78,7 @@ gomp_nvptx_main (void (*fn) (void *), void *fn_data)
asm ("mov.u32\t%0, %%dynamic_smem_size;\n"
: "=r"(shared_pool_size));
#endif
-
- /* ... and initialize it with an empty free-chain. */
- union {
- uint32_t raw;
- struct {
- uint16_t offset;
- uint16_t size;
- } desc;
- } root;
- root.desc.offset = 0; /* The first byte is free. */
- root.desc.size = shared_pool_size; /* The whole space is free. */
- __nvptx_lowlat_heap_root = root.raw;
- shared_pool[0] = 0; /* Terminate free chain. */
+ __nvptx_lowlat_init (shared_pool, shared_pool_size);
/* Initialize the thread pool. */
struct gomp_thread_pool *pool = alloca (sizeof (*pool));
[-- Attachment #3: 230216-amd-low-lat.patch --]
[-- Type: text/plain, Size: 13659 bytes --]
amdgcn, libgomp: low-latency allocator
This implements the OpenMP low-latency memory allocator for AMD GCN using the
small per-team LDS memory (Local Data Store).
Since addresses can now refer to LDS space, the "Global" address space is
no longer compatible. This patch therefore switches the backend to use
entirely "Flat" addressing (which supports both memories). A future patch
will re-enable "global" instructions for cases where it is known to be safe
to do so.
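
For reference, the address-space conversion that makes this necessary can be
sketched as follows (using the GCN-specific __lds/__flat qualifiers exactly as
they appear in config/gcn/allocator.c below; this is not new API):

  /* An LDS offset converted to an ordinary 64-bit pointer, as the
     FLAT_HEAP_PTR macro below does.  The result points into LDS, so only
     flat_load/flat_store can dereference it safely; "global" instructions
     would misinterpret such an address.  */
  void *heap = (void *)(uintptr_t)(void __flat *)(void __lds *)GCN_LOWLAT_HEAP;
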
gcc/ChangeLog:
* config/gcn/gcn-builtins.def (DISPATCH_PTR): New built-in.
* config/gcn/gcn.cc (gcn_init_machine_status): Disable global
addressing.
(gcn_expand_builtin_1): Implement GCN_BUILTIN_DISPATCH_PTR.
libgomp/ChangeLog:
* config/gcn/libgomp-gcn.h (TEAM_ARENA_START): Move to here.
(TEAM_ARENA_FREE): Likewise.
(TEAM_ARENA_END): Likewise.
(GCN_LOWLAT_HEAP): New.
* config/gcn/team.c (LITTLEENDIAN_CPU): New, and import hsa.h.
(__gcn_lowlat_init): New prototype.
(gomp_gcn_enter_kernel): Initialize the low-latency heap.
* libgomp.h (TEAM_ARENA_START): Move to libgomp.h.
(TEAM_ARENA_FREE): Likewise.
(TEAM_ARENA_END): Likewise.
* plugin/plugin-gcn.c (lowlat_size): New variable.
(print_kernel_dispatch): Label the group_segment_size purpose.
(init_environment_variables): Read GOMP_GCN_LOWLAT_POOL.
(create_kernel_dispatch): Pass low-latency heap allocation to kernel.
(run_kernel): Use shadow; don't assume values.
* testsuite/libgomp.c/allocators-7.c: Enable for amdgcn.
* config/gcn/allocator.c: New file.
diff --git a/gcc/config/gcn/gcn-builtins.def b/gcc/config/gcn/gcn-builtins.def
index f1cf30bbc94..3619cab4402 100644
--- a/gcc/config/gcn/gcn-builtins.def
+++ b/gcc/config/gcn/gcn-builtins.def
@@ -164,6 +164,8 @@ DEF_BUILTIN (FIRST_CALL_THIS_THREAD_P, -1, "first_call_this_thread_p", B_INSN,
_A1 (GCN_BTI_BOOL), gcn_expand_builtin_1)
DEF_BUILTIN (KERNARG_PTR, -1, "kernarg_ptr", B_INSN, _A1 (GCN_BTI_VOIDPTR),
gcn_expand_builtin_1)
+DEF_BUILTIN (DISPATCH_PTR, -1, "dispatch_ptr", B_INSN, _A1 (GCN_BTI_VOIDPTR),
+ gcn_expand_builtin_1)
DEF_BUILTIN (GET_STACK_LIMIT, -1, "get_stack_limit", B_INSN,
_A1 (GCN_BTI_VOIDPTR), gcn_expand_builtin_1)
diff --git a/gcc/config/gcn/gcn.cc b/gcc/config/gcn/gcn.cc
index 0b21dbd256e..8e487b94e95 100644
--- a/gcc/config/gcn/gcn.cc
+++ b/gcc/config/gcn/gcn.cc
@@ -114,7 +114,8 @@ gcn_init_machine_status (void)
f = ggc_cleared_alloc<machine_function> ();
- if (TARGET_GCN3)
+ // FIXME: re-enable global addressing with safety for LDS-flat addresses
+ //if (TARGET_GCN3)
f->use_flat_addressing = true;
return f;
@@ -4626,6 +4627,19 @@ gcn_expand_builtin_1 (tree exp, rtx target, rtx /*subtarget */ ,
}
return ptr;
}
+ case GCN_BUILTIN_DISPATCH_PTR:
+ {
+ rtx ptr;
+ if (cfun->machine->args.reg[DISPATCH_PTR_ARG] >= 0)
+ ptr = gen_rtx_REG (DImode,
+ cfun->machine->args.reg[DISPATCH_PTR_ARG]);
+ else
+ {
+ ptr = gen_reg_rtx (DImode);
+ emit_move_insn (ptr, const0_rtx);
+ }
+ return ptr;
+ }
case GCN_BUILTIN_FIRST_CALL_THIS_THREAD_P:
{
/* Stash a marker in the unused upper 16 bits of s[0:1] to indicate
diff --git a/libgomp/config/gcn/allocator.c b/libgomp/config/gcn/allocator.c
new file mode 100644
index 00000000000..001de89ffe0
--- /dev/null
+++ b/libgomp/config/gcn/allocator.c
@@ -0,0 +1,129 @@
+/* Copyright (C) 2023 Free Software Foundation, Inc.
+
+ This file is part of the GNU Offloading and Multi Processing Library
+ (libgomp).
+
+ Libgomp is free software; you can redistribute it and/or modify it
+ under the terms of the GNU General Public License as published by
+ the Free Software Foundation; either version 3, or (at your option)
+ any later version.
+
+ Libgomp is distributed in the hope that it will be useful, but WITHOUT ANY
+ WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
+ FOR A PARTICULAR PURPOSE. See the GNU General Public License for
+ more details.
+
+ Under Section 7 of GPL version 3, you are granted additional
+ permissions described in the GCC Runtime Library Exception, version
+ 3.1, as published by the Free Software Foundation.
+
+ You should have received a copy of the GNU General Public License and
+ a copy of the GCC Runtime Library Exception along with this program;
+ see the files COPYING3 and COPYING.RUNTIME respectively. If not, see
+ <http://www.gnu.org/licenses/>. */
+
+/* The low-latency allocators use space reserved in LDS memory when the
+ kernel is launched. The heap is initialized in gomp_gcn_enter_kernel and
+ all allocations are forgotten when the kernel exits. Allocations to other
+ memory spaces all use the system malloc syscall.
+
+ The pointers returned are 64-bit "Flat" addresses indistinguishable from
+ regular pointers, but only compatible with the "flat_load/store"
+ instructions. The compiler has been coded to assign default address
+ spaces accordingly.
+
+ LDS memory is not visible to other teams, and therefore may only be used
+ when the memspace access trait is set accordingly. */
+
+#include "libgomp.h"
+#include <stdlib.h>
+
+#define BASIC_ALLOC_PREFIX __gcn_lowlat
+#define BASIC_ALLOC_YIELD asm("s_sleep 1" ::: "memory")
+#include "../../basic-allocator.c"
+
+/* The low-latency heap is located in LDS memory, but we need the __flat
+ address space for compatibility reasons. */
+#define FLAT_HEAP_PTR \
+ ((void*)(uintptr_t)(void __flat*)(void __lds *)GCN_LOWLAT_HEAP)
+
+static void *
+gcn_memspace_alloc (omp_memspace_handle_t memspace, size_t size)
+{
+ if (memspace == omp_low_lat_mem_space)
+ {
+ char *shared_pool = FLAT_HEAP_PTR;
+
+ return __gcn_lowlat_alloc (shared_pool, size);
+ }
+ else if (memspace == ompx_host_mem_space)
+ return NULL;
+ else
+ return malloc (size);
+}
+
+static void *
+gcn_memspace_calloc (omp_memspace_handle_t memspace, size_t size)
+{
+ if (memspace == omp_low_lat_mem_space)
+ {
+ char *shared_pool = FLAT_HEAP_PTR;
+
+ return __gcn_lowlat_calloc (shared_pool, size);
+ }
+ else if (memspace == ompx_host_mem_space)
+ return NULL;
+ else
+ return calloc (1, size);
+}
+
+static void
+gcn_memspace_free (omp_memspace_handle_t memspace, void *addr, size_t size)
+{
+ if (memspace == omp_low_lat_mem_space)
+ {
+ char *shared_pool = FLAT_HEAP_PTR;
+
+ __gcn_lowlat_free (shared_pool, addr, size);
+ }
+ else
+ free (addr);
+}
+
+static void *
+gcn_memspace_realloc (omp_memspace_handle_t memspace, void *addr,
+ size_t oldsize, size_t size)
+{
+ if (memspace == omp_low_lat_mem_space)
+ {
+ char *shared_pool = FLAT_HEAP_PTR;
+
+ return __gcn_lowlat_realloc (shared_pool, addr, oldsize, size);
+ }
+ else if (memspace == ompx_host_mem_space)
+ return NULL;
+ else
+ return realloc (addr, size);
+}
+
+static inline int
+gcn_memspace_validate (omp_memspace_handle_t memspace, unsigned access)
+{
+ /* Disallow use of low-latency memory when it must be accessible by
+ all threads. */
+ return (memspace != omp_low_lat_mem_space
+ || access != omp_atv_all);
+}
+
+#define MEMSPACE_ALLOC(MEMSPACE, SIZE, PIN) \
+ gcn_memspace_alloc (MEMSPACE, SIZE)
+#define MEMSPACE_CALLOC(MEMSPACE, SIZE, PIN) \
+ gcn_memspace_calloc (MEMSPACE, SIZE)
+#define MEMSPACE_REALLOC(MEMSPACE, ADDR, OLDSIZE, SIZE, OLDPIN, PIN) \
+ gcn_memspace_realloc (MEMSPACE, ADDR, OLDSIZE, SIZE)
+#define MEMSPACE_FREE(MEMSPACE, ADDR, SIZE, PIN) \
+ gcn_memspace_free (MEMSPACE, ADDR, SIZE)
+#define MEMSPACE_VALIDATE(MEMSPACE, ACCESS) \
+ gcn_memspace_validate (MEMSPACE, ACCESS)
+
+#include "../../allocator.c"
diff --git a/libgomp/config/gcn/libgomp-gcn.h b/libgomp/config/gcn/libgomp-gcn.h
index 1521166baa3..3e8d7451453 100644
--- a/libgomp/config/gcn/libgomp-gcn.h
+++ b/libgomp/config/gcn/libgomp-gcn.h
@@ -33,6 +33,12 @@
#define DEFAULT_GCN_STACK_SIZE (32*1024)
#define DEFAULT_TEAM_ARENA_SIZE (64*1024)
+/* These define the LDS location of data needed by OpenMP. */
+#define TEAM_ARENA_START 16 /* LDS offset of free pointer. */
+#define TEAM_ARENA_FREE 24 /* LDS offset of free pointer. */
+#define TEAM_ARENA_END 32 /* LDS offset of end pointer. */
+#define GCN_LOWLAT_HEAP 40 /* LDS offset of the OpenMP low-latency heap. */
+
struct heap
{
int64_t size;
diff --git a/libgomp/config/gcn/team.c b/libgomp/config/gcn/team.c
index ffdc09b7f35..13641a4702c 100644
--- a/libgomp/config/gcn/team.c
+++ b/libgomp/config/gcn/team.c
@@ -29,6 +29,12 @@
#include <stdlib.h>
#include <string.h>
+#define LITTLEENDIAN_CPU
+#include "hsa.h"
+
+/* Defined in basic-allocator.c via config/amdgcn/allocator.c. */
+void __gcn_lowlat_init (void *heap, size_t size);
+
static void gomp_thread_start (struct gomp_thread_pool *);
/* This externally visible function handles target region entry. It
@@ -71,6 +77,12 @@ gomp_gcn_enter_kernel (void)
*arena_free = team_arena;
*arena_end = team_arena + kernargs->arena_size_per_team;
+ /* Initialize the low-latency heap. The header is the size. */
+ void __lds *lowlat = (void __lds *)GCN_LOWLAT_HEAP;
+ hsa_kernel_dispatch_packet_t *queue_ptr = __builtin_gcn_dispatch_ptr ();
+ __gcn_lowlat_init ((void*)(uintptr_t)(void __flat*)lowlat,
+ queue_ptr->group_segment_size - GCN_LOWLAT_HEAP);
+
/* Allocate and initialize the team-local-storage data. */
struct gomp_thread *thrs = team_malloc_cleared (sizeof (*thrs)
* numthreads);
diff --git a/libgomp/libgomp.h b/libgomp/libgomp.h
index a0af66e396b..d1e45cc584e 100644
--- a/libgomp/libgomp.h
+++ b/libgomp/libgomp.h
@@ -114,9 +114,6 @@ extern void gomp_aligned_free (void *);
#ifdef __AMDGCN__
#include "libgomp-gcn.h"
/* The arena is initialized in config/gcn/team.c. */
-#define TEAM_ARENA_START 16 /* LDS offset of free pointer. */
-#define TEAM_ARENA_FREE 24 /* LDS offset of free pointer. */
-#define TEAM_ARENA_END 32 /* LDS offset of end pointer. */
static inline void * __attribute__((malloc))
team_malloc (size_t size)
diff --git a/libgomp/plugin/plugin-gcn.c b/libgomp/plugin/plugin-gcn.c
index 70a555a24a2..ca89ba658fd 100644
--- a/libgomp/plugin/plugin-gcn.c
+++ b/libgomp/plugin/plugin-gcn.c
@@ -563,6 +563,7 @@ static size_t gcn_kernel_heap_size = DEFAULT_GCN_HEAP_SIZE;
static int team_arena_size = DEFAULT_TEAM_ARENA_SIZE;
static int stack_size = DEFAULT_GCN_STACK_SIZE;
+static int lowlat_size = -1;
/* Flag to decide whether print to stderr information about what is going on.
Set in init_debug depending on environment variables. */
@@ -1047,8 +1048,8 @@ print_kernel_dispatch (struct kernel_dispatch *dispatch, unsigned indent)
fprintf (stderr, "%*sobject: %lu\n", indent, "", dispatch->object);
fprintf (stderr, "%*sprivate_segment_size: %u\n", indent, "",
dispatch->private_segment_size);
- fprintf (stderr, "%*sgroup_segment_size: %u\n", indent, "",
- dispatch->group_segment_size);
+ fprintf (stderr, "%*sgroup_segment_size: %u (low-latency pool)\n", indent,
+ "", dispatch->group_segment_size);
fprintf (stderr, "\n");
}
@@ -1119,6 +1120,10 @@ init_environment_variables (void)
if (tmp)
stack_size = tmp;;
}
+
+ const char *lowlat = secure_getenv ("GOMP_GCN_LOWLAT_POOL");
+ if (lowlat)
+ lowlat_size = atoi (lowlat);
}
/* Return malloc'd string with name of SYMBOL. */
@@ -1946,7 +1951,25 @@ create_kernel_dispatch (struct kernel_info *kernel, int num_teams,
shadow->signal = sync_signal.handle;
shadow->private_segment_size = kernel->private_segment_size;
- shadow->group_segment_size = kernel->group_segment_size;
+
+ if (lowlat_size < 0)
+ {
+ /* Divide the LDS between the number of running teams.
+ Allocate not less than is defined in the kernel metadata. */
+ int teams_per_cu = num_teams / get_cu_count (agent);
+ int LDS_per_team = (teams_per_cu ? 65536 / teams_per_cu : 65536);
+ shadow->group_segment_size
+ = (kernel->group_segment_size > LDS_per_team
+ ? kernel->group_segment_size
+ : LDS_per_team);;
+ }
+ else if (lowlat_size < GCN_LOWLAT_HEAP+8)
+ /* Ensure that there's space for the OpenMP libgomp data. */
+ shadow->group_segment_size = GCN_LOWLAT_HEAP+8;
+ else
+ shadow->group_segment_size = (lowlat_size > 65536
+ ? 65536
+ : lowlat_size);
/* We expect kernels to request a single pointer, explicitly, and the
rest of struct kernargs, implicitly. If they request anything else
@@ -2305,9 +2328,9 @@ run_kernel (struct kernel_info *kernel, void *vars,
print_kernel_dispatch (shadow, 2);
}
- packet->private_segment_size = kernel->private_segment_size;
- packet->group_segment_size = kernel->group_segment_size;
- packet->kernel_object = kernel->object;
+ packet->private_segment_size = shadow->private_segment_size;
+ packet->group_segment_size = shadow->group_segment_size;
+ packet->kernel_object = shadow->object;
packet->kernarg_address = shadow->kernarg_address;
hsa_signal_t s;
s.handle = shadow->signal;
diff --git a/libgomp/testsuite/libgomp.c/allocators-7.c b/libgomp/testsuite/libgomp.c/allocators-7.c
index a0a738b1d1d..5ef0c5cb3e3 100644
--- a/libgomp/testsuite/libgomp.c/allocators-7.c
+++ b/libgomp/testsuite/libgomp.c/allocators-7.c
@@ -1,7 +1,7 @@
/* { dg-do run } */
/* { dg-require-effective-target offload_device } */
-/* { dg-xfail-if "not implemented" { ! offload_target_nvptx } } */
+/* { dg-xfail-if "not implemented" { ! { offload_target_nvptx || offload_target_amdgcn } } } */
/* Test that GPU low-latency allocation is limited to team access. */
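
For context, the user-level pattern such a test exercises looks roughly like
this (a hedged sketch, not the contents of allocators-7.c; it uses only
standard OpenMP 5.x allocator API names):

  #include <omp.h>
  #include <stdlib.h>

  int
  main (void)
  {
    int ok = 0;
  #pragma omp target teams num_teams(1) map(tofrom: ok)
    {
      /* Request low-latency memory restricted to the team.  The patches'
         MEMSPACE_VALIDATE hooks reject omp_low_lat_mem_space when the
         access trait is omp_atv_all, because LDS/.shared memory is not
         visible to other teams.  */
      omp_alloctrait_t traits[1] = { { omp_atk_access, omp_atv_pteam } };
      omp_allocator_handle_t lowlat
        = omp_init_allocator (omp_low_lat_mem_space, 1, traits);
      int *p = (int *) omp_alloc (64 * sizeof (int), lowlat);
      if (p)
        {
          p[0] = 1;
          ok = p[0];
          omp_free (p, lowlat);
        }
      omp_destroy_allocator (lowlat);
    }
    return ok ? EXIT_SUCCESS : EXIT_FAILURE;
  }
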
* [og12] Un-break nvptx libgomp build (was: [OG12][committed] amdgcn: OpenMP low-latency allocator)
2023-02-16 18:06 [OG12][committed] amdgcn: OpenMP low-latency allocator Andrew Stubbs
@ 2023-02-16 21:11 ` Thomas Schwinge
2023-02-20 9:34 ` Andrew Stubbs
2023-03-24 16:30 ` [OG12][committed] amdgcn: OpenMP low-latency allocator Thomas Schwinge
1 sibling, 1 reply; 4+ messages in thread
From: Thomas Schwinge @ 2023-02-16 21:11 UTC (permalink / raw)
To: Andrew Stubbs, gcc-patches
[-- Attachment #1: Type: text/plain, Size: 1251 bytes --]
Hi!
On 2023-02-16T18:06:41+0000, Andrew Stubbs <ams@codesourcery.com> wrote:
> 1. 230216-basic-allocator.patch
>
> Separate the allocator from NVPTX so the code can be shared.
Yay!
> nvptx, libgomp: Move the low-latency allocator code
>
> There shouldn't be a functionality change; this is just so AMD can share
> the code.
I've quickly observed one "functionality" change:
> --- /dev/null
> +++ b/libgomp/basic-allocator.c
> +#ifndef BASIC_ALLOC_YIELD
> +#deine BASIC_ALLOC_YIELD
> +#endif
In file included from [...]/libgomp/config/nvptx/allocator.c:49:
[...]/libgomp/config/nvptx/../../basic-allocator.c:52:2: error: invalid preprocessing directive #deine; did you mean #define?
52 | #deine BASIC_ALLOC_YIELD
| ^~~~~
| define
Yes, indeed.
I've pushed to devel/omp/gcc-12 branch
commit 6cc0e7bebf1b3ad6aacf75419e7f06942409f90c
"Un-break nvptx libgomp build", see attached.
Regards
Thomas
[-- Attachment #2: 0001-Un-break-nvptx-libgomp-build.patch --]
[-- Type: text/x-diff, Size: 1622 bytes --]
From 6cc0e7bebf1b3ad6aacf75419e7f06942409f90c Mon Sep 17 00:00:00 2001
From: Thomas Schwinge <thomas@codesourcery.com>
Date: Thu, 16 Feb 2023 21:59:55 +0100
Subject: [PATCH] Un-break nvptx libgomp build
In file included from [...]/libgomp/config/nvptx/allocator.c:49:
[...]/libgomp/config/nvptx/../../basic-allocator.c:52:2: error: invalid preprocessing directive #deine; did you mean #define?
52 | #deine BASIC_ALLOC_YIELD
| ^~~~~
| define
Yes, indeed.
Fix-up for og12 commit 9583738a62a33a276b2aad980a27e77097f95924
"nvptx, libgomp: Move the low-latency allocator code".
libgomp/
* basic-allocator.c (BASIC_ALLOC_YIELD): instead of '#deine',
'#define' it.
---
libgomp/ChangeLog.omp | 3 +++
libgomp/basic-allocator.c | 2 +-
2 files changed, 4 insertions(+), 1 deletion(-)
diff --git a/libgomp/ChangeLog.omp b/libgomp/ChangeLog.omp
index ecc14b4f537..b667c72b8ca 100644
--- a/libgomp/ChangeLog.omp
+++ b/libgomp/ChangeLog.omp
@@ -1,5 +1,8 @@
2023-02-16 Thomas Schwinge <thomas@codesourcery.com>
+ * basic-allocator.c (BASIC_ALLOC_YIELD): instead of '#deine',
+ '#define' it.
+
* testsuite/libgomp.c/usm-1.c: Re-enable non-GCN offloading
compilation.
* testsuite/libgomp.c/usm-2.c: Likewise.
diff --git a/libgomp/basic-allocator.c b/libgomp/basic-allocator.c
index 94b99a89e0b..b4b9e4ba13a 100644
--- a/libgomp/basic-allocator.c
+++ b/libgomp/basic-allocator.c
@@ -49,7 +49,7 @@
#endif
#ifndef BASIC_ALLOC_YIELD
-#deine BASIC_ALLOC_YIELD
+#define BASIC_ALLOC_YIELD
#endif
#define ALIGN(VAR) (((VAR) + 7) & ~7) /* 8-byte granularity. */
--
2.25.1
* Re: [og12] Un-break nvptx libgomp build (was: [OG12][committed] amdgcn: OpenMP low-latency allocator)
2023-02-16 21:11 ` [og12] Un-break nvptx libgomp build (was: [OG12][committed] amdgcn: OpenMP low-latency allocator) Thomas Schwinge
@ 2023-02-20 9:34 ` Andrew Stubbs
0 siblings, 0 replies; 4+ messages in thread
From: Andrew Stubbs @ 2023-02-20 9:34 UTC (permalink / raw)
To: Thomas Schwinge, gcc-patches
On 16/02/2023 21:11, Thomas Schwinge wrote:
>> --- /dev/null
>> +++ b/libgomp/basic-allocator.c
>
>> +#ifndef BASIC_ALLOC_YIELD
>> +#deine BASIC_ALLOC_YIELD
>> +#endif
>
> In file included from [...]/libgomp/config/nvptx/allocator.c:49:
> [...]/libgomp/config/nvptx/../../basic-allocator.c:52:2: error: invalid preprocessing directive #deine; did you mean #define?
> 52 | #deine BASIC_ALLOC_YIELD
> | ^~~~~
> | define
>
> Yes, indeed.
>
> I've pushed to devel/omp/gcc-12 branch
> commit 6cc0e7bebf1b3ad6aacf75419e7f06942409f90c
> "Un-break nvptx libgomp build", see attached.
Oops, thanks Thomas.
Andrew
* Re: [OG12][committed] amdgcn: OpenMP low-latency allocator
2023-02-16 18:06 [OG12][committed] amdgcn: OpenMP low-latency allocator Andrew Stubbs
2023-02-16 21:11 ` [og12] Un-break nvptx libgomp build (was: [OG12][committed] amdgcn: OpenMP low-latency allocator) Thomas Schwinge
@ 2023-03-24 16:30 ` Thomas Schwinge
1 sibling, 0 replies; 4+ messages in thread
From: Thomas Schwinge @ 2023-03-24 16:30 UTC (permalink / raw)
To: gcc-patches; +Cc: Andrew Stubbs
[-- Attachment #1: Type: text/plain, Size: 889 bytes --]
Hi!
On 2023-02-16T18:06:41+0000, Andrew Stubbs <ams@codesourcery.com> wrote:
> 2. 230216-amd-low-lat.patch
>
> Allocate the memory, adjust the default address space, and hook up the
> allocator.
Like done for nvptx in og12 commit 23f52e49368d7b26a1b1a72d6bb903d31666e961
"Miscellaneous clean-up re OpenMP 'ompx_unified_shared_mem_space', 'ompx_host_mem_space'",
I've now pushed the corresponding GCN 'ompx_host_mem_space' thing to
devel/omp/gcc-12 branch in commit b39e4bbab59f5e4b551c44dbce0ce3acf4afc22a
"Miscellaneous clean-up re OpenMP 'ompx_host_mem_space'", see attached.
Regards
Thomas
[-- Attachment #2: 0001-Miscellaneous-clean-up-re-OpenMP-ompx_host_mem_space.patch --]
[-- Type: text/x-diff, Size: 1883 bytes --]
From b39e4bbab59f5e4b551c44dbce0ce3acf4afc22a Mon Sep 17 00:00:00 2001
From: Thomas Schwinge <thomas@codesourcery.com>
Date: Fri, 17 Feb 2023 14:13:15 +0100
Subject: [PATCH] Miscellaneous clean-up re OpenMP 'ompx_host_mem_space'
Like done for nvptx in og12 commit 23f52e49368d7b26a1b1a72d6bb903d31666e961
"Miscellaneous clean-up re OpenMP 'ompx_unified_shared_mem_space', 'ompx_host_mem_space'".
Clean-up for og12 commit c77c45a641fedc3fe770e909cc010fb1735bdbbd
"amdgcn, libgomp: low-latency allocator". No functional change.
libgomp/
* config/gcn/allocator.c (gcn_memspace_free): Explicitly handle
'memspace == ompx_host_mem_space'.
---
libgomp/ChangeLog.omp | 3 +++
libgomp/config/gcn/allocator.c | 4 ++++
2 files changed, 7 insertions(+)
diff --git a/libgomp/ChangeLog.omp b/libgomp/ChangeLog.omp
index 63d1f563d5d..ef957e3d2d8 100644
--- a/libgomp/ChangeLog.omp
+++ b/libgomp/ChangeLog.omp
@@ -1,5 +1,8 @@
2023-03-24 Thomas Schwinge <thomas@codesourcery.com>
+ * config/gcn/allocator.c (gcn_memspace_free): Explicitly handle
+ 'memspace == ompx_host_mem_space'.
+
Backported from master:
2023-03-24 Thomas Schwinge <thomas@codesourcery.com>
diff --git a/libgomp/config/gcn/allocator.c b/libgomp/config/gcn/allocator.c
index 001de89ffe0..e9980f6f98e 100644
--- a/libgomp/config/gcn/allocator.c
+++ b/libgomp/config/gcn/allocator.c
@@ -36,6 +36,7 @@
when the memspace access trait is set accordingly. */
#include "libgomp.h"
+#include <assert.h>
#include <stdlib.h>
#define BASIC_ALLOC_PREFIX __gcn_lowlat
@@ -86,6 +87,9 @@ gcn_memspace_free (omp_memspace_handle_t memspace, void *addr, size_t size)
__gcn_lowlat_free (shared_pool, addr, size);
}
+ else if (memspace == ompx_host_mem_space)
+ /* Just verify what all allocator functions return. */
+ assert (addr == NULL);
else
free (addr);
}
--
2.25.1