* [RFC 0/2] Read memory in multiple lines in dcache_xfer_memory
@ 2013-10-18 3:22 Yao Qi
2013-10-18 3:22 ` [PATCH 2/2] " Yao Qi
2013-10-18 3:22 ` [PATCH 1/2] Separate read and write " Yao Qi
0 siblings, 2 replies; 11+ messages in thread
From: Yao Qi @ 2013-10-18 3:22 UTC (permalink / raw)
To: gdb-patches
Hi,
This patch series improves how dcache reads from target memory.
The rationale of this series is to reduce the number of calls to
target_read by reading as much as possible in one go. Details can be
found in patch 2/2.
Patch 1/2 simply splits the read path from the write path in
dcache_xfer_memory, and patch 2/2 focuses on optimizing reads of target
memory, because GDB reads a lot of data from the target but writes little.
The performance gain is measured and listed in patch 2/2. The whole
series is regression-tested on x86_64-linux. Comments?
Yao Qi (2):
Separate read and write in dcache_xfer_memory
Read memory in multiple lines in dcache_xfer_memory.
gdb/dcache.c | 356 ++++++++++++++++++++++++++++++++++++++++++----------------
1 files changed, 260 insertions(+), 96 deletions(-)
--
1.7.7.6
^ permalink raw reply [flat|nested] 11+ messages in thread
* [PATCH 2/2] Read memory in multiple lines in dcache_xfer_memory.
2013-10-18 3:22 [RFC 0/2] Read memory in multiple lines in dcache_xfer_memory Yao Qi
@ 2013-10-18 3:22 ` Yao Qi
2013-10-21 20:38 ` Doug Evans
` (2 more replies)
2013-10-18 3:22 ` [PATCH 1/2] Separate read and write " Yao Qi
1 sibling, 3 replies; 11+ messages in thread
From: Yao Qi @ 2013-10-18 3:22 UTC (permalink / raw)
To: gdb-patches
Hi, this is an optimization to how dcache reads contents from target
memory. Currently, when a GDB request to read target memory goes
through dcache, dcache reads one cache line at a time, regardless of
the size of the requested data. If GDB reads a large amount of data
from the target, dcache reads from target memory multiple times (one
cache line per read). In remote debugging, that means multiple RSP
packets to transfer memory from GDBserver, which is slow.
This patch teaches dcache to read as much contiguous target memory as
possible in one go, and to update multiple cache lines when the
contents are read in. This is done in several steps:
1. When GDB requests to read data [memaddr, memaddr + len), a
collection of ranges is created to record readable ranges, because
some memory may be marked as write-only.
2. Then, we'll check the cache state of these readable ranges. Some
of them are cached, and some are not. We record the uncached ranges.
3. Iterate over the collection of uncached ranges, issuing target_read
to read each uncached range from target memory and update the cache
lines. For cached ranges, read from the cache lines directly.
I am using the perf test case 'backtrace' to measure the speed-up of
this patch. For each run, I 'set dcache line-size N' and
'set dcache size 4096 * 64 / N', to keep the total size of the dcache
unchanged.
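For example, for N = 256 that setup amounts to the following commands
(a sketch, assuming 'dcache size' counts lines as the arithmetic
implies; 4096 * 64 / 256 = 1024):

```
(gdb) set dcache line-size 256
(gdb) set dcache size 1024
```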
With this patch, the number of 'm' RSP packets is reduced
dramatically:
cache line size    Original   Patched
              2     4657894     31224
              4     2317896     28616
              8      158948     21462
             16      579474     14308
             32      293314     14308
             64      150234     14308
            128       78694     10738
            256       42938      8960
            512       25046      8064
           1024       16100      7616
           2048        9184      7392
Performance comparison:
cache line size               Patched         Original
backtrace cpu_time      2        4.44            33.83
backtrace cpu_time      4        3.88            14.27
backtrace cpu_time      8        3.1              7.92
backtrace cpu_time     16        2.48             4.79
backtrace cpu_time     32        2.25             2.51
backtrace cpu_time     64        1.16             1.93
backtrace cpu_time    128        1.02             1.69
backtrace cpu_time    256        1.06             1.37
backtrace cpu_time    512        1.11             1.17
backtrace cpu_time   1024        1.1              1.22
backtrace cpu_time   2048        1.13             1.17
backtrace wall_time     2        5.49653506279   74.0839848518
backtrace wall_time     4        4.70916986465   29.94830513
backtrace wall_time     8        4.11279582977   15.6743021011
backtrace wall_time    16        3.68633985519    8.83114910126
backtrace wall_time    32        3.63511800766    5.79059791565
backtrace wall_time    64        1.61371517181    3.67003703117
backtrace wall_time   128        1.50599694252    2.60381913185
backtrace wall_time   256        1.47533297539    2.05611109734
backtrace wall_time   512        1.48193001747    1.80505800247
backtrace wall_time  1024        1.50955080986    1.69646501541
backtrace wall_time  2048        1.54235315323    1.61461496353
backtrace vmsize        2   104568          104576
backtrace vmsize        4   100556          102388
backtrace vmsize        8    95384           97540
backtrace vmsize       16    94092           94092
backtrace vmsize       32    93348           93276
backtrace vmsize       64    93148           92928
backtrace vmsize      128    93148           93100
backtrace vmsize      256    93148           93100
backtrace vmsize      512    93148           93100
backtrace vmsize     1024    93148           93100
backtrace vmsize     2048    93148           93100
gdb:
2013-10-18 Yao Qi <yao@codesourcery.com>
* dcache.c: Include "memrange.h".
Update comments.
(dcache_read_line): Remove.
(dcache_peek_byte): Remove.
(dcache_ranges_readable): New function.
(dcache_ranges_uncached): New function.
(dcache_xfer_memory): Read multiple cache lines from target
memory in one go.
---
gdb/dcache.c | 331 ++++++++++++++++++++++++++++++++++++++++++---------------
1 files changed, 244 insertions(+), 87 deletions(-)
diff --git a/gdb/dcache.c b/gdb/dcache.c
index 316f3dd..65bbad1 100644
--- a/gdb/dcache.c
+++ b/gdb/dcache.c
@@ -25,6 +25,7 @@
#include "target.h"
#include "inferior.h"
#include "splay-tree.h"
+#include "memrange.h"
/* Commands with a prefix of `{set,show} dcache'. */
static struct cmd_list_element *dcache_set_list = NULL;
@@ -60,8 +61,8 @@ static struct cmd_list_element *dcache_show_list = NULL;
/* NOTE: Interaction of dcache and memory region attributes
As there is no requirement that memory region attributes be aligned
- to or be a multiple of the dcache page size, dcache_read_line() and
- dcache_write_line() must break up the page by memory region. If a
+ to or be a multiple of the dcache page size, dcache_xfer_memory must
+ break up the page by memory region. If a
chunk does not have the cache attribute set, an invalid memory type
is set, etc., then the chunk is skipped. Those chunks are handled
in target_xfer_memory() (or target_xfer_memory_partial()).
@@ -122,8 +123,6 @@ typedef void (block_func) (struct dcache_block *block, void *param);
static struct dcache_block *dcache_hit (DCACHE *dcache, CORE_ADDR addr);
-static int dcache_read_line (DCACHE *dcache, struct dcache_block *db);
-
static struct dcache_block *dcache_alloc (DCACHE *dcache, CORE_ADDR addr);
static void dcache_info (char *exp, int tty);
@@ -305,56 +304,6 @@ dcache_hit (DCACHE *dcache, CORE_ADDR addr)
return db;
}
-/* Fill a cache line from target memory.
- The result is 1 for success, 0 if the (entire) cache line
- wasn't readable. */
-
-static int
-dcache_read_line (DCACHE *dcache, struct dcache_block *db)
-{
- CORE_ADDR memaddr;
- gdb_byte *myaddr;
- int len;
- int res;
- int reg_len;
- struct mem_region *region;
-
- len = dcache->line_size;
- memaddr = db->addr;
- myaddr = db->data;
-
- while (len > 0)
- {
- /* Don't overrun if this block is right at the end of the region. */
- region = lookup_mem_region (memaddr);
- if (region->hi == 0 || memaddr + len < region->hi)
- reg_len = len;
- else
- reg_len = region->hi - memaddr;
-
- /* Skip non-readable regions. The cache attribute can be ignored,
- since we may be loading this for a stack access. */
- if (region->attrib.mode == MEM_WO)
- {
- memaddr += reg_len;
- myaddr += reg_len;
- len -= reg_len;
- continue;
- }
-
- res = target_read (&current_target, TARGET_OBJECT_RAW_MEMORY,
- NULL, myaddr, memaddr, reg_len);
- if (res < reg_len)
- return 0;
-
- memaddr += res;
- myaddr += res;
- len -= res;
- }
-
- return 1;
-}
-
/* Get a free cache block, put or keep it on the valid list,
and return its address. */
@@ -395,28 +344,6 @@ dcache_alloc (DCACHE *dcache, CORE_ADDR addr)
return db;
}
-/* Using the data cache DCACHE, store in *PTR the contents of the byte at
- address ADDR in the remote machine.
-
- Returns 1 for success, 0 for error. */
-
-static int
-dcache_peek_byte (DCACHE *dcache, CORE_ADDR addr, gdb_byte *ptr)
-{
- struct dcache_block *db = dcache_hit (dcache, addr);
-
- if (!db)
- {
- db = dcache_alloc (dcache, addr);
-
- if (!dcache_read_line (dcache, db))
- return 0;
- }
-
- *ptr = db->data[XFORM (dcache, addr)];
- return 1;
-}
-
/* Write the byte at PTR into ADDR in the data cache.
The caller is responsible for also promptly writing the data
@@ -473,6 +400,105 @@ dcache_init (void)
return dcache;
}
+/* Check the readability of memory range [MEMADDR, MEMADDR + LEN) and
+ return the readable ranges; the caller is responsible for releasing them. */
+
+static VEC(mem_range_s) *
+dcache_ranges_readable (CORE_ADDR memaddr, int len)
+{
+ VEC(mem_range_s) *readable_memory = NULL;
+
+ while (len > 0)
+ {
+ struct mem_range *r;
+ int reg_len;
+ /* Don't overrun if this block is right at the end of the region. */
+ struct mem_region *region = lookup_mem_region (memaddr);
+
+ if (region->hi == 0 || memaddr + len < region->hi)
+ reg_len = len;
+ else
+ reg_len = region->hi - memaddr;
+
+ /* Skip non-readable regions. The cache attribute can be ignored,
+ since we may be loading this for a stack access. */
+ if (region->attrib.mode == MEM_WO)
+ {
+ memaddr += reg_len;
+ len -= reg_len;
+ continue;
+ }
+
+ r = VEC_safe_push (mem_range_s, readable_memory, NULL);
+ r->start = memaddr;
+ r->length = reg_len;
+
+ memaddr += reg_len;
+ len -= reg_len;
+ }
+
+ return readable_memory;
+}
+
+/* Return the uncached ranges from RANGES. */
+
+static VEC(mem_range_s) *
+dcache_ranges_uncached (DCACHE *dcache, VEC(mem_range_s) *ranges)
+{
+ int b;
+ struct mem_range *rb;
+ VEC(mem_range_s) *uncached = NULL;
+
+ for (b = 0; VEC_iterate (mem_range_s, ranges, b, rb); b++)
+ {
+ CORE_ADDR memaddr_start = rb->start;
+ CORE_ADDR memaddr_end = rb->start;
+
+ while (memaddr_end < rb->start + rb->length)
+ {
+ struct dcache_block *db = dcache_hit (dcache, memaddr_end);
+
+ if (db != NULL)
+ {
+ /* Set MEMADDR_END to the start address of this cache line. */
+ memaddr_end = align_down (memaddr_end, dcache->line_size);
+
+ if (memaddr_end > memaddr_start)
+ {
+ struct mem_range *r;
+
+ r = VEC_safe_push (mem_range_s, uncached, NULL);
+ r->start = memaddr_start;
+ r->length = memaddr_end - memaddr_start;
+ }
+ }
+
+ /* Increase memaddr_end to a dcache->line_size-aligned value. */
+ if (memaddr_end < align_up (memaddr_end, dcache->line_size))
+ memaddr_end = align_up (memaddr_end, dcache->line_size);
+ else
+ memaddr_end += dcache->line_size;
+
+ if (db != NULL)
+ memaddr_start = memaddr_end;
+ }
+
+ if (memaddr_end > rb->start + rb->length)
+ memaddr_end = rb->start + rb->length;
+
+ if (memaddr_start < memaddr_end)
+ {
+ struct mem_range *r;
+
+ r = VEC_safe_push (mem_range_s, uncached, NULL);
+
+ r->start = memaddr_start;
+ r->length = memaddr_end - memaddr_start;
+ }
+ }
+
+ return uncached;
+}
/* Read or write LEN bytes from inferior memory at MEMADDR, transferring
to or from debugger address MYADDR. Write to inferior if SHOULD_WRITE is
@@ -489,9 +515,6 @@ dcache_xfer_memory (struct target_ops *ops, DCACHE *dcache,
CORE_ADDR memaddr, gdb_byte *myaddr,
int len, int should_write)
{
- int i;
- int res;
-
/* If this is a different inferior from what we've recorded,
flush the cache. */
@@ -506,8 +529,10 @@ dcache_xfer_memory (struct target_ops *ops, DCACHE *dcache,
if (should_write)
{
- res = target_write (ops, TARGET_OBJECT_RAW_MEMORY,
- NULL, myaddr, memaddr, len);
+ int res = target_write (ops, TARGET_OBJECT_RAW_MEMORY,
+ NULL, myaddr, memaddr, len);
+ int i;
+
if (res <= 0)
return res;
/* Update LEN to what was actually written. */
@@ -527,16 +552,148 @@ dcache_xfer_memory (struct target_ops *ops, DCACHE *dcache,
}
else
{
- for (i = 0; i < len; i++)
+ int i;
+ struct mem_range *r;
+ /* The starting address of each cached range. */
+ CORE_ADDR cached_addr = memaddr;
+
+ VEC(mem_range_s) *memory;
+ VEC(mem_range_s) *uncached = NULL;
+
+ /* Find readable ranges in range [MEMADDR, MEMADDR + LEN),
+ supposing write-only regions are wo1 and wo2. Then,
+ readable ranges are r1, r2 and r3.
+
+ MEMADDR MEMADDR + LEN
+ |<------------------------------------------------->|
+ |<-- wo1 -->| |<-- wo2 -->|
+
+ |<-r1->| |<--r2-->| |<---r3---->| */
+ memory = dcache_ranges_readable (memaddr, len);
+
+ /* GDB will read from these three readable ranges, r1, r2 and r3.
+ GDB has to check the corresponding cache lines' state (cached
+ or uncached) to determine whether to read from the target
+ memory or the cache lines.
+
+ MEMADDR MEMADDR + LEN
+ |<------------------------------------------------->|
+ |<-- wo1 -->| |<-- wo2 -->|
+
+ |<-r1->| |<--r2-->| |<---r3---->|
+
+ -u-|-----c----|-----u----|-----c----|-----c----|--u--
+ 'u' stands for uncached, 'c' stands for cached.
+
+ |u1|-c1-| | u2 |c2| |--c3--| u3 |
+
+ Uncached ranges are u1, u2 and u3, and cached ranges are c1,
+ c2 and c3. */
+ uncached = dcache_ranges_uncached (dcache, memory);
+
+ VEC_free (mem_range_s, memory);
+
+ /* Iterate each uncached range. Read memory from cache lines if
+ memory address is not within the uncached range, otherwise, read
+ from the target memory and update corresponding cache lines. */
+
+ for (i = 0; VEC_iterate (mem_range_s, uncached, i, r); i++)
{
- if (!dcache_peek_byte (dcache, memaddr + i, myaddr + i))
+ int j;
+
+ if (cached_addr < r->start)
{
- /* That failed. Discard its cache line so we don't have a
- partially read line. */
- dcache_invalidate_line (dcache, memaddr + i);
- return i;
+ /* Read memory [cached_addr, MIN (r->start, MEMADDR + LEN))
+ from cache lines. */
+
+ for (; cached_addr < r->start && cached_addr < (memaddr + len);
+ cached_addr++)
+ {
+ struct dcache_block *db = dcache_hit (dcache, cached_addr);
+
+ gdb_assert (db != NULL);
+
+ myaddr[cached_addr - memaddr]
+ = db->data[XFORM (dcache, cached_addr)];
+ }
+ }
+ cached_addr = r->start + r->length;
+
+ /* Part of the memory range [MEMADDR, MEMADDR + LEN) is
+ not cached. */
+ if (r->start < len + memaddr)
+ {
+ /* MEMADDR_START and MEMADDR_END are aligned on
+ dcache->line_size, because dcache->line_size is the
+ minimal unit to update cache and fetch from the target
+ memory. */
+ CORE_ADDR memaddr_start
+ = align_down (r->start, dcache->line_size);
+ CORE_ADDR memaddr_end
+ = align_up (r->start + r->length, dcache->line_size);
+ int res;
+ int len1 = memaddr_end - memaddr_start;
+ int len2;
+ gdb_byte *buf = xmalloc (len1);
+
+ /* Read multiple cache lines to cover memory range
+ [r->start, r->start + MIN (r->length,
+ LEN + MEMADDR - r->start)) from target. */
+
+ res = target_read (&current_target, TARGET_OBJECT_RAW_MEMORY,
+ NULL, buf, memaddr_start, len1);
+
+ if (res == -1)
+ {
+ VEC_free (mem_range_s, uncached);
+ xfree (buf);
+ return r->start - memaddr;
+ }
+
+ /* Copy contents to MYADDR. */
+ len2 = r->length;
+ if (len2 > len + memaddr - r->start)
+ len2 = len + memaddr - r->start;
+
+ memcpy ((r->start - memaddr) + myaddr,
+ buf + (r->start - memaddr_start),
+ len2);
+
+ /* Update cache lines in range
+ [MEMADDR_START, MEMADDR_START + LEN1). */
+ for (j = 0; j < (len1 / dcache->line_size); j++)
+ {
+ struct dcache_block *db
+ = dcache_hit (dcache, memaddr_start + j * dcache->line_size);
+
+ gdb_assert (db == NULL);
+
+ db = dcache_alloc (dcache, memaddr_start + j * dcache->line_size);
+
+ memcpy (db->data, &buf[j * dcache->line_size], dcache->line_size);
+ }
+
+ xfree (buf);
+
+ if (res < len1)
+ {
+ VEC_free (mem_range_s, uncached);
+ return r->start - memaddr + res;
+ }
}
}
+
+ VEC_free (mem_range_s, uncached);
+
+ for (; cached_addr < (memaddr + len); cached_addr++)
+ {
+ struct dcache_block *db = dcache_hit (dcache, cached_addr);
+
+ gdb_assert (db != NULL);
+
+ myaddr[cached_addr - memaddr]
+ = db->data[XFORM (dcache, cached_addr)];
+ }
}
return len;
--
1.7.7.6
^ permalink raw reply [flat|nested] 11+ messages in thread
* [PATCH 1/2] Separate read and write in dcache_xfer_memory
2013-10-18 3:22 [RFC 0/2] Read memory in multiple lines in dcache_xfer_memory Yao Qi
2013-10-18 3:22 ` [PATCH 2/2] " Yao Qi
@ 2013-10-18 3:22 ` Yao Qi
1 sibling, 0 replies; 11+ messages in thread
From: Yao Qi @ 2013-10-18 3:22 UTC (permalink / raw)
To: gdb-patches
This patch refactors dcache_xfer_memory to separate the code that
reads target memory from the code that writes it.
gdb:
2013-10-18 Yao Qi <yao@codesourcery.com>
* dcache.c (dcache_xfer_memory): Separate memory read and
write.
---
gdb/dcache.c | 37 ++++++++++++++++++++++---------------
1 files changed, 22 insertions(+), 15 deletions(-)
diff --git a/gdb/dcache.c b/gdb/dcache.c
index acb9de4..316f3dd 100644
--- a/gdb/dcache.c
+++ b/gdb/dcache.c
@@ -491,9 +491,6 @@ dcache_xfer_memory (struct target_ops *ops, DCACHE *dcache,
{
int i;
int res;
- int (*xfunc) (DCACHE *dcache, CORE_ADDR addr, gdb_byte *ptr);
-
- xfunc = should_write ? dcache_poke_byte : dcache_peek_byte;
/* If this is a different inferior from what we've recorded,
flush the cache. */
@@ -515,23 +512,33 @@ dcache_xfer_memory (struct target_ops *ops, DCACHE *dcache,
return res;
/* Update LEN to what was actually written. */
len = res;
+
+ for (i = 0; i < len; i++)
+ {
+ if (!dcache_poke_byte (dcache, memaddr + i, myaddr + i))
+ {
+ /* That failed. Discard its cache line so we don't have a
+ partially read line. */
+ dcache_invalidate_line (dcache, memaddr + i);
+ /* We still wrote LEN bytes. */
+ return len;
+ }
+ }
}
-
- for (i = 0; i < len; i++)
+ else
{
- if (!xfunc (dcache, memaddr + i, myaddr + i))
+ for (i = 0; i < len; i++)
{
- /* That failed. Discard its cache line so we don't have a
- partially read line. */
- dcache_invalidate_line (dcache, memaddr + i);
- /* If we're writing, we still wrote LEN bytes. */
- if (should_write)
- return len;
- else
- return i;
+ if (!dcache_peek_byte (dcache, memaddr + i, myaddr + i))
+ {
+ /* That failed. Discard its cache line so we don't have a
+ partially read line. */
+ dcache_invalidate_line (dcache, memaddr + i);
+ return i;
+ }
}
}
-
+
return len;
}
--
1.7.7.6
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH 2/2] Read memory in multiple lines in dcache_xfer_memory.
2013-10-18 3:22 ` [PATCH 2/2] " Yao Qi
@ 2013-10-21 20:38 ` Doug Evans
2013-10-22 0:50 ` Yao Qi
2013-10-21 23:16 ` Doug Evans
2013-11-18 18:17 ` Pedro Alves
2 siblings, 1 reply; 11+ messages in thread
From: Doug Evans @ 2013-10-21 20:38 UTC (permalink / raw)
To: Yao Qi; +Cc: gdb-patches
Yao Qi writes:
> [...]
Hi.
While I understand the improvement is nice, it does add complexity to the code.
[I'm not suggesting this patch isn't justified.
But I think we do need to weigh complexity vs actual speed-ups, and
I'm in the "Need more data." state at the moment.]
The data for cache line sizes less than the default (64 bytes)
is nice but not compelling.
E.g., what if we just increase the cache line size to 256, say?
Or 1024? And/or maybe auto-adjust it based on
"remote memory-read-packet-size" or some such?
[I'm not arguing for anything in particular.
Again, I'm just collecting data.]
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH 2/2] Read memory in multiple lines in dcache_xfer_memory.
2013-10-18 3:22 ` [PATCH 2/2] " Yao Qi
2013-10-21 20:38 ` Doug Evans
@ 2013-10-21 23:16 ` Doug Evans
2013-10-22 12:27 ` Yao Qi
2013-11-18 18:17 ` Pedro Alves
2 siblings, 1 reply; 11+ messages in thread
From: Doug Evans @ 2013-10-21 23:16 UTC (permalink / raw)
To: Yao Qi; +Cc: gdb-patches
Yao Qi writes:
> [...]
> 2013-10-18 Yao Qi <yao@codesourcery.com>
>
> * dcache.c: Include "memrange.h".
> Update comments.
> (dcache_read_line): Remove.
> (dcache_peek_byte): Remove.
> (dcache_ranges_readable): New function.
> (dcache_ranges_uncached): New function.
> (dcache_xfer_memory): Read multiple cache lines from target
> memory in one time.
> [...]
> + for (i = 0; VEC_iterate (mem_range_s, uncached, i, r); i++)
> {
> - if (!dcache_peek_byte (dcache, memaddr + i, myaddr + i))
> + int j;
> +
> + if (cached_addr < r->start)
> {
> - /* That failed. Discard its cache line so we don't have a
> - partially read line. */
> - dcache_invalidate_line (dcache, memaddr + i);
> - return i;
> + /* Read memory [cached_addr, MIN (r->start, MEMADDR + LEN))
> + from cache lines. */
> +
> + for (; cached_addr < r->start && cached_addr < (memaddr + len);
> + cached_addr++)
> + {
> + struct dcache_block *db = dcache_hit (dcache, cached_addr);
> +
> + gdb_assert (db != NULL);
Hi.
Going through the code ...
This assert will trigger if the request ends up filling the cache
and existing lines get pushed out, right?
[There's a similar assert later in the function too.]
Could be missing something of course.
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH 2/2] Read memory in multiple lines in dcache_xfer_memory.
2013-10-21 20:38 ` Doug Evans
@ 2013-10-22 0:50 ` Yao Qi
2013-10-25 17:08 ` Doug Evans
0 siblings, 1 reply; 11+ messages in thread
From: Yao Qi @ 2013-10-22 0:50 UTC (permalink / raw)
To: Doug Evans; +Cc: gdb-patches
On 10/22/2013 04:38 AM, Doug Evans wrote:
> While I understand the improvement is nice, it does add complexity to the code.
> [I'm not suggesting this patch isn't justified.
> But I think we do need to weigh complexity vs actual speed-ups, and
Agreed. The code is more complex than I expected; it took me two days
to fix a bug.
> I'm in the "Need more data." state at the moment.]
>
> The data for cache line sizes less than the default (64 bytes)
> is nice but not compelling.
> E.g., What if we just increase the cache line size to 256, say?
> Or 1024? And/or maybe auto-adjust it based on
In this case, it is useful to increase the cache line size. Increasing
the cache line size to 256 saves 0.1s (1.16 -> 1.06) in my experiment,
which is a good improvement.
However, when people do remote debugging and find it slow, the reason
for the slowness is usually unclear, so it may not be practical to
advise them to increase the data cache line size.
> "remote memory-read-packet-size" or some such?
"remote memory-read-packet-size" is not useful in this case, I am
afraid. GDB reads one data cache line per 'm' packet, and that size
(the data cache line size) is less than the memory-read packet size,
so changing "remote memory-read-packet-size" has no effect.
--
Yao (齐尧)
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH 2/2] Read memory in multiple lines in dcache_xfer_memory.
2013-10-21 23:16 ` Doug Evans
@ 2013-10-22 12:27 ` Yao Qi
0 siblings, 0 replies; 11+ messages in thread
From: Yao Qi @ 2013-10-22 12:27 UTC (permalink / raw)
To: Doug Evans; +Cc: gdb-patches
On 10/22/2013 07:15 AM, Doug Evans wrote:
> This assert will trigger if the request ends up filling the cache
> and existing lines get pushed out, right?
> [There's a similar assert later in the function too.]
> Could be missing something of course.
I can reproduce it by setting "dcache size" to a small number, 2 for
example, and typing the commands "up" and "down".
In dcache_xfer_memory, we assume that the cache lines in DCACHE are not
pushed out. IOW, new cache lines can be added, but existing lines can't
be evicted. In the updated patch, when adding a new cache line, if
DCACHE is not full, add it; otherwise, save the line in
POSTPONE_INSERT. When the loop is finished, move the lines from
POSTPONE_INSERT to DCACHE. In this way, we don't break the order of
evicting cache lines.
Regression tested on x86_64-linux. Performance shouldn't be affected.
--
Yao (齐尧)
gdb:
2013-10-22 Yao Qi <yao@codesourcery.com>
* dcache.c: Include "memrange.h".
Update comments.
(dcache_is_full): New function.
(dcache_read_line): Remove.
(dcache_peek_byte): Remove.
(dcache_ranges_readable): New function.
(dcache_ranges_uncached): New function.
(copy_block): New function.
(dcache_xfer_memory): Read multiple cache lines from target
memory in one go.
---
gdb/dcache.c | 375 +++++++++++++++++++++++++++++++++++++++++++++-------------
1 files changed, 292 insertions(+), 83 deletions(-)
diff --git a/gdb/dcache.c b/gdb/dcache.c
index 316f3dd..a089b46 100644
--- a/gdb/dcache.c
+++ b/gdb/dcache.c
@@ -25,6 +25,7 @@
#include "target.h"
#include "inferior.h"
#include "splay-tree.h"
+#include "memrange.h"
/* Commands with a prefix of `{set,show} dcache'. */
static struct cmd_list_element *dcache_set_list = NULL;
@@ -60,8 +61,8 @@ static struct cmd_list_element *dcache_show_list = NULL;
/* NOTE: Interaction of dcache and memory region attributes
As there is no requirement that memory region attributes be aligned
- to or be a multiple of the dcache page size, dcache_read_line() and
- dcache_write_line() must break up the page by memory region. If a
+ to or be a multiple of the dcache page size, dcache_xfer_memory must
+ break up the page by memory region. If a
chunk does not have the cache attribute set, an invalid memory type
is set, etc., then the chunk is skipped. Those chunks are handled
in target_xfer_memory() (or target_xfer_memory_partial()).
@@ -122,8 +123,6 @@ typedef void (block_func) (struct dcache_block *block, void *param);
static struct dcache_block *dcache_hit (DCACHE *dcache, CORE_ADDR addr);
-static int dcache_read_line (DCACHE *dcache, struct dcache_block *db);
-
static struct dcache_block *dcache_alloc (DCACHE *dcache, CORE_ADDR addr);
static void dcache_info (char *exp, int tty);
@@ -305,54 +304,12 @@ dcache_hit (DCACHE *dcache, CORE_ADDR addr)
return db;
}
-/* Fill a cache line from target memory.
- The result is 1 for success, 0 if the (entire) cache line
- wasn't readable. */
+/* Return true if DCACHE is full. */
static int
-dcache_read_line (DCACHE *dcache, struct dcache_block *db)
+dcache_is_full (DCACHE *dcache)
{
- CORE_ADDR memaddr;
- gdb_byte *myaddr;
- int len;
- int res;
- int reg_len;
- struct mem_region *region;
-
- len = dcache->line_size;
- memaddr = db->addr;
- myaddr = db->data;
-
- while (len > 0)
- {
- /* Don't overrun if this block is right at the end of the region. */
- region = lookup_mem_region (memaddr);
- if (region->hi == 0 || memaddr + len < region->hi)
- reg_len = len;
- else
- reg_len = region->hi - memaddr;
-
- /* Skip non-readable regions. The cache attribute can be ignored,
- since we may be loading this for a stack access. */
- if (region->attrib.mode == MEM_WO)
- {
- memaddr += reg_len;
- myaddr += reg_len;
- len -= reg_len;
- continue;
- }
-
- res = target_read (&current_target, TARGET_OBJECT_RAW_MEMORY,
- NULL, myaddr, memaddr, reg_len);
- if (res < reg_len)
- return 0;
-
- memaddr += res;
- myaddr += res;
- len -= res;
- }
-
- return 1;
+ return dcache->size >= dcache_size;
}
/* Get a free cache block, put or keep it on the valid list,
@@ -363,7 +320,7 @@ dcache_alloc (DCACHE *dcache, CORE_ADDR addr)
{
struct dcache_block *db;
- if (dcache->size >= dcache_size)
+ if (dcache_is_full (dcache))
{
/* Evict the least recently allocated line. */
db = dcache->oldest;
@@ -395,28 +352,6 @@ dcache_alloc (DCACHE *dcache, CORE_ADDR addr)
return db;
}
-/* Using the data cache DCACHE, store in *PTR the contents of the byte at
- address ADDR in the remote machine.
-
- Returns 1 for success, 0 for error. */
-
-static int
-dcache_peek_byte (DCACHE *dcache, CORE_ADDR addr, gdb_byte *ptr)
-{
- struct dcache_block *db = dcache_hit (dcache, addr);
-
- if (!db)
- {
- db = dcache_alloc (dcache, addr);
-
- if (!dcache_read_line (dcache, db))
- return 0;
- }
-
- *ptr = db->data[XFORM (dcache, addr)];
- return 1;
-}
-
/* Write the byte at PTR into ADDR in the data cache.
The caller is responsible for also promptly writing the data
@@ -473,6 +408,122 @@ dcache_init (void)
return dcache;
}
+/* Check the readability of memory range [MEMADDR, MEMADDR + LEN) and
+ return the readable ranges; the caller is responsible for releasing them. */
+
+static VEC(mem_range_s) *
+dcache_ranges_readable (CORE_ADDR memaddr, int len)
+{
+ VEC(mem_range_s) *readable_memory = NULL;
+
+ while (len > 0)
+ {
+ struct mem_range *r;
+ int reg_len;
+ /* Don't overrun if this block is right at the end of the region. */
+ struct mem_region *region = lookup_mem_region (memaddr);
+
+ if (region->hi == 0 || memaddr + len < region->hi)
+ reg_len = len;
+ else
+ reg_len = region->hi - memaddr;
+
+ /* Skip non-readable regions. The cache attribute can be ignored,
+ since we may be loading this for a stack access. */
+ if (region->attrib.mode == MEM_WO)
+ {
+ memaddr += reg_len;
+ len -= reg_len;
+ continue;
+ }
+
+ r = VEC_safe_push (mem_range_s, readable_memory, NULL);
+ r->start = memaddr;
+ r->length = reg_len;
+
+ memaddr += reg_len;
+ len -= reg_len;
+ }
+
+ return readable_memory;
+}
+
+/* Return the uncached ranges from RANGES. */
+
+static VEC(mem_range_s) *
+dcache_ranges_uncached (DCACHE *dcache, VEC(mem_range_s) *ranges)
+{
+ int b;
+ struct mem_range *rb;
+ VEC(mem_range_s) *uncached = NULL;
+
+ for (b = 0; VEC_iterate (mem_range_s, ranges, b, rb); b++)
+ {
+ CORE_ADDR memaddr_start = rb->start;
+ CORE_ADDR memaddr_end = rb->start;
+
+ while (memaddr_end < rb->start + rb->length)
+ {
+ struct dcache_block *db = dcache_hit (dcache, memaddr_end);
+
+ if (db != NULL)
+ {
+ /* Set MEMADDR_END to the start address of this cache line. */
+ memaddr_end = align_down (memaddr_end, dcache->line_size);
+
+ if (memaddr_end > memaddr_start)
+ {
+ struct mem_range *r;
+
+ r = VEC_safe_push (mem_range_s, uncached, NULL);
+ r->start = memaddr_start;
+ r->length = memaddr_end - memaddr_start;
+ }
+ }
+
+ /* Increase memaddr_end to a dcache->line_size-aligned value. */
+ if (memaddr_end < align_up (memaddr_end, dcache->line_size))
+ memaddr_end = align_up (memaddr_end, dcache->line_size);
+ else
+ memaddr_end += dcache->line_size;
+
+ if (db != NULL)
+ memaddr_start = memaddr_end;
+ }
+
+ if (memaddr_end > rb->start + rb->length)
+ memaddr_end = rb->start + rb->length;
+
+ if (memaddr_start < memaddr_end)
+ {
+ struct mem_range *r;
+
+ r = VEC_safe_push (mem_range_s, uncached, NULL);
+
+ r->start = memaddr_start;
+ r->length = memaddr_end - memaddr_start;
+ }
+ }
+
+ return uncached;
+}
+
+/* Helper function to copy BLOCK into the dcache passed as PARAM.  */
+
+static void
+copy_block (struct dcache_block *block, void *param)
+{
+ DCACHE *dcache = param;
+ struct dcache_block *db = dcache_hit (dcache, block->addr);
+
+ /* This function is only used when DCACHE is full and BLOCK doesn't
+ exist in DCACHE. */
+ gdb_assert (dcache_is_full (dcache));
+ gdb_assert (db == NULL);
+
+ db = dcache_alloc (dcache, block->addr);
+ memcpy (db->data, block->data, dcache->line_size);
+}
/* Read or write LEN bytes from inferior memory at MEMADDR, transferring
to or from debugger address MYADDR. Write to inferior if SHOULD_WRITE is
@@ -489,9 +540,6 @@ dcache_xfer_memory (struct target_ops *ops, DCACHE *dcache,
CORE_ADDR memaddr, gdb_byte *myaddr,
int len, int should_write)
{
- int i;
- int res;
-
/* If this is a different inferior from what we've recorded,
flush the cache. */
@@ -506,8 +554,10 @@ dcache_xfer_memory (struct target_ops *ops, DCACHE *dcache,
if (should_write)
{
- res = target_write (ops, TARGET_OBJECT_RAW_MEMORY,
- NULL, myaddr, memaddr, len);
+ int res = target_write (ops, TARGET_OBJECT_RAW_MEMORY,
+ NULL, myaddr, memaddr, len);
+ int i;
+
if (res <= 0)
return res;
/* Update LEN to what was actually written. */
@@ -527,16 +577,175 @@ dcache_xfer_memory (struct target_ops *ops, DCACHE *dcache,
}
else
{
- for (i = 0; i < len; i++)
+ int i;
+ struct mem_range *r;
+ /* The starting address of each cached range. */
+ CORE_ADDR cached_addr = memaddr;
+      /* A list of cache blocks whose insertion into DCACHE is
+	 postponed.  */
+ struct dcache_block *postpone_insert = NULL;
+
+ VEC(mem_range_s) *memory;
+ VEC(mem_range_s) *uncached = NULL;
+
+ /* Find readable ranges in range [MEMADDR, MEMADDR + LEN),
+ supposing write-only regions are wo1 and wo2. Then,
+ readable ranges are r1, r2 and r3.
+
+ MEMADDR MEMADDR + LEN
+ |<------------------------------------------------->|
+ |<-- wo1 -->| |<-- wo2 -->|
+
+ |<-r1->| |<--r2-->| |<---r3---->| */
+ memory = dcache_ranges_readable (memaddr, len);
+
+ /* GDB will read from these three readable ranges, r1, r2 and r3.
+ GDB has to check the corresponding cache lines' state (cached
+ or uncached) to determine whether to read from the target
+ memory or the cache lines.
+
+ MEMADDR MEMADDR + LEN
+ |<------------------------------------------------->|
+ |<-- wo1 -->| |<-- wo2 -->|
+
+ |<-r1->| |<--r2-->| |<---r3---->|
+
+ -u-|-----c----|-----u----|-----c----|-----c----|--u--
+	 'u' stands for uncached, 'c' stands for cached.
+
+ |u1|-c1-| | u2 |c2| |--c3--| u3 |
+
+ Uncached ranges are u1, u2 and u3, and cached ranges are c1,
+ c2 and c3. */
+ uncached = dcache_ranges_uncached (dcache, memory);
+
+ VEC_free (mem_range_s, memory);
+
+      /* Iterate over the uncached ranges.  Read memory from cache lines
+	 if the address is not within an uncached range; otherwise, read
+	 from the target memory and update the corresponding cache lines.  */
+
+ for (i = 0; VEC_iterate (mem_range_s, uncached, i, r); i++)
{
- if (!dcache_peek_byte (dcache, memaddr + i, myaddr + i))
+ int j;
+
+ if (cached_addr < r->start)
{
- /* That failed. Discard its cache line so we don't have a
- partially read line. */
- dcache_invalidate_line (dcache, memaddr + i);
- return i;
+ /* Read memory [cached_addr, MIN (r->start, MEMADDR + LEN))
+ from cache lines. */
+
+ for (; cached_addr < r->start && cached_addr < (memaddr + len);
+ cached_addr++)
+ {
+ struct dcache_block *db = dcache_hit (dcache, cached_addr);
+
+ gdb_assert (db != NULL);
+
+ myaddr[cached_addr - memaddr]
+ = db->data[XFORM (dcache, cached_addr)];
+ }
+ }
+ cached_addr = r->start + r->length;
+
+ /* Part of the memory range [MEMADDR, MEMADDR + LEN) is
+ not cached. */
+ if (r->start < len + memaddr)
+ {
+ /* MEMADDR_START and MEMADDR_END are aligned on
+ dcache->line_size, because dcache->line_size is the
+ minimal unit to update cache and fetch from the target
+ memory. */
+ CORE_ADDR memaddr_start
+ = align_down (r->start, dcache->line_size);
+ CORE_ADDR memaddr_end
+ = align_up (r->start + r->length, dcache->line_size);
+ int res;
+ int len1 = memaddr_end - memaddr_start;
+ int len2;
+ gdb_byte *buf = xmalloc (len1);
+
+ /* Read multiple cache lines to cover memory range
+ [r->start, r->start + MIN (r->length,
+ LEN + MEMADDR - r->start)) from target. */
+
+	      res = target_read (&current_target, TARGET_OBJECT_RAW_MEMORY,
+ NULL, buf, memaddr_start, len1);
+
+ if (res == -1)
+ {
+ VEC_free (mem_range_s, uncached);
+ xfree (buf);
+ return r->start - memaddr;
+ }
+
+ /* Copy contents to MYADDR. */
+ len2 = r->length;
+ if (len2 > len + memaddr - r->start)
+ len2 = len + memaddr - r->start;
+
+ memcpy ((r->start - memaddr) + myaddr,
+ buf + (r->start - memaddr_start),
+ len2);
+
+ /* Update cache lines in range
+ [MEMADDR_START, MEMADDR_START + LEN1). */
+ for (j = 0; j < (len1 / dcache->line_size); j++)
+ {
+ struct dcache_block *db = NULL;
+
+ if (dcache_is_full (dcache))
+ {
+		      /* DCACHE is full; don't put this dcache_block in
+			 it, otherwise a dcache_block would be evicted
+			 that may still be used later in the loop.  Save
+			 the block temporarily in POSTPONE_INSERT and
+			 use it to update DCACHE outside the loop.  */
+ db = xmalloc (offsetof (struct dcache_block, data)
+ + dcache->line_size);
+ db->addr = MASK (dcache,
+ memaddr_start + j * dcache->line_size);
+ append_block (&postpone_insert, db);
+ }
+ else
+ {
+ /* DCACHE is not full, it is OK to put dcache_block in
+ it. */
+ db = dcache_hit (dcache, memaddr_start + j * dcache->line_size);
+
+ gdb_assert (db == NULL);
+
+ db = dcache_alloc (dcache, memaddr_start + j * dcache->line_size);
+ }
+
+ memcpy (db->data, &buf[j * dcache->line_size], dcache->line_size);
+ }
+
+ xfree (buf);
+
+ if (res < len1)
+ {
+ VEC_free (mem_range_s, uncached);
+ return r->start - memaddr + res;
+ }
}
}
+
+ VEC_free (mem_range_s, uncached);
+
+ for (; cached_addr < (memaddr + len); cached_addr++)
+ {
+ struct dcache_block *db = dcache_hit (dcache, cached_addr);
+
+ gdb_assert (db != NULL);
+
+ myaddr[cached_addr - memaddr]
+ = db->data[XFORM (dcache, cached_addr)];
+ }
+
+ /* Copy dcache_blocks in POSTPONE_INSERT to DCACHE. */
+ for_each_block (&postpone_insert, copy_block, dcache);
+ for_each_block (&postpone_insert, free_block, NULL);
}
return len;
--
1.7.7.6
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH 2/2] Read memory in multiple lines in dcache_xfer_memory.
2013-10-22 0:50 ` Yao Qi
@ 2013-10-25 17:08 ` Doug Evans
2013-11-06 9:24 ` Yao Qi
2013-11-18 17:08 ` Pedro Alves
0 siblings, 2 replies; 11+ messages in thread
From: Doug Evans @ 2013-10-25 17:08 UTC (permalink / raw)
To: Yao Qi; +Cc: gdb-patches
On Mon, Oct 21, 2013 at 5:48 PM, Yao Qi <yao@codesourcery.com> wrote:
>> The data for cache line sizes less than the default (64 bytes)
>> is nice but not compelling.
>> E.g., What if we just increase the cache line size to 256, say?
>> Or 1024? And/or maybe auto-adjust it based on
>
>
> In this case, it is useful to increase the cache line size.  Increasing the
> cache line size to 256 saves 0.1s (1.16 -> 1.06) in my experiment, which
> is a good improvement.
>
> When people are doing remote debugging and find it slow, the reason for
> the slowness is usually unclear, so it may not be useful to advise them
> to increase the data cache line size.
If the cache line size is set based on the remote packet size (and all
else being equal, that's a reasonable thing to do),
then we don't have to tell them anything.
OTOH, I'm not entirely thrilled with changing the cache-line size automagically.
OTOOH, how many users even know that they *can* change the cache line size?
We don't have to advise them to change the cache line size (with the
word "advise" implying the end result is necessarily an improvement),
but I'm totally ok with suggesting it as something to *try*.
>> "remote memory-read-packet-size" or some such?
>
>
> "remote memory-read-packet-size" is not useful in this case, I am afraid.
> GDB reads one data cache line by a single 'm' packet, and the size (data
> cache line size) is less than the memory read packet size. Changing "remote
> memory-read-packet-size" doesn't have effects.
You misparsed what I wrote, sorry for being unclear.
Putting the sentence back together:
And/or maybe auto-adjust it based on "remote memory-read-packet-size"
or some such?
["it" being the cache line size]
When I connect to gdbserver on my amd64-linux box the cache line size
is 64 but the memory-read-packet-size is 16383.
Those are wildly different numbers. I'm not suggesting setting the
cache line size to 16K, but it does suggest that 64 is a bit small.
OTOH, there's no point in having a 256 byte line size if the
memory-read-packet-size is 64.
OTOOH, with your patch to read multiple cache lines at once the cost
of smaller cache line sizes is mitigated.
It's not my favorite choice, I just mentioned it for discussion's sake.
* Re: [PATCH 2/2] Read memory in multiple lines in dcache_xfer_memory.
2013-10-25 17:08 ` Doug Evans
@ 2013-11-06 9:24 ` Yao Qi
2013-11-18 17:08 ` Pedro Alves
1 sibling, 0 replies; 11+ messages in thread
From: Yao Qi @ 2013-11-06 9:24 UTC (permalink / raw)
To: Doug Evans; +Cc: gdb-patches
On 10/26/2013 01:08 AM, Doug Evans wrote:
> If the cache line size is set based on the remote packet size (and all
> else being equal, that's a reasonable thing to do),
> then we don't have to tell them anything.
> OTOH, I'm not entirely thrilled with changing the cache-line size automagically.
>
Data cache line size should be related to the pattern in which GDB reads
target data, rather than to the remote packet size.  The data cache is
quite general, and we may not have to tie it to the remote packet size.
"Setting cache line size based on the remote packet size" helps on
performance, but IMO it is sort of different optimization from what I
proposed here.
> OTOOH, how many users even know that they *can* change the cache line size?
> We don't have to advise them to change the cache line size (with the
> word "advise" implying the end result is necessarily an improvement),
> but I'm totally ok with suggesting it as something to *try*.
>
>
Of course.  However, I still prefer to make GDB smarter, to avoid
performance issues as much as possible :).
>
>>> "remote memory-read-packet-size" or some such?
>>
>> "remote memory-read-packet-size" is not useful in this case, I am afraid.
>> GDB reads one data cache line by a single 'm' packet, and the size (data
>> cache line size) is less than the memory read packet size.  Changing "remote
>> memory-read-packet-size" doesn't have effects.
> You misparsed what I wrote, sorry for being unclear.
>
> Putting the sentence back together:
> And/or maybe auto-adjust it based on "remote memory-read-packet-size"
> or some such?
> ["it" being the cache line size]
>
> When I connect to gdbserver on my amd64-linux box the cache line size
> is 64 but the memory-read-packet-size is 16383.
> Those are wildly different numbers. I'm not suggesting setting the
> cache line size to 16K, but it does suggest that 64 is a bit small.
> OTOH, there's no point in having a 256 byte line size if the
> memory-read-packet-size is 64.
Yeah, I agree that these two sizes should match each other to some extent.
> OTOOH, with your patch to read multiple cache lines at once the cost
> of smaller cache line sizes is mitigated.
Right.
--
Yao (齐尧)
* Re: [PATCH 2/2] Read memory in multiple lines in dcache_xfer_memory.
2013-10-25 17:08 ` Doug Evans
2013-11-06 9:24 ` Yao Qi
@ 2013-11-18 17:08 ` Pedro Alves
1 sibling, 0 replies; 11+ messages in thread
From: Pedro Alves @ 2013-11-18 17:08 UTC (permalink / raw)
To: Doug Evans; +Cc: Yao Qi, gdb-patches
On 10/25/2013 06:08 PM, Doug Evans wrote:
> OTOH, there's no point in having a 256 byte line size of the
> memory-read-packet-size is 64.
That's an excellent point. I think that we could also address that
in another way, without having the dcache code query the target
(or targets!) for maximum sizes -- currently, each cache line has a
fixed size, but instead, dcache could treat the line
size as the _maximum_ line size. That is, 'struct dcache_block' would
gain a new field that records how much data is actually stored
in the block.  Then, when filling the block, dcache would use
target_xfer_partial instead of target_read/write, and if the target
returns less than the block size, record how much was read in the
block, and return to the dcache caller. Further partial transfers
would fill the rest of the dcache line (up until it was full). In
essence, dcache works at the target_xfer level, but has its own
loop that sort of "breaks" the partial transfer logic.
--
Pedro Alves
* Re: [PATCH 2/2] Read memory in multiple lines in dcache_xfer_memory.
2013-10-18 3:22 ` [PATCH 2/2] " Yao Qi
2013-10-21 20:38 ` Doug Evans
2013-10-21 23:16 ` Doug Evans
@ 2013-11-18 18:17 ` Pedro Alves
2 siblings, 0 replies; 11+ messages in thread
From: Pedro Alves @ 2013-11-18 18:17 UTC (permalink / raw)
To: Yao Qi; +Cc: gdb-patches
Hmm, do we really need all this memory range merging and stuff?
+ MEMADDR MEMADDR + LEN
+ |<------------------------------------------------->|
+ |<-- wo1 -->| |<-- wo2 -->|
+
+ |<-r1->| |<--r2-->| |<---r3---->|
+
+ -u-|-----c----|-----u----|-----c----|-----c----|--u--
+	 'u' stands for uncached, 'c' stands for cached.
+
+ |u1|-c1-| | u2 |c2| |--c3--| u3 |
+
+ Uncached ranges are u1, u2 and u3, and cached ranges are c1,
+ c2 and c3. */
It feels like the dcache code only needs to find the next
boundary, and read/write till that. That is, say, for a read,
the first dcache_xfer_memory would find the u1/c1 boundary,
then read [MEMADDR,u1/c1], and return. memory_xfer_partial
will then request another chunk, starting at u1/c1. That starts
at a cached range, so dcache would return c1. memory_xfer_partial
then requests more data, starting at the end of c1. That hits
a cached, non-readable range, so dcache returns error. Etc.
I think finding the next cached/uncached boundary address should
be simpler than getting vecs of ranges and merging them all,
and handling reads/writes separately. If the next boundary
spans multiple dcache lines, then you'd still read multiple
lines in one go.
--
Pedro Alves