[PATCH 1/2] Fast tracepoint for powerpc64le

public inbox for gdb-patches@sourceware.org
 help / color / mirror / Atom feed

* [PATCH 1/2] Fast tracepoint for powerpc64le
@ 2015-02-20 18:04 Wei-cheng Wang
  2015-02-25 15:20 ` [PATCH 1/3 v2] " Wei-cheng Wang
                   ` (2 more replies)
  0 siblings, 3 replies; 15+ messages in thread
From: Wei-cheng Wang @ 2015-02-20 18:04 UTC (permalink / raw)
  To: uweigand, gdb-patches

Hi,

These patches implement fast tracepoint for PowerPC64.

The first part includes required porting for PowerPC64 (and 32-bit) target.
Including
* Install fast tracepoint jump pad
* Agent expression bytecode compilation for powerpc64 only.
   For 32-bit, bytecode interpreter is used instead.
* IPA (libinproctrace.so)
* Implement required gdbarch hooks.
* Enable tracepoint testing for powerpc.

The second part fixes where the jump pad buffer is mapped in memory.
Current jump pad is mapped at a very low address, which is too far
for powerpc executables.  The second patch adds a function, jump_pad_area_hint,
for target to give a hint for where to map the buffer.

Unfortunately, this implementation currently only enable tracepoint for
powerpc64le, because there are some issues that PowerPC64 big-endian
and PowerPC32 cannot work properly.  I am not sure how to fix them.
I will explain in more detail below.

Tested on Ubuntu 14.04.1 LTS ppc64le
with advanced toolchain 8.0.3 (gcc 4.9.3 20150113)

# of expected passes            2581
# of unexpected failures        61
# of expected failures          16
# of untested testcases         1
# of unsupported tests          2

The failed cases are
  27 FAIL: gdb.trace/collection.exp
   4 FAIL: gdb.trace/entry-values.exp
   2 FAIL: gdb.trace/ftrace.exp
   1 FAIL: gdb.trace/no-attach-trace.exp
  18 FAIL: gdb.trace/tfind.exp
   1 FAIL: gdb.trace/tspeed.exp
   8 FAIL: gdb.trace/unavailable.exp

* collection.exp fails are DWARF issues.  x86 failed too.
   (https://sourceware.org/bugzilla/show_bug.cgi?id=15081)
* entry-values fails: The casea try to backtrace at
   an inline-asm-inserted symbol without debug information.
   The prologue analyzer is confused.
* ftrace.exp: x86 has the same issue (KFAIL in x86)
   (https://sourceware.org/bugzilla/show_bug.cgi?id=13808)
* no-attach-trace.exp: x86 has the same issue.
   Tracepoint is not supported when target is `exec'.
   I think this should be XFAIL?
* tfind.exp: One of the tracepoint is inserted at
   `*gdb_recursion_test'.  It's not hit because local-entry is called
   instead.  The 18 FAILs are off-by-one error.
* tspeed.exp:  This case is used to test whether fast tracepoints
   are *faster* than regular tracepoints.  The case itself uses
   sys+user time to find a proper iteration count for measurement.
   (quote: "Total test time should be between 2 and 5 seconds.")
   However, in my environment, 2 seconds of sys+user time means
   2 minutes wall clock, so this case failed due to timeout.
* unavailable.exp: x86 has the same issue.

The main reason why PowerPC64 big-endian doesn't work is
calling convention (function descriptors) issue.
   When installing a tracepoint in inferior memory, gdbserver
asks the address of "gdb_collect" (and etc.) using qSymbol packet,
and it generate a sequence of instructions to calling that address.
   However, gdb-client "return the start of code instead of
any data function descriptor."
   See commenting in remote_check_symbols/remote.c,
https://sourceware.org/ml/gdb-patches/2007-06/msg00389.html
and gen_call() in this patch.
   In order for powerpc64be to work, qSymbol packet should be
extend for function descriptors.

For powerpc32 to work, some data structure/function in tracepoint.c
need to be fixed.  For example,

* write_inferior_data_ptr should be fixed for big-endian.
   If sizeof (CORE_ADDR) is larger than sizeof (void*), zeros are written.
   BTW, I thnink write_inferior_data_pointer provides the same functionality
   without this issue.  I'm not sure why write_inferior_data_ptr is needed?

* Data structure layout between gdbserver and IPA is not consistent.

   There are two versions of tracepoint_action one for gdbserver,
   and antoher for inferior (IPA side).

   -    struct tracepoint_action
   |    {
   |    #ifndef IN_PROCESS_AGENT
   |      const struct tracepoint_action_ops *ops;
   | -  #endif
   | |    char type;
   - -  };

   It is the base object for action objects.

   struct collect_memory_action
   {
     struct tracepoint_action base;  <--
     {
       const struct tracepoint_action_ops *ops;
   -   char type;
   | }
   |
   | ULONGEST addr;
   | ULONGEST len;
   - int32_t basereg;
   };

   When gdbserver downloading the action object to inferior,
   it copies the object from offsetof(type) to the end.
   (See m_tracepoint_action_download/tracepoint.c for example)
   Howevery, the object layouts may not be consistent between
   the two versions (with or without ops fields.)
   It depends the the alignment requirement of addr (first data member
   after base object), and the padding of tracepoint_action.

   In this case, the distance from "type" to "addr" changes

      Wihtout ops           with ops
      0   1   2   3         0   1   2   3
    0 type| PADDING...    0 ops-------------|
    4 ................    4 type|PADDING....|
    8 addr------------    8 addr-------------
    c ---------------|    c ----------------|
   10 len-------------   10 len--------------
   14 ---------------|   14 ----------------|
   18 basereg--------|   18 basereg---------|

   Hope I can get some advices before fixing this :)

Thanks,
Wei-cheng

---
gdb/ChangeLog

2015-02-20  Wei-cheng Wang  <cole945@gmail.com>

	* rs6000-tdep (ppc_fast_tracepoint_valid_at,
	ppc_relocate_instruction, ppc_gen_return_address): New functions.
	(rs6000_frame_cache, rs6000_frame_this_id): Handle unavailable PC/SP
	to build unavailable frame.
	(rs6000_gdbarch_init): Hook ppc_fast_tracepoint_valid_at,
	ppc_relocate_instruction, and ppc_gen_return_address.

gdb/gdbserver/ChangeLog

2015-02-20  Wei-cheng Wang  <cole945@gmail.com>

	* Makefile.in (linux-ppc-ipa.o, powerpc-64l-ipa.o,
	powerpc-32l-ipa.o): New rules.
	* configure.srv (powerpc*-*-linux*): Add powerpc-64l-ipa.o,
	powerpc-32l-ipa.o, and linux-ppc-ipa.o in ipa_obj
	* linux-ppc-ipa.c: New file.
	* linux-ppc-low.c (ppc_supports_z_point_type, ppc_insert_point,
	ppc_remove_point, put_i32, get_i32, gen_ds_form, gen_d_form,
	gen_xfx_form, gen_x_form, gen_md_form, gen_i_form, gen_b_form,
	gen_limm, gen_atomic_xchg, gen_call, ppc_supports_tracepoints,
	ppc_install_fast_tracepoint_jump_pad,
	ppc_get_min_fast_tracepoint_insn_len, emit_insns, ppc64_emit_prologue,
	ppc64_emit_epilogue, ppc64_emit_add, ppc64_emit_sub, ppc64_emit_mul,
	ppc64_emit_lsh, ppc64_emit_rsh_signed, ppc64_emit_rsh_unsigned,
	ppc64_emit_ext, ppc64_emit_zero_ext, ppc64_emit_log_not,
	ppc64_emit_bit_and, ppc64_emit_bit_or, ppc64_emit_bit_xor,
	ppc64_emit_bit_not, ppc64_emit_equal, ppc64_emit_less_signed,
	ppc64_emit_less_unsigned, ppc64_emit_ref, ppc64_emit_const,
	ppc64_emit_reg, ppc64_emit_pop, ppc64_emit_stack_flush,
	ppc64_emit_swap, ppc64_emit_stack_adjust, ppc64_emit_call,
	ppc64_emit_int_call_1, ppc64_emit_void_call_2, ppc64_emit_if_goto,
	ppc64_emit_goto, ppc64_emit_eq_goto, ppc64_emit_ne_goto,
	ppc64_emit_lt_goto, ppc64_emit_le_goto, ppc64_emit_gt_goto,
	ppc64_emit_ge_goto, ppc_write_goto_address, ppc_emit_ops,
	ppc_supports_range_stepping, ppc_fast_tracepoint_valid_at,
	ppc_relocate_instruction): New functions.
	(ppc64_emit_ops_vector): New struct for bytecode compilation.
	(the_low_target): Add target ops - ppc_supports_z_point_type,
	ppc_insert_point, ppc_remove_point, ppc_supports_tracepoints,
	ppc_install_fast_tracepoint_jump_pad, ppc_emit_ops,
	ppc_get_min_fast_tracepoint_insn_len, ppc_supports_range_stepping.

gdb/testsuite/ChangeLog

	* gdb.trace/backtrace.exp: Set registers for powerpc*-*-*.
	* gdb.trace/collection.exp: Ditto.
	* gdb.trace/entry-values.exp: Ditto.
	* gdb.trace/mi-trace-frame-collected.exp: Ditto.
	* gdb.trace/mi-trace-unavailable.exp: Ditto.
	* gdb.trace/pending.exp: Ditto.
	* gdb.trace/report.exp: Ditto.
	* gdb.trace/trace-break.exp: Ditto.
	* gdb.trace/while-dyn.exp: Ditto.
	* gdb.trace/ftrace.exp: Enable testing for powerpc*-*-*.
	* gdb.trace/change-loc.h: set_point for powerpc.
	* gdb.trace/ftrace.c: Ditto
	* gdb.trace/pendshr1.c: Ditto.
	* gdb.trace/pendshr2.c: Ditto.
	* gdb.trace/range-stepping.c: Ditto.
	* gdb.trace/trace-break.c: Ditto.
	* gdb.trace/trace-mt.c: Ditto.
---
  gdb/gdbserver/Makefile.in                          |    9 +
  gdb/gdbserver/configure.srv                        |    1 +
  gdb/gdbserver/linux-ppc-ipa.c                      |  120 ++
  gdb/gdbserver/linux-ppc-low.c                      | 1291 +++++++++++++++++++-
  gdb/rs6000-tdep.c                                  |  179 ++-
  gdb/testsuite/gdb.trace/backtrace.exp              |    3 +
  gdb/testsuite/gdb.trace/change-loc.h               |    2 +
  gdb/testsuite/gdb.trace/collection.exp             |    4 +
  gdb/testsuite/gdb.trace/entry-values.exp           |    2 +
  gdb/testsuite/gdb.trace/ftrace.c                   |    4 +
  gdb/testsuite/gdb.trace/ftrace.exp                 |    3 +-
  .../gdb.trace/mi-trace-frame-collected.exp         |    2 +
  gdb/testsuite/gdb.trace/mi-trace-unavailable.exp   |    2 +
  gdb/testsuite/gdb.trace/pending.exp                |    2 +
  gdb/testsuite/gdb.trace/pendshr1.c                 |    2 +
  gdb/testsuite/gdb.trace/pendshr2.c                 |    2 +
  gdb/testsuite/gdb.trace/range-stepping.c           |    2 +
  gdb/testsuite/gdb.trace/report.exp                 |    4 +
  gdb/testsuite/gdb.trace/trace-break.c              |    4 +
  gdb/testsuite/gdb.trace/trace-break.exp            |    4 +
  gdb/testsuite/gdb.trace/trace-mt.c                 |    2 +
  gdb/testsuite/gdb.trace/while-dyn.exp              |    2 +
  22 files changed, 1637 insertions(+), 9 deletions(-)
  create mode 100644 gdb/gdbserver/linux-ppc-ipa.c

diff --git a/gdb/gdbserver/Makefile.in b/gdb/gdbserver/Makefile.in
index e479c7c..6bdac3e 100644
--- a/gdb/gdbserver/Makefile.in
+++ b/gdb/gdbserver/Makefile.in
@@ -491,6 +491,15 @@ linux-amd64-ipa.o: linux-amd64-ipa.c
  amd64-linux-ipa.o: amd64-linux.c
  	$(IPAGENT_COMPILE) $<
  	$(POSTCOMPILE)
+linux-ppc-ipa.o: linux-ppc-ipa.c
+	$(IPAGENT_COMPILE) $<
+	$(POSTCOMPILE)
+powerpc-64l-ipa.o: powerpc-64l.c
+	$(IPAGENT_COMPILE) $<
+	$(POSTCOMPILE)
+powerpc-32l-ipa.o: powerpc-32l.c
+	$(IPAGENT_COMPILE) $<
+	$(POSTCOMPILE)
  tdesc-ipa.o: tdesc.c
  	$(IPAGENT_COMPILE) $<
  	$(POSTCOMPILE)
diff --git a/gdb/gdbserver/configure.srv b/gdb/gdbserver/configure.srv
index 127786e..e13daf1 100644
--- a/gdb/gdbserver/configure.srv
+++ b/gdb/gdbserver/configure.srv
@@ -245,6 +245,7 @@ case "${target}" in
  			srv_linux_usrregs=yes
  			srv_linux_regsets=yes
  			srv_linux_thread_db=yes
+			ipa_obj="powerpc-64l-ipa.o powerpc-32l-ipa.o linux-ppc-ipa.o"
  			;;
    powerpc-*-lynxos*)	srv_regobj="powerpc-32.o"
  			srv_tgtobj="lynx-low.o lynx-ppc-low.o"
diff --git a/gdb/gdbserver/linux-ppc-ipa.c b/gdb/gdbserver/linux-ppc-ipa.c
new file mode 100644
index 0000000..34b26d0
--- /dev/null
+++ b/gdb/gdbserver/linux-ppc-ipa.c
@@ -0,0 +1,120 @@
+/* GNU/Linux/PowerPC specific low level interface, for the in-process
+   agent library for GDB.
+
+   Copyright (C) 2010-2015 Free Software Foundation, Inc.
+
+   This file is part of GDB.
+
+   This program is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3 of the License, or
+   (at your option) any later version.
+
+   This program is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program.  If not, see <http://www.gnu.org/licenses/>.  */
+
+#include "server.h"
+#include "tracepoint.h"
+
+#if defined __PPC64__
+void init_registers_powerpc_64l (void);
+extern const struct target_desc *tdesc_powerpc_64l;
+#define REGSZ		8
+#else
+void init_registers_powerpc_32l (void);
+extern const struct target_desc *tdesc_powerpc_32l;
+#define REGSZ		4
+#endif
+
+/* These macros define the position of registers in the buffer collected
+   by the fast tracepoint jump pad.  */
+#define FT_CR_PC	0
+#define FT_CR_R0	1
+#define FT_CR_CR	33
+#define FT_CR_XER	34
+#define FT_CR_LR	35
+#define FT_CR_CTR	36
+#define FT_CR_GPR(n)	(FT_CR_R0 + (n))
+
+static const int ppc_ft_collect_regmap[] = {
+  /* GPRs */
+  FT_CR_GPR (0), FT_CR_GPR (1), FT_CR_GPR (2),
+  FT_CR_GPR (3), FT_CR_GPR (4), FT_CR_GPR (5),
+  FT_CR_GPR (6), FT_CR_GPR (7), FT_CR_GPR (8),
+  FT_CR_GPR (9), FT_CR_GPR (10), FT_CR_GPR (11),
+  FT_CR_GPR (12), FT_CR_GPR (13), FT_CR_GPR (14),
+  FT_CR_GPR (15), FT_CR_GPR (16), FT_CR_GPR (17),
+  FT_CR_GPR (18), FT_CR_GPR (19), FT_CR_GPR (20),
+  FT_CR_GPR (21), FT_CR_GPR (22), FT_CR_GPR (23),
+  FT_CR_GPR (24), FT_CR_GPR (25), FT_CR_GPR (26),
+  FT_CR_GPR (27), FT_CR_GPR (28), FT_CR_GPR (29),
+  FT_CR_GPR (30), FT_CR_GPR (31),
+  /* FPRs - not collected.  */
+  -1, -1, -1, -1, -1, -1, -1, -1,
+  -1, -1, -1, -1, -1, -1, -1, -1,
+  -1, -1, -1, -1, -1, -1, -1, -1,
+  -1, -1, -1, -1, -1, -1, -1, -1,
+  FT_CR_PC, /* PC */
+  -1, /* MSR */
+  FT_CR_CR, /* CR */
+  FT_CR_LR, /* LR */
+  FT_CR_CTR, /* CTR */
+  FT_CR_XER, /* XER */
+  -1, /* FPSCR */
+};
+
+#define PPC_NUM_FT_COLLECT_GREGS \
+  (sizeof (ppc_ft_collect_regmap) / sizeof(ppc_ft_collect_regmap[0]))
+
+/* Supply registers collected by the fast tracepoint jump pad.
+   BUF is the second argument we pass to gdb_collect in jump pad.  */
+
+void
+supply_fast_tracepoint_registers (struct regcache *regcache,
+				  const unsigned char *buf)
+{
+  int i;
+
+  for (i = 0; i < PPC_NUM_FT_COLLECT_GREGS; i++)
+    {
+      if (ppc_ft_collect_regmap[i] == -1)
+	continue;
+      supply_register (regcache, i,
+		       ((char *) buf)
+			+ ppc_ft_collect_regmap[i] * REGSZ);
+    }
+}
+
+/* Return the value of register REGNUM.  RAW_REGS is collected buffer
+   by jump pad.  This function is called by emit_reg.  */
+
+ULONGEST __attribute__ ((visibility("default"), used))
+gdb_agent_get_raw_reg (const unsigned char *raw_regs, int regnum)
+{
+  if (regnum >= PPC_NUM_FT_COLLECT_GREGS)
+    return 0;
+  if (ppc_ft_collect_regmap[regnum] == -1)
+    return 0;
+
+  return *(ULONGEST *) (raw_regs
+			+ ppc_ft_collect_regmap[regnum] * REGSZ);
+}
+
+/* Initialize ipa_tdesc and others.  */
+
+void
+initialize_low_tracepoint (void)
+{
+#if defined __PPC64__
+  init_registers_powerpc_64l ();
+  ipa_tdesc = tdesc_powerpc_64l;
+#else
+  init_registers_powerpc_32l ();
+  ipa_tdesc = tdesc_powerpc_32l;
+#endif
+}
diff --git a/gdb/gdbserver/linux-ppc-low.c b/gdb/gdbserver/linux-ppc-low.c
index 188fac0..fce15b2 100644
--- a/gdb/gdbserver/linux-ppc-low.c
+++ b/gdb/gdbserver/linux-ppc-low.c
@@ -24,6 +24,8 @@
  #include <asm/ptrace.h>

  #include "nat/ppc-linux.h"
+#include "ax.h"
+#include "tracepoint.h"

  static unsigned long ppc_hwcap;

@@ -512,6 +514,1266 @@ ppc_breakpoint_at (CORE_ADDR where)
    return 0;
  }

+/* Implement supports_z_point_type target-ops.
+   Returns true if type Z_TYPE breakpoint is supported.
+
+   Handling software breakpoint at server side, so tracepoints
+   and breakpoints can be inserted at the same location.  */
+
+static int
+ppc_supports_z_point_type (char z_type)
+{
+  switch (z_type)
+    {
+    case Z_PACKET_SW_BP:
+      return 1;
+    case Z_PACKET_HW_BP:
+    case Z_PACKET_WRITE_WP:
+    case Z_PACKET_ACCESS_WP:
+    default:
+      return 0;
+    }
+}
+
+/* Implement insert_point target-ops.
+   Returns 0 on success, -1 on failure and 1 on unsupported.  */
+
+static int
+ppc_insert_point (enum raw_bkpt_type type, CORE_ADDR addr,
+		  int size, struct raw_breakpoint *bp)
+{
+  switch (type)
+    {
+    case raw_bkpt_type_sw:
+      return insert_memory_breakpoint (bp);
+
+    case raw_bkpt_type_hw:
+    case raw_bkpt_type_write_wp:
+    case raw_bkpt_type_access_wp:
+    default:
+      /* Unsupported.  */
+      return 1;
+    }
+}
+
+/* Implement remove_point target-ops.
+   Returns 0 on success, -1 on failure and 1 on unsupported.  */
+
+static int
+ppc_remove_point (enum raw_bkpt_type type, CORE_ADDR addr,
+		  int size, struct raw_breakpoint *bp)
+{
+  switch (type)
+    {
+    case raw_bkpt_type_sw:
+      return remove_memory_breakpoint (bp);
+
+    case raw_bkpt_type_hw:
+    case raw_bkpt_type_write_wp:
+    case raw_bkpt_type_access_wp:
+    default:
+      /* Unsupported.  */
+      return 1;
+    }
+}
+
+/* Put a 32-bit INSN instruction in BUF in target endian.  */
+
+static int
+put_i32 (unsigned char *buf, uint32_t insn)
+{
+  if (__BYTE_ORDER == __LITTLE_ENDIAN)
+    {
+      buf[3] = (insn >> 24) & 0xff;
+      buf[2] = (insn >> 16) & 0xff;
+      buf[1] = (insn >> 8) & 0xff;
+      buf[0] = insn & 0xff;
+    }
+  else
+    {
+      buf[0] = (insn >> 24) & 0xff;
+      buf[1] = (insn >> 16) & 0xff;
+      buf[2] = (insn >> 8) & 0xff;
+      buf[3] = insn & 0xff;
+    }
+
+  return 4;
+}
+
+/* return a 32-bit value in target endian in BUF.  */
+
+__attribute__((unused)) /* Maybe unused due to conditional compilation.  */
+static uint32_t
+get_i32 (unsigned char *buf)
+{
+  uint32_t r;
+
+  if (__BYTE_ORDER == __LITTLE_ENDIAN)
+    r = (buf[3] << 24) | (buf[2] << 16) | (buf[1] << 8) | buf[0];
+  else
+    r = (buf[0] << 24) | (buf[1] << 16) | (buf[2] << 8) | buf[3];
+
+  return r;
+}
+
+/* Generate a ds-form instruction in BUF and return the number of bytes written
+
+   0      6     11   16          30 32
+   | OPCD | RST | RA |     DS    |XO|  */
+
+__attribute__((unused)) /* Maybe unused due to conditional compilation.  */
+static int
+gen_ds_form (unsigned char *buf, int opcd, int rst, int ra, int ds, int xo)
+{
+  uint32_t insn = opcd << 26;
+
+  insn |= (rst << 21) | (ra << 16) | (ds & 0xfffc) | (xo & 0x3);
+  return put_i32 (buf, insn);
+}
+
+/* Followings are frequently used ds-form instructions.  */
+
+#define GEN_STD(buf, rs, ra, offset)	gen_ds_form (buf, 62, rs, ra, offset, 0)
+#define GEN_STDU(buf, rs, ra, offset)	gen_ds_form (buf, 62, rs, ra, offset, 1)
+#define GEN_LD(buf, rt, ra, offset)	gen_ds_form (buf, 58, rt, ra, offset, 0)
+#define GEN_LDU(buf, rt, ra, offset)	gen_ds_form (buf, 58, rt, ra, offset, 1)
+
+/* Generate a d-form instruction in BUF.
+
+   0      6     11   16             32
+   | OPCD | RST | RA |       D      |  */
+
+static int
+gen_d_form (unsigned char *buf, int opcd, int rst, int ra, int si)
+{
+  uint32_t insn = opcd << 26;
+
+  insn |= (rst << 21) | (ra << 16) | (si & 0xffff);
+  return put_i32 (buf, insn);
+}
+
+/* Followings are frequently used d-form instructions.  */
+
+#define GEN_ADDI(buf, rt, ra, si)	gen_d_form (buf, 14, rt, ra, si)
+#define GEN_ADDIS(buf, rt, ra, si)	gen_d_form (buf, 15, rt, ra, si)
+#define GEN_LI(buf, rt, si)		GEN_ADDI (buf, rt, 0, si)
+#define GEN_LIS(buf, rt, si)		GEN_ADDIS (buf, rt, 0, si)
+#define GEN_ORI(buf, rt, ra, si)	gen_d_form (buf, 24, rt, ra, si)
+#define GEN_ORIS(buf, rt, ra, si)	gen_d_form (buf, 25, rt, ra, si)
+#define GEN_LWZ(buf, rt, ra, si)	gen_d_form (buf, 32, rt, ra, si)
+#define GEN_STW(buf, rt, ra, si)	gen_d_form (buf, 36, rt, ra, si)
+
+/* Generate a xfx-form instruction in BUF and return the number of bytes
+   written.
+
+   0      6     11         21        31 32
+   | OPCD | RST |    RI    |    XO   |/|  */
+
+static int
+gen_xfx_form (unsigned char *buf, int opcd, int rst, int ri, int xo)
+{
+  uint32_t insn = opcd << 26;
+  unsigned int n = ((ri & 0x1f) << 5) | ((ri >> 5) & 0x1f);
+
+  insn |= (rst << 21) | (n << 11) | (xo << 1);
+  return put_i32 (buf, insn);
+}
+
+/* Followings are frequently used xfx-form instructions.  */
+
+#define GEN_MFSPR(buf, rt, spr)		gen_xfx_form (buf, 31, rt, spr, 339)
+#define GEN_MTSPR(buf, rt, spr)		gen_xfx_form (buf, 31, rt, spr, 467)
+
+/* Generate a x-form instruction in BUF and return the number of bytes written.
+
+   0      6     11   16   21       31 32
+   | OPCD | RST | RA | RB |   XO   |RC|  */
+
+static int
+gen_x_form (unsigned char *buf, int opcd, int rst, int ra, int rb,
+	    int xo, int rc)
+{
+  uint32_t insn = opcd << 26;
+
+  insn |= (rst << 21) | (ra << 16) | (rb << 11) | (xo << 1) | rc;
+  return put_i32 (buf, insn);
+}
+
+/* Followings are frequently used x-form instructions.  */
+
+#define GEN_OR(buf, ra, rs, rb)		gen_x_form (buf, 31, rs, ra, rb, 444, 0)
+#define GEN_MR(buf, ra, rs)		GEN_OR (buf, ra, rs, rs)
+#define GEN_LWARX(buf, rt, ra, rb)	gen_x_form (buf, 31, rt, ra, rb, 20, 0)
+#define GEN_STWCX(buf, rs, ra, rb)	gen_x_form (buf, 31, rs, ra, rb, 150, 1)
+/* Assume bf = cr7.  */
+#define GEN_CMPW(buf, ra, rb)    gen_x_form (buf, 31, 28, ra, rb, 0, 0)
+
+/* Generate a md-form instruction in BUF and return the number of bytes written.
+
+   0      6    11   16   21   27   30 31 32
+   | OPCD | RS | RA | sh | mb | XO |sh|Rc|  */
+
+static int
+gen_md_form (unsigned char *buf, int opcd, int rs, int ra, int sh, int mb,
+	     int xo, int rc)
+{
+  uint32_t insn = opcd << 26;
+  unsigned int n = ((mb & 0x1f) << 1) | ((mb >> 5) & 0x1);
+  unsigned int sh0_4 = sh & 0x1f;
+  unsigned int sh5 = (sh >> 5) & 1;
+
+  insn |= (rs << 21) | (ra << 16) | (sh0_4 << 11) | (n << 5) | (sh5 << 1)
+	  | (xo << 2);
+  return put_i32 (buf, insn);
+}
+
+/* The following are frequently used md-form instructions.  */
+
+#define GEN_RLDICL(buf, ra, rs ,sh, mb) \
+				gen_md_form (buf, 30, rs, ra, sh, mb, 0, 0)
+#define GEN_RLDICR(buf, ra, rs ,sh, mb) \
+				gen_md_form (buf, 30, rs, ra, sh, mb, 1, 0)
+
+/* Generate a i-form instruction in BUF and return the number of bytes written.
+
+   0      6                          30 31 32
+   | OPCD |            LI            |AA|LK|  */
+
+static int
+gen_i_form (unsigned char *buf, int opcd, int li, int aa, int lk)
+{
+  uint32_t insn = opcd << 26;
+
+  insn |= (li & 0x3fffffc) | (aa & 1) | (lk & 1);
+  return put_i32 (buf, insn);
+}
+
+/* The following are frequently used i-form instructions.  */
+
+#define GEN_B(buf, li)		gen_i_form (buf, 18, li, 0, 0)
+#define GEN_BL(buf, li)		gen_i_form (buf, 18, li, 0, 1)
+
+/* Generate a b-form instruction in BUF and return the number of bytes written.
+
+   0      6    11   16               30 31 32
+   | OPCD | BO | BI |      BD        |AA|LK|  */
+
+static int
+gen_b_form (unsigned char *buf, int opcd, int bo, int bi, int bd,
+	    int aa, int lk)
+{
+  uint32_t insn = opcd << 26;
+
+  insn |= (bo << 21) | (bi << 16) | (bd & 0xfffc) | (aa & 1) | (lk & 1);
+  return put_i32 (buf, insn);
+}
+
+/* The following are frequently used b-form instructions.  */
+/* Assume bi = cr7.  */
+#define GEN_BNE(buf, bd)  gen_b_form (buf, 16, 0x4, (7 << 2) | 2, bd, 0 ,0)
+
+/* GEN_LOAD and GEN_STORE generate 64- or 32-bit load/store for ppc64 or ppc32
+   respectively.  They are primary used for save/restore GPRs in jump-pad,
+   not used for bytecode compiling.  */
+
+#if defined __PPC64__
+#define GEN_LOAD(buf, rt, ra, si)	GEN_LD (buf, rt, ra, si)
+#define GEN_STORE(buf, rt, ra, si)	GEN_STD (buf, rt, ra, si)
+#else
+#define GEN_LOAD(buf, rt, ra, si)	GEN_LWZ (buf, rt, ra, si)
+#define GEN_STORE(buf, rt, ra, si)	GEN_STW (buf, rt, ra, si)
+#endif
+
+/* Generate a sequence of instructions to load IMM in the register REG.
+   Write the instructions in BUF and return the number of bytes written.  */
+
+static int
+gen_limm (unsigned char *buf, int reg, uint64_t imm)
+{
+  int i = 0;
+
+  if ((imm >> 8) == 0)
+    {
+      /* li	reg, imm[7:0] */
+      i += GEN_LI (buf + i, reg, imm);
+    }
+  else if ((imm >> 16) == 0)
+    {
+      /* li	reg, 0
+	 ori	reg, reg, imm[15:0] */
+      i += GEN_LI (buf + i, reg, 0);
+      i += GEN_ORI (buf + i, reg, reg, imm);
+    }
+  else if ((imm >> 32) == 0)
+    {
+      /* lis	reg, imm[31:16]
+	 ori	reg, reg, imm[15:0]
+	 rldicl	reg, reg, 0, 32 */
+      i += GEN_LIS (buf + i, reg, (imm >> 16) & 0xffff);
+      i += GEN_ORI (buf + i, reg, reg, imm & 0xffff);
+      i += GEN_RLDICL (buf + i, reg, reg, 0, 32);
+    }
+  else
+    {
+      /* lis    reg, <imm[63:48]>
+	 ori    reg, reg, <imm[48:32]>
+	 rldicr reg, reg, 32, 31
+	 oris   reg, reg, <imm[31:16]>
+	 ori    reg, reg, <imm[15:0]> */
+      i += GEN_LIS (buf + i, reg, ((imm >> 48) & 0xffff));
+      i += GEN_ORI (buf + i, reg, reg, ((imm >> 32) & 0xffff));
+      i += GEN_RLDICR (buf + i, reg, reg, 32, 31);
+      i += GEN_ORIS (buf + i, reg, reg, ((imm >> 16) & 0xffff));
+      i += GEN_ORI (buf + i, reg, reg, (imm & 0xffff));
+    }
+
+  return i;
+}
+
+/* Generate a sequence for atomically exchange at location LOCK.
+   This code sequence clobbers r6, r7, r8, r9.  */
+
+static int
+gen_atomic_xchg (unsigned char *buf, CORE_ADDR lock, int old_value, int new_value)
+{
+  int i = 0;
+  const int r_lock = 6;
+  const int r_old = 7;
+  const int r_new = 8;
+  const int r_tmp = 9;
+
+  /*
+  1: lwsync
+  2: lwarx   TMP, 0, LOCK
+     cmpwi   TMP, OLD
+     bne     1b
+     stwcx.  NEW, 0, LOCK
+     bne     2b */
+
+  i += gen_limm (buf + i, r_lock, lock);
+  i += gen_limm (buf + i, r_new, new_value);
+  i += gen_limm (buf + i, r_old, old_value);
+
+  i += put_i32 (buf + i, 0x7c2004ac);	/* lwsync */
+  i += GEN_LWARX (buf + i, r_tmp, 0, r_lock);
+  i += GEN_CMPW (buf + i, r_tmp, r_old);
+  i += GEN_BNE (buf + i, -12);
+  i += GEN_STWCX (buf + i, r_new, 0, r_lock);
+  i += GEN_BNE (buf + i, -16);
+
+  return i;
+}
+
+/* Generate a sequence of instructions for calling a function
+   at address of FN.  Return the number of bytes are written in BUF.
+
+   FIXME: For ppc64be, FN should be the address to the function
+   descriptor, so we should load 8(FN) to R2, 16(FN) to R11
+   and then call the function-entry at 0(FN).  However, current GDB
+   implicitly convert the address from function descriptor to the actual
+   function address. See qSymbol handling in remote.c.  Although it
+   seems we can successfully call however, things go wrong when callee
+   trying to access global variable.  */
+
+static int
+gen_call (unsigned char *buf, CORE_ADDR fn)
+{
+  int i = 0;
+
+  /* Must be called by r12 for caller to calculate TOC address. */
+  i += gen_limm (buf + i, 12, fn);
+  i += GEN_MTSPR (buf + i, 12, 9);		/* mtctr  r12 */
+  i += put_i32 (buf + i, 0x4e800421);		/* bctrl */
+
+  return i;
+}
+
+/* Implement supports_tracepoints hook of target_ops.
+   Always return true.  */
+
+static int
+ppc_supports_tracepoints (void)
+{
+#if defined (__PPC64__) && _CALL_ELF == 2
+  return 1;
+#else
+  return 0;
+#endif
+}
+
+/* Implement install_fast_tracepoint_jump_pad of target_ops.
+   See target.h for details.  */
+
+static int
+ppc_install_fast_tracepoint_jump_pad (CORE_ADDR tpoint, CORE_ADDR tpaddr,
+				      CORE_ADDR collector,
+				      CORE_ADDR lockaddr,
+				      ULONGEST orig_size,
+				      CORE_ADDR *jump_entry,
+				      CORE_ADDR *trampoline,
+				      ULONGEST *trampoline_size,
+				      unsigned char *jjump_pad_insn,
+				      ULONGEST *jjump_pad_insn_size,
+				      CORE_ADDR *adjusted_insn_addr,
+				      CORE_ADDR *adjusted_insn_addr_end,
+				      char *err)
+{
+  unsigned char buf[1028];
+  int i, j, offset;
+  CORE_ADDR buildaddr = *jump_entry;
+#if __PPC64__
+  const int rsz = 8;
+#else
+  const int rsz = 4;
+#endif
+  const int frame_size = (((37 * rsz) + 112) + 0xf) & ~0xf;
+
+  /* Stack frame layout for this jump pad,
+
+     High	CTR   -8(sp)
+		LR   -16(sp)
+		XER
+		CR
+		R31
+		R29
+		...
+		R1
+		R0
+     Low	PC/<tpaddr>
+
+     The code flow of thie jump pad,
+
+     1. Save GPR and SPR
+     3. Adjust SP
+     4. Prepare argument
+     5. Call gdb_collector
+     6. Restore SP
+     7. Restore GPR and SPR
+     8. Build a jump for back to the program
+     9. Copy/relocate original instruction
+    10. Build a jump for replacing orignal instruction.  */
+
+  i = 0;
+  for (j = 0; j < 32; j++)
+    i += GEN_STORE (buf + i, j, 1, (-rsz * 36 + j * rsz));
+
+  /* Save PC<tpaddr>  */
+  i += gen_limm (buf + i, 3, tpaddr);
+  i += GEN_STORE (buf + i, 3, 1, (-rsz * 37));
+
+  /* Save CR, XER, LR, and CTR.  */
+  i += put_i32 (buf + i, 0x7c600026);		/* mfcr   r3 */
+  i += GEN_MFSPR (buf + i, 4, 1);		/* mfxer  r4 */
+  i += GEN_MFSPR (buf + i, 5, 8);		/* mflr   r5 */
+  i += GEN_MFSPR (buf + i, 6, 9);		/* mfctr  r6 */
+  i += GEN_STORE (buf + i, 3, 1, -4 * rsz);	/* std    r3, -32(r1) */
+  i += GEN_STORE (buf + i, 4, 1, -3 * rsz);	/* std    r4, -24(r1) */
+  i += GEN_STORE (buf + i, 5, 1, -2 * rsz);	/* std    r5, -16(r1) */
+  i += GEN_STORE (buf + i, 6, 1, -1 * rsz);	/* std    r6, -8(r1) */
+
+  /* Adjust stack pointer.  */
+  i += GEN_ADDI (buf + i, 1, 1, -frame_size);	/* subi   r1,r1,FRAME_SIZE */
+
+  /* Setup arguments to collector.  */
+
+  /* Set r4 to collected registers.  */
+  i += GEN_ADDI (buf + i, 4, 1, frame_size - rsz * 37);
+  /* Set r3 to TPOINT.  */
+  i += gen_limm (buf + i, 3, tpoint);
+
+  i += gen_atomic_xchg (buf + i, lockaddr, 0, 1);
+  /* Call to collector.  */
+  i += gen_call (buf + i, collector);
+  i += gen_atomic_xchg (buf + i, lockaddr, 1, 0);
+
+  /* Restore stack and registers.  */
+  i += GEN_ADDI (buf + i, 1, 1, frame_size);	/* addi	r1,r1,FRAME_SIZE */
+  i += GEN_LOAD (buf + i, 3, 1, -4 * rsz);	/* ld	r3, -32(r1) */
+  i += GEN_LOAD (buf + i, 4, 1, -3 * rsz);	/* ld	r4, -24(r1) */
+  i += GEN_LOAD (buf + i, 5, 1, -2 * rsz);	/* ld	r5, -16(r1) */
+  i += GEN_LOAD (buf + i, 6, 1, -1 * rsz);	/* ld	r6, -8(r1) */
+  i += put_i32 (buf + i, 0x7c6ff120);		/* mtcr	r3 */
+  i += GEN_MTSPR (buf + i, 4, 1);		/* mtxer  r4 */
+  i += GEN_MTSPR (buf + i, 5, 8);		/* mtlr   r5 */
+  i += GEN_MTSPR (buf + i, 6, 9);		/* mtctr  r6 */
+  for (j = 0; j < 32; j++)
+    i += GEN_LOAD (buf + i, j, 1, (-rsz * 36 + j * rsz));
+
+  /* Flush instructions to inferior memory.  */
+  write_inferior_memory (buildaddr, buf, i);
+
+  /* Now, insert the original instruction to execute in the jump pad.  */
+  *adjusted_insn_addr = buildaddr + i;
+  *adjusted_insn_addr_end = *adjusted_insn_addr;
+  relocate_instruction (adjusted_insn_addr_end, tpaddr);
+
+  /* Verify the relocation size.  If should be 4 for normal copy, or 8
+     for some conditional branch.  */
+  if ((*adjusted_insn_addr_end - *adjusted_insn_addr == 0)
+      || (*adjusted_insn_addr_end - *adjusted_insn_addr > 8))
+    {
+      sprintf (err, "E.Unexpected instruction length = %d"
+		    "when relocate instruction.",
+		    (int) (*adjusted_insn_addr_end - *adjusted_insn_addr));
+      return 1;
+    }
+
+  buildaddr = *adjusted_insn_addr_end;
+  i = 0;
+  /* Finally, write a jump back to the program.  */
+  offset = (tpaddr + 4) - (buildaddr + i);
+  if (offset >= (1 << 26) || offset < -(1 << 26))
+    {
+      sprintf (err, "E.Jump back from jump pad too far from tracepoint "
+		    "(offset 0x%x > 26-bit).", offset);
+      return 1;
+    }
+  /* b <tpaddr+4> */
+  i += GEN_B (buf + i, offset);
+  write_inferior_memory (buildaddr, buf, i);
+
+  /* The jump pad is now built.  Wire in a jump to our jump pad.  This
+     is always done last (by our caller actually), so that we can
+     install fast tracepoints with threads running.  This relies on
+     the agent's atomic write support.  */
+  offset = *jump_entry - tpaddr;
+  if (offset >= (1 << 25) || offset < -(1 << 25))
+    {
+      sprintf (err, "E.Jump back from jump pad too far from tracepoint "
+		    "(offset 0x%x > 26-bit).", offset);
+      return 1;
+    }
+  /* b <jentry> */
+  i += GEN_B (jjump_pad_insn, offset);
+  *jjump_pad_insn_size = 4;
+
+  *jump_entry = buildaddr + i;
+
+  gdb_assert (i < sizeof (buf));
+  return 0;
+}
+
+/* Returns the minimum instruction length for installing a tracepoint.  */
+
+static int
+ppc_get_min_fast_tracepoint_insn_len ()
+{
+  return 4;
+}
+
+#if __PPC64__
+
+static void
+emit_insns (unsigned char *buf, int n)
+{
+  write_inferior_memory (current_insn_ptr, buf, n);
+  current_insn_ptr += n;
+}
+
+#define EMIT_ASM(NAME, INSNS)					\
+  do								\
+    {								\
+      extern unsigned char start_bcax_ ## NAME [];		\
+      extern unsigned char end_bcax_ ## NAME [];		\
+      emit_insns (start_bcax_ ## NAME,				\
+		  end_bcax_ ## NAME - start_bcax_ ## NAME);	\
+      __asm__ (".section .text.__ppcbcax\n\t"			\
+	       "start_bcax_" #NAME ":\n\t"			\
+	       INSNS "\n\t"					\
+	       "end_bcax_" #NAME ":\n\t"			\
+	       ".previous\n\t");				\
+    } while (0)
+
+/*
+
+  Bytecode execution stack frame
+
+	|  Parameter save area    (SP + 48) [8 doublewords]
+	|  TOC save area          (SP + 40)
+	|  link editor doubleword (SP + 32)
+	|  compiler doubleword    (SP + 24)  save TOP here during call
+	|  LR save area           (SP + 16)
+	|  CR save area           (SP + 8)
+ SP' -> +- Back chain             (SP + 0)
+	|  Save r31
+	|  Save r30
+	|  Save r4    for *value
+	|  Save r3    for CTX
+ r30 -> +- Bytecode execution stack
+	|
+	|  64-byte (8 doublewords) at initial.  Expand stack as needed.
+	|
+ r31 -> +-
+
+  initial frame size
+  = (48 + 8 * 8) + (4 * 8) + 64
+  = 112 + 96
+  = 208
+
+   r31 is the frame-base for restoring stack-pointer.
+   r30 is the stack-pointer for bytecode machine.
+       It should point to next-empty, so we can use LDU for pop.
+   r3  is used for cache of TOP value.
+       It is the first argument, pointer to CTX.
+   r4  is the second argument, pointer to the result.
+   SP+24 is used for saving TOP during call.
+
+ Note:
+ * To restore stack at epilogue
+   => sp = r31 + 208
+ * To check stack is big enough for bytecode execution.
+   => r30 - 8 > SP + 112
+ * To return execution result.
+   => 0(r4) = TOP
+
+ */
+
+enum { bc_framesz = 208 };
+
+/* Emit prologue in inferior memory.  See above comments.  */
+
+static void
+ppc64_emit_prologue (void)
+{
+  EMIT_ASM (ppc64_prologue,
+	    "mflr  0		\n"
+	    "std   0, 16(1)	\n"
+	    "std   31, -8(1)	\n"
+	    "std   30, -16(1)	\n"
+	    "std   4, -24(1)	\n"
+	    "std   3, -32(1)	\n"
+	    "addi  30, 1, -40	\n"
+	    "li	   3, 0		\n"
+	    "stdu  1, -208(1)	\n"
+	    "mr	   31, 1	\n");
+}
+
+/* Emit epilogue in inferior memory.  See above comments.  */
+
+static void
+ppc64_emit_epilogue (void)
+{
+  EMIT_ASM (ppc64_epilogue,
+	    /* Restore SP.  */
+	    "addi  1, 31, 208	\n"
+	    /* *result = TOP */
+	    "ld    4, -24(1)	\n"
+	    "std   3, 0(4)	\n"
+	    /* Return 0 for no-erro.  */
+	    "li    3, 0		\n"
+	    "ld    0, 16(1)	\n"
+	    "ld    31, -8(1)	\n"
+	    "ld    30, -16(1)	\n"
+	    "mtlr  0		\n"
+	    "blr		\n");
+}
+
+/* TOP = stack[--sp] + TOP  */
+
+static void
+ppc64_emit_add (void)
+{
+  EMIT_ASM (ppc64_add,
+	    "ldu  4, 8(30)	\n"
+	    "add  3, 4, 3	\n");
+}
+
+/* TOP = stack[--sp] - TOP  */
+
+static void
+ppc64_emit_sub (void)
+{
+  EMIT_ASM (ppc64_sub,
+	    "ldu  4, 8(30)	\n"
+	    "sub  3, 4, 3	\n");
+}
+
+/* TOP = stack[--sp] * TOP  */
+
+static void
+ppc64_emit_mul (void)
+{
+  EMIT_ASM (ppc64_mul,
+	    "ldu    4, 8(30)	\n"
+	    "mulld  3, 4, 3	\n");
+}
+
+/* TOP = stack[--sp] << TOP  */
+
+static void
+ppc64_emit_lsh (void)
+{
+  EMIT_ASM (ppc64_lsh,
+	    "ldu  4, 8(30)	\n"
+	    "sld  3, 4, 3	\n");
+}
+
+/* Top = stack[--sp] >> TOP
+   (Arithmetic shift right)  */
+
+static void
+ppc64_emit_rsh_signed (void)
+{
+  EMIT_ASM (ppc64_rsha,
+	    "ldu   4, 8(30)	\n"
+	    "srad  3, 4, 3	\n");
+}
+
+/* Top = stack[--sp] >> TOP
+   (Logical shift right)  */
+
+static void
+ppc64_emit_rsh_unsigned (void)
+{
+  EMIT_ASM (ppc64_rshl,
+	    "ldu  4, 8(30)	\n"
+	    "srd  3, 4, 3	\n");
+}
+
+/* Emit code for signed-extension specified by ARG.  */
+
+static void
+ppc64_emit_ext (int arg)
+{
+  switch (arg)
+    {
+    case 8:
+      EMIT_ASM (ppc64_ext8, "extsb  3, 3	\n");
+      break;
+    case 16:
+      EMIT_ASM (ppc64_ext16, "extsh  3, 3	\n");
+      break;
+    case 32:
+      EMIT_ASM (ppc64_ext32, "extsw  3, 3	\n");
+      break;
+    default:
+      emit_error = 1;
+    }
+}
+
+/* Emit code for zero-extension specified by ARG.  */
+
+static void
+ppc64_emit_zero_ext (int arg)
+{
+  switch (arg)
+    {
+    case 8:
+      EMIT_ASM (ppc64_zext8, "rldicl 3,3,0,56	\n");
+      break;
+    case 16:
+      EMIT_ASM (ppc64_zext16, "rldicl 3,3,0,48	\n");
+      break;
+    case 32:
+      EMIT_ASM (ppc64_zext32, "rldicl 3,3,0,32	\n");
+      break;
+    default:
+      emit_error = 1;
+    }
+}
+
+/* TOP = !TOP
+   i.e., TOP = (TOP == 0) ? 1 : 0;  */
+
+static void
+ppc64_emit_log_not (void)
+{
+  EMIT_ASM (ppc64_log_not,
+	    "cntlzd  3, 3	\n"
+	    "srdi    3, 3, 6	\n");
+}
+
+/* TOP = stack[--sp] & TOP  */
+
+static void
+ppc64_emit_bit_and (void)
+{
+  EMIT_ASM (ppc64_bit_and,
+	    "ldu  4, 8(30)	\n"
+	    "and  3, 4, 3	\n");
+}
+
+/* TOP = stack[--sp] | TOP  */
+
+static void
+ppc64_emit_bit_or (void)
+{
+  EMIT_ASM (ppc64_bit_or,
+	    "ldu  4, 8(30)	\n"
+	    "or   3, 4, 3	\n");
+}
+
+/* TOP = stack[--sp] ^ TOP  */
+
+static void
+ppc64_emit_bit_xor (void)
+{
+  EMIT_ASM (ppc64_bit_xor,
+	    "ldu  4, 8(30)	\n"
+	    "xor  3, 4, 3	\n");
+}
+
+/* TOP = ~TOP
+   i.e., TOP = ~(TOP | TOP)  */
+
+static void
+ppc64_emit_bit_not (void)
+{
+  EMIT_ASM (ppc64_bit_not,
+	    "nor  3, 3, 3	\n");
+}
+
+/* TOP = stack[--sp] == TOP  */
+
+static void
+ppc64_emit_equal (void)
+{
+  EMIT_ASM (ppc64_equal,
+	    "ldu     4, 8(30)	\n"
+	    "xor     3, 3, 4	\n"
+	    "cntlzd  3, 3	\n"
+	    "srdi    3, 3, 6	\n");
+}
+
+/* TOP = stack[--sp] < TOP
+   (Signed comparison)  */
+
+static void
+ppc64_emit_less_signed (void)
+{
+  EMIT_ASM (ppc64_less_signed,
+	    "ldu     4, 8(30)		\n"
+	    "cmpd    7, 4, 3		\n"
+	    "mfocrf  3, 1		\n"
+	    "rlwinm  3, 3, 29, 31, 31	\n");
+}
+
+/* TOP = stack[--sp] < TOP
+   (Unsigned comparison)  */
+
+static void
+ppc64_emit_less_unsigned (void)
+{
+  EMIT_ASM (ppc64_less_unsigned,
+	    "ldu     4, 8(30)		\n"
+	    "cmpld   7, 4, 3		\n"
+	    "mfocrf  3, 1		\n"
+	    "rlwinm  3, 3, 29, 31, 31	\n");
+}
+
+/* Access the memory address in TOP in size of SIZE.
+   Zero-extend the read value.  */
+
+static void
+ppc64_emit_ref (int size)
+{
+  switch (size)
+    {
+    case 1:
+      EMIT_ASM (ppc64_ref8, "lbz   3, 0(3)	\n");
+      break;
+    case 2:
+      EMIT_ASM (ppc64_ref16, "lhz   3, 0(3)	\n");
+      break;
+    case 4:
+      EMIT_ASM (ppc64_ref32, "lwz   3, 0(3)	\n");
+      break;
+    case 8:
+      EMIT_ASM (ppc64_ref64, "ld    3, 0(3)	\n");
+      break;
+    }
+}
+
+/* TOP = NUM  */
+
+static void
+ppc64_emit_const (LONGEST num)
+{
+  unsigned char buf[5 * 4];
+  int i = 0;
+
+  i += gen_limm (buf + i, 3, num);
+
+  write_inferior_memory (current_insn_ptr, buf, i);
+  current_insn_ptr += i;
+}
+
+/* Set TOP to the value of register REG by calling get_raw_reg function
+   with two argument, collected buffer and register number.  */
+
+static void
+ppc64_emit_reg (int reg)
+{
+  unsigned char buf[8 * 8];
+  int i = 0;
+
+  i += GEN_LD (buf + i, 3, 31, bc_framesz - 32);
+  i += GEN_LD (buf + i, 3, 3, 48);
+  i += GEN_LI (buf + i, 4, reg);	/* mr	r4, reg */
+  i += gen_call (buf + i, get_raw_reg_func_addr ());
+
+  write_inferior_memory (current_insn_ptr, buf, i);
+  current_insn_ptr += i;
+}
+
+/* TOP = stack[--sp] */
+
+static void
+ppc64_emit_pop (void)
+{
+  EMIT_ASM (ppc64_pop, "ldu  3, 8(30) \n");
+}
+
+/* stack[sp++] = TOP
+
+   Because we may use up bytecode stack, expand 8 doublewords more
+   if needed.  */
+
+static void
+ppc64_emit_stack_flush (void)
+{
+  /* Make sure bytecode stack is big enough before push.
+     Otherwise, expand 64-byte more.  */
+
+  EMIT_ASM (ppc64_stack_flush,
+	    "  std   3, 0(30)		\n"
+	    "  addi  4, 30, -(112 + 8)	\n"
+	    "  cmpd  7, 4, 1		\n"
+	    "  bgt   1f			\n"
+	    "  ld    4, 0(1)		\n"
+	    "  addi  1, 1, -64		\n"
+	    "  std   4, 0(1)		\n"
+	    "1:addi  30, 30, -8		\n"
+	   );
+}
+
+/* Swap TOP and stack[sp-1]  */
+
+static void
+ppc64_emit_swap (void)
+{
+  EMIT_ASM (ppc64_swap,
+	    "ld   4, 8(30)	\n"
+	    "std  3, 8(30)	\n"
+	    "mr   3, 4		\n");
+}
+
+/* Discard N elements in the stack.  */
+
+static void
+ppc64_emit_stack_adjust (int n)
+{
+  unsigned char buf[4];
+  int i = 0;
+
+  i += GEN_ADDI (buf, 30, 30, n << 3);	/* addi	r30, r30, (n << 3) */
+
+  write_inferior_memory (current_insn_ptr, buf, i);
+  current_insn_ptr += i;
+  gdb_assert (i <= sizeof (buf));
+}
+
+/* Call function FN.  */
+
+static void
+ppc64_emit_call (CORE_ADDR fn)
+{
+  unsigned char buf[8 * 4];
+  int i = 0;
+
+  i += gen_call (buf + i, fn);
+
+  write_inferior_memory (current_insn_ptr, buf, i);
+  current_insn_ptr += i;
+  gdb_assert (i <= sizeof (buf));
+}
+
+/* FN's prototype is `LONGEST(*fn)(int)'.
+   TOP = fn (arg1)
+  */
+
+static void
+ppc64_emit_int_call_1 (CORE_ADDR fn, int arg1)
+{
+  unsigned char buf[8 * 4];
+  int i = 0;
+
+  /* Setup argument.  arg1 is a 16-bit value.  */
+  i += GEN_LI (buf, 3, arg1);		/* li	r3, arg1 */
+  i += gen_call (buf + i, fn);
+
+  write_inferior_memory (current_insn_ptr, buf, i);
+  current_insn_ptr += i;
+  gdb_assert (i <= sizeof (buf));
+}
+
+/* FN's prototype is `void(*fn)(int,LONGEST)'.
+   fn (arg1, TOP)
+
+   TOP should be preserved/restored before/after the call.  */
+
+static void
+ppc64_emit_void_call_2 (CORE_ADDR fn, int arg1)
+{
+  unsigned char buf[12 * 4];
+  int i = 0;
+
+  /* Save TOP */
+  i += GEN_STD (buf, 3, 31, bc_framesz + 24);
+
+  /* Setup argument.  arg1 is a 16-bit value.  */
+  i += GEN_MR (buf + i, 4, 3);		/* mr	r4, r3 */
+  i += GEN_LI (buf + i, 3, arg1);	/* li	r3, arg1 */
+  i += gen_call (buf + i, fn);
+
+  /* Restore TOP */
+  i += GEN_LD (buf, 3, 31, bc_framesz + 24);
+
+  write_inferior_memory (current_insn_ptr, buf, i);
+  current_insn_ptr += i;
+  gdb_assert (i <= sizeof (buf));
+}
+
+/* Note in the following goto ops:
+
+   When emitting goto, the target address is later relocated by
+   write_goto_address.  OFFSET_P is the offset of the branch instruction
+   in the code sequence, and SIZE_P is how to relocate the instruction,
+   recognized by ppc_write_goto_address.  In current implementation,
+   SIZE can be either 24 or 14 for branch of conditional-branch instruction.
+ */
+
+/* If TOP is true, goto somewhere.  Otherwise, just fall-through.  */
+
+static void
+ppc64_emit_if_goto (int *offset_p, int *size_p)
+{
+  EMIT_ASM (ppc64_if_goto,
+	    "mr     4, 3	\n"
+	    "ldu    3, 8(30)	\n"
+	    "cmpdi  7, 4, 0	\n"
+	    "1:bne  7, 1b	\n");
+
+  if (offset_p)
+    *offset_p = 12;
+  if (size_p)
+    *size_p = 14;
+}
+
+/* Unconditional goto.  */
+
+static void
+ppc64_emit_goto (int *offset_p, int *size_p)
+{
+  EMIT_ASM (ppc64_goto,
+	    "1:b	1b	\n");
+
+  if (offset_p)
+    *offset_p = 0;
+  if (size_p)
+    *size_p = 24;
+}
+
+/* Goto if stack[--sp] == TOP  */
+
+static void
+ppc64_emit_eq_goto (int *offset_p, int *size_p)
+{
+  EMIT_ASM (ppc64_eq_goto,
+	    "ldu     4, 8(30)	\n"
+	    "cmpd    7, 4, 3	\n"
+	    "ldu     3, 8(30)	\n"
+	    "1:beq   7, 1b	\n");
+
+  if (offset_p)
+    *offset_p = 12;
+  if (size_p)
+    *size_p = 14;
+}
+
+/* Goto if stack[--sp] != TOP  */
+
+static void
+ppc64_emit_ne_goto (int *offset_p, int *size_p)
+{
+  EMIT_ASM (ppc64_ne_goto,
+	    "ldu     4, 8(30)	\n"
+	    "cmpd    7, 4, 3	\n"
+	    "ldu     3, 8(30)	\n"
+	    "1:bne   7, 1b	\n");
+
+  if (offset_p)
+    *offset_p = 12;
+  if (size_p)
+    *size_p = 14;
+}
+
+/* Goto if stack[--sp] < TOP  */
+
+static void
+ppc64_emit_lt_goto (int *offset_p, int *size_p)
+{
+  EMIT_ASM (ppc64_lt_goto,
+	    "ldu     4, 8(30)	\n"
+	    "cmpd    7, 4, 3	\n"
+	    "ldu     3, 8(30)	\n"
+	    "1:blt   7, 1b	\n");
+
+  if (offset_p)
+    *offset_p = 12;
+  if (size_p)
+    *size_p = 14;
+}
+
+/* Goto if stack[--sp] <= TOP  */
+
+static void
+ppc64_emit_le_goto (int *offset_p, int *size_p)
+{
+  EMIT_ASM (ppc64_le_goto,
+	    "ldu     4, 8(30)	\n"
+	    "cmpd    7, 4, 3	\n"
+	    "ldu     3, 8(30)	\n"
+	    "1:ble   7, 1b	\n");
+
+  if (offset_p)
+    *offset_p = 12;
+  if (size_p)
+    *size_p = 14;
+}
+
+/* Goto if stack[--sp] > TOP  */
+
+static void
+ppc64_emit_gt_goto (int *offset_p, int *size_p)
+{
+  EMIT_ASM (ppc64_gt_goto,
+	    "ldu     4, 8(30)	\n"
+	    "cmpd    7, 4, 3	\n"
+	    "ldu     3, 8(30)	\n"
+	    "1:bgt   7, 1b	\n");
+
+  if (offset_p)
+    *offset_p = 12;
+  if (size_p)
+    *size_p = 14;
+}
+
+/* Goto if stack[--sp] >= TOP  */
+
+static void
+ppc64_emit_ge_goto (int *offset_p, int *size_p)
+{
+  EMIT_ASM (ppc64_ge_goto,
+	    "ldu     4, 8(30)	\n"
+	    "cmpd    7, 4, 3	\n"
+	    "ldu     3, 8(30)	\n"
+	    "1:bge   7, 1b	\n");
+
+  if (offset_p)
+    *offset_p = 12;
+  if (size_p)
+    *size_p = 14;
+}
+
+/* Relocate previous emitted branch instruction.  FROM is the address
+   of the branch instruction, TO is the goto target address, and SIZE
+   if the value we set by *SIZE_P before.  Currently, it is either
+   24 or 14 of branch and conditional-branch instruction.  */
+
+static void
+ppc_write_goto_address (CORE_ADDR from, CORE_ADDR to, int size)
+{
+  int rel = to - from;
+  uint32_t insn;
+  int opcd;
+  unsigned char buf[4];
+
+  read_inferior_memory (from, buf, 4);
+  insn = get_i32 (buf);
+  opcd = (insn >> 26) & 0x3f;
+
+  switch (size)
+    {
+    case 14:
+      if (opcd != 16)
+	emit_error = 1;
+      insn = (insn & ~0xfffc) | (rel & 0xfffc);
+      break;
+    case 24:
+      if (opcd != 18)
+	emit_error = 1;
+      insn = (insn & ~0x3fffffc) | (rel & 0x3fffffc);
+      break;
+    default:
+      emit_error = 1;
+    }
+
+  put_i32 (buf, insn);
+  write_inferior_memory (from, buf, 4);
+}
+
+/* Vector of emit ops for PowerPC64.  */
+
+static struct emit_ops ppc64_emit_ops_vector =
+{
+  ppc64_emit_prologue,
+  ppc64_emit_epilogue,
+  ppc64_emit_add,
+  ppc64_emit_sub,
+  ppc64_emit_mul,
+  ppc64_emit_lsh,
+  ppc64_emit_rsh_signed,
+  ppc64_emit_rsh_unsigned,
+  ppc64_emit_ext,
+  ppc64_emit_log_not,
+  ppc64_emit_bit_and,
+  ppc64_emit_bit_or,
+  ppc64_emit_bit_xor,
+  ppc64_emit_bit_not,
+  ppc64_emit_equal,
+  ppc64_emit_less_signed,
+  ppc64_emit_less_unsigned,
+  ppc64_emit_ref,
+  ppc64_emit_if_goto,
+  ppc64_emit_goto,
+  ppc_write_goto_address,
+  ppc64_emit_const,
+  ppc64_emit_call,
+  ppc64_emit_reg,
+  ppc64_emit_pop,
+  ppc64_emit_stack_flush,
+  ppc64_emit_zero_ext,
+  ppc64_emit_swap,
+  ppc64_emit_stack_adjust,
+  ppc64_emit_int_call_1,
+  ppc64_emit_void_call_2,
+  ppc64_emit_eq_goto,
+  ppc64_emit_ne_goto,
+  ppc64_emit_lt_goto,
+  ppc64_emit_le_goto,
+  ppc64_emit_gt_goto,
+  ppc64_emit_ge_goto
+};
+
+/*  Implementation of emit_ops target ops.   */
+
+__attribute__ ((unused))
+static struct emit_ops *
+ppc_emit_ops (void)
+{
+  return &ppc64_emit_ops_vector;
+}
+#endif
+
+/* Returns true for supporting range-stepping.  */
+
+static int
+ppc_supports_range_stepping (void)
+{
+  return 1;
+}
+
  /* Provide only a fill function for the general register set.  ps_lgetregs
     will use this for NPTL support.  */

@@ -687,16 +1949,31 @@ struct linux_target_ops the_low_target = {
    ppc_set_pc,
    (const unsigned char *) &ppc_breakpoint,
    ppc_breakpoint_len,
-  NULL,
-  0,
+  NULL, /* breakpoint_reinsert_addr */
+  0, /* decr_pc_after_break */
    ppc_breakpoint_at,
-  NULL, /* supports_z_point_type */
-  NULL,
-  NULL,
-  NULL,
-  NULL,
+  ppc_supports_z_point_type, /* supports_z_point_type */
+  ppc_insert_point,
+  ppc_remove_point,
+  NULL, /* stopped_by_watchpoint */
+  NULL, /* stopped_data_address */
    ppc_collect_ptrace_register,
    ppc_supply_ptrace_register,
+  NULL, /* siginfo_fixup */
+  NULL, /* linux_new_process */
+  NULL, /* linux_new_thread */
+  NULL, /* linux_prepare_to_resume */
+  NULL, /* linux_process_qsupported */
+  ppc_supports_tracepoints,
+  NULL, /* get_thread_area */
+  ppc_install_fast_tracepoint_jump_pad,
+#if __PPC64__
+  ppc_emit_ops,
+#else
+  NULL, /* Use interpreter for ppc32.  */
+#endif
+  ppc_get_min_fast_tracepoint_insn_len,
+  ppc_supports_range_stepping,
  };

  void
diff --git a/gdb/rs6000-tdep.c b/gdb/rs6000-tdep.c
index ef94bba..4ca523a 100644
--- a/gdb/rs6000-tdep.c
+++ b/gdb/rs6000-tdep.c
@@ -83,6 +83,9 @@
  #include "features/rs6000/powerpc-e500.c"
  #include "features/rs6000/rs6000.c"

+#include "ax.h"
+#include "ax-gdb.h"
+
  /* Determine if regnum is an SPE pseudo-register.  */
  #define IS_SPE_PSEUDOREG(tdep, regnum) ((tdep)->ppc_ev0_regnum >= 0 \
      && (regnum) >= (tdep)->ppc_ev0_regnum \
@@ -966,6 +969,21 @@ rs6000_breakpoint_from_pc (struct gdbarch *gdbarch, CORE_ADDR *bp_addr,
      return little_breakpoint;
  }

+/* Return true if ADDR is a valid address for tracepoint.  Set *ISZIE
+   to the number of bytes the target should copy elsewhere for the
+   tracepoint.  */
+
+static int
+ppc_fast_tracepoint_valid_at (struct gdbarch *gdbarch,
+			      CORE_ADDR addr, int *isize, char **msg)
+{
+  if (isize)
+    *isize = gdbarch_max_insn_length (gdbarch);
+  if (msg)
+    *msg = NULL;
+  return 1;
+}
+
  /* Instruction masks for displaced stepping.  */
  #define BRANCH_MASK 0xfc000000
  #define BP_MASK 0xFC0007FE
@@ -3139,6 +3157,7 @@ struct rs6000_frame_cache
  static struct rs6000_frame_cache *
  rs6000_frame_cache (struct frame_info *this_frame, void **this_cache)
  {
+  volatile struct gdb_exception ex;
    struct rs6000_frame_cache *cache;
    struct gdbarch *gdbarch = get_frame_arch (this_frame);
    struct gdbarch_tdep *tdep = gdbarch_tdep (gdbarch);
@@ -3153,7 +3172,14 @@ rs6000_frame_cache (struct frame_info *this_frame, void **this_cache)
    (*this_cache) = cache;
    cache->saved_regs = trad_frame_alloc_saved_regs (this_frame);

-  func = get_frame_func (this_frame);
+  TRY_CATCH (ex, RETURN_MASK_ERROR)
+    {
+      func = get_frame_func (this_frame);
+    }
+  if (ex.reason < 0 && ex.error != NOT_AVAILABLE_ERROR)
+    throw_exception (ex);
+  return (*this_cache);
+
    pc = get_frame_pc (this_frame);
    skip_prologue (gdbarch, func, pc, &fdata);

@@ -3323,6 +3349,11 @@ rs6000_frame_this_id (struct frame_info *this_frame, void **this_cache,
  {
    struct rs6000_frame_cache *info = rs6000_frame_cache (this_frame,
  							this_cache);
+  if (info->base == 0 && info->initial_sp == 0)
+    {
+      (*this_id) = frame_id_build_unavailable_stack (0);
+      return;
+    }
    /* This marks the outermost frame.  */
    if (info->base == 0)
      return;
@@ -3679,6 +3710,8 @@ bfd_uses_spe_extensions (bfd *abfd)
  #define PPC_LK(insn)	PPC_BIT (insn, 31)
  #define PPC_TX(insn)	PPC_BIT (insn, 31)
  #define PPC_LEV(insn)	PPC_FIELD (insn, 20, 7)
+#define PPC_LI(insn)	(PPC_SEXT (PPC_FIELD (insn, 6, 24), 24) << 2)
+#define PPC_BD(insn)	(PPC_SEXT (PPC_FIELD (insn, 16, 14), 14) << 2)

  #define PPC_XT(insn)	((PPC_TX (insn) << 5) | PPC_T (insn))
  #define PPC_XER_NB(xer)	(xer & 0x7f)
@@ -5332,6 +5365,146 @@ UNKNOWN_OP:
    return 0;
  }

+/* Copy the instruction from OLDLOC to *TO, and update *TO to *TO + size
+   of instruction.  This function is used to adjust pc-relative instructions
+   when copying.  */
+
+static void
+ppc_relocate_instruction (struct gdbarch *gdbarch,
+			  CORE_ADDR *to, CORE_ADDR oldloc)
+{
+  struct gdbarch_tdep *tdep = gdbarch_tdep (gdbarch);
+  enum bfd_endian byte_order = gdbarch_byte_order (gdbarch);
+  uint32_t insn;
+  int op6, rel, newrel;
+
+  insn = read_memory_unsigned_integer (oldloc, 4, byte_order);
+  op6 = PPC_OP6 (insn);
+
+  if (op6 == 18 && (insn & 2) == 0)
+    {
+      /* branch && AA = 0 */
+      rel = PPC_LI (insn);
+      newrel = (oldloc - *to) + rel;
+
+      /* Out of range. Cannot relocate instruction.  */
+      if (newrel >= (1 << 25) || newrel < -(1 << 25))
+	return;
+
+      insn = (insn & ~0x3fffffc) | (newrel & 0x3fffffc);
+    }
+  else if (op6 == 16 && (insn & 2) == 0)
+    {
+      /* conditional branch && AA = 0 */
+
+      rel = PPC_BD (insn);
+      newrel = (oldloc - *to) + rel;
+
+      if (newrel >= (1 << 25) || newrel < -(1 << 25))
+	return;
+
+      newrel -= 4;
+      if (newrel >= (1 << 15) || newrel < -(1 << 15))
+	{
+	   /* The offset of to big for conditional-branch (16-bit).
+	      Try to invert the condition and jump with 26-bit branch.
+	      For example,
+
+		beq  .Lgoto
+		INSN1
+
+	      =>
+
+		bne  1f
+		b    .Lgoto
+	      1:INSN1
+
+	    */
+
+	   /* Check whether BO is 001at or 011 at.  */
+	   if ((PPC_BO (insn) & 0x14) != 0x4)
+	     return;
+
+	   /* Invert condition.  */
+	   insn ^= (1 << 24);
+	   /* Jump over the unconditional branch.  */
+	   insn = (insn & ~0xfffc) | 0x8;
+	   write_memory_unsigned_integer (*to, 4, byte_order, insn);
+	   *to += 4;
+
+	   /* Copy LK bit.  */
+	   insn = (18 << 26) | (0x3fffffc & newrel) | (insn & 0x3);
+	   write_memory_unsigned_integer (*to, 4, byte_order, insn);
+	   *to += 4;
+
+	   return;
+	}
+      else
+	insn = (insn & ~0xfffc) | (newrel & 0xfffc);
+    }
+
+  write_memory_unsigned_integer (*to, 4, byte_order, insn);
+  *to += 4;
+}
+
+/* Implement gdbarch_gen_return_address.  Generate a bytecode expression
+   to get the value of the saved PC.  SCOPE is the address we want to
+   get return address for.  SCOPE maybe in the middle of a function.  */
+
+static void
+ppc_gen_return_address (struct gdbarch *gdbarch,
+			struct agent_expr *ax, struct axs_value *value,
+			CORE_ADDR scope)
+{
+  struct rs6000_framedata frame;
+  CORE_ADDR func_addr;
+
+  /* Try to find the start of the function and analyze the prologue.  */
+  if (find_pc_partial_function (scope, NULL, &func_addr, NULL))
+    {
+      skip_prologue (gdbarch, func_addr, scope, &frame);
+
+      if (frame.lr_offset == 0)
+	{
+	  value->type = register_type (gdbarch, PPC_LR_REGNUM);
+	  value->kind = axs_lvalue_register;
+	  value->u.reg = PPC_LR_REGNUM;
+	  return;
+	}
+    }
+  else
+    {
+      /* If we don't where the function starts, we cannot analyze it.
+	 Assuming it's not a leaf function, not frameless, and LR is
+	 saved at back-chain + 16.  */
+
+      frame.frameless = 0;
+      frame.lr_offset = 16;
+    }
+
+  /* if (frameless)
+       load 16(SP)
+     else
+       BC = 0(SP)
+       load 16(BC) */
+
+  ax_reg (ax, gdbarch_sp_regnum (gdbarch));
+
+  /* Load back-chain.  */
+  if (!frame.frameless)
+    {
+      if (register_size (gdbarch, PPC_LR_REGNUM) == 8)
+	ax_simple (ax, aop_ref64);
+      else
+	ax_simple (ax, aop_ref32);
+    }
+
+  ax_const_l (ax, frame.lr_offset);
+  ax_simple (ax, aop_add);
+  value->type = register_type (gdbarch, PPC_LR_REGNUM);
+  value->kind = axs_lvalue_memory;
+}
+
  /* Initialize the current architecture based on INFO.  If possible, re-use an
     architecture from ARCHES, which is a list of architectures already created
     during this debugging session.
@@ -5892,6 +6065,7 @@ rs6000_gdbarch_init (struct gdbarch_info info, struct gdbarch_list *arches)

    set_gdbarch_inner_than (gdbarch, core_addr_lessthan);
    set_gdbarch_breakpoint_from_pc (gdbarch, rs6000_breakpoint_from_pc);
+  set_gdbarch_fast_tracepoint_valid_at (gdbarch, ppc_fast_tracepoint_valid_at);

    /* The value of symbols of type N_SO and N_FUN maybe null when
       it shouldn't be.  */
@@ -5929,6 +6103,9 @@ rs6000_gdbarch_init (struct gdbarch_info info, struct gdbarch_list *arches)
    set_gdbarch_displaced_step_location (gdbarch,
  				       displaced_step_at_entry_point);

+  set_gdbarch_relocate_instruction (gdbarch, ppc_relocate_instruction);
+  set_gdbarch_gen_return_address (gdbarch, ppc_gen_return_address);
+
    set_gdbarch_max_insn_length (gdbarch, PPC_INSN_SIZE);

    /* Hook in ABI-specific overrides, if they have been registered.  */
diff --git a/gdb/testsuite/gdb.trace/backtrace.exp b/gdb/testsuite/gdb.trace/backtrace.exp
index 045778e..3094074 100644
--- a/gdb/testsuite/gdb.trace/backtrace.exp
+++ b/gdb/testsuite/gdb.trace/backtrace.exp
@@ -146,6 +146,9 @@ if [is_amd64_regs_target] {
  } elseif [is_x86_like_target] {
      set fpreg "\$ebp"
      set spreg "\$esp"
+} elseif [istarget "powerpc*-*-*"] {
+    set fpreg "\$r31"
+    set spreg "\$r1"
  } else {
      set fpreg "\$fp"
      set spreg "\$sp"
diff --git a/gdb/testsuite/gdb.trace/change-loc.h b/gdb/testsuite/gdb.trace/change-loc.h
index e8e2e86..8efe12d 100644
--- a/gdb/testsuite/gdb.trace/change-loc.h
+++ b/gdb/testsuite/gdb.trace/change-loc.h
@@ -36,6 +36,8 @@ func4 (void)
         SYMBOL(set_tracepoint) ":\n"
  #if (defined __x86_64__ || defined __i386__)
         "    call " SYMBOL(func5) "\n"
+#elif (defined __PPC64__ || defined __PPC__)
+       "    nop\n"
  #endif
         );

diff --git a/gdb/testsuite/gdb.trace/collection.exp b/gdb/testsuite/gdb.trace/collection.exp
index bd42cfa..ed562c9 100644
--- a/gdb/testsuite/gdb.trace/collection.exp
+++ b/gdb/testsuite/gdb.trace/collection.exp
@@ -44,6 +44,10 @@ if [is_amd64_regs_target] {
      set fpreg "ebp"
      set spreg "esp"
      set pcreg "eip"
+} elseif [istarget "powerpc*-*-*"] {
+    set fpreg "r31"
+    set spreg "r1"
+    set pcreg "pc"
  } else {
      set fpreg "fp"
      set spreg "sp"
diff --git a/gdb/testsuite/gdb.trace/entry-values.exp b/gdb/testsuite/gdb.trace/entry-values.exp
index 0cf5615..f9928f1 100644
--- a/gdb/testsuite/gdb.trace/entry-values.exp
+++ b/gdb/testsuite/gdb.trace/entry-values.exp
@@ -218,6 +218,8 @@ if [is_amd64_regs_target] {
      set spreg "\$rsp"
  } elseif [is_x86_like_target] {
      set spreg "\$esp"
+} elseif [istarget "powerpc*-*-*"] {
+    set spreg "\$r1"
  } else {
      set spreg "\$sp"
  }
diff --git a/gdb/testsuite/gdb.trace/ftrace.c b/gdb/testsuite/gdb.trace/ftrace.c
index f522e6f..e509c7b 100644
--- a/gdb/testsuite/gdb.trace/ftrace.c
+++ b/gdb/testsuite/gdb.trace/ftrace.c
@@ -42,6 +42,8 @@ marker (int anarg)
         SYMBOL(set_point) ":\n"
  #if (defined __x86_64__ || defined __i386__)
         "    call " SYMBOL(func) "\n"
+#elif (defined __PPC64__ || defined __PPC__)
+       "    nop\n"
  #endif
         );

@@ -53,6 +55,8 @@ marker (int anarg)
         SYMBOL(four_byter) ":\n"
  #if (defined __i386__)
         "    cmpl $0x1,0x8(%ebp) \n"
+#elif (defined __PPC64__ || defined __PPC__)
+       "    nop\n"
  #endif
         );
  }
diff --git a/gdb/testsuite/gdb.trace/ftrace.exp b/gdb/testsuite/gdb.trace/ftrace.exp
index f2d8002..28b8588 100644
--- a/gdb/testsuite/gdb.trace/ftrace.exp
+++ b/gdb/testsuite/gdb.trace/ftrace.exp
@@ -84,7 +84,8 @@ proc test_fast_tracepoints {} {

      gdb_test "print gdb_agent_gdb_trampoline_buffer_error" ".*" ""

-    if { [istarget "x86_64-*-*"] || [istarget "i\[34567\]86-*-*"] } {
+    if { [istarget "x86_64-*-*"] || [istarget "i\[34567\]86-*-*"] \
+	 || [istarget "powerpc*-*-*"] } {

  	gdb_test "ftrace set_point" "Fast tracepoint .*" \
  	    "fast tracepoint at a long insn"
diff --git a/gdb/testsuite/gdb.trace/mi-trace-frame-collected.exp b/gdb/testsuite/gdb.trace/mi-trace-frame-collected.exp
index 51ed479..1df4d65 100644
--- a/gdb/testsuite/gdb.trace/mi-trace-frame-collected.exp
+++ b/gdb/testsuite/gdb.trace/mi-trace-frame-collected.exp
@@ -56,6 +56,8 @@ if [is_amd64_regs_target] {
      set pcreg "rip"
  } elseif [is_x86_like_target] {
      set pcreg "eip"
+} elseif [istarget "powerpc*-*-*"] {
+    set pcreg "pc"
  } else {
      # Other ports that support tracepoints should set the name of pc
      # register here.
diff --git a/gdb/testsuite/gdb.trace/mi-trace-unavailable.exp b/gdb/testsuite/gdb.trace/mi-trace-unavailable.exp
index 6b97d9d..1e6e541 100644
--- a/gdb/testsuite/gdb.trace/mi-trace-unavailable.exp
+++ b/gdb/testsuite/gdb.trace/mi-trace-unavailable.exp
@@ -135,6 +135,8 @@ proc test_trace_unavailable { data_source } {
  	    set pcnum 16
  	} elseif [is_x86_like_target] {
  	    set pcnum 8
+	} elseif [istarget "powerpc*-*-*"] {
+	    set pcnum 64
  	} else {
  	    # Other ports support tracepoint should define the number
  	    # of its own pc register.
diff --git a/gdb/testsuite/gdb.trace/pending.exp b/gdb/testsuite/gdb.trace/pending.exp
index 0399807..ed36cac 100644
--- a/gdb/testsuite/gdb.trace/pending.exp
+++ b/gdb/testsuite/gdb.trace/pending.exp
@@ -441,6 +441,8 @@ proc pending_tracepoint_with_action_resolved { trace_type } \
  	set pcreg "rip"
      } elseif [is_x86_like_target] {
  	set pcreg "eip"
+    } elseif [istarget "powerpc*-*-*"] {
+	set pcreg "pc"
      }

      gdb_trace_setactions "set action for pending tracepoint" "" \
diff --git a/gdb/testsuite/gdb.trace/pendshr1.c b/gdb/testsuite/gdb.trace/pendshr1.c
index d3b5463..2fd0fba 100644
--- a/gdb/testsuite/gdb.trace/pendshr1.c
+++ b/gdb/testsuite/gdb.trace/pendshr1.c
@@ -38,6 +38,8 @@ pendfunc (int x)
         SYMBOL(set_point1) ":\n"
  #if (defined __x86_64__ || defined __i386__)
         "    call " SYMBOL(pendfunc1) "\n"
+#elif (defined __PPC64__ || defined __PPC__)
+       "    nop\n"
  #endif
         );
  }
diff --git a/gdb/testsuite/gdb.trace/pendshr2.c b/gdb/testsuite/gdb.trace/pendshr2.c
index b8a51a5..3f40c76 100644
--- a/gdb/testsuite/gdb.trace/pendshr2.c
+++ b/gdb/testsuite/gdb.trace/pendshr2.c
@@ -35,6 +35,8 @@ pendfunc2 (int x)
         SYMBOL(set_point2) ":\n"
  #if (defined __x86_64__ || defined __i386__)
         "    call " SYMBOL(foo) "\n"
+#elif (defined __PPC64__ || defined __PPC__)
+       "    nop\n"
  #endif
         );
  }
diff --git a/gdb/testsuite/gdb.trace/range-stepping.c b/gdb/testsuite/gdb.trace/range-stepping.c
index 113f0e2..606db25 100644
--- a/gdb/testsuite/gdb.trace/range-stepping.c
+++ b/gdb/testsuite/gdb.trace/range-stepping.c
@@ -26,6 +26,8 @@
     tracepoint jump.  */
  #if (defined __x86_64__ || defined __i386__)
  #  define NOP "   .byte 0xe9,0x00,0x00,0x00,0x00\n" /* jmp $+5 (5-byte nop) */
+#elif (defined __PPC64__ || defined __PPC__)
+#  define NOP "    nop\n"
  #else
  #  define NOP "" /* port me */
  #endif
diff --git a/gdb/testsuite/gdb.trace/report.exp b/gdb/testsuite/gdb.trace/report.exp
index 2fa676b..e0160f7 100644
--- a/gdb/testsuite/gdb.trace/report.exp
+++ b/gdb/testsuite/gdb.trace/report.exp
@@ -158,6 +158,10 @@ if [is_amd64_regs_target] {
      set fpreg "ebp"
      set spreg "esp"
      set pcreg "eip"
+} elseif [istarget "powerpc*-*-*"] {
+    set fpreg "r31"
+    set spreg "r1"
+    set pcreg "pc"
  } else {
      set fpreg "fp"
      set spreg "sp"
diff --git a/gdb/testsuite/gdb.trace/trace-break.c b/gdb/testsuite/gdb.trace/trace-break.c
index f381ec6..ced0e92 100644
--- a/gdb/testsuite/gdb.trace/trace-break.c
+++ b/gdb/testsuite/gdb.trace/trace-break.c
@@ -41,6 +41,8 @@ marker (void)
         SYMBOL(set_point) ":\n"
  #if (defined __x86_64__ || defined __i386__)
         "    call " SYMBOL(func) "\n"
+#elif (defined __PPC64__ || defined __PPC__)
+       "    nop\n"
  #endif
         );

@@ -48,6 +50,8 @@ marker (void)
         SYMBOL(after_set_point) ":\n"
  #if (defined __x86_64__ || defined __i386__)
         "    call " SYMBOL(func) "\n"
+#elif (defined __PPC64__ || defined __PPC__)
+       "    nop\n"
  #endif
         );
  }
diff --git a/gdb/testsuite/gdb.trace/trace-break.exp b/gdb/testsuite/gdb.trace/trace-break.exp
index 4283ca6..9d6551a 100644
--- a/gdb/testsuite/gdb.trace/trace-break.exp
+++ b/gdb/testsuite/gdb.trace/trace-break.exp
@@ -49,6 +49,10 @@ if [is_amd64_regs_target] {
      set fpreg "ebp"
      set spreg "esp"
      set pcreg "eip"
+} elseif [istarget "powerpc*-*-*"] {
+    set fpreg "r31"
+    set spreg "r1"
+    set pcreg "pc"
  }

  # Set breakpoint and tracepoint at the same address.
diff --git a/gdb/testsuite/gdb.trace/trace-mt.c b/gdb/testsuite/gdb.trace/trace-mt.c
index 38aeff5..855de54 100644
--- a/gdb/testsuite/gdb.trace/trace-mt.c
+++ b/gdb/testsuite/gdb.trace/trace-mt.c
@@ -37,6 +37,8 @@ thread_function(void *arg)
         SYMBOL(set_point1) ":\n"
  #if (defined __x86_64__ || defined __i386__)
         "    call " SYMBOL(func) "\n"
+#elif (defined __PPC64__ || defined __PPC__)
+       "    nop\n"
  #endif
         );
  }
diff --git a/gdb/testsuite/gdb.trace/while-dyn.exp b/gdb/testsuite/gdb.trace/while-dyn.exp
index 198421e..ef92b2d 100644
--- a/gdb/testsuite/gdb.trace/while-dyn.exp
+++ b/gdb/testsuite/gdb.trace/while-dyn.exp
@@ -47,6 +47,8 @@ if [is_amd64_regs_target] {
      set fpreg "\$rbp"
  } elseif [is_x86_like_target] {
      set fpreg "\$ebp"
+} elseif [istarget "powerpc*-*-*"] {
+    set fpreg "\$r31"
  } else {
      set fpreg "\$fp"
  }
-- 
1.9.1

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/3 v2] Fast tracepoint for powerpc64le
  2015-02-20 18:04 [PATCH 1/2] Fast tracepoint for powerpc64le Wei-cheng Wang
@ 2015-02-25 15:20 ` Wei-cheng Wang
  2015-03-17 13:34   ` Ulrich Weigand
  2015-02-27 19:53 ` [PATCH 1/2] " Ulrich Weigand
  2015-03-04 17:22 ` Pedro Alves
  2 siblings, 1 reply; 15+ messages in thread
From: Wei-cheng Wang @ 2015-02-25 15:20 UTC (permalink / raw)
  To: uweigand, gdb-patches

Hi Ulrich,

I just found my mail client the it to the wrong address.
Here are some detailed explanation in my previous mail,
in cases you've not read yet.
https://sourceware.org/ml/gdb-patches/2015-02/msg00604.html
https://sourceware.org/ml/gdb-patches/2015-02/msg00605.html

In this path,
1. Changes for "unavailable-stack frames" are removed.
    I will send a separate patch for this.
2. Add testcases for bytecode compilation in ftrace.exp
    It is used to testing various emit_OP functions.
3. Some minor bug fixes.

Thanks,
Wei-cheng

--

gdb/ChangeLog

2015-02-25  Wei-cheng Wang  <cole945@gmail.com>

	* rs6000-tdep.c (ppc_fast_tracepoint_valid_at,
	ppc_relocate_instruction, ppc_gen_return_address): New functions.
	(rs6000_gdbarch_init): Hook ppc_fast_tracepoint_valid_at,
	ppc_relocate_instruction, and ppc_gen_return_address.

gdb/gdbserver/ChangeLog

2015-02-25  Wei-cheng Wang  <cole945@gmail.com>

	* Makefile.in (linux-ppc-ipa.o, powerpc-64l-ipa.o,
	powerpc-32l-ipa.o): New rules.
	* configure.srv (powerpc*-*-linux*): Add powerpc-64l-ipa.o,
	powerpc-32l-ipa.o, and linux-ppc-ipa.o in ipa_obj
	* linux-ppc-ipa.c: New file.
	* linux-ppc-low.c (ppc_supports_z_point_type, ppc_insert_point,
	ppc_remove_point, put_i32, get_i32, gen_ds_form, gen_d_form,
	gen_xfx_form, gen_x_form, gen_md_form, gen_i_form, gen_b_form,
	gen_limm, gen_atomic_xchg, gen_call, ppc_supports_tracepoints,
	ppc_install_fast_tracepoint_jump_pad,
	ppc_get_min_fast_tracepoint_insn_len, emit_insns, ppc64_emit_prologue,
	ppc64_emit_epilogue, ppc64_emit_add, ppc64_emit_sub, ppc64_emit_mul,
	ppc64_emit_lsh, ppc64_emit_rsh_signed, ppc64_emit_rsh_unsigned,
	ppc64_emit_ext, ppc64_emit_zero_ext, ppc64_emit_log_not,
	ppc64_emit_bit_and, ppc64_emit_bit_or, ppc64_emit_bit_xor,
	ppc64_emit_bit_not, ppc64_emit_equal, ppc64_emit_less_signed,
	ppc64_emit_less_unsigned, ppc64_emit_ref, ppc64_emit_const,
	ppc64_emit_reg, ppc64_emit_pop, ppc64_emit_stack_flush,
	ppc64_emit_swap, ppc64_emit_stack_adjust, ppc64_emit_call,
	ppc64_emit_int_call_1, ppc64_emit_void_call_2, ppc64_emit_if_goto,
	ppc64_emit_goto, ppc64_emit_eq_goto, ppc64_emit_ne_goto,
	ppc64_emit_lt_goto, ppc64_emit_le_goto, ppc64_emit_gt_goto,
	ppc64_emit_ge_goto, ppc_write_goto_address, ppc_emit_ops,
	ppc_supports_range_stepping, ppc_fast_tracepoint_valid_at,
	ppc_relocate_instruction): New functions.
	(ppc64_emit_ops_vector): New struct for bytecode compilation.
	(the_low_target): Add target ops - ppc_supports_z_point_type,
	ppc_insert_point, ppc_remove_point, ppc_supports_tracepoints,
	ppc_install_fast_tracepoint_jump_pad, ppc_emit_ops,
	ppc_get_min_fast_tracepoint_insn_len, ppc_supports_range_stepping.

gdb/testsuite/ChangeLog

	* gdb.trace/backtrace.exp: Set registers for powerpc*-*-*.
	* gdb.trace/collection.exp: Ditto.
	* gdb.trace/entry-values.exp: Ditto.
	* gdb.trace/mi-trace-frame-collected.exp: Ditto.
	* gdb.trace/mi-trace-unavailable.exp: Ditto.
	* gdb.trace/pending.exp: Ditto.
	* gdb.trace/report.exp: Ditto.
	* gdb.trace/trace-break.exp: Ditto.
	* gdb.trace/while-dyn.exp: Ditto.
	* gdb.trace/change-loc.h: set_point for powerpc.
	* gdb.trace/ftrace.c: Ditto
	* gdb.trace/pendshr1.c: Ditto.
	* gdb.trace/pendshr2.c: Ditto.
	* gdb.trace/range-stepping.c: Ditto.
	* gdb.trace/trace-break.c: Ditto.
	* gdb.trace/trace-mt.c: Ditto.
	* gdb.trace/ftrace.exp: Enable testing for powerpc*-*-*.
	(test_ftrace_condition) New function for testing bytecode compilation.
---
  gdb/gdbserver/Makefile.in                          |    9 +
  gdb/gdbserver/configure.srv                        |    1 +
  gdb/gdbserver/linux-ppc-ipa.c                      |  120 ++
  gdb/gdbserver/linux-ppc-low.c                      | 1268 +++++++++++++++++++-
  gdb/rs6000-tdep.c                                  |  164 +++
  gdb/testsuite/gdb.trace/backtrace.exp              |    3 +
  gdb/testsuite/gdb.trace/change-loc.h               |    2 +
  gdb/testsuite/gdb.trace/collection.exp             |    4 +
  gdb/testsuite/gdb.trace/entry-values.exp           |    2 +
  gdb/testsuite/gdb.trace/ftrace.c                   |    4 +
  gdb/testsuite/gdb.trace/ftrace.exp                 |   61 +-
  .../gdb.trace/mi-trace-frame-collected.exp         |    2 +
  gdb/testsuite/gdb.trace/mi-trace-unavailable.exp   |    2 +
  gdb/testsuite/gdb.trace/pending.exp                |    2 +
  gdb/testsuite/gdb.trace/pendshr1.c                 |    2 +
  gdb/testsuite/gdb.trace/pendshr2.c                 |    2 +
  gdb/testsuite/gdb.trace/range-stepping.c           |    2 +
  gdb/testsuite/gdb.trace/report.exp                 |    4 +
  gdb/testsuite/gdb.trace/trace-break.c              |    4 +
  gdb/testsuite/gdb.trace/trace-break.exp            |    4 +
  gdb/testsuite/gdb.trace/trace-mt.c                 |    2 +
  gdb/testsuite/gdb.trace/while-dyn.exp              |    2 +
  22 files changed, 1658 insertions(+), 8 deletions(-)
  create mode 100644 gdb/gdbserver/linux-ppc-ipa.c

diff --git a/gdb/gdbserver/Makefile.in b/gdb/gdbserver/Makefile.in
index e479c7c..6bdac3e 100644
--- a/gdb/gdbserver/Makefile.in
+++ b/gdb/gdbserver/Makefile.in
@@ -491,6 +491,15 @@ linux-amd64-ipa.o: linux-amd64-ipa.c
  amd64-linux-ipa.o: amd64-linux.c
  	$(IPAGENT_COMPILE) $<
  	$(POSTCOMPILE)
+linux-ppc-ipa.o: linux-ppc-ipa.c
+	$(IPAGENT_COMPILE) $<
+	$(POSTCOMPILE)
+powerpc-64l-ipa.o: powerpc-64l.c
+	$(IPAGENT_COMPILE) $<
+	$(POSTCOMPILE)
+powerpc-32l-ipa.o: powerpc-32l.c
+	$(IPAGENT_COMPILE) $<
+	$(POSTCOMPILE)
  tdesc-ipa.o: tdesc.c
  	$(IPAGENT_COMPILE) $<
  	$(POSTCOMPILE)
diff --git a/gdb/gdbserver/configure.srv b/gdb/gdbserver/configure.srv
index 127786e..e13daf1 100644
--- a/gdb/gdbserver/configure.srv
+++ b/gdb/gdbserver/configure.srv
@@ -245,6 +245,7 @@ case "${target}" in
  			srv_linux_usrregs=yes
  			srv_linux_regsets=yes
  			srv_linux_thread_db=yes
+			ipa_obj="powerpc-64l-ipa.o powerpc-32l-ipa.o linux-ppc-ipa.o"
  			;;
    powerpc-*-lynxos*)	srv_regobj="powerpc-32.o"
  			srv_tgtobj="lynx-low.o lynx-ppc-low.o"
diff --git a/gdb/gdbserver/linux-ppc-ipa.c b/gdb/gdbserver/linux-ppc-ipa.c
new file mode 100644
index 0000000..34b26d0
--- /dev/null
+++ b/gdb/gdbserver/linux-ppc-ipa.c
@@ -0,0 +1,120 @@
+/* GNU/Linux/PowerPC specific low level interface, for the in-process
+   agent library for GDB.
+
+   Copyright (C) 2010-2015 Free Software Foundation, Inc.
+
+   This file is part of GDB.
+
+   This program is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3 of the License, or
+   (at your option) any later version.
+
+   This program is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program.  If not, see <http://www.gnu.org/licenses/>.  */
+
+#include "server.h"
+#include "tracepoint.h"
+
+#if defined __PPC64__
+void init_registers_powerpc_64l (void);
+extern const struct target_desc *tdesc_powerpc_64l;
+#define REGSZ		8
+#else
+void init_registers_powerpc_32l (void);
+extern const struct target_desc *tdesc_powerpc_32l;
+#define REGSZ		4
+#endif
+
+/* These macros define the position of registers in the buffer collected
+   by the fast tracepoint jump pad.  */
+#define FT_CR_PC	0
+#define FT_CR_R0	1
+#define FT_CR_CR	33
+#define FT_CR_XER	34
+#define FT_CR_LR	35
+#define FT_CR_CTR	36
+#define FT_CR_GPR(n)	(FT_CR_R0 + (n))
+
+static const int ppc_ft_collect_regmap[] = {
+  /* GPRs */
+  FT_CR_GPR (0), FT_CR_GPR (1), FT_CR_GPR (2),
+  FT_CR_GPR (3), FT_CR_GPR (4), FT_CR_GPR (5),
+  FT_CR_GPR (6), FT_CR_GPR (7), FT_CR_GPR (8),
+  FT_CR_GPR (9), FT_CR_GPR (10), FT_CR_GPR (11),
+  FT_CR_GPR (12), FT_CR_GPR (13), FT_CR_GPR (14),
+  FT_CR_GPR (15), FT_CR_GPR (16), FT_CR_GPR (17),
+  FT_CR_GPR (18), FT_CR_GPR (19), FT_CR_GPR (20),
+  FT_CR_GPR (21), FT_CR_GPR (22), FT_CR_GPR (23),
+  FT_CR_GPR (24), FT_CR_GPR (25), FT_CR_GPR (26),
+  FT_CR_GPR (27), FT_CR_GPR (28), FT_CR_GPR (29),
+  FT_CR_GPR (30), FT_CR_GPR (31),
+  /* FPRs - not collected.  */
+  -1, -1, -1, -1, -1, -1, -1, -1,
+  -1, -1, -1, -1, -1, -1, -1, -1,
+  -1, -1, -1, -1, -1, -1, -1, -1,
+  -1, -1, -1, -1, -1, -1, -1, -1,
+  FT_CR_PC, /* PC */
+  -1, /* MSR */
+  FT_CR_CR, /* CR */
+  FT_CR_LR, /* LR */
+  FT_CR_CTR, /* CTR */
+  FT_CR_XER, /* XER */
+  -1, /* FPSCR */
+};
+
+#define PPC_NUM_FT_COLLECT_GREGS \
+  (sizeof (ppc_ft_collect_regmap) / sizeof(ppc_ft_collect_regmap[0]))
+
+/* Supply registers collected by the fast tracepoint jump pad.
+   BUF is the second argument we pass to gdb_collect in jump pad.  */
+
+void
+supply_fast_tracepoint_registers (struct regcache *regcache,
+				  const unsigned char *buf)
+{
+  int i;
+
+  for (i = 0; i < PPC_NUM_FT_COLLECT_GREGS; i++)
+    {
+      if (ppc_ft_collect_regmap[i] == -1)
+	continue;
+      supply_register (regcache, i,
+		       ((char *) buf)
+			+ ppc_ft_collect_regmap[i] * REGSZ);
+    }
+}
+
+/* Return the value of register REGNUM.  RAW_REGS is collected buffer
+   by jump pad.  This function is called by emit_reg.  */
+
+ULONGEST __attribute__ ((visibility("default"), used))
+gdb_agent_get_raw_reg (const unsigned char *raw_regs, int regnum)
+{
+  if (regnum >= PPC_NUM_FT_COLLECT_GREGS)
+    return 0;
+  if (ppc_ft_collect_regmap[regnum] == -1)
+    return 0;
+
+  return *(ULONGEST *) (raw_regs
+			+ ppc_ft_collect_regmap[regnum] * REGSZ);
+}
+
+/* Initialize ipa_tdesc and others.  */
+
+void
+initialize_low_tracepoint (void)
+{
+#if defined __PPC64__
+  init_registers_powerpc_64l ();
+  ipa_tdesc = tdesc_powerpc_64l;
+#else
+  init_registers_powerpc_32l ();
+  ipa_tdesc = tdesc_powerpc_32l;
+#endif
+}
diff --git a/gdb/gdbserver/linux-ppc-low.c b/gdb/gdbserver/linux-ppc-low.c
index 188fac0..0b47543 100644
--- a/gdb/gdbserver/linux-ppc-low.c
+++ b/gdb/gdbserver/linux-ppc-low.c
@@ -24,6 +24,8 @@
  #include <asm/ptrace.h>

  #include "nat/ppc-linux.h"
+#include "ax.h"
+#include "tracepoint.h"

  static unsigned long ppc_hwcap;

@@ -512,6 +514,1243 @@ ppc_breakpoint_at (CORE_ADDR where)
    return 0;
  }

+/* Implement supports_z_point_type target-ops.
+   Returns true if type Z_TYPE breakpoint is supported.
+
+   Handling software breakpoint at server side, so tracepoints
+   and breakpoints can be inserted at the same location.  */
+
+static int
+ppc_supports_z_point_type (char z_type)
+{
+  switch (z_type)
+    {
+    case Z_PACKET_SW_BP:
+      return 1;
+    case Z_PACKET_HW_BP:
+    case Z_PACKET_WRITE_WP:
+    case Z_PACKET_ACCESS_WP:
+    default:
+      return 0;
+    }
+}
+
+/* Implement insert_point target-ops.
+   Returns 0 on success, -1 on failure and 1 on unsupported.  */
+
+static int
+ppc_insert_point (enum raw_bkpt_type type, CORE_ADDR addr,
+		  int size, struct raw_breakpoint *bp)
+{
+  switch (type)
+    {
+    case raw_bkpt_type_sw:
+      return insert_memory_breakpoint (bp);
+
+    case raw_bkpt_type_hw:
+    case raw_bkpt_type_write_wp:
+    case raw_bkpt_type_access_wp:
+    default:
+      /* Unsupported.  */
+      return 1;
+    }
+}
+
+/* Implement remove_point target-ops.
+   Returns 0 on success, -1 on failure and 1 on unsupported.  */
+
+static int
+ppc_remove_point (enum raw_bkpt_type type, CORE_ADDR addr,
+		  int size, struct raw_breakpoint *bp)
+{
+  switch (type)
+    {
+    case raw_bkpt_type_sw:
+      return remove_memory_breakpoint (bp);
+
+    case raw_bkpt_type_hw:
+    case raw_bkpt_type_write_wp:
+    case raw_bkpt_type_access_wp:
+    default:
+      /* Unsupported.  */
+      return 1;
+    }
+}
+
+/* Put a 32-bit INSN instruction in BUF in target endian.  */
+
+static int
+put_i32 (unsigned char *buf, uint32_t insn)
+{
+  if (__BYTE_ORDER == __LITTLE_ENDIAN)
+    {
+      buf[3] = (insn >> 24) & 0xff;
+      buf[2] = (insn >> 16) & 0xff;
+      buf[1] = (insn >> 8) & 0xff;
+      buf[0] = insn & 0xff;
+    }
+  else
+    {
+      buf[0] = (insn >> 24) & 0xff;
+      buf[1] = (insn >> 16) & 0xff;
+      buf[2] = (insn >> 8) & 0xff;
+      buf[3] = insn & 0xff;
+    }
+
+  return 4;
+}
+
+/* return a 32-bit value in target endian in BUF.  */
+
+__attribute__((unused)) /* Maybe unused due to conditional compilation.  */
+static uint32_t
+get_i32 (unsigned char *buf)
+{
+  uint32_t r;
+
+  if (__BYTE_ORDER == __LITTLE_ENDIAN)
+    r = (buf[3] << 24) | (buf[2] << 16) | (buf[1] << 8) | buf[0];
+  else
+    r = (buf[0] << 24) | (buf[1] << 16) | (buf[2] << 8) | buf[3];
+
+  return r;
+}
+
+/* Generate a ds-form instruction in BUF and return the number of bytes written
+
+   0      6     11   16          30 32
+   | OPCD | RST | RA |     DS    |XO|  */
+
+__attribute__((unused)) /* Maybe unused due to conditional compilation.  */
+static int
+gen_ds_form (unsigned char *buf, int opcd, int rst, int ra, int ds, int xo)
+{
+  uint32_t insn = opcd << 26;
+
+  insn |= (rst << 21) | (ra << 16) | (ds & 0xfffc) | (xo & 0x3);
+  return put_i32 (buf, insn);
+}
+
+/* Followings are frequently used ds-form instructions.  */
+
+#define GEN_STD(buf, rs, ra, offset)	gen_ds_form (buf, 62, rs, ra, offset, 0)
+#define GEN_STDU(buf, rs, ra, offset)	gen_ds_form (buf, 62, rs, ra, offset, 1)
+#define GEN_LD(buf, rt, ra, offset)	gen_ds_form (buf, 58, rt, ra, offset, 0)
+#define GEN_LDU(buf, rt, ra, offset)	gen_ds_form (buf, 58, rt, ra, offset, 1)
+
+/* Generate a d-form instruction in BUF.
+
+   0      6     11   16             32
+   | OPCD | RST | RA |       D      |  */
+
+static int
+gen_d_form (unsigned char *buf, int opcd, int rst, int ra, int si)
+{
+  uint32_t insn = opcd << 26;
+
+  insn |= (rst << 21) | (ra << 16) | (si & 0xffff);
+  return put_i32 (buf, insn);
+}
+
+/* Followings are frequently used d-form instructions.  */
+
+#define GEN_ADDI(buf, rt, ra, si)	gen_d_form (buf, 14, rt, ra, si)
+#define GEN_ADDIS(buf, rt, ra, si)	gen_d_form (buf, 15, rt, ra, si)
+#define GEN_LI(buf, rt, si)		GEN_ADDI (buf, rt, 0, si)
+#define GEN_LIS(buf, rt, si)		GEN_ADDIS (buf, rt, 0, si)
+#define GEN_ORI(buf, rt, ra, si)	gen_d_form (buf, 24, rt, ra, si)
+#define GEN_ORIS(buf, rt, ra, si)	gen_d_form (buf, 25, rt, ra, si)
+#define GEN_LWZ(buf, rt, ra, si)	gen_d_form (buf, 32, rt, ra, si)
+#define GEN_STW(buf, rt, ra, si)	gen_d_form (buf, 36, rt, ra, si)
+
+/* Generate a xfx-form instruction in BUF and return the number of bytes
+   written.
+
+   0      6     11         21        31 32
+   | OPCD | RST |    RI    |    XO   |/|  */
+
+static int
+gen_xfx_form (unsigned char *buf, int opcd, int rst, int ri, int xo)
+{
+  uint32_t insn = opcd << 26;
+  unsigned int n = ((ri & 0x1f) << 5) | ((ri >> 5) & 0x1f);
+
+  insn |= (rst << 21) | (n << 11) | (xo << 1);
+  return put_i32 (buf, insn);
+}
+
+/* Followings are frequently used xfx-form instructions.  */
+
+#define GEN_MFSPR(buf, rt, spr)		gen_xfx_form (buf, 31, rt, spr, 339)
+#define GEN_MTSPR(buf, rt, spr)		gen_xfx_form (buf, 31, rt, spr, 467)
+
+/* Generate a x-form instruction in BUF and return the number of bytes written.
+
+   0      6     11   16   21       31 32
+   | OPCD | RST | RA | RB |   XO   |RC|  */
+
+static int
+gen_x_form (unsigned char *buf, int opcd, int rst, int ra, int rb,
+	    int xo, int rc)
+{
+  uint32_t insn = opcd << 26;
+
+  insn |= (rst << 21) | (ra << 16) | (rb << 11) | (xo << 1) | rc;
+  return put_i32 (buf, insn);
+}
+
+/* Followings are frequently used x-form instructions.  */
+
+#define GEN_OR(buf, ra, rs, rb)		gen_x_form (buf, 31, rs, ra, rb, 444, 0)
+#define GEN_MR(buf, ra, rs)		GEN_OR (buf, ra, rs, rs)
+#define GEN_LWARX(buf, rt, ra, rb)	gen_x_form (buf, 31, rt, ra, rb, 20, 0)
+#define GEN_STWCX(buf, rs, ra, rb)	gen_x_form (buf, 31, rs, ra, rb, 150, 1)
+/* Assume bf = cr7.  */
+#define GEN_CMPW(buf, ra, rb)    gen_x_form (buf, 31, 28, ra, rb, 0, 0)
+
+/* Generate a md-form instruction in BUF and return the number of bytes written.
+
+   0      6    11   16   21   27   30 31 32
+   | OPCD | RS | RA | sh | mb | XO |sh|Rc|  */
+
+static int
+gen_md_form (unsigned char *buf, int opcd, int rs, int ra, int sh, int mb,
+	     int xo, int rc)
+{
+  uint32_t insn = opcd << 26;
+  unsigned int n = ((mb & 0x1f) << 1) | ((mb >> 5) & 0x1);
+  unsigned int sh0_4 = sh & 0x1f;
+  unsigned int sh5 = (sh >> 5) & 1;
+
+  insn |= (rs << 21) | (ra << 16) | (sh0_4 << 11) | (n << 5) | (sh5 << 1)
+	  | (xo << 2);
+  return put_i32 (buf, insn);
+}
+
+/* The following are frequently used md-form instructions.  */
+
+#define GEN_RLDICL(buf, ra, rs ,sh, mb) \
+				gen_md_form (buf, 30, rs, ra, sh, mb, 0, 0)
+#define GEN_RLDICR(buf, ra, rs ,sh, mb) \
+				gen_md_form (buf, 30, rs, ra, sh, mb, 1, 0)
+
+/* Generate a i-form instruction in BUF and return the number of bytes written.
+
+   0      6                          30 31 32
+   | OPCD |            LI            |AA|LK|  */
+
+static int
+gen_i_form (unsigned char *buf, int opcd, int li, int aa, int lk)
+{
+  uint32_t insn = opcd << 26;
+
+  insn |= (li & 0x3fffffc) | (aa & 1) | (lk & 1);
+  return put_i32 (buf, insn);
+}
+
+/* The following are frequently used i-form instructions.  */
+
+#define GEN_B(buf, li)		gen_i_form (buf, 18, li, 0, 0)
+#define GEN_BL(buf, li)		gen_i_form (buf, 18, li, 0, 1)
+
+/* Generate a b-form instruction in BUF and return the number of bytes written.
+
+   0      6    11   16               30 31 32
+   | OPCD | BO | BI |      BD        |AA|LK|  */
+
+static int
+gen_b_form (unsigned char *buf, int opcd, int bo, int bi, int bd,
+	    int aa, int lk)
+{
+  uint32_t insn = opcd << 26;
+
+  insn |= (bo << 21) | (bi << 16) | (bd & 0xfffc) | (aa & 1) | (lk & 1);
+  return put_i32 (buf, insn);
+}
+
+/* The following are frequently used b-form instructions.  */
+/* Assume bi = cr7.  */
+#define GEN_BNE(buf, bd)  gen_b_form (buf, 16, 0x4, (7 << 2) | 2, bd, 0 ,0)
+
+/* GEN_LOAD and GEN_STORE generate 64- or 32-bit load/store for ppc64 or ppc32
+   respectively.  They are primary used for save/restore GPRs in jump-pad,
+   not used for bytecode compiling.  */
+
+#if defined __PPC64__
+#define GEN_LOAD(buf, rt, ra, si)	GEN_LD (buf, rt, ra, si)
+#define GEN_STORE(buf, rt, ra, si)	GEN_STD (buf, rt, ra, si)
+#else
+#define GEN_LOAD(buf, rt, ra, si)	GEN_LWZ (buf, rt, ra, si)
+#define GEN_STORE(buf, rt, ra, si)	GEN_STW (buf, rt, ra, si)
+#endif
+
+/* Generate a sequence of instructions to load IMM in the register REG.
+   Write the instructions in BUF and return the number of bytes written.  */
+
+static int
+gen_limm (unsigned char *buf, int reg, uint64_t imm)
+{
+  unsigned char *p = buf;
+
+  if ((imm >> 8) == 0)
+    {
+      /* li	reg, imm[7:0] */
+      p += GEN_LI (p, reg, imm);
+    }
+  else if ((imm >> 16) == 0)
+    {
+      /* li	reg, 0
+	 ori	reg, reg, imm[15:0] */
+      p += GEN_LI (p, reg, 0);
+      p += GEN_ORI (p, reg, reg, imm);
+    }
+  else if ((imm >> 32) == 0)
+    {
+      /* lis	reg, imm[31:16]
+	 ori	reg, reg, imm[15:0]
+	 rldicl	reg, reg, 0, 32 */
+      p += GEN_LIS (p, reg, (imm >> 16) & 0xffff);
+      p += GEN_ORI (p, reg, reg, imm & 0xffff);
+      p += GEN_RLDICL (p, reg, reg, 0, 32);
+    }
+  else
+    {
+      /* lis    reg, <imm[63:48]>
+	 ori    reg, reg, <imm[48:32]>
+	 rldicr reg, reg, 32, 31
+	 oris   reg, reg, <imm[31:16]>
+	 ori    reg, reg, <imm[15:0]> */
+      p += GEN_LIS (p, reg, ((imm >> 48) & 0xffff));
+      p += GEN_ORI (p, reg, reg, ((imm >> 32) & 0xffff));
+      p += GEN_RLDICR (p, reg, reg, 32, 31);
+      p += GEN_ORIS (p, reg, reg, ((imm >> 16) & 0xffff));
+      p += GEN_ORI (p, reg, reg, (imm & 0xffff));
+    }
+
+  return p - buf;
+}
+
+/* Generate a sequence for atomically exchange at location LOCK.
+   This code sequence clobbers r6, r7, r8, r9.  */
+
+static int
+gen_atomic_xchg (unsigned char *buf, CORE_ADDR lock, int old_value, int new_value)
+{
+  const int r_lock = 6;
+  const int r_old = 7;
+  const int r_new = 8;
+  const int r_tmp = 9;
+  unsigned char *p = buf;
+
+  /*
+  1: lwsync
+  2: lwarx   TMP, 0, LOCK
+     cmpwi   TMP, OLD
+     bne     1b
+     stwcx.  NEW, 0, LOCK
+     bne     2b */
+
+  p += gen_limm (p, r_lock, lock);
+  p += gen_limm (p, r_new, new_value);
+  p += gen_limm (p, r_old, old_value);
+
+  p += put_i32 (p, 0x7c2004ac);	/* lwsync */
+  p += GEN_LWARX (p, r_tmp, 0, r_lock);
+  p += GEN_CMPW (p, r_tmp, r_old);
+  p += GEN_BNE (p, -12);
+  p += GEN_STWCX (p, r_new, 0, r_lock);
+  p += GEN_BNE (p, -16);
+
+  return p - buf;
+}
+
+/* Generate a sequence of instructions for calling a function
+   at address of FN.  Return the number of bytes are written in BUF.
+
+   FIXME: For ppc64be, FN should be the address to the function
+   descriptor, so we should load 8(FN) to R2, 16(FN) to R11
+   and then call the function-entry at 0(FN).  However, current GDB
+   implicitly convert the address from function descriptor to the actual
+   function address. See qSymbol handling in remote.c.  Although it
+   seems we can successfully call however, things go wrong when callee
+   trying to access global variable.  */
+
+static int
+gen_call (unsigned char *buf, CORE_ADDR fn)
+{
+  unsigned char *p = buf;
+
+  /* Must be called by r12 for caller to calculate TOC address. */
+  p += gen_limm (p, 12, fn);
+  p += GEN_MTSPR (p, 12, 9);		/* mtctr  r12 */
+  p += put_i32 (p, 0x4e800421);		/* bctrl */
+
+  return p - buf;
+}
+
+/* Implement supports_tracepoints hook of target_ops.
+   Always return true.  */
+
+static int
+ppc_supports_tracepoints (void)
+{
+#if defined (__PPC64__) && _CALL_ELF == 2
+  return 1;
+#else
+  return 0;
+#endif
+}
+
+/* Implement install_fast_tracepoint_jump_pad of target_ops.
+   See target.h for details.  */
+
+static int
+ppc_install_fast_tracepoint_jump_pad (CORE_ADDR tpoint, CORE_ADDR tpaddr,
+				      CORE_ADDR collector,
+				      CORE_ADDR lockaddr,
+				      ULONGEST orig_size,
+				      CORE_ADDR *jump_entry,
+				      CORE_ADDR *trampoline,
+				      ULONGEST *trampoline_size,
+				      unsigned char *jjump_pad_insn,
+				      ULONGEST *jjump_pad_insn_size,
+				      CORE_ADDR *adjusted_insn_addr,
+				      CORE_ADDR *adjusted_insn_addr_end,
+				      char *err)
+{
+  unsigned char buf[1024];
+  unsigned char *p = buf;
+  int j, offset;
+  CORE_ADDR buildaddr = *jump_entry;
+  const CORE_ADDR entryaddr = *jump_entry;
+#if __PPC64__
+  const int rsz = 8;
+#else
+  const int rsz = 4;
+#endif
+  const int frame_size = (((37 * rsz) + 112) + 0xf) & ~0xf;
+
+  /* Stack frame layout for this jump pad,
+
+     High	CTR   -8(sp)
+		LR   -16(sp)
+		XER
+		CR
+		R31
+		R29
+		...
+		R1
+		R0
+     Low	PC/<tpaddr>
+
+     The code flow of this jump pad,
+
+     1. Save GPR and SPR
+     3. Adjust SP
+     4. Prepare argument
+     5. Call gdb_collector
+     6. Restore SP
+     7. Restore GPR and SPR
+     8. Build a jump for back to the program
+     9. Copy/relocate original instruction
+    10. Build a jump for replacing orignal instruction.  */
+
+  for (j = 0; j < 32; j++)
+    p += GEN_STORE (p, j, 1, (-rsz * 36 + j * rsz));
+
+  /* Save PC<tpaddr>  */
+  p += gen_limm (p, 3, tpaddr);
+  p += GEN_STORE (p, 3, 1, (-rsz * 37));
+
+  /* Save CR, XER, LR, and CTR.  */
+  p += put_i32 (p, 0x7c600026);			/* mfcr   r3 */
+  p += GEN_MFSPR (p, 4, 1);			/* mfxer  r4 */
+  p += GEN_MFSPR (p, 5, 8);			/* mflr   r5 */
+  p += GEN_MFSPR (p, 6, 9);			/* mfctr  r6 */
+  p += GEN_STORE (p, 3, 1, -4 * rsz);		/* std    r3, -32(r1) */
+  p += GEN_STORE (p, 4, 1, -3 * rsz);		/* std    r4, -24(r1) */
+  p += GEN_STORE (p, 5, 1, -2 * rsz);		/* std    r5, -16(r1) */
+  p += GEN_STORE (p, 6, 1, -1 * rsz);		/* std    r6, -8(r1) */
+
+  /* Adjust stack pointer.  */
+  p += GEN_ADDI (p, 1, 1, -frame_size);		/* subi   r1,r1,FRAME_SIZE */
+
+  /* Setup arguments to collector.  */
+
+  /* Set r4 to collected registers.  */
+  p += GEN_ADDI (p, 4, 1, frame_size - rsz * 37);
+  /* Set r3 to TPOINT.  */
+  p += gen_limm (p, 3, tpoint);
+
+  p += gen_atomic_xchg (p, lockaddr, 0, 1);
+  /* Call to collector.  */
+  p += gen_call (p, collector);
+  p += gen_atomic_xchg (p, lockaddr, 1, 0);
+
+  /* Restore stack and registers.  */
+  p += GEN_ADDI (p, 1, 1, frame_size);	/* addi	r1,r1,FRAME_SIZE */
+  p += GEN_LOAD (p, 3, 1, -4 * rsz);	/* ld	r3, -32(r1) */
+  p += GEN_LOAD (p, 4, 1, -3 * rsz);	/* ld	r4, -24(r1) */
+  p += GEN_LOAD (p, 5, 1, -2 * rsz);	/* ld	r5, -16(r1) */
+  p += GEN_LOAD (p, 6, 1, -1 * rsz);	/* ld	r6, -8(r1) */
+  p += put_i32 (p, 0x7c6ff120);		/* mtcr	r3 */
+  p += GEN_MTSPR (p, 4, 1);		/* mtxer  r4 */
+  p += GEN_MTSPR (p, 5, 8);		/* mtlr   r5 */
+  p += GEN_MTSPR (p, 6, 9);		/* mtctr  r6 */
+  for (j = 0; j < 32; j++)
+    p += GEN_LOAD (p, j, 1, (-rsz * 36 + j * rsz));
+
+  /* Flush instructions to inferior memory.  */
+  write_inferior_memory (buildaddr, buf, (p - buf));
+
+  /* Now, insert the original instruction to execute in the jump pad.  */
+  *adjusted_insn_addr = buildaddr + (p - buf);
+  *adjusted_insn_addr_end = *adjusted_insn_addr;
+  relocate_instruction (adjusted_insn_addr_end, tpaddr);
+
+  /* Verify the relocation size.  If should be 4 for normal copy, or 8
+     for some conditional branch.  */
+  if ((*adjusted_insn_addr_end - *adjusted_insn_addr == 0)
+      || (*adjusted_insn_addr_end - *adjusted_insn_addr > 8))
+    {
+      sprintf (err, "E.Unexpected instruction length = %d"
+		    "when relocate instruction.",
+		    (int) (*adjusted_insn_addr_end - *adjusted_insn_addr));
+      return 1;
+    }
+
+  buildaddr = *adjusted_insn_addr_end;
+  p = buf;
+  /* Finally, write a jump back to the program.  */
+  offset = (tpaddr + 4) - buildaddr;
+  if (offset >= (1 << 26) || offset < -(1 << 26))
+    {
+      sprintf (err, "E.Jump back from jump pad too far from tracepoint "
+		    "(offset 0x%x > 26-bit).", offset);
+      return 1;
+    }
+  /* b <tpaddr+4> */
+  p += GEN_B (p, offset);
+  write_inferior_memory (buildaddr, buf, (p - buf));
+  *jump_entry = buildaddr + (p - buf);
+
+  /* The jump pad is now built.  Wire in a jump to our jump pad.  This
+     is always done last (by our caller actually), so that we can
+     install fast tracepoints with threads running.  This relies on
+     the agent's atomic write support.  */
+  offset = entryaddr - tpaddr;
+  if (offset >= (1 << 25) || offset < -(1 << 25))
+    {
+      sprintf (err, "E.Jump back from jump pad too far from tracepoint "
+		    "(offset 0x%x > 26-bit).", offset);
+      return 1;
+    }
+  /* b <jentry> */
+  GEN_B (jjump_pad_insn, offset);
+  *jjump_pad_insn_size = 4;
+
+  return 0;
+}
+
+/* Returns the minimum instruction length for installing a tracepoint.  */
+
+static int
+ppc_get_min_fast_tracepoint_insn_len ()
+{
+  return 4;
+}
+
+#if __PPC64__
+
+static void
+emit_insns (unsigned char *buf, int n)
+{
+  write_inferior_memory (current_insn_ptr, buf, n);
+  current_insn_ptr += n;
+}
+
+#define __EMIT_ASM(NAME, INSNS)					\
+  do								\
+    {								\
+      extern unsigned char start_bcax_ ## NAME [];		\
+      extern unsigned char end_bcax_ ## NAME [];		\
+      emit_insns (start_bcax_ ## NAME,				\
+		  end_bcax_ ## NAME - start_bcax_ ## NAME);	\
+      __asm__ (".section .text.__ppcbcax\n\t"			\
+	       "start_bcax_" #NAME ":\n\t"			\
+	       INSNS "\n\t"					\
+	       "end_bcax_" #NAME ":\n\t"			\
+	       ".previous\n\t");				\
+    } while (0)
+
+#define _EMIT_ASM(NAME, INSNS)		__EMIT_ASM (NAME, INSNS)
+#define EMIT_ASM(INSNS)			_EMIT_ASM (__LINE__, INSNS)
+
+/*
+
+  Bytecode execution stack frame
+
+	|  Parameter save area    (SP + 48) [8 doublewords]
+	|  TOC save area          (SP + 40)
+	|  link editor doubleword (SP + 32)
+	|  compiler doubleword    (SP + 24)  save TOP here during call
+	|  LR save area           (SP + 16)
+	|  CR save area           (SP + 8)
+ SP' -> +- Back chain             (SP + 0)
+	|  Save r31
+	|  Save r30
+	|  Save r4    for *value
+	|  Save r3    for CTX
+ r30 -> +- Bytecode execution stack
+	|
+	|  64-byte (8 doublewords) at initial.  Expand stack as needed.
+	|
+ r31 -> +-
+
+  initial frame size
+  = (48 + 8 * 8) + (4 * 8) + 64
+  = 112 + 96
+  = 208
+
+   r31 is the frame-base for restoring stack-pointer.
+   r30 is the stack-pointer for bytecode machine.
+       It should point to next-empty, so we can use LDU for pop.
+   r3  is used for cache of TOP value.
+       It is the first argument, pointer to CTX.
+   r4  is the second argument, pointer to the result.
+   SP+24 is used for saving TOP during call.
+
+ Note:
+ * To restore stack at epilogue
+   => sp = r31 + 208
+ * To check stack is big enough for bytecode execution.
+   => r30 - 8 > SP + 112
+ * To return execution result.
+   => 0(r4) = TOP
+
+ */
+
+enum { bc_framesz = 208 };
+
+/* Emit prologue in inferior memory.  See above comments.  */
+
+static void
+ppc64_emit_prologue (void)
+{
+  EMIT_ASM ("mflr  0		\n"
+	    "std   0, 16(1)	\n"
+	    "std   31, -8(1)	\n"
+	    "std   30, -16(1)	\n"
+	    "std   4, -24(1)	\n"
+	    "std   3, -32(1)	\n"
+	    "addi  30, 1, -40	\n"
+	    "li	   3, 0		\n"
+	    "stdu  1, -208(1)	\n"
+	    "mr	   31, 1	\n");
+}
+
+/* Emit epilogue in inferior memory.  See above comments.  */
+
+static void
+ppc64_emit_epilogue (void)
+{
+  EMIT_ASM (/* Restore SP.  */
+	    "addi  1, 31, 208	\n"
+	    /* *result = TOP */
+	    "ld    4, -24(1)	\n"
+	    "std   3, 0(4)	\n"
+	    /* Return 0 for no-erro.  */
+	    "li    3, 0		\n"
+	    "ld    0, 16(1)	\n"
+	    "ld    31, -8(1)	\n"
+	    "ld    30, -16(1)	\n"
+	    "mtlr  0		\n"
+	    "blr		\n");
+}
+
+/* TOP = stack[--sp] + TOP  */
+
+static void
+ppc64_emit_add (void)
+{
+  EMIT_ASM ("ldu  4, 8(30)	\n"
+	    "add  3, 4, 3	\n");
+}
+
+/* TOP = stack[--sp] - TOP  */
+
+static void
+ppc64_emit_sub (void)
+{
+  EMIT_ASM ("ldu  4, 8(30)	\n"
+	    "sub  3, 4, 3	\n");
+}
+
+/* TOP = stack[--sp] * TOP  */
+
+static void
+ppc64_emit_mul (void)
+{
+  EMIT_ASM ("ldu    4, 8(30)	\n"
+	    "mulld  3, 4, 3	\n");
+}
+
+/* TOP = stack[--sp] << TOP  */
+
+static void
+ppc64_emit_lsh (void)
+{
+  EMIT_ASM ("ldu  4, 8(30)	\n"
+	    "sld  3, 4, 3	\n");
+}
+
+/* Top = stack[--sp] >> TOP
+   (Arithmetic shift right)  */
+
+static void
+ppc64_emit_rsh_signed (void)
+{
+  EMIT_ASM ("ldu   4, 8(30)	\n"
+	    "srad  3, 4, 3	\n");
+}
+
+/* Top = stack[--sp] >> TOP
+   (Logical shift right)  */
+
+static void
+ppc64_emit_rsh_unsigned (void)
+{
+  EMIT_ASM ("ldu  4, 8(30)	\n"
+	    "srd  3, 4, 3	\n");
+}
+
+/* Emit code for signed-extension specified by ARG.  */
+
+static void
+ppc64_emit_ext (int arg)
+{
+  switch (arg)
+    {
+    case 8:
+      EMIT_ASM ("extsb  3, 3");
+      break;
+    case 16:
+      EMIT_ASM ("extsh  3, 3");
+      break;
+    case 32:
+      EMIT_ASM ("extsw  3, 3");
+      break;
+    default:
+      emit_error = 1;
+    }
+}
+
+/* Emit code for zero-extension specified by ARG.  */
+
+static void
+ppc64_emit_zero_ext (int arg)
+{
+  switch (arg)
+    {
+    case 8:
+      EMIT_ASM ("rldicl 3,3,0,56");
+      break;
+    case 16:
+      EMIT_ASM ("rldicl 3,3,0,48");
+      break;
+    case 32:
+      EMIT_ASM ("rldicl 3,3,0,32");
+      break;
+    default:
+      emit_error = 1;
+    }
+}
+
+/* TOP = !TOP
+   i.e., TOP = (TOP == 0) ? 1 : 0;  */
+
+static void
+ppc64_emit_log_not (void)
+{
+  EMIT_ASM ("cntlzd  3, 3	\n"
+	    "srdi    3, 3, 6	\n");
+}
+
+/* TOP = stack[--sp] & TOP  */
+
+static void
+ppc64_emit_bit_and (void)
+{
+  EMIT_ASM ("ldu  4, 8(30)	\n"
+	    "and  3, 4, 3	\n");
+}
+
+/* TOP = stack[--sp] | TOP  */
+
+static void
+ppc64_emit_bit_or (void)
+{
+  EMIT_ASM ("ldu  4, 8(30)	\n"
+	    "or   3, 4, 3	\n");
+}
+
+/* TOP = stack[--sp] ^ TOP  */
+
+static void
+ppc64_emit_bit_xor (void)
+{
+  EMIT_ASM ("ldu  4, 8(30)	\n"
+	    "xor  3, 4, 3	\n");
+}
+
+/* TOP = ~TOP
+   i.e., TOP = ~(TOP | TOP)  */
+
+static void
+ppc64_emit_bit_not (void)
+{
+  EMIT_ASM ("nor  3, 3, 3	\n");
+}
+
+/* TOP = stack[--sp] == TOP  */
+
+static void
+ppc64_emit_equal (void)
+{
+  EMIT_ASM ("ldu     4, 8(30)	\n"
+	    "xor     3, 3, 4	\n"
+	    "cntlzd  3, 3	\n"
+	    "srdi    3, 3, 6	\n");
+}
+
+/* TOP = stack[--sp] < TOP
+   (Signed comparison)  */
+
+static void
+ppc64_emit_less_signed (void)
+{
+  EMIT_ASM ("ldu     4, 8(30)		\n"
+	    "cmpd    7, 4, 3		\n"
+	    "mfocrf  3, 1		\n"
+	    "rlwinm  3, 3, 29, 31, 31	\n");
+}
+
+/* TOP = stack[--sp] < TOP
+   (Unsigned comparison)  */
+
+static void
+ppc64_emit_less_unsigned (void)
+{
+  EMIT_ASM ("ldu     4, 8(30)		\n"
+	    "cmpld   7, 4, 3		\n"
+	    "mfocrf  3, 1		\n"
+	    "rlwinm  3, 3, 29, 31, 31	\n");
+}
+
+/* Access the memory address in TOP in size of SIZE.
+   Zero-extend the read value.  */
+
+static void
+ppc64_emit_ref (int size)
+{
+  switch (size)
+    {
+    case 1:
+      EMIT_ASM ("lbz   3, 0(3)");
+      break;
+    case 2:
+      EMIT_ASM ("lhz   3, 0(3)");
+      break;
+    case 4:
+      EMIT_ASM ("lwz   3, 0(3)");
+      break;
+    case 8:
+      EMIT_ASM ("ld    3, 0(3)");
+      break;
+    }
+}
+
+/* TOP = NUM  */
+
+static void
+ppc64_emit_const (LONGEST num)
+{
+  unsigned char buf[5 * 4];
+  unsigned char *p = buf;
+
+  p += gen_limm (p, 3, num);
+
+  write_inferior_memory (current_insn_ptr, buf, (p - buf));
+  current_insn_ptr += (p - buf);
+  gdb_assert ((p - buf) <= sizeof (buf));
+}
+
+/* Set TOP to the value of register REG by calling get_raw_reg function
+   with two argument, collected buffer and register number.  */
+
+static void
+ppc64_emit_reg (int reg)
+{
+  unsigned char buf[10 * 4];
+  unsigned char *p = buf;
+
+  p += GEN_LD (p, 3, 31, bc_framesz - 32);
+  p += GEN_LD (p, 3, 3, 48);	/* offsetof (fast_tracepoint_ctx, regs) */
+  p += GEN_LI (p, 4, reg);	/* mr	r4, reg */
+  p += gen_call (p, get_raw_reg_func_addr ());
+
+  write_inferior_memory (current_insn_ptr, buf, (p - buf));
+  current_insn_ptr += p - buf;
+  gdb_assert ((p - buf) <= sizeof (buf));
+}
+
+/* TOP = stack[--sp] */
+
+static void
+ppc64_emit_pop (void)
+{
+  EMIT_ASM ("ldu  3, 8(30)");
+}
+
+/* stack[sp++] = TOP
+
+   Because we may use up bytecode stack, expand 8 doublewords more
+   if needed.  */
+
+static void
+ppc64_emit_stack_flush (void)
+{
+  /* Make sure bytecode stack is big enough before push.
+     Otherwise, expand 64-byte more.  */
+
+  EMIT_ASM ("  std   3, 0(30)		\n"
+	    "  addi  4, 30, -(112 + 8)	\n"
+	    "  cmpd  7, 4, 1		\n"
+	    "  bgt   1f			\n"
+	    "  ld    4, 0(1)		\n"
+	    "  addi  1, 1, -64		\n"
+	    "  std   4, 0(1)		\n"
+	    "1:addi  30, 30, -8		\n");
+}
+
+/* Swap TOP and stack[sp-1]  */
+
+static void
+ppc64_emit_swap (void)
+{
+  EMIT_ASM ("ld   4, 8(30)	\n"
+	    "std  3, 8(30)	\n"
+	    "mr   3, 4		\n");
+}
+
+/* Discard N elements in the stack.  */
+
+static void
+ppc64_emit_stack_adjust (int n)
+{
+  unsigned char buf[4];
+  unsigned char *p = buf;
+
+  p += GEN_ADDI (p, 30, 30, n << 3);	/* addi	r30, r30, (n << 3) */
+
+  write_inferior_memory (current_insn_ptr, buf, (p - buf));
+  current_insn_ptr += p - buf;
+  gdb_assert ((p - buf) <= sizeof (buf));
+}
+
+/* Call function FN.  */
+
+static void
+ppc64_emit_call (CORE_ADDR fn)
+{
+  unsigned char buf[8 * 4];
+  unsigned char *p = buf;
+
+  p += gen_call (p, fn);
+
+  write_inferior_memory (current_insn_ptr, buf, (p - buf));
+  current_insn_ptr += p - buf;
+  gdb_assert ((p - buf) <= sizeof (buf));
+}
+
+/* FN's prototype is `LONGEST(*fn)(int)'.
+   TOP = fn (arg1)
+  */
+
+static void
+ppc64_emit_int_call_1 (CORE_ADDR fn, int arg1)
+{
+  unsigned char buf[8 * 4];
+  unsigned char *p = buf;
+
+  /* Setup argument.  arg1 is a 16-bit value.  */
+  p += GEN_LI (p, 3, arg1);		/* li	r3, arg1 */
+  p += gen_call (p, fn);
+
+  write_inferior_memory (current_insn_ptr, buf, (p - buf));
+  current_insn_ptr += p - buf;
+  gdb_assert ((p - buf) <= sizeof (buf));
+}
+
+/* FN's prototype is `void(*fn)(int,LONGEST)'.
+   fn (arg1, TOP)
+
+   TOP should be preserved/restored before/after the call.  */
+
+static void
+ppc64_emit_void_call_2 (CORE_ADDR fn, int arg1)
+{
+  unsigned char buf[12 * 4];
+  unsigned char *p = buf;
+
+  /* Save TOP */
+  p += GEN_STD (p, 3, 31, bc_framesz + 24);
+
+  /* Setup argument.  arg1 is a 16-bit value.  */
+  p += GEN_MR (p, 4, 3);		/* mr	r4, r3 */
+  p += GEN_LI (p, 3, arg1);	/* li	r3, arg1 */
+  p += gen_call (p, fn);
+
+  /* Restore TOP */
+  p += GEN_LD (p, 3, 31, bc_framesz + 24);
+
+  write_inferior_memory (current_insn_ptr, buf, (p - buf));
+  current_insn_ptr += p - buf;
+  gdb_assert ((p - buf) <= sizeof (buf));
+}
+
+/* Note in the following goto ops:
+
+   When emitting goto, the target address is later relocated by
+   write_goto_address.  OFFSET_P is the offset of the branch instruction
+   in the code sequence, and SIZE_P is how to relocate the instruction,
+   recognized by ppc_write_goto_address.  In current implementation,
+   SIZE can be either 24 or 14 for branch of conditional-branch instruction.
+ */
+
+/* If TOP is true, goto somewhere.  Otherwise, just fall-through.  */
+
+static void
+ppc64_emit_if_goto (int *offset_p, int *size_p)
+{
+  EMIT_ASM ("mr     4, 3	\n"
+	    "ldu    3, 8(30)	\n"
+	    "cmpdi  7, 4, 0	\n"
+	    "1:bne  7, 1b	\n");
+
+  if (offset_p)
+    *offset_p = 12;
+  if (size_p)
+    *size_p = 14;
+}
+
+/* Unconditional goto.  */
+
+static void
+ppc64_emit_goto (int *offset_p, int *size_p)
+{
+  EMIT_ASM ("1:b	1b");
+
+  if (offset_p)
+    *offset_p = 0;
+  if (size_p)
+    *size_p = 24;
+}
+
+/* Goto if stack[--sp] == TOP  */
+
+static void
+ppc64_emit_eq_goto (int *offset_p, int *size_p)
+{
+  EMIT_ASM ("ldu     4, 8(30)	\n"
+	    "cmpd    7, 4, 3	\n"
+	    "ldu     3, 8(30)	\n"
+	    "1:beq   7, 1b	\n");
+
+  if (offset_p)
+    *offset_p = 12;
+  if (size_p)
+    *size_p = 14;
+}
+
+/* Goto if stack[--sp] != TOP  */
+
+static void
+ppc64_emit_ne_goto (int *offset_p, int *size_p)
+{
+  EMIT_ASM ("ldu     4, 8(30)	\n"
+	    "cmpd    7, 4, 3	\n"
+	    "ldu     3, 8(30)	\n"
+	    "1:bne   7, 1b	\n");
+
+  if (offset_p)
+    *offset_p = 12;
+  if (size_p)
+    *size_p = 14;
+}
+
+/* Goto if stack[--sp] < TOP  */
+
+static void
+ppc64_emit_lt_goto (int *offset_p, int *size_p)
+{
+  EMIT_ASM ("ldu     4, 8(30)	\n"
+	    "cmpd    7, 4, 3	\n"
+	    "ldu     3, 8(30)	\n"
+	    "1:blt   7, 1b	\n");
+
+  if (offset_p)
+    *offset_p = 12;
+  if (size_p)
+    *size_p = 14;
+}
+
+/* Goto if stack[--sp] <= TOP  */
+
+static void
+ppc64_emit_le_goto (int *offset_p, int *size_p)
+{
+  EMIT_ASM ("ldu     4, 8(30)	\n"
+	    "cmpd    7, 4, 3	\n"
+	    "ldu     3, 8(30)	\n"
+	    "1:ble   7, 1b	\n");
+
+  if (offset_p)
+    *offset_p = 12;
+  if (size_p)
+    *size_p = 14;
+}
+
+/* Goto if stack[--sp] > TOP  */
+
+static void
+ppc64_emit_gt_goto (int *offset_p, int *size_p)
+{
+  EMIT_ASM ("ldu     4, 8(30)	\n"
+	    "cmpd    7, 4, 3	\n"
+	    "ldu     3, 8(30)	\n"
+	    "1:bgt   7, 1b	\n");
+
+  if (offset_p)
+    *offset_p = 12;
+  if (size_p)
+    *size_p = 14;
+}
+
+/* Goto if stack[--sp] >= TOP  */
+
+static void
+ppc64_emit_ge_goto (int *offset_p, int *size_p)
+{
+  EMIT_ASM ("ldu     4, 8(30)	\n"
+	    "cmpd    7, 4, 3	\n"
+	    "ldu     3, 8(30)	\n"
+	    "1:bge   7, 1b	\n");
+
+  if (offset_p)
+    *offset_p = 12;
+  if (size_p)
+    *size_p = 14;
+}
+
+/* Relocate previous emitted branch instruction.  FROM is the address
+   of the branch instruction, TO is the goto target address, and SIZE
+   if the value we set by *SIZE_P before.  Currently, it is either
+   24 or 14 of branch and conditional-branch instruction.  */
+
+static void
+ppc_write_goto_address (CORE_ADDR from, CORE_ADDR to, int size)
+{
+  int rel = to - from;
+  uint32_t insn;
+  int opcd;
+  unsigned char buf[4];
+
+  read_inferior_memory (from, buf, 4);
+  insn = get_i32 (buf);
+  opcd = (insn >> 26) & 0x3f;
+
+  switch (size)
+    {
+    case 14:
+      if (opcd != 16)
+	emit_error = 1;
+      insn = (insn & ~0xfffc) | (rel & 0xfffc);
+      break;
+    case 24:
+      if (opcd != 18)
+	emit_error = 1;
+      insn = (insn & ~0x3fffffc) | (rel & 0x3fffffc);
+      break;
+    default:
+      emit_error = 1;
+    }
+
+  put_i32 (buf, insn);
+  write_inferior_memory (from, buf, 4);
+}
+
+/* Vector of emit ops for PowerPC64.  */
+
+static struct emit_ops ppc64_emit_ops_vector =
+{
+  ppc64_emit_prologue,
+  ppc64_emit_epilogue,
+  ppc64_emit_add,
+  ppc64_emit_sub,
+  ppc64_emit_mul,
+  ppc64_emit_lsh,
+  ppc64_emit_rsh_signed,
+  ppc64_emit_rsh_unsigned,
+  ppc64_emit_ext,
+  ppc64_emit_log_not,
+  ppc64_emit_bit_and,
+  ppc64_emit_bit_or,
+  ppc64_emit_bit_xor,
+  ppc64_emit_bit_not,
+  ppc64_emit_equal,
+  ppc64_emit_less_signed,
+  ppc64_emit_less_unsigned,
+  ppc64_emit_ref,
+  ppc64_emit_if_goto,
+  ppc64_emit_goto,
+  ppc_write_goto_address,
+  ppc64_emit_const,
+  ppc64_emit_call,
+  ppc64_emit_reg,
+  ppc64_emit_pop,
+  ppc64_emit_stack_flush,
+  ppc64_emit_zero_ext,
+  ppc64_emit_swap,
+  ppc64_emit_stack_adjust,
+  ppc64_emit_int_call_1,
+  ppc64_emit_void_call_2,
+  ppc64_emit_eq_goto,
+  ppc64_emit_ne_goto,
+  ppc64_emit_lt_goto,
+  ppc64_emit_le_goto,
+  ppc64_emit_gt_goto,
+  ppc64_emit_ge_goto
+};
+
+/*  Implementation of emit_ops target ops.   */
+
+__attribute__ ((unused))
+static struct emit_ops *
+ppc_emit_ops (void)
+{
+  return &ppc64_emit_ops_vector;
+}
+#endif
+
+/* Returns true for supporting range-stepping.  */
+
+static int
+ppc_supports_range_stepping (void)
+{
+  return 1;
+}
+
  /* Provide only a fill function for the general register set.  ps_lgetregs
     will use this for NPTL support.  */

@@ -687,16 +1926,31 @@ struct linux_target_ops the_low_target = {
    ppc_set_pc,
    (const unsigned char *) &ppc_breakpoint,
    ppc_breakpoint_len,
-  NULL,
-  0,
+  NULL, /* breakpoint_reinsert_addr */
+  0, /* decr_pc_after_break */
    ppc_breakpoint_at,
-  NULL, /* supports_z_point_type */
-  NULL,
-  NULL,
-  NULL,
-  NULL,
+  ppc_supports_z_point_type, /* supports_z_point_type */
+  ppc_insert_point,
+  ppc_remove_point,
+  NULL, /* stopped_by_watchpoint */
+  NULL, /* stopped_data_address */
    ppc_collect_ptrace_register,
    ppc_supply_ptrace_register,
+  NULL, /* siginfo_fixup */
+  NULL, /* linux_new_process */
+  NULL, /* linux_new_thread */
+  NULL, /* linux_prepare_to_resume */
+  NULL, /* linux_process_qsupported */
+  ppc_supports_tracepoints,
+  NULL, /* get_thread_area */
+  ppc_install_fast_tracepoint_jump_pad,
+#if __PPC64__
+  ppc_emit_ops,
+#else
+  NULL, /* Use interpreter for ppc32.  */
+#endif
+  ppc_get_min_fast_tracepoint_insn_len,
+  ppc_supports_range_stepping,
  };

  void
diff --git a/gdb/rs6000-tdep.c b/gdb/rs6000-tdep.c
index ef94bba..dc27cfb 100644
--- a/gdb/rs6000-tdep.c
+++ b/gdb/rs6000-tdep.c
@@ -83,6 +83,9 @@
  #include "features/rs6000/powerpc-e500.c"
  #include "features/rs6000/rs6000.c"

+#include "ax.h"
+#include "ax-gdb.h"
+
  /* Determine if regnum is an SPE pseudo-register.  */
  #define IS_SPE_PSEUDOREG(tdep, regnum) ((tdep)->ppc_ev0_regnum >= 0 \
      && (regnum) >= (tdep)->ppc_ev0_regnum \
@@ -966,6 +969,21 @@ rs6000_breakpoint_from_pc (struct gdbarch *gdbarch, CORE_ADDR *bp_addr,
      return little_breakpoint;
  }

+/* Return true if ADDR is a valid address for tracepoint.  Set *ISZIE
+   to the number of bytes the target should copy elsewhere for the
+   tracepoint.  */
+
+static int
+ppc_fast_tracepoint_valid_at (struct gdbarch *gdbarch,
+			      CORE_ADDR addr, int *isize, char **msg)
+{
+  if (isize)
+    *isize = gdbarch_max_insn_length (gdbarch);
+  if (msg)
+    *msg = NULL;
+  return 1;
+}
+
  /* Instruction masks for displaced stepping.  */
  #define BRANCH_MASK 0xfc000000
  #define BP_MASK 0xFC0007FE
@@ -3679,6 +3697,8 @@ bfd_uses_spe_extensions (bfd *abfd)
  #define PPC_LK(insn)	PPC_BIT (insn, 31)
  #define PPC_TX(insn)	PPC_BIT (insn, 31)
  #define PPC_LEV(insn)	PPC_FIELD (insn, 20, 7)
+#define PPC_LI(insn)	(PPC_SEXT (PPC_FIELD (insn, 6, 24), 24) << 2)
+#define PPC_BD(insn)	(PPC_SEXT (PPC_FIELD (insn, 16, 14), 14) << 2)

  #define PPC_XT(insn)	((PPC_TX (insn) << 5) | PPC_T (insn))
  #define PPC_XER_NB(xer)	(xer & 0x7f)
@@ -5332,6 +5352,146 @@ UNKNOWN_OP:
    return 0;
  }

+/* Copy the instruction from OLDLOC to *TO, and update *TO to *TO + size
+   of instruction.  This function is used to adjust pc-relative instructions
+   when copying.  */
+
+static void
+ppc_relocate_instruction (struct gdbarch *gdbarch,
+			  CORE_ADDR *to, CORE_ADDR oldloc)
+{
+  struct gdbarch_tdep *tdep = gdbarch_tdep (gdbarch);
+  enum bfd_endian byte_order = gdbarch_byte_order (gdbarch);
+  uint32_t insn;
+  int op6, rel, newrel;
+
+  insn = read_memory_unsigned_integer (oldloc, 4, byte_order);
+  op6 = PPC_OP6 (insn);
+
+  if (op6 == 18 && (insn & 2) == 0)
+    {
+      /* branch && AA = 0 */
+      rel = PPC_LI (insn);
+      newrel = (oldloc - *to) + rel;
+
+      /* Out of range. Cannot relocate instruction.  */
+      if (newrel >= (1 << 25) || newrel < -(1 << 25))
+	return;
+
+      insn = (insn & ~0x3fffffc) | (newrel & 0x3fffffc);
+    }
+  else if (op6 == 16 && (insn & 2) == 0)
+    {
+      /* conditional branch && AA = 0 */
+
+      rel = PPC_BD (insn);
+      newrel = (oldloc - *to) + rel;
+
+      if (newrel >= (1 << 25) || newrel < -(1 << 25))
+	return;
+
+      newrel -= 4;
+      if (newrel >= (1 << 15) || newrel < -(1 << 15))
+	{
+	   /* The offset of to big for conditional-branch (16-bit).
+	      Try to invert the condition and jump with 26-bit branch.
+	      For example,
+
+		beq  .Lgoto
+		INSN1
+
+	      =>
+
+		bne  1f
+		b    .Lgoto
+	      1:INSN1
+
+	    */
+
+	   /* Check whether BO is 001at or 011 at.  */
+	   if ((PPC_BO (insn) & 0x14) != 0x4)
+	     return;
+
+	   /* Invert condition.  */
+	   insn ^= (1 << 24);
+	   /* Jump over the unconditional branch.  */
+	   insn = (insn & ~0xfffc) | 0x8;
+	   write_memory_unsigned_integer (*to, 4, byte_order, insn);
+	   *to += 4;
+
+	   /* Copy LK bit.  */
+	   insn = (18 << 26) | (0x3fffffc & newrel) | (insn & 0x3);
+	   write_memory_unsigned_integer (*to, 4, byte_order, insn);
+	   *to += 4;
+
+	   return;
+	}
+      else
+	insn = (insn & ~0xfffc) | (newrel & 0xfffc);
+    }
+
+  write_memory_unsigned_integer (*to, 4, byte_order, insn);
+  *to += 4;
+}
+
+/* Implement gdbarch_gen_return_address.  Generate a bytecode expression
+   to get the value of the saved PC.  SCOPE is the address we want to
+   get return address for.  SCOPE maybe in the middle of a function.  */
+
+static void
+ppc_gen_return_address (struct gdbarch *gdbarch,
+			struct agent_expr *ax, struct axs_value *value,
+			CORE_ADDR scope)
+{
+  struct rs6000_framedata frame;
+  CORE_ADDR func_addr;
+
+  /* Try to find the start of the function and analyze the prologue.  */
+  if (find_pc_partial_function (scope, NULL, &func_addr, NULL))
+    {
+      skip_prologue (gdbarch, func_addr, scope, &frame);
+
+      if (frame.lr_offset == 0)
+	{
+	  value->type = register_type (gdbarch, PPC_LR_REGNUM);
+	  value->kind = axs_lvalue_register;
+	  value->u.reg = PPC_LR_REGNUM;
+	  return;
+	}
+    }
+  else
+    {
+      /* If we don't where the function starts, we cannot analyze it.
+	 Assuming it's not a leaf function, not frameless, and LR is
+	 saved at back-chain + 16.  */
+
+      frame.frameless = 0;
+      frame.lr_offset = 16;
+    }
+
+  /* if (frameless)
+       load 16(SP)
+     else
+       BC = 0(SP)
+       load 16(BC) */
+
+  ax_reg (ax, gdbarch_sp_regnum (gdbarch));
+
+  /* Load back-chain.  */
+  if (!frame.frameless)
+    {
+      if (register_size (gdbarch, PPC_LR_REGNUM) == 8)
+	ax_simple (ax, aop_ref64);
+      else
+	ax_simple (ax, aop_ref32);
+    }
+
+  ax_const_l (ax, frame.lr_offset);
+  ax_simple (ax, aop_add);
+  value->type = register_type (gdbarch, PPC_LR_REGNUM);
+  value->kind = axs_lvalue_memory;
+}
+
  /* Initialize the current architecture based on INFO.  If possible, re-use an
     architecture from ARCHES, which is a list of architectures already created
     during this debugging session.
@@ -5892,6 +6052,7 @@ rs6000_gdbarch_init (struct gdbarch_info info, struct gdbarch_list *arches)

    set_gdbarch_inner_than (gdbarch, core_addr_lessthan);
    set_gdbarch_breakpoint_from_pc (gdbarch, rs6000_breakpoint_from_pc);
+  set_gdbarch_fast_tracepoint_valid_at (gdbarch, ppc_fast_tracepoint_valid_at);

    /* The value of symbols of type N_SO and N_FUN maybe null when
       it shouldn't be.  */
@@ -5929,6 +6090,9 @@ rs6000_gdbarch_init (struct gdbarch_info info, struct gdbarch_list *arches)
    set_gdbarch_displaced_step_location (gdbarch,
  				       displaced_step_at_entry_point);

+  set_gdbarch_relocate_instruction (gdbarch, ppc_relocate_instruction);
+  set_gdbarch_gen_return_address (gdbarch, ppc_gen_return_address);
+
    set_gdbarch_max_insn_length (gdbarch, PPC_INSN_SIZE);

    /* Hook in ABI-specific overrides, if they have been registered.  */
diff --git a/gdb/testsuite/gdb.trace/backtrace.exp b/gdb/testsuite/gdb.trace/backtrace.exp
index 045778e..3094074 100644
--- a/gdb/testsuite/gdb.trace/backtrace.exp
+++ b/gdb/testsuite/gdb.trace/backtrace.exp
@@ -146,6 +146,9 @@ if [is_amd64_regs_target] {
  } elseif [is_x86_like_target] {
      set fpreg "\$ebp"
      set spreg "\$esp"
+} elseif [istarget "powerpc*-*-*"] {
+    set fpreg "\$r31"
+    set spreg "\$r1"
  } else {
      set fpreg "\$fp"
      set spreg "\$sp"
diff --git a/gdb/testsuite/gdb.trace/change-loc.h b/gdb/testsuite/gdb.trace/change-loc.h
index e8e2e86..8efe12d 100644
--- a/gdb/testsuite/gdb.trace/change-loc.h
+++ b/gdb/testsuite/gdb.trace/change-loc.h
@@ -36,6 +36,8 @@ func4 (void)
         SYMBOL(set_tracepoint) ":\n"
  #if (defined __x86_64__ || defined __i386__)
         "    call " SYMBOL(func5) "\n"
+#elif (defined __PPC64__ || defined __PPC__)
+       "    nop\n"
  #endif
         );

diff --git a/gdb/testsuite/gdb.trace/collection.exp b/gdb/testsuite/gdb.trace/collection.exp
index bd42cfa..ed562c9 100644
--- a/gdb/testsuite/gdb.trace/collection.exp
+++ b/gdb/testsuite/gdb.trace/collection.exp
@@ -44,6 +44,10 @@ if [is_amd64_regs_target] {
      set fpreg "ebp"
      set spreg "esp"
      set pcreg "eip"
+} elseif [istarget "powerpc*-*-*"] {
+    set fpreg "r31"
+    set spreg "r1"
+    set pcreg "pc"
  } else {
      set fpreg "fp"
      set spreg "sp"
diff --git a/gdb/testsuite/gdb.trace/entry-values.exp b/gdb/testsuite/gdb.trace/entry-values.exp
index 0cf5615..f9928f1 100644
--- a/gdb/testsuite/gdb.trace/entry-values.exp
+++ b/gdb/testsuite/gdb.trace/entry-values.exp
@@ -218,6 +218,8 @@ if [is_amd64_regs_target] {
      set spreg "\$rsp"
  } elseif [is_x86_like_target] {
      set spreg "\$esp"
+} elseif [istarget "powerpc*-*-*"] {
+    set spreg "\$r1"
  } else {
      set spreg "\$sp"
  }
diff --git a/gdb/testsuite/gdb.trace/ftrace.c b/gdb/testsuite/gdb.trace/ftrace.c
index f522e6f..e509c7b 100644
--- a/gdb/testsuite/gdb.trace/ftrace.c
+++ b/gdb/testsuite/gdb.trace/ftrace.c
@@ -42,6 +42,8 @@ marker (int anarg)
         SYMBOL(set_point) ":\n"
  #if (defined __x86_64__ || defined __i386__)
         "    call " SYMBOL(func) "\n"
+#elif (defined __PPC64__ || defined __PPC__)
+       "    nop\n"
  #endif
         );

@@ -53,6 +55,8 @@ marker (int anarg)
         SYMBOL(four_byter) ":\n"
  #if (defined __i386__)
         "    cmpl $0x1,0x8(%ebp) \n"
+#elif (defined __PPC64__ || defined __PPC__)
+       "    nop\n"
  #endif
         );
  }
diff --git a/gdb/testsuite/gdb.trace/ftrace.exp b/gdb/testsuite/gdb.trace/ftrace.exp
index f2d8002..2cc7464 100644
--- a/gdb/testsuite/gdb.trace/ftrace.exp
+++ b/gdb/testsuite/gdb.trace/ftrace.exp
@@ -84,7 +84,8 @@ proc test_fast_tracepoints {} {

      gdb_test "print gdb_agent_gdb_trampoline_buffer_error" ".*" ""

-    if { [istarget "x86_64-*-*"] || [istarget "i\[34567\]86-*-*"] } {
+    if { [istarget "x86_64-*-*"] || [istarget "i\[34567\]86-*-*"] \
+	 || [istarget "powerpc*-*-*"] } {

  	gdb_test "ftrace set_point" "Fast tracepoint .*" \
  	    "fast tracepoint at a long insn"
@@ -178,6 +179,36 @@ proc test_fast_tracepoints {} {
      }
  }

+proc test_ftrace_condition { condexp list } \
+{ with_test_prefix "ond $condexp" \
+{
+    global executable
+    global hex
+
+    clean_restart ${executable}
+    if ![runto_main] {
+	fail "Can't run to main to check for trace support"
+	return -1
+    }
+
+    gdb_test "break end" ".*" ""
+    gdb_test "ftrace set_point if $condexp" "Fast tracepoint .*"
+    gdb_trace_setactions "set action for tracepoint .*" "" \
+	"collect globvar" "^$"
+
+    gdb_test_no_output "tstart" ""
+    gdb_test "continue" "Continuing\\.\[ \r\n\]+Breakpoint.*" ""
+    gdb_test_no_output "tstop" ""
+
+    set i 0
+    foreach expval $list {
+	gdb_test "tfind" "Found trace frame $i, tracepoint .*" "tfind frame $i"
+	gdb_test "print globvar" "\\$\[0-9\]+ = $expval\[\r\n\]" "expect $expval"
+	set i [expr $i + 1]
+    }
+    gdb_test "tfind" "Target failed to find requested trace frame\."
+}}
+
  gdb_reinitialize_dir $srcdir/$subdir

  if { [gdb_test "info sharedlibrary" ".*${libipa}.*" "IPA loaded"] != 0 } {
@@ -186,3 +217,31 @@ if { [gdb_test "info sharedlibrary" ".*${libipa}.*" "IPA loaded"] != 0 } {
  }

  test_fast_tracepoints
+
+test_ftrace_condition "globvar > 7" { 8 9 10 }
+test_ftrace_condition "globvar < 4" { 1 2 3 }
+test_ftrace_condition "globvar >= 7" { 7 8 9 10 }
+test_ftrace_condition "globvar <= 4" { 1 2 3 4 }
+test_ftrace_condition "globvar == 5" { 5 }
+test_ftrace_condition "globvar != 5" { 1 2 3 4 6 7 8 9 10 }
+test_ftrace_condition "globvar > 3 && globvar < 7" { 4 5 6 }
+test_ftrace_condition "globvar < 3 || globvar > 7" { 1 2 8 9 10 }
+test_ftrace_condition "(globvar << 2) + 1 == 29" { 7 }
+test_ftrace_condition "(globvar >> 2) == 2" { 8 9 10 }
+
+# This expression is used for testing emit_reg.
+if [is_amd64_regs_target] {
+    set arg0exp "\$rdi"
+} elseif [is_x86_like_target] {
+    set arg0exp "*(int *) (\$ebp + 8)"
+} elseif [istarget "powerpc*-*-*"] {
+    set arg0exp "\$r3"
+} elseif [istarget "aarch64*"] {
+    set arg0exp "\$x0"
+} else {
+    set arg0exp ""
+}
+
+if { "$arg0exp" != "" } {
+    test_ftrace_condition "($arg0exp > 500)" { 6 7 8 9 10 }
+}
diff --git a/gdb/testsuite/gdb.trace/mi-trace-frame-collected.exp b/gdb/testsuite/gdb.trace/mi-trace-frame-collected.exp
index 51ed479..1df4d65 100644
--- a/gdb/testsuite/gdb.trace/mi-trace-frame-collected.exp
+++ b/gdb/testsuite/gdb.trace/mi-trace-frame-collected.exp
@@ -56,6 +56,8 @@ if [is_amd64_regs_target] {
      set pcreg "rip"
  } elseif [is_x86_like_target] {
      set pcreg "eip"
+} elseif [istarget "powerpc*-*-*"] {
+    set pcreg "pc"
  } else {
      # Other ports that support tracepoints should set the name of pc
      # register here.
diff --git a/gdb/testsuite/gdb.trace/mi-trace-unavailable.exp b/gdb/testsuite/gdb.trace/mi-trace-unavailable.exp
index 6b97d9d..1e6e541 100644
--- a/gdb/testsuite/gdb.trace/mi-trace-unavailable.exp
+++ b/gdb/testsuite/gdb.trace/mi-trace-unavailable.exp
@@ -135,6 +135,8 @@ proc test_trace_unavailable { data_source } {
  	    set pcnum 16
  	} elseif [is_x86_like_target] {
  	    set pcnum 8
+	} elseif [istarget "powerpc*-*-*"] {
+	    set pcnum 64
  	} else {
  	    # Other ports support tracepoint should define the number
  	    # of its own pc register.
diff --git a/gdb/testsuite/gdb.trace/pending.exp b/gdb/testsuite/gdb.trace/pending.exp
index 0399807..ed36cac 100644
--- a/gdb/testsuite/gdb.trace/pending.exp
+++ b/gdb/testsuite/gdb.trace/pending.exp
@@ -441,6 +441,8 @@ proc pending_tracepoint_with_action_resolved { trace_type } \
  	set pcreg "rip"
      } elseif [is_x86_like_target] {
  	set pcreg "eip"
+    } elseif [istarget "powerpc*-*-*"] {
+	set pcreg "pc"
      }

      gdb_trace_setactions "set action for pending tracepoint" "" \
diff --git a/gdb/testsuite/gdb.trace/pendshr1.c b/gdb/testsuite/gdb.trace/pendshr1.c
index d3b5463..2fd0fba 100644
--- a/gdb/testsuite/gdb.trace/pendshr1.c
+++ b/gdb/testsuite/gdb.trace/pendshr1.c
@@ -38,6 +38,8 @@ pendfunc (int x)
         SYMBOL(set_point1) ":\n"
  #if (defined __x86_64__ || defined __i386__)
         "    call " SYMBOL(pendfunc1) "\n"
+#elif (defined __PPC64__ || defined __PPC__)
+       "    nop\n"
  #endif
         );
  }
diff --git a/gdb/testsuite/gdb.trace/pendshr2.c b/gdb/testsuite/gdb.trace/pendshr2.c
index b8a51a5..3f40c76 100644
--- a/gdb/testsuite/gdb.trace/pendshr2.c
+++ b/gdb/testsuite/gdb.trace/pendshr2.c
@@ -35,6 +35,8 @@ pendfunc2 (int x)
         SYMBOL(set_point2) ":\n"
  #if (defined __x86_64__ || defined __i386__)
         "    call " SYMBOL(foo) "\n"
+#elif (defined __PPC64__ || defined __PPC__)
+       "    nop\n"
  #endif
         );
  }
diff --git a/gdb/testsuite/gdb.trace/range-stepping.c b/gdb/testsuite/gdb.trace/range-stepping.c
index 113f0e2..606db25 100644
--- a/gdb/testsuite/gdb.trace/range-stepping.c
+++ b/gdb/testsuite/gdb.trace/range-stepping.c
@@ -26,6 +26,8 @@
     tracepoint jump.  */
  #if (defined __x86_64__ || defined __i386__)
  #  define NOP "   .byte 0xe9,0x00,0x00,0x00,0x00\n" /* jmp $+5 (5-byte nop) */
+#elif (defined __PPC64__ || defined __PPC__)
+#  define NOP "    nop\n"
  #else
  #  define NOP "" /* port me */
  #endif
diff --git a/gdb/testsuite/gdb.trace/report.exp b/gdb/testsuite/gdb.trace/report.exp
index 2fa676b..e0160f7 100644
--- a/gdb/testsuite/gdb.trace/report.exp
+++ b/gdb/testsuite/gdb.trace/report.exp
@@ -158,6 +158,10 @@ if [is_amd64_regs_target] {
      set fpreg "ebp"
      set spreg "esp"
      set pcreg "eip"
+} elseif [istarget "powerpc*-*-*"] {
+    set fpreg "r31"
+    set spreg "r1"
+    set pcreg "pc"
  } else {
      set fpreg "fp"
      set spreg "sp"
diff --git a/gdb/testsuite/gdb.trace/trace-break.c b/gdb/testsuite/gdb.trace/trace-break.c
index f381ec6..ced0e92 100644
--- a/gdb/testsuite/gdb.trace/trace-break.c
+++ b/gdb/testsuite/gdb.trace/trace-break.c
@@ -41,6 +41,8 @@ marker (void)
         SYMBOL(set_point) ":\n"
  #if (defined __x86_64__ || defined __i386__)
         "    call " SYMBOL(func) "\n"
+#elif (defined __PPC64__ || defined __PPC__)
+       "    nop\n"
  #endif
         );

@@ -48,6 +50,8 @@ marker (void)
         SYMBOL(after_set_point) ":\n"
  #if (defined __x86_64__ || defined __i386__)
         "    call " SYMBOL(func) "\n"
+#elif (defined __PPC64__ || defined __PPC__)
+       "    nop\n"
  #endif
         );
  }
diff --git a/gdb/testsuite/gdb.trace/trace-break.exp b/gdb/testsuite/gdb.trace/trace-break.exp
index 4283ca6..9d6551a 100644
--- a/gdb/testsuite/gdb.trace/trace-break.exp
+++ b/gdb/testsuite/gdb.trace/trace-break.exp
@@ -49,6 +49,10 @@ if [is_amd64_regs_target] {
      set fpreg "ebp"
      set spreg "esp"
      set pcreg "eip"
+} elseif [istarget "powerpc*-*-*"] {
+    set fpreg "r31"
+    set spreg "r1"
+    set pcreg "pc"
  }

  # Set breakpoint and tracepoint at the same address.
diff --git a/gdb/testsuite/gdb.trace/trace-mt.c b/gdb/testsuite/gdb.trace/trace-mt.c
index 38aeff5..855de54 100644
--- a/gdb/testsuite/gdb.trace/trace-mt.c
+++ b/gdb/testsuite/gdb.trace/trace-mt.c
@@ -37,6 +37,8 @@ thread_function(void *arg)
         SYMBOL(set_point1) ":\n"
  #if (defined __x86_64__ || defined __i386__)
         "    call " SYMBOL(func) "\n"
+#elif (defined __PPC64__ || defined __PPC__)
+       "    nop\n"
  #endif
         );
  }
diff --git a/gdb/testsuite/gdb.trace/while-dyn.exp b/gdb/testsuite/gdb.trace/while-dyn.exp
index 198421e..ef92b2d 100644
--- a/gdb/testsuite/gdb.trace/while-dyn.exp
+++ b/gdb/testsuite/gdb.trace/while-dyn.exp
@@ -47,6 +47,8 @@ if [is_amd64_regs_target] {
      set fpreg "\$rbp"
  } elseif [is_x86_like_target] {
      set fpreg "\$ebp"
+} elseif [istarget "powerpc*-*-*"] {
+    set fpreg "\$r31"
  } else {
      set fpreg "\$fp"
  }
-- 
1.9.1


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/2] Fast tracepoint for powerpc64le
  2015-02-20 18:04 [PATCH 1/2] Fast tracepoint for powerpc64le Wei-cheng Wang
  2015-02-25 15:20 ` [PATCH 1/3 v2] " Wei-cheng Wang
@ 2015-02-27 19:53 ` Ulrich Weigand
  2015-03-01 17:42   ` Wei-cheng Wang
  2015-03-04 17:13   ` Pedro Alves
  2015-03-04 17:22 ` Pedro Alves
  2 siblings, 2 replies; 15+ messages in thread
From: Ulrich Weigand @ 2015-02-27 19:53 UTC (permalink / raw)
  To: Wei-cheng Wang, palves; +Cc: gdb-patches

Wei-cheng Wang wrote:

> These patches implement fast tracepoint for PowerPC64.
> 
> The first part includes required porting for PowerPC64 (and 32-bit) target.
> Including
> * Install fast tracepoint jump pad
> * Agent expression bytecode compilation for powerpc64 only.
>    For 32-bit, bytecode interpreter is used instead.
> * IPA (libinproctrace.so)
> * Implement required gdbarch hooks.
> * Enable tracepoint testing for powerpc.

Excellent!  Thanks for working on this.

I'm still looking at the actual patches, but let me reply right now to
the extra questions you raise.

Pedro, I'd also appreciate your comments on some of the gdbserver
tracepoint.c issues ...

> * collection.exp fails are DWARF issues.  x86 failed too.
>    (https://sourceware.org/bugzilla/show_bug.cgi?id=15081)
> * ftrace.exp: x86 has the same issue (KFAIL in x86)
>    (https://sourceware.org/bugzilla/show_bug.cgi?id=13808)
> * no-attach-trace.exp: x86 has the same issue.
>    Tracepoint is not supported when target is `exec'.
>    I think this should be XFAIL?
> * unavailable.exp: x86 has the same issue.

For the time being, I think it's fine to fail on Power if we fail on
x86 too.  Maybe should mark the test KFAIL in that case, however ...

> * tspeed.exp:  This case is used to test whether fast tracepoints
>    are *faster* than regular tracepoints.  The case itself uses
>    sys+user time to find a proper iteration count for measurement.
>    (quote: "Total test time should be between 2 and 5 seconds.")
>    However, in my environment, 2 seconds of sys+user time means
>    2 minutes wall clock, so this case failed due to timeout.

The tspeed.exp file already has:

# Typically we need a little extra time for this test.
set timeout 180

Is that still not enough?

> * entry-values fails: The casea try to backtrace at
>    an inline-asm-inserted symbol without debug information.
>    The prologue analyzer is confused.

OK.  Maybe this can be fixed by enhancing the analyzer (but that
can be done in a separate patch).

> * tfind.exp: One of the tracepoint is inserted at
>    `*gdb_recursion_test'.  It's not hit because local-entry is called
>    instead.  The 18 FAILs are off-by-one error.

That's really a testcase issue, we had similar problems with setting
breakpoints on "*func" on powerpc64le.  This patch contains examples
how I handled it for breakpoints:
https://sourceware.org/ml/gdb-patches/2014-01/msg01102.html

This test case seem a bit more complicated, we may need to split it
in two parts; one that uses a normal "trace gdb_recursion_test"
without the "*", and possibly a second one that specifically tests
that "trace *func" works, using a source file that makes sure to
call func via a function pointers (as in step-bt.c).

> The main reason why PowerPC64 big-endian doesn't work is
> calling convention (function descriptors) issue.
>    When installing a tracepoint in inferior memory, gdbserver
> asks the address of "gdb_collect" (and etc.) using qSymbol packet,
> and it generate a sequence of instructions to calling that address.
>    However, gdb-client "return the start of code instead of
> any data function descriptor."
>    See commenting in remote_check_symbols/remote.c,
> https://sourceware.org/ml/gdb-patches/2007-06/msg00389.html
> and gen_call() in this patch.
>    In order for powerpc64be to work, qSymbol packet should be
> extend for function descriptors.

This is annoying.  This was done to support libthread_db in gdbserver.
Unfortunately, there are a number of components involved here:
- To support debugging multi-threaded inferiors, gdbserver links
  against libthread_db (provided by glibc).
- At startup, gdbserver's thread_db_enable_reporting routine
  calls into libthread_db's td_ta_event_addr.
- td_ta_event_addr calls back into gdbserver's ps_pglobal_lookup
  to retrieve the address of __nptl_create_event in the inferior
- In order to implement ps_pglobal_lookup, gdbserver issues a
  qSymbol packet back to GDB.
- GDB looks the symbol up in the symbol table, and sends a result
  packet to gdbserver.
- ps_pglobal_lookup returns that address to td_ta_event_addr.
- td_ta_event_addr returns that address to thread_db_enable_reporting.
- thread_db_enable_reporting uses set_breakpoint_at to install a
  breakpoint at that address (i.e. the __nptl_create_event routine
  in the inferior).

Now, the equivalent action happens in GDB itself when debugging
natively.  Here, the equivalent enable_thread_event_reporting routine
calls td_ta_event_addr, which calls ps_pglobal_lookup, which does a
regular symbol lookup, and returns the address back.  Now it is
enable_thread_event_reporting itself that translates this address
into the function code address required to set a breakpoint at,
in helper routine enable_thread_event:

  /* Set up the breakpoint.  */
  gdb_assert (exec_bfd);
  (*bp) = (gdbarch_convert_from_func_ptr_addr
           (target_gdbarch (),
            /* Do proper sign extension for the target.  */
            (bfd_get_sign_extend_vma (exec_bfd) > 0
             ? (CORE_ADDR) (intptr_t) notify.u.bptaddr
             : (CORE_ADDR) (uintptr_t) notify.u.bptaddr),
            &current_target));
  create_thread_event_breakpoint (target_gdbarch (), *bp);

With gdbserver, however, remote.c already replies with the code
address to the qSymbol command, and gdbserver passes the code
address through td_ta_event_addr back to itself.

It's really not correct to have ps_pglobal_lookup return different
addresses (code vs. descriptor), depending on whether GDB or
gdbserver calls it.  It happens to work because libthread_db doesn't
really look at the address except for passing it through, but it
still all seems quite weird.

So I guess there's two ways to fix this.   One would be to change
gdbserver to work more like GDB here.  This would involve removing
the descriptor->code address conversion in remote.c, and instead
performing the conversion in gdbserver's thread_db_enable_reporting.
Now, there is no gdbarch_convert_from_func_ptr_addr in gdbserver,
so a similar mechanism would have to be invented there.  (I guess
this would mean a new target hook.)  Fortunately, the only platform
that uses function descriptors *and* supports libthread_db debugging
in gdbserver is ppc64-linux, so we'd only have to add that new
mechanim on this platform.

This has the advantage that qSymbol could now be used to lookup
function symbols and get the descriptor address as expected.
On the other hand, this would mean an incompatible change in the
remote protocol: if you used a new GDB together with an old
gdbserver (or vice versa), thread debugging would stop working.
However, I guess that could be fixed by having gdbserver request
the new behavior from GDB by specifying a feature code.  With old
GDBs gdbserver would have to skip the descriptor->code conversion.

The second alternative would be to extend qSymbol to support
returning two different types of addresses for function symbols:
the symbol value (i.e. function pointer value, i.e. descriptor
on PPC64), and a code address suitable to set a breakpoint on
function entry.  This could be either by having gdbserver
request one or the other via an additional flag on the qSymbol
request, or else by GDB simply always returning both values
in two fields.  Again, this would be an incompatible protocol
change that would need to be guarded by a qFeature check.

In this case, gdbserver would use the "normal" symbol values
for most purposes (e.g. tracepoint routine lookup), but would
use the "code address" values to return from ps_pglobal_lookup.
With old GDBs, gdbserver would disable fast tracepoint support
on powerpc64.

If we do that, then for consistency it might also be useful
to pass code addresses to ps_pglobal_lookup in GDB itself.
In addition, the "code address" could be changed to skip
the local entry point prolog on powerpc64le to ensure that
the breakpoint is set correctly.  (This does not matter in
practice since __nptl_create_event has no local entry point,
but it would seem more fully correct in general.)

Overall, the second alternative seems a bit better to me,
but I'd certainly appreciate feedback on this ...

> For powerpc32 to work, some data structure/function in tracepoint.c
> need to be fixed.  For example,
> 
> * write_inferior_data_ptr should be fixed for big-endian.
>    If sizeof (CORE_ADDR) is larger than sizeof (void*), zeros are written.
>    BTW, I thnink write_inferior_data_pointer provides the same functionality
>    without this issue.  I'm not sure why write_inferior_data_ptr is needed?

This is odd, I don't see the point of this either.   Of course, as the
comment says, much of this stuff will break anyway if gdbserver is
compiled differently than the inferior (e.g. a 64-bit gdbserver
debugging a 32-bit inferior), because it assumes the structure layout
is identical.  However, if we do have a 32-bit gdbserver, then I don't
see why it shouldn't be possible to debug a 32-bit inferior, just
because CORE_ADDR is a 64-bit type ...

> * Data structure layout between gdbserver and IPA is not consistent.
> 
>    There are two versions of tracepoint_action one for gdbserver,
>    and antoher for inferior (IPA side).
> 
>    -    struct tracepoint_action
>    |    {
>    |    #ifndef IN_PROCESS_AGENT
>    |      const struct tracepoint_action_ops *ops;
>    | -  #endif
>    | |    char type;
>    - -  };
> 
>    It is the base object for action objects.
> 
>    struct collect_memory_action
>    {
>      struct tracepoint_action base;  <--
>      {
>        const struct tracepoint_action_ops *ops;
>    -   char type;
>    | }
>    |
>    | ULONGEST addr;
>    | ULONGEST len;
>    - int32_t basereg;
>    };
> 
>    When gdbserver downloading the action object to inferior,
>    it copies the object from offsetof(type) to the end.
>    (See m_tracepoint_action_download/tracepoint.c for example)
>    Howevery, the object layouts may not be consistent between
>    the two versions (with or without ops fields.)
>    It depends the the alignment requirement of addr (first data member
>    after base object), and the padding of tracepoint_action.
> 
>    In this case, the distance from "type" to "addr" changes
> 
>       Wihtout ops           with ops
>       0   1   2   3         0   1   2   3
>     0 type| PADDING...    0 ops-------------|
>     4 ................    4 type|PADDING....|
>     8 addr------------    8 addr-------------
>     c ---------------|    c ----------------|
>    10 len-------------   10 len--------------
>    14 ---------------|   14 ----------------|
>    18 basereg--------|   18 basereg---------|

Ugh.  That's a strange construct, and extremely dependent on alignment
rules (as you noticed).  I'm not really sure what the best way to fix
this would be.  My preference right now would be get rid of "ops" on
the gdbserver side too, and just switch on "type" in the two places
where the ops->send and ops->download routines are called right now.

This makes the data structures the same on gdbserver and IPA, which
simplifies downloading quite a bit.  (Also, it keeps the data structure
identical in IPA, which should avoid compatibility issues between
versions.)

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  Ulrich.Weigand@de.ibm.com

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/2] Fast tracepoint for powerpc64le
  2015-02-27 19:53 ` [PATCH 1/2] " Ulrich Weigand
@ 2015-03-01 17:42   ` Wei-cheng Wang
  2015-03-17 13:48     ` Ulrich Weigand
  2015-03-04 17:13   ` Pedro Alves
  1 sibling, 1 reply; 15+ messages in thread
From: Wei-cheng Wang @ 2015-03-01 17:42 UTC (permalink / raw)
  To: Ulrich Weigand, palves; +Cc: gdb-patches

On 2015/2/28 ä¸Šåˆ 03:52, Ulrich Weigand wrote:
> The tspeed.exp file already has:
> # Typically we need a little extra time for this test.
> set timeout 180
> Is that still not enough?

It should include the time spent in trying different loop counts,
so it would be 11 + 22 + 45 + 90 + 180 = at least 348 seconds in my environment.
(for 10000, 20000, 40000, 80000, 160000 iterations respectively)
If I set timeout to 360, the case will pass.

>> * tfind.exp: One of the tracepoint is inserted at
>>     `*gdb_recursion_test'.  It's not hit because local-entry is called
>>     instead.  The 18 FAILs are off-by-one error.
> This test case seem a bit more complicated, we may need to split it
> in two parts; one that uses a normal "trace gdb_recursion_test"
> without the "*", and possibly a second one that specifically tests
> that "trace *func" works, using a source file that makes sure to
> call func via a function pointers (as in step-bt.c).

How about simply change the code to this?  It wouldn't hurt other cases.
And all the failed cases in tfind.exp now pass.

--- a/gdb/testsuite/gdb.trace/actions.c
+++ b/gdb/testsuite/gdb.trace/actions.c
@@ -46,6 +46,8 @@ static union GDB_UNION_TEST
  } gdb_union1_test;

  void gdb_recursion_test (int, int, int, int,  int,  int,  int);
+typedef void (*gdb_recursion_test_fp) (int, int, int, int,  int,  int,  int);
+gdb_recursion_test_fp gdb_recursion_test_ptr = gdb_recursion_test;

  void gdb_recursion_test (int depth,
                          int q1,
@@ -64,7 +66,7 @@ void gdb_recursion_test (int depth,
    q5 = q6;                                             /* gdbtestline 6 */
    q6 = q;                                              /* gdbtestline 7 */
    if (depth--)                                         /* gdbtestline 8 */
-    gdb_recursion_test (depth, q1, q2, q3, q4, q5, q6);        /* gdbtestline 9 */
+    gdb_recursion_test_ptr (depth, q1, q2, q3, q4, q5, q6);    /* gdbtestline 9 */
  }


@@ -103,7 +105,7 @@ unsigned long   gdb_c_test( unsigned long *parm )
     gdb_structp_test      = &gdb_struct1_test;
     gdb_structpp_test     = &gdb_structp_test;

-   gdb_recursion_test (3, (long) parm[1], (long) parm[2], (long) parm[3],
+   gdb_recursion_test_ptr (3, (long) parm[1], (long) parm[2], (long) parm[3],
                        (long) parm[4], (long) parm[5], (long) parm[6]);

>> For powerpc32 to work, some data structure/function in tracepoint.c
>> need to be fixed.  For example,
>> * write_inferior_data_ptr should be fixed for big-endian.
>>     If sizeof (CORE_ADDR) is larger than sizeof (void*), zeros are written.
>>     BTW, I thnink write_inferior_data_pointer provides the same functionality
>>     without this issue.  I'm not sure why write_inferior_data_ptr is needed?
> This is odd, I don't see the point of this either.   Of course, as the
> comment says, much of this stuff will break anyway if gdbserver is
> compiled differently than the inferior (e.g. a 64-bit gdbserver
> debugging a 32-bit inferior), because it assumes the structure layout
> is identical.  However, if we do have a 32-bit gdbserver, then I don't
> see why it shouldn't be possible to debug a 32-bit inferior, just
> because CORE_ADDR is a 64-bit type ...

For example, CORE_ADDR ptr = 0x11223344, a 32-bit address,
and sizeof (void *) = 4-byte

   |------------ 64-bit CORE_ADDR ---------|
   MSB                                    LSB
   | 00 | 00 | 00 | 00 | 11 | 22 | 33 | 44 |
   Low                                    High Address
   |-- 32-bit(void*) --|
   &ptr,4 means these zeros are written to inferior.

static int
write_inferior_data_ptr (CORE_ADDR where, CORE_ADDR ptr)
{
   return write_inferior_memory (where,
                                 (unsigned char *) &ptr, sizeof (void *));
                                                   ^^^^  ^^^^^^^^^^^^^^^
}

CORE_ADDR is declared as "unsigned long long" for gdbserver
(in common/gdb/common-types.h)

> Ugh.  That's a strange construct, and extremely dependent on alignment
> rules (as you noticed).  I'm not really sure what the best way to fix
> this would be.  My preference right now would be get rid of "ops" on
> the gdbserver side too, and just switch on "type" in the two places
> where the ops->send and ops->download routines are called right now.
>
> This makes the data structures the same on gdbserver and IPA, which
> simplifies downloading quite a bit.  (Also, it keeps the data structure
> identical in IPA, which should avoid compatibility issues between
> versions.)
   That sounds great to me!

Thanks
Wei-cheng,

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/2] Fast tracepoint for powerpc64le
  2015-02-27 19:53 ` [PATCH 1/2] " Ulrich Weigand
  2015-03-01 17:42   ` Wei-cheng Wang
@ 2015-03-04 17:13   ` Pedro Alves
  2015-03-17 18:12     ` Ulrich Weigand
  1 sibling, 1 reply; 15+ messages in thread
From: Pedro Alves @ 2015-03-04 17:13 UTC (permalink / raw)
  To: Ulrich Weigand, Wei-cheng Wang; +Cc: gdb-patches

On 02/27/2015 07:52 PM, Ulrich Weigand wrote:

>> The main reason why PowerPC64 big-endian doesn't work is
>> calling convention (function descriptors) issue.
>>    When installing a tracepoint in inferior memory, gdbserver
>> asks the address of "gdb_collect" (and etc.) using qSymbol packet,
>> and it generate a sequence of instructions to calling that address.
>>    However, gdb-client "return the start of code instead of
>> any data function descriptor."
>>    See commenting in remote_check_symbols/remote.c,
>> https://sourceware.org/ml/gdb-patches/2007-06/msg00389.html
>> and gen_call() in this patch.
>>    In order for powerpc64be to work, qSymbol packet should be
>> extend for function descriptors.

> So I guess there's two ways to fix this.   One would be to change
> gdbserver to work more like GDB here.  This would involve removing
> the descriptor->code address conversion in remote.c, and instead
> performing the conversion in gdbserver's thread_db_enable_reporting.
> Now, there is no gdbarch_convert_from_func_ptr_addr in gdbserver,
> so a similar mechanism would have to be invented there.  (I guess
> this would mean a new target hook.)  Fortunately, the only platform
> that uses function descriptors *and* supports libthread_db debugging
> in gdbserver is ppc64-linux, so we'd only have to add that new
> mechanim on this platform.

Note sure about this one, ppc64_convert_from_func_ptr_addr wants to
get at the bfd/binary's unrelocated sections.  We'd have to teach
gdbserver to read the binary.

> 
> This has the advantage that qSymbol could now be used to lookup
> function symbols and get the descriptor address as expected.
> On the other hand, this would mean an incompatible change in the
> remote protocol: if you used a new GDB together with an old
> gdbserver (or vice versa), thread debugging would stop working.
> However, I guess that could be fixed by having gdbserver request
> the new behavior from GDB by specifying a feature code.  With old
> GDBs gdbserver would have to skip the descriptor->code conversion.
> 
> 
> The second alternative would be to extend qSymbol to support
> returning two different types of addresses for function symbols:
> the symbol value (i.e. function pointer value, i.e. descriptor
> on PPC64), and a code address suitable to set a breakpoint on
> function entry.  This could be either by having gdbserver
> request one or the other via an additional flag on the qSymbol
> request, or else by GDB simply always returning both values
> in two fields.  Again, this would be an incompatible protocol
> change that would need to be guarded by a qFeature check.
> 
> In this case, gdbserver would use the "normal" symbol values
> for most purposes (e.g. tracepoint routine lookup), but would
> use the "code address" values to return from ps_pglobal_lookup.
> With old GDBs, gdbserver would disable fast tracepoint support
> on powerpc64.
> 
> If we do that, then for consistency it might also be useful
> to pass code addresses to ps_pglobal_lookup in GDB itself.
> In addition, the "code address" could be changed to skip
> the local entry point prolog on powerpc64le to ensure that
> the breakpoint is set correctly.  (This does not matter in
> practice since __nptl_create_event has no local entry point,
> but it would seem more fully correct in general.)
> 
> 
> Overall, the second alternative seems a bit better to me,
> but I'd certainly appreciate feedback on this ...

I inclined for the second alternative as well.

(Note for testing: __nptl_create_event will only be used
on old kernels without PTRACE_EVENT_CLONE, unless you hack the
code to force usage.)

> 
> 
>> For powerpc32 to work, some data structure/function in tracepoint.c
>> need to be fixed.  For example,
>>
>> * write_inferior_data_ptr should be fixed for big-endian.
>>    If sizeof (CORE_ADDR) is larger than sizeof (void*), zeros are written.
>>    BTW, I thnink write_inferior_data_pointer provides the same functionality
>>    without this issue.  I'm not sure why write_inferior_data_ptr is needed?
> 
> This is odd, I don't see the point of this either.

Yeah, probably I just missed merging them fully while cleaning up the
initial contribution...  I agree we should just use
write_inferior_data_pointer.

>> * Data structure layout between gdbserver and IPA is not consistent.
>>
>>    There are two versions of tracepoint_action one for gdbserver,
>>    and antoher for inferior (IPA side).
>>

...

> Ugh.  That's a strange construct, and extremely dependent on alignment
> rules (as you noticed).  I'm not really sure what the best way to fix
> this would be.  My preference right now would be get rid of "ops" on
> the gdbserver side too, and just switch on "type" in the two places
> where the ops->send and ops->download routines are called right now.
> 
> This makes the data structures the same on gdbserver and IPA, which
> simplifies downloading quite a bit.  (Also, it keeps the data structure
> identical in IPA, which should avoid compatibility issues between
> versions.)

That sounds fine.

Thanks,
Pedro Alves

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/2] Fast tracepoint for powerpc64le
  2015-02-20 18:04 [PATCH 1/2] Fast tracepoint for powerpc64le Wei-cheng Wang
  2015-02-25 15:20 ` [PATCH 1/3 v2] " Wei-cheng Wang
  2015-02-27 19:53 ` [PATCH 1/2] " Ulrich Weigand
@ 2015-03-04 17:22 ` Pedro Alves
  2 siblings, 0 replies; 15+ messages in thread
From: Pedro Alves @ 2015-03-04 17:22 UTC (permalink / raw)
  To: Wei-cheng Wang, uweigand, gdb-patches

On 02/20/2015 06:04 PM, Wei-cheng Wang wrote:
> +/* Return the value of register REGNUM.  RAW_REGS is collected buffer
> +   by jump pad.  This function is called by emit_reg.  */
> +
> +ULONGEST __attribute__ ((visibility("default"), used))
> +gdb_agent_get_raw_reg (const unsigned char *raw_regs, int regnum)
> +{
> +  if (regnum >= PPC_NUM_FT_COLLECT_GREGS)

Meanwhile several C++ patches landed which changed how this
function should be declared.  Please make this:

 IP_AGENT_EXPORT_FUNC ULONGEST
 gdb_agent_get_raw_reg (const unsigned char *raw_regs, int regnum)
 {

I think it'd be good to split out the changes that make
ppc gdbserver do Z0 packets too.

(nit: it'd make it easier to identify the different patches in
the series if they had different subjects, which identified their
actual contents.  That's ideal for the subjects of the git commits
too, so best do that when submitting the patches already.)

Thanks for working on this!

Pedro Alves

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/3 v2] Fast tracepoint for powerpc64le
  2015-02-25 15:20 ` [PATCH 1/3 v2] " Wei-cheng Wang
@ 2015-03-17 13:34   ` Ulrich Weigand
  2015-03-29 19:27     ` Wei-cheng Wang
  0 siblings, 1 reply; 15+ messages in thread
From: Ulrich Weigand @ 2015-03-17 13:34 UTC (permalink / raw)
  To: Wei-cheng Wang; +Cc: gdb-patches

Wei-cheng Wang wrote:

>I just found my mail client the it to the wrong address.
>Here are some detailed explanation in my previous mail,
>in cases you've not read yet.
>https://sourceware.org/ml/gdb-patches/2015-02/msg00604.html
>https://sourceware.org/ml/gdb-patches/2015-02/msg00605.html

Sorry for the late reply; I didn't find the time to do a
thorough review before now.   Thanks again for working on
this feature.  In general, the patch is looking good; I do
have a couple of comments below.

See also Pedro's comments on the patch here:
https://sourceware.org/ml/gdb-patches/2015-03/msg00131.html

I'll follow up on the outstanding questions in the other
patches shortly.


>2. Add testcases for bytecode compilation in ftrace.exp
>    It is used to testing various emit_OP functions.

Adding additional tests is good, but should be done as a separate patch
(can be done before the main ppc support patch).

>diff --git a/gdb/gdbserver/linux-ppc-low.c b/gdb/gdbserver/linux-ppc-low.c
>index 188fac0..0b47543 100644
>--- a/gdb/gdbserver/linux-ppc-low.c
>+++ b/gdb/gdbserver/linux-ppc-low.c
>
>
>+/* Put a 32-bit INSN instruction in BUF in target endian.  */
>+
>+static int
>+put_i32 (unsigned char *buf, uint32_t insn)
>+{
>+  if (__BYTE_ORDER == __LITTLE_ENDIAN)
>+    {
>+      buf[3] = (insn >> 24) & 0xff;
>+      buf[2] = (insn >> 16) & 0xff;
>+      buf[1] = (insn >> 8) & 0xff;
>+      buf[0] = insn & 0xff;
>+    }
>+  else
>+    {
>+      buf[0] = (insn >> 24) & 0xff;
>+      buf[1] = (insn >> 16) & 0xff;
>+      buf[2] = (insn >> 8) & 0xff;
>+      buf[3] = insn & 0xff;
>+    }
>+
>+  return 4;
>+}

This seems a bit overkill -- this is gdbserver code, which always
runs in the same endianness as the inferior.   So this could be
done via a simple copy.   (In order to avoid aliasing violations,
the copy should be done via memcpy -- which the compiler will
optimize away --, or even better, the type of buf could be changed
to uint32_t throughout, since all instructions are 4 bytes.)

Returning "number of bytes" from all these routines is likewise a
bit odd on PowerPC.  (Obviously, it makes sense on Intel, which is
where you've probably copied it from.)

Maybe all the GEN_ routines should just return an uint32_t instruction
on PowerPC, which the user could then place into the buffer via e.g.
   *p++ = GEN_... 
(if p is a uint32_t *)?

>+/* Generate a ds-form instruction in BUF and return the number of bytes written
>+
>+   0      6     11   16          30 32
>+   | OPCD | RST | RA |     DS    |XO|  */
>+
>+__attribute__((unused)) /* Maybe unused due to conditional compilation.  */
>+static int
>+gen_ds_form (unsigned char *buf, int opcd, int rst, int ra, int ds, int xo)
>+{
>+  uint32_t insn = opcd << 26;
>+
>+  insn |= (rst << 21) | (ra << 16) | (ds & 0xfffc) | (xo & 0x3);

Maybe mask off excess bits of rst and rs here too?  Just to make sure
you don't get completely random instructions if the macro is used
incorrectly?   Or just assert the values are in range?  (Similarly
with the other gen_ routines.)

>+#define GEN_LWARX(buf, rt, ra, rb)	gen_x_form (buf, 31, rt, ra, rb, 20, 0)
Depending on which synchronization primitives are needed, we might want
to expose the EH flag.

>+/* Generate a md-form instruction in BUF and return the number of bytes written.
>+
>+   0      6    11   16   21   27   30 31 32
>+   | OPCD | RS | RA | sh | mb | XO |sh|Rc|  */
>+
>+static int
>+gen_md_form (unsigned char *buf, int opcd, int rs, int ra, int sh, int mb,
>+	     int xo, int rc)
>+{
>+  uint32_t insn = opcd << 26;
>+  unsigned int n = ((mb & 0x1f) << 1) | ((mb >> 5) & 0x1);
>+  unsigned int sh0_4 = sh & 0x1f;
>+  unsigned int sh5 = (sh >> 5) & 1;
>+
>+  insn |= (rs << 21) | (ra << 16) | (sh0_4 << 11) | (n << 5) | (sh5 << 1)
>+	  | (xo << 2);

"rc" is missing here.  (Doesn't matter right now, but should still be
fixed.)

>+/* Generate a sequence of instructions to load IMM in the register REG.
>+   Write the instructions in BUF and return the number of bytes written.  */
>+
>+static int
>+gen_limm (unsigned char *buf, int reg, uint64_t imm)
>+{
>+  unsigned char *p = buf;
>+
>+  if ((imm >> 8) == 0)
>+    {
>+      /* li	reg, imm[7:0] */
>+      p += GEN_LI (p, reg, imm);

Actually, you can load values up to 32767 with a single LI.

>+    }
>+  else if ((imm >> 16) == 0)
>+    {
>+      /* li	reg, 0
>+	 ori	reg, reg, imm[15:0] */
>+      p += GEN_LI (p, reg, 0);
>+      p += GEN_ORI (p, reg, reg, imm);
>+    }
>+  else if ((imm >> 32) == 0)
>+    {
>+      /* lis	reg, imm[31:16]
>+	 ori	reg, reg, imm[15:0]
>+	 rldicl	reg, reg, 0, 32 */
>+      p += GEN_LIS (p, reg, (imm >> 16) & 0xffff);
>+      p += GEN_ORI (p, reg, reg, imm & 0xffff);
>+      p += GEN_RLDICL (p, reg, reg, 0, 32);

You really need the rldicl only if the top bit
was set; otherwise, lis already zeros out the
high bits.

>+    }
>+  else
>+    {
>+      /* lis    reg, <imm[63:48]>
>+	 ori    reg, reg, <imm[48:32]>
>+	 rldicr reg, reg, 32, 31
>+	 oris   reg, reg, <imm[31:16]>
>+	 ori    reg, reg, <imm[15:0]> */
>+      p += GEN_LIS (p, reg, ((imm >> 48) & 0xffff));
>+      p += GEN_ORI (p, reg, reg, ((imm >> 32) & 0xffff));
>+      p += GEN_RLDICR (p, reg, reg, 32, 31);
>+      p += GEN_ORIS (p, reg, reg, ((imm >> 16) & 0xffff));
>+      p += GEN_ORI (p, reg, reg, (imm & 0xffff));
>+    }
>+
>+  return p - buf;
>+}


>+/* Generate a sequence for atomically exchange at location LOCK.
>+   This code sequence clobbers r6, r7, r8, r9.  */
>+
>+static int
>+gen_atomic_xchg (unsigned char *buf, CORE_ADDR lock, int old_value, int new_value)
>+{
>+  const int r_lock = 6;
>+  const int r_old = 7;
>+  const int r_new = 8;
>+  const int r_tmp = 9;
>+  unsigned char *p = buf;
>+
>+  /*
>+  1: lwsync
>+  2: lwarx   TMP, 0, LOCK
>+     cmpwi   TMP, OLD
>+     bne     1b
>+     stwcx.  NEW, 0, LOCK
>+     bne     2b */
>+
>+  p += gen_limm (p, r_lock, lock);
>+  p += gen_limm (p, r_new, new_value);
>+  p += gen_limm (p, r_old, old_value);
>+
>+  p += put_i32 (p, 0x7c2004ac);	/* lwsync */
>+  p += GEN_LWARX (p, r_tmp, 0, r_lock);
>+  p += GEN_CMPW (p, r_tmp, r_old);
>+  p += GEN_BNE (p, -12);
>+  p += GEN_STWCX (p, r_new, 0, r_lock);
>+  p += GEN_BNE (p, -16);
>+
>+  return p - buf;
>+}

A generic compare-and-swap will be correct, but probably not the most
efficient way to implement a spinlock on PowerPC.  We might want to
look into implementing release/acquire semantics along the lines of
the sample code in B.2.1.1 / B 2.2.1 of the PowerISA.  (I guess this
doesn't need to be done in the initial version of the patch.)


>+/* Implement install_fast_tracepoint_jump_pad of target_ops.
>+   See target.h for details.  */
>+
>+static int
>+ppc_install_fast_tracepoint_jump_pad (CORE_ADDR tpoint, CORE_ADDR tpaddr,
>+				      CORE_ADDR collector,
>+				      CORE_ADDR lockaddr,
>+				      ULONGEST orig_size,
>+				      CORE_ADDR *jump_entry,
>+				      CORE_ADDR *trampoline,
>+				      ULONGEST *trampoline_size,
>+				      unsigned char *jjump_pad_insn,
>+				      ULONGEST *jjump_pad_insn_size,
>+				      CORE_ADDR *adjusted_insn_addr,
>+				      CORE_ADDR *adjusted_insn_addr_end,
>+				      char *err)
>+{
>+  unsigned char buf[1024];
>+  unsigned char *p = buf;
>+  int j, offset;
>+  CORE_ADDR buildaddr = *jump_entry;
>+  const CORE_ADDR entryaddr = *jump_entry;
>+#if __PPC64__
>+  const int rsz = 8;
>+#else
>+  const int rsz = 4;
>+#endif
>+  const int frame_size = (((37 * rsz) + 112) + 0xf) & ~0xf;

See below for comments of the frame size (112 byte constant) ...

>+
>+  /* Stack frame layout for this jump pad,
>+
>+     High	CTR   -8(sp)
>+		LR   -16(sp)
>+		XER
>+		CR
>+		R31
>+		R29
>+		...
>+		R1
>+		R0
>+     Low	PC/<tpaddr>
>+
>+     The code flow of this jump pad,
>+
>+     1. Save GPR and SPR
>+     3. Adjust SP
>+     4. Prepare argument
>+     5. Call gdb_collector
>+     6. Restore SP
>+     7. Restore GPR and SPR
>+     8. Build a jump for back to the program
>+     9. Copy/relocate original instruction
>+    10. Build a jump for replacing orignal instruction.  */
>+
>+  for (j = 0; j < 32; j++)
>+    p += GEN_STORE (p, j, 1, (-rsz * 36 + j * rsz));

This writes to below the SP, which is OK or ppc64 since there is a
stack "red zone", but may fail on ppc32.  You should (save and) update
SP before saving the other registers there.

>+  /* Save PC<tpaddr>  */
>+  p += gen_limm (p, 3, tpaddr);
>+  p += GEN_STORE (p, 3, 1, (-rsz * 37));

This is actually a problem even on ELFv1 ppc64 since the red zone size
is only 288 bytes.  (Only on ELFv2, the red zone size is 512 bytes.)

>+  /* Save CR, XER, LR, and CTR.  */
>+  p += put_i32 (p, 0x7c600026);			/* mfcr   r3 */
>+  p += GEN_MFSPR (p, 4, 1);			/* mfxer  r4 */
>+  p += GEN_MFSPR (p, 5, 8);			/* mflr   r5 */
>+  p += GEN_MFSPR (p, 6, 9);			/* mfctr  r6 */
>+  p += GEN_STORE (p, 3, 1, -4 * rsz);		/* std    r3, -32(r1) */
>+  p += GEN_STORE (p, 4, 1, -3 * rsz);		/* std    r4, -24(r1) */
>+  p += GEN_STORE (p, 5, 1, -2 * rsz);		/* std    r5, -16(r1) */
>+  p += GEN_STORE (p, 6, 1, -1 * rsz);		/* std    r6, -8(r1) */
>+
>+  /* Adjust stack pointer.  */
>+  p += GEN_ADDI (p, 1, 1, -frame_size);		/* subi   r1,r1,FRAME_SIZE */

This violates the ABI because the back chain link is not maintained.
At any point, r1 should point to a word that holds the back chain
to the next higher frame.

>+  /* Set r4 to collected registers.  */
>+  p += GEN_ADDI (p, 4, 1, frame_size - rsz * 37);
>+  /* Set r3 to TPOINT.  */
>+  p += gen_limm (p, 3, tpoint);
>+
>+  p += gen_atomic_xchg (p, lockaddr, 0, 1);

This seems wrong.  Shouldn't *lockaddr be set to the address
of a collecting_t object, and not just "1"?

>+  /* Restore stack and registers.  */
>+  p += GEN_ADDI (p, 1, 1, frame_size);	/* addi	r1,r1,FRAME_SIZE */

Similar to above, this doesn't work on ppc32.

>+  p += GEN_LOAD (p, 3, 1, -4 * rsz);	/* ld	r3, -32(r1) */
>+  p += GEN_LOAD (p, 4, 1, -3 * rsz);	/* ld	r4, -24(r1) */
>+  p += GEN_LOAD (p, 5, 1, -2 * rsz);	/* ld	r5, -16(r1) */
>+  p += GEN_LOAD (p, 6, 1, -1 * rsz);	/* ld	r6, -8(r1) */
>+  p += put_i32 (p, 0x7c6ff120);		/* mtcr	r3 */
>+  p += GEN_MTSPR (p, 4, 1);		/* mtxer  r4 */
>+  p += GEN_MTSPR (p, 5, 8);		/* mtlr   r5 */
>+  p += GEN_MTSPR (p, 6, 9);		/* mtctr  r6 */
>+  for (j = 0; j < 32; j++)
>+    p += GEN_LOAD (p, j, 1, (-rsz * 36 + j * rsz));

>+  /* Now, insert the original instruction to execute in the jump pad.  */
>+  *adjusted_insn_addr = buildaddr + (p - buf);
>+  *adjusted_insn_addr_end = *adjusted_insn_addr;
>+  relocate_instruction (adjusted_insn_addr_end, tpaddr);
>+
>+  /* Verify the relocation size.  If should be 4 for normal copy, or 8
>+     for some conditional branch.  */
>+  if ((*adjusted_insn_addr_end - *adjusted_insn_addr == 0)
>+      || (*adjusted_insn_addr_end - *adjusted_insn_addr > 8))
>+    {
>+      sprintf (err, "E.Unexpected instruction length = %d"
>+		    "when relocate instruction.",
>+		    (int) (*adjusted_insn_addr_end - *adjusted_insn_addr));
>+      return 1;
>+    }

Hmm.  This calls back to GDB to perform the relocation of the
original instruction.  On PowerPC, there are only a handful of
instructions that need to be relocated; I'm not sure it is really
necessary to call back to GDB.  Can't those just be relocated
directly here?   This might even make the code simpler overall.

>+  buildaddr = *adjusted_insn_addr_end;
>+  p = buf;
>+  /* Finally, write a jump back to the program.  */
>+  offset = (tpaddr + 4) - buildaddr;
>+  if (offset >= (1 << 26) || offset < -(1 << 26))
I guess this needs to check for (1 << 25) like below, since we have
a signed displacement.

>+/*
>+
>+  Bytecode execution stack frame
>+
>+	|  Parameter save area    (SP + 48) [8 doublewords]
>+	|  TOC save area          (SP + 40)
>+	|  link editor doubleword (SP + 32)
>+	|  compiler doubleword    (SP + 24)  save TOP here during call
>+	|  LR save area           (SP + 16)
>+	|  CR save area           (SP + 8)
>+ SP' -> +- Back chain             (SP + 0)
>+	|  Save r31
>+	|  Save r30
>+	|  Save r4    for *value
>+	|  Save r3    for CTX
>+ r30 -> +- Bytecode execution stack
>+	|
>+	|  64-byte (8 doublewords) at initial.  Expand stack as needed.
>+	|
>+ r31 -> +-

Note that the stack frame layout as above only applies to ELFv1, but
you're actually only supporting ELFv2 at the moment.  For ELFv2, there
is no parameter save area (for this specific call), there is no compiler
or linker doubleword, and the TOC save area is at SP + 24.  (So this
location probably shouldn't be used to save something else ...)

>+  initial frame size
>+  = (48 + 8 * 8) + (4 * 8) + 64
>+  = 112 + 96
>+  = 208

This is also a bit bigger than required for ELFv2.  On the other hand,
having a larger buffer doesn't hurt.


>+static void
>+ppc64_emit_reg (int reg)
>+{
>+  unsigned char buf[10 * 4];
>+  unsigned char *p = buf;
>+
>+  p += GEN_LD (p, 3, 31, bc_framesz - 32);
>+  p += GEN_LD (p, 3, 3, 48);	/* offsetof (fast_tracepoint_ctx, regs) */

This seems a bit fragile, it would be better to determine the offset
automatically ...   (I don't quite understand why the x86 code works
either, as it is right now ...)


>+static void
>+ppc64_emit_stack_flush (void)
>+{
>+  /* Make sure bytecode stack is big enough before push.
>+     Otherwise, expand 64-byte more.  */
>+
>+  EMIT_ASM ("  std   3, 0(30)		\n"
>+	    "  addi  4, 30, -(112 + 8)	\n"
>+	    "  cmpd  7, 4, 1		\n"
>+	    "  bgt   1f			\n"
>+	    "  ld    4, 0(1)		\n"
>+	    "  addi  1, 1, -64		\n"
>+	    "  std   4, 0(1)		\n"

For full ABI compliance, the back chain needs to be maintained
at every instruction, so you always have to update the stack
pointer using stdu.  Should be simple enough to do:

 	    "  ld    4, 0(1)		\n"
 	    "  stdu  4, -64(1)		\n"


>+/* Discard N elements in the stack.  */
>+
>+static void
>+ppc64_emit_stack_adjust (int n)
>+{
>+  unsigned char buf[4];
>+  unsigned char *p = buf;
>+
>+  p += GEN_ADDI (p, 30, 30, n << 3);	/* addi	r30, r30, (n << 3) */

"n" probably isnt't too big for addi here, but should better be
verified, just in case new callers are ever added ...


>+static void
>+ppc64_emit_int_call_1 (CORE_ADDR fn, int arg1)
>+{
>+  unsigned char buf[8 * 4];
>+  unsigned char *p = buf;
>+
>+  /* Setup argument.  arg1 is a 16-bit value.  */
>+  p += GEN_LI (p, 3, arg1);		/* li	r3, arg1 */

Well ... even so, you still cannot load values > 32767 with LI.
Either check, or just call gen_limm, which should always do
the right thing.

>+static void
>+ppc64_emit_void_call_2 (CORE_ADDR fn, int arg1)
>+{
>+  unsigned char buf[12 * 4];
>+  unsigned char *p = buf;
>+
>+  /* Save TOP */
>+  p += GEN_STD (p, 3, 31, bc_framesz + 24);

On ELFv2, that is really the TOC save slot, see above.  Why not
just save TOP at 0(30)?  That should be available ...

>+  /* Setup argument.  arg1 is a 16-bit value.  */
>+  p += GEN_MR (p, 4, 3);		/* mr	r4, r3 */
>+  p += GEN_LI (p, 3, arg1);	/* li	r3, arg1 */

See above.


>+static void
>+ppc64_emit_if_goto (int *offset_p, int *size_p)
>+{
>+  EMIT_ASM ("mr     4, 3	\n"
>+	    "ldu    3, 8(30)	\n"
>+	    "cmpdi  7, 4, 0	\n"
>+	    "1:bne  7, 1b	\n");

Why not just:
    cmpdi 7, 3, 0
    ldu 3, 8(30)
    1:bne 7, 1b

>+static void
>+ppc_write_goto_address (CORE_ADDR from, CORE_ADDR to, int size)
>+{
>+  int rel = to - from;
>+  uint32_t insn;
>+  int opcd;
>+  unsigned char buf[4];
>+
>+  read_inferior_memory (from, buf, 4);
>+  insn = get_i32 (buf);
>+  opcd = (insn >> 26) & 0x3f;
>+
>+  switch (size)
>+    {
>+    case 14:
>+      if (opcd != 16)
>+	emit_error = 1;
>+      insn = (insn & ~0xfffc) | (rel & 0xfffc);
>+      break;
>+    case 24:
>+      if (opcd != 18)
>+	emit_error = 1;
>+      insn = (insn & ~0x3fffffc) | (rel & 0x3fffffc);
>+      break;

So this really should check for overflow -- I guess usually the code
generated here shouldn't be too big, but if it is, we really should
detect that and fail cleanly instead of just jumping to random
locations ...


>diff --git a/gdb/rs6000-tdep.c b/gdb/rs6000-tdep.c
>index ef94bba..dc27cfb 100644
>--- a/gdb/rs6000-tdep.c
>+++ b/gdb/rs6000-tdep.c
>@@ -966,6 +969,21 @@ rs6000_breakpoint_from_pc (struct gdbarch *gdbarch, CORE_ADDR *bp_addr,
>      return little_breakpoint;
>  }
>
>+/* Return true if ADDR is a valid address for tracepoint.  Set *ISZIE
>+   to the number of bytes the target should copy elsewhere for the
>+   tracepoint.  */
>+
>+static int
>+ppc_fast_tracepoint_valid_at (struct gdbarch *gdbarch,
>+			      CORE_ADDR addr, int *isize, char **msg)
>+{
>+  if (isize)
>+    *isize = gdbarch_max_insn_length (gdbarch);
>+  if (msg)
>+    *msg = NULL;
>+  return 1;
>+}

Should/can we check here where the jump to the jump pad will be in
range?  Might be better to detect this early ...


>+/* Copy the instruction from OLDLOC to *TO, and update *TO to *TO + size
>+   of instruction.  This function is used to adjust pc-relative instructions
>+   when copying.  */
>+
>+static void
>+ppc_relocate_instruction (struct gdbarch *gdbarch,
>+			  CORE_ADDR *to, CORE_ADDR oldloc)

See above for whether we need this here; maybe all this should
be done directly in gdbserver.  Nothing in here seems to require
support from the full GDB code base.

>+    {
>+      /* conditional branch && AA = 0 */
>+
>+      rel = PPC_BD (insn);
>+      newrel = (oldloc - *to) + rel;
>+
>+      if (newrel >= (1 << 25) || newrel < -(1 << 25))
>+	return;
>+
>+      newrel -= 4;

Why is this correct?   If we fit in a conditional branch, the
value of newrel computed above should be correct.  Only if we
do the jump-over, we need to adjust newrel ...

>+      if (newrel >= (1 << 15) || newrel < -(1 << 15))
>+	{
>+	   /* The offset of to big for conditional-branch (16-bit).
>+	      Try to invert the condition and jump with 26-bit branch.
>+	      For example,
>+
>+		beq  .Lgoto
>+		INSN1
>+
>+	      =>
>+
>+		bne  1f
>+		b    .Lgoto
>+	      1:INSN1
>+
>+	    */
>+
>+	   /* Check whether BO is 001at or 011 at.  */
>+	   if ((PPC_BO (insn) & 0x14) != 0x4)
>+	     return;

Well, we really should handle the other cases too; there's no reason
to simply fail if this happens to be a branch on count or such ...

>+/* Implement gdbarch_gen_return_address.  Generate a bytecode expression
>+   to get the value of the saved PC.  SCOPE is the address we want to
>+   get return address for.  SCOPE maybe in the middle of a function.  */
>+
>+static void
>+ppc_gen_return_address (struct gdbarch *gdbarch,
>+			struct agent_expr *ax, struct axs_value *value,
>+			CORE_ADDR scope)
>+{
>+  struct rs6000_framedata frame;
>+  CORE_ADDR func_addr;
>+
>+  /* Try to find the start of the function and analyze the prologue.  */
>+  if (find_pc_partial_function (scope, NULL, &func_addr, NULL))
>+    {
>+      skip_prologue (gdbarch, func_addr, scope, &frame);
>+
>+      if (frame.lr_offset == 0)
>+	{
>+	  value->type = register_type (gdbarch, PPC_LR_REGNUM);
>+	  value->kind = axs_lvalue_register;
>+	  value->u.reg = PPC_LR_REGNUM;
>+	  return;
>+	}
>+    }
>+  else
>+    {
>+      /* If we don't where the function starts, we cannot analyze it.
>+	 Assuming it's not a leaf function, not frameless, and LR is
>+	 saved at back-chain + 16.  */
>+
>+      frame.frameless = 0;
>+      frame.lr_offset = 16;

This isn't correct for ppc32 ...

>+    }
>+
>+  /* if (frameless)
>+       load 16(SP)
>+     else
>+       BC = 0(SP)
>+       load 16(BC) */

In any case, this code makes many assumptions that may not always be
true.  But then again, the same is true for the i386 case, so maybe this
is OK for now ...   In general, if we have DWARF CFI for the function,
it would be much preferable to refer to that in order to determine the
exact stack layout.


Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  Ulrich.Weigand@de.ibm.com

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/2] Fast tracepoint for powerpc64le
  2015-03-01 17:42   ` Wei-cheng Wang
@ 2015-03-17 13:48     ` Ulrich Weigand
  0 siblings, 0 replies; 15+ messages in thread
From: Ulrich Weigand @ 2015-03-17 13:48 UTC (permalink / raw)
  To: Wei-cheng Wang; +Cc: palves, gdb-patches

Wei-cheng Wang wrote:
> On 2015/2/28 Ã¤Â¸ÂŠÃ¥ÂÂˆ 03:52, Ulrich Weigand wrote:
> > The tspeed.exp file already has:
> > # Typically we need a little extra time for this test.
> > set timeout 180
> > Is that still not enough?
> 
> It should include the time spent in trying different loop counts,
> so it would be 11 + 22 + 45 + 90 + 180 = at least 348 seconds in my environment.
> (for 10000, 20000, 40000, 80000, 160000 iterations respectively)
> If I set timeout to 360, the case will pass.

I guess that's OK with me.  Or else we could reduce the number of passes ...

> >> * tfind.exp: One of the tracepoint is inserted at
> >>     `*gdb_recursion_test'.  It's not hit because local-entry is called
> >>     instead.  The 18 FAILs are off-by-one error.
> > This test case seem a bit more complicated, we may need to split it
> > in two parts; one that uses a normal "trace gdb_recursion_test"
> > without the "*", and possibly a second one that specifically tests
> > that "trace *func" works, using a source file that makes sure to
> > call func via a function pointers (as in step-bt.c).
> 
> How about simply change the code to this?  It wouldn't hurt other cases.
> And all the failed cases in tfind.exp now pass.

That should be OK.

> > This is odd, I don't see the point of this either.   Of course, as the
> > comment says, much of this stuff will break anyway if gdbserver is
> > compiled differently than the inferior (e.g. a 64-bit gdbserver
> > debugging a 32-bit inferior), because it assumes the structure layout
> > is identical.  However, if we do have a 32-bit gdbserver, then I don't
> > see why it shouldn't be possible to debug a 32-bit inferior, just
> > because CORE_ADDR is a 64-bit type ...
> 
> For example, CORE_ADDR ptr = 0x11223344, a 32-bit address,
> and sizeof (void *) = 4-byte
> 
>    |------------ 64-bit CORE_ADDR ---------|
>    MSB                                    LSB
>    | 00 | 00 | 00 | 00 | 11 | 22 | 33 | 44 |
>    Low                                    High Address
>    |-- 32-bit(void*) --|
>    &ptr,4 means these zeros are written to inferior.
> 
> static int
> write_inferior_data_ptr (CORE_ADDR where, CORE_ADDR ptr)
> {
>    return write_inferior_memory (where,
>                                  (unsigned char *) &ptr, sizeof (void *));
>                                                    ^^^^  ^^^^^^^^^^^^^^^
> }
> 
> CORE_ADDR is declared as "unsigned long long" for gdbserver
> (in common/gdb/common-types.h)

I understood why this is failing with the code as is, I just didn't
understand why the code is that way today :-)   Given Pedro's comment,
I think we should simply remove that function.

> > Ugh.  That's a strange construct, and extremely dependent on alignment
> > rules (as you noticed).  I'm not really sure what the best way to fix
> > this would be.  My preference right now would be get rid of "ops" on
> > the gdbserver side too, and just switch on "type" in the two places
> > where the ops->send and ops->download routines are called right now.
> >
> > This makes the data structures the same on gdbserver and IPA, which
> > simplifies downloading quite a bit.  (Also, it keeps the data structure
> > identical in IPA, which should avoid compatibility issues between
> > versions.)
>    That sounds great to me!

OK, let's do it that way.

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  Ulrich.Weigand@de.ibm.com

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/2] Fast tracepoint for powerpc64le
  2015-03-04 17:13   ` Pedro Alves
@ 2015-03-17 18:12     ` Ulrich Weigand
  2015-03-17 19:03       ` Pedro Alves
  0 siblings, 1 reply; 15+ messages in thread
From: Ulrich Weigand @ 2015-03-17 18:12 UTC (permalink / raw)
  To: Pedro Alves; +Cc: Wei-cheng Wang, gdb-patches

Pedro Alves wrote:
> On 02/27/2015 07:52 PM, Ulrich Weigand wrote:
> > So I guess there's two ways to fix this.   One would be to change
> > gdbserver to work more like GDB here.  This would involve removing
> > the descriptor->code address conversion in remote.c, and instead
> > performing the conversion in gdbserver's thread_db_enable_reporting.
> > Now, there is no gdbarch_convert_from_func_ptr_addr in gdbserver,
> > so a similar mechanism would have to be invented there.  (I guess
> > this would mean a new target hook.)  Fortunately, the only platform
> > that uses function descriptors *and* supports libthread_db debugging
> > in gdbserver is ppc64-linux, so we'd only have to add that new
> > mechanim on this platform.
> 
> Note sure about this one, ppc64_convert_from_func_ptr_addr wants to
> get at the bfd/binary's unrelocated sections.  We'd have to teach
> gdbserver to read the binary.

That's probably not necessary.  The reason the GDB implementation
does it that way is that it needs to work under various different
circumstances, like when debugging a core file, or before the
dynamic linker has relocated an executable.  For the gdbserver
implementation, we should never need to handle such conditions,
so we are able to simply read the target address from memory.

> (Note for testing: __nptl_create_event will only be used
> on old kernels without PTRACE_EVENT_CLONE, unless you hack the
> code to force usage.)

I wonder why Wei-cheng noticed the problem then ...


Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  Ulrich.Weigand@de.ibm.com

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/2] Fast tracepoint for powerpc64le
  2015-03-17 18:12     ` Ulrich Weigand
@ 2015-03-17 19:03       ` Pedro Alves
  2015-03-18 11:04         ` Ulrich Weigand
  0 siblings, 1 reply; 15+ messages in thread
From: Pedro Alves @ 2015-03-17 19:03 UTC (permalink / raw)
  To: Ulrich Weigand; +Cc: Wei-cheng Wang, gdb-patches

On 03/17/2015 06:12 PM, Ulrich Weigand wrote:
> Pedro Alves wrote:
>> On 02/27/2015 07:52 PM, Ulrich Weigand wrote:
>>> So I guess there's two ways to fix this.   One would be to change
>>> gdbserver to work more like GDB here.  This would involve removing
>>> the descriptor->code address conversion in remote.c, and instead
>>> performing the conversion in gdbserver's thread_db_enable_reporting.
>>> Now, there is no gdbarch_convert_from_func_ptr_addr in gdbserver,
>>> so a similar mechanism would have to be invented there.  (I guess
>>> this would mean a new target hook.)  Fortunately, the only platform
>>> that uses function descriptors *and* supports libthread_db debugging
>>> in gdbserver is ppc64-linux, so we'd only have to add that new
>>> mechanim on this platform.
>>
>> Note sure about this one, ppc64_convert_from_func_ptr_addr wants to
>> get at the bfd/binary's unrelocated sections.  We'd have to teach
>> gdbserver to read the binary.
> 
> That's probably not necessary.  The reason the GDB implementation
> does it that way is that it needs to work under various different
> circumstances, like when debugging a core file, or before the
> dynamic linker has relocated an executable.  For the gdbserver
> implementation, we should never need to handle such conditions,
> so we are able to simply read the target address from memory.
> 

Maybe not cores today, but why doesn't gdbserver have to
handle the case of connecting before the executable has been
relocated?

I also wonder about all the break-interp.exp corner cases.

>> (Note for testing: __nptl_create_event will only be used
>> on old kernels without PTRACE_EVENT_CLONE, unless you hack the
>> code to force usage.)
> 
> I wonder why Wei-cheng noticed the problem then ...

I think he is seeing the problem with the function symbol look ups
gdbserver's tracepoints module does (tracepoint_look_up_symbols),
and that in that case he needs to get the function descriptor
instead of the start of code address?  From your previous explanation
I understand that the __nptl_create_event breakpoint (when used)
is set correctly because what gdbserver needs in that case is the start
of code address, which is what remote.c returns.

Thanks,
Pedro Alves

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/2] Fast tracepoint for powerpc64le
  2015-03-17 19:03       ` Pedro Alves
@ 2015-03-18 11:04         ` Ulrich Weigand
  2015-03-18 16:07           ` Pedro Alves
  0 siblings, 1 reply; 15+ messages in thread
From: Ulrich Weigand @ 2015-03-18 11:04 UTC (permalink / raw)
  To: Pedro Alves; +Cc: Wei-cheng Wang, gdb-patches

Pedro Alves wrote:
> On 03/17/2015 06:12 PM, Ulrich Weigand wrote:
> > That's probably not necessary.  The reason the GDB implementation
> > does it that way is that it needs to work under various different
> > circumstances, like when debugging a core file, or before the
> > dynamic linker has relocated an executable.  For the gdbserver
> > implementation, we should never need to handle such conditions,
> > so we are able to simply read the target address from memory.
> > 
> 
> Maybe not cores today, but why doesn't gdbserver have to
> handle the case of connecting before the executable has been
> relocated?
> 
> I also wonder about all the break-interp.exp corner cases.

gdbserver would access function descriptors only for the
__nptl_create_event etc. routines.  These are looked up
only after a libthread_db td_ta_new_p call succeeds, which
should only be true if libpthread has been loaded (and
relocated) in the inferior.  If it hasn't been yet at the
time gdbserver attaches, the whole thread initialization
sequence is defered until after the new_objfile event that
happens after libpthread *was* loaded and relocated.
Am I missing something here?

Maybe if in the future additional function descriptor lookups
are added to gdbserver, we could run into that issue.

In any case, the other solution is probably better anyway.


> >> (Note for testing: __nptl_create_event will only be used
> >> on old kernels without PTRACE_EVENT_CLONE, unless you hack the
> >> code to force usage.)
> > 
> > I wonder why Wei-cheng noticed the problem then ...
> 
> I think he is seeing the problem with the function symbol look ups
> gdbserver's tracepoints module does (tracepoint_look_up_symbols),
> and that in that case he needs to get the function descriptor
> instead of the start of code address?  From your previous explanation
> I understand that the __nptl_create_event breakpoint (when used)
> is set correctly because what gdbserver needs in that case is the start
> of code address, which is what remote.c returns.

Ah, of course.  Sorry for the confusion.

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  Ulrich.Weigand@de.ibm.com

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/2] Fast tracepoint for powerpc64le
  2015-03-18 11:04         ` Ulrich Weigand
@ 2015-03-18 16:07           ` Pedro Alves
  2015-03-18 16:53             ` Ulrich Weigand
  0 siblings, 1 reply; 15+ messages in thread
From: Pedro Alves @ 2015-03-18 16:07 UTC (permalink / raw)
  To: Ulrich Weigand; +Cc: Wei-cheng Wang, gdb-patches

On 03/18/2015 11:04 AM, Ulrich Weigand wrote:
> Pedro Alves wrote:
>> On 03/17/2015 06:12 PM, Ulrich Weigand wrote:
>>> That's probably not necessary.  The reason the GDB implementation
>>> does it that way is that it needs to work under various different
>>> circumstances, like when debugging a core file, or before the
>>> dynamic linker has relocated an executable.  For the gdbserver
>>> implementation, we should never need to handle such conditions,
>>> so we are able to simply read the target address from memory.
>>>
>>
>> Maybe not cores today, but why doesn't gdbserver have to
>> handle the case of connecting before the executable has been
>> relocated?
>>
>> I also wonder about all the break-interp.exp corner cases.
> 
> gdbserver would access function descriptors only for the
> __nptl_create_event etc. routines.  These are looked up
> only after a libthread_db td_ta_new_p call succeeds, which
> should only be true if libpthread has been loaded (and
> relocated) in the inferior.  If it hasn't been yet at the
> time gdbserver attaches, the whole thread initialization
> sequence is defered until after the new_objfile event that
> happens after libpthread *was* loaded and relocated.
> Am I missing something here?

You're missing the case of statically linked threaded
programs.  AFAICS, on x86-64, libthread_db.so is loaded
successfully on initial connection, and if I hack gdbserver
to use __nptl_create_event events, I see it setting the
breakpoint already on initial connection.

Thanks,
Pedro Alves

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/2] Fast tracepoint for powerpc64le
  2015-03-18 16:07           ` Pedro Alves
@ 2015-03-18 16:53             ` Ulrich Weigand
  0 siblings, 0 replies; 15+ messages in thread
From: Ulrich Weigand @ 2015-03-18 16:53 UTC (permalink / raw)
  To: Pedro Alves; +Cc: Wei-cheng Wang, gdb-patches

Pedro Alves wrote:
> On 03/18/2015 11:04 AM, Ulrich Weigand wrote:
> > gdbserver would access function descriptors only for the
> > __nptl_create_event etc. routines.  These are looked up
> > only after a libthread_db td_ta_new_p call succeeds, which
> > should only be true if libpthread has been loaded (and
> > relocated) in the inferior.  If it hasn't been yet at the
> > time gdbserver attaches, the whole thread initialization
> > sequence is defered until after the new_objfile event that
> > happens after libpthread *was* loaded and relocated.
> > Am I missing something here?
> 
> You're missing the case of statically linked threaded
> programs.  AFAICS, on x86-64, libthread_db.so is loaded
> successfully on initial connection, and if I hack gdbserver
> to use __nptl_create_event events, I see it setting the
> breakpoint already on initial connection.

Hmm, I would have thought that in a statically linked
executable, function descriptors would need no relocation.
However, I guess that isn't true when using PIE ...

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  Ulrich.Weigand@de.ibm.com

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/3 v2] Fast tracepoint for powerpc64le
  2015-03-17 13:34   ` Ulrich Weigand
@ 2015-03-29 19:27     ` Wei-cheng Wang
  2015-04-08 16:49       ` Ulrich Weigand
  0 siblings, 1 reply; 15+ messages in thread
From: Wei-cheng Wang @ 2015-03-29 19:27 UTC (permalink / raw)
  To: Ulrich Weigand, palves, gdb-patches

Hi Ulrich and Pedro,

Thank you for reviewing my patch, the suggestions are really helpful.

Sorry for a very late reply. I wasn't free until this weekend.
I've almost finished the new patches as you suggested, and I probably will propose
the new patch set by tomorrow.

Just a few comments for now.

 >> +static void
 >> +ppc64_emit_reg (int reg)
 >> +{
 >> +  unsigned char buf[10 * 4];
 >> +  unsigned char *p = buf;
 >> +
 >> +  p += GEN_LD (p, 3, 31, bc_framesz - 32);
 >> +  p += GEN_LD (p, 3, 3, 48);	/* offsetof (fast_tracepoint_ctx, regs) */
 >
 > This seems a bit fragile, it would be better to determine the offset
 > automatically ...   (I don't quite understand why the x86 code works
 > either, as it is right now ...)

Hi Predro,

I checked the implementation of x86 emit_reg and it seems the implementation
is wrong.  It assumes the first argument is ctx->regs, but it's actually 'ctx'

if (tpoint->compiled_cond)
   err = ((condfn) (uintptr_t) (tpoint->compiled_cond)) (ctx, &value);

I think probably either we could pass ctx->regs to compiled_cond instead,
or move the declarations of fast_tracepoint_ctx (and others) to tracepoint.h,
so we can use "offsetof (fast_tracepoint_ctx, regs)" instead.
Any suggestion?



The following are specific to PowerPC.

 >> +  if ((imm >> 8) == 0)
 >> +    {
 >> +      /* li	reg, imm[7:0] */
 >> +      p += GEN_LI (p, reg, imm);
 > Actually, you can load values up to 32767 with a single LI.

How about this fix?  So we can load a small negative number with LI.

if ((imm + 32768) < 65536)
   {
     /* li     reg, imm[7:0] */
     p += GEN_LI (p, reg, imm);
   }

 >> p += gen_atomic_xchg (p, lockaddr, 0, 1);
 >> /* Call to collector.  */
 >> p += gen_call (p, collector);
 >> p += gen_atomic_xchg (p, lockaddr, 1, 0);
 > This seems wrong.  Shouldn't *lockaddr be set to the address
 > of a collecting_t object, and not just "1"?

AFAIK, lockaddr only matters to the two lines above,
so simply put '1' for LOCKED should be fine.  Or am I missing anything?

 >> +  /* Now, insert the original instruction to execute in the jump pad.  */
 >> +  *adjusted_insn_addr = buildaddr + (p - buf);
 >> +  *adjusted_insn_addr_end = *adjusted_insn_addr;
 >> +  relocate_instruction (adjusted_insn_addr_end, tpaddr);
 > Hmm.  This calls back to GDB to perform the relocation of the
 > original instruction.  On PowerPC, there are only a handful of
 > instructions that need to be relocated; I'm not sure it is really
 > necessary to call back to GDB.  Can't those just be relocated
 > directly here?   This might even make the code simpler overall.

I just follow the design of Predo.
I could move the code to gdbserver side if you suggest.

 > Note that the stack frame layout as above only applies to ELFv1, but
 > you're actually only supporting ELFv2 at the moment.  For ELFv2, there
 > is no parameter save area (for this specific call), there is no compiler
 > or linker doubleword, and the TOC save area is at SP + 24.  (So this
 > location probably shouldn't be used to save something else ...)
 > This is also a bit bigger than required for ELFv2.  On the other hand,
 > having a larger buffer doesn't hurt.

Oops, I have to admit I looked into the wrong ABI document.
Hopefully we will support ppc64be soon, so I suggest still use 112-byte for
minimual frame size for simplicity?

 >> +/* Return true if ADDR is a valid address for tracepoint.  Set *ISZIE
 >> +   to the number of bytes the target should copy elsewhere for the
 >> +   tracepoint.  */
 >> +
 >> +static int
 >> +ppc_fast_tracepoint_valid_at (struct gdbarch *gdbarch,
 >> +			      CORE_ADDR addr, int *isize, char **msg)
 >> +{
 >> +  if (isize)
 >> +    *isize = gdbarch_max_insn_length (gdbarch);
 >> +  if (msg)
 >> +    *msg = NULL;
 >> +  return 1;
 >> +}
 >
 > Should/can we check here where the jump to the jump pad will be in
 > range?  Might be better to detect this early ...

Client has no idea about where the jump pad will be installed.
If it's out of range, gdbserver will report it right after user
entered 'tstart' command

 > Well, we really should handle the other cases too; there's no reason
 > to simply fail if this happens to be a branch on count or such ...

In the new patch, I will handle other cases as such,

   bdnz   .Lgoto
1:INSN1

is transform to

   bdz
   b      .Lgoto
1:INSN1

   and

   bdnzt  eq, .Lgoto

is transfrom to

   bdz    1f (+12)
   bf     eq, 1f (+8)
   b      .Lgoto
1:INSN1

Is it right?

 >> +      frame.frameless = 0;
 >> +      frame.lr_offset = 16;
 >
 > This isn't correct for ppc32 ...

frame.lr_offset = 4;
Right?

 > In any case, this code makes many assumptions that may not always be
 > true.  But then again, the same is true for the i386 case, so maybe this
 > is OK for now ...   In general, if we have DWARF CFI for the function,
 > it would be much preferable to refer to that in order to determine the
 > exact stack layout.

This function will only be used if user want to collect return address
(using 'collect $_ret' action command)

I agree that we should use DWARF CFI, but I have no idea how can I
get the information in this function.  Any suggestion where I can look into?

Thanks,
Wei-cheng

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/3 v2] Fast tracepoint for powerpc64le
  2015-03-29 19:27     ` Wei-cheng Wang
@ 2015-04-08 16:49       ` Ulrich Weigand
  0 siblings, 0 replies; 15+ messages in thread
From: Ulrich Weigand @ 2015-04-08 16:49 UTC (permalink / raw)
  To: Wei-cheng Wang; +Cc: palves, gdb-patches

Wei-cheng Wang wrote:

> >> +static void
> >> +ppc64_emit_reg (int reg)
> >> +{
> >> +  unsigned char buf[10 * 4];
> >> +  unsigned char *p = buf;
> >> +
> >> +  p += GEN_LD (p, 3, 31, bc_framesz - 32);
> >> +  p += GEN_LD (p, 3, 3, 48);	/* offsetof (fast_tracepoint_ctx, regs) */
> >
> > This seems a bit fragile, it would be better to determine the offset
> > automatically ...   (I don't quite understand why the x86 code works
> > either, as it is right now ...)
>
>Hi Predro,
>
>I checked the implementation of x86 emit_reg and it seems the implementation
>is wrong.  It assumes the first argument is ctx->regs, but it's actually 'ctx'
>
>if (tpoint->compiled_cond)
>   err = ((condfn) (uintptr_t) (tpoint->compiled_cond)) (ctx, &value);
>
>I think probably either we could pass ctx->regs to compiled_cond instead,
>or move the declarations of fast_tracepoint_ctx (and others) to tracepoint.h,
>so we can use "offsetof (fast_tracepoint_ctx, regs)" instead.
>Any suggestion?

FWIW, passing the regs buffer directly to the compiled routine seems
more straightforward to me ...

>The following are specific to PowerPC.
>
> >> +  if ((imm >> 8) == 0)
> >> +    {
> >> +      /* li	reg, imm[7:0] */
> >> +      p += GEN_LI (p, reg, imm);
> > Actually, you can load values up to 32767 with a single LI.
>
>How about this fix?  So we can load a small negative number with LI.
>
>if ((imm + 32768) < 65536)
>   {
>     /* li     reg, imm[7:0] */
>     p += GEN_LI (p, reg, imm);
>   }

That looks good to me.

> >> p += gen_atomic_xchg (p, lockaddr, 0, 1);
> >> /* Call to collector.  */
> >> p += gen_call (p, collector);
> >> p += gen_atomic_xchg (p, lockaddr, 1, 0);
> > This seems wrong.  Shouldn't *lockaddr be set to the address
> > of a collecting_t object, and not just "1"?
>
>AFAIK, lockaddr only matters to the two lines above,
>so simply put '1' for LOCKED should be fine.  Or am I missing anything?

Yes, the lockaddr is used from the gdbserver side.  See the uses of
ipa_sym_addrs.addr_collecting in tracepoint.c; it is expected that
this value is either NULL or else points to a collecting_t object
that can be read/written by gdbserver.

> >> +  /* Now, insert the original instruction to execute in the jump pad.  */
> >> +  *adjusted_insn_addr = buildaddr + (p - buf);
> >> +  *adjusted_insn_addr_end = *adjusted_insn_addr;
> >> +  relocate_instruction (adjusted_insn_addr_end, tpaddr);
> > Hmm.  This calls back to GDB to perform the relocation of the
> > original instruction.  On PowerPC, there are only a handful of
> > instructions that need to be relocated; I'm not sure it is really
> > necessary to call back to GDB.  Can't those just be relocated
> > directly here?   This might even make the code simpler overall.
>
>I just follow the design of Predo.
>I could move the code to gdbserver side if you suggest.

I think if there is no benefit in having this code on the GDB side,
it would be better to move it to gdbsever.  (Potential benefits could
be: we need information that isn't available in gdbserver, like symbol
data; or the code can be shared with other users if in GDB.  But on
PowerPC, I think none of this applies.)

> > Note that the stack frame layout as above only applies to ELFv1, but
> > you're actually only supporting ELFv2 at the moment.  For ELFv2, there
> > is no parameter save area (for this specific call), there is no compiler
> > or linker doubleword, and the TOC save area is at SP + 24.  (So this
> > location probably shouldn't be used to save something else ...)
> > This is also a bit bigger than required for ELFv2.  On the other hand,
> > having a larger buffer doesn't hurt.
>
>Oops, I have to admit I looked into the wrong ABI document.
>Hopefully we will support ppc64be soon, so I suggest still use 112-byte for
>minimual frame size for simplicity?

Yes, as I said, just having a larger frame is OK.  However, there are
other places where the ABI matters (like not overwriting the TOC save
area, or like respecting the (lack of) stack red zone ...).

> >> +/* Return true if ADDR is a valid address for tracepoint.  Set *ISZIE
> >> +   to the number of bytes the target should copy elsewhere for the
> >> +   tracepoint.  */
> >> +
> >> +static int
> >> +ppc_fast_tracepoint_valid_at (struct gdbarch *gdbarch,
> >> +			      CORE_ADDR addr, int *isize, char **msg)
> >> +{
> >> +  if (isize)
> >> +    *isize = gdbarch_max_insn_length (gdbarch);
> >> +  if (msg)
> >> +    *msg = NULL;
> >> +  return 1;
> >> +}
> >
> > Should/can we check here where the jump to the jump pad will be in
> > range?  Might be better to detect this early ...
>
>Client has no idea about where the jump pad will be installed.
>If it's out of range, gdbserver will report it right after user
>entered 'tstart' command

Well, but we know the logic the stub uses.  For example, we know that
we certainly cannot install a fast tracepoint in any shared library code,
since the jump pad will definitely be too far away.  We can check for
this condition here.  (We could also check for tracepoints in executables
that have a text section larger than 32 MB ...)

> > Well, we really should handle the other cases too; there's no reason
> > to simply fail if this happens to be a branch on count or such ...
>
>In the new patch, I will handle other cases as such,
>
>   bdnz   .Lgoto
>1:INSN1
>
>is transform to
>
>   bdz
>   b      .Lgoto
>1:INSN1
>
>   and
>
>   bdnzt  eq, .Lgoto
>
>is transfrom to
>
>   bdz    1f (+12)
>   bf     eq, 1f (+8)
>   b      .Lgoto
>1:INSN1
>
>Is it right?

That looks OK, thanks.

> >> +      frame.frameless = 0;
> >> +      frame.lr_offset = 16;
> >
> > This isn't correct for ppc32 ...
>
>frame.lr_offset = 4;
>Right?

Correct.

> > In any case, this code makes many assumptions that may not always be
> > true.  But then again, the same is true for the i386 case, so maybe this
> > is OK for now ...   In general, if we have DWARF CFI for the function,
> > it would be much preferable to refer to that in order to determine the
> > exact stack layout.
>
>This function will only be used if user want to collect return address
>(using 'collect $_ret' action command)
>
>I agree that we should use DWARF CFI, but I have no idea how can I
>get the information in this function.  Any suggestion where I can look into?

This is probably going to require major changes, so it looks like something
for a separate project.  In general, using DWARF CFI to unwind a frame is
currently done in dwarf2-frame.c, with the bulk of the work done in
dwarf2_frame_cache and dwarf2_frame_prev_register.  This conceptually
involves two stages: looking up the FDE from the current PC (which can
be done using just the executable file) to determine the "recipe" how
to unwind at this location, and then implement the recipe, which involves
using current register values and memory content.  Unfortunately, those
two stages are somewhat interwoven and not cleanly separated in the
current implemention, which makes it not directly usable for tracepoints.

What we should be doing is to actually implement a clean separation,
so that the work to determine the unwind "recipe" can be shared between
live unwinding and tracepoints, and then implement a tracepoint-specific
step that uses the recipe to generate a series of agent commands that
will implement "running" the recipe in the context of the agent.  This
has already been done for the use of DWARF that describes variable
locations (see locexpr_tracepoint_var_ref in dwarf2loc.c), but not yet
for the use of DWARF to unwind.

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  Ulrich.Weigand@de.ibm.com

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2015-04-08 16:49 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-02-20 18:04 [PATCH 1/2] Fast tracepoint for powerpc64le Wei-cheng Wang
2015-02-25 15:20 ` [PATCH 1/3 v2] " Wei-cheng Wang
2015-03-17 13:34   ` Ulrich Weigand
2015-03-29 19:27     ` Wei-cheng Wang
2015-04-08 16:49       ` Ulrich Weigand
2015-02-27 19:53 ` [PATCH 1/2] " Ulrich Weigand
2015-03-01 17:42   ` Wei-cheng Wang
2015-03-17 13:48     ` Ulrich Weigand
2015-03-04 17:13   ` Pedro Alves
2015-03-17 18:12     ` Ulrich Weigand
2015-03-17 19:03       ` Pedro Alves
2015-03-18 11:04         ` Ulrich Weigand
2015-03-18 16:07           ` Pedro Alves
2015-03-18 16:53             ` Ulrich Weigand
2015-03-04 17:22 ` Pedro Alves

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).