* [Bug bpf/24926] New: non-ascii characters not printing on stapbpf
@ 2019-08-21 16:10 sapatel at redhat dot com
  2019-08-21 22:05 ` [Bug bpf/24926] " me at serhei dot io
                   ` (3 more replies)
  0 siblings, 4 replies; 5+ messages in thread
From: sapatel at redhat dot com @ 2019-08-21 16:10 UTC (permalink / raw)
  To: systemtap

https://sourceware.org/bugzilla/show_bug.cgi?id=24926

            Bug ID: 24926
           Summary: non-ascii characters not printing on stapbpf
           Product: systemtap
           Version: unspecified
            Status: NEW
          Severity: normal
          Priority: P2
         Component: bpf
          Assignee: systemtap at sourceware dot org
          Reporter: sapatel at redhat dot com
  Target Milestone: ---

All Unicode characters seem to print on the default SystemTap backend, but not
on stapbpf, which seems to be limited to ASCII characters.

This can be seen with the following scripts: 

'probe oneshot { print("官话") }'
'probe oneshot { print("संस्कृतम्") }'

After some discussion with serhei, it was noted that stapbpf might be
assuming that a single character is a single byte.

-- 
You are receiving this mail because:
You are the assignee for the bug.


* [Bug bpf/24926] non-ascii characters not printing on stapbpf
From: me at serhei dot io @ 2019-08-21 22:05 UTC (permalink / raw)
  To: systemtap

https://sourceware.org/bugzilla/show_bug.cgi?id=24926

Serhei Makarov <me at serhei dot io> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |me at serhei dot io

--- Comment #1 from Serhei Makarov <me at serhei dot io> ---
UTF-8 bytes are being corrupted in emit_simple_literal_str() due to an
unexpected sign extension of negative char values:

char c = 0xe5;
uint64_t b = c; // b becomes ffffffffffffffe5

Adding a cast to unsigned char seems to fix the problem.
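
To illustrate the cast, here is a minimal sketch in C. The helper names
widen_plain() and widen_fixed() are hypothetical and are not taken from
emit_simple_literal_str(); the sketch only shows how a byte widens with and
without the intermediate unsigned char cast.

```c
#include <stdint.h>

/* Hypothetical helpers for illustration only; not the stapbpf code. */

/* Buggy variant: where plain char is signed (e.g. x86_64 Linux), a byte
   such as 0xe5 is negative, so widening it sign-extends. */
uint64_t widen_plain(char c)
{
    return (uint64_t) c;
}

/* Fixed variant: convert to unsigned char first so the byte zero-extends. */
uint64_t widen_fixed(char c)
{
    return (uint64_t) (unsigned char) c;
}
```

On a platform where plain char is signed, widen_plain() turns the byte 0xe5
into ffffffffffffffe5, while widen_fixed() yields e5 regardless of the
signedness of char.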


* [Bug bpf/24926] non-ascii characters not printing on stapbpf
From: sapatel at redhat dot com @ 2019-08-26 18:04 UTC (permalink / raw)
  To: systemtap

https://sourceware.org/bugzilla/show_bug.cgi?id=24926

--- Comment #2 from Sagar Patel <sapatel at redhat dot com> ---
The sign-extension fix seems to resolve some cases, such as:

'probe oneshot { print("官") }' 

However, longer sequences of Unicode characters are still not printing.


* [Bug bpf/24926] non-ascii characters not printing on stapbpf
From: sapatel at redhat dot com @ 2019-08-28 21:26 UTC (permalink / raw)
  To: systemtap

https://sourceware.org/bugzilla/show_bug.cgi?id=24926

--- Comment #3 from Sagar Patel <sapatel at redhat dot com> ---
The root issue is that the immediate values carried by instructions only
support 32-bit integers, and the fixup the compiler applies to wider values
was incomplete.

When a string is broken down into its byte representation, a generic value
struct is used. This struct holds an int64_t immediate value that represents
the string bytes and is used as the src1 operand in the insn struct. However,
the insn struct only supports 32-bit integers in src1.

fixup_operands detects these mismatches with the following check:

s1->imm() != (int32_t) s1->imm()


Normally, the most significant bit of each character is 0, since ASCII only
goes up to 127. Consequently, when a string of plain ASCII characters is
converted to an immediate value, the most significant bit is not set and no
fixup is done, because the value fits in 32 bits and the check above does not
fire.
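
A small sketch of that check in C may help. pack_word() and needs_fixup()
are hypothetical names, and the byte order used when packing (lowest address
in the lowest byte) is an assumption made for this example, not something
confirmed in this report. The comparison itself is the one quoted from
fixup_operands.

```c
#include <stddef.h>
#include <stdint.h>

/* Pack up to four string bytes into one word, lowest address in the lowest
   byte. The byte order is an assumption made for this sketch. */
int64_t pack_word(const unsigned char *bytes, size_t n)
{
    int64_t word = 0;
    for (size_t i = 0; i < n && i < 4; i++)
        word |= (int64_t) bytes[i] << (8 * i);
    return word;
}

/* Same comparison as the check quoted from fixup_operands: nonzero when the
   value does not survive a round trip through int32_t. */
int needs_fixup(int64_t imm)
{
    return imm != (int32_t) imm;
}
```

Under that byte-order assumption, the ASCII bytes 61 62 63 64 pack to
0x64636261 and pass, as does a lone "官" (e5 ae 98), which packs to
0x0098aee5. The first four bytes of "官话" (e5 ae 98 e8) pack to 0xe898aee5,
which no longer fits in an int32_t and trips the check. That would be
consistent with the single-character case working while longer strings
failed.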

However, for UTF-8 characters, the most significant bit may be set, causing
the compiler to perform a fixup. In the fixup, an intermediate register is
used to load and store the string bytes. The original insn that was generated
for the string uses the 0x62 opcode: stw [dst+off], imm. But since an
intermediate register now holds the data, the 0x63 opcode is needed instead:
stxw [dst+off], src.

The change of opcode from 0x62 to 0x63 was missing. With that change, it
seems that all UTF-8 characters can be printed.
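
The opcode change can be sketched as follows. struct sketch_insn and
store_imm_to_store_reg() are hypothetical and much simpler than stapbpf's
own insn handling; only the opcode values (class bits 0x02 for a store of an
immediate, 0x03 for a store from a register) come from the eBPF instruction
encoding.

```c
#include <stdint.h>

/* Opcode class field of the eBPF instruction encoding. */
#define BPF_CLASS_MASK 0x07
#define BPF_ST         0x02  /* store immediate */
#define BPF_STX        0x03  /* store from register */

/* Hypothetical, simplified instruction record; stapbpf's insn differs. */
struct sketch_insn {
    uint8_t code;  /* opcode */
    uint8_t dst;   /* destination (base address) register */
    uint8_t src;   /* source register */
    int16_t off;   /* offset from dst */
    int32_t imm;   /* 32-bit immediate */
};

/* After a fixup has loaded the wide value into tmp_reg, turn a store of an
   immediate into a store from that register. The size and mode bits of the
   opcode are preserved. Returns 1 if the instruction was rewritten. */
int store_imm_to_store_reg(struct sketch_insn *i, uint8_t tmp_reg)
{
    if ((i->code & BPF_CLASS_MASK) != BPF_ST)
        return 0;
    i->code = (uint8_t) ((i->code & ~BPF_CLASS_MASK) | BPF_STX);
    i->src = tmp_reg;
    i->imm = 0;
    return 1;
}
```

Applied to an instruction with opcode 0x62, this yields opcode 0x63 with the
temporary register as the source; instructions of any other class are left
untouched.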


* [Bug bpf/24926] non-ascii characters not printing on stapbpf
From: sapatel at redhat dot com @ 2019-09-03 20:07 UTC (permalink / raw)
  To: systemtap

https://sourceware.org/bugzilla/show_bug.cgi?id=24926

Sagar Patel <sapatel at redhat dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |FIXED

--- Comment #4 from Sagar Patel <sapatel at redhat dot com> ---
Fixed in commit a58390d23.
