From: "sapatel at redhat dot com"
To: systemtap@sourceware.org
Subject: [Bug bpf/24926] non-ascii characters not printing on stapbpf
Date: Wed, 28 Aug 2019 21:26:00 -0000

https://sourceware.org/bugzilla/show_bug.cgi?id=24926

--- Comment #3 from Sagar Patel ---
The root issue is that the immediate values carried by instructions only
support 32-bit integers, and the compiler performs some problematic fixups.

When strings are rolled into their byte representations, a generic value
struct is used. This struct holds an int64_t immediate value which represents
the string and is used as src1 in the insn struct. However, the insn struct
only supports 32-bit integers as immediates in src1.

There is a check in fixup_operands that looks for these mismatches:

  s1->imm() != (int32_t) s1->imm()

Normally, the most significant bit of a character is 0, since ASCII only goes
up to 127. Consequently, when strings of plain ASCII characters get converted
to an immediate value, the most significant bit is not set, the check above
evaluates to false, and no fixups are done.

However, for UTF-8 characters, the most significant bit may be set, which
makes the check fire and causes the compiler to perform some fixups. In the
fixups, an intermediate register is used to store and load the string bytes.
The original insn that was generated for the strings uses the 0x62 opcode:
stw [dst+off], imm. But since an intermediate register now holds the data, we
need to use the 0x63 opcode instead: stxw [dst+off], src.

The change of the opcode from 0x62 to 0x63 was missing, and with that change,
it seems that all UTF-8 characters can be printed.
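To make the mismatch concrete, here is a minimal sketch of the check and the opcode choice described in this comment. It is not the stapbpf code: `pack_word`, `needs_fixup`, `store_opcode` and the two opcode constants are hypothetical names, and the little-endian, zero-extended packing of four string bytes into the int64_t immediate is an assumption made for illustration. Only the comparison itself and the 0x62/0x63 opcodes come from the comment.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical constants; the opcode values are the ones named in the comment.
constexpr uint8_t OP_ST_W  = 0x62;  // stw  [dst+off], imm
constexpr uint8_t OP_STX_W = 0x63;  // stxw [dst+off], src

// Assumption: up to four string bytes are packed little-endian into a
// zero-extended 64-bit immediate. The real translator may pack differently.
int64_t pack_word(const char *bytes, std::size_t n) {
  uint32_t w = 0;
  for (std::size_t i = 0; i < n && i < 4; ++i)
    w |= static_cast<uint32_t>(static_cast<unsigned char>(bytes[i])) << (8 * i);
  return static_cast<int64_t>(w);
}

// Same comparison as the check quoted from fixup_operands: true when the
// 64-bit immediate does not survive a round trip through int32_t.
bool needs_fixup(int64_t imm) {
  return imm != static_cast<int32_t>(imm);
}

// If a fixup moved the value into an intermediate register, the store has to
// take a register source (0x63) rather than an immediate (0x62).
uint8_t store_opcode(int64_t imm) {
  return needs_fixup(imm) ? OP_STX_W : OP_ST_W;
}
```

Under these assumptions, a word of plain ASCII bytes such as "abcd" keeps bit 31 clear and stays on the immediate store. A word whose top byte is a UTF-8 continuation byte (for example the last byte of the euro sign, 0xAC) sets bit 31, fails the round trip, and needs the register store.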