From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <patrick@monnerat.net>
Received: from jupiter.monnerat.net (jupiter.monnerat.net [46.226.111.226])
 by sourceware.org (Postfix) with ESMTPS id 32EA23938C2C
 for <gdb-patches@sourceware.org>; Tue, 11 Jan 2022 19:43:02 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 32EA23938C2C
Received: from [192.168.0.128] ([192.168.0.128])
 by jupiter.monnerat.net (8.14.8/8.14.8) with ESMTP id 20BJgtWf031526
 (version=TLSv1/SSLv3 cipher=AES256-GCM-SHA384 bits=256 verify=OK)
 for <gdb-patches@sourceware.org>; Tue, 11 Jan 2022 20:43:00 +0100
DKIM-Filter: OpenDKIM Filter v2.10.3 jupiter.monnerat.net 20BJgtWf031526
Message-ID: <9f0b4f22-94b2-9954-dcb7-7f5abdeb4e3d@monnerat.net>
Date: Tue, 11 Jan 2022 20:42:54 +0100
MIME-Version: 1.0
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101
 Thunderbird/91.4.0
Subject: Re: [PATCH 4/4] gdb/python: handle non utf-8 characters when source
 highlighting
Content-Language: en-US
To: gdb-patches@sourceware.org
References: <cover.1641565040.git.aburgess@redhat.com>
 <825abc2257c992be90af28973c54f98e7cf4371f.1641565040.git.aburgess@redhat.com>
 <87czkykre1.fsf@tromey.com>
From: Patrick Monnerat <patrick@monnerat.net>
In-Reply-To: <87czkykre1.fsf@tromey.com>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
X-Spam-Status: No, score=-3.6 required=5.0 tests=BAYES_00, DKIM_SIGNED,
 DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, JMQ_SPF_NEUTRAL, NICE_REPLY_A,
 SPF_HELO_NONE, SPF_PASS, TXREP autolearn=no autolearn_force=no version=3.4.4
X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on
 server2.sourceware.org
X-BeenThere: gdb-patches@sourceware.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Gdb-patches mailing list <gdb-patches.sourceware.org>
List-Unsubscribe: <https://sourceware.org/mailman/options/gdb-patches>,
 <mailto:gdb-patches-request@sourceware.org?subject=unsubscribe>
List-Archive: <https://sourceware.org/pipermail/gdb-patches/>
List-Post: <mailto:gdb-patches@sourceware.org>
List-Help: <mailto:gdb-patches-request@sourceware.org?subject=help>
List-Subscribe: <https://sourceware.org/mailman/listinfo/gdb-patches>,
 <mailto:gdb-patches-request@sourceware.org?subject=subscribe>
X-List-Received-Date: Tue, 11 Jan 2022 19:43:06 -0000


On 1/11/22 20:24, Tom Tromey wrote:
>>>>>> "Andrew" == Andrew Burgess via Gdb-patches <gdb-patches@sourceware.org> writes:
> Andrew> We could try and make GDB smarter when it comes to converting C
> Andrew> strings into Python Unicode objects; this would probably require us to
> Andrew> just try a couple of different encoding schemes rather than just
> Andrew> giving up after utf-8.
>
> Perhaps it should be using the host charset here.
>
> Anyway, FWIW, I think this patch looks reasonable.
>
I did not follow the whole discussion, but did you consider using the
"surrogateescape" error handler
(https://docs.python.org/3/library/codecs.html#error-handlers)?

I used that in RabbitCVS with quite good results.
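
For illustration, here is a minimal sketch of how that error handler behaves in plain Python. The sample bytes and variable names are made up for the example, and this is not meant to reflect what the GDB patch itself does:

```python
# Hypothetical source line containing a byte (0xe9, 'é' in Latin-1)
# that is not valid UTF-8.
raw = b"caf\xe9 /* comment */"

# A strict decode would raise UnicodeDecodeError here. With
# surrogateescape, each undecodable byte 0xNN becomes the lone
# surrogate code point U+DCNN, so decoding never fails.
text = raw.decode("utf-8", errors="surrogateescape")

# Encoding with the same handler turns the surrogates back into the
# original bytes, so the round trip is lossless.
restored = text.encode("utf-8", errors="surrogateescape")
```

Note that the decoded string contains lone surrogates, so a strict encode (for example when printing to a UTF-8 terminal) would still fail unless the same handler, or another one such as "replace", is used on output.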

Just my 2 cents,

Patrick