From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <9f0b4f22-94b2-9954-dcb7-7f5abdeb4e3d@monnerat.net>
Date: Tue, 11 Jan 2022 20:42:54 +0100
Subject: Re: [PATCH 4/4] gdb/python: handle non utf-8 characters when source highlighting
To: gdb-patches@sourceware.org
References: <825abc2257c992be90af28973c54f98e7cf4371f.1641565040.git.aburgess@redhat.com> <87czkykre1.fsf@tromey.com>
From: Patrick Monnerat
In-Reply-To: <87czkykre1.fsf@tromey.com>

On 1/11/22 20:24, Tom Tromey wrote:
>>>>>> "Andrew" == Andrew Burgess via Gdb-patches writes:
>
> Andrew> We could try and make GDB smarter when it comes to converting C
> Andrew> strings into Python Unicode objects; this would probably require us to
> Andrew> just try a couple of different encoding schemes rather than just
> Andrew> giving up after utf-8.
>
> Perhaps it should be using the host charset here.
>
> Anyway, FWIW, I think this patch looks reasonable.
>
I did not follow all the discussion, but did you consider using surrogate
escapes (https://docs.python.org/3/library/codecs.html#error-handlers)? I used
that in RabbitCVS with quite good results.

Just my 2 cents,
Patrick
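
P.S. A minimal sketch of the surrogate-escape idea, assuming the source text is
mostly UTF-8 with the occasional stray byte. The helper names to_text and
to_bytes are hypothetical and this is not GDB's actual code; it only shows the
standard "surrogateescape" error handler round-tripping bytes that are not
valid UTF-8:

```python
def to_text(raw: bytes) -> str:
    """Decode as UTF-8, mapping each undecodable byte 0x80..0xFF
    to a lone surrogate in the range U+DC80..U+DCFF."""
    return raw.decode("utf-8", errors="surrogateescape")


def to_bytes(text: str) -> bytes:
    """Encode as UTF-8, turning escaped surrogates back into the
    original raw bytes."""
    return text.encode("utf-8", errors="surrogateescape")


# One Latin-1 byte (0xE9) that is invalid UTF-8, plus a valid UTF-8 sequence.
raw = b"int caf\xe9 = 1; /* \xc3\xa9 */"
text = to_text(raw)

# Decoding no longer raises, and the original bytes are recovered exactly.
assert to_bytes(text) == raw
```

The catch is that such a string cannot be encoded with the default strict
handler (it raises UnicodeEncodeError), so whatever consumes it must use the
same error handler when converting back to bytes.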