From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 30352 invoked by alias); 19 Mar 2010 13:26:35 -0000 Received: (qmail 30342 invoked by uid 22791); 19 Mar 2010 13:26:35 -0000 X-SWARE-Spam-Status: No, hits=-1.9 required=5.0 tests=AWL,BAYES_00,SPF_FAIL X-Spam-Check-By: sourceware.org Received: from mx20.gnu.org (HELO mx20.gnu.org) (199.232.41.8) by sourceware.org (qpsmtpd/0.43rc1) with ESMTP; Fri, 19 Mar 2010 13:26:31 +0000 Received: from mail.codesourcery.com ([38.113.113.100]) by mx20.gnu.org with esmtp (Exim 4.60) (envelope-from ) id 1NscDc-0000el-Pv for gcc-patches@gcc.gnu.org; Fri, 19 Mar 2010 09:26:29 -0400 Received: (qmail 16161 invoked from network); 19 Mar 2010 13:26:26 -0000 Received: from unknown (HELO digraph.polyomino.org.uk) (joseph@127.0.0.2) by mail.codesourcery.com with ESMTPA; 19 Mar 2010 13:26:26 -0000 Received: from jsm28 (helo=localhost) by digraph.polyomino.org.uk with local-esmtp (Exim 4.69) (envelope-from ) id 1NscDZ-00082V-0D; Fri, 19 Mar 2010 13:26:25 +0000 Date: Fri, 19 Mar 2010 13:44:00 -0000 From: "Joseph S. Myers" To: Shujing Zhao cc: GCC Patches , Paolo Carlini Subject: Re: [PATCH PR/42686] Align the help text output In-Reply-To: <4BA35936.8070304@oracle.com> Message-ID: References: <4B863559.4000407@oracle.com> <4B976D51.8010705@oracle.com> <4B9A1E65.4080506@oracle.com> <4B9DFA37.9010303@oracle.com> <4BA07D89.3040406@oracle.com> <4BA2010C.9030103@oracle.com> <4BA35936.8070304@oracle.com> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-detected-operating-system: by mx20.gnu.org: GNU/Linux 2.6, seldom 2.4 (older, 4) Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org X-SW-Source: 2010-03/txt/msg00872.txt.bz2 On Fri, 19 Mar 2010, Shujing Zhao wrote: > > Why this increment? Characters may have width 0 (combining characters) or > > more than 1 and it's not clear why adding 1 should be the correct thing to > > do here. > It is changed to *length--* at the new logic. > + colwidth += wcwidth (wc[0]); > + length--; > + if (length == 0) > + len = remaining; > I have tried to used length = length - nbytes to get the new length. But if > nbytes > 1 and the character is the last character of the string (length will > be 0 if it decreases nbytes), it would not assign the last variable *len* to > *len_bk*, then the character that maybe put into the ROOM would not be > printed. That is to say, it presumes the nbytes is 1. If (nbytes > 1), it > would be handled specially (just to update the value of len_bk). At the other > conditionals, len_bk is always kept consistently with len. I'm afraid none of this makes any sense to me, because I have no idea what "length", "len" and "len_bk" are meant to be, and how they differ. You need clearer variable names. I think you also need comments at various points stating what the values various variables mean at those points, and maybe an overview comment about the algorithm used. It would be best to avoid ambiguous terms such as "length" at all points in variable names and comments, instead talking about numbers of bytes difference between two points, or similar. For example, the present algorithm in opts.c:wrap_help might be described thus: /* Look for a suitable point at which to break HELP. If there is one after at most ROOM columns, take the latest such point. Otherwise, take the first such point in the string, or the end of the string if it cannot be broken. */ If the new algorithm is still like that, you may just need to update variable names (and explain what the interface is for indicating both how much of the string to print, and how much (spaces) to skip before starting on the next line). The key variables to explain are LEN and I. At each iteration in the present loop, something like this might explain them. /* The best candidate point found so far to break HELP is after LEN bytes, where either LEN < I, or LEN == REMAINING if the string cannot be broken within the first I bytes. If we are outside ROOM columns and have found somewhere to break the string, that is the best point available; otherwise, breaking after I bytes will be better if that is a valid point to break. */ Now, things will be a bit more complicated in that you need to track the current column width as well as the current number of bytes of the string that take up that width. But I think the basic idea is still the same - it is just that "length", "len" and "len_bk" are not useful variable names to distinguish the different quantities being tracked and so make clear whether particular arithmetic you are doing on them makes logical sense. I remain very suspicious of anything modifying a byte counter by 1 in this context; if you are ever modifying a byte counter by something other than the number of bytes in the relevant multibyte character I think you are confused and need to sort out the logic of the code so you don't need to do so. -- Joseph S. Myers joseph@codesourcery.com