[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [PATCH] Fix error: array subscript has type 'char'
On Thu, Jan 14, 2021 at 03:16:15PM +0100, Manuel Bouyer wrote: > On Thu, Jan 14, 2021 at 02:25:05PM +0100, Jan Beulich wrote: > > On 14.01.2021 13:29, Manuel Bouyer wrote: > > > On Thu, Jan 14, 2021 at 11:53:20AM +0100, Jan Beulich wrote: > > >> On 12.01.2021 19:12, Manuel Bouyer wrote: > > >>> From: Manuel Bouyer <bouyer@xxxxxxxxxx> > > >>> > > >>> Use unsigned char variable, or cast to (unsigned char), for > > >>> tolower()/islower() and friends. Fix compiler error > > >>> array subscript has type 'char' [-Werror=char-subscripts] > > >> > > >> Isn't this something that wants changing in your ctype.h instead? > > >> the functions (or macros), as per the C standard, ought to accept > > >> plain char aiui, and if they use the input as an array subscript, > > >> it should be their implementation suitably converting type first. > > > > > > I asked for inputs from NetBSD developers familiar with this part. > > > > > > Although the parameter is an int, only a subset of values is valid, > > > as stated in ISO C 2018 (Section 7.4 paragrah 1): > > >> In all cases the argument is an int, the value of which shall be > > >> representable as an unsigned char or shall equal the value of the > > >> macro EOF. If the argument has any other value, the behavior is > > >> undefined. > > > > Which means you're shifting the undefined-ness from the implementation > > (using the value as array index) to the callers (truncating values, or > > converting value range). In particular I do not think that the > > intention behind the standard's wording is for every caller to need to > > cast to unsigned char, when e.g. character literals are of type char > > and string literals are of type const char[]. > > If you don't cast you fall into the undefined behavior case for non-ascii > characters. For example, "é" in iso-latin-1 is 0xe9. In a signed char, this is > -23 (decimal). without the cast, tolower() will see -23. > If it is casted to (unsigned char) first, tolower() will see 233, as expected. > The following test program illustrates this: > armandeche:/tmp>cat toto.c > #include <stdio.h> > > int > main(int _c, const char *_v[]) > { > char c = 0xe9; > printf("%d %d\n", (int)c, (int)(unsigned char)c); > } > armandeche:/tmp>./toto > -23 233 > > > > > > > > As stated by NetBSD's ctype(3) manual page, NetBSD and glibc took > > > different > > > approach. NetBSD emits a compile-time warning if the input may lead to > > > undefined behavior. quoting the man page: > > > Some implementations of libc, such as glibc as of 2018, attempt to > > > avoid > > > the worst of the undefined behavior by defining the functions to > > > work for > > > all integer inputs representable by either unsigned char or char, and > > > suppress the warning. However, this is not an excuse for avoiding > > > conversion to unsigned char: if EOF coincides with any such value, > > > as it > > > does when it is -1 on platforms with signed char, programs that pass > > > char > > > will still necessarily confuse the classification and mapping of EOF > > > with > > > the classification and mapping of some non-EOF inputs. > > > > > > > > > So, although no warning is emmited on linux, it looks like to me that the > > > cast to unsigned char is needed anyway, and relying on glibc's behavior > > > is not portable. > > > > I fully agree we shouldn't rely on glibc's behavior (I'm sure > > you've looked at xen/include/xen/ctype.h to see how we address > > this it Xen itself, but I will admit this is to a degree comparing > > apples and oranges, not the least because the lack of a need to > > consider EOF in Xen). At least in xen/tools/symbols.c I don't > > think we're at risk of running into "undefined" space. Casts are > > as long as there's only ascii characters. > > Anyway NetBSD won't change its ctype.h I guess I'm going to give up on this one. We'll keep it as a local patch. -- Manuel Bouyer <bouyer@xxxxxxxxxxxxxxx> NetBSD: 26 ans d'experience feront toujours la difference --
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |