• relearning C: why does an in-place change to a char* segfault?

    From Mark Summerfield@21:1/5 to All on Thu Aug 1 08:06:57 2024
    This program segfaults at the commented line:

    #include <ctype.h>
    #include <stdio.h>

    void uppercase_ascii(char *s) {
        while (*s) {
            *s = toupper(*s); // SEGFAULT
            s++;
        }
    }

    int main() {
        char* text = "this is a test";
        printf("before [%s]\n", text);
        uppercase_ascii(text);
        printf("after [%s]\n", text);
    }

    I know there are better ways to do ASCII uppercase, I don't care about
    that; what I don't understand is why I can't do an in-place edit of a non-const char*?

    I build using scons, which does:

    gcc -o inplace.o -c -Wall -g inplace.c

    gcc -o inplace inplace.o

    The error with gdb is:

    Starting program: /tmp/inplace/inplace
    [Thread debugging using libthread_db enabled]
    Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
    before [this is a test]

    Program received signal SIGSEGV, Segmentation fault.
    0x000055555555516e in uppercase_ascii (s=0x555555556004 "this is a test")
    at inplace.c:6
    6 *s = toupper(*s);

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Richard Harnden@21:1/5 to Mark Summerfield on Thu Aug 1 09:38:13 2024
    On 01/08/2024 09:06, Mark Summerfield wrote:
    This program segfaults at the commented line:

    #include <ctype.h>
    #include <stdio.h>

    void uppercase_ascii(char *s) {
    while (*s) {
    *s = toupper(*s); // SEGFAULT
    s++;
    }
    }

    int main() {
    char* text = "this is a test";
    printf("before [%s]\n", text);
    uppercase_ascii(text);
    printf("after [%s]\n", text);
    }

    text is pointing to "this is a test" - and that is stored in the program
    binary and that's why you can't modify it.

    Change it to:

    char text[] = "this is a test";

    You can modify that; text gets its own copy.

  • From Mark Summerfield@21:1/5 to Richard Harnden on Thu Aug 1 08:54:23 2024
    On Thu, 1 Aug 2024 09:38:13 +0100, Richard Harnden wrote:

    [snip]
    text is pointing to "this is a test" - and that is stored in the program binary and that's why you can't modify it.

    Change it to:

    char text[] = "this is a test";

    You can modify that; text gets its own copy.

    Thanks that works; & thanks for the explanation.

  • From Mark Summerfield@21:1/5 to All on Thu Aug 1 08:24:45 2024
    The formatting was messed up by Pan.

    The function was:

    void uppercase_ascii(char *s) {
        while (*s) {
            *s = toupper(*s);
            s++;
        }
    }

  • From Bart@21:1/5 to Richard Harnden on Thu Aug 1 11:12:47 2024
    On 01/08/2024 09:38, Richard Harnden wrote:
    On 01/08/2024 09:06, Mark Summerfield wrote:
    This program segfaults at the commented line:

    #include <ctype.h>
    #include <stdio.h>

    void uppercase_ascii(char *s) {
         while (*s) {
             *s = toupper(*s); // SEGFAULT
             s++;
         }
    }

    int main() {
         char* text = "this is a test";
         printf("before [%s]\n", text);
         uppercase_ascii(text);
         printf("after  [%s]\n", text);
    }

    text is pointing to "this is a test" - and that is stored in the program binary and that's why you can't modify it.

    That's not the reason for the segfault in this case. With some
    compilers, you *can* modify it, but that will permanently modify that
    string constant. (If the code is repeated, the text is already in
    capitals the second time around.)

    It segfaults when the string is stored in a read-only part of the binary.

  • From Ben Bacarisse@21:1/5 to Mark Summerfield on Thu Aug 1 11:53:48 2024
    Mark Summerfield <mark@qtrac.eu> writes:

    The formatting was messed up by Pan.

    The function was:

    void uppercase_ascii(char *s) {
    while (*s) {
    *s = toupper(*s);

    There's a tricky technicality with all of the character functions. They
    take an int argument so that EOF (typically -1) can be passed, but
    otherwise the argument must be an int "which shall be representable as
    an unsigned char" or the result is undefined.

    If char is signed (as it very often is) then in some locales, like the ISO-8859-* ones, many lower-case letters are negative so, to be 100%
    portable, you should write

    *s = toupper((unsigned char)*s);

    Now, since the behaviour is undefined, many implementations "do what you
    want" but that only means you won't spot the bug by testing until the
    code is linked to some old library that does not fix the issue!

    s++;
    }
    }

    Note that this does not crop up in a typical input loop:

    int ch;
    while ((ch = getchar()) != EOF)
    putchar(toupper(ch));

    because the input function "obtains [the] character as an unsigned char converted to an int".

    --
    Ben.

  • From Scott Lurndal@21:1/5 to Mark Summerfield on Thu Aug 1 13:28:06 2024
    Mark Summerfield <mark@qtrac.eu> writes:
    This program segfaults at the commented line:

    #include <ctype.h>
    #include <stdio.h>

    void uppercase_ascii(char *s) {
    while (*s) {
    *s = toupper(*s); // SEGFAULT
    s++;
    }
    }

    int main() {
    char* text = "this is a test";
    printf("before [%s]\n", text);
    uppercase_ascii(text);
    printf("after [%s]\n", text);
    }

    I know there are better ways to do ASCII uppercase, I don't care about
    that; what I don't understand is why I can't do an in-place edit of a non-const char*?

    Because char* is a pointer, not a string. In this case, it is
    pointing to a string stored in read-only memory.

  • From Michael S@21:1/5 to Mark Summerfield on Thu Aug 1 17:40:26 2024
    On Thu, 01 Aug 2024 08:06:57 +0000
    Mark Summerfield <mark@qtrac.eu> wrote:

    This program segfaults at the commented line:

    #include <ctype.h>
    #include <stdio.h>

    void uppercase_ascii(char *s) {
    while (*s) {
    *s = toupper(*s); // SEGFAULT
    s++;
    }
    }

    int main() {
    char* text = "this is a test";
    printf("before [%s]\n", text);
    uppercase_ascii(text);
    printf("after [%s]\n", text);
    }


    The answers to your question are already given above, so I'll talk about something else. Sorry about that.

    To my surprise, none of the 3 major compilers that I tried issued a
    warning at this line:
    char* text = "this is a test";
    If implicit conversion of 'const char*' to 'char*' does not warrant a
    compiler warning then I don't know what does.
    Is there something in the Standard that explicitly forbids a diagnostic
    for this sort of conversion?

    BTW, all 3 compilers issue reasonable warnings when I write it slightly differently:
    const char* ctext = "this is a test";
    char* text = ctext;

    I am starting to suspect that compilers (and the Standard?) consider
    string literals as being of type 'char*' rather than 'const char*'.

  • From James Kuyper@21:1/5 to Mark Summerfield on Thu Aug 1 12:02:30 2024
    On 8/1/24 04:06, Mark Summerfield wrote:
    This program segfaults at the commented line:

    #include <ctype.h>
    #include <stdio.h>

    void uppercase_ascii(char *s) {
    while (*s) {
    *s = toupper(*s); // SEGFAULT
    s++;
    }
    }

    int main() {
    char* text = "this is a test";

    "In translation phase 7, a byte or code of value zero is appended to
    each multibyte character sequence that results from a string literal or
    literals. The multibyte character sequence is then used to
    initialize an array of static storage duration and length just
    sufficient to contain the sequence. ..." (6.4.5p6)

    "... If the program attempts to modify such an array, the behavior is undefined." (6.4.5p7).

    This gives an implementation the freedom, for instance, to store that array
    in read-only memory, though they don't have to do so. The segfault you
    got suggests that the implementation you're using did so. On other
    platforms, writes to read-only memory might be silently ignored. On a
    platform where it is possible to write to such memory, the
    implementation is still free to optimize the code on the assumption that
    you won't. That could produce bizarrely unexpected behavior if you
    actually do modify it.

    What you want to do is initialize an array with the string literal:

    char text[] = "this is a test";

    Nominally, such an array is initialized by copying from the string
    literal's array. However, there's no way for strictly conforming code to determine whether or not there are two such arrays. If the "text" array
    has static storage duration, the string literal's array is likely to be optimized away.

  • From David Brown@21:1/5 to Michael S on Thu Aug 1 19:56:00 2024
    On 01/08/2024 16:40, Michael S wrote:
    On Thu, 01 Aug 2024 08:06:57 +0000
    Mark Summerfield <mark@qtrac.eu> wrote:

    This program segfaults at the commented line:

    #include <ctype.h>
    #include <stdio.h>

    void uppercase_ascii(char *s) {
    while (*s) {
    *s = toupper(*s); // SEGFAULT
    s++;
    }
    }

    int main() {
    char* text = "this is a test";
    printf("before [%s]\n", text);
    uppercase_ascii(text);
    printf("after [%s]\n", text);
    }


    The answers to your question are already given above, so I'd talk about something else. Sorry about it.

    To my surprise, none of the 3 major compilers that I tried issued the
    warning at this line:
    char* text = "this is a test";
    If implicit conversion of 'const char*' to 'char*' does not warrant a
    compiler warning then I don't know what does.
    Is there something in the Standard that explicitly forbids diagnostic
    for this sort of conversion?

    BTW, all 3 compilers issue reasonable warnings when I write it slightly differently:
    const char* ctext = "this is a test";
    char* text = ctext;

    I am starting to suspect that compilers (and the Standard?) consider
    string literals as being of type 'char*' rather than 'const char*'.


    Your suspicions are correct - in C, string literals are used to
    initialise an array of char (or wide char, or other appropriate
    character type). Perhaps you are thinking of C++, where the type is
    "const char" (or other const character type).

    So in C, when a string literal is used in an expression it is converted
    to a "char *" pointer. You can, of course, assign that to a "const char
    *" pointer. But it does not make sense to have a warning when assigning
    it to a non-const "char *" pointer. This is despite it being undefined behaviour (explicitly stated in the standards) to attempt to write to a
    string literal.

    The reason string literals are not const in C is backwards compatibility
    - they existed before C had "const", and making string literals into
    "const char" arrays would mean that existing code that assigned them to non-const pointers would then be in error. C++ was able to do the right
    thing and make them arrays of const char because it had "const" from the beginning.

    gcc has the option "-Wwrite-strings" that makes string literals in C
    have "const char" array type, and thus gives warnings when you try to
    assign one to a non-const char * pointer. But the option has to be
    specified explicitly (it is not in -Wall) because it changes the meaning
    of the code and can cause compatibility issues with existing correct code.

  • From Kaz Kylheku@21:1/5 to Mark Summerfield on Thu Aug 1 19:39:04 2024
    On 2024-08-01, Mark Summerfield <mark@qtrac.eu> wrote:
    This program segfaults at the commented line:

    #include <ctype.h>
    #include <stdio.h>

    void uppercase_ascii(char *s) {
    while (*s) {
    *s = toupper(*s); // SEGFAULT
    s++;
    }
    }

    int main() {
    char* text = "this is a test";

    The "this is a test" object is a literal. It is part of the program's image. When you try to change it, you're making your program self-modifying.

    The ISO C language standard doesn't require implementations to support self-modifying programs; the behavior is left undefined.

    It could work in some documented, reliable way, in a given
    implementation.

    It's the same with any other constant in the program. Say you have
    a malloc(1024) somewhere in the program. That 1024 number is encoded
    into the program's image somehow, and in principle you could write code
    to somehow get at that number and change it to 256. Long before you got
    that far, you would be in undefined behavior territory. If it worked,
    it could have surprising effects. For instance, there could be another
    call to malloc(1024) in the program and, surprisingly, *that* one also
    changes to malloc(256).

    A literal like "this is a test" is similar to that 1024, except
    that it's very easy to get at it. The language defines it as an object
    with an address, and to get that address all we have to do is evaluate
    that expression itself. A minimal piece of code that requests the
    undefined consequences of modifying a string literal is as easy
    as "a"[0] = 0.

    Program received signal SIGSEGV, Segmentation fault.
    0x000055555555516e in uppercase_ascii (s=0x555555556004 "this is a test")
    at inplace.c:6
    6 *s = toupper(*s);

    On Linux, the string literals of a C executable are located together
    with the program text. They are interspersed among the machine
    instructions which reference them. The program text is mapped
    read-only, so an attempted modification is an access violation trapped
    by the OS, turned into a SIGSEGV signal.

    GCC used to have a -fwritable-strings option, but it has been removed
    for quite some time now.

    --
    TXR Programming Language: http://nongnu.org/txr
    Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
    Mastodon: @Kazinator@mstdn.ca

  • From Bart@21:1/5 to Kaz Kylheku on Thu Aug 1 21:42:48 2024
    On 01/08/2024 20:39, Kaz Kylheku wrote:
    On 2024-08-01, Mark Summerfield <mark@qtrac.eu> wrote:
    This program segfaults at the commented line:

    #include <ctype.h>
    #include <stdio.h>

    void uppercase_ascii(char *s) {
    while (*s) {
    *s = toupper(*s); // SEGFAULT
    s++;
    }
    }

    int main() {
    char* text = "this is a test";

    The "this is a test" object is a literal. It is part of the program's image.

    So is the text here:

    char text[]="this is a test";

    But this can be changed without making the program self-modifying.

    I guess it depends on what is classed as the program's 'image'.

    I'd say the image in the state it is in just after loading or just
    before execution starts (since certain fixups are needed). But some
    sections will be writable during execution, some not.

    When you try to change it, you're making your program self-modifying.

    Program received signal SIGSEGV, Segmentation fault.
    0x000055555555516e in uppercase_ascii (s=0x555555556004 "this is a test")
    at inplace.c:6
    6 *s = toupper(*s);

    On Linux, the string literals of a C executable are located together
    with the program text. They are interspersed among the machine
    instructions which reference them. The program text is mapped
    read-only, so an attempted modification is an access violation trapped
    by the OS, turned into a SIGSEGV signal.

    Does it really do that? That's the method I've used for read-only
    strings, to put them into the code-segment (since I neglected to support
    a dedicated read-only data section, and it's too much work now).

    But I don't like it since the code section is also executable; you could inadvertently execute code within a string (which might happen to
    contain machine code for other purposes).

    The dangers are small, but there must be reasons why a dedicated
    section is normally used. gcc on Windows creates up to 19 sections, so
    it would be odd for literal strings to share with code.

  • From Bart@21:1/5 to Keith Thompson on Thu Aug 1 22:07:16 2024
    On 01/08/2024 21:59, Keith Thompson wrote:
    Bart <bc@freeuk.com> writes:
    On 01/08/2024 09:38, Richard Harnden wrote:
    On 01/08/2024 09:06, Mark Summerfield wrote:
    This program segfaults at the commented line:

    #include <ctype.h>
    #include <stdio.h>

    void uppercase_ascii(char *s) {
         while (*s) {
             *s = toupper(*s); // SEGFAULT
             s++;
         }
    }

    int main() {
         char* text = "this is a test";
         printf("before [%s]\n", text);
         uppercase_ascii(text);
         printf("after  [%s]\n", text);
    }
    text is pointing to "this is a test" - and that is stored in the
    program binary and that's why you can't modify it.

    That's not the reason for the segfault in this case.

    I'm fairly sure it is.

    With some
    compilers, you *can* modify it, but that will permanently modify that
    string constant. (If the code is repeated, the text is already in
    capitals the second time around.)

    It segfaults when the string is stored in a read-only part of the binary.

    A string literal creates an array object with static storage duration.
    Any attempt to modify that array object has undefined behavior.

    What's the difference between such an object, and an array like one of
    these:

    static char A[100];
    static char B[100]={1};

    Do these not also have static storage duration? Yet presumably these can
    be legally modified.

  • From Ben Bacarisse@21:1/5 to Bart on Thu Aug 1 22:40:23 2024
    Bart <bc@freeuk.com> writes:

    On 01/08/2024 20:39, Kaz Kylheku wrote:
    On 2024-08-01, Mark Summerfield <mark@qtrac.eu> wrote:
    This program segfaults at the commented line:

    #include <ctype.h>
    #include <stdio.h>

    void uppercase_ascii(char *s) {
    while (*s) {
    *s = toupper(*s); // SEGFAULT
    s++;
    }
    }

    int main() {
    char* text = "this is a test";
    The "this is a test" object is a literal. It is part of the program's
    image.

    So is the text here:

    char text[]="this is a test";

    But this can be changed without making the program self-modifying.

    Different "this". The array generated by the string can't be modified
    without UB. The "this" that can be changed in the corrected version is
    a plain, automatically allocated array of char, initialised with the
    values from the string.

    I guess it depends on what is classed as the program's 'image'.

    The self-modifying remark is a bit of a red herring, but altering the
    value of named automatic objects can't be classed as altering the
    program's image in any reasonable way at all.

    I'd say the image in the state it is in just after loading or just before execution starts (since certain fixups are needed). But some sections will
    be writable during execution, some not.

    When you try to change it, you're making your program self-modifying.

    Program received signal SIGSEGV, Segmentation fault.
    0x000055555555516e in uppercase_ascii (s=0x555555556004 "this is a test")
    at inplace.c:6
    6 *s = toupper(*s);
    On Linux, the string literals of a C executable are located together
    with the program text. They are interspersed among the machine
    instructions which reference them. The program text is mapped
    read-only, so an attempted modification is an access violation trapped
    by the OS, turned into a SIGSEGV signal.

    Does it really do that?

    Linux does not really have much to do with it; the C implementation
    decides, though the OS will influence what choices make more or less
    sense.

    For example, with my gcc (13.2.0) on Ubuntu the string is put into a
    section called .rodata, but tcc on the same Linux box puts it in .data.
    As a result the tcc compiled program runs without any issues and outputs

    before [this is a test]
    after [THIS IS A TEST]

    Some C implementations, on some Linux systems, might put strings in the
    text segment, but I've not seen a system that does that for decades.
    (Mind you, "Linux" refers to a huge class of systems ranging from top-end
    servers to tiny embedded devices.)

    --
    Ben.

  • From Kaz Kylheku@21:1/5 to Bart on Fri Aug 2 00:37:44 2024
    On 2024-08-01, Bart <bc@freeuk.com> wrote:
    On 01/08/2024 20:39, Kaz Kylheku wrote:
    On 2024-08-01, Mark Summerfield <mark@qtrac.eu> wrote:
    This program segfaults at the commented line:

    #include <ctype.h>
    #include <stdio.h>

    void uppercase_ascii(char *s) {
    while (*s) {
    *s = toupper(*s); // SEGFAULT
    s++;
    }
    }

    int main() {
    char* text = "this is a test";

    The "this is a test" object is a literal. It is part of the program's image.

    So is the text here:

    char text[]="this is a test";

    But this can be changed without making the program self-modifying.

    The array which is initialized by the literal is what can be
    changed.

    In this situation, the literal is just initializer syntax,
    not required to be an object with an address.

    But there could well be such an object in the program image,
    especially if the array is automatic, and thus instantiated
    many times.

    If the program tries to search for that object and modify it,
    it will run into UB.

    I guess it depends on what is classed as the program's 'image'.

    I'd say the image in the state it is in just after loading or just
    before execution starts (since certain fixups are needed). But some
    sections will be writable during execution, some not.

    Programs can self-modify in ways designed into the run time.
    The toaster has certain internal receptacles that can take
    certain forks, according to some rules, which do not affect
    the user operating the toaster according to the manual.

    The dangers are small, but there must be reasons why a dedicated
    section is normally used. gcc on Windows creates up to 19 sections, so
    it would be odd for literal strings to share with code.

    One reason is that PC-relative addressing can be used by code to
    find its literals. Since that usually has a limited range, it helps
    to keep the literals with the code. Combining sections also reduces
    size. The addressing is also relocatable, which is useful in shared
    libs.

  • From James Kuyper@21:1/5 to Bart on Thu Aug 1 20:20:43 2024
    Bart <bc@freeuk.com> writes:
    On 01/08/2024 21:59, Keith Thompson wrote:
    Bart <bc@freeuk.com> writes:
    ...
    compilers, you *can* modify it, but that will permanently modify that
    string constant. (If the code is repeated, the text is already in
    capitals the second time around.)

    It segfaults when the string is stored in a read-only part of the binary.

    A string literal creates an array object with static storage duration.
    Any attempt to modify that array object has undefined behavior.

    What's the difference between such an object, and an array like one of
    these:

    static char A[100];
    static char B[100]={1};

    The difference is that when 6.4.5p7 says "... If the program attempts
    to modify such an array, the behavior is undefined.", it is not talking
    about arrays with static storage duration in general, but only
    specifically about the arrays with static storage duration that are
    created to store the contents of string literals.

    For other arrays, whether or not it is defined behavior to modify them
    depends upon whether or not the array's definition is const-qualified.
    The arrays associated with string literals should have been specified as const-qualified, in which case any code that put them at risk of being
    modified would have required either a cast or a diagnostic.

    In C++ string literals are const-qualified, but "const" was a late
    addition to C, and by the time it was added to C, the committee's desire
    to ensure backwards compatibility prevented doing so in what would
    otherwise have been the most reasonable way.

  • From Kaz Kylheku@21:1/5 to Bart on Fri Aug 2 01:06:08 2024
    On 2024-08-01, Bart <bc@freeuk.com> wrote:
    It segfaults when the string is stored in a read-only part of the binary.
    A string literal creates an array object with static storage duration.
    Any attempt to modify that array object has undefined behavior.

    What's the difference between such an object, and an array like one of
    these:

    Programming languages can have objects that have the same lifetime, yet some
    of which are mutable and some of which are immutable.

    If the compiler believes that the immutable objects are in fact
    not mutated, it's a bad idea to modify them behind the compiler's
    back.

    There doesn't have to be any actual difference in the implementation of
    these objects, like in what area they are stored, other than the rules regarding their correct use, namely prohibiting modification.

    The Racket language has both mutable and immutable cons cells.
    The difference is that the immutable cons cells simply lack the
    operations needed to mutate them. I'm not an expert on the Racket
    internals but I don't see a reason why they couldn't be stored in the
    same heap.

    static char A[100];
    static char B[100]={1};

    Do these not also have static storage duration? Yet presumably these can
    be legally modified.

    That 1 which initializes B[0] cannot be modified.

    There is no portable way to request that.

    C++ implementations have late initialization for block scope statics.

    A program which somehow gains access to the initialization data for those,
    and modifies it, would be squarely in undefined behavior territory.

    In mainstream C implementations there typically isn't a separate storage
    for the initialization data for statics. They are set up before the
    program runs.

  • From candycanearter07@21:1/5 to David Brown on Fri Aug 2 05:30:02 2024
    David Brown <david.brown@hesbynett.no> wrote at 17:56 this Thursday (GMT):
    On 01/08/2024 16:40, Michael S wrote:
    On Thu, 01 Aug 2024 08:06:57 +0000
    Mark Summerfield <mark@qtrac.eu> wrote:

    This program segfaults at the commented line:

    #include <ctype.h>
    #include <stdio.h>

    void uppercase_ascii(char *s) {
    while (*s) {
    *s = toupper(*s); // SEGFAULT
    s++;
    }
    }

    int main() {
    char* text = "this is a test";
    printf("before [%s]\n", text);
    uppercase_ascii(text);
    printf("after [%s]\n", text);
    }


    The answers to your question are already given above, so I'd talk about
    something else. Sorry about it.

    To my surprise, none of the 3 major compilers that I tried issued the
    warning at this line:
    char* text = "this is a test";
    If implicit conversion of 'const char*' to 'char*' does not warrant a
    compiler warning then I don't know what does.
    Is there something in the Standard that explicitly forbids diagnostic
    for this sort of conversion?

    BTW, all 3 compilers issue reasonable warnings when I write it slightly
    differently:
    const char* ctext = "this is a test";
    char* text = ctext;

    I am starting to suspect that compilers (and the Standard?) consider
    string literals as being of type 'char*' rather than 'const char*'.


    Your suspicions are correct - in C, string literals are used to
    initialise an array of char (or wide char, or other appropriate
    character type). Perhaps you are thinking of C++, where the type is
    "const char" (or other const character type).

    So in C, when a string literal is used in an expression it is converted
    to a "char *" pointer. You can, of course, assign that to a "const char
    *" pointer. But it does not make sense to have a warning when assigning
    it to a non-const "char *" pointer. This is despite it being undefined behaviour (explicitly stated in the standards) to attempt to write to a string literal.

    The reason string literals are not const in C is backwards compatibility
    - they existed before C had "const", and making string literals into
    "const char" arrays would mean that existing code that assigned them to non-const pointers would then be in error. C++ was able to do the right thing and make them arrays of const char because it had "const" from the beginning.

    gcc has the option "-Wwrite-strings" that makes string literals in C
    have "const char" array type, and thus gives warnings when you try to
    assign one to a non-const char * pointer. But the option has to be
    specified explicitly (it is not in -Wall) because it changes the meaning
    of the code and can cause compatibility issues with existing correct code.


    -Wwrite-strings is included in -Wpedantic.
    --
    user <candycane> is generated from /dev/urandom

  • From Bart@21:1/5 to Kaz Kylheku on Fri Aug 2 10:43:36 2024
    On 02/08/2024 02:06, Kaz Kylheku wrote:
    On 2024-08-01, Bart <bc@freeuk.com> wrote:
    It segfaults when the string is stored in a read-only part of the binary.
    A string literal creates an array object with static storage duration.
    Any attempt to modify that array object has undefined behavior.

    What's the difference between such an object, and an array like one of
    these:

    Programming languages can have objects that have the same lifetime, yet some of which are mutable and some of which are immutable.

    If the compiler believes that the immutable objects are in fact
    not mutated, it's a bad idea to modify them behind the compiler's
    back.

    There doesn't have to be any actual difference in the implementation of
    these objects, like in what area they are stored, other than the rules regarding their correct use, namely prohibiting modification.

    The Racket language has both mutable and immutable cons cells.
    The difference is that the immutable cons cells simply lack the
    operations needed to mutate them. I'm not an expert on the Racket
    internals but I don't see a reason why they couldn't be stored in the
    same heap.

    static char A[100];
    static char B[100]={1};

    Do these not also have static storage duration? Yet presumably these can
    be legally modified.

    That 1 which initializes B[0] cannot be modified.


    Why not? I haven't requested that those are 'const'. Further, gcc has no problem running this program:

    static char A[100];
    static char B[100]={1};

    printf("%d %d %d\n", A[0], B[0], 1);
    A[0]=55;
    B[0]=89;
    printf("%d %d %d\n", A[0], B[0], 1);

    But it does use readonly memory for string literals.

    (The point of A and B was to represent .bss and .data segments
    respectively. A's data is not part of the EXE image; B's is.

    While the point of 'static' was to avoid having to specify whether A and
    B were at module scope or within a function.)

    That 1 which initializes B[0] cannot be modified.

    Or do you literally mean the value of that '1'? Then it doesn't make
    sense; here that is a copy of the literal stored in one cell of 'B'. The
    value of the cell can change, then that particular copy of '1' is lost.

    Here:

    static char B[100] = {1, 1, 1, 1, 1, 1};

    changing B[0] will not affect the 1s in B[1..5], and in my example
    above, that standalone '1' is not affected.
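
    A self-contained version of that point, as a minimal sketch: writing to
    B[0] changes only that one cell, and the other cells keep their own
    copies of the value 1. The helper name count_ones is made up for this
    illustration.

    ```c
    #include <assert.h>
    #include <stdio.h>

    static char B[100] = {1, 1, 1, 1, 1, 1};

    /* Hypothetical helper: counts how many of the first n cells hold 1. */
    int count_ones(const char *cells, int n) {
        int count = 0;
        for (int i = 0; i < n; i++) {
            if (cells[i] == 1) {
                count++;
            }
        }
        return count;
    }

    int main(void) {
        assert(count_ones(B, 6) == 6);
        B[0] = 89;                        /* overwrite one copy of the value 1 */
        assert(count_ones(B, 6) == 5);    /* B[1..5] are unaffected */
        printf("%d %d %d\n", B[0], B[1], 1);   /* prints: 89 1 1 */
        return 0;
    }
    ```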

  • From Bart@21:1/5 to Kaz Kylheku on Fri Aug 2 11:36:36 2024
    On 02/08/2024 01:37, Kaz Kylheku wrote:
    On 2024-08-01, Bart <bc@freeuk.com> wrote:
    On 01/08/2024 20:39, Kaz Kylheku wrote:
    On 2024-08-01, Mark Summerfield <mark@qtrac.eu> wrote:
    This program segfaults at the commented line:

    #include <ctype.h>
    #include <stdio.h>

    void uppercase_ascii(char *s) {
    while (*s) {
    *s = toupper(*s); // SEGFAULT
    s++;
    }
    }

    int main() {
    char* text = "this is a test";

    The "this is a test" object is a literal. It is part of the program's image.

    So is the text here:

    char text[]="this is a test";

    But this can be changed without making the program self-modifying.

    The array which is initialized by the literal is what can be
    changed.

    In this situation, the literal is just initializer syntax,
    not required to be an object with an address.

    I don't spot the 'int main() {' part of your example; my version of it
    was meant to be static. (My A, B examples explicitly used 'static'.)



    I guess it depends on what is classed as the program's 'image'.

    I'd say the image in the state it is in just after loading or just
    before execution starts (since certain fixups are needed). But some
    sections will be writable during execution, some not.

    Programs can self-modify in ways designed into the run time.
    The toaster has certain internal receptacles that can take
    certain forks, according to some rules, which do not affect
    the user operating the toaster according to the manual.

    The dangers are small, but there must be reasons why a dedicated
    section is normally used. gcc on Windows creates up to 19 sections, so
    it would be odd for literal strings to share with code.

    One reason is that PC-relative addressing can be used by code to
    find its literals. Since that usually has a limited range, it helps
    to keep the literals with the code. Combining sections also reduces
    size. The addressing is also relocatable, which is useful in shared
    libs.

    You must be talking about ARM then, with its limited address
    displacement (I think 12 bits or +/- 2KB).

    On x64, PC-relative uses a 32-bit offset so the range is +/- 2GB; enough
    to have string literals located in their own read-only section of memory.

    I'm sure you can do that on ARM too; I can think of several ways (and
    there are loads more registers to play with and keep as bases to tables of
    such data). But I don't know what real code does.

  • From Richard Harnden@21:1/5 to Keith Thompson on Fri Aug 2 13:04:55 2024
    On 02/08/2024 11:02, Keith Thompson wrote:
    candycanearter07 <candycanearter07@candycanearter07.nomail.afraid>
    writes:
    David Brown <david.brown@hesbynett.no> wrote at 17:56 this Thursday (GMT):
    [...]
    gcc has the option "-Wwrite-strings" that makes string literals in C
    have "const char" array type, and thus give errors when you try to
    assign to a non-const char * pointer. But the option has to be
    specified explicitly (it is not in -Wall) because it changes the meaning
    of the code and can cause compatibility issues with existing correct code.

    -Wwrite-strings is included in -Wpedantic.

    No it isn't, nor is it included in -Wall -- and it wouldn't make sense
    to do so.

    The -Wpedantic option is intended to produce all required diagnostics
    for the specified C standard. -Wwrite-strings gives string literals the
    type `const char[LENGTH]`, which enables useful diagnostics but is *non-conforming*.

    For example, this program:

    ```
    #include <stdio.h>
    int main(void) {
    char *s = "hello, world";
    puts(s);
    }
    ```

    is valid (no diagnostic required), since it doesn't actually write to
    the string literal object, but `-Wwrite-strings` causes gcc to warn
    about it (because making the pointer non-const creates the potential for
    an error).


    Is there any reason not to always write ...

    static const char *s = "hello, world";

    ... ?

    You get all the warnings for free that way.

  • From James Kuyper@21:1/5 to Richard Harnden on Fri Aug 2 09:59:40 2024
    On 8/2/24 08:04, Richard Harnden wrote:
    ...
    Is there any reason not to always write ...

    static const char *s = "hello, world";

    ... ?

    You get all the warnings for free that way.

    If you hate being notified of the errors that can be caught by
    appropriate use of 'const', as many do, that can be considered a
    disadvantage. I can't claim to understand why they feel that way, but
    such people do exist.

  • From Richard Damon@21:1/5 to Bart on Fri Aug 2 11:03:13 2024
    On 8/2/24 5:43 AM, Bart wrote:
    On 02/08/2024 02:06, Kaz Kylheku wrote:
    On 2024-08-01, Bart <bc@freeuk.com> wrote:
    It segfaults when the string is stored in a read-only part of the
    binary.

    A string literal creates an array object with static storage duration.
    Any attempt to modify that array object has undefined behavior.

    What's the difference between such an object, and an array like one of
    these:

    Programming languages can have objects that have the same lifetime, yet
    some of which are mutable and some of which are immutable.

    If the compiler believes that the immutable objects are in fact
    not mutated, it's a bad idea to modify them behind the compiler's
    back.

    There doesn't have to be any actual difference in the implementation of
    these objects, like in what area they are stored, other than the rules
    regarding their correct use, namely prohibiting modification.

    The Racket language has both mutable and immutable cons cells.
    The difference is that the immutable cons cells simply lack the
    operations needed to mutate them. I'm not an expert on the Racket
    internals but I don't see a reason why they couldn't be stored in the
    same heap.

       static char A[100];
       static char B[100]={1};

    Do these not also have static storage duration? Yet presumably these can
    be legally modified.

    That 1 which initializes B[0] cannot be modified.


    Why not? I haven't requested that those are 'const'. Further, gcc has no problem running this program:

        static char A[100];
        static char B[100]={1};

        printf("%d %d %d\n", A[0], B[0], 1);
        A[0]=55;
        B[0]=89;
        printf("%d %d %d\n", A[0], B[0], 1);

    But it does use readonly memory for string literals.

    (The point of A and B was to represent .bss and .data segments
    respectively. A's data is not part of the EXE image; B's is.

    While the point of 'static' was to avoid having to specify whether A and
    B were at module scope or within a function.)

    That 1 which initializes B[0] cannot be modified.

    Or do you literally mean the value of that '1'? Then it doesn't make
    sense; here that is a copy of the literal stored in one cell of 'B'. The
    value of the cell can change, then that particular copy of '1' is lost.

    Here:

        static char B[100] = {1, 1, 1, 1, 1, 1};

    changing B[0] will not affect the 1s in B[1..5], and in my example
    above, that standalone '1' is not affected.



    The key point is that the {1} isn't the value located in B[0], but the
    source of that value when B was initialized. If B is in the .data
    segment, the {1} is the source of the data used to initialize that
    segment; it might exist nowhere in the actual RAM of the machine, only
    in the file that was loaded.

    When accessing the value of a string literal, the compiler needs to do
    something so that the value is accessible, perhaps by creating a const
    object like any other const object and exposing that.

    The confusing part is that while it creates a "const char[]" object, the
    type of that object when referred to in code is just "char[]", a
    difference imposed to avoid breaking most code that used strings when
    the standard was just coming out.

    Most implementations have an option to at least give a warning if used
    in a way that the const is lost, and most programs today should be
    compiled using that option.

  • From James Kuyper@21:1/5 to Bart on Fri Aug 2 14:19:49 2024
    On 8/2/24 5:43 AM, Bart wrote:
    On 02/08/2024 02:06, Kaz Kylheku wrote:
    On 2024-08-01, Bart <bc@freeuk.com> wrote:
    It segfaults when the string is stored in a read-only part of the
    binary.

    A string literal creates an array object with static storage duration.
    Any attempt to modify that array object has undefined behavior.

    What's the difference between such an object, and an array like one of
    these:
       static char A[100];
       static char B[100]={1};

    Do these not also have static storage duration? Yet presumably these can
    be legally modified.

    That 1 which initializes B[0] cannot be modified.


    Why not? I haven't requested that those are 'const'. ...

    You don't get a choice in the matter. The C language doesn't permit
    numeric literals of any kind to be modified by your code. They can't be,
    and don't need to be, declared 'const'. I've heard that in some other
    languages, if you call foo(3), and foo() changes the value of its
    argument to 2, then subsequent calls to bar(3) will pass a value of 2 to
    bar(). That sounds like such a ridiculous mis-feature that I hesitate to
    identify which languages I had heard accused of having that feature, but
    it is important to note that C is not one of them.

    Just as 1 is an integer literal whose value cannot be modified, "Hello,
    world!" is a string literal whose contents cannot be safely modified.
    The key difference is that, in many contexts, "Hello, world!" gets
    automatically converted into a pointer to its first element, a feature
    that makes it a lot easier to work with string literals - but also opens
    up the possibility of attempting to write through that pointer. Doing so
    has undefined behavior, which can include the consequences of storing
    the contents of string literals in read-only memory.
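
    The array-to-pointer conversion described here can be observed directly.
    This is only a small sketch: as the operand of sizeof the literal is
    still an array, while in an initializer for a pointer it becomes the
    address of its first character.

    ```c
    #include <assert.h>
    #include <stdio.h>
    #include <string.h>

    int main(void) {
        /* Operand of sizeof: no conversion, so this is the array size,
           13 characters plus the terminating '\0'. */
        size_t array_size = sizeof "Hello, world!";

        /* Here the literal converts to a pointer to its first element. */
        const char *p = "Hello, world!";

        assert(array_size == 14);
        assert(*p == 'H');
        assert(strlen(p) == array_size - 1);
        printf("array size %zu, string length %zu\n", array_size, strlen(p));
        return 0;
    }
    ```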

    That pointer's value should logically have had the type "const char*",
    which would have made most attempts to write through that pointer
    constraint violations, but the language didn't have 'const' at the time
    that decision was made. In C++ the value is const-qualified. In C, the
    best you can do is to make sure that if you define a pointer and
    initialize it to point inside a string literal, you declare that
    pointer as "const char*".
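
    A minimal sketch of that advice: keep the pointer to the literal
    const-qualified, and copy the text into an array you own before
    editing it. The helper editable_copy is a hypothetical name introduced
    for this example only.

    ```c
    #include <assert.h>
    #include <stdio.h>
    #include <string.h>

    /* Hypothetical helper: copies src into dst (at most dst_size - 1
       characters) and returns the number of characters now in dst. */
    size_t editable_copy(char *dst, size_t dst_size, const char *src) {
        if (dst_size == 0) {
            return 0;
        }
        snprintf(dst, dst_size, "%s", src);
        return strlen(dst);
    }

    int main(void) {
        const char *greeting = "Hello, world!";   /* read-only view of the literal */
        /* greeting[0] = 'J'; */                  /* would be rejected at compile time */

        char copy[32];
        editable_copy(copy, sizeof copy, greeting);
        copy[0] = 'J';                            /* fine: copy is our own array */
        assert(strcmp(copy, "Jello, world!") == 0);
        puts(copy);
        return 0;
    }
    ```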

    ... Further, gcc has no
    problem running this program:

        static char A[100];
        static char B[100]={1};

        printf("%d %d %d\n", A[0], B[0], 1);
        A[0]=55;
        B[0]=89;
        printf("%d %d %d\n", A[0], B[0], 1);

    Of course, why should it? Neither A nor B is a string literal; they are
    ordinary arrays, and B is only initialized by copying from its initializer.
    Since their definitions are not const-qualified, there's no problem with
    such code.

  • From Bart@21:1/5 to James Kuyper on Fri Aug 2 19:33:20 2024
    On 02/08/2024 19:19, James Kuyper wrote:
    On 8/2/24 5:43 AM, Bart wrote:
    On 02/08/2024 02:06, Kaz Kylheku wrote:
    On 2024-08-01, Bart <bc@freeuk.com> wrote:
    It segfaults when the string is stored in a read-only part of the
    binary.

    A string literal creates an array object with static storage duration.
    Any attempt to modify that array object has undefined behavior.

    What's the difference between such an object, and an array like one of
    these:
       static char A[100];
       static char B[100]={1};

    Do these not also have static storage duration? Yet presumably these can
    be legally modified.

    That 1 which initializes B[0] cannot be modified.


    Why not? I haven't requested that those are 'const'. ...

    You don't get a choice in the matter. The C language doesn't permit
    numeric literals of any kind to be modified by your code.

    My post wasn't about numerical literals. I assumed it was about that '1'
    value which is stored B's first cell.

    However, just in case KK was talking about that unlikely possibility, I
    covered that as well.

    They can't be,
    and don't need to be, declared 'const'. I've heard that in some other
    languages, if you call foo(3), and foo() changes the value of its
    argument to 2, then subsequent calls to bar(3) will pass a value of 2 to
    bar(). That sounds like such a ridiculous mis-feature that I hesitate to
    identify which languages I had heard accused of having that feature, but
    it is important to note that C is not one of them.

    Just as 1 is an integer literal whose value cannot be modified,

    It can't be modified in a way that would also affect other instances of
    '1' within that module or program, because it is very unlikely to be shared.

    I don't know of any implementations of this kind of language which do
    that. (The nearest might be FORTRAN IV when '1' was passed by reference to
    a subroutine, and the subroutine then assigns to that parameter.)

    Where it would be more plausible is here:

    const char* B[] = {"A", "A", "A"};

    where if you can somehow change that first "A", then the other two could
    also change if the compiler decides to share those 3 identical strings.
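
    Whether identical string literals share storage is left to the
    implementation, so the result of the check below may differ between
    compilers and option sets. This is only a sketch; same_storage is a
    made-up helper name.

    ```c
    #include <assert.h>
    #include <stdio.h>

    /* Hypothetical helper: 1 if both pointers refer to the same address. */
    int same_storage(const char *a, const char *b) {
        return a == b;
    }

    int main(void) {
        const char *B[] = {"A", "A", "A"};

        /* Either answer is allowed for identical literals. */
        printf("B[0] and B[1] %s storage\n",
               same_storage(B[0], B[1]) ? "share" : "do not share");

        /* Arrays initialized from a literal are always separate objects. */
        char x[] = "A";
        char y[] = "A";
        assert(!same_storage(x, y));
        return 0;
    }
    ```

    Because sharing is possible, code must never rely on being able to
    change one literal without affecting another.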

  • From Richard Damon@21:1/5 to Keith Thompson on Fri Aug 2 14:42:08 2024
    On 8/2/24 2:24 PM, Keith Thompson wrote:
    Richard Harnden <richard.nospam@gmail.invalid> writes:
    [...]
    Is there any reason not to always write ...

    static const char *s = "hello, world";

    ... ?

    You get all the warnings for free that way.

    The "static", if this is at block scope, specifies that the pointer
    object, not the array object, has static storage duration. If it's at
    file scope it specifies that the name "s" is not visible to other
    translation units. Either way, use it if that's what you want, don't
    use it if it isn't.

    There's no good reason not to use "const". (If string literal objects
    were const, you'd have to use "const" here.)

    If you also want the pointer to be const, you can write:

    const char *const s = "hello, world";
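
    The two points above (static applies to the pointer object, and const
    can be applied at both levels) can be sketched as follows. The function
    next_message is a hypothetical example, not something from the thread.

    ```c
    #include <assert.h>
    #include <stdio.h>
    #include <string.h>

    /* Hypothetical helper: returns the current message, then repoints s. */
    const char *next_message(void) {
        /* At block scope, static applies to the pointer object s, which
           keeps its value between calls. */
        static const char *s = "hello, world";
        const char *current = s;
        s = "goodbye";          /* allowed: s itself is not const */
        return current;
    }

    int main(void) {
        /* Both levels const: neither the characters nor the pointer
           may be changed through this name. */
        const char *const fixed = "hello, world";

        assert(strcmp(next_message(), "hello, world") == 0);
        assert(strcmp(next_message(), "goodbye") == 0);
        puts(fixed);
        return 0;
    }
    ```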


    The one good reason to not make it const is if you are passing it
    to functions that take (non-const) char* parameters that don't actually
    change that parameter's contents.

    These may still exist in legacy code since so far nothing has required
    them to change.

    Perhaps it is getting to the point that the language needs to abandon
    support for that ancient code, and force "const correctness" (which I
    admit some will call const-pollution) onto code, first with a formal deprecation period, where implementations are strongly suggested to make
    the violation of the rule a warning, and then later changing the type of
    string constants.

    Of course, implementations would still be free to accept such code, and
    maybe not even warn about it in non-pedantic mode, but making it
    part of the Standard would be a step to cleaning this up.

  • From James Kuyper@21:1/5 to Richard Damon on Fri Aug 2 14:58:10 2024
    On 8/2/24 14:42, Richard Damon wrote:
    On 8/2/24 2:24 PM, Keith Thompson wrote:
    Richard Harnden <richard.nospam@gmail.invalid> writes:
    [...]
    Is there any reason not to always write ...

    static const char *s = "hello, world";

    ... ?
    ...
    There's no good reason not to use "const". (If string literal objects
    were const, you'd have to use "const" here.)
    ...
    The one good reason to not make it const is if you are passing it
    to functions that take (non-const) char* parameters that don't
    actually change that parameter's contents.

    Actually, that's not a good reason. If you can't modify the function's interface, you should use a (char*) cast, which will serve to remind
    future programmers that this is a dangerous function call. You shouldn't
    make the pointer's own type "char *".
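
    A sketch of the cast-at-the-call approach. The function
    legacy_count_spaces is a hypothetical stand-in for an old interface
    that takes char * although it only reads; it is not a real library
    routine.

    ```c
    #include <assert.h>
    #include <stddef.h>
    #include <stdio.h>

    /* Hypothetical legacy routine: not const-correct, but it never writes
       through its argument. */
    size_t legacy_count_spaces(char *s) {
        size_t count = 0;
        for (; *s != '\0'; s++) {
            if (*s == ' ') {
                count++;
            }
        }
        return count;
    }

    int main(void) {
        const char *s = "hello, world";

        /* The cast is visible at the call and flags it for later readers. */
        size_t spaces = legacy_count_spaces((char *)s);

        assert(spaces == 1);
        printf("%zu\n", spaces);    /* prints: 1 */
        return 0;
    }
    ```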

  • From Richard Damon@21:1/5 to James Kuyper on Fri Aug 2 15:11:20 2024
    On 8/2/24 2:58 PM, James Kuyper wrote:
    On 8/2/24 14:42, Richard Damon wrote:
    On 8/2/24 2:24 PM, Keith Thompson wrote:
    Richard Harnden <richard.nospam@gmail.invalid> writes:
    [...]
    Is there any reason not to always write ...

    static const char *s = "hello, world";

    ... ?
    ...
    There's no good reason not to use "const". (If string literal objects
    were const, you'd have to use "const" here.)
    ...
    The one good reason to not make it const is if you are passing it
    to functions that take (non-const) char* parameters that don't
    actually change that parameter's contents.

    Actually, that's not a good reason. If you can't modify the function's interface, you should use a (char*) cast, which will serve to remind
    future programmers that this is a dangerous function call. You shouldn't
    make the pointer's own type "char *".



    Depends on the library and how many times it is used. It may be a
    perfectly safe call, as the function is defined not to change its
    parameter, but being external code the signature might not be fixable.

    Adding the cast at each call may cause a "crying wolf" response that
    trains people to just add the cast where it seems to be needed (even if
    not warranted). You likely DO want a note at the statement explaining
    the situation.
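
    One way to avoid repeating the cast, offered here as an assumption
    rather than something proposed in the thread, is a small const-correct
    wrapper that holds the single cast together with its explanatory note.
    Both legacy_length and const_length are hypothetical names.

    ```c
    #include <assert.h>
    #include <stddef.h>
    #include <stdio.h>

    /* Stand-in for an external routine whose signature cannot be changed.
       It is documented not to modify the string it is given. */
    size_t legacy_length(char *s) {
        size_t n = 0;
        while (s[n] != '\0') {
            n++;
        }
        return n;
    }

    /* Hypothetical wrapper: the only place where const is cast away.
       This is safe only because legacy_length() never writes. */
    size_t const_length(const char *s) {
        return legacy_length((char *)s);
    }

    int main(void) {
        const char *s = "hello, world";
        assert(const_length(s) == 12);
        printf("%zu\n", const_length(s));    /* prints: 12 */
        return 0;
    }
    ```

    Callers then use const_length() with ordinary const char * values, and
    the justification for the cast lives in one reviewed location.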

  • From David Brown@21:1/5 to Bart on Sat Aug 3 00:14:09 2024
    On 01/08/2024 22:42, Bart wrote:
    On 01/08/2024 20:39, Kaz Kylheku wrote:
    On 2024-08-01, Mark Summerfield <mark@qtrac.eu> wrote:
    This program segfaults at the commented line:

    #include <ctype.h>
    #include <stdio.h>

    void uppercase_ascii(char *s) {
         while (*s) {
             *s = toupper(*s); // SEGFAULT
             s++;
         }
    }

    int main() {
         char* text = "this is a test";

    The "this is a test" object is a literal. It is part of the program's
    image.

    So is the text here:

      char text[]="this is a test";

    But this can be changed without making the program self-modifying.

    "this is a test" is a string literal, and is typically part of the
    program's image. (There are some C implementations that do things
    differently, like storing such initialisation data in a compressed format.)

    The array "char text[]", however, is a normal variable of type array of
    char. It is most definitely not part of the program image - it is in
    ram (statically allocated or on the stack, depending on the context) and
    is initialised by copying the characters from the string literal (prior
    to main(), or at each entry to its scope if it is a local variable).

    The string literal initialisation data cannot be changed without
    self-modifying code or other undefined behaviour. The variable "text"
    is just a normal array and can be changed at will.
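
    For reference, a version of the original program that uses the array
    form and therefore runs without a fault. It is a sketch of the fix
    discussed in this thread; the casts around toupper() are an extra
    precaution added here, not part of the original code.

    ```c
    #include <assert.h>
    #include <ctype.h>
    #include <stdio.h>
    #include <string.h>

    /* Uppercases a string in place; the caller must pass writable storage. */
    void uppercase_ascii(char *s) {
        while (*s) {
            /* toupper() requires a value representable as unsigned char. */
            *s = (char)toupper((unsigned char)*s);
            s++;
        }
    }

    int main(void) {
        char text[] = "this is a test";   /* array: our own writable copy */
        printf("before [%s]\n", text);
        uppercase_ascii(text);
        printf("after [%s]\n", text);     /* prints: after [THIS IS A TEST] */
        assert(strcmp(text, "THIS IS A TEST") == 0);
        return 0;
    }
    ```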


    I guess it depends on what is classed as the program's 'image'.


    No, it depends on understanding what the C means and not trying to
    confuse yourself and others.

    I'd say the image in the state it is in just after loading or just
    before execution starts (since certain fixups are needed). But some
    sections will be writable during execution, some not.


    That is a poor definition because you are not considering initialised
    data, and you are not clear about what you mean by "before execution
    starts". A C program typically has an entry point that clears the zero-initialised program-lifetime data, initialises the initialised program-lifetime data by copying from a block in the program image, then
    sets up things like stdin, heap support, argc/argv, and various other
    run-time setup features. Then it calls main(). The initialised data
    section and zero-initialised data section are certainly part of the
    state of the program at the start of the execution from C's viewpoint -
    entry to main(). They are equally certainly not part of the program image.
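
    The start-up sequence described above can be modelled in ordinary C.
    This is a toy simulation only: the arrays image_data, ram_data and
    ram_bss are invented stand-ins for regions that a real linker script
    and run-time would provide, and some systems populate initialised data
    by other means.

    ```c
    #include <assert.h>
    #include <stdio.h>
    #include <string.h>

    /* Toy stand-ins (hypothetical names) for linker-provided regions. */
    static const char image_data[] = "this is a test";  /* initial values in the image */
    static char ram_data[sizeof image_data];            /* initialised data in RAM */
    static char ram_bss[16];                            /* zero-initialised data in RAM */

    /* Models the work done before main() is called. */
    void toy_startup(void) {
        memset(ram_bss, 0, sizeof ram_bss);               /* clear zero-initialised data */
        memcpy(ram_data, image_data, sizeof image_data);  /* copy initial values from the image */
    }

    int main(void) {
        toy_startup();
        ram_data[0] = 'T';              /* the RAM copy is writable */
        assert(image_data[0] == 't');   /* the image copy is untouched */
        assert(ram_bss[0] == 0);
        printf("%s\n", ram_data);       /* prints: This is a test */
        return 0;
    }
    ```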

    One reasonable definition of "program image" would be "the file on the
    disk" (on general-purpose OS's) or "the binary data in flash" on typical embedded systems. Another might be the read-only data sections set up
    by the OS loader just before jumping to the entry point of the C
    run-time code (long before main() is called and the C code itself starts).

    When you try to change it, you're making your program self-modifying.

    Program received signal SIGSEGV, Segmentation fault.
    0x000055555555516e in uppercase_ascii (s=0x555555556004 "this is a
    test")
    at inplace.c:6
    6            *s = toupper(*s);

    On Linux, the string literals of a C executable are located together
    with the program text. They are interspersed among the machine
    instructions which reference them. The program text is mapped
    read-only, so an attempted modification is an access violation trapped
    by the OS, turned into a SIGSEGV signal.

    Does it really do that? That's the method I've used for read-only
    strings, to put them into the code-segment (since I neglected to support
    a dedicated read-only data section, and it's too much work now).


    No, Linux systems don't have read-only data or string literals
    interspersed with code. They have such data in separate segments, for
    better cache efficiency and to allow different section attributes
    (read-only data can't be executed).

    But I don't like it since the code section is also executable; you could inadvertently execute code within a string (which might happen to
    contain machine code for other purposes).


    That's why code and read-only data is rarely interspersed.

    The dangers are small, but there must be reasons why a dedicated
    section is normally used. gcc on Windows creates up to 19 sections, so
    it would be odd for literal strings to share with code.



  • From Ben Bacarisse@21:1/5 to Chris M. Thomasson on Fri Aug 2 23:29:42 2024
    "Chris M. Thomasson" <chris.m.thomasson.1@gmail.com> writes:

    For some reason I had a sort of a habit wrt const pointers:

    (experimental code, no ads, raw text...)
    https://pastebin.com/raw/f52a443b1

    ________________________________
    /* Interfaces ____________________________________________________________________*/
    #include <stddef.h>


    struct object_prv_vtable {
    int (*fp_destroy) (void* const);
    };


    struct device_prv_vtable {
    int (*fp_read) (void* const, void*, size_t);
    int (*fp_write) (void* const, void const*, size_t);
    };

    Why? It seems like an arbitrary choice to const qualify some pointer
    types and some pointed-to types (but never both).

    ;^)

    Does the wink mean I should not take what you write seriously? If so,
    please ignore my question.

    --
    Ben.

  • From Lawrence D'Oliveiro@21:1/5 to James Kuyper on Sat Aug 3 01:31:17 2024
    On Fri, 2 Aug 2024 14:19:49 -0400, James Kuyper wrote:

    I've heard that in some other
    languages, if you call foo(3), and foo() changes the value of its
    argument to 2, then subsequent calls to bar(3) will pass a value of 2 to
    bar(). That sounds like such a ridiculous mis-feature that I hesitate to
    identify which languages I had heard accused of having that feature ...

    I heard that, too. I think it was on some early FORTRAN compilers, on
    early machine architectures, without stacks or reentrancy. And with the
    weird FORTRAN argument-passing conventions.

  • From Richard Damon@21:1/5 to Lawrence D'Oliveiro on Fri Aug 2 22:01:21 2024
    On 8/2/24 9:31 PM, Lawrence D'Oliveiro wrote:
    On Fri, 2 Aug 2024 14:19:49 -0400, James Kuyper wrote:

    I've heard that in some other
    languages, if you call foo(3), and foo() changes the value of its
    argument to 2, then subsequent calls to bar(3) will pass a value of 2 to
    bar(). That sounds like such a ridiculous mis-feature that I hesitate to
    identify which languages I had heard accused of having that feature ...

    I heard that, too. I think it was on some early FORTRAN compilers, on
    early machine architectures, without stacks or reentrancy. And with the
    weird FORTRAN argument-passing conventions.

    I remember it too; it was based on the fact that all arguments were
    passed by reference (so they could be either in or out parameters), and
    constants were passed as pointers to the location of memory where that
    constant was stored, and perhaps used elsewhere too. Why waste precious
    memory to set up a temporary to be initialized and hold the value,
    when you could just pass the address of a location that you knew had the
    right value?
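
    The mechanism can be imitated in C as a toy model. Everything here is
    an illustration: const_three plays the role of the single storage cell
    such a compiler kept for the constant 3, and foo and bar are the
    hypothetical routines from the description above.

    ```c
    #include <assert.h>
    #include <stdio.h>

    /* Toy model: one shared storage cell holding the "constant" 3. */
    static int const_three = 3;

    /* Receives its argument by reference and assigns to it. */
    void foo(int *n) {
        *n = 2;
    }

    /* Reports the value it was handed. */
    int bar(const int *n) {
        return *n;
    }

    int main(void) {
        foo(&const_three);               /* models a call foo(3) */
        int seen = bar(&const_three);    /* models a later call bar(3) */
        assert(seen == 2);
        printf("bar(3) saw %d\n", seen); /* prints: bar(3) saw 2 */
        return 0;
    }
    ```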

  • From Joe Pfeiffer@21:1/5 to Richard Damon on Sat Aug 3 08:32:00 2024
    Richard Damon <richard@damon-family.org> writes:

    On 8/2/24 9:31 PM, Lawrence D'Oliveiro wrote:
    On Fri, 2 Aug 2024 14:19:49 -0400, James Kuyper wrote:

    I've heard that in some other
    languages, if you call foo(3), and foo() changes the value of its
    argument to 2, then subsequent calls to bar(3) will pass a value of 2 to
    bar(). That sounds like such a ridiculous mis-feature that I hesitate to
    identify which languages I had heard accused of having that feature ...

    I heard that, too. I think it was on some early FORTRAN compilers, on
    early machine architectures, without stacks or reentrancy. And with the
    weird FORTRAN argument-passing conventions.

    I remember it too; it was based on the fact that all arguments were
    passed by reference (so they could be either in or out parameters), and
    constants were passed as pointers to the location of memory where that
    constant was stored, and perhaps used elsewhere too. Why waste
    precious memory to set up a temporary to be initialized and hold
    the value, when you could just pass the address of a location that you
    knew had the right value?

    I actually had a bug once in my FORTRAN code on a CDC6400 where I changed the value of an argument in a function, and then passed in a constant. That "constant" had the new value for the rest of the program. Finding that
    one was a challenge, particularly since I was a very inexperienced
    undergrad at the time.

  • From Scott Lurndal@21:1/5 to David Brown on Sat Aug 3 17:07:59 2024
    David Brown <david.brown@hesbynett.no> writes:
    On 01/08/2024 22:42, Bart wrote:

      char text[]="this is a test";

    But this can be changed without making the program self-modifying.

    "this is a test" is a string literal, and is typically part of the
    program's image. (There are some C implementations that do things
    differently, like storing such initialisation data in a compressed format.)

    The array "char text[]", however, is a normal variable of type array of
    char. It is most definitely not part of the program image - it is in
    ram (statically allocated or on the stack, depending on the context) and
    is initialised by copying the characters from the string literal (prior
    to main(), or at each entry to its scope if it is a local variable).

    Linux (ELF):

    A file-scope static declaration of char text[] will emit the string
    literal into the .data section and that data section will be loaded
    into memory by the ELF loader. There is no copy made at runtime
    before main().

    #include <stdio.h>

    char text1[] = "This is a test of a static-scope string";

    int
    main(void)
    {
        /* Automatic array: initialised at each entry to the function. */
        char text2[] = "This is a test of a function-scope string";

        /* %p expects a void *, so convert explicitly. */
        fprintf(stdout, "%p %s\n", (void *)text1, text1);
        fprintf(stdout, "%s\n", text2);

        return 0;
    }

    $ /tmp/a
    0x601060 This is a test of a static-scope string
    This is a test of a function-scope string

    $ objdump -p /tmp/a

    /tmp/a: file format elf64-x86-64

    Program Header:
    PHDR off 0x0000000000000040 vaddr 0x0000000000400040 paddr 0x0000000000400040 align 2**3
    filesz 0x00000000000001f8 memsz 0x00000000000001f8 flags r-x
    INTERP off 0x0000000000000238 vaddr 0x0000000000400238 paddr 0x0000000000400238 align 2**0
    filesz 0x000000000000001c memsz 0x000000000000001c flags r--
    LOAD off 0x0000000000000000 vaddr 0x0000000000400000 paddr 0x0000000000400000 align 2**21
    filesz 0x00000000000007dc memsz 0x00000000000007dc flags r-x
    LOAD off 0x0000000000000e10 vaddr 0x0000000000600e10 paddr 0x0000000000600e10 align 2**21
    filesz 0x0000000000000278 memsz 0x0000000000000290 flags rw-

    .data section:

    0000e00: 0000 0000 0000 0000 0000 0000 0000 0000 ................
    0000e10: 5005 4000 0000 0000 3005 4000 0000 0000 P.@.....0.@.....
    0000e20: 0000 0000 0000 0000 0100 0000 0000 0000 ................
    0000e30: 0100 0000 0000 0000 0c00 0000 0000 0000 ................
    0000e40: 2804 4000 0000 0000 0d00 0000 0000 0000 (.@.............
    0000e50: a406 4000 0000 0000 1900 0000 0000 0000 ..@.............
    0000e60: 100e 6000 0000 0000 1b00 0000 0000 0000 ..`.............
    0000e70: 0800 0000 0000 0000 1a00 0000 0000 0000 ................
    0000e80: 180e 6000 0000 0000 1c00 0000 0000 0000 ..`.............
    0000e90: 0800 0000 0000 0000 f5fe ff6f 0000 0000 ...........o....
    0000ea0: 9802 4000 0000 0000 0500 0000 0000 0000 ..@.............
    0000eb0: 3803 4000 0000 0000 0600 0000 0000 0000 8.@.............
    0000ec0: c002 4000 0000 0000 0a00 0000 0000 0000 ..@.............
    0000ed0: 4700 0000 0000 0000 0b00 0000 0000 0000 G...............
    0000ee0: 1800 0000 0000 0000 1500 0000 0000 0000 ................
    0000ef0: 0000 0000 0000 0000 0300 0000 0000 0000 ................
    0000f00: 0010 6000 0000 0000 0200 0000 0000 0000 ..`.............
    0000f10: 4800 0000 0000 0000 1400 0000 0000 0000 H...............
    0000f20: 0700 0000 0000 0000 1700 0000 0000 0000 ................
    0000f30: e003 4000 0000 0000 0700 0000 0000 0000 ..@.............
    0000f40: b003 4000 0000 0000 0800 0000 0000 0000 ..@.............
    0000f50: 3000 0000 0000 0000 0900 0000 0000 0000 0...............
    0000f60: 1800 0000 0000 0000 feff ff6f 0000 0000 ...........o....
    0000f70: 9003 4000 0000 0000 ffff ff6f 0000 0000 ..@........o....
    0000f80: 0100 0000 0000 0000 f0ff ff6f 0000 0000 ...........o....
    0000f90: 8003 4000 0000 0000 0000 0000 0000 0000 ..@.............
    0000fa0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
    0000fb0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
    0000fc0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
    0000fd0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
    0000fe0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
    0000ff0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
    0001000: 280e 6000 0000 0000 0000 0000 0000 0000 (.`.............
    0001010: 0000 0000 0000 0000 6604 4000 0000 0000 ........f.@.....
    0001020: 7604 4000 0000 0000 8604 4000 0000 0000 v.@.......@.....
    0001030: 0000 0000 0000 0000 0000 0000 0000 0000 ................
    0001040: 0000 0000 0000 0000 0000 0000 0000 0000 ................
    0001050: 0000 0000 0000 0000 0000 0000 0000 0000 ................
    0001060: 5468 6973 2069 7320 6120 7465 7374 206f This is a test o
    0001070: 6620 6120 7374 6174 6963 2d73 636f 7065 f a static-scope
    0001080: 2073 7472 696e 6700 4743 433a 2028 474e string.GCC: (GN

    $ printf "0x%x\n" $(( 0x601060 - 0x0000000000600e10 ))
    0x250
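
    The subtraction above gives the string's distance from the start of the
    writable LOAD segment. Adding that segment's file offset (0xe10) yields
    0x1060, the row of the dump where the string appears. A minimal sketch of
    that translation in C follows. It assumes a hypothetical struct
    load_segment that holds only the three program-header fields used here; it
    is not a type from elf.h.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, trimmed-down view of one LOAD program header. */
struct load_segment {
    uint64_t file_off; /* "off" column in objdump -p */
    uint64_t vaddr;    /* "vaddr" column */
    uint64_t filesz;   /* bytes of the segment actually stored in the file */
};

/* Translate a virtual address into an offset within the executable file.
   Returns 0 on success, -1 if the address is not in the file-backed part
   of the segment (for example, if it lies in .bss). */
static int vaddr_to_file_offset(const struct load_segment *seg,
                                uint64_t vaddr, uint64_t *out)
{
    assert(seg != NULL && out != NULL);
    if (vaddr < seg->vaddr || vaddr - seg->vaddr >= seg->filesz) {
        return -1;
    }
    *out = seg->file_off + (vaddr - seg->vaddr);
    return 0;
}
```

    With the values from the program header above (off 0xe10, vaddr 0x600e10,
    filesz 0x278), the address 0x601060 maps to file offset 0x1060.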

  • From David Brown@21:1/5 to All on Sat Aug 3 19:54:20 2024
    On 02/08/2024 07:30, candycanearter07 wrote:
    David Brown <david.brown@hesbynett.no> wrote at 17:56 this Thursday (GMT):
    On 01/08/2024 16:40, Michael S wrote:
    On Thu, 01 Aug 2024 08:06:57 +0000
    Mark Summerfield <mark@qtrac.eu> wrote:

    This program segfaults at the commented line:

    #include <ctype.h>
    #include <stdio.h>

    void uppercase_ascii(char *s) {
    while (*s) {
    *s = toupper(*s); // SEGFAULT
    s++;
    }
    }

    int main() {
    char* text = "this is a test";
    printf("before [%s]\n", text);
    uppercase_ascii(text);
    printf("after [%s]\n", text);
    }


    The answers to your question are already given above, so I'd talk about
    something else. Sorry about it.

    To my surprise, none of the 3 major compilers that I tried issued the
    warning at this line:
    char* text = "this is a test";
    If implicit conversion of 'const char*' to 'char*' does not warrant a
    compiler warning, then I don't know what does.
    Is there something in the Standard that explicitly forbids diagnostic
    for this sort of conversion?

    BTW, all 3 compilers issue reasonable warnings when I write it slightly
    differently:
    const char* ctext = "this is a test";
    char* text = ctext;

    I am starting to suspect that compilers (and the Standard?) consider
    string literals as being of type 'char*' rather than 'const char*'.


    Your suspicions are correct - in C, string literals are used to
    initialise an array of char (or wide char, or other appropriate
    character type). Perhaps you are thinking of C++, where the type is
    "const char" (or other const character type).

    So in C, when a string literal is used in an expression it is converted
    to a "char *" pointer. You can, of course, assign that to a "const char
    *" pointer. But it does not make sense to have a warning when assigning
    it to a non-const "char *" pointer. This is despite it being undefined
    behaviour (explicitly stated in the standards) to attempt to write to a
    string literal.

    The reason string literals are not const in C is backwards compatibility
    - they existed before C had "const", and making string literals into
    "const char" arrays would mean that existing code that assigned them to
    non-const pointers would then be in error. C++ was able to do the right
    thing and make them arrays of const char because it had "const" from the
    beginning.

    gcc has the option "-Wwrite-strings" that makes string literals in C
    have "const char" array type, and thus give errors when you try to
    assign to a non-const char * pointer. But the option has to be
    specified explicitly (it is not in -Wall) because it changes the meaning
    of the code and can cause compatibility issues with existing correct code.


    -Wwrite-strings is included in -Wpedantic.

    No, it is not - which is a good thing, because -Wpedantic should not
    include features that change the semantics of the language! (IMHO the
    flag should not be called -Wwrite-strings, but -fconst-string-literals
    or similar. It's not really a normal warning option.)

    For C++, -pedantic-errors includes the -Wwrite-strings flag which then
    makes implicit conversion of string literal expressions to non-const
    char* pointers an error. But that's C++, not C.
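
    To tie this back to the original program, here is a minimal sketch of the
    two safe patterns: modify an array that was initialised from the literal,
    or copy a read-only string into storage you own before changing it. The
    helper uppercase_copy is a hypothetical name introduced for illustration;
    it is not from the original post.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Uppercase a NUL-terminated string in place. The buffer must be
   writable, so a string literal must never be passed here. */
static void uppercase_ascii(char *s)
{
    assert(s != NULL);
    for (; *s != '\0'; s++) {
        /* toupper() requires a value representable as unsigned char. */
        *s = (char)toupper((unsigned char)*s);
    }
}

/* Copy a possibly read-only string into caller-owned storage and
   uppercase the copy. Returns 0 on success, -1 if dst is too small. */
static int uppercase_copy(char *dst, size_t dst_size, const char *src)
{
    assert(dst != NULL && src != NULL);
    size_t n = strlen(src) + 1; /* include the terminating NUL */
    if (n > dst_size) {
        return -1;
    }
    memcpy(dst, src, n);
    uppercase_ascii(dst);
    return 0;
}
```

    Declaring pointers to literals as const char * means an accidental call
    such as uppercase_ascii(literal) is diagnosed at compile time instead of
    faulting at run time.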

  • From Lawrence D'Oliveiro@21:1/5 to Richard Damon on Sun Aug 4 01:05:01 2024
    On Fri, 2 Aug 2024 22:01:21 -0400, Richard Damon wrote:

    ... was based on the fact that all arguments were pass by reference ...

    Slightly more subtle than that: simple variables (and I think array
    elements) were passed by reference; more complex expressions had their
    value stored in a temporary and the temporary was passed by reference.

    The “more complex” criterion could be triggered by something as simple as
    putting an extra pair of parentheses around a variable reference.

    It was a calling convention that really made no logical sense.

  • From Lawrence D'Oliveiro@21:1/5 to Keith Thompson on Sun Aug 4 01:08:40 2024
    On Sat, 03 Aug 2024 17:07:37 -0700, Keith Thompson wrote:

    ... general compression isn't something I've seen ...

    I recall Apple had a patent on some aspects of the “PEF” executable format
    that they created for their PowerPC machines running old MacOS. This had
    to do with some clever instruction encodings for loading stuff into
    memory.

  • From Richard Damon@21:1/5 to Keith Thompson on Sun Aug 4 07:22:57 2024
    On 8/3/24 10:58 PM, Keith Thompson wrote:
    Lawrence D'Oliveiro <ldo@nz.invalid> writes:
    On Sat, 03 Aug 2024 17:07:37 -0700, Keith Thompson wrote:

    ... general compression isn't something I've seen ...

    I recall Apple had a patent on some aspects of the “PEF” executable format
    that they created for their PowerPC machines running old MacOS. This had
    to do with some clever instruction encodings for loading stuff into
    memory.

    Is that relevant to what I asked about?

    What I had in mind is something that, given this:

    static int buf[] = { 1, 1, 1, ..., 1 }; // say, 1000 elements

    would store something less than 1000*sizeof(int) bytes in the executable
    file. It wouldn't be hard to do, but I'm not convinced it would be
    worthwhile.


    I vaguely seem to remember an embedded format that did something like
    this. The .init segment that was "copied" to the .data segment had a
    simple run-length encoding option. For non-repetitive data, it just
    encoded 1 copy of length n. But it could also encode repeats like your
    example. When EPROM was a scarce commodity, squeezing out a bit of size
    for the .init segment was useful.

    My guess is that since it didn't persist, it didn't actually help that much.
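
    A minimal sketch of how such a run-length scheme could be expanded at
    startup. The record layout used here ([repeat count][pattern length]
    [pattern bytes], with a count of 0 ending the stream) is an assumption
    made for illustration, not the format of any particular toolchain.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Expand a hypothetical run-length stream into dst.
   Each record is: count (1 byte), len (1 byte), then len pattern bytes;
   the pattern is written count times. A count of 0, or the end of the
   input, ends the stream. Non-repetitive data is simply a record with
   count 1. Returns the number of bytes written, or (size_t)-1 if the
   input is truncated or dst is too small. */
static size_t rle_expand(const uint8_t *src, size_t src_len,
                         uint8_t *dst, size_t dst_cap)
{
    assert(src != NULL && dst != NULL);
    size_t in = 0, out = 0;
    while (in < src_len) {
        unsigned count = src[in++];
        if (count == 0) {
            return out;                 /* explicit terminator */
        }
        if (in >= src_len) {
            return (size_t)-1;          /* missing length byte */
        }
        size_t len = src[in++];
        if (len > src_len - in) {
            return (size_t)-1;          /* truncated pattern */
        }
        for (unsigned i = 0; i < count; i++) {
            if (len > dst_cap - out) {
                return (size_t)-1;      /* destination overflow */
            }
            memcpy(dst + out, src + in, len);
            out += len;
        }
        in += len;
    }
    return out;
}
```

    Under this layout, an array of identical 4-byte values costs 6 bytes per
    run of up to 255 elements instead of 4 bytes per element.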

  • From David Brown@21:1/5 to Keith Thompson on Sun Aug 4 17:20:42 2024
    On 04/08/2024 02:07, Keith Thompson wrote:
    David Brown <david.brown@hesbynett.no> writes:
    [...]
    "this is a test" is a string literal, and is typically part of the
    program's image. (There are some C implementations that do things
    differently, like storing such initialisation data in a compressed
    format.)
    [...]

    What implementations do that? Typically data that's all zeros isn't
    stored in the image, but general compression isn't something I've seen
    (not that I've paid much attention). It would save space in the image,
    but it would require decompression at load time and wouldn't save any
    space at run time.


    It is a technique I have seen in embedded systems. It is not uncommon
    for flash or other non-volatile storage to be significantly slower than
    ram, and for it to be helpful to keep the flash image as small as
    possible (this also helps for things like over-the-air updates). The
    compression is typically fairly simple, such as run-length encoding, to
    avoid significant time, code space and temporary ram space, but it can
    help with some initialised data.

  • From Ben Bacarisse@21:1/5 to Chris M. Thomasson on Mon Aug 5 02:06:36 2024
    "Chris M. Thomasson" <chris.m.thomasson.1@gmail.com> writes:

    On 8/2/2024 3:29 PM, Ben Bacarisse wrote:
    "Chris M. Thomasson" <chris.m.thomasson.1@gmail.com> writes:

    For some reason I had a sort of a habit wrt const pointers:

    (experimental code, no ads, raw text...)
    https://pastebin.com/raw/f52a443b1

    ________________________________
    /* Interfaces
    ____________________________________________________________________*/
    #include <stddef.h>


    struct object_prv_vtable {
    int (*fp_destroy) (void* const);
    };


    struct device_prv_vtable {
    int (*fp_read) (void* const, void*, size_t);
    int (*fp_write) (void* const, void const*, size_t);
    };
    Why? It seems like an arbitrary choice to const qualify some pointer
    types and some pointed-to types (but never both).

    I just wanted to get the point across that the first parameter, aka, akin
    to "this" in C++ is a const pointer. Shall not be modified in any way
    shape or form. It is as it is, so to speak:

    void foo(struct foobar const* const self);

    constant pointer to a constant foobar, fair enough?

    No. If you intended a const pointer to const object why didn't you
    write that? My point was that the consts seems to be scattered about
    without any apparent logic and you've not explained why.

    ;^)
    Does the wink mean I should not take what you write seriously? If so,
    please ignore my question.

    The wink was meant to show my habit in basically a jestful sort of
    way.

    Your habit of what?

    --
    Ben.

  • From Lawrence D'Oliveiro@21:1/5 to Keith Thompson on Mon Aug 5 06:33:22 2024
    On Sat, 03 Aug 2024 19:58:37 -0700, Keith Thompson wrote:

    Lawrence D'Oliveiro <ldo@nz.invalid> writes:

    On Sat, 03 Aug 2024 17:07:37 -0700, Keith Thompson wrote:

    ... general compression isn't something I've seen ...

    I recall Apple had a patent on some aspects of the “PEF” executable
    format that they created for their PowerPC machines running old MacOS.
    This had to do with some clever instruction encodings for loading stuff
    into memory.

    Is that relevant to what I asked about?

    “Compression”

  • From Ben Bacarisse@21:1/5 to Chris M. Thomasson on Mon Aug 5 12:03:08 2024
    "Chris M. Thomasson" <chris.m.thomasson.1@gmail.com> writes:

    On 8/4/2024 6:06 PM, Ben Bacarisse wrote:
    "Chris M. Thomasson" <chris.m.thomasson.1@gmail.com> writes:

    On 8/2/2024 3:29 PM, Ben Bacarisse wrote:
    "Chris M. Thomasson" <chris.m.thomasson.1@gmail.com> writes:

    For some reason I had a sort of a habit wrt const pointers:

    (experimental code, no ads, raw text...)
    https://pastebin.com/raw/f52a443b1

    ________________________________
    /* Interfaces
    ____________________________________________________________________*/
    #include <stddef.h>


    struct object_prv_vtable {
    int (*fp_destroy) (void* const);
    };


    struct device_prv_vtable {
    int (*fp_read) (void* const, void*, size_t);
    int (*fp_write) (void* const, void const*, size_t);
    };
    Why? It seems like an arbitrary choice to const qualify some pointer
    types and some pointed-to types (but never both).

    I just wanted to get the point across that the first parameter, aka, akin
    to "this" in C++ is a const pointer. Shall not be modified in any way
    shape or form. It is as it is, so to speak:

    void foo(struct foobar const* const self);

    constant pointer to a constant foobar, fair enough?
    No. If you intended a const pointer to const object why didn't you
    write that? My point was that the consts seems to be scattered about
    without any apparent logic and you've not explained why.

    ;^)
    Does the wink mean I should not take what you write seriously? If so,
    please ignore my question.

    The wink was meant to show my habit in basically a jestful sort of
    way.
    Your habit of what?

    To write the declaration with names and the const access I want, so:

    extern void (void const* const ptr);

    void (void const* const ptr)
    {
    // ptr is a const pointer to a const void
    }

    I don't think you are following what I'm saying. If you think there
    might be some value in finding out, you could ask a few questions. I
    won't say it again ;-)

    --
    Ben.

  • From Ben Bacarisse@21:1/5 to Chris M. Thomasson on Mon Aug 5 21:54:59 2024
    "Chris M. Thomasson" <chris.m.thomasson.1@gmail.com> writes:

    On 8/5/2024 4:03 AM, Ben Bacarisse wrote:
    "Chris M. Thomasson" <chris.m.thomasson.1@gmail.com> writes:

    On 8/4/2024 6:06 PM, Ben Bacarisse wrote:
    "Chris M. Thomasson" <chris.m.thomasson.1@gmail.com> writes:

    On 8/2/2024 3:29 PM, Ben Bacarisse wrote:
    "Chris M. Thomasson" <chris.m.thomasson.1@gmail.com> writes:

    For some reason I had a sort of a habit wrt const pointers:

    (experimental code, no ads, raw text...)
    https://pastebin.com/raw/f52a443b1

    ________________________________
    /* Interfaces
    ____________________________________________________________________*/
    #include <stddef.h>


    struct object_prv_vtable {
    int (*fp_destroy) (void* const);
    };


    struct device_prv_vtable {
    int (*fp_read) (void* const, void*, size_t);
    int (*fp_write) (void* const, void const*, size_t);
    };
    Why? It seems like an arbitrary choice to const qualify some pointer
    types and some pointed-to types (but never both).

    I just wanted to get the point across that the first parameter, aka, akin
    to "this" in C++ is a const pointer. Shall not be modified in any way
    shape or form. It is as it is, so to speak:

    void foo(struct foobar const* const self);

    constant pointer to a constant foobar, fair enough?
    No. If you intended a const pointer to const object why didn't you
    write that? My point was that the consts seems to be scattered about
    without any apparent logic and you've not explained why.

    ;^)
    Does the wink mean I should not take what you write seriously? If so,
    please ignore my question.

    The wink was meant to show my habit in basically a jestful sort of
    way.
    Your habit of what?

    To write the declaration with names and the const access I want, so:

    extern void (void const* const ptr);

    void (void const* const ptr)
    {
    // ptr is a const pointer to a const void
    }
    I don't think you are following what I'm saying. If you think there
    might be some value in finding out, you could ask a few questions. I
    won't say it again ;-)

    I must be misunderstanding you. My habit in such code was to always make
    the "this" pointer wrt some of my "object" oriented code a const
    pointer. This was always the first parameter:

    extern void foobar(void const* const ptr);

    OK. So I conclude you don't want to know what I was saying. That's
    fine. It was a trivial point.

    --
    Ben.

  • From Lawrence D'Oliveiro@21:1/5 to Keith Thompson on Mon Aug 5 21:27:16 2024
    On Sun, 04 Aug 2024 23:38:14 -0700, Keith Thompson wrote:

    Lawrence D'Oliveiro <ldo@nz.invalid> writes:

    On Sat, 03 Aug 2024 19:58:37 -0700, Keith Thompson wrote:

    Lawrence D'Oliveiro <ldo@nz.invalid> writes:

    On Sat, 03 Aug 2024 17:07:37 -0700, Keith Thompson wrote:

    ... general compression isn't something I've seen ...

    I recall Apple had a patent on some aspects of the “PEF” executable
    format that they created for their PowerPC machines running old
    MacOS. This had to do with some clever instruction encodings for
    loading stuff into memory.

    Is that relevant to what I asked about?

    “Compression”

    Was that intended to be responsive?

    Hint: you have to know something about executable formats.

  • From Ben Bacarisse@21:1/5 to Chris M. Thomasson on Tue Aug 6 12:29:29 2024
    "Chris M. Thomasson" <chris.m.thomasson.1@gmail.com> writes:

    I must have completely missed it. Sorry about that. Please redefine?

    It's going to seem silly after all these exchanges. I simply wanted to
    know why you chose to use const as you originally posted:

    | struct object_prv_vtable {
    | int (*fp_destroy) (void* const);
    | int (*fp_read) (void* const, void*, size_t);
    | int (*fp_write) (void* const, void const*, size_t);
    | };

    because that looks peculiar (to the point of being arbitrary) to me.
    You went on to talk about "self" pointers being const pointers to const
    void, but that was not what you wrote, so it did not address what I was
    asking about.

    In general, const qualified argument types are rarely used and are even
    more rarely used in function (or type) declarations because they have
    no effect at all in that position. For example, I can assign fp_destroy
    from a function declared without the const-qualified parameter:

    int destroy(void *self) { /* ... */; return 1; }
    ...
    vtab.fp_destroy = destroy;

    or, if I do want the compiler to check that the function does not alter
    its parameter, I can add the const in the function definition (where it
    can be useful) even if it is missing from the declaration:

    struct object_prv_vtable {
    int (*fp_destroy) (void*);
    /* ... */
    };

    int destroy(void *const self) { /* ... */; return 1; }
    ...
    vtab.fp_destroy = destroy;

    But if you want the const there so that the declaration matches the
    function definition, why not do that for all the parameters? Basically,
    I would have expected either this (just one const where it matters):

    struct object_prv_vtable {
    int (*fp_destroy) (void *);
    int (*fp_read) (void *, void *, size_t);
    int (*fp_write) (void *, void const *, size_t);
    };

    and the actual functions that get assigned to these pointers might, if
    you want that extra check, have all their parameters marked const. Or,
    for consistency, you might have written

    struct object_prv_vtable {
    int (*fp_destroy) (void * const);
    int (*fp_read) (void * const, void * const, size_t const);
    int (*fp_write) (void * const, void const * const, size_t const);
    };

    even if none of the actual functions have const parameters.

    Finally, if you had intended to write what you later went on to talk
    about, you would have written either

    struct object_prv_vtable {
    int (*fp_destroy) (const void *);
    int (*fp_read) (const void *, void *, size_t);
    int (*fp_write) (const void *, void const *, size_t);
    };

    or

    struct object_prv_vtable {
    int (*fp_destroy) (const void * const);
    int (*fp_read) (const void * const, void * const, size_t const);
    int (*fp_write) (const void * const, void const * const, size_t const);
    };

    TL;DR: where you put the consts in the original just seemed arbitrary.


    I'll also note that the term "const pointer" is often used when the
    pointer is not const! It most often means that the pointed-to type is
    const qualified. As such, it's best to avoid the term altogether.
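
    A minimal sketch of the compatibility rule described above: a top-level
    const on a parameter is ignored when function types are compared, while
    const on the pointed-to type is part of the type. The names struct
    counter, counter_destroy, counter_peek and fp_peek are hypothetical and
    only serve the illustration.

```c
#include <assert.h>
#include <stddef.h>

struct counter {
    int value;
};

struct object_vtable {
    /* Top-level const on the parameter: no effect on the pointer's type. */
    int (*fp_destroy)(void *const);
    /* const on the pointed-to type: this is part of the type. */
    int (*fp_peek)(const void *);
};

/* Defined without a const parameter, yet compatible with fp_destroy. */
static int counter_destroy(void *self)
{
    struct counter *c = self;
    assert(c != NULL);
    c->value = 0;
    return 1;
}

/* Here the top-level const only stops the body from reassigning self;
   callers and the vtable slot cannot tell the difference. */
static int counter_peek(const void *const self)
{
    const struct counter *c = self;
    assert(c != NULL);
    return c->value;
}

static const struct object_vtable counter_vtable = {
    counter_destroy,
    counter_peek
};
```

    Both initialisers are accepted without a cast because the function types
    are compatible once the top-level qualifiers are dropped.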

    --
    Ben.

  • From Bart@21:1/5 to Keith Thompson on Tue Aug 6 16:57:16 2024
    On 05/08/2024 23:40, Keith Thompson wrote:
    Lawrence D'Oliveiro <ldo@nz.invalid> writes:
    On Sun, 04 Aug 2024 23:38:14 -0700, Keith Thompson wrote:
    Lawrence D'Oliveiro <ldo@nz.invalid> writes:
    On Sat, 03 Aug 2024 19:58:37 -0700, Keith Thompson wrote:
    Lawrence D'Oliveiro <ldo@nz.invalid> writes:
    On Sat, 03 Aug 2024 17:07:37 -0700, Keith Thompson wrote:
    ... general compression isn't something I've seen ...

    I recall Apple had a patent on some aspects of the “PEF” executable
    format that they created for their PowerPC machines running old
    MacOS. This had to do with some clever instruction encodings for
    loading stuff into memory.

    Is that relevant to what I asked about?

    “Compression”

    Was that intended to be responsive?

    Hint: you have to know something about executable formats.

    I am profoundly uninterested in hints.

    Here's what you snipped from what I wrote upthread:

    What I had in mind is something that, given this:

    static int buf[] = { 1, 1, 1, ..., 1 }; // say, 1000 elements

    would store something less than 1000*sizeof(int) bytes in the executable
    file. It wouldn't be hard to do, but I'm not convinced it would be
    worthwhile.

    There's a lot I don't know about executable formats, and you seem
    uninterested in doing more than showing off your presumed knowledge
    without actually sharing it. Others have already answered my direct
    question (Richard Damon and David Brown mentioned implementations
    that use simple run-length encoding, and David gave some reasons
    why it could be useful), so you can stop wasting everyone's time.

    Storing those 1000 integers is normally going to take 4000 bytes (at
    least, since data sections may be rounded up etc).

    Doing it in under 4000 bytes would require some extra help. Who or what
    is going to do that, and at what point?

    There are two lots of support needed:

    (1) Some process needs to run either while generating the EXE, or
    compressing an existing EXE, to convert that data into a more compact form

    (2) When launched, some other process is needed to decompress the data
    before reaching the normal entry point.

    I can tell you that nothing about Windows' EXE format will help here for
    either (1) or (2), since it would need support from the OS loader to
    decompress any data, and that doesn't exist.

    So it would presumably need to be done by some extra code that is added
    to the executable, that needs to be arranged to run as part of the
    user-code.

    A compiler that supports such compression could do this job: compressing
    sections, and then generating extra code, which must be called
    first, which decompresses those sections.

    Or an external utility like UPX can be applied, which typically reduces
    the size of an EXE by 2/3 (both code /and/ data), and which
    transparently expands it when launched.

    So, with the existence of such a utility, I wouldn't even bother trying
    it within a compiler.

  • From David Brown@21:1/5 to Bart on Tue Aug 6 20:40:39 2024
    On 06/08/2024 17:57, Bart wrote:
    On 05/08/2024 23:40, Keith Thompson wrote:
    Lawrence D'Oliveiro <ldo@nz.invalid> writes:
    On Sun, 04 Aug 2024 23:38:14 -0700, Keith Thompson wrote:
    Lawrence D'Oliveiro <ldo@nz.invalid> writes:
    On Sat, 03 Aug 2024 19:58:37 -0700, Keith Thompson wrote:
    Lawrence D'Oliveiro <ldo@nz.invalid> writes:
    On Sat, 03 Aug 2024 17:07:37 -0700, Keith Thompson wrote:
    ... general compression isn't something I've seen ...

    I recall Apple had a patent on some aspects of the “PEF” executable
    format that they created for their PowerPC machines running old
    MacOS. This had to do with some clever instruction encodings for
    loading stuff into memory.

    Is that relevant to what I asked about?

    “Compression”

    Was that intended to be responsive?

    Hint: you have to know something about executable formats.

    I am profoundly uninterested in hints.

    Here's what you snipped from what I wrote upthread:

    What I had in mind is something that, given this:

    static int buf[] = { 1, 1, 1, ..., 1 }; // say, 1000 elements

    would store something less than 1000*sizeof(int) bytes in the
    executable file. It wouldn't be hard to do, but I'm not convinced
    it would be worthwhile.

    There's a lot I don't know about executable formats, and you seem
    uninterested in doing more than showing off your presumed knowledge
    without actually sharing it.  Others have already answered my direct
    question (Richard Damon and David Brown mentioned implementations
    that use simple run-length encoding, and David gave some reasons
    why it could be useful), so you can stop wasting everyone's time.

    Storing those 1000 integers is normally going to take 4000 bytes (at
    least, since data sections may be rounded up etc).

    Doing it in under 4000 bytes would require some extra help. Who or what
    is going to do that, and at what point?

    There are two lots of support needed:

    (1) Some process needs to run either while generating the EXE, or
    compressing an existing EXE, to convert that data into a more compact form

    (2) When launched, some other process is needed to decompress the data
    before reaching the normal entry point.

    I can tell you that nothing about Windows' EXE format will help here for
    either (1) or (2), since it would need support from the OS loader to
    decompress any data, and that doesn't exist.

    So it would presumably need to be done by some extra code that is added
    to the executable, that needs to be arranged to run as part of the
    user-code.

    A compiler that supports such compression could do this job: compressing
    sections, and then generating extra code, which must be called
    first, which decompresses those sections.

    Or an external utility like UPX can be applied, which typically reduces
    the size of an EXE by 2/3 (both code /and/ data), and which
    transparently expands it when launched.

    So, with the existence of such a utility, I wouldn't even bother trying
    it within a compiler.

    That may all be true for Windows - you know far more about executable
    formats on Windows, and how the OS loads and runs them, than I do.

    But it is not true for the kind of embedded development tools that I
    have seen using compression for initialised data - tools such as UPX are
    simply not applicable in this case.

    However, it is fair to say that it is not the compiler itself that will
    do the compression or decompression. In the implementations I have
    seen, it is the linker that compresses the initialised data section's
    data. And the code for decompressing it is part of the C runtime
    support code (the stuff that, amongst other things, zeros out the bss
    section).

  • From Ben Bacarisse@21:1/5 to Chris M. Thomasson on Tue Aug 6 23:59:28 2024
    "Chris M. Thomasson" <chris.m.thomasson.1@gmail.com> writes:

    On 8/6/2024 4:29 AM, Ben Bacarisse wrote:
    "Chris M. Thomasson" <chris.m.thomasson.1@gmail.com> writes:

    I must have completely missed it. Sorry about that. Please redefine?
    It's going to seem silly after all these exchanges. I simply wanted to
    know why you chose to use const as you originally posted:
    | struct object_prv_vtable {
    | int (*fp_destroy) (void* const);
    | int (*fp_read) (void* const, void*, size_t);
    | int (*fp_write) (void* const, void const*, size_t);
    | };
    because that looks peculiar (to the point of being arbitrary) to me.
    You went on to talk about "self" pointers being const pointers to const
    void, but that was not what you wrote, so it did not address what I was
    asking about.
    In general, const qualified argument types are rarely used and are even
    more rarely used in function (or type) declarations because they have
    no effect at all in that position. For example, I can assign fp_destroy
    from a function declared without the const-qualified parameter:
    int destroy(void *self) { /* ... */; return 1; }
    ...
    vtab.fp_destroy = destroy;
    or, if I do want the compiler to check that the function does not alter
    its parameter, I can add the const in the function definition (where it
    can be useful) even if it is missing from the declaration:
    struct object_prv_vtable {
    int (*fp_destroy) (void*);
    /* ... */
    };
    int destroy(void *const self) { /* ... */; return 1; }
    ...
    vtab.fp_destroy = destroy;
    But if you want the const there so that the declaration matches the
    function definition, why not do that for all the parameters? Basically,
    I would have expected either this (just one const where it matters):
    struct object_prv_vtable {
    int (*fp_destroy) (void *);
    int (*fp_read) (void *, void *, size_t);
    int (*fp_write) (void *, void const *, size_t);
    };
    and the actual functions that get assigned to these pointers might, if
    you want that extra check, have all their parameters marked const. Or,
    for consistency, you might have written
    struct object_prv_vtable {
    int (*fp_destroy) (void * const);
    int (*fp_read) (void * const, void * const, size_t const);
    int (*fp_write) (void * const, void const * const, size_t const);
    };
    even if none of the actual functions have const parameters.
    Finally, if you had intended to write what you later went on to talk
    about, you would have written either
    struct object_prv_vtable {
    int (*fp_destroy) (const void *);
    int (*fp_read) (const void *, void *, size_t);
    int (*fp_write) (const void *, void const *, size_t);
    };
    or
    struct object_prv_vtable {
    int (*fp_destroy) (const void * const);
    int (*fp_read) (const void * const, void * const, size_t const);
    int (*fp_write) (const void * const, void const * const, size_t const);
    };
    TL;DR: where you put the consts in the original just seemed arbitrary.
    I'll also note that the term "const pointer" is often used when the
    pointer is not const! It most often means that the pointed-to type is
    const qualified. As such, it's best to avoid the term altogether.

    I wanted to get across that the pointer value for the first parameter
    itself should not be modified. I read (void* const) as a const pointer to
    a "non-const" void. Now a const pointer to a const void is (void const*
    const), from my code, notice the first parameter?

    I consider the first parameter to be special in this older OO experiment
    of mine. It shall not be modified, so I wrote it into the API:

    You could have said that when I asked many posts ago! I can't see a
    sound technical reason to put a const there but that parameter is in
    some way different I suppose. The effect on readers is likely to be a
    puzzled, mild confusion.

    Note that is not really "in the API" as it is entirely optional whether
    the implementation has a const first parameter.
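
    A minimal sketch of that last point, assuming a cut-down two-member
    version of the vtable (demo_destroy, demo_read and demo_vtable are
    hypothetical names, not from the original code). Top-level qualifiers
    on parameters are not part of a function's type, so implementations
    written without the const still match the pointer types that have it:

```c
#include <stddef.h>

/* Cut-down vtable in the style quoted above: the first parameter is
   declared as a const pointer (void *const) in each prototype. */
struct object_prv_vtable {
    int (*fp_destroy)(void *const);
    int (*fp_read)(void *const, void *, size_t);
};

/* The implementations omit the top-level const; they are still compatible. */
static int demo_destroy(void *self) {
    (void)self;
    return 0;
}

static int demo_read(void *self, void *buf, size_t n) {
    (void)self;
    (void)buf;
    return (int)n;
}

/* No cast and no diagnostic needed for these assignments. */
static const struct object_prv_vtable demo_vtable = {
    .fp_destroy = demo_destroy,
    .fp_read = demo_read,
};

/* Evaluates to 1: int (*)(void *) and int (*)(void *const) are
   compatible types, so the first association is selected. */
static int same_type(void) {
    return _Generic(&demo_destroy, int (*)(void *const): 1, default: 0);
}
```

    The const in the prototype therefore documents intent at most; it
    constrains neither the caller nor the implementation.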

    --
    Ben.

  • From Tim Rentsch@21:1/5 to Richard Damon on Mon Aug 12 02:52:15 2024
    Richard Damon <richard@damon-family.org> writes:

    On 8/2/24 9:31 PM, Lawrence D'Oliveiro wrote:

    On Fri, 2 Aug 2024 14:19:49 -0400, James Kuyper wrote:

    I've heard that in some other
    languages, if you call foo(3), and foo() changes the value of its
    argument to 2, then subsequent calls to bar(3) will pass a value of 2 to
    bar(). That sounds like such a ridiculous mis-feature that I hesitate to
    identify which languages I had heard accused of having that feature ...

    I heard that, too. I think it was on some early FORTRAN compilers, on
    early machine architectures, without stacks or reentrancy. And with the
    weird FORTRAN argument-passing conventions.

    I remember it too, and it was based on the fact that all arguments were
    passed by reference (so they could be either in or out parameters), and
    constants were passed as pointers to the location of memory where that
    constant was stored, and perhaps used elsewhere too. Why waste
    precious memory to set up a temporary to be initialized and hold
    the value, when you could just pass the address of a location that you
    knew had the right value?

    I think the original FORTRAN, and FORTRAN II, used call by reference.
    In the early 1960s FORTRAN changed to using call by value-result
    (which is similar to call by reference but slightly different).

  • From Tim Rentsch@21:1/5 to Richard Damon on Mon Aug 12 02:55:01 2024
    Richard Damon <richard@damon-family.org> writes:

    On 8/3/24 10:58 PM, Keith Thompson wrote:

    Lawrence D'Oliveiro <ldo@nz.invalid> writes:

    On Sat, 03 Aug 2024 17:07:37 -0700, Keith Thompson wrote:

    ... general compression isn't something I've seen ...

    I recall Apple had a patent on some aspects of the "PEF"
    executable format that they created for their PowerPC machines
    running old MacOS. This had to do with some clever instruction
    encodings for loading stuff into memory.

    Is that relevant to what I asked about?

    What I had in mind is something that, given this:

    static int buf[] = { 1, 1, 1, ..., 1 }; // say, 1000 elements

    would store something less than 1000*sizeof(int) bytes in the
    executable file. It wouldn't be hard to do, but I'm not convinced
    it would be worthwhile.

    I vaguely seem to remember an embedded format that did something like
    this. The .init segment that was "copied" to the .data segment has
    a simple run-length encoding option. For non-repetitive data, it
    just encoded 1 copy of length n. But it could also encode repeats
    like your example. When EPROM was a scarce commodity squeezing out a
    bit of size for the .init segment was useful.

    My guess is that since it didn't persist, it didn't actually help that
    much.

    Or maybe it helped back in the day, but since then technology has
    changed and it doesn't help any more.
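
    As a rough sketch of the run-length idea described above, and not a
    reconstruction of any real toolchain's format: assume the startup code
    is handed a table of hypothetical (count, value) records and expands
    them into the writable data area. The names init_run and expand_runs
    are made up for illustration, and the record layout is simplified to
    one repeated int per run.

```c
#include <stddef.h>

/* Hypothetical record: "store `value` `count` times in a row". */
struct init_run {
    size_t count;
    int value;
};

/* Expand the runs into dst and return the number of elements written.
   Expansion stops early rather than writing past cap elements. */
static size_t expand_runs(const struct init_run *runs, size_t nruns,
                          int *dst, size_t cap) {
    size_t out = 0;
    for (size_t i = 0; i < nruns; i++) {
        for (size_t j = 0; j < runs[i].count && out < cap; j++) {
            dst[out++] = runs[i].value;
        }
    }
    return out;
}
```

    With this layout, 1000 identical ints need a single record instead of
    1000*sizeof(int) bytes of image, while non-repetitive data costs one
    record per element, which is why a real format would also want a
    "literal block of length n" record.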

  • From Tim Rentsch@21:1/5 to Richard Damon on Mon Aug 12 08:32:32 2024
    Richard Damon <richard@damon-family.org> writes:

    On 8/2/24 2:58 PM, James Kuyper wrote:

    On 8/2/24 14:42, Richard Damon wrote:

    On 8/2/24 2:24 PM, Keith Thompson wrote:

    Richard Harnden <richard.nospam@gmail.invalid> writes:
    [...]

    Is there any reason not to always write ...

    static const char *s = "hello, world";

    ... ?

    ...

    There's no good reason not to use "const". (If string literal objects
    were const, you'd have to use "const" here.)

    ...

    The one good reason to not make it const is if you are passing it
    to functions that take (non-const) char* parameters that don't
    actually change that parameter's contents.

    Actually, that's not a good reason. If you can't modify the function's
    interface, you should use a (char*) cast, which will serve to remind
    future programmers that this is a dangerous function call. You shouldn't
    make the pointer's own type "char *".

    Depends on the library and how many times it is used. It may be a
    perfectly safe call, as the function is defined not to change its
    parameter, but being external code the signature might not be fixable.

    Right. It isn't always feasible to assume source code can be
    modified, especially without causing downstream problems.

    Adding the cast at each call may cause a "crying wolf" response that
    trains people to just add the cast where it seems to be needed (even
    if not warranted).

    Exactly. The last thing we want to do is have developers learn
    habits that tend to push code in the direction of being less
    safe.
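
    One middle ground between the two positions, offered only as a sketch
    and not something proposed in the thread: keep the pointer
    const-qualified and confine the cast to a single wrapper.
    legacy_count_spaces below is a hypothetical stand-in for an external
    function whose signature cannot be changed but which is documented
    not to modify its argument.

```c
#include <string.h>

/* Stand-in for external code: takes char * but only reads the string. */
static size_t legacy_count_spaces(char *s) {
    size_t n = 0;
    for (; *s; s++) {
        if (*s == ' ') {
            n++;
        }
    }
    return n;
}

/* The one place where const is cast away. This is only safe because the
   legacy function is known not to write through the pointer. */
static size_t count_spaces(const char *s) {
    return legacy_count_spaces((char *)s);
}
```

    Call sites then use count_spaces() with const char * and never see a
    cast, which sidesteps the "crying wolf" concern at the cost of one
    wrapper per legacy function.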

  • From Tim Rentsch@21:1/5 to Richard Damon on Mon Aug 12 08:27:04 2024
    Richard Damon <richard@damon-family.org> writes:

    On 8/2/24 2:24 PM, Keith Thompson wrote:

    Richard Harnden <richard.nospam@gmail.invalid> writes:
    [...]

    Is there any reason not to always write ...

    static const char *s = "hello, world";

    ... ?

    You get all the warnings for free that way.

    The "static", if this is at block scope, specifies that the pointer
    object, not the array object, has static storage duration. If it's at
    file scope it specifies that the name "s" is not visible to other
    translation units. Either way, use it if that's what you want, don't
    use it if it isn't.

    There's no good reason not to use "const". (If string literal objects
    were const, you'd have to use "const" here.)

    If you also want the pointer to be const, you can write:

    const char *const s = "hello, world";

    The one good reason to not make it const is if you are passing it
    to functions that take (non-const) char* parameters that don't
    actually change that parameter's contents.

    Right.

    These may still exist in legacy code since so far nothing has required
    them to change.

    Perhaps it is getting to the point that the language needs to abandon
    support for that ancient code, and force "const correctness" (which I
    admit some will call const-pollution) onto code, first with a formal
    deprecation period, where implementations are strongly suggested to
    make the violation of the rule a warning, and then later changing the
    type of string constants.

    Given the widespread availability of compiler options to treat
    string literals as being const-qualified, it seems better to
    leave the language alone and have people use those options as
    they see fit. Making existing programs that have worked fine
    for years become non-conforming is a heavy and unnecessary
    burden, with an ROI that is at best very small and more likely
    negative.

  • From Tim Rentsch@21:1/5 to Kaz Kylheku on Mon Aug 12 13:47:02 2024
    Kaz Kylheku <643-408-1753@kylheku.com> writes:

    On 2024-08-01, Bart <bc@freeuk.com> wrote:

    On 01/08/2024 20:39, Kaz Kylheku wrote:

    On 2024-08-01, Mark Summerfield <mark@qtrac.eu> wrote:

    This program segfaults at the commented line:

    #include <ctype.h>
    #include <stdio.h>

    void uppercase_ascii(char *s) {
    while (*s) {
    *s = toupper(*s); // SEGFAULT
    s++;
    }
    }

    int main() {
    char* text = "this is a test";

    The "this is a test" object is a literal. It is part of the
    program's image.

    So is the text here:

    char text[]="this is a test";

    But this can be changed without making the program self-modifying.

    The array which is initialized by the literal is what can be
    changed.

    In this situation, the literal is just initializer syntax,
    not required to be an object with an address.

    In the abstract machine I believe the initializing string
    literal is required to be an object with an address. The
    discussion of string literals in 6.4.5 says there is such
    an object for every string literal, and I don't see any
    text in 6.7.9, covering Initialization, that overrules or
    contradicts that.
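
    Whichever reading of the standard is right about the initializer, the
    practical difference for the original program is the same. A minimal
    sketch (demo is a hypothetical helper; the unsigned char cast is an
    addition to keep toupper well defined for negative char values):

```c
#include <ctype.h>
#include <string.h>

static void uppercase_ascii(char *s) {
    while (*s) {
        *s = (char)toupper((unsigned char)*s);
        s++;
    }
}

/* Returns 1 if the in-place edit of an array copy worked. */
static int demo(void) {
    /* The array is a separate, writable object initialized from the
       literal, so modifying it never touches the literal itself. */
    char text[] = "this is a test";
    uppercase_ascii(text);
    return strcmp(text, "THIS IS A TEST") == 0;
}
```

    Declaring text as char * instead would make uppercase_ascii write to
    the string literal's own array, which is undefined behavior.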

  • From Tim Rentsch@21:1/5 to Keith Thompson on Mon Aug 12 14:11:47 2024
    Keith Thompson <Keith.S.Thompson+u@gmail.com> writes:

    [...]

    A string literal creates an array object with static storage
    duration. [...]

    A small quibble. Every string literal does sit in an array,
    but it might not be a _new_ array, because different string
    literals are allowed to overlap as long as the bytes in the
    overlapping arrays have the right values.

  • From Tim Rentsch@21:1/5 to Keith Thompson on Mon Aug 12 14:33:48 2024
    Keith Thompson <Keith.S.Thompson+u@gmail.com> writes:

    candycanearter07 <candycanearter07@candycanearter07.nomail.afraid>
    writes:

    David Brown <david.brown@hesbynett.no> wrote at 17:56 this Thursday (GMT):

    [...]

    gcc has the option "-Wwrite-strings" that makes string literals in
    C have "const char" array type, and thus give errors when you try
    to assign to a non-const char * pointer. But the option has to be
    specified explicitly (it is not in -Wall) because it changes the
    meaning of the code and can cause compatibility issues with
    existing correct code.

    -Wwrite-strings is included in -Wpedantic.

    No it isn't, nor is it included in -Wall -- and it wouldn't make
    sense to do so.

    The -Wpedantic option is intended to produce all required
    diagnostics for the specified C standard. -Wwrite-strings
    gives string literals the type `const char[LENGTH]`, which
    enables useful diagnostics but is *non-conforming*.

    As long as the -Wwrite-strings diagnostics are only warnings the
    result is still conforming.

  • From Tim Rentsch@21:1/5 to Keith Thompson on Mon Aug 12 14:38:36 2024
    Keith Thompson <Keith.S.Thompson+u@gmail.com> writes:

    Richard Harnden <richard.nospam@gmail.invalid> writes:
    [...]

    Is there any reason not to always write ...

    static const char *s = "hello, world";

    ... ?

    You get all the warnings for free that way.

    The "static", if this is at block scope, specifies that the
    pointer object, not the array object, has static storage duration.
    If it's at file scope it specifies that the name "s" is not
    visible to other translation units. Either way, use it if that's
    what you want, don't use it if it isn't.

    There's no good reason not to use "const". [...]

    Other people have different opinions on that question.

  • From Tim Rentsch@21:1/5 to Keith Thompson on Mon Aug 12 16:05:29 2024
    Keith Thompson <Keith.S.Thompson+u@gmail.com> writes:

    Tim Rentsch <tr.17687@z991.linuxsc.com> writes:

    Keith Thompson <Keith.S.Thompson+u@gmail.com> writes:

    candycanearter07 <candycanearter07@candycanearter07.nomail.afraid>
    writes:

    David Brown <david.brown@hesbynett.no> wrote at 17:56 this Thursday (GMT):
    [...]

    gcc has the option "-Wwrite-strings" that makes string literals in
    C have "const char" array type, and thus give errors when you try
    to assign to a non-const char * pointer. But the option has to be
    specified explicitly (it is not in -Wall) because it changes the
    meaning of the code and can cause compatibility issues with
    existing correct code.

    -Wwrite-strings is included in -Wpedantic.

    No it isn't, nor is it included in -Wall -- and it wouldn't make
    sense to do so.

    The -Wpedantic option is intended to produce all required
    diagnostics for the specified C standard. -Wwrite-strings
    gives string literals the type `const char[LENGTH]`, which
    enables useful diagnostics but is *non-conforming*.

    As long as the -Wwrite-strings diagnostics are only warnings the
    result is still conforming.

    It's not just about diagnostics. This program:

    #include <stdio.h>
    int main(void) {
    puts(_Generic("hello",
    char*: "char*",
    const char*: "const char*",
    default: "?"));
    }

    must print "char*" in a conforming implementation. With
    (gcc|clang) -Wwrite-strings, it prints "const char*".

    Good point. I hadn't considered such cases.

    And something as simple as:

    char *p = "hello";

    is rejected with a fatal error with "-Wwrite-strings -pedantic-errors".

    That violates the "As long as the -Wwrite-strings diagnostics are
    only warnings" condition.

  • From David Brown@21:1/5 to Tim Rentsch on Tue Aug 13 13:08:57 2024
    On 13/08/2024 01:05, Tim Rentsch wrote:
    Keith Thompson <Keith.S.Thompson+u@gmail.com> writes:

    Tim Rentsch <tr.17687@z991.linuxsc.com> writes:

    Keith Thompson <Keith.S.Thompson+u@gmail.com> writes:

    candycanearter07 <candycanearter07@candycanearter07.nomail.afraid>
    writes:

    David Brown <david.brown@hesbynett.no> wrote at 17:56 this Thursday (GMT):

    [...]

    gcc has the option "-Wwrite-strings" that makes string literals in
    C have "const char" array type, and thus give errors when you try
    to assign to a non-const char * pointer. But the option has to be
    specified explicitly (it is not in -Wall) because it changes the
    meaning of the code and can cause compatibility issues with
    existing correct code.

    -Wwrite-strings is included in -Wpedantic.

    No it isn't, nor is it included in -Wall -- and it wouldn't make
    sense to do so.

    The -Wpedantic option is intended to produce all required
    diagnostics for the specified C standard. -Wwrite-strings
    gives string literals the type `const char[LENGTH]`, which
    enables useful diagnostics but is *non-conforming*.

    As long as the -Wwrite-strings diagnostics are only warnings the
    result is still conforming.

    It's not just about diagnostics. This program:

    #include <stdio.h>
    int main(void) {
    puts(_Generic("hello",
    char*: "char*",
    const char*: "const char*",
    default: "?"));
    }

    must print "char*" in a conforming implementation. With
    (gcc|clang) -Wwrite-strings, it prints "const char*".

    Good point. I hadn't considered such cases.

    And something as simple as:

    char *p = "hello";

    is rejected with a fatal error with "-Wwrite-strings -pedantic-errors".

    That violates the "As long as the -Wwrite-strings diagnostics are
    only warnings" condition.

    Indeed.

    I personally think it is nice to have an option to make string literals
    "const" in C, even though it is non-conforming. I also think it is very
    useful to have a warning on attempts to write to string literals. But I
    think gcc has made a mistake here by conflating the two. I'd rather see
    the warning being enabled by default (or at least in -Wall), while the
    "make string literals const" option should require an explicit flag and
    be a "-f" flag rather than a "-W" flag. The current situation seems to
    be a quick-and-dirty way to get the warning.

    Other people may have different opinions, of course :-)

  • From Vir Campestris@21:1/5 to Tim Rentsch on Tue Aug 13 15:34:19 2024
    On 12/08/2024 22:11, Tim Rentsch wrote:
    Keith Thompson <Keith.S.Thompson+u@gmail.com> writes:

    [...]

    A string literal creates an array object with static storage
    duration. [...]

    A small quibble. Every string literal does sit in an array,
    but it might not be a _new_ array, because different string
    literals are allowed to overlap as long as the bytes in the
    overlapping arrays have the right values.

    And this is exactly why string literals should always have been const.

    A compiler is entitled to share memory between strings, so given

    puts("lap");
    puts("overlap");

    it's entitled to make them overlap. Then add

    char * p = "lap";
    *p='X';

    and it can overwrite the shared string, I think, which would mean that
    writing "lap" again would have a different result.

    But that ship has sailed. I'm not even sure const had been invented that
    far back!

    Andy
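
    A small sketch of how a program could observe that sharing, assuming
    nothing about any particular compiler (literals_overlap and
    tails_match are hypothetical helpers). Whether the arrays are distinct
    is unspecified, so either answer from the first function is allowed.

```c
#include <string.h>

/* Returns 1 if this implementation stored "lap" as the tail of
   "overlap", 0 if the two literals live in separate arrays. */
static int literals_overlap(void) {
    const char *lap = "lap";
    const char *overlap = "overlap";
    return lap == overlap + 4;
}

/* Always 1: the tail of "overlap" has the same bytes as "lap", which is
   the condition that makes sharing permissible in the first place. */
static int tails_match(void) {
    return strcmp(&"overlap"[4], "lap") == 0;
}
```

    Code that writes through a char * to either literal has undefined
    behavior regardless of which layout the implementation chose.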

  • From Tim Rentsch@21:1/5 to Vir Campestris on Tue Aug 13 17:40:24 2024
    Vir Campestris <vir.campestris@invalid.invalid> writes:

    On 12/08/2024 22:11, Tim Rentsch wrote:

    Keith Thompson <Keith.S.Thompson+u@gmail.com> writes:

    [...]

    A string literal creates an array object with static storage
    duration. [...]

    A small quibble. Every string literal does sit in an array,
    but it might not be a _new_ array, because different string
    literals are allowed to overlap as long as the bytes in the
    overlapping arrays have the right values.

    And this is exactly why string literals should always have been
    const.

    The people who wrote the C standard reached a different
    conclusion, and IMO the right one.

    A compiler is entitled to share memory between strings, so given

    puts("lap");
    puts("overlap");

    it's entitled to make them overlap. Then add

    char * p = "lap";
    *p='X';

    and it can overwrite the shared string, I think, which would
    mean that writing "lap" again would have a different result.

    A C implementation is also allowed to put every string literal
    in its own separate array object, not shared even when two
    or more string literals are identical, and make them writable
    so they can be modified without problems. I believe some C
    compilers actually did this, perhaps under the control of a
    compilation option.

    But that ship has sailed. I'm not even sure const had been
    invented that far back!

    C was already well established before 'const' was invented, and it
    was a number of years after that before some C compilers started
    allowing 'const' in source code. The cost of not being backward
    compatible would be high; the cost of adding const incrementally in
    new code is low. Generally speaking using string literals in open
    code is a bad idea anyway, regardless whether there is any concern
    that the string might be modified. I think most people who want
    string literals to be of type const char[] are only thinking about
    one side of the equation. It's always important to remember to
    look at both sides of the cost/benefit forces, and not focus on
    just the (imagined) benefits or (imagined) downsides.

  • From Tim Rentsch@21:1/5 to Kaz Kylheku on Tue Aug 13 17:43:09 2024
    Kaz Kylheku <643-408-1753@kylheku.com> writes:

    On 2024-08-01, Mark Summerfield <mark@qtrac.eu> wrote:

    This program segfaults at the commented line:

    #include <ctype.h>
    #include <stdio.h>

    void uppercase_ascii(char *s) {
    while (*s) {
    *s = toupper(*s); // SEGFAULT
    s++;
    }
    }

    int main() {
    char* text = "this is a test";

    The "this is a test" object is a literal. It is part of the
    program's image. When you try to change it, you're making your
    program self-modifying.

    The ISO C language standard doesn't require implementations to
    support self-modifying programs; the behavior is left undefined.

    It could work in some documented, reliable way, in a given
    implementation.

    It's the same with any other constant in the program. [...]

    That is wrong both technically and practically. And obviously
    so.

  • From Tim Rentsch@21:1/5 to James Kuyper on Tue Aug 13 17:46:05 2024
    James Kuyper <jameskuyper@alumni.caltech.edu> writes:

    Just as 1 is an integer literal whose value cannot be modified,
    [...]

    The C language doesn't have integer literals. C has string
    literals, and compound literals, and it has integer constants.
    But C does not have integer literals.

  • From Tim Rentsch@21:1/5 to Keith Thompson on Tue Aug 13 17:41:16 2024
    Keith Thompson <Keith.S.Thompson+u@gmail.com> writes:

    In 20/20 hindsight, my personal opinion is that it would have been
    better to make string literals const in C89/C90.

    Fortunately wiser heads prevailed.

  • From Kaz Kylheku@21:1/5 to Keith Thompson on Wed Aug 14 03:16:26 2024
    On 2024-08-14, Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:
    I can't speak for most people, but I want string literals to be const
    and I've thought about both sides of the equation. (Existing code could
    be compiled with options to enable the old behavior and could be changed
    incrementally.)

    C++ has had const string literals since its first standard, C++98
    (the deprecated conversion to char * was removed in C++11).

    That makes it much easier to be in favor of the change; it not
    only helps prevent bugs, but improves C and C++ compatibility.

    When programmers write string manipulating functions, they tend
    to test them with string literal arguments. When string literals
    are const, that encourages the programmers to make arguments
    const whenever they can be which tends to improve the functions.

    I work in C codebases that are also compiled as C++, so const
    string literals are second nature. It's old hat by now.

    Also, <string.h> could have type generic functions where it
    makes sense to support both const char * and char *.

    E.g. strchr could return const char * if the
    parameter is const char *, and char * when the parameter is char *.
    The one function we have now strips the qualifier, which is bad;
    when you find a character in a const string, you get a non-const
    pointer to it.
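
    A sketch of what such a qualifier-preserving search could look like
    with C11 _Generic. The names find_char, find_const and find_mut are
    hypothetical and this is not how any particular <string.h> does it;
    it only illustrates selecting the return type from the argument type.

```c
#include <string.h>

/* Const in, const out. */
static const char *find_const(const char *s, int c) {
    return strchr(s, c);
}

/* Mutable in, mutable out. */
static char *find_mut(char *s, int c) {
    return strchr(s, c);
}

/* Pick the helper from the (decayed) type of the first argument. */
#define find_char(s, c) \
    _Generic((s), const char *: find_const, char *: find_mut)((s), (c))
```

    With this, searching a const char * yields a const char *, so the
    result cannot be written through without an explicit cast, while a
    search in a plain char array still yields a writable pointer.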

    --
    TXR Programming Language: http://nongnu.org/txr
    Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
    Mastodon: @Kazinator@mstdn.ca

  • From David Brown@21:1/5 to Keith Thompson on Wed Aug 14 10:40:05 2024
    On 13/08/2024 22:08, Keith Thompson wrote:

    In 20/20 hindsight, my personal opinion is that it would have been
    better to make string literals const in C89/C90. Compilers could
    still accept old const-incorrect code with a non-fatal warning,
    and programmers would be encouraged but not immediately forced to
    use const.


    Agreed.

    That's basically what happened when C++ was designed.

    This could still be done in C2y, but I'm not aware of any proposals.


    There is always going to be some hassle with things like search
    functions - 100% const correctness is not easy when you don't have
    overloads. (It's not always easy even in C++ where you /do/ have
    overloads and templates.)

  • From James Kuyper@21:1/5 to Tim Rentsch on Wed Aug 14 10:33:03 2024
    Tim Rentsch <tr.17687@z991.linuxsc.com> writes:
    James Kuyper <jameskuyper@alumni.caltech.edu> writes:

    Just as 1 is an integer literal whose value cannot be modified,
    [...]

    The C language doesn't have integer literals. C has string
    literals, and compound literals, and it has integer constants.
    But C does not have integer literals.

    True, but C++ does, and it means the same thing by "integer literal"
    that C means by "integer constant". C doesn't define the term "integer
    literal" with any conflicting meaning, and my use of the C++ terminology
    allowed me to make the parallel with string literals clearer, so I don't
    see any particular problem with my choice of words.

  • From Tim Rentsch@21:1/5 to Keith Thompson on Thu Aug 15 16:00:35 2024
    Keith Thompson <Keith.S.Thompson+u@gmail.com> writes:

    Tim Rentsch <tr.17687@z991.linuxsc.com> writes:

    James Kuyper <jameskuyper@alumni.caltech.edu> writes:

    Just as 1 is an integer literal whose value cannot be modified,
    [...]

    The C language doesn't have integer literals. C has string
    literals, and compound literals, and it has integer constants.
    But C does not have integer literals.

    Technically correct (but IMHO not really worth worrying about).

    Anyone who flogs other posters for incorrectly using terminology
    defined in the ISO C standard should set a good example by using
    the ISO-C-defined terms correctly himself.

    There is a proposal for C2y, authored by Jens Gustedt, to change the
    term "constant" to "literal" for character, integer, and floating
    constants. (I think it's a good idea.)

    <https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3239.htm>

    The more C is changed to resemble C++ the worse it becomes. It
    isn't surprising that you like it.

  • From Tim Rentsch@21:1/5 to James Kuyper on Thu Aug 15 16:05:11 2024
    James Kuyper <jameskuyper@alumni.caltech.edu> writes:

    Tim Rentsch <tr.17687@z991.linuxsc.com> writes:

    James Kuyper <jameskuyper@alumni.caltech.edu> writes:

    Just as 1 is an integer literal whose value cannot be modified,
    [...]

    The C language doesn't have integer literals. C has string
    literals, and compound literals, and it has integer constants.
    But C does not have integer literals.

    True, but C++ does, and it means the same thing by "integer literal"
    that C means by "integer constant".

    This is comp.lang.c, not comp.lang.c++. You flog Bart for using
    C-standard-defined terms wrongly. This case is no different.

    C doesn't define the term "integer
    literal" with any conflicting meaning, and my use of the C++ terminology
    allowed me to make the parallel with string literals clearer, so I don't
    see any particular problem with my choice of words.

    In this case you are in the wrong. Just be a man and admit it. Oh, I
    forgot, your rhetorical religion doesn't allow you to admit any
    linguistic imperfection, so you try to sleaze your way to a different
    subject so you can continue to argue.

  • From dave_thompson_2@comcast.net@21:1/5 to All on Sun Aug 25 16:52:15 2024
    On Fri, 2 Aug 2024 13:04:55 +0100, Richard Harnden <richard.nospam@gmail.invalid> wrote:

    [string literals not typed const in C even though writing prohibited]

    Is there any reason not to always write ...

    static const char *s = "hello, world";

    ... ?

    You get all the warnings for free that way.

    But sizeof s is 8 or 4 regardless of the string, while sizeof "some
    string" is the length of the string plus 1 (for the null terminator).

    static const char s[] = "hello, world";
    // autosized by initializer

    would be a better replacement, or in C99+ if at file scope

    (const char[]){"hello, world"}
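
    The sizeof difference can be sketched as follows (s_ptr, s_arr and
    the two helper functions are hypothetical names used only for this
    illustration):

```c
#include <stddef.h>

static const char *s_ptr = "hello, world";   /* pointer to the literal */
static const char s_arr[] = "hello, world";  /* array sized by initializer */

/* Size of the pointer object itself, independent of the string. */
static size_t ptr_size(void) {
    return sizeof s_ptr;
}

/* 12 characters plus the terminating null character. */
static size_t arr_size(void) {
    return sizeof s_arr;
}
```

    So code that relies on sizeof to get the string's storage size keeps
    working with the array form but silently changes meaning with the
    pointer form.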

  • From Tim Rentsch@21:1/5 to Keith Thompson on Tue Sep 3 06:11:52 2024
    Keith Thompson <Keith.S.Thompson+u@gmail.com> writes:

    Tim Rentsch <tr.17687@z991.linuxsc.com> writes:

    Keith Thompson <Keith.S.Thompson+u@gmail.com> writes:

    Richard Harnden <richard.nospam@gmail.invalid> writes:
    [...]

    Is there any reason not to always write ...

    static const char *s = "hello, world";

    ... ?

    You get all the warnings for free that way.

    The "static", if this is at block scope, specifies that the
    pointer object, not the array object, has static storage duration.
    If it's at file scope it specifies that the name "s" is not
    visible to other translation units. Either way, use it if that's
    what you want, don't use it if it isn't.

    There's no good reason not to use "const". [...]

    Other people have different opinions on that question.

    You could have told us your opinion. You could have explained why
    someone might have a different opinion. You could have given us a
    good reason not to use "const", assuming there is such a reason.
    You know the language well enough to make me suspect you might
    have something specific in mind. [...]

    I said all that I thought needed saying. I see no reason
    to add to it.

  • From Phillip Frabott@21:1/5 to Janis Papanagnou on Sat Sep 28 17:57:56 2024
    In reply to "Janis Papanagnou" who wrote the following:

    On 28.09.2024 05:34, Keith Thompson wrote:
    Tim Rentsch <tr.17687@z991.linuxsc.com> writes:
    Keith Thompson <Keith.S.Thompson+u@gmail.com> writes:
    The more C is changed to resemble C++ the worse it becomes. It
    isn't surprising that you like it.

    For context, since the parent article is from a month and a half
    ago, I was discussing a proposal to change a future C standard to
    refer to "constants" as "literals". I mentioned that I think it's
    a good idea.

    I've heard of and seen various forms to name such entities...
    - in a Pascal and an Eiffel book I find all these named "constants"
    - in an Algol 68 book I read about "standard designations"
    - in a book about languages and programming in general I find
    "literals" ("abc"), "numerals" (42), "word-symbols" (false),
    "graphemes" (), etc., differentiated
    - I've also heard about "standard representations [for the
    values of a respective type]"; also a type-independent term

    I also think (for various reasons) that "constants" is not a good
    term. (Personally I like terms like the Algol 68 term, that seems
    to "operate" on another [more conceptual] abstraction level.)

    But you'll certainly have to expect a lot of anger if the terminology
    of some standards documents gets changed from one version to another.

    Janis


    The only gripe I would have if we synonymized constants and literals is
    that not every const is initialized with a literal. There have been times
    where I have initialized a const from the value of a variable. I don't
    think that const and literals are the same thing because of this.

    To me a const is permanently set at initialization, which can happen at
    runtime, while a literal is a hardcoded value that gets set at compile
    time.

    There are cases where it does in fact matter, especially when a const is
    not initialized with a literal but with a variable. It can also make a
    bigger difference when someone actually needs to know whether something is
    being set at compile time or at runtime. It can have a huge impact,
    especially in edge cases.

    But that's just my 2 cents in the mix.

    Have a good one!

    Phillip Frabott
    {Adam: Is a void really a void if it returns? - Jack: No, it's just nullspace at
    that point.}

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Phillip Frabott@21:1/5 to Keith Thompson on Sat Sep 28 22:05:30 2024
    In reply to "Keith Thompson" who wrote the following:

    Phillip Frabott <nntp@fulltermprivacy.com> writes:
    In reply to "Janis Papanagnou" who wrote the following:
    [...]
    I also think (for various reasons) that "constants" is not a good
    term. (Personally I like terms like the Algol 68 term, that seems
    to "operate" on another [more conceptual] abstraction level.)

    But you'll certainly have to expect a lot of anger if the terminology
    of some standards documents gets changed from one version to another.

    The only gripe I would have if we synonymized constants and literals
    is that not every const is initialized with a literal. There have been times where I have initialized a const from the value of a variable. I don't think that const and literals are the same thing because of
    this.

    Though the word "const" is obviously derived from the English word "constant", in C "const" and "constant" are very different things.

    The "const" keyword really means "read-only" (and perhaps would have
    been clearer if it had been spelled "readonly").
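
    To illustrate the "read-only" reading, here is a sketch with
    hypothetical names (sum_readonly, sum_before_and_after), not code from
    the thread. A pointer-to-const parameter only promises that the
    function will not write through that pointer; the object it points to
    can still change by other means, so it is not a "constant".

```c
#include <stddef.h>

/* Hypothetical helper: sums n ints through a pointer-to-const. */
int sum_readonly(const int *values, size_t n) {
    int total = 0;
    for (size_t i = 0; i < n; i++) {
        total += values[i];
        /* values[i] = 0;   would not compile: values[i] is read-only here */
    }
    return total;
}

/* The caller's array is not const, so it can still change between calls. */
int sum_before_and_after(void) {
    int data[3] = {1, 2, 3};
    int before = sum_readonly(data, 3);   /* 1 + 2 + 3 */
    data[0] = 10;                         /* allowed: data itself is writable */
    int after = sum_readonly(data, 3);    /* 10 + 2 + 3 */
    return after - before;
}
```

    Here sum_before_and_after() returns 9: the same "read-only" view saw
    two different values.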

    In the context of C I agree. Although I would point out that for some
    languages const and readonly are two completely different things. (Just
    a brief remark, but I'll get back on topic now.)

    A "constant" is what some languages call a "literal", and a "constant expression" is an expression that can be evaluated at compile time.

    For example, this:

    const int r = rand();

    is perfectly valid.
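
    A sketch of where the difference shows up in practice, using
    hypothetical names (classify, LIMIT) that are not from the thread. A
    case label requires a constant expression, so an enumeration constant
    or an integer constant works there, while a const object initialized
    from rand() does not.

```c
#include <stdlib.h>

enum { LIMIT = 15 };            /* LIMIT is a constant expression */

/* Hypothetical helper showing where a constant expression is required. */
int classify(int x) {
    const int r = rand();       /* valid: read-only object set at run time */
    (void)r;                    /* r is otherwise unused in this sketch */

    switch (x) {
    case LIMIT:                 /* fine: enumeration constant */
        return 1;
    case 42:                    /* fine: integer constant (a "literal") */
        return 2;
    /* case r:   would not compile in C: r is not a constant expression */
    default:
        return 0;
    }
}
```

    With these definitions, classify(15) returns 1, classify(42) returns
    2, and any other argument returns 0.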

    Maybe the expression can be determined/evaluated at compile time, but
    not the result. When I think of literals, the resulting value has to be
    determined at compile time. So const int r = 15; would, to me, be a
    literal result: the compiler can bake it in without needing any further
    runtime execution to get the result. But a const can be either literal
    or non-literal in my view. Anything that cannot give a predetermined
    value at compile time is a const. So to me:

    const int r = rand();

    is not a literal only because the output of rand() is unknown until runtime. From a human-readable code perspective I get it. And fine, there can be a similarity between const and literal on the surface. But the moment you need to know exactly what the compiler is doing, those two things have to be separate.

    Perhaps there is a better way to do it. Or maybe there can be a literal
    type that is basically equal to the const type for the purpose of
    coding, or even perhaps a [--treat-const-as-literal] compiler parameter
    for code where a literal value and a const value should be treated the
    same. But I still think these two should be treated differently.

    I should note I don't have the original posting for this thread (I guess my provider doesn't have it) so I don't have the original URI that started this thread. If someone can share it in a reply I'd really appreciate it so I can be sure I'm on the same page with what is being discussed.
    Phillip Frabott
    {Adam: Is a void really a void if it returns? - Jack: No, it's just nullspace at
    that point.}



    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Tim Rentsch@21:1/5 to Keith Thompson on Fri Sep 27 17:33:56 2024
    Keith Thompson <Keith.S.Thompson+u@gmail.com> writes:

    The more C is changed to resemble C++ the worse it becomes. It
    isn't surprising that you like it.

    I presume that was intended as a personal insult.

    It wasn't.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Janis Papanagnou@21:1/5 to Keith Thompson on Sat Sep 28 07:22:01 2024
    On 28.09.2024 05:34, Keith Thompson wrote:
    Tim Rentsch <tr.17687@z991.linuxsc.com> writes:
    Keith Thompson <Keith.S.Thompson+u@gmail.com> writes:
    The more C is changed to resemble C++ the worse it becomes. It
    isn't surprising that you like it.

    For context, since the parent article is from a month and a half
    ago, I was discussing a proposal to change a future C standard to
    refer to "constants" as "literals". I mentioned that I think it's
    a good idea.

    I've heard of and seen various forms to name such entities...
    - in a Pascal and an Eiffel book I find all these named "constants"
    - in an Algol 68 book I read about "standard designations"
    - in a book about languages and programming in general I find
    "literals" ("abc"), "numerals" (42), "word-symbols" (false),
    "graphemes" (), etc., differentiated
    - I've also heard about "standard representations [for the
    values of a respective type]"; also a type-independent term

    I also think (for various reasons) that "constants" is not a good
    term. (Personally I like terms like the Algol 68 term, that seems
    to "operate" on another [more conceptual] abstraction level.)

    But you'll certainly have to expect a lot of anger if the terminology
    of some standards documents gets changed from one version to another.

    Janis

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)