• Re: tdom encoding

    From greg@21:1/5 to All on Tue Dec 17 03:13:14 2024
    On 17.12.24 at 01:01, saito wrote:
    I am trying to see why tdom is failing on this json snippet.

    package req tdom
    set x {{"name":"Jeremi^G"}}
    dom parse -json $x

    error "JSON syntax error" at position 15
    "{"name":"Jeremi^G <--Error-- "}"


    If it doesn't get removed by the newsgroup editors, there is a weird character at the end of the name in x, right after "Jeremi". It looks almost like "[]" but it is
    not. When you edit it, it behaves as if it consisted of several characters.


    Another problem is that the tdom man page talks about a command "dom setResultEncoding ?encodingName?", but trying it results in an unknown
    command error.

    Hello,

    The unknown character is 0x07, i.e. BEL (the ASCII bell).
    It is probably not allowed as a raw character in a JSON string.
    Use the escape sequence \u0007 instead.

    Gregor


    package require tdom

    # Return the numeric code point of a single character
    # (despite the name, this works like ord(), not chr()).
    proc chr c {
        if {[string length $c] > 1} {
            error "chr: arg should be a single char"
        }
        set v 0
        scan $c %c v
        return $v
    }

    # Classify a character using Tcl's [string is] character classes
    proc charInfo char {
        if {[string is control $char]} {
            return "control character"
        } elseif {[string is space $char]} {
            return "space character"
        } elseif {[string is digit $char]} {
            return "digit character"
        } elseif {[string is lower $char]} {
            return "lowercase alphabetic character"
        } elseif {[string is upper $char]} {
            return "uppercase alphabetic character"
        } elseif {[string is punct $char]} {
            return "punctuation character"
        } elseif {[string is graph $char]} {
            return "graphical character"
        } elseif {[string is print $char]} {
            return "printable character"
        } else {
            return "unknown character type"
        }
    }

    # Print each character of x with its index, class and code point
    proc infochar {x} {
        puts $x
        set i 0
        while {$i < [string length $x]} {
            set c [string index $x $i]
            puts "$i is $c [charInfo $c] [chr $c]"
            incr i
        }
    }

    # The problem input: a raw BEL (U+0007) inside the JSON string.
    # Double quotes make Tcl turn \u0007 into the actual control
    # character (the raw byte itself does not survive the newsgroup).
    set x "{\"name\":\"Jeremi\u0007\"}"
    infochar $x
    catch {dom parse -json $x} mess
    puts "mess: $mess"

    # Braces keep \u0007 as the six-character JSON escape sequence,
    # which is valid JSON; tDOM decodes it to BEL while parsing.
    set x {{"name":"Jeremi\u0007"}}
    set doc [dom parse -json $x]
    puts [$doc asXML]
    $doc delete

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Rich@21:1/5 to saito on Tue Dec 17 04:20:54 2024
    saito <saitology9@gmail.com> wrote:
    I am trying to see why tdom is failing on this json snippet.

    package req tdom
    set x {{"name":"Jeremi^G"}}
    dom parse -json $x

    error "JSON syntax error" at position 15
    "{"name":"Jeremi^G <--Error-- "}"

    Assuming the ^G that did come through properly represents the
    character, then greg is right: it is an ASCII bell character, and per
    the JSON spec [1] raw control characters are not allowed to be part of
    a JSON string.

    Which is why tDOM reports the error right at the ^G.
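To spot such characters before handing the text to tDOM, a small check is enough. The sketch below is only an illustration: `findJsonControlChars` is a hypothetical helper, not part of Tcl or tDOM. It relies on the JSON grammar: U+0000 to U+001F may never appear raw inside a string, and between tokens only tab, LF and CR are allowed as whitespace. Everything else in that range is therefore an error wherever it occurs.

```tcl
# findJsonControlChars: hypothetical helper, not a Tcl or tDOM command.
# Returns a list of {index U+XXXX} pairs for control characters that are
# invalid anywhere in JSON text. Tab, LF and CR are skipped because they
# are legal whitespace between tokens; they are also invalid *inside*
# strings, but this simple check does not track string state.
proc findJsonControlChars {json} {
    set hits {}
    foreach pair [regexp -all -inline -indices {[\x00-\x08\x0b\x0c\x0e-\x1f]} $json] {
        set pos [lindex $pair 0]
        scan [string index $json $pos] %c code
        lappend hits [list $pos [format U+%04X $code]]
    }
    return $hits
}

# Double quotes: Tcl turns \u0007 into a raw BEL, as in the failing input
set x "{\"name\":\"Jeremi\u0007\"}"
puts [findJsonControlChars $x]   ;# prints: {15 U+0007}
```

The reported index 15 matches the position in tDOM's "JSON syntax error at position 15" message.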

    Are you on Linux? If so, the hexdump, od, or xxd (xxd is easiest
    to use) commands will show you exactly which raw byte values exist in
    the file.
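If the data is already in a Tcl variable (for example, the body of the API response), a rough in-process substitute for xxd can be built from `encoding convertto` and `binary scan`. This is a sketch; `hexdump` here is a hypothetical helper name, not a Tcl built-in, and it assumes you want to see the UTF-8 bytes of the string.

```tcl
# hexdump: hypothetical helper, not a Tcl built-in.
# Shows the UTF-8 bytes of a Tcl string, 16 per row: offset, hex pairs,
# and an ASCII column where non-printable bytes are shown as ".".
proc hexdump {str} {
    set bytes [encoding convertto utf-8 $str]
    set rows {}
    for {set off 0} {$off < [string length $bytes]} {incr off 16} {
        set chunk [string range $bytes $off [expr {$off + 15}]]
        binary scan $chunk H* hex
        # "7b226e" -> "7b 22 6e "
        set pairs [regsub -all {..} $hex {& }]
        # Replace anything outside printable ASCII by "."
        set ascii [regsub -all {[^\x20-\x7e]} $chunk .]
        lappend rows [format "%08x  %-48s %s" $off $pairs $ascii]
    }
    return [join $rows \n]
}

puts [hexdump "{\"name\":\"Jeremi\u0007\"}"]
```

For the input above, the BEL shows up as byte `07` at offset 0x0f, right after `69` (the "i" of "Jeremi").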


    [1] https://www.json.org/json-en.html

  • From Rich@21:1/5 to saito on Tue Dec 17 04:59:22 2024
    saito <saitology9@gmail.com> wrote:
    On 12/16/2024 9:13 PM, greg wrote:

    Hello,

    The unknown character is 0x07, i.e. BEL (the ASCII bell).
    It is probably not allowed as a raw character in a JSON string.
    Use the escape sequence \u0007 instead.

    Gregor


    Thank you and Rich for the wonderful info and the code.

    The JSON data is what I receive from an API. I first thought it had
    to do with encoding issues. It happens frequently, so maybe I will
    ask them to be more careful with their JSON generation.

    If you are getting it from an API then you've found a bug if the API
    is /really/ sending raw control characters as part of a JSON string.
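The real fix belongs on the API side. As a stop-gap until then, one could rewrite raw control characters that occur inside JSON string literals as `\u00XX` escapes before parsing. The sketch below is an assumption-laden illustration, not a tDOM feature: `escapeJsonControlChars` is a hypothetical name, it tracks only quotes and backslash escapes, and it leaves whitespace between tokens untouched. It also silently changes the data you received, which may hide the upstream bug.

```tcl
# escapeJsonControlChars: hypothetical stop-gap helper.
# Walks the text character by character, tracking whether we are inside
# a string literal (and whether the previous character was a backslash),
# and replaces raw U+0000..U+001F inside strings with \u00XX escapes.
proc escapeJsonControlChars {json} {
    set out ""
    set inString 0
    set escaped 0
    foreach c [split $json ""] {
        if {$inString} {
            if {$escaped} {
                set escaped 0
            } elseif {$c eq "\\"} {
                set escaped 1
            } elseif {$c eq "\""} {
                set inString 0
            } else {
                scan $c %c code
                if {$code < 0x20} {
                    append out [format {\u%04x} $code]
                    continue
                }
            }
        } elseif {$c eq "\""} {
            set inString 1
        }
        append out $c
    }
    return $out
}

set x "{\"name\":\"Jeremi\u0007\"}"
set fixed [escapeJsonControlChars $x]
puts $fixed   ;# prints: {"name":"Jeremi\u0007"}

# If tDOM is available, the repaired text should now parse
if {![catch {package require tdom}]} {
    set doc [dom parse -json $fixed]
    puts "parsed OK"
    $doc delete
}
```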

  • From Rolf Ade@21:1/5 to saito on Wed Dec 18 15:04:07 2024
    saito <saitology9@gmail.com> writes:
    I am trying to see why tdom is failing on this json snippet.

    package req tdom
    set x {{"name":"Jeremi^G"}}
    dom parse -json $x

    error "JSON syntax error" at position 15
    "{"name":"Jeremi^G <--Error-- "}"

    Rich already rightly pointed out that control characters are not allowed literally in JSON strings. As tDOM rightly complains, your input is not
    JSON.

    [snip]
    Another problem is that the tdom man page talks about a command "dom setResultEncoding ?encodingName?", but trying it results in an unknown
    command error.

    You are obviously using a (very) old tDOM version. The dom method
    setResultEncoding is a relic from the time when tDOM still supported
    Tcl 8.0 (and the functionality was only needed/useful if built/used
    with Tcl 8.0).

    Both the documentation and the implementation of this method were
    removed with tDOM 0.9.1 (more than six years ago). The most recent
    version is 0.9.5.
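Since `package require` returns the version it loaded, it is easy to check which tDOM you actually have before trusting a web page. The sketch below encodes the cut-off stated above (removed in 0.9.1); `hasSetResultEncoding` is a hypothetical helper name, and the comparison uses Tcl's standard `package vcompare`.

```tcl
# hasSetResultEncoding: hypothetical helper. Per the note above,
# "dom setResultEncoding" was removed in tDOM 0.9.1, so only versions
# strictly older than 0.9.1 should still have it.
proc hasSetResultEncoding {version} {
    expr {[package vcompare $version 0.9.1] < 0}
}

if {![catch {package require tdom} v]} {
    puts "tDOM $v loaded; setResultEncoding expected: [hasSetResultEncoding $v]"
}
```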

    rolf

  • From Harald Oehlmann@21:1/5 to All on Wed Dec 18 21:49:14 2024
    On 18.12.2024 at 20:57, saito wrote:
    Thanks for the info. I am using version 0.9.5, which I downloaded from
    its official site some time ago. It comes with no documentation, so I
    did an internet search. That piece of info is obviously from an
    outdated web page, which I kind of guessed.

    http://tdom.org/index.html/doc/trunk/doc/index.html
