• PIM/ISO vs R10 number literals

    From trijezdci@21:1/5 to Martin Brown on Tue Aug 30 03:38:33 2016
    On Tuesday, 30 August 2016 18:22:01 UTC+9, Martin Brown wrote:

    I'd hope there is a pragma to permit old style constants or you
    immediately orphan all pre-existing Modula 2 code by requiring rework
    just to try it out with the new compiler.

    I am assuming you mean "number literals" when you say "old style constants".

    We decided to abandon suffixed literals because they violate the very design philosophy Wirth has promoted in all of his compiler design texts, namely that design should follow (a) human readability and (b) single input symbol lookahead.

    Prefixed literals are more readable to humans and machines alike.

    Instead of providing a back door to use suffixed literals, we will provide a source-to-source translator. This can then be used to replace all suffixed literals in an input source file with their prefixed equivalents in the output file.


    Constant delimiters: 123'456'789. Is there any reason why the separator can't be ","? (Granted, some cultures use "," for "." in FP numbers, but M2 doesn't.)

    You would need to use whitespace to differentiate between:

    numbers with digit separators

    123,345,789

    and comma-separated lists of numbers

    123, 345, 789


    This would increase opportunity for error.

    Also, it would decrease readability when you have comma-separated lists of numbers with digit separators in them:

    123,456,789, 321,654,987, ...

    Apart from that, it requires two-character lookahead when lexing numbers. Granted, we already have to use two-character lookahead when encountering a "." in a number to distinguish a decimal point "." from "..", but this shouldn't be considered a get-out-of-jail-free card for more cases needing such disambiguation.

    Other languages that provide digit separators often use "_" as separator, for example Ada uses it. We toyed with that but found it looks less "natural" than using apostrophe as separator.

    123_456_789

    vs

    123'456'789

    I hope this makes sense.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Marco van de Voort@21:1/5 to trijezdci on Tue Aug 30 12:17:05 2016
    On 2016-08-30, trijezdci <trijezdci@gmail.com> wrote:
    I'd hope there is a pragma to permit old style constants or you
    immediately orphan all pre-existing Modula 2 code by requiring rework
    just to try it out with the new compiler.

    I am assuming you mean "number literals" when you say "old style constants".

    We decided to abandon suffixed literals because they violate the very
    design philosophy Wirth has promoted in all of his compiler design texts, namely that design should follow (a) human readability and (b) single
    input symbol lookahead.

    All literals are one symbol (done by the tokenizer/scanner, not the parser), so IMHO this is a bogus argument. And I don't think readability is better or worse.

    So basically that leaves Cification as the unmentioned but clear driving force. Or at best new/changed syntax as a goal in itself.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From trijezdci@21:1/5 to Marco van de Voort on Tue Aug 30 06:31:34 2016
    On Tuesday, 30 August 2016 21:17:06 UTC+9, Marco van de Voort wrote:

    We decided to abandon suffixed literals because they violate the very design philosophy Wirth has promoted in all of his compiler design texts, namely that design should follow (a) human readability and (b) single
    input symbol lookahead.

    All literals are one symbol (done by tokenizer/scanner, not parser),

    Scientifically speaking there is no difference between lexing and parsing. The distinction is entirely arbitrary. It is merely made for the convenience of the implementor. Some parsing methods don't make the distinction. See Dick Grune's seminal work on
    parsing techniques for a more detailed discussion.


    And I don't think readability is better or worse.

    Readability, or at least some effects of it (or lack thereof) can be measured.

    Suffix notation causes more eye movement, which increases mental load, which is an indicator of lower readability. The effect is not noticeable with very short literals such as 040H, but it increases significantly with literal length.

    Consequently this wasn't a serious issue in an era when all hardware was limited to an address space that could be encoded in four-digit base-8 numbers.

    However, in this day and age we are dealing with significantly longer literals where it is an issue and we addressed this by switching to the superior prefix notation of C.

    There are a very few occasions where Kernighan and Ritchie simply had a better idea than Wirth. This is one of them. We can choose to be mature and acknowledge when somebody else simply had the better idea even if we don't like much else about that
    somebody's work. Or we can choose to be dismissive against evidence. The latter will only hurt ourselves.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Marco van de Voort@21:1/5 to trijezdci on Tue Aug 30 15:16:40 2016
    On 2016-08-30, trijezdci <trijezdci@gmail.com> wrote:
    namely that design should follow (a) human readability and (b) single
    input symbol lookahead.

    All literals are one symbol (done by tokenizer/scanner, not parser),

    Scientifically speaking there is no difference between lexing and parsing. The distinction is entirely arbitrary. It is merely made for the
    convenience of the implementor. Some parsing methods don't make the distinction. See Dick Grune's seminal work on parsing techniques for a
    more detailed discussion.

    Regardless of wordplay, was Wirth's remark actually made in such a wider context?

    And I don't think readability is better or worse.

    Readability, or at least some effects of it (or lack thereof) can be measured.


    Suffix notation causes more eye movement which increases mental load which
    is an indicator for lesser readability. The effect is not noticeable with very short literals such as 040H, but it increases significantly with
    literal length.

    Afaik that only counts for untrained people. In trained people the shift
    from characterwise to more wordwise reading compensates.

    This is why natural languages also have suffixes.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From trijezdci@21:1/5 to Marco van de Voort on Tue Aug 30 09:37:58 2016
    On Wednesday, 31 August 2016 00:16:41 UTC+9, Marco van de Voort wrote:

    Scientifically speaking there is no difference between lexing and parsing. The distinction is entirely arbitrary. It is merely made for the convenience of the implementor. Some parsing methods don't make the distinction. See Dick Grune's seminal work on parsing techniques for a more detailed discussion.

    Regardless of wordplay, was Wirth's remark actually in such wider context?

    I do not accept the premise that there is a fundamental difference between lexing and parsing. As Dick Grune's work will attest, I am in good company.

    Under this premise there is then no such thing as a narrower or wider context. There is only one context and that is the context of symbol stream processing.

    What justification would there be that a sound principle of efficient symbol stream processing (read one symbol ahead at a time to decide how to proceed without having to backtrack) is worthwhile adhering to when processing a symbol stream whose tokens represent lexemes of length > 1, but not worthwhile adhering to when processing a symbol stream whose tokens represent themselves and are lexemes of length = 1?


    Suffix notation causes more eye movement which increases mental load which is an indicator for lesser readability. The effect is not noticeable with very short literals such as 040H, but it increases significantly with literal length.

    Afaik that only counts for untrained people. In trained people the shift
    from characterwise to more wordwise reading compensates.

    Like I said, the effect increases with length. Eye movement studies with texts in languages with very long words such as Cymraeg (aka Welsh), Finnish and German have shown that the effect remains even for proficient readers.

    This is why natural languages also have suffixes.

    And the vast majority of suffixed use cases in natural languages are short words. Aptly, briefly, candidly, distantly, easily, fairly, grimly, highly, etc etc etc. Very long words are generally nouns and they are used without suffixes.
    Fussbodenschleifmaschinenverleih, not fussbodenschleifmaschinenverleihlich.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Marco van de Voort@21:1/5 to trijezdci on Wed Aug 31 08:29:55 2016
    On 2016-08-30, trijezdci <trijezdci@gmail.com> wrote:
    convenience of the implementor. Some parsing methods don't make the
    distinction. See Dick Grune's seminal work on parsing techniques for a
    more detailed discussion.

    Regardless of wordplay, was Wirth's remark actually in such wider context?

    I do not accept the premise that there is a fundamental difference between lexing and parsing. As Dick Grune's work will attest, I am in good company.

    Well, it does matter if it is one char or one symbol lookahead.

    This is why natural languages also have suffixes.

    And the vast majority of suffixed use cases in natural languages are short words. Aptly, briefly, candidly, distantly, easily, fairly, grimly,
    highly, etc etc etc. Very long words are generally nouns and they are
    used without suffixes. Fussbodenschleifmaschinenverleih, not fussbodenschleifmaschinenverleihlich.

    Those are aggregates, not single words. Moreover -lich is a suffix. (-like)

    Anyway, most constants remain short. A few rare times you define a mask with (nearly) all bits set, like $FFFF or $7FFF (and their 64-bit variants), and those are repetitive.

    I still think the whole reasoning for this change is totally bogus and dragged in by the hair.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From trijezdci@21:1/5 to Marco van de Voort on Wed Aug 31 03:58:16 2016
    On Wednesday, 31 August 2016 17:29:56 UTC+9, Marco van de Voort wrote:

    Well, it does matter if it is one char or one symbol lookahead.

    No it doesn't. Processing the symbol stream treats either as a single unit, which is the whole reason to use a tokeniser as a front end to a parser.

    Those are aggregates, not single words.

    The point was that they are nouns.

    Moreover -lich is a suffix. (-like)

    And I used it as such.


    Anyway, most constants remain short. A few rare times you define a mask
    with all like $FFFF or $7FFF (and its 64-bit variants), and those are repeating.

    Funny that you chose to use prefixed literals there apparently out of preference while at the same time calling that very choice "totally bogus".

    In any event, the world has moved on to those totally bogus prefix literals in just about every notation imaginable while accepting the arguments presented in their favour. By contrast, those who are attached to not totally bogus or totally not bogus
    suffix literals are in a vanishingly tiny minority.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From trijezdci@21:1/5 to rug...@gmail.com on Thu Sep 1 17:50:05 2016
    On Friday, 2 September 2016 08:42:09 UTC+9, rug...@gmail.com wrote:

    Funny that you chose to use prefixed literals there apparently out
    of preference while at the same time calling that very choice
    "totally bogus".

    He's a core FreePascal member, so it's not surprising that he mentions
    (Turbo Pascal-ish) '$' (hex) prefixes. Heck, I almost mentioned it myself!

    Yes I am aware of that. I pointed this out because of the apparent hypocrisy involved: If Borland or the FPC project make that choice, apparently that is alright. But if we make that choice then it is "totally bogus". This ain't right.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From rugxulo@gmail.com@21:1/5 to trijezdci on Thu Sep 1 16:42:09 2016
    Hi,

    On Wednesday, August 31, 2016 at 5:58:17 AM UTC-5, trijezdci wrote:
    On Wednesday, 31 August 2016 17:29:56 UTC+9, Marco van de Voort wrote:

    Anyway, most constants remain short. A few rare times you define
    a mask with all like $FFFF or $7FFF (and its 64-bit variants),
    and those are repeating.

    Funny that you chose to use prefixed literals there apparently out
    of preference while at the same time calling that very choice
    "totally bogus".

    He's a core FreePascal member, so it's not surprising that he mentions
    (Turbo Pascal-ish) '$' (hex) prefixes. Heck, I almost mentioned it myself!

    In any event, the world has moved on to those totally bogus prefix
    literals in just about every notation imaginable while accepting the arguments presented in their favour. By contrast, those who are attached
    to not totally bogus or totally not bogus suffix literals are in a vanishingly tiny minority.

    Intel-style x86 assembly still overwhelmingly uses 0Ah notation for hex.
    Many assemblers support various styles (including all of those mentioned above), but it's still very common to see, e.g. "stack 8000h" or
    "test al,1100b" (FASM).

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Marco van de Voort@21:1/5 to trijezdci on Fri Sep 2 09:28:26 2016
    On 2016-09-02, trijezdci <trijezdci@gmail.com> wrote:

    He's a core FreePascal member, so it's not surprising that he mentions
    (Turbo Pascal-ish) '$' (hex) prefixes. Heck, I almost mentioned it myself!

    Yes I am aware of that. I pointed this out because of the apparent
    hypocrisy involved: If Borland or the FPC project make that choice, apparently that is alright. But if we make that choice then it is
    "totally bogus". This ain't right.

    "Totally bogus" was a reference to the reasons to change it, not suffix
    syntax in general. To my best knowledge Borland (and related dialects)
    never changed from suffix from prefix.

    If somebody arrived on the FPC lists/forum tomorrow with similar arguments to change the syntax to suffix, I'd react the same.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From trijezdci@21:1/5 to Marco van de Voort on Fri Sep 2 05:07:20 2016
    On Friday, 2 September 2016 18:28:27 UTC+9, Marco van de Voort wrote:

    Yes I am aware of that. I pointed this out because of the apparent hypocrisy involved: If Borland or the FPC project make that choice, apparently that is alright. But if we make that choice then it is
    "totally bogus". This ain't right.

    "Totally bogus" was a reference to the reasons to change it, not suffix syntax in general. To my best knowledge Borland (and related dialects)
    never changed from suffix from prefix.

    If tomorrow sb arrived on the FPC lists/forum with similar arguments to change the syntax to suffix, I'd react the same.

    That's a strawman. If it really was as you claim, that would be an argument from indifference, but you didn't display any indifference. In fact, the words "totally bogus" alone do not go along with indifference.

    A true argument from indifference might have been something along the following lines ...

    "I do not believe the effects of the research you mentioned to be that strong and important, therefore I believe this is more of a matter of preference than it is a matter of readability. But at the end of the day I can live with either choice."

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Marco van de Voort@21:1/5 to trijezdci on Fri Sep 2 13:47:19 2016
    On 2016-09-02, trijezdci <trijezdci@gmail.com> wrote:
    If tomorrow sb arrived on the FPC lists/forum with similar arguments to
    change the syntax to suffix, I'd react the same.

    That's a strawman.

    "I don't subscribe to your linguistic analysis of MY answer."

    I hope I have now put it in a format you can understand.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From trijezdci@21:1/5 to Marco van de Voort on Fri Sep 2 11:39:48 2016
    On Friday, 2 September 2016 22:47:20 UTC+9, Marco van de Voort wrote:

    "I don't subscribe to your linguistic analysis of MY answer."

    I hope I have now put it in a format you can understand.


    I was trying to be polite but you just don't seem to get it.

    YOU WERE BEING RUDE with your language. And in case you were wondering, the clue lies in the words "totally bogus". Please learn some manners.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From rugxulo@gmail.com@21:1/5 to Marco van de Voort on Fri Sep 2 14:27:38 2016
    Hi,

    Just to refocus this thread on technical concerns ....

    On Friday, September 2, 2016 at 4:28:27 AM UTC-5, Marco van de Voort wrote:
    On 2016-09-02, trijezdci <trijezdci@spam.sux> wrote:

    He's a core FreePascal member, so it's not surprising that he mentions
    (Turbo Pascal-ish) '$' (hex) prefixes. Heck, I almost mentioned it myself!

    Yes I am aware of that. I pointed this out because of the apparent hypocrisy involved: If Borland or the FPC project make that choice, apparently that is alright. But if we make that choice then it is
    "totally bogus". This ain't right.

    "Totally bogus" was a reference to the reasons to change it, not suffix syntax in general. To my best knowledge Borland (and related dialects)
    never changed from suffix from prefix.

    If tomorrow sb arrived on the FPC lists/forum with similar arguments to change the syntax to suffix, I'd react the same.

    Overall, it doesn't really matter. Small changes like this are the least
    of anyone's worries.

    I still prefer 0--h for hex, but that's just me. I've seen people use
    0x and $ in Intel assembly, but I always found it odd. The overwhelming majority seems to prefer 0--h there. Of course, I also shun AT&T syntax,
    but some people still prefer that (even though GAS has supported both
    for many years), so who knows.

    Just for clarity, FPC (actually, 3.0.0's ppcross8086) supports inline
    assembly, but it only supports 0--h or $ and thus not 0x at all.
    (This may vary based upon bin writer or external assembler, of course.
    I honestly don't know.)

    Here, I'll just point to this to pretend to be exhaustive: http://www.freepascal.org/docs-html/ref/refse6.html

    How does Ada do it? A quick search shows 16#FF# for 255. Similar (but
    not quite) to Extended Pascal (16#FF) and Modula-3 (16_FF).

    But almost anything is better than octal! :-P

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Marco van de Voort@21:1/5 to trijezdci on Sat Sep 3 14:10:15 2016
    On 2016-09-02, trijezdci <trijezdci@gmail.com> wrote:

    I hope I have now put it in a format you can understand.


    I was trying to be polite but you just don't seem to get it.

    YOU WERE BEING RUDE with your language.

    Pot, meet kettle. It was a reaction to your aggressive stance and the so-called evidence, and more importantly the application of it to suffix literals in languages.

    The literals are typically very short (and IMHO 64-bit compilers don't change that much; I can't remember the last time I hardcoded a pointer value) and, besides that, simple in structure. A whole number like 10000 or something repetitive like 99999 is much more common than a random number like 153872, and those clearly fall into the category of reading them in one glance, wordwise.

    So your whole reasoning sounded like something pulled out of a hat to justify a decision already taken on other grounds.

    I still stand by that opinion, and this is my last response on the matter.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From trijezdci@21:1/5 to Marco van de Voort on Sat Sep 3 10:02:00 2016
    On Saturday, 3 September 2016 23:10:22 UTC+9, Marco van de Voort wrote:

    I was trying to be polite but you just don't seem to get it.

    YOU WERE BEING RUDE with your language.

    Pot, meet kettle.

    You resorted to vulgarity, I didn't.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From trijezdci@21:1/5 to rug...@gmail.com on Fri Sep 2 16:36:29 2016
    On Saturday, 3 September 2016 06:27:39 UTC+9, rug...@gmail.com wrote:

    Just to refocus this thread on technical concerns ....

    Overall, it doesn't really matter. Small changes like this are the least
    of anyone's worries.

    Have you heard of TQM? Or Six Sigma, Kaizen, Lean?

    These are quality management methodologies whose aim it is to reliably raise quality to levels so close to 100% that it is for all practical purposes indistinguishable from 100% even if 100% is not actually reachable.

    Six Sigma for example sets a goal of 99.99966% quality or 3.4 defects in one million opportunities. When you get so close to 100, the weirdest things happen. If you are coming from a natural science background, a good analogy might be quantum mechanics.
    Observable events simply seem to defy common sense.

    For example, a car factory under such a quality regime might find that their quality significantly dropped below target after they changed the subcontractor who cleans the towels for the towel drums in the bathrooms. Nobody can figure out what's wrong with those towels, they seem just fine, but the change in the towel cleaning firm messes up the quality of the cars made at the plant.

    It would seem totally whacko but this is the kind of thing you will observe when taking quality to the extreme. There is nothing that can be ruled out as "doesn't really matter". Everything has a potential effect on the product. Everything matters, no
    matter how tiny.

    The management methodologies deal predominantly with measurement and attitude. Since you cannot really predict what whacko thing will mess up your quality target, you need to cultivate an attitude where everyone is alert and nobody considers anything as "
    doesn't really matter".

    We have taken that attitude to heart during our revision and design. To us, every tiny little whacko thing matters. If we can avoid even one bug in 1 million lines of code by reducing every imaginable, ever so tiny opportunity for error/defect, then it will have been worth the bother.

    Against this background, prefix literals are among the more important changes in our revision and they will have a significant impact, regardless of what some people who have expressed opinions to the contrary here think. You are entitled to your opinion, but can you present any scientific research to back it up? We have done quite a bit of reading over the years on various research that has led us to accept that these types of design decisions have an impact significant enough to bother.

    Last but not least, mental load is something that adds up, little by little. Small distractions lower our ability to handle mental load. Being tired or frustrated about something will further lower it. An ever so slightly distracting feature in a
    notation that didn't cause a bug on a good day will cause a bug on a not so good day. It is all about reducing opportunity for error/defect.


    I still prefer 0--h for hex, but that's just me.

    Not all suffixes are equally bad. I'm afraid to say so, but classic Modula-2's suffix literals feature an accumulation of the worst possible design choices.

    Digits 0-9, capital A-F and H all have a similar visual footprint, using the full ascent and filling every corner of the bounding box. This makes the H suffix blend in with the digits. It doesn't stand out at all.

    By contrast, in traditional assembly notation, where the base-16 specific digits A-F are also capitalised but a lowercase h is used, the h has a much smaller footprint because although it uses the full ascent, its ascender is but a single arm and it doesn't fill every corner of the bounding box. This makes the lowercase h stand out significantly better than the capital H.

    In other words, 123Fh is far more readable than classic Modula-2's 123FH.

    Likewise, 0abcdh is far worse than 0ABCDh.

    For the same reason, 0xabcd and 0XABCD are both far worse than 0xABCD.

    And the same holds true for decimal numbers with exponents: 1.23e45 is better than 1.23E45.

    The low readability is greatly exacerbated by the fact that the suffixes for base-8 literals are also valid digits in base-16 literals in classic Modula-2. This is the WORST design for literals of any kind in any programming language.

    This could of course be eliminated by simply removing base-8 literals, but then you still need to switch from H to h, and once you have done that, it's incompatible anyway, so you may as well go a little further still and use prefix literals. As I said, it is all about reducing every possible opportunity for distraction and ultimately error/defect.

    Also, in embedded development (which is one area we specifically wanted to support) it is quite common to use binary literals. When you add those, there is the question of what suffix to use. If the letter B is to be used, again, 0110b is far better than 0110B, especially since B is also a valid base-16 digit. But here again, once we've broken compatibility we may as well go the whole distance and use prefix literals.

    Last but not least, you want to use yet another prefix or suffix for character code points. If you were to use the letter C, it should at least be lowercase: 040c is far better than 040C for the aforementioned reasons. But if you switch the radix and keep the same suffix, there is great potential for confusion about the radix. In the days of Unicode we may as well use the letter U though, and again, lowercase is better than uppercase for prefixes/suffixes. 040u or 0u40 is better than 040U or 0U40.


    0x and $ in Intel assembly, but I always found it odd. The overwhelming majority seems to prefer 0--h there. Of course, I also shun AT&T syntax,
    but some people still prefer that (even though GAS has supported both
    since many years), so who knows.

    With non-letter prefixes or suffixes it is difficult to find three separate and distinct symbols (base-2, base-16, unicode) you could use as prefix or suffix and still have some mnemonic value to hint at the meaning. However, non-letter prefixes and
    suffixes have the advantage that they stand out more. There you have to make a trade-off between mnemonic value and visual cue value.

    How does Ada do it? A quick search shows 16#FF# for 255. Similar (but
    not quite) to Extended Pascal (16#FF) and Modula-3 (16_FF).

    Indeed, Ada, Extended Pascal and Modula-3 all use PREFIX literals.

    Their designers all made that "totally bogus" choice.

    Perhaps it was the other way round and Modula-2 took the odd choice here.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From rugxulo@gmail.com@21:1/5 to trijezdci on Sat Sep 3 16:47:59 2016
    Hi,

    On Friday, September 2, 2016 at 6:36:30 PM UTC-5, trijezdci wrote:
    On Saturday, 3 September 2016 06:27:39 UTC+9, rug...@gmail.com wrote:

    Overall, it doesn't really matter. Small changes like this are
    the least of anyone's worries.

    It would seem totally whacko but this is the kind of thing you
    will observe when taking quality to the extreme. There is nothing
    that can be ruled out as "doesn't really matter". Everything has
    a potential effect on the product. Everything matters, no matter
    how tiny.

    I still don't think it's worthy of major attention or worry. Bugs
    happen, but you're more likely to run into other issues (e.g.
    mishandled dynamic memory) than this.

    But, in hindsight, I agree that these problems can occur
    if you're not careful.

    Let me just show a small example from one old assembler:

    ===========================================================
    ; A86
    org 100h ; hex

    mov ax,111b ; binary
    mov bx,0111b ; leading zero is hex (default, but +D uses binary)
    mov cx,0111xb ; binary
    mov dx,111bxh ; hex

    nop

    ;radix 10
    mov ax,5150 ; decimal

    radix 16
    mov ax,5150 ; hex

    radix 8
    mov ax,5150 ; octal

    ret
    ===========================================================

    So the problem is several-fold:

    * trying to be parsimonious (as dmr remarked about both himself and Wirth)
    * trying to be (mostly) compatible
    * trying to be unambiguous
    * having defaults that can be (obscurely) changed
    * non-standard (thus unfamiliar) extensions

    Most assemblers handle "0011b" as binary just fine. I guess EJI
    considered it a waste of space to specify leading zeros for binary.
    Thus, he decided (by default, although this can be changed with
    a cmdline switch) to make a leading zero indicate hex! AFAIK, nobody
    else does that. Perhaps, since 0[A-F]+[hH] needs the leading zero
    anyway, he figured he could save the 'h' suffix entirely. Of course, he
    also allowed unambiguous 1100xb, but that's unsupported in
    most other assemblers.

    So what do you do? Well, for one old piece of code that I
    was sharing between assemblers, I just always used hex,
    no binary at all. Of course, you can always use a third-party
    preprocessor (or sed or whatever) before-the-fact to translate
    minor stuff like this. But it seemed easier to just use hex.
    (Especially since char/word literals like 'DC' were in
    different endian order depending on assembler, so that was
    problematic as well.)

    Assemblers themselves are (mostly) case insensitive (like Pascal),
    so it's not true that 'h' is always preferred (even if I partially
    agree with you).

    BTW, although it was supported, I never saw anybody use octal.
    Oh, just for the record, octal supposedly helps with decoding x86
    opcodes, but most people (myself included) don't see much
    convenience there. Even octal dump (od), I normally just use in
    hex (-v -Ax -tx1). (Actually, I wrote my own, but hex output only!)

    I don't know if any of this proved any points, but I still thought
    it was interesting.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Marco van de Voort@21:1/5 to rugxulo@gmail.com on Thu Sep 8 02:53:33 2016
    On 2016-09-07, rugxulo@gmail.com <rugxulo@gmail.com> wrote:

    I mostly agree, though I'm sure someone somewhere can still make
    a case for octal. But surely it shouldn't be preferred over hex.

    The only non-freak reason I can think of is Unix permissions.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From rugxulo@gmail.com@21:1/5 to trijezdci on Wed Sep 7 15:25:46 2016
    Hi again,

    On Sunday, September 4, 2016 at 4:36:00 AM UTC-5, trijezdci wrote:
    On Sunday, 4 September 2016 08:47:59 UTC+9, rug...@gmail.com wrote:

    But, in hindsight, I agree that these problems can occur
    if you're not careful.

    The point is that while insignificant-looking items may have less
    impact when they are seen in ISOLATION, they have a significantly
    higher impact when PUT TOGETHER. The sheer number of little things
    contributes significantly to the total. Therefore, it is worthwhile
    also paying attention to minute details.

    I agree.

    Let me just show a small example from one old assembler:

    mov ax,111b ; binary
    mov bx,0111b ; leading zero is hex (default, but +D uses binary)

    Well, that's not even an argument about prefix or suffix because
    it uses both prefix and suffix at the same time and they are in
    conflict with each other.

    I wouldn't consider '0' a prefix for binary. Most assemblers treat
    it as part of the number, not a special char. My point was that I
    assume he thought extra 0000 (before any 1) was superfluous, thus
    the (convenient? confusing!) adjustment for other bases (which
    thankfully can be overridden). He also supports "1k" (kilobyte)
    and "bit 5" (32).

    That's just bad design. A design should stick to either prefix
    or suffix.

    Like I said, I suppose he was trying to be "parsimonious",
    i.e. laconic. Besides, apparently his defaults were tailored to
    him, so he found them superior, using them almost exclusively.

    NASM, for instance, apparently supports prefix or suffix, in various
    ways.

    (Maybe x86 assembly was a bad example, but the point is that even there
    it's far from "standard" on how things are done. Only the crudest of
    basics are agreed upon, which makes things harder for exchanging code.)

    So the problem is several-fold:

    * trying to be (mostly) compatible

    Indeed, backwards compatibility is more often a curse than it is
    a blessing. And if you break it, better break it for good; don't
    make it look like it is still compatible when it actually isn't.
    This comes down to the principle of least surprise again.

    I consider compatibility a noble virtue, but sometimes the burden
    is too heavy. For small projects, it's probably easier to eschew
    any formalities and just hack out whatever works.

    But overall, I'd prefer (de facto or de jure) "standard". This
    is why some compilers support various competing dialects. Sure,
    it's harder to achieve good compatibility (accept good code,
    reject bad code), but overall I think it's worth it.
    (GPC and FPC are good exemplars of this virtue.)

    Our design principles expressly state that backwards compatibility
    must not interfere with any other design goals. It has the lowest
    priority by definition.

    Yes and no. I understand sometimes it's a heavy burden to shoehorn
    everything into a small compatible subset. But overall I'm tired of non-portable code. I'm still a fan of x86 assembly, but overall
    it's worthless (except for small size, fun, nostalgia, etc).

    (Of course, assembly will never go away, esp. with SIMD, BMI, etc.
    But it may be relegated to being inlined only or intrinsics.)

    Compiler-specific code isn't much better than assembly (regarding
    portability). Anything that's worth doing is probably worth doing
    portably (to as many targets as possible).

    Of course, somebody has to write the compiler, libs, etc.
    So somebody still has to deal with assembly, but it probably
    won't be the average developer.

    * trying to be unambiguous

    I disagree on that one. Ambiguity is a very significant contributor
    to error.

    I think you misunderstood (due to my poor wording). I'm in no way
    advocating for ambiguity as a good thing.

    The problem arises when you allow other less important
    goals such as backwards compatibility to interfere and the only
    way to reconcile conflicting design goals is to produce an
    unnecessarily complex and confusing design.

    Sometimes you have to straddle the line (or do without, which is
    rarely a good solution to anything). Extreme compatibility involves
    all kinds of weird tricks.

    * non-standard (thus unfamiliar) extensions

    Standard does not always mean familiarity. De facto standard
    perhaps.

    Hence GPC vs. FPC (or MASM vs TASM vs whatever else).

    But the de facto standard for number literals is the
    0x, 0u, 0b prefix convention. Many more languages use that than
    any other. And millions of practitioners across a large spectrum
    of languages are familiar with it.

    Let's not overstate the universality of it. Nothing is totally
    widely accepted everywhere.

    By contrast, there are only a few hundred people on the entire
    planet who are familiar with the literals of classic Modula-2.

    Again, this might be a bit exaggerated (but the number has probably
    decreased heavily since the '80s).

    Of course Wirth later omitted base-8 in Oberon. Thirty years on,
    it is about time we removed base-8 from all languages. In our day
    and age it has no practical use whatsoever. It has become a
    ridiculous artefact.

    I mostly agree, though I'm sure someone somewhere can still make
    a case for octal. But surely it shouldn't be preferred over hex.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From trijezdci@21:1/5 to rug...@gmail.com on Wed Sep 7 20:08:33 2016
    On Thursday, 8 September 2016 07:25:46 UTC+9, rug...@gmail.com wrote:

    mov ax,111b ; binary
    mov bx,0111b ; leading zero is hex (default, but +D uses binary)

    Well, that's not even an argument about prefix or suffix because
    it uses both prefix and suffix at the same time and they are in
    conflict with each other.

    I wouldn't consider '0' a prefix for binary. Most assemblers treat
    it as part of the number, not a special char. My point was that I
    assume he thought extra 0000 (before any 1) was superfluous, thus
    the (convenient? confusing!) adjustment for other bases (which
    thankfully can be overriden).

    I don't want to get stuck on terminology here.

    My advocacy is for the LL(1) principle.

    A notation that lets you discover the type of a compound symbol at the start of the symbol is superior to a notation that forces you to read all the atomic symbols of the compound symbol until you reach the end before you know the type. That's the gist of it.

    In left-to-right writing systems, this means whatever indicates the type of the literal should be leftmost. In right-to-left writing systems (such as Arabic) it means whatever indicates the type of the literal should be rightmost.

    The translation of my earlier statement

    That's just bad design. A design should stick to either prefix
    or suffix.

    is thus: a notation where the type of a compound symbol is indicated by both the leftmost and the rightmost atomic symbols, where the two can be in conflict, is just bad design. A notation should stick to either leftmost or rightmost.


    Indeed, backwards compatibility is more often a curse than it is
    a blessing. And if you break it, better break it for good; don't
    make it look like it is still compatible when it actually isn't.
    This comes down to the principle of least surprise again.

    I consider compatibility a noble virtue, but sometimes the burden
    is too heavy. For small projects, it's probably easier to eschew
    any formalities and just hack out whatever works.

    My observation has been that backwards compatibility is far more costly than breaking clean and providing a translator tool to convert legacy code or data.

    When backwards compatibility conflicts with what would be good design decisions, you incur what is known as technical debt. And like financial debt, you incur interest on technical debt. Also, the longer you carry the debt around, the more expensive the interest payments.

    A translator tool and conversion of legacy code and data is equivalent to paying off the debt early and being free from interest payments in the future.


    But overall, I'd prefer (de facto or de jure) "standard". This
    is why some compilers support various competing dialects. Sure,
    it's harder to achieve good compatibility (accept good code,
    reject bad code), but overall I think it's worth it.
    (GPC and FPC are good exemplars of this virtue.)

    Compatibility across compilers was more important in the days when compilers were expensive proprietary products that often only worked on a single platform and targeted a single platform. This way you had to use a different compiler from a different
    vendor to deploy to a different target platform.

    In the days of open source compilers (or compiler infrastructure backends) which are written to run on multiple platforms and generate code for multiple platforms this is no longer as important as it once was.

    Our bootstrap compiler (or compiler suite) supports PIM3 and PIM4 because the bulk of Modula-2 literature is based on PIM3 and PIM4 and those books are neither going to be updated nor are they going to be replaced any time soon. But for bootstrapping to
    M2R10 we added an extended mode with select features from R10 because it is more convenient to write the self-hosting compiler in a subset of itself.

    Yet if you want cross platform portability, you will not find any M2 compiler that can be deployed to and target as many platforms.

    M2C is written in C99 and generates C99 which makes it already very portable, but for targeting the JVM and the CLR I have begun cloning two derivative compilers, M2J (transcribed to Java and generating Java) and M2Sharp (transcribed to C# and generating
    C#).

    The self-hosting compilers bootstrapped from them will generate LLVM, JVM bytecode and CLR bytecode respectively. Plus, Gaius Mulley has pledged to implement M2R10 in GM2, thereby also providing access to gcc targets.

    It doesn't get any more portable than this. Thus, the need to support different dialects of past and present M2 compilers is greatly diminished.


    Our design principles expressly state that backwards compatibility
    must not interfere with any other design goals. It has the lowest
    priority by definition.

    Yes and no. I understand sometimes it's a heavy burden to shoehorn
    everything into a small compatible subset. But overall I'm tired of non-portable code.

    Compiler-specific code isn't much better than assembly (regarding portability). Anything that's worth doing is probably worth doing
    portably (to as many targets as possible).

    As I mentioned, it is far less expensive to build a converter to convert from other dialects. In fact, a PIM to R10 converter could be cloned from the existing M2C front end with relatively moderate effort. One of the most time consuming aspects of M2C
    was figuring out how to generate readable C output while trying to stick as much as possible to C naming conventions. How do you translate Modula-2 identifiers of a hierarchical namespace to C identifiers in a flat namespace? That had me bogged down for
    months and I am only now getting back to working on the actual code generator. When generating Modula-2 of one dialect from another, identifiers can be used verbatim, and structurally everything except variant records is a 1:1 mapping, thereby greatly
    reducing the effort required.

    All the while, the self-hosting compilers will be much simpler as a result because they do not need to carry around code for different dialects, nor different backends except for the LLVM one which will also generate C. They will share the same source
    code for their front end though.


    * trying to be unambiguous

    I disagree on that one. Ambiguity is a very significant contributor
    to error.

    I think you misunderstood (due to my poor wording). I'm in no way
    advocating for ambiguity as a good thing.

    Fair enough. I had a bit of trouble figuring out what you wanted to say. It didn't quite seem to fit. ;-)


    But the de facto standard for number literals is the
    0x, 0u, 0b prefix convention. Many more languages use that than
    any other. And millions of practitioners across a large spectrum
    of languages are familiar with it.

    Let's not overstate the universality of it. Nothing is totally
    widely accepted everywhere.

    I didn't say it is. But the 0x convention is several orders of magnitude more widespread than Wirth's H and X suffixes.


    By contrast, there are only a few hundred people on the entire
    planet who are familiar with the literals of classic Modula-2's.

    Again, this might be a bit exaggerated (but the number has probably
    decreased heavily since the '80s).

    Maybe there are more than a thousand Modula-2 practitioners still around. Maybe there are more than a thousand Oberon practitioners around. Maybe the combined number is 5000. I think that would be extremely generous but even in that case, it is still
    several orders of magnitude less than the combined practitioners of languages using the 0x convention. And there ARE millions of Java/Csharp/C++ developers. Drone factories calling themselves universities are spitting them out in an assembly line process
    on an industrial scale. I am not exaggerating.


    I mostly agree, though I'm sure someone somewhere can still make
    a case for octal. But surely it shouldn't be preferred over hex.

    I did make the case for octal. The only case there is to make: when using architectures whose register size and addresses are based on multiples of six.

    There is no case to be made for octal on architectures whose register size and addresses are based on multiples of eight. Well, except when you are writing an emulator of an architecture that uses multiples of six. :P

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From trijezdci@21:1/5 to rug...@gmail.com on Sun Sep 4 02:35:59 2016
    On Sunday, 4 September 2016 08:47:59 UTC+9, rug...@gmail.com wrote:

    I still don't think it's worthy of major attention or worry. Bugs
    happen, but you're more likely to run into other issues (e.g.
    mishandled dynamic memory) than this.

    But, in hindsight, I agree that these problems can occur
    if you're not careful.

    The point is that while insignificant-looking items may have less impact when they are seen in ISOLATION, they have a significantly higher impact when PUT TOGETHER. The sheer number of little things contributes significantly to the total. Therefore, it is worthwhile also paying attention to minute details.


    Let me just show a small example from one old assembler:

    mov ax,111b ; binary
    mov bx,0111b ; leading zero is hex (default, but +D uses binary)

    Well, that's not even an argument about prefix or suffix because it uses both prefix and suffix at the same time and they are in conflict with each other.

    That's just bad design. A design should stick to either prefix or suffix.


    So the problem is several-fold:

    Overall, I would consider inconsistency to be the biggest contributor to opportunity for error, so the first rule should be to be consistent and to follow the principle of least surprise.

    * trying to be parsimonious (as dmr remarked about both himself and Wirth)

    Certainly nobody can accuse us of having been parsimonious. We kept revisiting and refining even minute details again and again.

    * trying to be (mostly) compatible

    Indeed, backwards compatibility is more often a curse than it is a blessing. And if you break it, better break it for good; don't make it look like it is still compatible when it actually isn't. This comes down to the principle of least surprise again.

    Our design principles expressly state that backwards compatibility must not interfere with any other design goals. It has the lowest priority by definition.

    * trying to be unambiguous

    I disagree on that one. Ambiguity is a very significant contributor to error. The problem arises when you allow other less important goals such as backwards compatibility to interfere and the only way to reconcile conflicting design goals is to produce
    an unnecessarily complex and confusing design.

    * having defaults that can be (obscurely) changed

    Aka violating the principle of least surprise.

    * non-standard (thus unfamiliar) extensions

    Standard does not always mean familiarity. De facto standard perhaps. But the de facto standard for number literals is the 0x, 0u, 0b prefix convention. Many more languages use that than any other. And millions of practitioners across a large spectrum of languages are familiar with it. By contrast, there are only a few hundred people on the entire planet who are familiar with the literals of classic Modula-2. Thus, this would be a strong argument in favour of the former.


    Assemblers themselves are (mostly) case insensitive (like Pascal),
    so it's not true that 'h' is always preferred (even if I partially
    agree with you).

    I didn't say it is preferred. I said 0123Fh is better than 0123FH. Even if the translator itself is case insensitive, using a lowercase h even in the presence of capitalised A-F digits is certainly a widely used convention. By contrast, Modula-2, being case sensitive, does not allow you to use that convention.


    BTW, although it was supported, I never saw anybody use octal.

    Octal notation was only ever useful on the 12-bit, 18-bit and 36-bit architectures of the 1950s and 1960s because the bit widths of character codes, words and addresses on these systems were all divisible by three. They could be presented in groups of
    three bits and each 3-bit group could be conveniently expressed in base-8.

    6-bit character codes:
    00 = 000 000
    77 = 111 111

    12-bit words and addresses:
    0000 = 000 000 000 000
    7777 = 111 111 111 111

    18-bit words and addresses:
    000:000 = 000 000 000 | 000 000 000
    777:777 = 111 111 111 | 111 111 111

    36-bit addresses:
    0000:0000:0000 = 000 000 000 000 | 000 000 000 000 | 000 000 000 000
    7777:7777:7777 = 111 111 111 111 | 111 111 111 111 | 111 111 111 111

    As you can see, using base-8 was as convenient and useful back then as base-16 is to us today.

    But this all changed when architectures moved to register sizes and addresses based on multiples of eight.

    for 8-bit character codes:
    0 0 0 = 00 000 000
    3 7 7 = 11 111 111

    is far less convenient and useful than

    00 = 0000 0000
    FF = 1111 1111

    for 16-bit words and addresses:
    0 0 0 0 0 0 = 0 000 000 000 000 000
    1 7 7 7 7 7 = 1 111 111 111 111 111

    is far less convenient and useful than

    0000 = 0000 0000 0000 0000
    FFFF = 1111 1111 1111 1111

    and it gets more and more inconvenient for larger multiples of eight.


    However, the programming language designers of the 1960s and 1970s, including Wirth and Ritchie, learned their trade on architectures with 6-bit character codes and 12-bit, 18-bit and 36-bit addressing. Base-8 notation was as natural to them as base-16 notation is to us today. Their clinging to base-8 for longer than necessary is thus not surprising.

    Also, C inherited its base-8 literals from its predecessor B, which was developed on the PDP-7 (an 18-bit architecture) where it made sense to use base-8. By contrast, Modula-2 was first developed on a PDP-11 (16-bit architecture) where it made no sense to use base-8. The adoption of base-8 literals in Modula-2 comes down to inertia.

    Of course Wirth later omitted base-8 in Oberon. Thirty years on, it is about time we removed base-8 from all languages. In our day and age it has no practical use whatsoever. It has become a ridiculous artefact.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)