• Proposal: Many Core Forth CPU for FPGA development.

    From Christopher Lozinski@21:1/5 to All on Sat Mar 11 23:07:45 2023
    FPGAs are great for building parallel applications, and there is huge growth in the market for FPGAs in data centres, but they are a nightmare to develop for. A grid of Forth cores would be much more fun.

    I had no idea how bad it was until I recently started my master's degree in Electrical Engineering, Digital Design, in Katowice, Poland. I thought Verilog and VHDL looked easy, but I completely failed to understand how they work under the hood. One has to study EE to understand how the tools convert a design into digital circuitry (synthesis), how to write good code, and how to recognize disastrous code. Large designs take forever to create. The simulators do not always work correctly, so the simulator can say it works when in reality it does not, and then debugging on a live FPGA changes the system. Successful projects have to spend at least twice as much on verification as on design. What a nightmare!

    I think it would be a lot more fun to develop prototypes on a grid of Forth CPUs. Changes could be tested instantly. Once the application was developed, optimizations could be made.

    Why not just use a multicore processor running 8th?
    Because then I cannot add in some custom logic blocks. Also, multicore processors are optimized for the speed of each individual processor, and not for cooperation between processors. The guys who made Linux run on multi-core had a very difficult time. There are caches that can become inconsistent, and messaging between cores is limited. In contrast, with a grid of Forth cores on an FPGA, each core gets its own memory area and can talk to its neighbors. Conceptually much simpler. KISS.
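
    To make the grid idea concrete, here is a minimal Python sketch, not a hardware design. It assumes a rectangular grid in which each core owns a private memory and can only send messages to its four direct neighbours. The names Core, Grid, neighbours and send are made up for this illustration.

```python
from collections import deque


class Core:
    """One node in the grid: private memory plus an inbox that neighbours write to."""

    def __init__(self, row, col):
        self.pos = (row, col)
        self.memory = {}      # private to this core, never shared
        self.inbox = deque()  # messages received from adjacent cores


class Grid:
    def __init__(self, rows, cols):
        self.cores = {(r, c): Core(r, c) for r in range(rows) for c in range(cols)}

    def neighbours(self, pos):
        """Return the existing positions directly north, south, west and east of pos."""
        r, c = pos
        candidates = [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
        return [p for p in candidates if p in self.cores]

    def send(self, src, dst, message):
        """Deliver a message, but only between adjacent cores."""
        if dst not in self.neighbours(src):
            raise ValueError(f"{src} and {dst} are not neighbours")
        self.cores[dst].inbox.append((src, message))


grid = Grid(2, 3)
grid.send((0, 0), (0, 1), "edge model for tile 0")
print(grid.cores[(0, 1)].inbox.popleft())
```

    There is no shared memory in this model, so nothing can become inconsistent between cores; the only way data moves is an explicit send to a neighbour.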

    In an ideal world, each Forth CPU would have access to its own external memory, to maximize external memory bandwidth. But in the real world, an FPGA board has just one or two memory banks, so it is a good fit for something like vision processing, where one image frame gets loaded and shared with all of the processors. Indeed, the Lattice boards even have interfaces for digital cameras.

    So let us talk about a particular vision application. Vision processing in a moving vehicle. Or even when the camera is being rotated. The vision processing develops a model of the system being observed. Here are the straight edges, here are the
    flat surfaces, here are the curved edges. As the image moves, the model needs to be handed over to the adjacent processor to be updated. At least to calculate velocities. There is a lot of communication between adjacent nodes. So hard to do on a
    multicore cpu.

    Next question: Which Forth core should I use?

    There are so many Forth FPGA CPUs, and so many versions of Forth, that it is hard to choose which one to use. Many Forth CPUs are optimized to fit on an FPGA; many Forth virtual machines are optimized to run on the C programming language, or on an Intel CPU. The one person who worked equally on both FPGAs and C compilers was Dr. Ting. I am sorry that he is now gone. But I believe that he optimized the eForth language and CPU to work best together. So I think that the EP 32/24/16 with eForth is the right way to go. There is a large stack of eForth software out there. He also did a brilliant job documenting his designs, which makes them very accessible for the beginner. For vision with 3x8 bit color channels, the EP24 makes sense.

    Let me make a few more observations. It is very difficult for a single Forth CPU to compete against commercial mainstream register machines. On the other hand, lots of small Forth cores should be able to outperform a single RISC-V CPU on parallelizable applications. In the Forth Days 2022 video, Dan Golding reports that their Forth CPU is 1/10 the size of a RISC-V CPU. I suspect that it is more than 1/10th as fast. So potentially 10 Forth cores, or even 5, could outperform a single RISC-V core.
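
    The arithmetic behind that guess can be written down as a rough Python sketch. It assumes a perfectly parallel workload with no communication cost, and the 1/4 relative speed used below is an illustrative number, not a measurement. The function names are hypothetical.

```python
from fractions import Fraction
import math


def cores_to_outperform(relative_speed):
    """Smallest core count n such that n * relative_speed > 1.

    relative_speed is the throughput of one Forth core as a fraction of
    one RISC-V core (pass a Fraction to avoid float rounding).
    """
    s = Fraction(relative_speed)
    return math.floor(1 / s) + 1


def speedup_in_same_area(relative_speed, relative_area):
    """Aggregate throughput of the Forth cores that fit in one RISC-V core's area."""
    cores_that_fit = math.floor(1 / Fraction(relative_area))
    return cores_that_fit * Fraction(relative_speed)


# Illustrative only: a Forth core a quarter as fast and a tenth the size.
print(cores_to_outperform(Fraction(1, 4)))                    # 5
print(speedup_in_same_area(Fraction(1, 4), Fraction(1, 10)))  # 5/2
```

    At exactly 1/10th the speed, ten cores only tie with the single core, so the claim depends on each Forth core being noticeably faster than 1/10th of the RISC-V core.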

    Clearly a dedicated FPGA application will outperform anything we can build on a grid of Forth CPUs, but I also believe that a lot of small developers will be able to release many innovative FPGA applications quickly, and then, if the market is there, potentially migrate them to a more optimised implementation.

    I was very discouraged that Dr. Ting passed away before I had a chance to work with him. I am very mindful that at the recent Forth2020 video conference there was a lot of grey hair. I hope that we can find a project that grows this community, before
    all of the expertise disengages.

    And of course the dream is to build up a large enough community around such a platform, that it eventually gets built in silicon.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Jurgen Pitaske@21:1/5 to Christopher Lozinski on Sun Mar 12 00:47:11 2023
    On Sunday, 12 March 2023 at 07:07:47 UTC, Christopher Lozinski wrote:
    [..]

    There is one project I saw that might fit in https://www.facebook.com/groups/1304548976637542/?multi_permalinks=1639420426483727&notif_id=1678535445817102&notif_t=group_activity&ref=notif

  • From Christopher Lozinski@21:1/5 to Jurgen Pitaske on Sun Mar 12 03:24:45 2023
    Jurgen Pitaske wrote:
    There is one project I saw that might fit in https://www.facebook.com/groups/1304548976637542/?multi_permalinks=1639420426483727&notif_id=1678535445817102&notif_t=group_activity&ref=notif

    I am of course quite aware of them, and in touch with one of the authors.

    So how does this differ from what they are doing?
    Officially they are more focused on building a tablet to make Forth accessible to more people. I am more interested in something like the GA144, a large number of tiny stack machines working together to solve computationally intensive problems.
    They are Forth advocates. I am less interested in Forth itself, and more interested in advocating for lots of tiny stack processors versus fewer larger register machines. Of course the stack processors will be running Forth, but they could conceivably run something else.
    They have a primary processor, maybe some more for an AI application. I am more interested in very parallel applications divided among cores.
    They are building a new processor, maybe a new Forth. I want to use existing cores, with an existing eForth user base.
    They are in Verilog, EP24 is in VHDL.

    And of course, their project exists; this is just vaporware.

    Does that explain the difference?

  • From Marcel Hendrix@21:1/5 to Christopher Lozinski on Sun Mar 12 04:02:10 2023
    On Sunday, March 12, 2023 at 11:24:47 AM UTC+1, Christopher Lozinski wrote: [..]
    So how does this differ from what they are doing?
    Officially they are more focused on building a tablet to make Forth accessible to
    more people. I am more interested in something like the GA144, a large number of tiny stack machines working together to solve computationally intensive problems.
    They are Forth advocates. I am less interested in Forth itself, and more interested in
    advocating for lots of tiny stack processors versus fewer larger register machines.
    Of course the stack processors will be running Forth, but they could conceivably run
    something else.
    They have a primary processor, maybe some more for an AI application. I am more
    interested in very parallel applications divided among cores.
    They are building a new processor, maybe a new Forth. I want to use existing cores,
    with an existing eForth user base.
    They are in Verilog, EP24 is in VHDL.

    [..]

    Epiphany multicore chips from Adapteva. The Parallella dev. board can be had for 200$
    or so. ( https://www.digikey.nl/en/products/filter/evaluation-boards-embedded-mcu-dsp/786?s=N4IgTCBcDaIA4EMBOCA2qCm6EgLoF8g )

    I think "stack processor" is a red herring. As any optimizing Forth compiler will show you,
    stacks can be emulated on a register machine.

    The important insight is that Forth may have decisive advantages on parallel hardware.
    You will realize that when you have to debug a parallel program on parallel, networked,
    hardware.

    I almost bought the Epiphany boards but found an idiotic excuse: it needs a weird power
    supply :--) I have a few other Forth projects that I really want to do first.

    Remember, the only projects worth doing are the ones that experts tell you
    are an impossible dead end. Let me tell you, eForth is not suitable.

    -marcel

  • From none albert@21:1/5 to calozinski@gmail.com on Sun Mar 12 13:45:24 2023
    In article <395ef5de-14b9-46fd-86cd-613c9fd3b66an@googlegroups.com>, Christopher Lozinski <calozinski@gmail.com> wrote:

    Marcel Hendrix wrote:

    Epiphany multicore chips from Adapteva. The Parallella dev. board can be had for 200$
    or so. ( https://www.digikey.nl/en/products/filter/evaluation-boards-embedded-mcu-dsp/786?s=N4IgTCBcDaIA4EMBOCA2qCm6EgLoF8g )


    Hugely interesting. A friend wrote Lisp for the device. I thought Adapteva closed down. And now they have been
    bought, by a company in stealth mode, and the boards are once again available. HUGELY tempting.

    I think "stack processor" is a red herring.
    The Core 1 team reports that their stack processor is 1/10th the size of a RISC-V core. Meaning that I can fit a lot more
    of them on a device. And a Forth instruction executes in a single clock cycle. So why is it a red herring?

    Let me tell you, eForth is not suitable.
    I would love to know why. I am still learning. The Core 1 guys said the EP16 did not have an I/O op code. They started
    out porting the EP16 to Altera, then ditched the design. I still have not figured out why. I wonder if Dr. Ting was an
    electrical engineer who knew what he was doing or not. It is not at all intuitive how device synthesis works.


    You misunderstood. eForth is not suitable -> the experts saying don't do it -> that is the project you should embark in.

    (Unless I misunderstood.)
    --
    Don't praise the day before the evening. One swallow doesn't make spring.
    You must not say "hey" before you have crossed the bridge. Don't sell the
    hide of the bear until you shot it. Better one bird in the hand than ten in
    the air. First gain is a cat spinning. - the Wise from Antrim -

  • From Christopher Lozinski@21:1/5 to Marcel Hendrix on Sun Mar 12 05:19:33 2023
    Marcel Hendrix wrote:

    Epiphany multicore chips from Adapteva. The Parallella dev. board can be had for 200$
    or so. ( https://www.digikey.nl/en/products/filter/evaluation-boards-embedded-mcu-dsp/786?s=N4IgTCBcDaIA4EMBOCA2qCm6EgLoF8g )


    Hugely interesting. A friend wrote Lisp for the device. I thought Adapteva closed down. And now they have been bought, by a company in stealth mode, and the boards are once again available. HUGELY tempting.

    I think "stack processor" is a red herring.
    The Core 1 team reports that their stack processor is 1/10th the size of a RISC-V core. Meaning that I can fit a lot more of them on a device. And a Forth instruction executes in a single clock cycle. So why is it a red herring?

    Let me tell you, eForth is not suitable.
    I would love to know why. I am still learning. The Core 1 guys said the EP16 did not have an I/O op code. They started out porting the EP16 to Altera, then ditched the design. I still have not figured out why. I wonder if Dr. Ting was an electrical engineer who knew what he was doing or not. It is not at all intuitive how device synthesis works.

  • From Marcel Hendrix@21:1/5 to none albert on Sun Mar 12 06:27:10 2023
    On Sunday, March 12, 2023 at 1:45:27 PM UTC+1, none albert wrote:
    [..]
    You misunderstood. eForth is not suitable -> the experts saying don't do it ->
    that is the project you should embark in.

    (Unless I misunderstood.)

    Also remember the way ants build a bridge, and how to become
    a millionaire.

    -marcel

  • From dxforth@21:1/5 to Marcel Hendrix on Mon Mar 13 20:42:05 2023
    On 13/03/2023 12:27 am, Marcel Hendrix wrote:
    On Sunday, March 12, 2023 at 1:45:27 PM UTC+1, none albert wrote:
    [..]
    You misunderstood. eForth is not suitable -> the experts saying don't do it ->
    that is the project you should embark in.

    (Unless I misunderstood.)

    Also remember the way ants build a bridge, and how to become
    a millionaire.

    An expert is one who has learnt how to get others to take all the risk.

  • From Jurgen Pitaske@21:1/5 to dxforth on Mon Mar 13 03:33:48 2023
    On Monday, 13 March 2023 at 09:42:07 UTC, dxforth wrote:
    On 13/03/2023 12:27 am, Marcel Hendrix wrote:
    On Sunday, March 12, 2023 at 1:45:27 PM UTC+1, none albert wrote:
    [..]
    You misunderstood. eForth is not suitable -> the experts saying don't do it ->
    that is the project you should embark in.

    (Unless I misunderstood.)

    Also remember the way ants build a bridge, and how to become
    a millionaire.

    An expert is one who has learnt how to get others to take all the risk.

    This might be your approach in your business.

    I am in our business more used to the fact,
    that the expert understands the issues,
    puts them all on the table for the ones involved,
    and helps with expert advice to move the project forward.

  • From Anton Ertl@21:1/5 to Jurgen Pitaske on Mon Mar 13 10:46:00 2023
    Jurgen Pitaske <jpitaske@gmail.com> writes:
    On Monday, 13 March 2023 at 09:42:07 UTC, dxforth wrote:
    An expert is one who has learnt how to get others to take all the risk.

    This might be your approach in your business.

    I am in our business more used to the fact,
    that the expert understands the issues,
    puts them all on the table for the ones involved,
    and helps with expert advice to move the project forward.

    The sentence from dxforth appeared to be a typical dxforthism,
    although this appeared particularly nonsensical even among
    dxforthisms. Your explanation made clear to me what I had missed,
    resulting in a dxforthism with the usual portion of nonsense. What I
    had missed was that dxforth probably had a setting in mind where a
    decision maker seeks the advice of one or more experts to help him decide.

    - anton
    --
    M. Anton Ertl http://www.complang.tuwien.ac.at/anton/home.html
    comp.lang.forth FAQs: http://www.complang.tuwien.ac.at/forth/faq/toc.html
    New standard: https://forth-standard.org/
    EuroForth 2022: https://euro.theforth.net

  • From dxforth@21:1/5 to Anton Ertl on Mon Mar 13 23:02:46 2023
    On 13/03/2023 9:46 pm, Anton Ertl wrote:
    Jurgen Pitaske <jpitaske@gmail.com> writes:
    On Monday, 13 March 2023 at 09:42:07 UTC, dxforth wrote:
    An expert is one who has learnt how to get others to take all the risk.

    This might be your approach in your business.

    I am in our business more used to the fact,
    that the expert understands the issues,
    puts them all on the table for the ones involved,
    and helps with expert advice to move the project forward.

    The sentence from dxforth appeared to be a typical dxforthism,
    although this appeared particularly nonsensical even among

    We are sensitive today. As you made no comment upon them, can
    we assume you took seriously the comments to which I responded?

    On 13/03/2023 12:27 am, Marcel Hendrix wrote:
    On Sunday, March 12, 2023 at 1:45:27 PM UTC+1, none albert wrote:
    [..]
    You misunderstood. eForth is not suitable -> the experts saying don't do it ->
    that is the project you should embark in.

    (Unless I misunderstood.)

    Also remember the way ants build a bridge, and how to become
    a millionaire.

  • From Matthias Koch@21:1/5 to All on Mon Mar 13 14:36:56 2023
    Hi Christopher,

    before you embark on the huge challenge, maybe try small single core designs to dive in and get a feeling for FPGAs!

    You can try Mecrisp-Ice (custom stack processor) or Mecrisp-Quintus (RISC-V processor), and you should really have a look at the tutorial by Bruno Levy that shows you the path from a simple blinky to a RISC-V processor design.

    https://github.com/brunolevy/learn-fpga
    https://mecrisp.sourceforge.net/

    Matthias

  • From Christopher Lozinski@21:1/5 to Matthias Koch on Thu Mar 16 13:40:24 2023
    On Monday, March 13, 2023 at 2:36:59 PM UTC+1, Matthias Koch wrote:


    You can try Mecrisp-Ice (custom stack processor)
    That is a great idea.
    I am getting educated about these issues. The EP16/24/32 seem to have two issues.
    The stack uses a very large shift register, instead of some memory.
    It uses latches for registers, advised against in my class.
    Both issues are red flags for me.

    There must be a reason the J1 CPU is so popular. I need to read more about it.
    I guess I am in the process of exploring these different stack machines.

    The work on the CORE-1 is also very interesting. I particularly like how they do not use a cross compiler, they just put the Forth code in a memory region to start off with.

    Thank you
    Chris

  • From Lorem Ipsum@21:1/5 to Christopher Lozinski on Thu Mar 16 15:55:24 2023
    On Thursday, March 16, 2023 at 4:40:26 PM UTC-4, Christopher Lozinski wrote:
    On Monday, March 13, 2023 at 2:36:59 PM UTC+1, Matthias Koch wrote:


    You can try Mecrisp-Ice (custom stack processor)
    That is a great idea.
    I am getting educated about these issues. The EP16/24/32 seem to have two issues.
    The stack uses a very large shift register, instead of some memory.
    It uses latches for registers, advised against in my class.
    Both issues are red flags for me.

    I don't find any evidence of either of these being true. I can't find anything that indicates the stacks are shift registers. Looking at ep16.vhd, here is the code for the entire CPU register set.

    sync: process(clk,clr)
    begin
      if clr='1' then -- master reset
        inten <='0';
        slot <= 0;
        sp <= "00000";
        sp1 <= "00001";
        rp <= "00000";
        rp1 <= "00001";
        t <= (others => '0');
        r <= (others => '0');
        a <= (others => '0');
        p <= (others => '0');
        i <= (others => '0');
        for ii in s_stack'range loop
          s_stack(ii) <= (others => '0');
          r_stack(ii) <= (others => '0');
        end loop;
      elsif (clk'event and clk='1') then
        if reset='1' or slot=3 then
          slot <= 0;
        else slot <= slot+1;
        end if;
        if intload='1' then
          inten <= intset;
        end if;
        if iload='1' then
          i <= data_i(width-1 downto 0);
        end if;
        if pload='1' then
          p <= p_in;
        end if;
        if tload='1' then
          t <= t_in;
        end if;
        if rload='1' then
          r <= r_in;
        end if;
        if aload='1' then
          a <= a_in;
        end if;
        if spush='1' then
          s_stack(conv_integer(sp1)) <= t;
          sp <= sp+1;
          sp1 <= sp1+1;
        elsif spopp='1' then
          sp <= sp-1;
          sp1 <= sp1-1;
        end if;
        if rpush='1' then
          r_stack(conv_integer(rp1)) <= r;
          rp <= rp+1;
          rp1 <= rp1+1;
        elsif rpopp='1' then
          rp <= rp-1;
          rp1 <= rp1-1;
        end if;
      end if;
    end process sync;

    Google Groups will strip the multiple spaces, so it will be a bit hard to read. I'll present the relevant bits as I talk about them.

    sync: process(clk,clr)
    begin
      if clr='1' then -- master reset
        ...
      elsif (clk'event and clk='1') then
        ...

    This is the part that indicates the registers are registers, and not latches. I expect you read the document ep16inVHDL.pdf and saw all the uses of the verb latch. He is speaking generally, without distinguishing between registers or latches. He is
    not saying he used latches over clocked registers. Latches are hard to find in FPGAs. They have to be constructed from combinational logic. Virtually no one uses them.

    Here are the bits that show the stack is a memory, not a shift register. The memories are addressed by sp1 and rp1. I didn't dig into the code to tell why the code has sp1 and sp. Maybe it is for reading the stack. The code assigns the s_stack contents to 's', which is a terrible variable name as it is so hard to search on, so I didn't bother. There are also no comments to explain any of the variables used.

        if spush='1' then
          s_stack(conv_integer(sp1)) <= t;
          sp <= sp+1;
          sp1 <= sp1+1;
        ...
        end if;
        if rpush='1' then
          r_stack(conv_integer(rp1)) <= r;
          rp <= rp+1;
          rp1 <= rp1+1;
        ...
        end if;

    If it were a shift register, it would operate without an address, only having access to the top items on the stack through ports.
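
    The difference between the two stack styles can be sketched in Python. This is a behavioural model only, assuming a 32-entry stack to match the 5-bit sp in the code above. The class names are made up, and the model ignores the T register and the separate sp/sp1 pointers.

```python
DEPTH = 32  # assumption: 32 entries, matching a 5-bit stack pointer


class PointerStack:
    """Stack kept in an addressed memory; only the pointer moves."""

    def __init__(self):
        self.mem = [0] * DEPTH
        self.sp = 0

    def push(self, value):
        self.sp = (self.sp + 1) % DEPTH  # pointer wraps like a 5-bit counter
        self.mem[self.sp] = value        # exactly one cell is written

    def pop(self):
        value = self.mem[self.sp]
        self.sp = (self.sp - 1) % DEPTH
        return value


class ShiftStack:
    """Stack kept in a shift register; every cell moves on each operation."""

    def __init__(self):
        self.cells = [0] * DEPTH  # cells[0] is the top of the stack

    def push(self, value):
        self.cells = [value] + self.cells[:-1]  # all cells shift down by one

    def pop(self):
        value = self.cells[0]
        self.cells = self.cells[1:] + [0]       # all cells shift up by one
        return value


for stack in (PointerStack(), ShiftStack()):
    for v in (1, 2, 3):
        stack.push(v)
    print(type(stack).__name__, [stack.pop() for _ in range(3)])
```

    Both behave the same from the program's point of view as long as the stack does not overflow. The difference is in the hardware cost: the pointer version needs an address and writes one cell, the shift version moves every cell on every push and pop.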


    There must be a reason the J1 CPU is so popular. I need to read more about it.
    I guess I am in the process of exploring these different stack machines.

    It is small, both in lines of code and in size in the FPGA. It's also pretty well documented.


    The work on the CORE-1 is also very interesting. I particularly like how they do not use a cross compiler, they just put the Forth code in a memory region to start off with.

    That is standard operating procedure for many MCUs and Forths. Cross compilers are messy and a PITA.

    --

    Rick C.


  • From Christopher Lozinski@21:1/5 to Lorem Ipsum on Thu Mar 16 22:09:49 2023
    On Thursday, March 16, 2023 at 11:55:26 PM UTC+1, Lorem Ipsum wrote:
    The EP16/24/32 seem to have two issues.
    The stack uses a very large shift register, instead of some memory.
    It uses latches for registers, advised against in my class.

    I don't find any evidence of either of these being true. I can't find anything that indicates the stacks are shift registers. Looking at ep16.vhd, here is the code for the entire CPU register set.

    Thank you for correcting me. I got it not from the code, but from the documentation.

    "In this design, the CPU latches all data into appropriate registers and stacks on the rising edge of a single phase master clock. Such a synchronous design ensures that all instructions are executed quickly and reliably in a single clock cycle. When the master clock is held steady, the microprocessor retains all data in registers, stacks and memory, consuming very little power. It is thus possible to further reduce its power consumption by reducing the clock rate, or stopping the clock completely."

    I can't find the line where it said that the stacks are implemented as shift registers.

    "Read the code" is good advice.

    You can try Mecrisp-Ice (custom stack processor)
    I did take a look at the Mecrisp-Ice. It runs on the J1a cpu.
    In the past I did look at the J1 CPU. I looked again. It now has a 32-bit version, Python simulators, a Verilator to C++ compiler, and even a VHDL version. But it has two problems, one specific, one abstract. If I recall correctly, the specific problem is that it has no interrupts. The abstract problem is that it was designed to squeeze into a tiny space for a commercial app.

    I am less interested in building an end user application, and more interested in building a wonderful development environment. A grid of Forth CPUs talking to each other, maybe even interrupting each other. So I think interrupts are critical.

    In other news, I have been learning Verilog, and am most unhappy with it.

    Here is a large list of Verilog gotchas: https://lcdm-eng.com/papers/snug06_Verilog%20Gotchas%20Part1.pdf

    From the "Zen of Python"
    Explicit is better than implicit.

    I am not a full time EE, so an explicit language makes much more sense to me. I think I much prefer VHDL. Of course the grass is always greener on the other side of the fence.

    Thank you everyone for all of the great advice. I feel like you care about these issues. Out of caution, I do not even talk about it at school.

  • From Lorem Ipsum@21:1/5 to Christopher Lozinski on Thu Mar 16 22:47:47 2023
    On Friday, March 17, 2023 at 1:09:51 AM UTC-4, Christopher Lozinski wrote:
    On Thursday, March 16, 2023 at 11:55:26 PM UTC+1, Lorem Ipsum wrote:
    The EP16/24/32 seem to have two issues.
    The stack uses a very large shift register, instead of some memory.
    It uses latches for registers, advised against in my class.
    I don't find any evidence of either of these being true. I can't find anything that indicates the stacks are shift registers. Looking at ep16.vhd, here is the code for the entire CPU register set.
    Thank you for correcting me. I got it not from the code, but from the documentation.

    "In this design, the CPU latches all data into appropriate registers and stacks on the rising edge of a single phase master clock. Such a synchronous design ensures that all instructions are executed quickly and reliably in a single clock cycle. When the master clock is held steady, the microprocessor retains all data in registers, stacks and memory, consuming very little power. It is thus possible to further reduce its power consumption by reducing the clock rate, or stopping the clock completely."

    I can't find the line where it said that the stacks are implemented as shift registers.

    "The T register connects parameter stack and return stack as a giant shift register.
    Data can be shifted towards the return stack by a PUSH instruction, and shifted towards the parameter stack by a POP instruction."

    I suspect this is what you read. He is simply describing how the >R and R> instructions (which he calls PUSH and POP) move data back and forth. When you move a data value, it causes the rest of the data stack to move data in the same direction as well.
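
    That picture can be sketched in a few lines of Python. It is only a behavioural model under simplifying assumptions: both stacks are plain lists, T is the last element of the data list, and the names TwoStacks, to_r, r_from and chain are invented for the illustration.

```python
class TwoStacks:
    """Parameter stack and return stack viewed as one chain joined at T."""

    def __init__(self, data=()):
        self.data = list(data)  # parameter stack, last element is T (top)
        self.ret = []           # return stack, last element is R (top)

    def to_r(self):
        """>R (called PUSH in the EP16 document): move T onto the return stack."""
        self.ret.append(self.data.pop())

    def r_from(self):
        """R> (called POP in the EP16 document): move the top of the return stack back to T."""
        self.data.append(self.ret.pop())

    def chain(self):
        """All items laid end to end: parameter stack, then return stack top first."""
        return self.data + self.ret[::-1]


m = TwoStacks([1, 2, 3])
m.to_r()
print(m.data, m.ret, m.chain())  # [1, 2] [3] [1, 2, 3]
m.r_from()
print(m.data, m.ret, m.chain())  # [1, 2, 3] [] [1, 2, 3]
```

    The chain never changes; >R and R> only move the boundary between the two stacks. That is the sense in which the pair looks like one long shift register, whatever the storage underneath is.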

    But even if he was using a shift register, why would that matter?


    "Read the code" is good advice.
    You can try Mecrisp-Ice (custom stack processor)

    I think you munged the attributions. This was from Matthias, not me.


    I did take a look at the Mecrisp-Ice. It runs on the J1a cpu.
    In the past I did look at the J1 CPU. I looked again. It now has a 32-bit version, Python simulators, a Verilator to C++ compiler, and even a VHDL version. But it has two problems, one specific, one abstract. If I recall correctly, the specific problem is that it has no interrupts. The abstract problem is that it was designed to squeeze into a tiny space for a commercial app.

    A CPU in an FPGA often does not need interrupts. My CPU design has an interrupt. I tried to optimize my design for ease of calculating speeds, so every instruction is one clock cycle, including the interrupt. It is just a forced call instruction that
    also pushes the processor status word onto the data stack. It allows very fast servicing of interrupts for hard real time apps.
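
    A rough Python model of "an interrupt is a forced call that also pushes the status word". The field names, the return-address convention and the return_from_interrupt helper are assumptions for the sketch, not details of the actual design.

```python
class TinyStackCPU:
    """Toy model; names and the return-address convention are assumptions."""

    def __init__(self):
        self.pc = 0      # program counter
        self.psw = 0     # processor status word
        self.data = []   # data stack, top at the end
        self.ret = []    # return stack, top at the end

    def call(self, target):
        # Ordinary call: save the address of the next instruction, then jump.
        self.ret.append(self.pc + 1)
        self.pc = target

    def interrupt(self, vector):
        # Forced call: the interrupted instruction has not run yet, so its
        # own address is saved. The status word goes on the data stack.
        self.ret.append(self.pc)
        self.data.append(self.psw)
        self.pc = vector

    def return_from_interrupt(self):
        # Hypothetical inverse: restore the status word, then return.
        self.psw = self.data.pop()
        self.pc = self.ret.pop()


cpu = TinyStackCPU()
cpu.pc, cpu.psw = 10, 0b101
cpu.interrupt(vector=2)
```

    Both call and interrupt are a single state update here, which is the property that keeps timing easy to calculate.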


    I am less interested in building an end user application, and more interested in building a wonderful development environment. A grid of Forth CPUs talking to each other, maybe even interrupting each other. So I think interrupts are critical.

    Chuck Moore would not agree with you. The GA144 has no interrupts. It has processors that are dedicated to tasks, so they stop and wait for data or even just synchronization.


    In other news, I have been learning Verilog, and am most unhappy with it.

    Here is a large list of Verilog gotchas. https://lcdm-eng.com/papers/snug06_Verilog%20Gotchas%20Part1.pdf

    From the "Zen of Python"
    Explicit is better than implicit.

    Not sure what that means, other than I guess you are not happy with the fact that Verilog has many assumptions that you can override... if you know they are there. I've never learned Verilog, because I've never found a book that explains all the gotchas.



    I am not a full time EE, so an explicit language makes much more sense to me.
    I think I much prefer VHDL. Of course the grass is always greener on the other side of the fence.

    VHDL is *very* wordy. It has relaxed a bit in the last decade or two, with many new features that make life easier. I noticed the EP16 code uses (clk'event and clk='1') rather than (rising_edge(clk)). Very old school. It is discouraged, because it can trigger on odd events, such as a transition from 'H' (weak high) to '1'. But it works normally. There are still gotchas. VHDL uses delta delays (think infinitesimal time intervals) to order events that happen while simulation time has not advanced. Without that, you could have several FFs toggle on the same clock edge, with some being updated and feeding into others that have yet to be evaluated. If you run a clock through a buffer, it adds a delta delay, so the downstream FFs see data that was already updated on the previous delta cycle. Oops! So watch out for any assignments (buffers) in clock paths.
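
    To make the difference between the two edge tests concrete, here is a small Python model of both. It assumes the std_logic_1164 definition of rising_edge as I understand it (an event, with the To_X01-normalized value going from '0' to '1'); the function names are made up for the sketch.

```python
# Strength-stripping table in the spirit of To_X01: weak values map to
# their strong equivalents, anything else ('U', 'X', 'Z', 'W', '-') to 'X'.
TO_X01 = {'0': '0', 'L': '0', '1': '1', 'H': '1'}


def to_x01(value):
    return TO_X01.get(value, 'X')


def old_style_edge(last, now):
    """Model of (clk'event and clk = '1'): any change that ends at '1'."""
    return last != now and now == '1'


def rising_edge(last, now):
    """Model of rising_edge(clk): a change from a '0'-like to a '1'-like value."""
    return last != now and to_x01(last) == '0' and to_x01(now) == '1'


# (previous value, new value) pairs for a std_logic clock
for last, now in [('0', '1'), ('H', '1'), ('X', '1'), ('L', 'H')]:
    print(last, '->', now, old_style_edge(last, now), rising_edge(last, now))
```

    The two tests agree on '0' to '1'. They disagree on 'H' to '1' and 'X' to '1', where only the old-style test fires, and on 'L' to 'H', where only rising_edge fires.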


    Thank you everyone for all of the great advice. I feel like you care about these issues. Out of caution, I do not even talk about it at school.

    LOL I was in a presentation from a vendor for a new ARM MCU (back when not everyone was selling ARM MCUs). They asked about our applications and I mentioned Forth. The presenter actually laughed and said he didn't think anyone still used it! I just
    find it more usable than dealing with the tools of C or other HLLs.

    I use VHDL, but I guess I'm used to that. Writing code for hardware design is not like writing software... exactly. I think in terms of what I want built, rather than only what it does. VHDL has code that is executed sequentially, but mostly it's
    concurrent... in parallel. That's the part people have trouble getting used to when they switch from software.

    There are many who play/work in Forth because they like it. I guess that's why you are using it for your project. You just like it.

    --

    Rick C.


    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Jan Coombs@21:1/5 to Christopher Lozinski on Fri Mar 17 18:18:19 2023
    On Thu, 16 Mar 2023 22:09:49 -0700 (PDT)
    Christopher Lozinski <calozinski@gmail.com> wrote:

    In other news, I have been learning Verilog, and am most unhappy with it.

    Here is a large list of Verilog gotchas. https://lcdm-eng.com/papers/snug06_Verilog%20Gotchas%20Part1.pdf

    I have a few stack processors translated into MyHDL[1] if you'd like
    another starting point. This Python library can simulate as fast[2]
    as other free simulators, export wave traces to aid debug, and export
    Verilog and VHDL for hardware synthesis.

    One of the reasons for its existence is to avoid the gotchas and/or
    verbosity of other development environments[3][4].

    Jan Coombs
    --

    [1] MyHDL - From Python to Silicon!
    https://myhdl.org/

    [2] Performance
    https://myhdl.org/docs/performance.html

    [3] Why MyHDL?
    https://myhdl.org/start/why.html

    [4] What MyHDL is not
    https://myhdl.org/start/whatitisnot.html

  • From Matthias Koch@21:1/5 to All on Sun Mar 19 15:42:53 2023
    Keep in mind that while Mecrisp-Ice (which comes in 16, 32 and 64 bits) is a direct descendant of Swapforth/J1 by James Bowman, I continued development, and it certainly has interrupts. There are also different variants of the processor, and you can use
    either shift registers or BRAMs for the stacks. The shift register stacks help when one is short on RAM blocks, and also allow higher maximum frequency in some cases.

  • From Lorem Ipsum@21:1/5 to Matthias Koch on Sun Mar 19 11:04:57 2023
    On Sunday, March 19, 2023 at 10:42:56 AM UTC-4, Matthias Koch wrote:
    Keep in mind that while Mecrisp-Ice (which comes in 16, 32 and 64 bits) is a direct descendant of Swapforth/J1 by James Bowman, I continued development, and it certainly has interrupts. There are also different variants of the processor, and you can
    use either shift registers or BRAMs for the stacks. The shift register stacks help when one is short on RAM blocks, and also allow higher maximum frequency in some cases.

    When you say "shift registers", what feature in the FPGA are you referring to? Some other brands of FPGAs can use the LUT as a (memory based) shift register, by adding a counter to point to the input and output. At least I think that's what they do.
    It's been a long time since I've looked at Xilinx devices. Others have this as well, but I'm not sure who. Maybe the Lattice (non-iCE) parts. But the iCE40 devices do *not* use the LUTs as memory or shift registers. I see their "General Purpose FPGAs"
    have "distributed" RAM, which is using the LUTs. But I see no mention of shift registers.

    So, are you using the fabric FFs for the stack memory, if not the BRAMs?

    In devices that can use the LUTs as RAM, it's not a bad fit to the stacks. A 4LUT gives you a 16 deep stack and a 6LUT gives you a 64 deep stack. I think the 6LUT can be used as 32 x 2 bits, but don't quote me on that.
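
    The depth figures follow from a k-input LUT holding 2^k configuration bits, one bit per address. A short Python check, assuming one LUT per bit of stack width; the function names are made up, and no vendor-specific packing modes are modeled.

```python
def lut_ram_depth(lut_inputs):
    """A k-input LUT stores 2**k bits, so it gives a 1-bit-wide RAM that deep."""
    return 2 ** lut_inputs


def luts_for_stack(width_bits, depth, lut_inputs):
    """LUTs needed for one stack: one column of LUTs per data bit."""
    per_lut = lut_ram_depth(lut_inputs)
    columns = -(-depth // per_lut)  # ceiling division
    return width_bits * columns


print(lut_ram_depth(4), lut_ram_depth(6))
print(luts_for_stack(width_bits=16, depth=16, lut_inputs=4))
```

    Under those assumptions a 16-bit wide, 16-deep stack costs 16 four-input LUTs, before counting the address counter.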

    Using the features of a given line of parts can save on resource utilization, but ties the design to that architecture. For some, that's not a problem.

    --

    Rick C.


  • From Matthias Koch@21:1/5 to All on Tue Mar 21 06:06:15 2023
    When you say "shift registers", what feature in the FPGA are you referring to?

    The normal flipflops available in fabric.

    But the iCE40 devices do *not* use the LUTs as memory or shift registers. I see their "General Purpose FPGAs" have "distributed" RAM, which is using the LUTs. But I see no mention of shift registers.

    True.

    So, are you using the fabric FFs for the stack memory, if not the BRAMs?

    Fabric FFs.

    Using the features of a given line of parts can save on resource utilization, but ties the design to that architecture. For some, that's not a problem.

    Fabric FFs as shift register stacks are portable and run on Lattice iCE40, ECP5 and on Altera/Intel MAX10, the families I have used so far. The LUTs are not used, as Lattice iCE40 does not support the "lutram" design.

    You correctly point out that using fabric FFs as memory is a waste of resources; but the smallest targets of Mecrisp-Ice come with very limited BRAM. HX1K offers 8 kB of dual-port BRAM, UP5K offers 15 kB BRAM (plus 128 kB in total over four fixed-configuration single-port memories), and HX8K offers 16 kB BRAM.

    In these configurations I decided to give Forth the full amount of dualport BRAM memory available, and use 16 elements * 16 bits * 2 stacks = 512 LUT+FF for stack elements in the smallest HX1K, and 32*16*2=1024 LUT+FF for UP5K and HX8K.
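
    The arithmetic above, as a tiny Python helper. It assumes one LUT+FF pair per stored bit, as stated; the function name is made up for the sketch.

```python
def stack_cell_cost(depth, width_bits, stacks=2):
    """LUT+FF pairs used when every stack bit lives in its own fabric flip-flop."""
    return depth * width_bits * stacks


hx1k = stack_cell_cost(depth=16, width_bits=16)   # smallest target
up5k = stack_cell_cost(depth=32, width_bits=16)   # same figure for HX8K
print(hx1k, up5k)
```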

    The ports to larger FPGAs came later in time, and for these, you decide if you like shift registers, or want to use BRAMs as stacks.

  • From Lorem Ipsum@21:1/5 to Matthias Koch on Mon Mar 20 22:52:00 2023
    On Tuesday, March 21, 2023 at 1:06:19 AM UTC-4, Matthias Koch wrote:
    When you say "shift registers", what feature in the FPGA are you referring to?
    The normal flipflops available in fabric.
    But the iCE40 devices do *not* use the LUTs as memory or shift registers. I see their "General Purpose FPGAs" have "distributed" RAM, which is using the LUTs. But I see no mention of shift registers.
    True.
    So, are you using the fabric FFs for the stack memory, if not the BRAMs?
    Fabric FFs.
    Using the features of a given line of parts can save on resource utilization, but ties the design to that architecture. For some, that's not a problem.
    Fabric FFs as shift register stacks are portable and run on Lattice iCE40, ECP5 and on Altera/Intel MAX10, the families I have used so far. The LUTs are not used, as Lattice iCE40 does not support the "lutram" design.

    You correctly point out that using fabric FFs as memory is a waste of resources; but the smallest targets of Mecrisp-Ice come with very limited BRAM. HX1K offers 8 kB of dual-port BRAM, UP5K offers 15 kB BRAM (plus 128 kB in total over four fixed-configuration single-port memories), and HX8K offers 16 kB BRAM.

    In these configurations I decided to give Forth the full amount of dualport BRAM memory available, and use 16 elements * 16 bits * 2 stacks = 512 LUT+FF for stack elements in the smallest HX1K, and 32*16*2=1024 LUT+FF for UP5K and HX8K.

    The ports to larger FPGAs came later in time, and for these, you decide if you like shift registers, or want to use BRAMs as stacks.

    Still, when you say shift registers, you literally turn the FFs into shift registers? Yeah, I guess that's what a stack is when it comes down to it. It shifts in two directions, so each stack word needs a 2:1 mux on the input, and Bob's your uncle.
    Each FF has a LUT to make up the mux, so it fits well.

    I don't recall ever actually using an iCE40 part in a design that came to fruition. I looked at them a lot, but mostly for their very low power. In the end, the design drew more power than other ways of solving it, mostly MCUs. They can be very low power when not clocked. The iCE65 line from SiliconBlue was low double digit uA idle current, but the iCE40 line ended up being more like 100 uA. I don't know if SiliconBlue was being overly aggressive, or if Lattice decided it was not an important feature, and easier to produce specified at a higher number.

    I believe someone told me there are more recent members of the iCE40 lines that are in the lower double digit uA again.

    Have you looked at the Gowin parts? They are real and shipping in many varieties. Word is they are essentially a spin-off from Lattice, but there's only so much similarity. The docs do suffer a bit from the language barrier. I was going to use one of their parts in a design I'm doing now, but my client sells a lot to the US Government, and Gowin had been put on a US military list of companies considered too close to the Chinese military. So I'm planning to use an Efinix part.

    Once I get the design done, I may spend some time rolling a stack processor for this FPGA. But I might also be so burnt out that I swear off electronics forever!

    --

    Rick C.


  • From Matthias Koch@21:1/5 to All on Tue Mar 21 07:04:40 2023
    Still, when you say shift registers, you literally turn the FFs into shift registers? Yeah, I guess that's what a stack is when it comes down to it. It shifts in two directions, so each stack word needs a 2:1 mux on the input, and Bob's your uncle.
    Each FF has a LUT to make up the mux, so it fits well.

    Exactly, yes. On some targets I use a 3:1 MUX for the data stack instead, to drop two elements at once, which is useful, for example, in the store opcode.
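
    A behavioral sketch of such a shift register stack in Python, with the third mux input for dropping two elements. It is a model of the idea only, not the Mecrisp-Ice Verilog; the class name, the operation names and the zero fill at the bottom are assumptions.

```python
class ShiftStack:
    """Every cell is a register whose input mux selects one of its neighbours."""

    def __init__(self, depth=16):
        self.cells = [0] * depth  # cells[0] is the top of stack

    def step(self, op, value=0):
        c = self.cells
        if op == "push":     # mux input 1: load from the cell above, top takes value
            self.cells = [value] + c[:-1]
        elif op == "pop":    # mux input 2: load from the cell below
            self.cells = c[1:] + [0]
        elif op == "drop2":  # mux input 3: load from two cells below
            self.cells = c[2:] + [0, 0]
        elif op != "hold":   # hold models the clock enable being off
            raise ValueError(op)


s = ShiftStack(depth=4)
for v in (1, 2, 3, 4):
    s.step("push", v)
full = list(s.cells)
s.step("drop2")              # e.g. a store consuming address and data together
after_store = list(s.cells)
```

    With only push and pop, each cell needs a 2:1 mux; drop2 is what widens it to 3:1. The depth is fixed, so pushing onto a full stack silently discards the bottom cell.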

    Have you looked at the Gowin parts?

    Might be interesting in the future; I am following the progress of Project Apicula: https://github.com/YosysHQ/apicula

    But I might also be so burnt out that I swear off electronics forever!

    I am sorry to read that, and I wish you the best!
