• Experimental aggregate_by/4 was dismissed (Was: India & France had thei

    From Mild Shock@21:1/5 to Mild Shock on Sun Feb 16 12:25:50 2025
    I was exploring group_by/4 and a new aggregate_by/4
    for some machine learning statistics. But the slowdown
    is not that severe if the extra parameter _H is ground;
    the cost is then only a smaller factor.

    So I changed my mind in favor of the more declarative
    aggregate/3. Too many new predicates aren't healthy,
    so I dismissed the idea of supporting distinct/2,
    group_by/4 and a new aggregate_by/4.

    But somehow I fell in love with the idea of a new
    firstof/2 predicate instead of distinct/2. It could
    be bootstrapped as follows:
    ```
    firstof(X, Q) :-
       bagof(X, Q, L),
       L = [X|_].
    ```
    Except that it can be implemented more eagerly, like
    distinct/2, so that it fits in with some other
    predicates from library(sequence):
    ```
    p(1,a).
    p(1,b).
    p(2,c).
    p(2,d).
    p(2,e).

    ?- firstof(Y,p(X,Y)).
    Y = a, X = 1;
    Y = c, X = 2;
    fail.
    ```
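
    A minimal sketch of what "more eagerly" could look like, assuming
    SWI-Prolog and its distinct/2 from library(solution_sequences).
    The names firstof_eager/2 and vars_subtract/3 are hypothetical and
    only serve as illustration. The free variables of the goal that do
    not occur in the template are used as the witness, mirroring how
    bagof/3 groups its solutions:

    ```prolog
    :- use_module(library(solution_sequences)).  % distinct/2 (SWI-Prolog)

    p(1,a).
    p(1,b).
    p(2,c).
    p(2,d).
    p(2,e).

    % firstof_eager(+Template, :Goal): hypothetical eager variant.
    % The witness is the list of goal variables not in the template.
    firstof_eager(X, Q) :-
        term_variables(Q, QVars),
        term_variables(X, XVars),
        vars_subtract(QVars, XVars, Witness),
        distinct(Witness, Q).

    % vars_subtract(+Vars, +Remove, -Rest): drop the variables of
    % Remove from Vars, comparing by identity (==), not unification.
    vars_subtract([], _, []).
    vars_subtract([V|Vs], Ws, Rs) :-
        (   member(W, Ws), W == V
        ->  Rs = Rs1
        ;   Rs = [V|Rs1]
        ),
        vars_subtract(Vs, Ws, Rs1).
    ```

    Unlike the bagof/3 bootstrap, this variant does not collect and
    sort all solutions first; it reports the first solution of each
    group in the order in which the goal produces them.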

    Mild Shock wrote:
    Just noticed that group_by/4 calculates variables
    and then delegates to bagof/3. But the latter predicate
    also calculates variables, so I suspect quite an overhead:

    /* SWI-Prolog 9.3.19 */
    group_by(By, Template, Goal, Bag) :-
        ordered_term_variables(Goal, GVars),
        ordered_term_variables(By+Template, UVars),
        ord_subtract(GVars, UVars, ExVars),
        bagof(Template, ExVars^Goal, Bag).

    I went with another solution. First I provided a variant
    of aggregate/3 by the name aggregate_by/4, where one can
    offload the internal term_variables/2 calculation.
    Then I used this bootstrapping:

    /* Dogelog Player 1.3.0 */
    group_by(Witness, Template, Goal, List) :-
       aggregate_by(Witness, bag(Template), Goal, List).
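
    The idea of grouping by an explicitly given witness, without ever
    scanning the goal for variables, can be sketched in plain Prolog
    as follows. This is not the Dogelog Player implementation of
    aggregate_by/4; the names group_by_witness/4, grouped/2 and
    same_key/4 are hypothetical, and the sketch assumes the witness
    is ground after each solution of the goal:

    ```prolog
    :- use_module(library(lists)).  % member/2

    % group_by_witness(?Witness, ?Template, :Goal, -List):
    % collect Witness-Template pairs, sort them by witness and
    % enumerate one group per distinct witness on backtracking.
    group_by_witness(Witness, Template, Goal, List) :-
        findall(Witness-Template, Goal, Pairs),
        keysort(Pairs, Sorted),            % stable sort on the key
        grouped(Sorted, Groups),
        member(Witness-List, Groups).

    % grouped(+SortedPairs, -Groups): merge runs of identical keys.
    grouped([], []).
    grouped([K-V|Rest], [K-[V|Vs]|Groups]) :-
        same_key(K, Rest, Vs, Rest1),
        grouped(Rest1, Groups).

    % same_key(+Key, +Pairs, -Values, -Rest): take the leading pairs
    % whose key is identical to Key.
    same_key(K, [K1-V|Rest], [V|Vs], Rest1) :-
        K1 == K, !,
        same_key(K, Rest, Vs, Rest1).
    same_key(_, Rest, [], Rest).
    ```

    Since only Witness and Template are copied by findall/3, the size
    of the remaining goal term plays no role in the grouping step.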

    Here is some testing:

    /* SWI-Prolog 9.3.19 */
    ?- length(_H,4000), time((between(1,2000,_),
            group_by(X,Y,(nonvar(_H),between(1,10,Y),between(1,10,X)),L),
            fail; true)).
    % 1,153,998 inferences, 0.562 CPU in 0.568 seconds (99% CPU, 2051552 Lips)
    true.
    ?- length(_H,8000), time((between(1,2000,_),
            group_by(X,Y,(nonvar(_H),between(1,10,Y),between(1,10,X)),L),
            fail; true)).
    % 1,153,998 inferences, 1.047 CPU in 1.060 seconds (99% CPU, 1102326 Lips)
    true.

    /* Dogelog Player 1.3.0 */
    ?- length(_H,4000), time((between(1,2000,_),
            group_by(X,Y,(nonvar(_H),between(1,10,Y),between(1,10,X)),L),
            fail; true)).
    % Zeit 399 ms, GC 0 ms, Lips 16987636, Uhr 10.02.2025 10:49
    true.
    ?- length(_H,8000), time((between(1,2000,_),
            group_by(X,Y,(nonvar(_H),between(1,10,Y),between(1,10,X)),L),
            fail; true)).
    % Zeit 400 ms, GC 1 ms, Lips 16945167, Uhr 10.02.2025 10:50
    true.

    The old version suffers from a dependency on
    term_variables/2, whereas the new version is totally
    immune to the size of the given goal, since any internal
    term_variables/2 call has been offloaded.
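
    To see why the size of _H matters, here is a small sketch that
    counts the variables term_variables/2 has to collect from the
    benchmark goal. The predicate name goal_var_count/2 is
    hypothetical and only serves as illustration:

    ```prolog
    % goal_var_count(+N, -Count): Count is the number of variables
    % term_variables/2 collects from the benchmark goal when the
    % list standing in for _H has N unbound cells.
    goal_var_count(N, Count) :-
        length(H, N),
        Goal = (nonvar(H), between(1,10,_Y), between(1,10,_X)),
        term_variables(Goal, Vars),
        length(Vars, Count).
    ```

    The count grows linearly with N (N list cells plus the two
    variables of the between/3 calls), so every group_by/4 call that
    scans the goal pays for the whole list again.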

    I couldn’t name aggregate_by/4 aggregate/4, since the
    latter already exists in SWI-Prolog and SICStus Prolog
    with different semantics. It is not the analog of
    distinct/2, where one can specify Witnesses.

    Mild Shock wrote:
    Hi,

    India & France had their AI Bikini Moment.
    Fascinating behavior:

    Macron Says He And PM Modi Will Push
    https://www.youtube.com/watch?v=LwCK8yAnlkA

    But don't be fooled, things are possibly
    more connected:

    Synthesia: France's 109-billion-euro AI investment
    https://www.youtube.com/watch?v=_uyo4RG0Q6I

    Bye


    Mild Shock wrote:
    Hi,

    Suddenly I developed an allergy to naming a predicate
    distinct/2. It is not so obvious that distinct/1 and
    distinct/2 are related. There is no constant C such that:

    distinct(X) :- distinct(C, X).

    Just joking, but for some consistency with the introduction
    of group_by/4 and aggregate_by/4 I went for the
    name first_by/2. The name is more intuitive:

    ?- [user].
    p(1,a).
    p(1,b).
    p(2,c).
    p(2,d).
    p(2,e).
    ^Z
    true.

    Now some queries:

    ?- p(X,Y), write(X-Y), nl, fail; true.
    1-a
    1-b
    2-c
    2-d
    2-e
    true.

    ?- first_by(X, p(X,Y)), write(X-Y), nl, fail; true.
    1-a
    2-c
    true.
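
    How such a predicate could filter solutions eagerly can be
    sketched as follows. This is only an illustration, not the
    actual first_by/2: the name first_by_sketch/2 is hypothetical,
    nb_setarg/3 is assumed to be available as in SWI-Prolog, and the
    witness is assumed to be ground after each solution:

    ```prolog
    :- use_module(library(lists)).  % memberchk/2

    p(1,a).
    p(1,b).
    p(2,c).
    p(2,d).
    p(2,e).

    % first_by_sketch(?Witness, :Goal): succeed for a solution of
    % Goal only if its witness value has not been seen before.
    first_by_sketch(Witness, Goal) :-
        Seen = seen([]),                 % state that survives backtracking
        call(Goal),
        arg(1, Seen, Ws),
        \+ memberchk(Witness, Ws),       % skip witnesses already seen
        nb_setarg(1, Seen, [Witness|Ws]).
    ```

    The list of seen witnesses makes this quadratic in the number of
    groups; a hash table or a tree would be the obvious refinement.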

    Cool! The name is also used here with the same semantics:

    https://deephaven.io/core/docs/reference/table-operations/group-and-aggregate/firstBy/





    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From olcott@21:1/5 to Mild Shock on Sun Feb 16 12:54:44 2025
    On 2/16/2025 5:25 AM, Mild Shock wrote:

    test

    --
    Copyright 2025 Olcott

    "Talent hits a target no one else can hit;
    Genius hits a target no one else can see."
    Arthur Schopenhauer
