Forum: >>> Magnum BBS <<<

Experimental aggregate_by/4 was dismissed (Was: India & France had thei

From Mild Shock@21:1/5 to Mild Shock on Sun Feb 16 12:25:50 2025

I was exploring group_by/4 and a new
aggregate_by/4 for some machine learning statistics.
But the slowdown is not that severe if the extra
parameter _H is ground. The cost is then a smaller
factor. So I had a change of mind in favor
of the more declarative aggregate/3. Too many new predicates
aren't healthy, so I dismissed the idea of supporting
distinct/2, group_by/4 and a new aggregate_by/4.

But somehow I fell in love with the idea of a new
firstof/2 predicate instead of distinct/2. It could
be bootstrapped as follows:
```
firstof(X, Q) :-
   bagof(X, Q, L),
   L = [X|_].
```
Except that it can be implemented more eagerly, like
distinct/2, which fits better with some other predicates
from library(sequence):
```
p(1,a).
p(1,b).
p(2,c).
p(2,d).
p(2,e).

?- firstof(Y,p(X,Y)).
Y = a, X = 1;
Y = c, X = 2;
fail.
```
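
A minimal sketch of what "more eagerly" could look like, assuming SWI-Prolog and its distinct/2 from library(solution_sequences). The names firstof_eager/2, vars_subtract/3 and var_memberchk/2 are made up for this illustration, and this is not the Dogelog Player implementation. Unlike the bagof/3 bootstrap, it ignores ^ annotations and delivers groups in solution order rather than sorted order:

```prolog
:- use_module(library(solution_sequences)).   % distinct/2 (SWI-Prolog)

:- meta_predicate firstof_eager(?, 0).

p(1,a).
p(1,b).
p(2,c).
p(2,d).
p(2,e).

% firstof_eager(?Template, :Goal): hypothetical name.
% Groups by the variables of Goal that do not occur in Template
% and keeps only the first solution of each group, without first
% collecting all solutions the way bagof/3 does.
firstof_eager(Template, Goal) :-
    term_variables(Goal, GVars),
    term_variables(Template, TVars),
    vars_subtract(GVars, TVars, Witness),
    distinct(Witness, Goal).

% vars_subtract(+Vs, +Ws, -Rs): Rs are the variables of Vs that are
% not identical (==) to any variable in Ws.
vars_subtract([], _, []).
vars_subtract([V|Vs], Ws, Rs) :-
    (   var_memberchk(V, Ws)
    ->  Rs = Rs0
    ;   Rs = [V|Rs0]
    ),
    vars_subtract(Vs, Ws, Rs0).

% var_memberchk(+V, +Ws): V is identical to some element of Ws.
var_memberchk(V, [W|Ws]) :-
    (   V == W
    ->  true
    ;   var_memberchk(V, Ws)
    ).

% ?- findall(X-Y, firstof_eager(Y, p(X,Y)), L).
% L = [1-a, 2-c].
```

The second and later solutions of a group are still generated by the goal but are rejected as soon as they appear, so the first answer is available before the goal has been fully enumerated.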

Mild Shock wrote:

Just noticed that group_by/4 calculates variables
and then delegates to bagof/3. But the latter predicate
also calculates variables, so I suspect quite an overhead:

/* SWI-Prolog 9.3.19 */
group_by(By, Template, Goal, Bag) :-
    ordered_term_variables(Goal, GVars),
    ordered_term_variables(By+Template, UVars),
    ord_subtract(GVars, UVars, ExVars),
    bagof(Template, ExVars^Goal, Bag).

I went with another solution. First I provided a variant
of aggregate/3 by the name aggregate_by/4 where one can
offload the internal term_variables/2 calculation.
Then I used this bootstrapping:

/* Dogelog Player 1.3.0 */
group_by(Witness, Template, Goal, List) :-
   aggregate_by(Witness, bag(Template), Goal, List).
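
To illustrate why an explicit witness removes the need to scan the goal for variables, here is a rough sketch in SWI-Prolog. It is not how aggregate_by/4 is implemented in Dogelog Player (that source is not shown here); the name group_by_witness/4 is hypothetical, and it collects all solutions first, so it is not lazy:

```prolog
:- use_module(library(pairs)).   % group_pairs_by_key/2 (SWI-Prolog)

:- meta_predicate group_by_witness(?, ?, 0, -).

% group_by_witness(?Witness, ?Template, :Goal, -List): hypothetical name.
% The caller states the grouping key, so no term_variables/2 pass
% over Goal is needed, however large Goal is.
group_by_witness(Witness, Template, Goal, List) :-
    findall(Witness-Template, Goal, Pairs),
    keysort(Pairs, Sorted),               % stable, keeps Template order
    group_pairs_by_key(Sorted, Groups),
    member(Witness-List, Groups).

% ?- group_by_witness(X, Y, (between(1,2,Y), between(1,2,X)), L).
% X = 1, L = [1, 2] ;
% X = 2, L = [1, 2].
```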

Here is some testing:

/* SWI-Prolog 9.3.19 */
?- length(_H,4000), time((between(1,2000,_),
        group_by(X,Y,(nonvar(_H),between(1,10,Y),between(1,10,X)),L),
        fail; true)).
% 1,153,998 inferences, 0.562 CPU in 0.568 seconds (99% CPU, 2051552 Lips)
true.
?- length(_H,8000), time((between(1,2000,_),
        group_by(X,Y,(nonvar(_H),between(1,10,Y),between(1,10,X)),L),
        fail; true)).
% 1,153,998 inferences, 1.047 CPU in 1.060 seconds (99% CPU, 1102326 Lips)
true.

/* Dogelog Player 1.3.0 */
?- length(_H,4000), time((between(1,2000,_),
        group_by(X,Y,(nonvar(_H),between(1,10,Y),between(1,10,X)),L),
        fail; true)).
% Zeit 399 ms, GC 0 ms, Lips 16987636, Uhr 10.02.2025 10:49
true.
?- length(_H,8000), time((between(1,2000,_),
        group_by(X,Y,(nonvar(_H),between(1,10,Y),between(1,10,X)),L),
        fail; true)).
% Zeit 400 ms, GC 1 ms, Lips 16945167, Uhr 10.02.2025 10:50
true.

The old version suffers from a term_variables/2
dependency, whereas the new version is completely immune
to the size of the given goal, since any internal
term_variables/2 call has been offloaded.
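
The dependency can be probed in isolation. The following is only a sketch, assuming SWI-Prolog, where time/1 comes from library(statistics); tv_cost/1 is a hypothetical helper name. It times 2000 calls of term_variables/2 on a term that holds N fresh variables, mirroring the _H list in the tests above. Actual timings depend on machine and version, so none are claimed here:

```prolog
:- use_module(library(statistics)).   % time/1 (SWI-Prolog)

% tv_cost(+N): hypothetical helper.
% Runs term_variables/2 2000 times over a term containing a list
% of N unbound variables and reports the elapsed time.
tv_cost(N) :-
    length(H, N),
    time(( between(1, 2000, _),
           term_variables(f(H), _),
           fail
         ; true
         )).

% ?- tv_cost(4000).
% ?- tv_cost(8000).
```

If the time roughly doubles between the two queries while the inference count stays the same, that would match the pattern in the SWI-Prolog numbers above.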

I couldn’t name aggregate_by/4 aggregate/4, since
the latter already exists in SWI-Prolog and SICStus Prolog
with different semantics. It is not the analog of
distinct/2, where one can specify witnesses.

Mild Shock wrote:

Hi,

India & France had their AI Bikini Moment.
Fascinating behavior:

Macron Says He And PM Modi Will Push
https://www.youtube.com/watch?v=LwCK8yAnlkA

But don't be fooled, things are possibly
more connected:

Synthesia: France's 109-billion-euro AI investment
https://www.youtube.com/watch?v=_uyo4RG0Q6I

Bye

Mild Shock wrote:

Hi,

Suddenly I developed an allergy to naming a predicate
distinct/2. It is not so obvious that distinct/1 and
distinct/2 are related. There is no constant C such that:

distinct(X) :- distinct(C, X).

Just joking, but for some consistency with the introduction
of group_by/4 and aggregate_by/4 I went for the
name first_by/2. The name is more intuitive:

?- [user].
p(1,a).
p(1,b).
p(2,c).
p(2,d).
p(2,e).
^Z
true.

Now some queries:

?- p(X,Y), write(X-Y), nl, fail; true.
1-a
1-b
2-c
2-d
2-e
true.

?- first_by(X, p(X,Y)), write(X-Y), nl, fail; true.
1-a
2-c
true.
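
For readers on SWI-Prolog, the same answers can be reproduced with a thin wrapper around distinct/2 from library(solution_sequences). This is a sketch under that assumption and says nothing about how first_by/2 is realized in Dogelog Player:

```prolog
:- use_module(library(solution_sequences)).   % distinct/2 (SWI-Prolog)

:- meta_predicate first_by(?, 0).

p(1,a).
p(1,b).
p(2,c).
p(2,d).
p(2,e).

% first_by(?Witness, :Goal): succeeds for a solution of Goal only if
% no earlier solution bound Witness to the same value.
first_by(Witness, Goal) :-
    distinct(Witness, Goal).

% ?- findall(X-Y, first_by(X, p(X,Y)), L).
% L = [1-a, 2-c].
```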

Cool! The name is also used here with the same semantics:

https://deephaven.io/core/docs/reference/table-operations/group-and-aggregate/firstBy/

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From olcott@21:1/5 to Mild Shock on Sun Feb 16 12:54:44 2025

test

--
Copyright 2025 Olcott

"Talent hits a target no one else can hit;
Genius hits a target no one else can see."
Arthur Schopenhauer

