Judging from the gentoo-dev@ mailing list discussion [1] about EGO_SUM,
where some voices were in agreement that EGO_SUM has its raison d'être,
while there were no arguments in favor of eventually removing EGO_SUM,
I hereby propose to undeprecate EGO_SUM.
1: https://archives.gentoo.org/gentoo-dev/message/1a64a8e7694c3ee11cd48a58a95f2faa
On Mon, 13 Jun 2022, Michał Górny wrote:
On Mon, 2022-06-13 at 09:44 +0200, Florian Schmaus wrote:
Judging from the gentoo-dev@ mailing list discussion [1] about EGO_SUM,
where some voices were in agreement that EGO_SUM has its raison d'être,
while there were no arguments in favor of eventually removing EGO_SUM,
I hereby propose to undeprecate EGO_SUM.
1: https://archives.gentoo.org/gentoo-dev/message/1a64a8e7694c3ee11cd48a58a95f2faa
"We've been rehashing the discussion until all opposition got tired
and stopped replying, then we claim everyone agrees".
On Mon, 13 Jun 2022, Michał Górny wrote:
On Mon, 2022-06-13 at 09:44 +0200, Florian Schmaus wrote:
Judging from the gentoo-dev@ mailing list discussion [1] about EGO_SUM,
where some voices were in agreement that EGO_SUM has its raison d'être,
while there were no arguments in favor of eventually removing EGO_SUM,
I hereby propose to undeprecate EGO_SUM.
1: https://archives.gentoo.org/gentoo-dev/message/1a64a8e7694c3ee11cd48a58a95f2faa
Can this be done without requesting changes to package managers?
On 13/06/2022 10.29, Michał Górny wrote:
On Mon, 2022-06-13 at 09:44 +0200, Florian Schmaus wrote:
Judging from the gentoo-dev@ mailing list discussion [1] about EGO_SUM,
where some voices were in agreement that EGO_SUM has its raison d'être,
while there were no arguments in favor of eventually removing EGO_SUM,
I hereby propose to undeprecate EGO_SUM.
1: https://archives.gentoo.org/gentoo-dev/message/1a64a8e7694c3ee11cd48a58a95f2faa
"We've been rehashing the discussion until all opposition got tired
and stopped replying, then we claim everyone agrees".
I take this comment to mean that there was already a discussion about
deprecating and removing EGO_SUM. I usually try to follow what's going
on in Gentoo and I remember the discussion about introducing dependency
tarballs. But I apparently missed the part where EGO_SUM was slated
for removal. And it appears I am not the only one, at least Ionen also
wrote "Missed bits and pieces but was never quite sure why this went
toward full deprecation, just discouraged may have been fair enough, …".
In any case, I am sorry for bringing this discussion up again. But since
I started rehashing this, no arguments why EGO_SUM should be removed
have been provided. And so far, I failed to find the old discussions
where I'd hope to find some rationale behind the deprecation of EGO_SUM. :/
On Mon, 13 Jun 2022, Florian Schmaus wrote:
Judging from the gentoo-dev@ mailing list discussion [1] about EGO_SUM,
where some voices were in agreement that EGO_SUM has its raison d'être,
while there were no arguments in favor of eventually removing EGO_SUM,
I hereby propose to undeprecate EGO_SUM.
1: https://archives.gentoo.org/gentoo-dev/message/1a64a8e7694c3ee11cd48a58a95f2faa
Can this be done without requesting changes to package managers?
What is 'this' here?
The patchset does not make changes to any package manager, just the
go-module eclass.
Note that this is not about finding an alternative to dependency
tarballs. It is just about re-allowing EGO_SUM in addition to
dependency tarballs for packaging Go software in Gentoo.
On Mon, 2022-06-13 at 10:29 +0200, Michał Górny wrote:
On Mon, 2022-06-13 at 09:44 +0200, Florian Schmaus wrote:
Judging from the gentoo-dev@ mailing list discussion [1] about EGO_SUM,
where some voices were in agreement that EGO_SUM has its raison d'être,
while there were no arguments in favor of eventually removing EGO_SUM,
I hereby propose to undeprecate EGO_SUM.
1: https://archives.gentoo.org/gentoo-dev/message/1a64a8e7694c3ee11cd48a58a95f2faa
"We've been rehashing the discussion until all opposition got tired
and stopped replying, then we claim everyone agrees".
First of all, I am sorry for my tone.
I have been thinking about it and I was wrong to oppose this change.
I have been conflating two problems: EGO_SUM and Manifest sizes.
However, while EGO_SUM might be an important factor contributing to
the latter, I think we shouldn't single it out and instead focus
on addressing the actual problem.
That said, I believe it's within the maintainer's right to decide what API
to deprecate and what API to support. So I'd suggest getting William's
approval for this rather than changing the supported API of that eclass
via drive-by commits.
On 14.06.22 18:33, Holger Hoffstätte wrote:
So my idea here is: instead of chucking EGO_SUM (automatically
generated declarative dependency management) out the window, can we not
separate the two and instead of uploading the tarball upload the
dependency set instead?
I think that this idea has been pitched already (see for example
Robin's post [1]), although in a broader non-Go-specific sense, and it is
one obvious way to move forward.
One obstacle, and probably the largest, is that this can not be
implemented in an eclass alone. Due to the sandboxing during the build
process, fetching distfiles, which is what we are talking about, is the
package manager's job and hence, I believe, this would require adjustments
to the package manager and package manager specification (PMS).
The basic idea, at least to my understanding (or how I would propose
it), is to have a new top-level ebuild variable
SRC_URI_FILE="https://example.org/manifests/restic-0.13.1.files"
where restic-0.13.1.files contains lines like
<SRC_URI> <SIZE> <HASH> [<TARGET_FILENAME>]
which is, as you nicely demonstrated on the restic ebuild, where the
bytes contributing to the ebuild size bloat originate from.
Those bytes are now outsourced from ::gentoo and can be fetched on demand,
allowing the package manager to download the individual distfiles into
DISTDIR, where, e.g., the go eclass can process them further within
the constraints of the security sandbox.
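A minimal sketch of how such a file might be read, assuming whitespace-separated fields in the order given above, one entry per line, and that a missing TARGET_FILENAME defaults to the last path component of the URI. SRC_URI_FILE is only a proposal in this thread; the function name parse_src_uri_file and the handling of blank and comment lines are hypothetical.

```shell
# parse_src_uri_file: read "<SRC_URI> <SIZE> <HASH> [<TARGET_FILENAME>]"
# lines from stdin and print "<target> <size> <hash> <uri>" per entry.
# Hypothetical helper; not part of Portage or any eclass.
parse_src_uri_file() {
    while read -r uri size hash target; do
        # Skip blank lines and '#' comments (comment syntax is an assumption).
        case "$uri" in
            ''|'#'*) continue ;;
        esac
        # Without an explicit target, fall back to the URI's basename.
        [ -n "$target" ] || target=${uri##*/}
        printf '%s %s %s %s\n' "$target" "$size" "$hash" "$uri"
    done
}
```

Checking SIZE and HASH after the download would remain the package manager's job; the sketch only shows that the proposed line format is simple to consume.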
On Mon, 13 Jun 2022, Florian Schmaus wrote:
Judging from the gentoo-dev@ mailing list discussion [1] about EGO_SUM,
where some voices were in agreement that EGO_SUM has its raison d'être,
while there were no arguments in favor of eventually removing EGO_SUM,
I hereby propose to undeprecate EGO_SUM.
1: https://archives.gentoo.org/gentoo-dev/message/1a64a8e7694c3ee11cd48a58a95f2faa
Can this be done without requesting changes to package managers?
What is 'this' here?
Undeprecating EGO_SUM.
The patchset does not make changes to any package manager, just the go-module eclass.
Note that this is not about finding an alternative to dependency
tarballs. It is just about re-allowing EGO_SUM in addition to
dependency tarballs for packaging Go software in Gentoo.
On Tue, 2022-06-14 at 19:03 +0200, Florian Schmaus wrote:
On 14.06.22 18:33, Holger Hoffstätte wrote:
So my idea here is: instead of chucking EGO_SUM (automatically
generated declarative dependency management) out the window, can we not
separate the two and instead of uploading the tarball upload the
dependency set instead?
I think that this idea has been pitched already (see for example
Robin's post [1]), although in a broader non-Go-specific sense, and it is
one obvious way to move forward.
One obstacle, and probably the largest, is that this can not be
implemented in an eclass alone. Due to the sandboxing during the build
process, fetching distfiles, which is what we are talking about, is the
package manager's job and hence, I believe, this would require adjustments
to the package manager and package manager specification (PMS).
The basic idea, at least to my understanding (or how I would propose
it), is to have a new top-level ebuild variable
SRC_URI_FILE="https://example.org/manifests/restic-0.13.1.files"
where restic-0.13.1.files contains lines like
<SRC_URI> <SIZE> <HASH> [<TARGET_FILENAME>]
which is, as you nicely demonstrated on the restic ebuild, where the
bytes contributing to the ebuild size bloat originate from.
Those bytes are now outsourced from ::gentoo and can be fetched on demand,
allowing the package manager to download the individual distfiles into
DISTDIR, where, e.g., the go eclass can process them further within
the constraints of the security sandbox.
Anything that involves breaking the Portage plan-depgraph / fetch&build separately would require major architectural changes, so can be rejected immediately as "not going to be implemented in our lifetimes".
Hi,
I've been working on adding a go based ebuild to Gentoo yesterday and I
got this warning from portage saying that EGO_SUM is deprecated and
should be avoided. Since I remember there was an intense discussion
about this on the ML I went back and have re-read the threads before
writing this piece. I'd like to provide my perspective as a user, a
proxied maintainer, and overlay owner. I also run a private mirror on my
LAN to serve my hosts in order to reduce load on external mirrors.
Before diving in I think it's worth reading mgorny's blog post "The
modern packager’s security nightmare"[1] as it's relevant to the discussion, and something I deeply agree with.
With all that being said, I feel that the tarball idea is a bad one for
many reasons.
From a security point of view, I understand that we still have to trust
maintainers not to do funky stuff, but I think this issue goes beyond
that.
First of all, one of the advantages of Gentoo is that it gets its source
code from upstream (yes, I'm aware of mirrors acting as a cache layer),
which means that poisoning source code needs to be done at the upstream
level (which effectively means hacking GitHub, PyPI, or some standalone
project's Gitea/cgit/gitlab/etc. instance or similar), sources which
either have more scrutiny or have a limited blast radius.
Additionally if an upstream dependency has a security issue it's easier
to scan all EGO_SUM content and find packages that potentially depend on
a broken dependency and force a re-pinning and rebuild. The tarball
magic hides this completely and makes searching very expensive.
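To illustrate the scanning argument, here is a rough sketch. It assumes that EGO_SUM entries appear in ebuilds as quoted "module version" strings; the function name ego_sum_uses and the module path in the usage line are made up for the example.

```shell
# ego_sum_uses MODULE FILE...: print the names of the files that pin
# MODULE in an EGO_SUM-style entry. Hypothetical helper, not an existing tool.
ego_sum_uses() {
    module=$1
    shift
    # -l lists matching files, -F treats the module path as a fixed string.
    # The leading quote and trailing space anchor the match to a whole
    # module path, so "foo/bar" does not also match "foo/barbaz".
    grep -lF -- "\"$module " "$@"
}
```

A call such as `ego_sum_uses github.com/example/broken /var/db/repos/gentoo/*/*/*.ebuild` would list the affected ebuilds; grep exits non-zero when nothing matches. No comparable one-liner works against the contents of opaque vendor tarballs.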
In fact using these vendor tarballs is the equivalent of "static
linking" in the packaging space. Why are we introducing the same issue
in the repository space? This kills the reusability of already
downloaded dependencies and bloats storage requirements. This is
especially bad on laptops, where SSD free space might be limited, in
case the user does not nuke their distfiles after each upgrade.
Considering that BTRFS (and possibly other filesystems) support on the
fly compression the physical cost of a few inflated ebuilds and
Manifests is actually way smaller than the logical size would indicate. Compare that to the huge incompressible tarballs that now we need to
store.
As a proxied maintainer or overlay owner, hosting these huge tarballs
also becomes a problem (i.e. we need some public space with potentially
gigabytes of free space and enough bandwidth to push that to users).
Pushing toward vendor tarballs creates an extra expense on every level (Gentoo infra, mirrors, proxy maintainers, overlay owners, users).
If bloating portage is a big issue and we frown upon go stuff anyway (or only a few users need these packages), why not consider moving all go packages into an officially supported go packages only overlay? I
understand that this would not solve the kernel buffer issue where we
run out of environment variable space, but it would debloat the main
portage tree.
It also breaks reproducibility. With EGO_SUM I can check out an older
version of the portage tree (well, to some extent) and rebuild packages,
since the dependency upstreams are very likely to host old versions of
their source. With the tarballs this breaks, since as soon as an ebuild is
dropped from mainline portage the vendor tarballs follow. There is no way
for the user to roll back a package a few weeks back (e.g. if new
version has bugs), unlike with EGO_SUM.
In fact I feel this goes against the spirit of portage too, since
instead of "just describing" how to obtain sources and build them, it
now depends on essentially ephemeral blobs, which happen to be
externalized from the portage tree itself. I'm aware that we have
ebuilds that pull in patches and other stuff from dev space already, but
we shouldn't make this even worse.
Finally with EGO_SUM we had a nice tool get-ego-vendor which produced
the EGO_SUM for maintainers which has made maintenance easier. However I haven't found any new guidance yet on how to maintain go packages with
the new tarball method (e.g. what needs to go into the vendor tarball,
what changes are needed in ebuilds). Overall this further complicates
ebuild development and verification of PRs.
In summary, IMHO the EGO_SUM way of handling go packages has more
benefits than drawbacks compared to the vendor tarballs.
Cheers,
Zoltan
[1] https://blogs.gentoo.org/mgorny/2021/02/19/the-modern-packagers-security-nightmare/
Rephrasing this just to ensure I'm understanding it correctly: you're suggesting to move _everything_ that uses Go into its own overlay. Let's
call it gentoo-go for the sake of the example.
If the above is accurate, then I hard disagree.
The biggest package that I have that uses Go is docker (and accompanying tools). Personal distaste of docker aside, it's a very popular piece of software, and I don't think it's fair to require all the people who want
to use it to first enable and sync gentoo-go before they can install it.
And what about transitive dependencies? Suppose app-misc/cool-package is written in some language that isn't Go, but it has a dependency on sys-apps/cool-util which has a dependency on something written in Go.
Should a user wanting to install cool-package have to enable the
gentoo-go overlay now too? Even though app-misc/cool-package would look
like it doesn't need the overlay unless you dig into the deps.
Not a dev, just a user who really likes Gentoo :)
- Oskari
On Mon, Jun 27, 2022 at 01:43:19AM +0200, Zoltan Puskas wrote:
In summary, IMHO the EGO_SUM way of handling of go packages has more
benefits than drawbacks compared to the vendor tarballs.
EGO_SUM can cause portage to break; that is the primary reason support
is going away.
We attempted another solution that was refused, so the only option we
have currently is to build the dependency tarballs.
That reads as if you wrote it under the assumption that we can only
either use dependency tarballs or use EGO_SUM. At the same time, I have
not seen an argument why we can not simply do *both*.
EGO_SUM has numerous advantages over dependency tarballs, but can not be
used if the size of the EGO_SUM value crosses a threshold. So why not
mandate dependency tarballs if a point is crossed and otherwise allow EGO_SUM? That way, we could have the best of both worlds.
- Flow
On 16.7.2022 14.24, Florian Schmaus wrote:
That reads as if you wrote it under the assumption that we can only
either use dependency tarballs or use EGO_SUM. At the same time, I have
not seen an argument why we can not simply do *both*.
EGO_SUM has numerous advantages over dependency tarballs, but can not be used if the size of the EGO_SUM value crosses a threshold. So why not mandate dependency tarballs if a point is crossed and otherwise allow EGO_SUM? That way, we could have the best of both worlds.
- Flow
++ this sounds most sensible. This is also how I've understood your
proposal.
I want to give another option. Both ways are allowed by eclass, but by
QA policy (or some other decision), it is prohibited to use EGO_SUM in
main ::gentoo tree.
As a result, overlays and ::guru can use the EGO_SUM or dist distfile (remember, they don't have access to hosting on dev.g.o).
On Sat, Jul 16, 2022 at 02:58:04PM +0300, Joonas Niilola wrote:
On 16.7.2022 14.24, Florian Schmaus wrote:
++ this sounds most sensible. This is also how I've understood your
proposal.
Remember that with EGO_SUM all of the bloated manifests and ebuilds are
on every user's system.
I added mgorny as a cc to this message because he made it pretty clear
at some point in the previous discussion that the size of these ebuilds
and manifests is unacceptable.
William
On Sat, Jul 16, 2022 at 09:31:35PM +0300, Arthur Zamarin wrote:
I want to give another option. Both ways are allowed by eclass, but by
QA policy (or some other decision), it is prohibited to use EGO_SUM in
main ::gentoo tree.
As a result, overlays and ::guru can use the EGO_SUM or dist distfile
(remember, they don't have access to hosting on dev.g.o).
Yes; this is the option I was trying to propose as an intermediate step
until we have indirect Manifests that provide the best of both worlds
(not bloating the tree, and not requiring creation of dep tarballs).
On Sat, 16 Jul 2022, William Hubbs wrote:
I could force this in the eclass with the following flow if I know how
to tell if the ebuild inheriting it is in the main tree or not:
# in_main_tree is a placeholder for a test to see if the ebuild running
# this is in the main tree; it is assumed here to be a shell function.
# It must sit outside [[ ]]: inside, a bare word is always true.
if [[ -n ${EGO_SUM} ]] && in_main_tree; then
	eqawarn "EGO_SUM is not allowed in the main tree"
	eqawarn "This will become a fatal error in the future"
fi
The only question is, is there a way to reliably tell whether or not
we are in the main tree?
On Sat, 16 Jul 2022, William Hubbs wrote:
The only question is, is there a way to reliably tell whether or not
we are in the main tree?
An eclass has no legitimate way to find out in which repository it is.
The rationale is that users should be able to copy ebuilds and eclasses
to their local overlays, and they should work there in the same way.
There is an internal (and undocumented) Portage variable, but that
shouldn't be used.
I would like to continue discussing whether we should entirely deprecate EGO_SUM without the desire to offend anyone.
We now have a pending GitHub PR that bumps restic to 0.14 [1]. Restic is
a very popular backup software written in Go. The PR drops EGO_SUM in
favor of a vendor tarball created by the proxied maintainer. However, I
am unaware of any tool that lets you practically audit the 35 MiB source
contained in the tarball. And even if such a tool exists, this would
mean another manual step is required, which is, potentially, skipped
most of the time, weakening our users' security. This is because I
believe neither our tooling, e.g., go-mod.eclass, nor any Golang
tooling, authenticates the contents of the vendor tarball against
upstream's go.sum. But please correct me if I am wrong.
I wonder if we can reach consensus around un-deprecating EGO_SUM, but
discouraging its usage in certain situations. That is, provide EGO_SUM
as an option but disallow its use if
1.) *upstream* provides a vendor tarball
2.) the number of EGO_SUM entries exceeds 1000 and a Gentoo developer maintains the package
3.) the number of EGO_SUM entries exceeds 1500 and a proxied maintainer maintains the package
In case of 3, I would encourage proxy maintainers to create and provide
the vendor tarball.
The suggested EGO_SUM limits result from a histogram that I created by
analyzing ::gentoo as of 2022-01-01, i.e., a few months before EGO_SUM was
deprecated.
- Flow
1: https://github.com/gentoo/gentoo/pull/27050
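The three conditions proposed above can be written down as a small check. This is only a sketch of the proposed policy, not anything that exists in go-module.eclass or in QA tooling; the function name ego_sum_allowed and its argument conventions are made up, and "exceeds" is read as strictly greater than the limit.

```shell
# ego_sum_allowed ENTRIES MAINTAINER UPSTREAM_TARBALL
#   ENTRIES:          number of EGO_SUM entries in the ebuild
#   MAINTAINER:       "developer" or "proxied"
#   UPSTREAM_TARBALL: "yes" if upstream ships a vendor tarball
# Returns 0 if EGO_SUM may be used, 1 if not, 2 on unknown MAINTAINER.
ego_sum_allowed() {
    entries=$1
    maintainer=$2
    upstream_tarball=$3

    # Condition 1: upstream provides a vendor tarball.
    if [ "$upstream_tarball" = yes ]; then
        return 1
    fi

    # Conditions 2 and 3: the limit depends on who maintains the package.
    case $maintainer in
        developer) limit=1000 ;;
        proxied)   limit=1500 ;;
        *)         return 2 ;;
    esac

    # Allowed as long as the entry count does not exceed the limit.
    [ "$entries" -le "$limit" ]
}
```

For example, `ego_sum_allowed 1200 proxied no` succeeds, while `ego_sum_allowed 1200 developer no` fails.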
On Wed, 28 Sep 2022, Florian Schmaus wrote:
I would like to continue discussing whether we should entirely
deprecate EGO_SUM without the desire to offend anyone.
We now have a pending GitHub PR that bumps restic to 0.14 [1]. Restic
is a very popular backup software written in Go. The PR drops EGO_SUM
in favor of a vendor tarball created by the proxied maintainer.
However, I am unaware of any tool that lets you practically audit the
35 MiB source contained in the tarball. And even if such a tool
exists, this would mean another manual step is required, which is,
potentially, skipped most of the time, weakening our users' security.
This is because I believe neither our tooling, e.g., go-mod.eclass,
nor any Golang tooling, authenticates the contents of the vendor
tarball against upstream's go.sum. But please correct me if I am
wrong.
I wonder if we can reach consensus around un-deprecating EGO_SUM, but
discouraging its usage in certain situations. That is, provide EGO_SUM
as an option but disallow its use if
1.) *upstream* provides a vendor tarball
2.) the number of EGO_SUM entries exceeds 1000 and a Gentoo developer maintains the package
3.) the number of EGO_SUM entries exceeds 1500 and a proxied
maintainer maintains the package
These numbers seem quite large, compared to the mean number of 3.4
distfiles for packages in the Gentoo repository. (The median and the
99th percentile are 1 and 22, respectively.)
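For reference, the number of distfiles of a single package can be counted from its Manifest, where each distfile has one line starting with DIST. A small sketch; the function name count_distfiles is made up, and the path in the usage line is only an example location.

```shell
# count_distfiles: read a Manifest on stdin and print the number of
# DIST entries, i.e. the number of distfiles it lists.
count_distfiles() {
    # Count lines whose first field is exactly DIST; "n + 0" prints 0
    # instead of an empty string when there are none.
    awk '$1 == "DIST" { n++ } END { print n + 0 }'
}
```

Usage would be along the lines of `count_distfiles < /var/db/repos/gentoo/app-backup/restic/Manifest`.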
On 28/09/2022 23.23, John Helmert III wrote:
On Wed, Sep 28, 2022 at 05:28:00PM +0200, Florian Schmaus wrote:
I would like to continue discussing whether we should entirely deprecate
EGO_SUM without the desire to offend anyone.
We now have a pending GitHub PR that bumps restic to 0.14 [1]. Restic is
a very popular backup software written in Go. The PR drops EGO_SUM in
favor of a vendor tarball created by the proxied maintainer. However, I
am unaware of any tool that lets you practically audit the 35 MiB source
contained in the tarball. And even if such a tool exists, this would
mean another manual step is required, which is, potentially, skipped
most of the time, weakening our users' security. This is because I
believe neither our tooling, e.g., go-mod.eclass, nor any Golang
tooling, authenticates the contents of the vendor tarball against
upstream's go.sum. But please correct me if I am wrong.
I wonder if we can reach consensus around un-deprecating EGO_SUM, but
discouraging its usage in certain situations. That is, provide EGO_SUM
as an option but disallow its use if
1.) *upstream* provides a vendor tarball
2.) the number of EGO_SUM entries exceeds 1000 and a Gentoo developer
maintains the package
3.) the number of EGO_SUM entries exceeds 1500 and a proxied maintainer
maintains the package
I'm not sure I agree on these limits, given the authenticity problem
exists regardless of how many dependencies there are.
It's not really about authentication, you always have to trust
upstream to some degree (unless you audit every line of code). But I
believe that code distributed via official channels is viewed by more
eyes and significantly more secure.
EGO_SUM entries are directly fetched from the official distribution
channels of Golang. Hence, there is a higher chance that malicious
code in one of those is detected faster, simply because they are
consumed by more entities. The dependency tarball, by comparison, is
used only by Gentoo. In contrast to the official sources, "nobody" is
looking at the code inside the tarball.
For proxied packages, where the dependency tarball is published by the
proxied maintainer, the tarball also allows another entity to inject
code into the final result of the package. And compared to a few small
patches in FILESDIR, such a dependency tarball requires more effort to
review. This further weakens security in comparison to EGO_SUM.
- Flow
Hi,
On 2022/09/30 16:53, Florian Schmaus wrote:
jkroon@plastiekpoot ~ $ du -sh /var/db/repos/gentoo/
644M    /var/db/repos/gentoo/
I'm not against exploding this by another 200 or even 300 MB personally,
but I do agree that pointless bloat is bad, and ideally we want to
shrink the size requirements of the portage tree rather than enlarge.
What is the problem if it is 400 MB more? What if we double the
size? Would something break for you? Does that mean we should not add
more packages to ::gentoo? Where do you draw the line? Would you
rather have interested persons contribute to Gentoo or drive them away
due to the struggle that the EGO_SUM deprecation causes?
How long is a piece of string?
I agree with you entirely. But if the tree gets to 10GB?
At some point it may be worthwhile to split the tree similar to what
Debian does (or did, haven't checked in a while) where there is a core,
non-core repo etc ... except I suspect it may be better to split into
classes of packages, eg, x11 (aka desktop) style packages etc, and keep
::gentoo primarily to system stuff (which is also getting harder and
harder to define). And this also makes it harder for maintainers. And
this is really already what separate overlays do, except they don't (as
far as I know) have the rigorous QA that ::gentoo has.
But again - at what point do you do this - and this also adds extra
burden on maintainers and developers alike.
And of course I could set a filter to not even --sync say /x11-* at
all. For example. Or /dev-go or /dev-php etc ...
So perhaps you're right, this is a moot discussion. Perhaps we should
just say let's solve the problem when (if?) people complain the tree is
too big. No, I'm not being sarcastic, just blunt (;
The majority of Gentoo users (in my experience) are probably of the
developer oriented mindset either way, or have very specific itches that
need scratching that's hard to scratch with other distributions. Let's
face it, Gentoo to begin with should probably not be considered an
"easy" distribution. But it is a highly flexible, pro-choice, extremely customizable, rolling release distribution. Which scratches my itch.
Incidentally, the only categories currently to individually exceed 10MB
are these:
11M    media-libs
11M    net-misc
12M    dev-util
13M    dev-ruby
16M    dev-libs
30M    dev-perl
31M    dev-python
And by far the biggest consumer of space:
124M    metadata
Kind Regards,
Jaco
On 30/09/2022 02.36, William Hubbs wrote:
On Wed, Sep 28, 2022 at 06:31:39PM +0200, Ulrich Mueller wrote:
On Wed, 28 Sep 2022, Florian Schmaus wrote:
2.) the number of EGO_SUM entries exceeds 1000 and a Gentoo developer
maintains the package
3.) the number of EGO_SUM entries exceeds 1500 and a proxied
maintainer maintains the package
These numbers seem quite large, compared to the mean number of 3.4
distfiles for packages in the Gentoo repository. (The median and the
99-percentile are 1 and 22, respectively.)
The numbers may appear large when compared to the whole tree, but I
think a fair comparison would be within the related programming language ecosystem, e.g., Golang or Rust.
For example, analyzing ::gentoo yields the following histogram for 2022-01-01: https://dev.gentoo.org/~flow/ego_sum_entries_histogram-2020-01-01.png
To stay with your example, restic has a 300k manifest, multiple 30k+
ebuilds and 897 distfiles.
I'm thinking the limit would have to be much lower. Say, around 256
entries in EGO_SUM_SRC_URI.
A limit of 256 appears to be too low to be of any use. It is slightly
above the 50th percentile, so half of the packages could not use it.
We have to realize that programming language ecosystems that only build
static binaries tend to produce software projects that have a large
number of dependencies. For example, app-misc/broot, a tool written in
Rust, currently has 310 entries in its Manifest. Why should we treat
one programming language differently from another? Will we see voices that
ask for banning Rust packages in ::gentoo in the future? With the rising
popularity of Golang and Rust, we will (hopefully) only ever see an
increase of such packages in ::gentoo. And most existing packages in
this category will at best keep their dependency count constant, but are
also likely to accumulate further dependencies over time.
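To make the percentile argument concrete, here is a small sketch that computes which share of packages a given limit would exclude. The entry counts below are invented for illustration only; they are not taken from the histogram or from ::gentoo, and the function name is hypothetical.

```python
def share_over_limit(entry_counts, limit):
    """Fraction of packages whose EGO_SUM entry count exceeds the limit."""
    if not entry_counts:
        raise ValueError("need at least one package")
    over = sum(1 for count in entry_counts if count > limit)
    return over / len(entry_counts)

# Made-up entry counts for eight imaginary packages; NOT measured from ::gentoo.
hypothetical_counts = [40, 120, 200, 250, 300, 600, 900, 1400]

for limit in (256, 1000, 1500):
    print(limit, share_over_limit(hypothetical_counts, limit))
```

Feeding the real per-package counts into `share_over_limit` would show how many packages each candidate limit actually affects.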
And quite frankly, I don't see a problem with "large" Manifests and/or ebuilds. Yes, it means our FTPs are hosting many files, in some cases
even many small files. And yes, it means that in some cases ebuild
parsing takes a bit longer. But I spoke with a few developers in the
past few months and was not presented with any real world issues that EGO_SUM caused. If someone wants to fill in here, then now is a good
time to speak up. But my impression is that the arguments against
EGO_SUM are mostly of cosmetic nature. Again, please correct me if I am wrong.
I would like to continue discussing whether we should entirely deprecate
EGO_SUM without the desire to offend anyone.
We now have a pending GitHub PR that bumps restic to 0.14 [1]. Restic is
a very popular backup software written in Go. The PR drops EGO_SUM in
favor of a vendor tarball created by the proxied maintainer. However, I
am unaware of any tool that lets you practically audit the 35 MiB source
contained in the tarball. And even if such a tool exists, this would
mean another manual step is required, which is, potentially, skipped
most of the time, weakening our users' security. This is because I
believe neither our tooling, e.g., go-mod.eclass, nor any Golang
tooling, authenticates the contents of the vendor tarball against
upstream's go.sum. But please correct me if I am wrong.
I wonder if we can reach consensus around un-deprecating EGO_SUM, but
discouraging its usage in certain situations. That is, provide EGO_SUM
as an option but disallow its use if
1.) *upstream* provides a vendor tarball
2.) the number of EGO_SUM entries exceeds 1000 and a Gentoo developer
maintains the package
3.) the number of EGO_SUM entries exceeds 1500 and a proxied maintainer
maintains the package
In case of 3, I would encourage proxy maintainers to create and provide
the vendor tarball.
The suggested EGO_SUM limits result from a histogram that I created
analyzing ::gentoo at 2022-01-01, i.e., a few months before EGO_SUM
was deprecated.
- Flow
1: https://github.com/gentoo/gentoo/pull/27050
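The three proposed conditions can be written down as a small predicate. This is only a sketch of the proposal as worded above, not something any eclass or QA tool implements; the function and parameter names are hypothetical.

```python
def ego_sum_allowed(entries, upstream_vendor_tarball, proxied_maintainer):
    """Return True if EGO_SUM may be used under the proposed rules.

    entries                 -- number of EGO_SUM entries in the ebuild
    upstream_vendor_tarball -- True if *upstream* provides a vendor tarball
    proxied_maintainer      -- True if a proxied maintainer maintains the package
    """
    if upstream_vendor_tarball:
        return False  # condition 1
    # condition 3 (proxied maintainer) or condition 2 (Gentoo developer)
    limit = 1500 if proxied_maintainer else 1000
    return entries <= limit  # "exceeds" means strictly greater than the limit
```

For example, `ego_sum_allowed(1200, False, True)` is true, while the same entry count for a developer-maintained package is not.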
On Fri, Sep 30, 2022 at 7:53 AM Florian Schmaus <flow@gentoo.org> wrote:
And quite frankly, I don't see a problem with "large" Manifests and/or ebuilds. Yes, it means our FTPs are hosting many files, in some cases
even many small files. And yes, it means that in some cases ebuild
parsing takes a bit longer. But I spoke with a few developers in the
past few months and was not presented with any real world issues that EGO_SUM caused. If someone wants to fill in here, then now is a good
time to speak up. But my impression is that the arguments against
EGO_SUM are mostly of cosmetic nature. Again, please correct me if I am wrong.
I thought the problem was that EGO_SUM ends up in SRC_URI, which ends
up in A. A ends up in the environment, and then exec() fails with
E2BIG because there is an imposed limit on environment variables (and
also command line argument length.)
Did this get fixed?
https://bugs.gentoo.org/719202
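As a rough illustration of the failure mode described here: each environment string handed to exec() has to fit into a fixed-size buffer, so a very long A can make the call fail with E2BIG. The sketch below assumes the 128 KiB per-string figure quoted elsewhere in this thread and assumes A is the space-separated list of distfile names; the helper names are made up.

```python
# Per-string limit for exec() arguments and environment entries on Linux,
# using the 128 KiB figure quoted in this thread.
MAX_ARG_STRLEN = 128 * 1024

def env_entry_size(name, distfiles):
    """Bytes needed for 'NAME=value' plus the terminating NUL byte."""
    value = " ".join(distfiles)
    return len(f"{name}={value}".encode("utf-8")) + 1

def fits_in_one_env_string(distfiles, name="A"):
    """True if the variable would stay within the per-string limit."""
    return env_entry_size(name, distfiles) <= MAX_ARG_STRLEN
```

Whether a given ebuild is at risk therefore depends on both the number of distfiles and the length of their names.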
Hey,
On Friday, 30 September 2022 02:36:05 CEST William Hubbs wrote:
I don't know for certain about a vendor tarball, but I do know there
are instances where a vendor tarball wouldn't work.
app-containers/containerd is a good example of this. That is why the
vendor tarball idea was dropped.
It is indeed not possible to verify vendor tarballs[1]. The proposed
solution Go people had would also require network access.
Upstream doesn't need to provide a tarball, just an up-to-date
"vendor" directory at the top level of the project. Two examples that
do this are docker and kubernetes.
Upstreams doing this sounds like a mess, because then they'd have to
maintain multiple source trees in their repositories, if I understand
what you mean.
An alternative to vendor tarballs is modcache tarballs. These are
absolutely massive (~20 times larger IIRC), though, they are verifiable.
opinion: I see no way around it. Vendor tarballs are the way to go. For trivial cases, this can likely be EGO_SUM, but it scales exceedingly
poorly, to the point of the trivial case being a very small percentage
of Go packages. I proposed authenticated automation on Gentoo infrastructure as a solution to this, and implemented (a slow and unreliable) proof of concept (posted previously). The obvious question
of "how will proxy maintainers deal with this" is also relatively
simple: giving them authorization for a subset of packages that they'd
need to work on. This is an obvious increase in the barrier of entry for fresh proxy maintainers, but it's still likely less than needing
maintainers to rework ebuilds to use vendor tarballs on dev.g.o.
[1]: https://github.com/golang/go/issues/27348
--
Arsen Arsenović
On Sat, 01 Oct 2022, Florian Schmaus wrote:
Bug #719201 was triggered by dev-texlive/texlive-latexextra-2000. It
appears that the ebuild had more than 6000 entries in SRC_URI [1],
That includes double counting and must be divided by the number of
developers in TEXLIVE_DEVS. AFAICS that number was two in 2020. So 3000
is more realistic as a number there.
from which A is generated from. Hence even a EGO_SUM limit of 3000
entries should provide enough safety margin to avoid any Golang ebuild
running into this.
See above, with 3000 entries there may be zero safety margin. It also
depends on total filename length, because the limit is the Linux
kernel's MAX_ARG_STRLEN (which is 128 KiB).
On 01/10/2022 18.36, Ulrich Mueller wrote:
On Sat, 01 Oct 2022, Florian Schmaus wrote:
Bug #719201 was triggered by dev-texlive/texlive-latexextra-2000. It
appears that the ebuild had more than 6000 entries in SRC_URI [1],
That includes double counting and must be divided by the number of developers in TEXLIVE_DEVS. AFAICS that number was two in 2020. So 3000
is more realistic as a number there.
That may very well be the case. I'd appreciate it if you would elaborate on
the double counting. If someone knows a good and easy way to compute A
for an ebuild, then please let me know. That would help to get more meaningful data.
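One rough way to approximate this, sketched below, is to read the DIST lines of a package's Manifest (each has the form `DIST <filename> <size> <hash name> <hash> ...`). This only gives an upper bound: the Manifest lists the distfiles of all versions of a package, not those of a single ebuild, and the real A also depends on USE flags. The function names are hypothetical.

```python
def dist_entries(manifest_text):
    """Yield (filename, size_in_bytes) for every DIST line of a Manifest."""
    for line in manifest_text.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[0] == "DIST":
            yield fields[1], int(fields[2])

def a_upper_bound(manifest_text):
    """Return (entry count, length of the space-joined filename list)."""
    names = [name for name, _size in dist_entries(manifest_text)]
    return len(names), len(" ".join(names))
```

Usage would be along the lines of `a_upper_bound(open("Manifest").read())` inside a package directory.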
from which A is generated from. Hence even a EGO_SUM limit of 3000
entries should provide enough safety margin to avoid any Golang ebuild
running into this.
See above, with 3000 entries there may be zero safety margin. It also depends on total filename length, because the limit is the Linux
kernel's MAX_ARG_STRLEN (which is 128 KiB).
Of course, this is a rough estimation assuming that the filename length
is roughly the same on average. That said, my proposed limit for EGO_SUM
is 1500, which is still half of 3000 and should still provide enough
safety margin.
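The dependence on filename length can be made explicit with a little arithmetic. The sketch below assumes a 128 KiB limit for the single string `A=<names separated by spaces>` including its terminating NUL byte; the function name and the example lengths are only illustrative.

```python
def max_entries(avg_name_len, limit=128 * 1024, var_name="A"):
    """Largest n such that n names of avg_name_len bytes fit in one env string.

    Size budget: n * avg_name_len   (the names)
               + (n - 1)            (separating spaces)
               + len(var_name) + 1  ("A=" prefix)
               + 1                  (terminating NUL)
               <= limit
    """
    overhead = len(var_name) + 1 + 1 - 1  # prefix and NUL, minus the one unused space
    return (limit - overhead) // (avg_name_len + 1)

for length in (40, 80):
    print(length, max_entries(length))
```

Under these assumptions, 3000 entries fit only if the names average 42 bytes or less, and 1500 entries only if they average 86 bytes or less.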
On Mon, Jun 13, 2022 at 12:26:43PM +0200, Ulrich Mueller wrote:
On Mon, 13 Jun 2022, Florian Schmaus wrote:
Judging from the gentoo-dev@ mailing list discussion [1] about EGO_SUM,
where some voices where in agreement that EGO_SUM has its raison d'être,
while there where no arguments in favor of eventually removing EGO_SUM,
I hereby propose to undeprecate EGO_SUM.
1: https://archives.gentoo.org/gentoo-dev/message/1a64a8e7694c3ee11cd48a58a95f2faa
Can this be done without requesting changes to package managers?
What is 'this' here?
Undeprecating EGO_SUM.
The patchset does not make changes to any package manager, just the
go-module eclass.
Note that this is not about finding an alternative to dependency
tarballs. It is just about re-allowing EGO_SUM in addition to
dependency tarballs for packaging Go software in Gentoo.
Like I said on my earlier reply, there have been packages that break
using EGO_SUM.
The most pressing concern about EGO_SUM is that it can make portage
crash because of the size of SRC_URI, so it definitely should not be preferred over dependency tarballs.