Hello,
Given the recent spread of the "AI" bubble, I think we really need to
look into formally addressing the related concerns. In my opinion,
at this point the only reasonable course of action would be to safely
ban "AI"-backed contribution entirely. In other words, explicitly
forbid people from using ChatGPT, Bard, GitHub Copilot, and so on, to
create ebuilds, code, documentation, messages, bug reports and so on for
use in Gentoo.
Just to be clear, I'm talking about our "original" content. We can't do
much about upstream projects using it.
Rationale:
1. Copyright concerns. At this point, the copyright situation around
generated content is still unclear. What's pretty clear is that pretty
much all LLMs are trained on huge corpora of copyrighted material, and
all the fancy "AI" companies don't give a shit about copyright
violations. In particular, there's a good risk that these tools would
yield stuff we can't legally use.
2. Quality concerns. LLMs are really great at generating
plausible-looking bullshit. I suppose they can provide good assistance
if you are careful enough, but we can't really rely on all our
contributors being aware of the risks.
3. Ethical concerns. As pointed out above, the "AI" corporations don't
give a shit about copyright, and don't give a shit about people. The AI
bubble is causing huge energy waste. It is giving a great excuse for
layoffs and increasing exploitation of IT workers. It is driving
enshittification of the Internet, and it is empowering all kinds of
spam and scams.
Gentoo has always stood out as something different, something that
worked for people for whom mainstream distros were lacking. I think
adding "made by real people" to the list of our advantages would be
a good thing — but we need to have policies in place, to make sure
shit doesn't flow in.
Compare with the shitstorm at:
https://github.com/pkgxdev/pantry/issues/5358
--
Best regards,
Michał Górny
On 24/02/27 03:45PM, Michał Górny wrote:
[...]
I completely agree.
Your rationale hits the most important concerns I have about these technologies in open source. There is a significant opportunity for
Gentoo to set the example here.
--
Kenton Groombridge
Gentoo Linux Developer, SELinux Project
On 2024-02-27 14:45, Michał Górny wrote:
In my opinion, at this point the only reasonable course of action
would be to safely ban "AI"-backed contribution entirely. In other
words, explicitly forbid people from using ChatGPT, Bard, GitHub
Copilot, and so on, to create ebuilds, code, documentation, messages,
bug reports and so on for use in Gentoo.
I very much support this idea, for all the three reasons quoted.
2. Quality concerns. LLMs are really great at generating
plausible-looking bullshit. I suppose they can provide good assistance
if you
are careful enough, but we can't really rely on all our contributors
being aware of the risks.
https://arxiv.org/abs/2211.03622
3. Ethical concerns.
...yeah. Seeing as we failed to condemn the Russian invasion of
Ukraine in 2022, I would probably avoid quoting this as a reason for
banning LLM-generated contributions. Even though I do, as mentioned
above, very much agree with this point.
On Tue, 27 Feb 2024, Rich Freeman wrote:
On Tue, Feb 27, 2024 at 9:45 AM Michał Górny <mgorny@gentoo.org> wrote:
Given the recent spread of the "AI" bubble, I think we really need to
look into formally addressing the related concerns.
First of all, I fully support mgorny's proposal.
1. Copyright concerns.
I do think it makes sense to consider some of this.
However, I feel like the proposal is redundant with the existing requirement to signoff on the DCO, which says:
By making a contribution to this project, I certify that:
1. The contribution was created in whole or in part by me, and
I have the right to submit it under the free software license
indicated in the file; or
2. The contribution is based upon previous work that, to the best of
my knowledge, is covered under an appropriate free software license,
and I have the right under that license to submit that work with
modifications, whether created in whole or in part by me, under the
same free software license (unless I am permitted to submit under a
different license), as indicated in the file; or
3. The contribution is a license text (or a file of similar nature),
and verbatim distribution is allowed; or
4. The contribution was provided directly to me by some other person
who certified 1., 2., 3., or 4., and I have not modified it.
I have been thinking about this aspect too. Certainly there is some
overlap with our GLEP 76 policy, but I don't think that it is redundant.
I'd rather see it as a (much needed) clarification of how to deal with
AI-generated code. All the better if the proposal happens to agree with
policies that are already in place.
Ulrich
On 2024.02.27 14:45, Michał Górny wrote:
Hello,
[...]
Gentoo has always stood out as something different, something that
worked for people for whom mainstream distros were lacking. I think
adding "made by real people" to the list of our advantages would be
a good thing — but we need to have policies in place, to make sure
shit doesn't flow in.
Compare with the shitstorm at: https://github.com/pkgxdev/pantry/issues/5358
Michał,
An excellent piece of prose setting out the rationale.
I fully support it.
What about cases where someone, say, doesn't have an excellent grasp of
English and decides to use, for example, ChatGPT to aid in writing
documentation/comments (not code) and puts a note somewhere explicitly
mentioning what was AI-generated so that someone else can take a closer
look?
I'd personally not be the biggest fan of this if it wasn't in something
like a PR or ML post where it could be reviewed before being made final.
But the most important part IMO would be being up-front about it.
On Wed, 28 Feb 2024, Michał Górny wrote:
On Tue, 2024-02-27 at 21:05 -0600, Oskari Pirhonen wrote:
What about cases where someone, say, doesn't have an excellent grasp of
English and decides to use, for example, ChatGPT to aid in writing
documentation/comments (not code) and puts a note somewhere explicitly
mentioning what was AI-generated so that someone else can take a closer
look?
I'd personally not be the biggest fan of this if it wasn't in something
like a PR or ML post where it could be reviewed before being made final.
But the most important part IMO would be being up-front about it.
I'm afraid that wouldn't help much. From my experience, it would be
less effort for us to help write it from scratch than to try to
untangle whatever verbose shit ChatGPT generates. Especially since
a person with a poor grasp of the language could have trouble telling
whether the generated text is actually meaningful.
On Wed, 28 Feb 2024, Michał Górny wrote:
On Tue, 2024-02-27 at 21:05 -0600, Oskari Pirhonen wrote:
[...]
But where do we draw the line? Are translation tools like DeepL allowed?
I don't see much of a copyright issue for these.
I know that GitHub Copilot can be limited to specific licenses, and
even to just the current repository. Even so, I'm not sure that the
copyright can be attributed to "me" rather than to the "AI" - so it's
still a gray area.
[...]
I'd also like to jump in and play devil's advocate. There's a fair
chance that this is because I just got back from a
supercomputing/research conf where LLMs were the hot topic in every keynote.
As mentioned by Sam, this RFC is performative. Any users that are going
to abuse LLMs are going to do it _anyway_, regardless of the rules. We
already rely on common sense to filter these out; we're always going to
have BS/spam PRs and bugs - I don't think that content generated by an
LLM is really any worse.
This doesn't mean that I think we should blanket-allow poor-quality LLM
contributions. It's especially important that we take into account the
potential for bias, factual errors, and outright plagiarism when these
tools are used incorrectly. We already have methods for weeding out
low-quality contributions and bad-faith contributors - let's trust in
these and see what we can do to strengthen these tools and processes.
A bit closer to home for me, what about using an LLM as an assistive
technology / to reduce boilerplate? I'm recovering from RSI - I don't
know when (if...) I'll be able to type like I used to again. If a model
is able to infer some mostly salvageable boilerplate from its context
window, I'm going to use it and spend the effort I would have spent
writing that on fixing something else; an outright ban on LLM use will
reduce my _ability_ to contribute to the project.
What about using an LLM for code documentation? Some models can do a
passable job of writing decent-quality function documentation and, in
production, I _have_ caught real issues in my logic this way. Why
should I type that out (and write what I think the code does rather
than what it actually does) if an LLM can get 'close enough' and I only
need to do light editing?
[...]
As a final not-so-hypothetical, what about an LLM trained on Gentoo
docs and repos, or more likely trained exclusively on open-source
contributions and fine-tuned on Gentoo specifics? I'm in the process of
spinning up several models at work to get a handle on the tech / turn
more electricity into heat - this is a real possibility (if I can ever
find the time).
The cat is out of the bag when it comes to LLMs. In my real-world job I
talk to scientists and engineers using these things (for their
strengths) to quickly iterate on designs, to summarise experimental
results, and even to generate testable hypotheses. We're only going to
see increasing use of this technology going forward.
TL;DR: I think this is a bad idea. We already have effective mechanisms
for dealing with spam and bad faith contributions. Banning LLM use by
Gentoo contributors at this point is just throwing the baby out with the bathwater.
As an alternative, I'd be very happy with some guidelines for the use
of LLMs and other assistive technologies, like "Don't use LLM code
snippets unless you understand them", "Don't blindly copy and paste LLM
output", or, my personal favourite, "Don't be a jerk to our poor bug
wranglers". A blanket "No completely AI/LLM-generated works" might be
fine, too.
Let's see how the legal issues shake out before we start pre-emptively
banning useful tools. There's a lot of ongoing action in this space -
at the very least I'd like to see some thorough discussion of the legal
issues separately if we're making a case for banning an entire class of
technology.
[...]
Matt
Hello,
Are there footholds where you see AI tooling would be acceptable to
you [...]
Just to be clear, I'm talking about our "original" content. We can't do
much about upstream projects using it.
The Gentoo Foundation (and SPI) are both US legal entities. That
means [...]
2. Quality concerns. [...]
100% agree; the quality of output is the largest concern *right now*.
3. Ethical concerns. [...]
Is an ethical AI entity possible? Your argument here is really an [...]
The energy waste argument is also one that needs to be made carefully:
I'm a bit worried this is slightly performative - which is not a dig at
you at all - given we can't really enforce it, and it requires honesty,
but that's also not a reason to not try ;)
Robin H. Johnson posted on Tue, 5 Mar 2024 06:12:06 +0000 as excerpted:
The energy waste argument is also one that needs to be made carefully:
Indeed. In a Gentoo context, condemning AI for the computational
energy waste? Maybe someone could argue that effectively. That someone
isn't Gentoo. Something about people living in glass houses throwing
stones...
Another person approached me after this RFC and asked whether tooling
restricted to the current repo would be okay. For me, that'd be mostly
acceptable, given it won't make suggestions based on copyrighted code.
On Fri, 2024-03-08 at 03:59 +0000, Duncan wrote:
Robin H. Johnson posted on Tue, 5 Mar 2024 06:12:06 +0000 as excerpted:
The energy waste argument is also one that needs to be made carefully:
Indeed. In a Gentoo context, condemning AI for the computational
energy waste? Maybe someone could argue that effectively. That someone
isn't Gentoo. Something about people living in glass houses throwing
stones...
Could you support that claim with actual numbers? In particular, the
average energy use specifically due to the use of Gentoo on machines
vs. the energy use of dedicated data centers purely for training LLMs?
I'm not even talking about all the energy wasted as a result of these
LLMs at work.