I drafted an unofficial document named ML-Policy[5][...]
[5]: https://salsa.debian.org/deeplearning-team/ml-policy/-/blob/master/ML-Policy.rst
Maybe it is time for us to build a consensus on how we tell whether a
piece of AI is DFSG-compliant or not, instead of waiting for
ftp-masters to interpret those binary blobs case-by-case.
Do we need a GR to reach a consensus?
My mind remains mostly the same as it was six years ago. After those
five or six years, the most important concept in ML-Policy is still
ToxicCandy: AI released under an open-source license but with its
training data hidden.
While I agree with Stefano's later followup that GRs are not good tools
for building consensus, I'm not sure such a policy decision is really in
the spirit of the FTP master delegation. I recognize that my skepticism
is influenced by the fact that I would consider following the proposed "OSAID" model to be a substantial weakening of the DFSG.
I don't understand your argument that this decision is not in the realm
of the ftpmaster activities. How could it *not* be, given they are the
team deciding NEW queue acceptance, and that most notably they do so
based on licensing aspects?
The companies [...] want to restrict what you can actually use it
for, and call it open source? And then OSI makes a definition that
seems carefully crafted to let these kinds of licenses slip through?
What is the OSI's motivation for creating such an incredibly lax definition for open source AI? Meta is already calling their absolutely-not-open-source model Open Source and promoting it as such, without so much as a *peep* from the OSI condemning the abuse of the term. (Although, while doing a quick search to make sure that's true, I found this link from the OSI to an article that keeps insisting that Llama 3 is open source: https://opensource.org/press-mentions/meta-inches-toward-open-source-ai-with-new-llama-3-1)[...]
Meta is confusing “open source” with “resources available to
some users under some conditions,” two very different things.
We’ve asked them to correct their misstatement.
On 2024/10/29 13:03, Stefano Zacchiroli wrote:
To make Llama models OSAID-compliant Meta [...] will also have to:
[...] (3) release under DFSG-compatible terms their entire training pipeline (currently unreleased).
Again, the OSAID doesn't particularly care about DFSG compatibility, so
I'm not sure where point 3 comes in here, but if there's something
obvious I missed, I'm all ears.
Code: The complete source code used to train and run the system. The
Code shall represent the full specification of how the data was
processed and filtered, and how the training was done. Code shall be
made available under OSI-approved licenses.
In order to be OSAID compliant, Meta will precisely have to change
those licensing terms and make them DFSG-compliant. That would be a
*good* thing for the world and would fix the main thing you are
upset about.
Unfortunately that's not the case. Meta won't have to make Llama3 DFSG compliant in order to be OSAID compliant, since the OSAID is not as robust as
the OSD.
Parameters: The model parameters, such as weights or other[...]
configuration settings. Parameters shall be made available under *OSI-approved terms*.
The Open Source AI Definition does not require a specific legal
mechanism for assuring that the model parameters are *freely available
to all*. They may be free by their nature or a license or other legal instrument may be required to ensure their freedom.
Hi folks,
While diverse issues persist, the world and the software ecosystem are still proceeding with the advancement of AI. As a particular type of software, AI is
quite different from the paradigm of traditional software, since more components are involved as integral parts of an AI system. People gradually realize that the Open Source Definition[3], derived from the DFSG[4], can no longer cover AI software very well.
(...)
The OSAID 1.0 has now been released (with no modifications from the RC2).
Are we still going to take any actions or will we let this go?
Gerardo
I'm planning to draft a GR for this, but that is only going to happen after I get through some busy weeks.
Wondered if you'd had another chance to look at this.
On Sat, 2025-01-25 at 12:09 +0000, Sean Whitton wrote:
Wondered if you'd had another chance to look at this.
Ummm... You know what may happen when there is no deadline.
Did you ping this because there are some thoughts from the policy side?
Or just curious?
Nothing to do with Debian Policy, no.
I'm just interested in your thoughts on the matter.
From the Debian side, my concerns are unchanged. The OSAID does not guarantee freedom to our users.
On Sat, 2025-01-25 at 12:09 +0000, Sean Whitton wrote:
Wondered if you'd had another chance to look at this.
Ummm... You know what may happen when there is no deadline.
The best time to do this was last year around the OSAID 1.0 release.
The next best time is now. Do you need our help?
I'll focus on a simpler topic for the GR:
"how does the Debian community interpret the DFSG and software freedom
with respect to AI models and software?"
Will Debian accept a GR that requires all training data to be free,
including training data that belongs to the core of human dignity? That
would be disturbing, and would in practice lobotomize good projects.
On Sat, 2025-01-25 at 17:08 +0100, Sam Johnston wrote:
The best time to do this was last year around the OSAID 1.0 release.
The next best time is now. Do you need our help?
I lean towards making things simpler.
Yes, I disagree with the OSI's decision on the OSAID, and the definition
does not guarantee freedom at all. But a bold move to pick a fight with
the OSI on this matter through a Debian General Resolution sounds
terrible and reckless to me.
I'll focus on a simpler topic for the GR:
"how does the Debian community interpret the DFSG and software freedom
with respect to AI models and software?"
I'll draft it from a purely technical point of view: neutral toward
individuals and organizations, without commenting on how others think
and act. In that case it is as simple as elaborating the "toxic candy"
case and analyzing the OSAID's implications from a technical point of
view.
In that sense, things will be more constructive and doable.
The FSF will also be able to learn from Debian's GR discussion.
I will put what limited energy I have for this matter in that direction.