Talk:Data laundering

Two Sentences, One Citation?

I'm not familiar with Wikipedia's specific standards, but there's nothing resembling an article here; and I question the notability of the topic. The term was invented within the last two years and there is no academic research on the topic. The only related citations I can find are re: the US government purchasing data to infringe privacy rights, which is another usage entirely from what's being referenced here. As far as I can tell, "data laundering" seems to be a buzzword invented by people who expect themselves to be negatively affected by AI to bolster their legal standing. EmzyM (talk) 00:23, 3 September 2023 (UTC)

Recent edits

In this edit, Sownheard has argued (in the edit summary) that "Data laundering implies the data was acquired using illegal means. This term lacks detail and is often thrown around, and needs a clarification: unauthorized use of data is not data laundering."

@Sownheard: No, data laundering implies that data that has undeniably been obtained illegally has then been altered so that the original owner cannot easily detect that the data used is their original. There is not an attempt to define what is "legal" and "illegal" data use; the assumption when discussing data laundering is that it is known the data has been acquired illegally. This might include data obtained from hacking, or images or text where fair use has no role, so the data is intentionally altered (think of pictures where the original watermark has been removed). This article makes no implication that Google's use of text for its Google Books product is a form of data laundering, so any arguments to that effect should not be mentioned here. (You could, if you so chose, try to add this information to the article on Google Books.) WikiDan61ChatMe!ReadMe!! 16:22, 4 March 2024 (UTC)
@WikiDan61 As a note on the discussion of 'data laundering,' it's important to mention the Authors Guild v. Google case for several reasons. Firstly, this inclusion helps clear up a common misunderstanding: data laundering involves altering illegally obtained data, but the Google case, often mistakenly cited in this context, actually dealt with copyright and fair use issues. Secondly, the Google case illustrates the complex legal landscape surrounding data use, offering a pertinent contrast to illegal activities like data laundering. Additionally, understanding this case is educational, providing insights into legal interpretations of digital data usage. It's also relevant to distinguish lawful but legally contentious instances like Google's from outright illegal practices such as data laundering. Finally, addressing this case helps prevent the spread of misinformation by clarifying what constitutes data laundering versus other types of data use disputes. In essence, discussing Authors Guild v. Google is valuable for its educational value and for setting a clear boundary in discussions about legal and illegal data handling. Sownheard (talk) 16:34, 4 March 2024 (UTC)
@Sownheard: I disagree with your reasoning here. Google was accused by the Authors Guild of illegally using their IP (their "data" if you will). But Authors Guild never accused Google of "laundering" (i.e. altering) their data, so this case does not at all apply to the concept of data laundering. Data laundering involves two facets: 1) the illegal obtaining of the data (which Authors Guild v Google determined did not apply to Google); and 2) the intentional manipulation of the data to hide its true source. Since Google's use of data was not illegal, and they did not intentionally alter it to hide its source, the Google case has nothing to do with this page. WikiDan61ChatMe!ReadMe!! 16:39, 4 March 2024 (UTC)
@WikiDan61 This is precisely why the term data laundering is so dangerous and misleading.
It lacks a formal legal definition. This highlights the importance of using precise and universally recognized legal terminology, especially in complex discussions about data use and for a word that carries legal implications.
I'm fine with removing the Google case, but the term needs a more detailed definition so that fair use cases are not mistaken for data laundering.
The term has a few problems:
Lack of Legal Definition:
When a term lacks a formal legal definition, it can be interpreted variably across different contexts and jurisdictions. This can lead to confusion and misinterpretation, especially in legal and regulatory discussions.
Risk of Misinterpretation:
Without a clear definition, terms like "data laundering" can be misconstrued. In the context of data, this term might suggest illegal or unethical manipulation or concealment of the origins of data, analogous to money laundering. However, without a specific legal framework defining what constitutes "data laundering," it's challenging to differentiate between unlawful activities and legitimate data processing practices.
Distinction from Judicial System:
The term implies a connection to legal or criminal activities, as laundering is commonly associated with the process of making illegally obtained money appear legal. However, without being tied to the judicial system through formal legislation or legal precedent, its use can be misleading. It's crucial to make a clear distinction between metaphorical use and actual legal implications.
How about we reach a compromise: I will give the Google case only a passing mention
and will instead talk about the LAION-5B dataset and how it is legally obtained yet often classified as data laundering? Sownheard (talk) 17:18, 4 March 2024 (UTC)
@Sownheard: Data laundering lacks a legal definition because, as of now, it does not appear to be a crime in any jurisdiction. Presuming that data is stolen (and whether or not the data is stolen is an entirely different question that can be discussed on pages dealing with intellectual property rights and copyright law), once we agree that the data is stolen, the act of data laundering comes into play regarding means of disguising the stolen data. The arguments you have made do not point to any reason for including the Google case. If "data laundering" (as opposed to data theft) is not a clearly defined term, throwing a bunch of language regarding what may or may not be data theft does not help further define data laundering.
I was unaware of the laion 5b dataset prior to reading your argument, but having now done that homework, this case also does not apply to data laundering. Here again, the images were obtained legally (although not necessarily within the intent of the image creators), but no effort was made to alter the images to obscure their source.
The bottom line here is that this article does not attempt to define data theft or copyright infringement. Wikipedia has other articles for that. This article is directed solely at the practice of altering the data to obscure its source, which is not what has happened in either case you have discussed. WikiDan61ChatMe!ReadMe!! 18:22, 4 March 2024 (UTC)
@WikiDan61 Thank you for the clarification.
What are your thoughts about adding this?
Data laundering involves two primary facets:
  1. Illegal Acquisition of Data: This pertains to the unauthorized or unlawful procurement of data.
  2. Intentional Manipulation: This involves the deliberate alteration of data to disguise its original source.
Currently, data laundering lacks a formal legal definition. This absence is primarily because, as of now, it is not recognized as a crime in any jurisdiction. A critical aspect to consider is the presumption that the data involved is stolen. However, whether the data is actually stolen is a separate issue, intertwined with discussions on intellectual property rights and copyright law.
The term "data laundering" specifically targets the practice of modifying stolen data to conceal its origin. Instances involving the use of legally obtained data, such as the LAION-5B dataset and the case of Authors Guild v. Google, do not constitute data laundering. In these scenarios, no efforts are made to alter the images or to obscure their source. Sownheard (talk) 11:37, 5 March 2024 (UTC)
@Sownheard: I'd agree with points 1 and 2. The third point would need a source, and I don't have one at the moment. (It's very hard to prove a negative assertion such as this.) The final point is meaningless, because no one has ever accused Google or LAION of data laundering. I really don't know why you are so insistent that we bring those cases into a discussion where no one ever has before. (If you have a citation to indicate that Google or LAION has been accused of data laundering, please share it.) WikiDan61ChatMe!ReadMe!! 12:55, 5 March 2024 (UTC)
There are many more, but here are some sources that use the term data laundering with regard to the legally obtained LAION-5B dataset (the info can be found with a simple Ctrl+F for "LAION"):
LAION-5B - Data Laundering : Medium
LAION-5B - AI Data Laundering : waxy.org
LAION-5B - Data Laundering : systemicalternatives
Even though the LAION dataset is legally obtained, and even though the current artists' court case has stated that the use of the data is fine, people still use the term data laundering, knowing that the source data was obtained legally.
Direct quote, The Art Newspaper: US District Court Judge William Orrick ruled on Monday (30 October) that the plaintiffs’ claims were “defective in numerous respects”, and that only a direct infringement claim can proceed regarding Stability AI’s role in “scraping, copying and use of training images to train Stable Diffusion”. The defendants had argued in previous filings that the process of training the software with the artists’ images was protected by fair use.
Court
Class action lawsuit based on data laundering of the LAION dataset
The court's motion to dismiss the lawsuit
Court result: the claims regarding training on LAION data were dismissed (just like the Google case)
https://www.courthousenews.com/artists-beaten-back-in-california-lawsuit-against-ai-image-generators/
https://www.311institute.com/artists-copyrights-claims-agains-generative-ai-companies-mostly-dismissed/
https://decrypt.co/203789/artist-lawsuit-generative-ai-copyright-stability
https://www.computerworld.com/article/3709691/artists-lose-first-copyright-battle-in-the-fight-against-ai-generated-images.html
Sorry for the big wall of sources, but the confusion brought by a term like data laundering, which sounds legal but isn't, is real and should be pointed out,
especially when it's used to trick artists into thinking they can win a legal case that essentially revolves around fair use. Sownheard (talk) 16:02, 5 March 2024 (UTC)

@Sownheard: Thank you for providing those sources. So, what Google has been doing has not been referred to as "data laundering", but what AI image generators have been doing has been referred to as data laundering. Given that, I would recommend a paragraph as follows, under the heading "Notable cases"

LAION, Stable Diffusion and other AI image generators have been accused of data laundering by the artists whose work has been used to train these programs.[1][2] However, several court cases have ruled that the acquisition of these images for the purpose of training AI engines does not violate existing copyright law,[3][4] so the term data laundering in these cases is inappropriate.

For the most part, "data laundering" does not apply to artwork. The term mostly applies to hacked data sets that are then modified to obscure their origin prior to sale to legitimate data customers. I would not try to make any big points here about artists and the meaningless nature of data laundering. (It sounds to me a bit like you are here to right great wrongs -- Wikipedia is not the place to do this.) WikiDan61ChatMe!ReadMe!! 16:29, 5 March 2024 (UTC)

@WikiDan61 Thank you, Dan, for improving the description. I'll be making the requested changes. Sownheard (talk) 03:45, 7 March 2024 (UTC)

References

  1. Devansh (14 August 2023). "Data Laundering: How Stability AI managed to get millions of copyrighted artworks without paying artists". Medium.
  2. Malig, Mary Louise (9 April 2023). "The Amazing Artifical Intelligence" (PDF). Systemic Alternatives.
  3. Gennaro, Michael (30 October 2023). "Artists beaten back in California lawsuit against AI image generators". Courthouse News Service.
  4. Griffin, Matthew (3 November 2023). "Artists' Copyright Claims Against Generative AI Companies Mostly Dismissed". 311 Institute.