Wikipedia talk:Plagiarism/Archive 2

Latest comment: 15 years ago by Moonriddengirl in topic Self Plagerism

Another view, and a plea for guidance

I've been adding new articles to WP as part of the DNB sub-project of the missing articles project. I always start by adding the original DNB source to Wikisource. I then start my new WP article by dumping the original Wikisource text into WP, adding a general attribution template and an external link to the Wikisource article. I then begin editing the article just as I would any article that needs copyediting, with no attempt to distinguish the DNB text from newer text.

In my opinion, the original DNB authors and publishers have no legal or (Berne convention "moral") rights to the DNB material, because copyright has expired. However the original authors have essentially the same (non legal, non-Berne convention) moral rights as any other contributor to Wikipedia. That is, if we leave petty legalisms aside, we have a moral obligation to see to it that an interested reader can determine which contributor wrote each and every word in an article. However, we have no more (and no less) obligation in this regard to a PD author than we do to a GFDL author. That means that there is no need to keep everything in quotes, and there is no need to attribute every single word in the article, inline, to the individual author. If we did this for GFDL editors, none of our articles would be readable at all. Of course, if something is in quotes, it must be attributed (to make any sense at all of quoting) and it must not be edited. But we should not put PD passages in quotes unless we are somehow commenting on the quoted material, or if the fact that a particular author made a particular statement is itself notable. Except in these unusual cases, we should treat the PD author as a proto-wikipedian who would have added the material to WP under the GFDL, given the chance.

Now to the Plea: Is there something wrong with my methodology? -Arch dude (talk) 00:41, 7 January 2009 (UTC)

I don't know anything about the legality of this stuff (although what you are describing seems to be comparable to the approach taken towards 1911?) but my personal opinion is that just because we legally can copy PD text doesn't necessarily mean we should. I view PD sources as sources for information, not as scaffolding for the article—and I treat them by using a lot of the information and citing them a whole lot as refs, not by copying the text. That is just my personal editing style, though. Copying text from PD sources may not constitute plagiarism per se, but I do believe it constitutes undesirable (not necessarily "poor") writing style. No offense meant to you as an editor—at the rate with which you are adding new articles, and the wide variety of topics, what you're doing is probably the only way to churn out that much material—but my general feeling is that WP should be a new source, not an agglomeration of other sources (even if those sources are tweaked, copyedited, and improved, which I assume is your intention in incorporating DNB text). Again, I have no policy or anything behind me saying that, it's just the way I feel. Politizer talk/contribs 01:30, 7 January 2009 (UTC)
Thanks for your candid response. I think I understand your position, and I am happy that you seem to think of this as an aesthetic problem rather than an ethical problem (assuming that I understand your position). In this spirit may I ask you: in your opinion, how does making a change to the words written by (say) Leslie Stephen in 1890 differ from making a change to the words written by (say) User:Politizer in 2008? WP (in my opinion) has the equivalent moral and ethical obligations to these authors. Note that the Oxford University Press, when updating the DNB text to create the ODNB, does not appear to feel any moral obligation to preserve Stephen's text with full attribution. Why should we? -Arch dude (talk) 02:23, 7 January 2009 (UTC)
That's a good question. I don't know all the real logic behind this, but my intuition is that I write my contributions for Wikipedia, and Stephen didn't. While there's nothing wrong with using Stephen's words for WP, I personally prefer to use stuff like that as a cited source. I agree with you that, in cases of PD text like this, it's more a style issue than an ethical issue.
The other thing I have against using PD text is that a lot of readers may not understand copyright law and how public domain works, and when seeing a WP article that's similar to an old source they might think it was plagiarized; even if they're incorrect in thinking that, WP loses credibility for that reader. I know that was my own reaction the first time I encountered PD text (from DANFS) before I was more familiar with WP and copyright issues. Politizer talk/contribs 02:27, 7 January 2009 (UTC)
I've said this before somewhere: I think it helps to conceptualize the wikipedia editors as one collective of authors. Politizer's words do not need to be distinguished from my words; "we" wrote the article together. However, it is all a lot clearer if we keep track of other authors' words, whether the other is an individual or another PD or private collective. I definitely appreciate your asking for views here, and for your sensitivity that is already clear. I do think it is a moral issue, though, and it is a marketing-type issue too: for wikipedia to be a credible source it is / would be better for wikipedia to function like one careful collective of authors giving credit where credit is due to others (for wording as well as for content). I believe that it has become practice that for articles to reach FA status, they cannot include non-quoted passages of other text. So, I would hope you would just put the passages from the DNB source (what is that, anyhow?) into block quotes and keep track of which words are theirs. It should make it easier, in the long run, to develop those articles. doncram (talk) 02:44, 7 January 2009 (UTC)
I believe DNB refers to the Dictionary of National Biography. Politizer talk/contribs 02:49, 7 January 2009 (UTC)
Sorry, but I conceptualize the WP editors as members of the human intellectual collective. "We" are not separate from scholars, authors, and editors that came before us, and "our" moral obligations to each other are no different than our moral obligations to our predecessors. We are all part of a chain of scholarship that extends back to the first written word. Yes, we need to preserve attribution, but we also need to help our predecessors continue to inform our readers. If every author rigidly preserves quoted content forever, then we end up with a morass of quotations 30 levels deep with 30 levels of footnotes: clearly an insane result. Note: the Dictionary of National Biography (DNB) is a 63-volume compendium of British biographies, written over a period of 15 years from 1885 to 1900. -Arch dude (talk) 03:02, 7 January 2009 (UTC)
(e/c) I may have the indent level wrong, but I'll comment anyway:
  • Our basic principle is that we attribute the original words of the authors to those authors. Further changes can be traced through article history, but the original creative contribution has to be attributed - and by that I mean the specific words. In the case of PD-by-time-elapsed work, I doubt there's a legal requirement, I do feel there is an ethical requirement, and of course here on this page we're (slowly) working toward a policy requirement.
  • For PD works, the single best approach is to rewrite in your own words what the original, and almost always anyway outdated, source has to say. Next best is to blockquote the text. Next after that, make a single edit to insert the copied text, along with a PD-attribution template.
  • Moving something out of blockquotes, or changing something within the quotes, is problematic, unless it's fully rewritten. Extra care must be taken if this is done, such as long edit summaries or accompanying talk page notes. This satisfies the GFDL History section requirements.
  • And in any case, I think the best approach is to just ask yourself: "Is there any way someone could take these words as being mine, when I know myself that they're not?" There's tons of server space, so just make sure that when you incorporate PD text, you make sure that a reader in the future can trace back exactly (on this site) what text was copied. Nutshell: give credit where credit is due; assume no undue credit.
Just my opinions, but there they are. Franamax (talk) 03:09, 7 January 2009 (UTC)
In addition to what Franamax said about the minimum standards for citation, I consider that using significant parts of an old DNB biography in a WP article is an inadequate way to proceed if there are better sources, which for UK historical figures would include as a minimum the current edition of the DNB and a check for other reputable biographies. In most cases more than 100 years have passed since the articles there were prepared, and the view of what facts are reliable may have changed, as well as the interpretation.
With respect to interpretation, I would use the old DNB only for the purpose of showing what the accepted UK academic view of people was at the period when it was written, and that view as held by the named individual who wrote the article there. Stephen did not write the DNB but edited it; the articles are all signed. Additionally, an errata volume was published in 1966.
There is an exception--if a contemporary reliable academic source says that the old DNB bio of an individual is still reliable, it could be used--if accompanied by that quotation. DGG (talk) 22:34, 7 January 2009 (UTC)
Stephen was the original editor. There are more than 600 authors of original DNB articles. Stephen did in fact write a few of the articles, including at least one of the articles (James Stephen (undersecretary)) I added to WP. Our current discussion here is about plagiarism and attribution, not about whether or not a DNB source dump is a good way to create an article. I personally think that it is a lot better than not having an article, but we should have that discussion over at Wikipedia:WikiProject Missing encyclopedic articles/DNB (and I agree with you that your way can create a better article). So, what do you think about attribution? I believe that starting the article with a raw dump of the DNB text, with attribution in the edit summary, an attribution template on the page itself, a link to the text at Wikisource, AND a comment on the talk page (yes, all four) is more than adequate to fulfill our moral obligation to the original author. Note that the (tiny) project DNB team over at Wikisource is spending a very large amount of effort to positively identify the individual authors for each of the 29,000 articles, so in a sense we are doing a better job of attribution than the original DNB did. -Arch dude (talk) 01:18, 8 January 2009 (UTC)
Now I'd say that's barely adequate, since the original author didn't agree to the terms that their text could be mercilessly edited. The question arises as to when, if ever, the PD-attribution template can be removed: every word may have changed, but the original author's article structure may still be intact (their creative contribution to "approaching the subject"). However in practical terms, starting with a straight dump and including the "all four" seems to me to satisfy the GFDL requirements, in particular the History section which allows traceback and attribution. Arch dude, it seems that nothing is lost by your approach, so (just at this moment at least :) it satisfies me. Franamax (talk) 01:35, 8 January 2009 (UTC)
We need to be clear about what we are trying to accomplish. This discussion is about the WP plagiarism policy. This is distinct from "the GFDL requirements", from copyright law, and even from academic plagiarism requirements. In order:
  • GFDL does not mention plagiarism. There is nothing in the GFDL that prevents direct copying of a PD work.
  • Copyright law places no restriction on PD work.
  • Academic plagiarism rules are not binding on WP, although I think we should use them to guide WP policy and guidelines.
WP policy is what we are working on here. Speaking personally, I feel no more obligation to the DNB authors than I do to the WP editors. The DNB authors' works are in the public domain, and the WP editors' work is released under the GFDL. Note that the DNB authors' work is also being "relentlessly edited without attribution" by Oxford University Press in later editions of the DNB; OUP purchased the copyrights in 1917 from the first publisher, who acquired the rights from the original authors. As it happens, I feel that we do have an obligation to preserve attribution, but that obligation is to our readers: that's what a plagiarism policy is all about. But our readers already know (or should know) that WP is a collaborative work, and any reader who so desires can learn how to discover exactly how an article has evolved. By ensuring that the original PD work is fully cited when initially added to the article, we have fulfilled our obligation to our readership. -Arch dude (talk) 11:05, 10 January 2009 (UTC)

Coming in late, comments

Kudos to whoever introduced the phrase "Copyright and free are short words but complex concepts." I like. :)

In terms of "free works", I think it's important to note that licensed is not the same as public domain. GFDL works remain copyrighted, though the license for reuse is liberal. If they are used outside of license, copyright infringement still occurs. I'm not going to implement that directly, though, since I'm new to this conversation and want to get a feel for it before diving in. :) Also, it's probably worth noting that there are compatibility issues with CC on Wiki text. We're still unable to use material that is "attribution share-alike". I'm not real clear on the differences between the various CC licenses (though it looks like I'm going to have to learn 'em), but I think it's probably important to clarify here to prevent people reading this and thinking that all CC text is free to use.

Wikipedia doesn't currently insist that all copied work is attributed (ETA: wait: I'm thinking on rereading my own comment that I'm taking attribution in the "inline" sense, whereas what's meant here is Attribution (copyright). If so, we might want to clarify for folk like me that general attribution, not line-by-line, is what is meant, by contrast to my reading of WP:NFC, where inline is required for quotes). We like it to be credited, sure. But we have a whole slew of PD attribution templates that rather vaguely note that some of the text on the page is copied from a PD source. I think we need to soften that wording. I'm inclined to think that exhortations like "it is imperative that their work is distinguishable from the prose of the Wikipedia article" is likely to meet difficulty finding consensus, since it is evidently quite a change from common practice. I do like "it is important to retain an anchor to the originally copied text, so that subsequent changes can be traced", which fits very well with the spirit of GFDL, but I expect that ultimately material will be melted together just as our contributions are. But it seems likely that no "anchor" (in terms of an active internet presence, as I understand it) will exist, which would perhaps make it better to urge people to retain the citation?

With respect to "What is Not Plagiarism": IANAL. Lists of information may or may not be copyrighted, depending on the nature of the list. :) If creativity has gone into the selection of elements (in terms of which facts are included and order of presentation), then it may be protected by copyright (see Feist Publications v. Rural Telephone Service). If the list is a comprehensive list--such as an alphabetical directory of telephone numbers or a list of ingredients in a recipe--it is not. If the list is a list of words or song titles, it's not protected. If the information is presented in sentence form, it probably is. But that's a question of copyright. The question of plagiarism is different. Copyright does not protect "sweat of the brow," or the labor that goes into compiling information, but it seems to me that plagiarism probably does. Unless a list is a common compilation (which one might find freely reproduced anywhere, say), I'm inclined to think we ought to give credit. Thoughts?

Speaking of jargon, "Elgoog hcraes?"

Copying of copyrighted works: "Such additions can be dealt with either by attribution, turning it into a quote with a source, or by truncation or removal of the copied material." This seems to suggest that each of these remedies is equal. I'd suggest appending something like "depending on the nature of the infringement." I like the next bit, which seems straightforward and clear-cut (though we may need to note that extensive quotation can also refer to overuse of short quotations from one source; I can provide examples :)). I get a bit caught here, though: "This may still be a violation of copyright as a derivative work, though the same concerns about plagiarism would apply if the phrases, concepts and ideas in the copied material were not attributed to the original author." Though seems to indicate a contrasting idea. Where's the contrast? I'm not quite sure what's meant to be conveyed by this, but wonder if rephrasing would be useful: "This may still be a violation of copyright as a derivative work. Structure, presentation, and phrasing of the information should be your own original creation. The same concerns about plagiarism apply if the phrases, concepts and ideas in the copied material are not attributed to the original author." (I have utilized some language from the current proposed revision to the copyright FAQ, here.) I wonder if it would be helpful to incorporate some of User:Dcoetzee's suggestions from this talk thread here as well. I think they're very good.

Look forward to feedback. Lacking any, I may WP:BOLD. :) But, again, I'm not familiar with the work-environment of this proposal, so I want to introduce myself to it before diving in. Hi. --Moonriddengirl (talk) 12:51, 7 January 2009 (UTC)

Short answer before the longer ones after some analysis:
  • MRG, I'll accept your kudos, thank you. :) [1] Nice to see you, what took ya so long? I'm sure you can add some valuable perspective, given your involvement with many of these issues.
  • The work-environment is much like some government offices, one person at the counter, the rest of us having a coffee-break. :) I'll try to give some feedback anon, but go ahead and make judicious changes as you see fit.
  • The jargon bit reflects our incomplete development process, there are still quite a few raw bits here. I think maybe the OP didn't want to indicate a preference for an individual search-engine. I'd have used the more sophisticated "Oogle-gay Earch-say", but to each his own. :)
More soon (hopefully). Franamax (talk) 01:44, 8 January 2009 (UTC)
Ah--word puzzles! Got you. :) Having tested the bathwater, I've climbed in and splashed about a bit. Judiciously? One hopes. :) My primary purpose was to condense and structure. I'm new to this policy proposal and still energetic enough to try to bull it through the molasses of wiki-apathy, so here's hoping we can get it in place soon, as I think it is sorely needed. But I'm repeating myself. See below. --Moonriddengirl (talk) 15:55, 10 January 2009 (UTC)

Longer answer to your points:

  • GFDL works remain copyrighted - I would have said "true, but is that relevant to a discussion of plagiarism?" but I'm OK with the wording you've introduced. Note though that your restructuring has subtly altered the message, since the "In all cases..." wording now appears to be part of the sub-section rather than its original intent to be applicable to both PD and free works.
  • Your para. 2: the current wording of the "In all cases..." sentence is a bit of a mish-mash resulting from some passionate debate above about the history of article development (such as the big EB1911 dump); philosophical approaches to using PD sources such as whether we wish to acknowledge Berne Convention-style moral rights that survive the copyright limits and prevent us from modifying the original authors' work; methodological approaches such as putting PD-text into WikiSource first, transcribing directly into en:wiki, transcribing and modifying when making the first entry; &c. The whole issue is somewhat vexed and you might gain some insight by reading through some of the long discussions above. This is probably the weakest part of the whole guideline; I don't think we actually arrived at agreement and there are some strong feelings on the subject. (And yes, it included attribution templates.)
  • The "retain an anchor" sentence I believe was mine, and if so what I meant was that there needs to be a single place on-wiki where you can see exactly what the PD-text was before it was modified (like, say, an oldid field in the {{PD-attribution}} template). Again, this conflicts with actual and historical practice.
  • Lists - this is a difficult one. I'm not sure if I've ever written an original FUR (which is a list of items) - I've just copy/pasted other people's work and changed a few words. Plagiarism never occurred to me, although I did credit the source for the last FUR I created, since I was feeling a little guilty. By your comment, whoever first created the FUR I copied indeed contributed "sweat of the brow". And what if I go to the NASA website and find a "moon landings" page to create "Chronological List of Moon Landings"? My list would contain the identical information, date/vessel/country, in the identical order. Did I plagiarise a PD work? I don't have to ref the source either, since they are uncontroversial facts and I wiki-linked all the lander names to their articles. The danger here is that in a later discussion I say that I took the list from a NASA page and someone else says "OMG you're a plagiarist!" Given the very strong connotations of the term on-wiki, we need to consider this carefully.

I'll shortly (or longly) try to summarize some of the outstanding general issues from my notes. Thanks for your help and I like your rewrites! Franamax (talk) 06:00, 11 January 2009 (UTC)

Quoting other policies

Where possible, it is better to link to other policies and guidelines than quote them, as there is danger that the other document will change to deviate from the text here, creating a policy fork. For that reason, in my bold restructuring of the section on addressing these concerns, I've removed the quote from NFC. The link should be sufficient. YMMV, of course, but I thought to explain my reasoning here. --Moonriddengirl (talk) 14:48, 10 January 2009 (UTC)

I think that when we started working on this, we were working from existing text in other policies and felt the need to ensure that we were basing our edits here on the actual text when we read it there. Also, I think we were all doing gut-checks on the lines of "if I believe in this, am I actually doing it?" and we may have ended up being very specific about sources and attribution. I also think you're right about policy forks, if this becomes a guideline it will be cross-linked and will properly evolve in tandem, so there's much less need for direct quotes. Franamax (talk) 06:09, 11 January 2009 (UTC)

Resources box?

Although we don't have a "plagiarism" infobox like {{Wikipedia copyright}}, I'm wondering if a specially made one would be helpful. I like them for their quick visual catch and the handy compilation possibilities. (See one I put together on my userpage here.) Thoughts? Would this be a good thing or window dressing? --Moonriddengirl (talk) 15:51, 10 January 2009 (UTC)

I like the content of your sample box and it might also help to visually break up the "wall of text" at the start of the guideline, which can be off-putting for some readers. I'd say give it a try, I recently discovered an extra button called "Undo" :) Franamax (talk) 06:03, 11 January 2009 (UTC)

Proceeding with implementation

This has been around a while; perhaps it is time to expedite matters? I can't see that an RfC has been conducted on this. The image plagiarism section needs a bit of work, but it looks to me like it's about at a point to chuck it in the pool and see if it swims. :) It's certainly (in my not really humble at all opinion) much needed. Thoughts? If others feel it's ready, I'd be happy to do a bit to the image section, launch an RfC and drop it at VP again, with a hope to remove that "proposed" tag in the near future. --Moonriddengirl (talk) 15:51, 10 January 2009 (UTC)

I've tackled the image stuff somewhat and also requested feedback from User:Dcoetzee, who has (unfortunately for him) become one of my go-to guys. (He's also an admin at commons, and so is presumably experienced with images. :)) --Moonriddengirl (talk) 14:56, 11 January 2009 (UTC)
Nice work, that was an area somewhat lacking. Some notes:
  • Do we properly define yet what is plagiarism but not a copyvio? You touch on it with the image taken from an 1825 textbook example. I imagine the scenario there would be that the image description is "I made this myself" or some such, without the addition of "by scanning page 714 from Principiae Mathematicae (Newton, Isaac ca.1643)"? Does your text make clear to the novice editor exactly where plagiarism comes into the picture? And do they actually acquire rights of any kind when they make a 2D representation of a 2D object in the PD?
  • Can you expand to touch on other media (since we no longer have "Image:"'s now, we have "File:"'s)? What is the definition of plagiarism of an audio clip? Also, even though they have an image format, should charts and diagrams be treated separately? If I copy a molecular structure or a planetary orbit diagram without specifying the paper page I was looking at, am I plagiarizing?
  • And is there any way to better separate the copyright issues from plagiarism in general, especially with an eye to shortening the text a bit? I'm not sure how that could be done, just asking. :) Franamax (talk) 22:38, 11 January 2009 (UTC)
The problem with separating copyright infringement from plagiarism for me is that they often seem to go hand in hand--particularly with images. While with text, you can copy not quite enough to create infringement but enough to create plagiarism, with images that's hard to do. The only case I could come up with of plagiarism that is not infringement is when an image is pd or otherwise free for use, but not properly credited. I suppose we could strip all the reference to licensing from the text to abbreviate it and focus only on sourcing? Any ideas you have for improving, diminishing or refining are fine with me, even if it means that merciless editing thing we're warned about when we contribute. I'm not likely to feel ruffled. :) It's all for the good of the project, and we can talk about it if I disagree. I don't know that charts & diagrams need separate treatment, although you may be thinking of reasons that I'm not. It seems to me that it's the same with a photograph: we need to know where it came from. If you don't say, you're not working within policy (whether it's plagiarism or copyright infringement). As far as audio clips, I'm having trouble imagining plagiarism: maybe if they took a clip of a government speech and claimed it was their own? Do you think this needs separate handling, or should we just generalize most references to images to "non-text media" or something? Maybe Dcoetzee will have some good ideas. He helped out with the recent change to the copyright FAQ and made some good suggestions to the (probably) pending change to WP:C. :) --Moonriddengirl (talk) 23:02, 11 January 2009 (UTC)

"image" to file?

Image: to File: - a reference to the change in the formal name of the storage space, precisely because not all entries are images per se. Audio clips - lifting a clip from the National Archives and using the summary "this is my performance of The Well-tempered Clavier", I admit it's far-fetched. Charts and diagrams - if I look at a diagram of the structure of insulin published in Nature and reproduce it exactly in my own graphics program, with the same colours, shading and legend, upload it with the summary "I made this diagram of the insulin molecule" - am I plagiarising? If I do it for methane? (probably not) Rubisco? (take a look at the lead image - I'm sure ARP did generate it himself, but what if I just scanned the image from a journal and uploaded it?) Franamax (talk) 23:24, 11 January 2009 (UTC)
You'll have to bear with me on excessive detail and questioning (if you can). It's the legacy of a background in computer science and engineering, both of which are best served by examining all the possibilities and making sure there are no unexpected gaps. :) Franamax (talk) 23:28, 11 January 2009 (UTC)
Well, text is my region, so I'm by no means any kind of expert on image/file/sound issues. But plagiarism is basically claiming credit for the work of others, or as the headline of this proposed policy says, "the copying of material produced by others without attributing that material to the original author, whether verbatim or with only minimal changes." That seems to apply to diagrams and performances of Bach as well as anything else. Do you believe we need examples of non-image media files that may be plagiarised, or do you think we would be served best by just generalizing? --Moonriddengirl (talk) 23:35, 11 January 2009 (UTC)

What parts of this document are different from Wikipedia:Copyrights and Wikipedia:Citing sources? Why should another policy page be added when it is entirely covered by existing policies? —Centrxtalk • 23:43, 11 January 2009 (UTC)

This is what WP:C says about plagiarism: "Note that copyright law governs the creative expression of ideas, not the ideas or information themselves. Therefore, it is legal to read an encyclopedia article or other work, reformulate the concepts in your own words, and submit it to Wikipedia. However, it would still be unethical (but not illegal) to do so without citing the original as a reference. See plagiarism and fair use for discussions of how much reformulation is necessary in a general context." Wikipedia:Citing sources is not a policy, but a "style guideline." Policies are standards; guidelines are advisory. I tend to think that coverage in existing policy is rather skimpy. --Moonriddengirl (talk) 23:51, 11 January 2009 (UTC)
Insofar as Wikipedia:Citing sources is only a style guideline, the principle is covered under Wikipedia:Verifiability. I don't see how the quote from WP:C is relevant; it is simply deferring the issue of uncopyrighted copies to citation. I cannot find a single part of this proposed guideline that does not belong in, and almost invariably is already found in, another guideline. —Centrxtalk • 06:58, 12 January 2009 (UTC)
It seems relevant to me in response to your question. You asked how it differs from WP:C. I told you exactly what WP:C says about plagiarism. --Moonriddengirl (talk) 12:58, 12 January 2009 (UTC)
We're not in any way trying to create policy here. We're trying to create a guideline that summarizes the specific issues, offers advice, and outlines best practice. I can point to quite a few places where the issue of plagiarism is discussed, both on the AN boards and specific incident requests on this talk page. Centrx, you may feel that any issue can be dealt with through appropriate interpretation of WP:COPY and WP:CITE, but it seems as though quite a few other editors disagree. The word arises often enough that we can contemplate offering some consensus-built specific guidance to our editors, especially those less experienced in nuances and interpretation than yourself. We want to build a centralized resource, rather than send editors through a wiki-hunt. Franamax (talk) 00:29, 12 January 2009 (UTC)
The word is simply a dictionary definition, repeated incorrectly in the first sentence of this guideline. If this is not the creation of a new policy and is merely to be a centralized resource, it belongs at some page Wikipedia:Copying, since "plagiarism" means there was an intent to take someone else's work and pass it off as one's own, which is beside the point. An editor who innocently copies some text to improve Wikipedia, but without attribution, is not committing plagiarism, but still runs afoul of copyright or citation. —Centrxtalk • 06:58, 12 January 2009 (UTC)
Your definition of plagiarism as requiring intent is not universal...or necessarily legally defensible in the US (not that it could ever come to that here). In 1992, a Princeton student took her case to court in part over the question of intent; Princeton had withheld her degree for a year because she had plagiarized in a Spanish paper, correctly citing her source for some passages but failing to do so in others. She lost.(See [2] and plentiful other sources; the trial judge did indicate a personal opinion that Princeton's punishment was too severe, with which I heartily agree, under the circumstances.) Purdue University distinguishes between "deliberate and accidental plagiarism" here--and they aren't alone. Cf. [3] and [4]. If plagiarism requires intent, there can be no "accidental plagiarism." (See also this 2008 article, which to Princeton adds Yale, UC Berkeley, Cornell, Vanderbilt and Richmond as among those universities which do not include intent in their definition of plagiarism.) That said, I can see why you would be uncomfortable with the title of the document if you believe "plagiarism" is an accusation of bad faith, while those who don't ascribe intent don't see it that way. Evidently, the matter is somewhat contentious: see [5]. Given divergence in definition, obviously we should be clear in our terminology or in our definitions that plagiarism is not necessarily bad faith. In my experience, copyright infringement is often not committed by bad faith users, but by individuals who don't understand what they're doing wrong. --Moonriddengirl (talk) 12:58, 12 January 2009 (UTC)
In the academic context, they are zealous and will exact the same punishment for accidental copying as they would for plagiarism. This does not change the fact that the meaning of "plagiarism" includes intent, that even with the loosest interpretation still implies intent, and that otherwise the word is reduced to having no distinct meaning, since it would be covered by the much clearer "copying".
Copyright infringement requires no intent, so it is not analogous to "plagiarism". —Centrxtalk • 03:49, 17 January 2009 (UTC)
I've placed plagiarism & definition in a google book search and looked at the first four hits to come my way. "Intent" is not part of this definition, this definition, or this definition. This one doesn't offer a definition, but tabulates the various definitions that exist. I understand that by your understanding of plagiarism, intent is required, but I reiterate that this definition is not universal. And, lo, in good Wikipedia fashion, I provide WP:RS to WP:V that. :) --Moonriddengirl (talk) 19:40, 20 January 2009 (UTC)

break

I've combined addressing problems into one section. It loses detail, but hopefully provides pointers to the relevant guidelines where necessary. Centrx's question does remind me of my original purpose here, though. WP:C currently directs users to article space to find out how to revise to avoid plagiarism. This hardly seems appropriate, since article space doesn't have either the force of policy or guideline. It seems that a subsection on that might be appropriate here. --Moonriddengirl (talk) 00:21, 12 January 2009 (UTC)

Does silence on this point mean that others don't agree or that others don't want to write it? :) I'm inclined to think that such a section would be one of the primary benefits of having such a guide, since our WP:C policy explicitly doesn't answer the question. --Moonriddengirl (talk) 13:34, 12 January 2009 (UTC)
Or silence as in being asleep? ;) Do you mean a section on "How not to plagiarise?" Currently, we just have the links in Resources (on both the project and talk pages), which is certainly an obscure place, so I'd agree a section like that would be a good thing. Beyond linking to resources such as Univ. of Indiana as above, I'm not sure exactly what to say though.
Wrt Centrx's comments above, maybe we will also need a "I've been told that I'm plagiarising, what now?" section, or something to indicate that it's not the same as being accused of clubbing baby seals, since you may have done it accidentally, read our section on How not to plagiarise, &c. Franamax (talk) 17:42, 12 January 2009 (UTC)

Comments

Hey all, just looked over this page, a few comments:

  • It claims "anything you contribute to any WMF wiki" is free; the danger here is that some WMF wikis such as Wikinews use licenses incompatible with the GFDL. I'm suspicious of the claim that some Creative Commons licenses are considered compatible with the GFDL - as far as I know we've never accepted text contributions under any CC license. Is there official word on this somewhere?
  • Regarding metadata: it is implied that the primary purpose of metadata is to credit a source and that editing this metadata (which does not require a hex editor - there are tools for editing metadata, e.g. ExifTool) constitutes circumvention of access-control technology under the DMCA. This seems dubious all around to me. Contributors naturally should be permitted to revise metadata where it contains errors or may be augmented. The important thing is that it is not modified to remove or alter a valid attribution (and the DMCA isn't applicable at all). Of course, metadata should never contradict information on the image description page (if it is in fact the metadata that is in error, it should be corrected) - either this or a digital watermark, visible or invisible, is a sign that the image description page information is not correct. Another thing to check is consistency - if a user's uploads indicate different camera data for each image, it's doubtful they actually have 100 different cameras lying around.
  • It's worth emphasizing somewhere that sometimes copyright violations occur on Wikipedia that are not plagiarism; for example, if an article is copied wholesale from a copyrighted source, and a link to it is placed at the bottom. I've seen this happen a few times.
  • Another good example of a PD source used in building the English Wikipedia is the United States Census Bureau census reports, used to build the Rambot articles on U.S. towns. This is interesting to note as it's a source falling in the second category (ineligible rather than old).
  • Note that plagiarism is fundamentally a moral concern with many viewpoints; at one end of the spectrum are people who believe that authors should always have the last word on how their works are used, and at the other end are people who believe that even the requirement to give attribution stifles the circulation of free information. If this were made policy, it would be necessary to seek a wide consensus to ensure that it reflects the moral attitudes of Wikipedians at large. I like the current approach of phrasing the justification in terms of practical issues, like facilitating the consulting of sources.
  • It's worth noting that our policy is to delete images without source information (CSD F4, Category:Images with unknown source), which is generally understood to include author information.
  • On the other hand, it's also worth noting that when author information is unavailable, it need not be specified, as long as the license does not require it; it's important to specify the source so that the license can be verified, but this is a best-effort thing, if the source did not give author information neither can we.
  • In terms of discussing plagiarism with contributors who may have fallen afoul of it, I'd recommend the terms "copying without attribution" and/or "content missing attribution." They're less loaded terms and place the emphasis on the content and on the necessary corrective action (attributing it).
  • It may be worth noting somewhere that plagiarism is primarily concerned with attributing the author of a work, whereas copyright is primarily concerned with the copyright holder. If the author transferred their copyright to a publisher or a client, that doesn't eliminate the moral requirement to attribute them, nor is there any moral requirement to attribute the copyright holder (I'd go so far as to say the former copyright holder of a public domain work is completely irrelevant).

I was skeptical at first that this page would be useful, but it really does appear to address situations that other policies do not concern themselves with - in particular, how plagiarism arises on Wikipedia, and what the appropriate response is. Dcoetzee 20:32, 12 January 2009 (UTC)

Thank you for weighing in. :) (a) I'm responsible for any misunderstandings of metadata, I'm afraid, and really welcome any corrections. Until I waded in, all it said was "consistency of EXIF data." I'm still not entirely sure what that means. I've altered that section, and if you don't think it's helpful for investigating plagiarism, it can perhaps be truncated further or removed. What would one look for there to help in determining plagiarism? (b) Do you have an opinion on a good placement for the emphasis that CV may not be plagiarism? (c) With respect to the divergence of opinion, this has been touched on above in the Wikipedia talk:Plagiarism#Proceeding with implementation section (and probably sooner; I just got here). Do you have an opinion on whether this is better handled by placing this under a different title or by retaining the current title and explaining that some defs. require intent, some do not, etc? I like your suggestion about how to approach discussing the matter; those seem like good terms. (d) Good point on the diff between author and copyright holder. (e) (out of order) I know CC-By 2.0 and CC-by-SA are not compatible with text on Wikipedia per. I recently had cause to ask User:Stifle if this was the case with CC-By 3.0, and he confirmed that it is. I don't know the full scope of CC, though, so I have no idea if they have something that is compatible with text. This page used to say, "There are several CC licenses which the author can pick. Some of these do not require attribution, however Wikipedia does not recognize this aspect, we insist that all copied work is attributed." Should this simply be altered back with the word "author" changed to "media uploader" or some such? (Still dreading our probable conversion. Whole new game to learn!) --Moonriddengirl (talk) 21:01, 12 January 2009 (UTC)
No problem. :-) Regarding Creative Commons licenses, the original text was mistaken; CC does not offer any license that does not require attribution (with the exception of their Public Domain Dedication, which isn't really a CC license), and if CC-by is incompatible, it's quite likely the others are too as they all build on it. I'm on the fence about changing the page title; if I were to move it it would be to something like "Attribution," "Misattribution," or "Attributing sources." On the other hand, plagiarism is probably the first term a lot of people would think of, if not necessarily the most unambiguous or civil one. I did some editing on the metadata section, and I do feel like metadata is useful to discuss here; feel free to refine it as you please. As for where to discuss CV that isn't plagiarism, perhaps just a sentence or two in the intro section - this isn't a big deal. Dcoetzee 22:24, 12 January 2009 (UTC)
On the CC licenses, I believe that the original text is again mine [6] (which is strange, I'm used to seeing only my "the"'s and "when"'s staying on pages ;). I may have misread the "mix-and-match" bit, currently the choices are all "CC-by", which requires attribution. However, there is a set of CC- which don't require attribution, it's just that they were retired in 2004. If it would simplify the guideline, my text can be retired - so long as the intent in those original words is discarded. That would mean that if you do find CC-licensed material without the "by", it's fair game for copying and it's not plagiarism to fail to attribute the source. Alternatively, change to "Some retired licenses do not require attribution, however...".
The EXIF stuff has been reworded well to better indicate which clues it provides.
Any DMCA (turns to the side, spits on the ground ;) and involved discussion of copyright issues, I'm a little leery of including at length. Altering EXIF data and copying images of others while claiming them as your own to me is much better handled mostly on the copyright policy pages, where the clear legal issues can be covered. Beyond the difficulty of maintaining parallel pages on the same topic, it's much less morally fraught to just say "that's a copyvio" than it is to read this guideline first and say "you're a plagiarist". I'd much rather see more discussion on this page of scanning something from that 1825 textbook and claiming that you made it yourself, or copying an image of a molecule. (See just above, in the bit before the thread got blurred with other objections) EXIF/copyright issues are pretty well covered by the very first line on the page, "Plagiarism may also be a copyright violation..." - to me, the more we stick as closely as possible to the moral issue of plagiarism (and how to avoid it), the more clear and better the guideline. Franamax (talk) 23:53, 12 January 2009 (UTC)
Hmmm. If the primary purpose of this is to discuss plagiarism as opposed to copyright, I wonder if the section on "acceptable sources" is off-topic. It seems that the whole section could be wrapped up in a sentence like "It doesn't matter where you find information or ideas—whether it is copyrighted or free content—you should acknowledge your source." In terms of plagiarism merely, there really are no "unacceptable" sources. (In terms of WP:V, now....) As far as the EXIF stuff, really, my only point in turning that into text was to try to make clear what it meant. Since I had no clue what EXIF was when I read it, I thought it might be more helpful to readers of a guideline/policy/whatevah to specify what it is and how you check it for consistency. If I've gone off-target in some of what I included, please, yank it. --Moonriddengirl (talk) 01:48, 13 January 2009 (UTC)
Yeah, I agree, I took a stab at it. I think there are two separate topics being addressed here, one is "how do I borrow material from a free source without plagiarizing" and the other is "how do I detect and repair plagiarizing in existing content." I separated these out and also separated text and images. I also removed most of the stuff about Creative Commons; ideally we'd have a place to link describing what licenses are acceptable, but that doesn't appear to exist yet. What do you think? For what it's worth I like the EXIF section as it stands, this isn't discussing them in the context of copyright so much as plagiarism detection. Dcoetzee 05:02, 13 January 2009 (UTC)
Ok, on reviewing, I'm stuck at "Attributing media borrowed from other sources". I'm (literally) sitting here with my 1910 version of Conan Doyle's Best Books with a picture of Arthur (frontispiece) that I want to scan. We're saying here that I can't use any "self" templates - so what do I use? Keep in mind that I'm goal-oriented, I'm gonna upload it anyway, and I'm gonna be pissed off at whatever bot puts the colour-y thing on my talk page. Franamax (talk) 11:53, 13 January 2009 (UTC)
You have a 1910 Conan Doyle? Cool! Actually, I'd like to know the answer to that one. I find images hard to work with. :) Wikipedia:Upload doesn't seem to have an option for "It's really, really old." --Moonriddengirl (talk) 12:17, 13 January 2009 (UTC)
Heh - when I moved here to Kitsilano two years ago, the first thing I did was check out the local music and book shops. Two blocks and five shops away from my place, honest to God, I walked in and asked the same thing I've asked for 20 years: "Do you have The White Company by Conan Doyle?" She said, "Yes, it's over here." That was my welcome to Vancouver. :) Franamax (talk) 12:30, 13 January 2009 (UTC)
Whew - on further review, that's a pretty radical rewrite. Lots of good stuff is added, although discussion of free sources is gone, it's moved over to "borrowing" now, and - well, just too difficult to assess in one single diff. Dcoetzee, I'd be inclined to revert and ask you to make your changes section-by-section and reshuffle-by-reshuffle so it would be easier to discuss, but that would set back the work you've done. I guess I have a choice here to either sum up my to-do list and ask back some of the original people (what I was planning to do), or just shrug and walk away. In any case, I'm off for a few days and will have only occasional access. Good luck! Franamax (talk) 12:15, 13 January 2009 (UTC)
Ack! You're leaving just as I'm getting into it? Enjoy whatever you're up to. :) Obviously, I'm not as familiar with what existed as you are, but the only thing I see that's been wholesale removed is the section on acceptable sources. Do you disagree that a discussion of acceptable sourcing is off topic for addressing plagiarism? As I indicated above, I'm inclined to think that a simple note of "It doesn't matter where you find information or ideas—whether it is copyrighted or free content—you should acknowledge your source" about covers the question of acceptable sources as it applies to plagiarism. I worry that if we seem to have "instruction creep" we might contribute some confusion to the question of whether this document is necessary or redundant. (And I certainly acknowledge that my first major addition crept far into the land of copyvio....) --Moonriddengirl (talk) 12:34, 13 January 2009 (UTC)
My apologies for doing too much in one edit! I can sum up for you the gist of what I did: I created two main sections, one for creating new content based on free sources and one for repairing existing plagiarism; I eliminated most of the explanation of acceptable sources to copy text from, such as Copyright-expired works/Copyright-ineligible works, since that's a copyright concern; I moved the content "How to properly attribute public domain material" into a subsection of the new section "Attributing text borrowed from other sources"; I moved the last paragraph of this section regarding images into the new section "Attributing media borrowed from other sources", to separate discussion of text and media; I moved the first paragraph of "How to address copied text or images" regarding the EB 1911 articles into its own subsection of "Attributing text borrowed from other sources" and expanded on it a bit. If you have anything else you'd like me to do please just let me know.
As for your public domain image, the simplest thing to do is to go to the upload page, click "Other", and then add your own license tag (see Wikipedia:Image_copyright_tags/Public_domain for the full list). Most PD images are also eligible for upload on Commons; it doesn't have a nice list, but it has Commons:Category:License_tags and Commons:Licensing. Dcoetzee 20:11, 13 January 2009 (UTC)

Plagiarism, definition and intent; guidance on how to paraphrase

I have tried to address some of the concerns about the definition of plagiarism, including intent, in the proposal, here. I'm still wondering if we ought to try to define how to properly paraphrase here or if it is sufficient to provide links to universities that do so as we currently have. This, again, is raised by the fact that WP:C currently directs users to article space to find out how to revise to avoid plagiarism. This hardly seems appropriate, since article space doesn't have either the force of policy or guideline. Franamax seems to agree here that it might be useful, and I'm willing to try to tackle it if others don't think it's wandering too much. It could be a new third section, after "Attributing text borrowed from other sources". It would probably primarily be a (cited) rehash of various university guidelines, such as this one. (Oh, and I've archived this talk page because it was almost 350 kb long. I also added the "talkheader" so as to have a handy place to keep archives.) --Moonriddengirl (talk) 14:51, 13 January 2009 (UTC)

A dissertation on the meaning of "plagiarism" is not relevant to Wikipedia policy, and would serve just as well in the article plagiarism linked from Wikipedia:Copyrights. Name-calling "plagiarism" and lengthy asides do not belong in Wikipedia policy. While a page on copying may be appropriate to summarize, connect, or supplant existing policies on copyrights and citation, the current proposed policy seems to be alternatively a) redundant, at length, with those policies; and b) irrelevant to Wikipedia policy.

More specifically:

  • Almost the entire lengthy introduction is reducible to "Material on Wikipedia copied from other sources must be GFDL-compatible and must be cited."
  • Section "Plagiarism defined" is a) a folksy definition for students that is inappropriate for a Wikipedia policy; and is b) not relevant to Wikipedia policy, which to avoid confusion should be concerned only with defining disallowed actions, that is, copying.
  • Section "Attributing text borrowed from other sources" and section "Attributing media borrowed from other sources" are entirely redundant with Wikipedia:Copyrights and Wikipedia:Citing sources.
  • Section "What is not plagiarism" is downright wrong: these examples, while not copyright infringement, still should have sources; indeed, sources are especially important for the pure facts of infoboxes. This is an example of the confused purpose of this proposed policy: it worries so much about "plagiarism" that it recommends bad practice for citations merely because omitting citations would not be plagiarism! This section is, however, a good example of a pithy summary of a subordinate (proposed) policy.
  • Section "How to respond to plagiarism" has promise, if it concerned itself exclusively with copying, though I suspect it is redundant with Wikipedia:Copyright problems and some of its subpages. —Centrxtalk • 04:29, 17 January 2009 (UTC)
Addendum: To drive home that "plagiarism" is not the proper subject of this policy: Much copying and alleged copyright infringement is actually done by the creators or copyright holders themselves, such as for advertisement. Even if you subscribe to an expanded definition of "plagiarism" the copied work must still be someone else's. All the policy on citation and copying, and the how-to on identifying copies, must apply to that popular mode of copying. If you notice academic policies against a student using parts of his own previous essays, this reveals how special the schools' definition of plagiarism is: the university's "plagiarism" essentially means "any academic act that violates policy". They may wish to invent such a title, but it is a tautology that does not add constructive meaning, and Wikipedia should not import confused disparagement. —Centrxtalk • 04:39, 17 January 2009 (UTC)
(1) I agree that the introduction could be shortened. (2) I disagree that this section is irrelevant and not simply because I drafted it. :) So long as it is currently titled, I think it's necessary to alleviate concerns of those who feel that plagiarism is a mens rea matter. Otherwise, the very title of the document does become potentially bitey. Perhaps it could be shortened and incorporated into an abbreviated intro. (3) There is nothing in Wikipedia:Copyrights or Wikipedia:Citing sources about "Copying within Wikipedia". Other material in that section probably could be shortened with a pointer to Wikipedia:Citing sources, although I think the pd attribution templates are handy to point out. I'm inclined to think the "media" section is beautifully brief and to the point. (4) I may agree with you the "What is not plagiarism" section overstates a bit, unless what is meant by it is that no inline citation is necessary. But citations are not required for "common knowledge." "Puppies drink milk" is common knowledge and also a simple logical deduction (mammals drink milk; puppies are mammals; puppies drink milk). I'm inclined to think that this could be condensed and included in another section (which would eliminate what you do like about it :)). (5) The closest redundancy that I can think of (and I spend most of my time at WP:CP and its various subpages) is Wikipedia:Copyright problems/Advice for admins, but information on how to evaluate for copying seems useful for all editors. I do not believe the media infringement material is redundant.
Your final note seems to be on the point that this guideline should be retitled. I don't know how others feel about that. I myself don't care what we call it, so long as it addresses what to me are the salient points: (a) unacknowledged borrowing is bad practice, (b) unacknowledged borrowing can be fixed, (c) here's how. (With, of course, subpoints of those as necessary to define/explain/expand.) --Moonriddengirl (talk) 20:50, 20 January 2009 (UTC)
  • Mens rea is irrelevant to Wikipedia policy. The policy is the same regardless of motive, and repeated infractions require the same response regardless of motive. See also Wikipedia:Assume good faith. Furthermore, this policy covers actions that do not require any mens rea.
  • The small parts of section "Copying within Wikipedia" that are not already present there, belong in Wikipedia:Copyrights or Wikipedia:Citing sources if they belong anywhere.
  • Common knowledge requires sources just as much as anything else, knowledge considered common is often very much wrong, and any statement of common knowledge can in fact be required to have a source: if challenged a source must be provided. To wit, it is even misleading to say "puppies drink milk": some puppies do not drink milk, though they can, and puppies that drink non-dog milk become sick. Sources clarify this; and leaving such a bare assertion is worse than plagiarism.
  • Evicting the facile notion of "plagiarism" and minimizing redundancy requires more than retitling the page. Unless Wikipedia:Copyrights and Wikipedia:Citing sources are also restructured, essentially all that belongs remaining in this page would be the How-to. —Centrxtalk • 19:54, 21 January 2009 (UTC)
You were the one who kept adding "intentional" to the definition. Mens rea seemed very important to you; perhaps I have misinterpreted that action. At this point, if I'm understanding you correctly to say that failing to source a commonsense assertion such as "Puppies drink milk" "is worse than plagiarism", I think perhaps the communication gap between us, at least, may be uncrossable. :) Perhaps an WP:RFC will help bring wider response. --Moonriddengirl (talk) 20:10, 21 January 2009 (UTC)
  • "Intentional" is accurate for the definition of "plagiarism", but irrelevant to Wikipedia policy. Either this page is about "plagiarism" and refers to intent, and is not a policy, or this page is about copying and does not refer to intent.
  • Misleading, superficial statements imported from the vague intuition of one's own "common knowledge" are indeed worse in an encyclopedia than copied text from a reliable source. —Centrxtalk • 22:46, 21 January 2009 (UTC)
Sometimes, reliable sources don't even seem to cut it. --Moonriddengirl (talk) 23:24, 21 January 2009 (UTC)
"Schaum's Quick Guide to Writing Great Research Papers" and "Guiding Students from Cheating and Plagiarism to Honesty and Integrity" are popularized How-To books for students that adopt the specialized meaning of "plagiarism" used in that industry. They are not reliable sources for the meaning in general; they are not even authoritative works within the field of academia; they are irrelevant, though they would not very well prove your assertion. The "Historian's Toolbox" specifically states "stealing" and "representing...as one's own", which require intent and misrepresentation. "Student Plagiarism in an Online World" is a tiny survey of student opinions, not a reliable source on the meaning of a word. —Centrxtalk • 05:46, 22 January 2009 (UTC)
In any event, it matters not: even under the broad school-wise meaning "plagiarism" is not a fruitful topic for a policy. Those student handbooks refer to turning in "someone else's" work, for assignments that are under one's name "as one's own". On Wikipedia, submissions are anonymous and appear under no one's name, and apparent copyright infringements or uncited text from the company website is prohibited even if the copier wrote the text and owns the company. —Centrxtalk • 06:05, 22 January 2009 (UTC)
Prove my assertion that the definition requiring intent is not universal? I think they do, quite handily, and if we took it to WP:RSN I suspect I'd get consensus on that. :) Proving a universal definition for a term that is utilized in many disciplines in many cultures is a bit difficult. (Here's a book that addresses at length cultural differences in defining plagiarism.) But you're right that quibbling over the definition is fruitless. --Moonriddengirl (talk) 12:35, 22 January 2009 (UTC)
To be clear, as I said above, a meaning of "plagiarism" without intent is still not productive for this page as a policy. That said, the readable part of the book you cite discusses the "concept" of plagiarism and "theft of intellectual property rights" in general; it does not appear to discuss the meaning of the actual word. The reliable sources on the meaning of the word generally, universally being irrelevant, the OED and Webster are clear. In any event, I repeat, submitting your own writings is far from plagiarism, yet would be caught up under this proposed policy, and the ways of identifying "plagiarism" and of dealing with "plagiarism" on Wikipedia are the same as the ways of identifying and dealing with copyrighted and uncited works in general. —Centrxtalk • 04:53, 23 January 2009 (UTC)
Okay. Shorter OED (1993; alas, the most recent I possess) says, "plagiarize...E18. 1. v.t. Take and use as one's own (the thoughts, writings, inventions, etc., of another person); copy (literary work, ideas, etc.) improperly or without acknowledgment; pass off the thoughts, works, etc., of (another person) as one's own." Improper copying doesn't require bad intent. The book above is focused on concept, but also differing definitions, including the term "inadvertent plagiarism" as it is used in the field of psychology (Cryptomnesia). "Inadvertent plagiarism", obviously, is an oxymoron if the essential definition of plagiarism involves intent. In some definitions, it would be. In others, it would not. As far as self-plagiarism and this document, this document currently says, "Plagiarism is the taking of someone else's work and passing it off as one's own, whether verbatim or with only minimal changes" and refers to "duplicating the work of others without credit", so there already seems to be language in place intended to prevent charges of self-plagiarism. If you can identify the section that you fear will be used to prevent self-recycling, maybe we can clarify that. Of course, we do have the challenge of verifying identity. Since copyright is a legal matter, we can't take people's word for it when they say they are Dr. Imminent Authority, publisher of "Important Document", but must have external verification. Perhaps plagiarism will allow for more assumption of good faith; I don't know. The other matter--which seems to come down to overall redundancy--seems to be one on which we simply disagree. WP:C is narrowly focused on the matter of legal concerns. It doesn't care if you cite your public domain source or not. Wikipedia:Citing sources is specifically a Style guide (so it says), and in my opinion is not really the proper place to carry the load of defining academic integrity on Wikipedia. A separate document seems ideal for that to me. 
I respect that your opinion on this matter differs, but this may be, again, a matter that will require wider community input to resolve as we find out where the will of the community lies. --Moonriddengirl (talk) 12:17, 23 January 2009 (UTC)
It is not a matter of opinion. Any non-short coherent policy on this subject must necessarily cover violations that are not plagiarism. This policy ought to be designed for the concept that covers these violations, not confused with plagiarism. Whether "Important Document" is plagiarism makes no difference for identifying and correcting its uncited unvouched presence. If the purpose is an accessible summary of Wikipedia:Copyrights, or to create a policy that all text copied from elsewhere must be attributed, those purposes are not accomplished by a page on "plagiarism". —Centrxtalk • 03:47, 24 January 2009 (UTC)
So, we're back to the question of title. --Moonriddengirl (talk) 12:39, 24 January 2009 (UTC)
No, the entire page is infused with the limited concept of "plagiarism". There is even an entire section on its definition! —Centrxtalk • 21:37, 25 January 2009 (UTC)

Paraphrasing considered harmful

I'm uncomfortable with having guidelines for paraphrasing to avoid plagiarism. Here is my reasoning:

Paraphrasing is used to avoid copyright violation, not to avoid plagiarism. There is no need to paraphrase to avoid plagiarism. To avoid plagiarism, you cite your source. If you cite your source, you have not committed plagiarism. If you fail to cite your source, you are likely to be committing plagiarism and you are certainly violating Wikipedia's rules, even if you paraphrase.

Plagiarism and copyright infringement are independent: you can commit either one, or neither, or both, depending on the situation. Copyright prohibits using the creative aspects of certain works without permission: if you have permission of the copyright holder, you can copy verbatim without attribution without violating copyright (but not on Wikipedia). You can copy unattributed non-copyrighted work verbatim without breaking the law (but not on Wikipedia).

Paraphrasing is a mechanism that is intended to permit a writer to extract the non-creative portion of a copyrighted work. There is no equivalent for plagiarism: if you use a source, you either cite it or you are plagiarizing, and this is true whether or not you have "removed" the original creative element. As an example, copyright does not protect "sweat of the brow." That is, even if someone goes to a great deal of effort to compile a huge database, the database is not thereby copyrightable (in the US). However, if for instance you are a scientist and you use such data without attribution, you will be severely censured.

In my opinion, when a work is not in copyright, we pay much more respect to the original author by copying verbatim than by paraphrasing. We should paraphrase only when we are forced to do so by copyright law. -Arch dude (talk) 01:50, 21 January 2009 (UTC)

Paraphrasing also exists to more concisely summarize material, say in compilation with other sources...which is often part of what encyclopedists do. We may often be put into the position of needing to paraphrase even public domain material, unless we're going to compete with Wikisource. :) And, of course, even if you cite your sources, you can plagiarize whilst paraphrasing if you give the impression that you are summarizing when you are actually reproducing. For example, see this and this (which notes "Even with attribution, plagiarism can exist if the writer paraphrases excessively or quotes without using quotation marks.") --Moonriddengirl (talk) 02:38, 21 January 2009 (UTC)
You are correct: paraphrasing can and should be used to create a more encyclopedic presentation. But this has little to do with plagiarism: just cite the source. Your other argument is that even citing a source can result in plagiarism if the result appears to be your own. This is relevant in an academic environment, but I do not believe that it is relevant to the Wikipedia environment. In academia, a paper is assumed to be the work of the listed author or authors. Here at WP, the article does not have listed authors and the reader should already presume that an article is a collaborative work. Any reader interested in actual authorship will need to look at the edit history, and any editor who copies or paraphrases a PD work should attribute the work in the edit summary. Again, paraphrasing cannot mitigate plagiarism and should not be encouraged for this purpose. -Arch dude (talk) 00:11, 22 January 2009 (UTC)
I find your note about the different expectations of academia persuasive. Given that, I think that on reflection I agree: paraphrasing is more an issue for copyright on Wikipedia than plagiarism. --Moonriddengirl (talk) 00:31, 22 January 2009 (UTC)
It depends on exactly what "paraphrasing" means. A paraphrase can be plagiarism if the source isn't cited; even if cited, it should be made clear that the structure and selection of ideas are taken from elsewhere (and not only information). In fact, a close paraphrase can even be a copyright violation in certain circumstances, because creative expression isn't limited to literal wording. So we should encourage editors to express things in their own words, but not give the impression that lengthy close paraphrasing is necessarily "safe". --Amble (talk) 03:04, 21 January 2009 (UTC)
But again, your argument is about paraphrasing to avoid copyright infringement, not about any possible relationship between paraphrasing and plagiarism. My complaint is that there is no such relationship. -Arch dude (talk) 00:11, 22 January 2009 (UTC)
Shouldn't it be judged by the same standard though? To judge the sufficient level of paraphrase to avoid plagiarism, the standard to avoid copyright infringement would be the same. (In other words, "pretend that it's copyrighted") Otherwise, blockquote or make explicit that "this is a direct copy" in one edit, modify in the next. That is the crux for me, am I representing these words/this structure as my own? I have very few choices, a source, an edit summary, my pseudonym on the edit - so how do I indicate honestly that I'm copying stuff almost word-for-word, with a few changes? Franamax (talk) 00:50, 22 January 2009 (UTC)
No. Plagiarism and copyright violation are not mutually exclusive. When a close paraphrase preserves structure and selection of material, there is a risk of plagiarism. In some extreme cases, it may also constitute copyright violation as well. The entire point of a policy on plagiarism is that our standards are higher than the bare minimum demanded by copyright law. My point is simply that organization and choice of material matter too, not only literal wording. This is true for copyright and it's true for plagiarism policy. I think we agree that it's not helpful to present paraphrase as a blanket solution. --Amble (talk) 01:01, 22 January 2009 (UTC)
But structure and selection comprise elements of copyright too, crucially so in some cases. Minimal rewording while using the same structure is still copyvio (while also being plagio). Do you have a specific example of the distinction for a copyright work, where a modification is not a copyvio but is still a plagio?
This guideline developed largely around usage of PD works, and the grey area of GFDL/CC-BY stuff, with attention to the general issue of plagiarism (as in accusations thereof). The distinction between copyvio, plagiarism, moral rights, and whether or not Wikipedia aspires to a higher "ethical" standard than the rule of law is a bone of contention, so please do expand on your thoughts on higher standards. Those standards are important to forming consensus on this guideline. Franamax (talk) 02:29, 22 January 2009 (UTC)
Yes, copying the structure while paraphrasing the wording could be a copyvio. However, the only case I know of, Salinger v. Random House, was an extreme case concerning unpublished personal letters. Perhaps a simpler case of closely paraphrasing an encyclopedia article would also be found to be a copyvio; I don't know. The one example I know of was from an article (now deleted) in which paragraphs were constructed by stringing together sentences from a few (cited) sources, with clauses rearranged and a few words replaced with synonyms. Several editors believed that this sort of paraphrase was acceptable, but I argued (and the consensus seemed to be) that it's unacceptable plagiarism regardless of whether or not it's provably a copyright violation. My main concern is that our guidelines not encourage people to build articles in this way. --Amble (talk) 05:38, 22 January 2009 (UTC)
Wherever it may wind up, I'd also very much like to see people discouraged from this practice. :) --Moonriddengirl (talk) 12:37, 22 January 2009 (UTC)
No. Copyvio and plagiarism should NOT be judged by the same standard. We should directly copy and cite PD works explicitly to ensure that we preserve and attribute the original author's words to avoid plagiarism. If the law allowed it, we would do the same for copyrighted works, but the law does not allow it, so we are forced to paraphrase to remove the copyrightable creative elements. After material is incorporated by either of these means, we may choose to (further) paraphrase for editorial reasons to make the article better. Neither of the reasons for paraphrasing (copyvio avoidance and editorial improvement) has anything to do with plagiarism, so a discussion of paraphrasing is not appropriate for the plagiarism policy. -Arch dude (talk) 00:30, 23 January 2009 (UTC)
I think we are talking about different issues here. I have in mind use of copyrighted works where excessive close paraphrase constitutes plagiarism, but may or may not reach the level of copyright violation. From my limited knowledge, the application of copyright law to paraphrases has not been widely tested and doesn't give a clear guide. Your concerns are somewhat different, since you are discussing the use of public domain works as sources of article text. I don't disagree with your points regarding public domain works. --Amble (talk) 01:28, 23 January 2009 (UTC)
Fine. In that case, we need to make it clear that if you paraphrase to avoid copyright infringement, you must still cite your source: paraphrasing, for any reason, does not relieve you of your responsibility to cite your source. This is an even stronger reason to avoid recommending that editors paraphrase to "avoid plagiarism." Plagiarism avoidance is never a reason to paraphrase. The plagiarism policy should have a statement similar to the following: "Paraphrasing does not mitigate plagiarism and should not be used for this purpose. If you need to paraphrase for some other reason (to avoid copyright infringement or to improve the encyclopedic content or tone) you must still cite the original source." Since paraphrasing is not used to mitigate plagiarism, there should be no encouragement of the practice and no "how to paraphrase" section in this policy. If a "how to paraphrase" section is added to another policy (e.g., the style manual or the copyright violation avoidance policy) then that section should point back to the plagiarism policy to clarify the (non)relationship between paraphrasing and plagiarism. -Arch dude (talk) 04:42, 23 January 2009 (UTC)

←Just to note that there is now an essay on the subject at Wikipedia:Close paraphrasing. --Moonriddengirl (talk) 00:06, 31 January 2009 (UTC)

Best practices

The essay now contains a passage that is misleading in certain ways. It states: "Material from public domain and free sources is welcome on Wikipedia, provided it is properly identified and attributed. The best practice is to copy free content verbatim and indicate in the edit summary the source of the material. Further changes such as modernizing language and correcting errors should be done in separate edits after the original insertion of text. This allows a clear comparison to be made between the original source text and the current version in the article."

What is meant as best practice is that, in the evolution of articles relying upon public domain material, it's better for text-tracking purposes, and as a favor to later editors, for any public domain text to be put into an article clearly, in one well-labelled chunk. That is better than pasting in PD text and, in the same edit, changing some of the wording, which makes it hard for others to separate out later what came from the original source, if issues come up.

That is not at all, however, best practice in my view, and I think many would agree with me. In my view, if public domain material is going to be introduced, it is best practice to put any such passage from public domain into blockquotes or quotation marks. Then, in later edits, editors can reword material and remove it from quotation marks, when the source no longer needs/requires crediting for its wording. Proper attribution of public domain text involves giving credit both for the source of facts and ideas as is done by footnoting and for wording, which is done by quote marks or blockquoting supplemented by footnotes as to the exact pages of source material.

I argue this latter approach is "best practice" because it is far more rational for the development of articles that may ever become featured articles. Featured article standards have now evolved to disallow use of public domain text covered by a generic PD template. I believe that from some interactions in FAC (though I was never involved very much there and I am not sure how strongly ingrained this is). Also, it should be mentioned somewhere that articles built of pasted-in material are NOT eligible for DYK consideration. Pasting in is not "best practice" if you would like your work to be highlighted in DYK or FA.

Also, I assert that the "best practice" for use of Federal government material on historic sites is not to paste in public domain material, but rather to go by the second approach. On this I believe I am speaking for most wp:NRHP editors. It may be that "best practices" for use of public domain material vary across types of public domain material and across wikiprojects involved in handling those types of sources.

So I think the paragraph should be revised to keep its good idea about separating the paste-in from subsequent edits, and it should describe more than one approach to using PD text without advocating the paste-in, free-for-all approach that it describes. Also, it should not be asserted so broadly that PD material is unconditionally welcome in wikipedia. This passage has just now caused some difficulty between a new editor trying to do right, bumping up against evolved standards on use of Federal material on historic sites, and some experienced editors, including me, who don't want to loosen our standards. doncram (talk) 00:44, 24 February 2009 (UTC)

A recent case at DYK and wp:NRHP has made it abundantly clear that the "welcome" message serves new editors poorly. It is obvious that in 3 areas of wikipedia, at least: NRHP subject articles, DYK nominated articles on any subject, and Featured Articles, that paste-in public domain material not clearly separated by quotation marks or blockquotes from other text is generally really rather unwelcome. Paste-in material from DANFS may or may not still be as welcome as it once was in wp:SHIPS articles; I believe it is no longer welcome in FAC articles from that wikiproject. Paste-in material from eb1911 was once welcome, but it is widely believed that mistakes were made in how it was brought in, and that it is very costly later to try to weed out the pasted-in material as many have been doing. I am also sure that there are vast areas of public domain material which is unreliable, offensive, or otherwise un-encyclopedic. If there are no objections, I will edit this draft guideline to modify, considerably, the welcoming of any and all public domain material. doncram (talk) 01:25, 26 February 2009 (UTC)
I strongly disagree. I'll come back with more later, but I agree with the discussion above stating that Wikipedia is a collaborative project, and incorporating public domain works allows us to collaborate to a greater extent. If someone else has already compiled research for us, it is a waste of our time, pure and simple, to rewrite it (unless issues of reliability, style, etc. so dictate). There is a lot of public domain content on areas that are undercovered in Wikipedia. The quickest way to jumpstart our coverage is to copy-paste, and subsequent editing should get such articles into fine shape. Consideration of the reliability of the source is appropriate, of course, but this is true for any writing here. Calliopejen1 (talk) 07:00, 26 February 2009 (UTC)
doncram, that wording resulted from extensive discussions about use of PD text. Perhaps it should say "best practice for when you need to modify the PD text". There is an essential tension here: we are allowed to freely incorporate PD text into our articles (as witness the thousands of articles created with {{EB1911}}); we must accurately attribute all sources; we must correct factual errors / update with new information; and we should reword old sources for contemporary use.
A blockquote is a great way to go iff it can be maintained. Perhaps a minor correction can be inserted following the quote, as discussed in the archives. PD text can be completely rewritten over time and morph into something completely different, an example case is discussed in the archive (Anadyr River). And I can't find it in the archive right now, but another approach would be to read the PD-source, change it on the fly while typing it in, then attribute the entered text to the PD source. It's that last method that the "best practice" wording is meant to discourage. We need to indicate at some point what exactly was from the public domain. Many of the contributors here agreed that was a clear line: this was from the public domain; here is where it was changed.
There is also (or at least was recently) text indicating that blockquotes are best for verbatim copying. However we also need a best practice for how to introduce and then extensively modify PD text. It's fine to say the best way is to completely rewrite it, but we also need to be open to different editing methods - and as also noted in the archives, lots of people want to contribute, but they're not all good writers.
As far as featured content goes, while I do appreciate your concern in that area, and I would agree that no-one should get a DYK for copying 1500 words out of an old book (or should they maybe? they've created an article and attention is attracted so that it can be rapidly improved, the purpose of DYK, right?) - I think those considerations might be better expressed in the FA and DYK guidelines, where they will be directly relevant. Franamax (talk) 08:05, 26 February 2009 (UTC)
No, the consensus in DYK is that no one should get credit for that. Blockquotes are specifically excluded from DYK word-counting formulae, and failing to put stuff into blockquotes that should be in blockquotes does not earn an exception. I do agree that it would be nice if there were very clear FA and DYK guidelines which could be pointed to. I will look for specific guidelines there, and I can invite regular participants from there to comment here. I will also look to clarify public guidelines in the wp:NRHP area. I don't expect you simply to take my word on it; please allow me to assert that I know from some experience in FA/FAC reviews and in DYK and in wp:NRHP that copied-in PD text is often not welcome. My main point is that the current "PD is welcome" message is a gross overstatement of the actual state of affairs, in at least some process areas of wikipedia (DYK and FAC), in some content areas (wp:NRHP), and with respect to some sources of public domain material (material that is offensive, unreliable, and unencyclopedic). There are new editors who come to wikipedia and believe they can make a contribution by pasting in text that they believe is PD. It is often not helpful and is counter to practice in some areas of wikipedia, and they should not be unduly encouraged to come and paste stuff in. In practice, it is very touchy telling a new editor that the contributions they are making are not helpful, or are problematic for purposes of building articles towards GA and FA quality. It is better for them and for the "regular" editors in an area not to encourage them incorrectly to believe that mixing in PD text will be appreciated. Encouraging the addition of PD text where it is not in fact welcome, as this wp:plagiarism draft guideline currently does, is in fact mean to new editors.
I'm not commenting about paraphrasing or not, I'm not saying material must be paraphrased, it is just clear that practices in some areas of wikipedia is now that verbatim text, if used, is put into quotations or blockquotes and credited both for the wording and the source, not just for the source. It is, in practice, mean to some new editors to assert otherwise in this guideline. doncram (talk) 08:38, 26 February 2009 (UTC)
I (think I) know where you're coming from - and I'm not suggesting you change the DYK criteria, I was just riffing there. I can see from the point where you work that you are faced with many, and often new, people looking for recognition for contributions which turn out to be not their own. To that subject, the current wording may actually be good, since it encourages people to clearly note the exact text they have copied and thus allows you, the reviewer, to ably evaluate how well they have adapted and rephrased it. Otherwise, it seems to me the incentive would be to copy PD with enough subtle changes on entry as to make it very difficult for you to trace down the true source. So we wish instead to present a clear "correct" path to using PD text.
As regards the exact wording of "best practice", indeed it needs to be matched to the desirability of using blockquotes - but recall that bq's preclude incremental changes. It's great to say that it either has to be within an inviolable quote or completely rewritten, and I kind of agree, but that's not really compatible with the style of a wiki, which moves in fits and starts. Fundamentally, we wish to encourage addition of content and remove obstacles to adding it, so if we can draw the clearest path to adding PD works, we should do so.
Now as to editors who "believe" text is PD, that's always going to be a problem. But at the very least, if we encourage them to copy verbatim and cite the source, we can evaluate the text and look for the overlap to WP:COPYVIO.
As far as being mean to new editors, heh, look at the top of the window where it says Wikipedia - that's another word for "mean to new editors", there are dozens of mean places and FAC/DYK is not excluded. But you may be talking about those new editors who come seeking rewards such as DYK's and think there may be an easy path. There just isn't, but I'd respectfully suggest that is your problem, not mine. Put another way, the purpose of the Plagiarism essay is to address the general concept and show the path to adding PD content to en:wiki properly. This essay is not concerned with qualifying for FA/GA/DYK status - that is procedural, not encyclopedic. The vast majority of editors here, and I would suggest the large majority of editors who ever read this, are not concerned with that particular area and are instead amateurs in the true sense of the word. Franamax (talk) 09:27, 26 February 2009 (UTC)
I think it is a basic aspect of respect to mention the source and author of a work that previously contained the information that is being used, regardless of whether it is PD or not. If Wikipedia wants to be a better encyclopedia than Britannica, it would be best to follow these practices. What would this mean exactly? Well, we can work out those details later. But I think we should strive for this ideal. Ottava Rima (talk) 15:25, 26 February 2009 (UTC)
I agree with this statement. We need to get away from just adding a note at the bottom re: PD text, to actually using footnotes to explicitly say which parts come from where. When I create articles from PD sources (see Mali for an example that uses a significant amount), I use the template {{PD-notice}} (which I created) to mark footnotes where PD text has been incorporated. Calliopejen1 (talk) 20:39, 26 February 2009 (UTC)
(I haven't read this whole thread, I'm just jumping in) I know there's no legal problem with using PD text, but I have always maintained that it's still not desirable and, while we don't necessarily have to edit this guideline to say "you will burn in hell if you use PD text," we should at least not be encouraging it. For one thing, it reflects poorly on the encyclopedia: most readers don't know a lot about the difference between PD and copyrighted text, and when they look up an article and find it to be the same as something they read elsewhere will think "ah, Wikipedia sucks, bunch of plagiarizers"—indeed, the first time I encountered a WP:SHIPS article that was entirely DANFS text I almost marked it for speedy deletion because I thought it was copyvio. Secondly, even if it's not "wrong" to import PD text, there's no good reason to do it if you're article-building. I'm often struggling to find good references for an article and have a nice, meaty Footnotes section...so why throw away a reference by simply copying it, when you could instead stick it in <ref></ref> tags and beef up your references section and build a nice article out of it? rʨanaɢ talk/contribs 17:13, 26 February 2009 (UTC)
The answer to your question is the History of Cambodia series, and many similar areas of Wikipedia. Not using (as in copy-pasting) this is nearly as good as throwing it away, because no one in the near future (five years? ten years? who knows) is going to research the damn thing themselves. Where we have reliable tertiary sources in the public domain, this is a GREAT reason to use the text for article building. Calliopejen1 (talk) 20:33, 26 February 2009 (UTC)
And just to clarify, I don't have any big problem with explicitly saying PD text is not welcome in DYK (though to a certain extent that is foolish, because tracking down and formatting PD text can take TONS of time, as I know from personal experience, and is often worth rewarding) or FA. These processes should be able to set their rules however they want. My big problem is with no longer saying that PD text is welcome as a general matter. Calliopejen1 (talk) 20:37, 26 February 2009 (UTC)
I appreciate what Calliopejen1 is saying with respect to pasted in material providing something for obscure subjects in wikipedia. It is Calliopejen1's personal opinion, not a fact, though, that putting PD text in place for those subjects speeds the day in which a better article is in place. I happen to believe that in practice, having pasted-in PD text often blocks progress. For one thing, it tends to be daunting for new editors, if there is massive material in place written in a certain way. Also, it tends to engender edit wars when others wish to develop the article, perhaps by wiping the slate clean of mixed up PD text and other writing. Also, to whatever extent others do edit the PD text mixed into an article, that work is wasted if the entire material is later wiped out by those wishing to start fresh, in which cases it would seem better for the PD text not to have been put in in the first place. There are differences of taste present, and it is a matter of opinion about which process of article development works best or fastest. doncram (talk) 00:15, 27 February 2009 (UTC)
Okay, I took a shot at revising the "welcome" passage and a bit more, to make it clear that pasting in text is not always welcome. I have tried to put forward a positive example of one place where public domain text has been welcomed, in ships articles using the DANFS source. I've asked one ships editor to check what I wrote. I don't want to discuss whether or not wp:ships and wp:NRHP should take the differing positions that they do about different public domain sources that are available in their areas; I mainly want to get across that there are differences and that PD text is not universally welcomed, regardless of its quality and the status of the wikipedia articles to which it might be added. Also, I think it needs to be said that it is okay to treat public domain text like other text. I have been involved in situations where another person, adamant that PD text "can" be mixed in without violating copyright law, took the ridiculous position that PD text cannot be quoted and footnoted like other text. It needs to be said that you can quote from PD sources and treat them just like other sources, as I believe is now generally done in wp:ships articles brought up to FA status. doncram (talk) 00:15, 27 February 2009 (UTC)

Doncram, I have some major concerns about your recent changes:

  • You've vastly increased the length of text. People just don't read that much in one shot.
  • You're hedging around the plain fact that PD text is acceptable, provided it's properly attributed. Whether or not it's relevant is an editorial decision.
  • You are leading off the section by indicating that PD must add to the existing article - but that's most often not the case, PD is more often used to start an article. Some (short) wording about "welcome if it adds significantly" would be good though.
  • You're discussing DANFS, but EB1911 is already in there as an example of generating articles.
  • You're discussing FA and DYK criteria. Those have nothing to do with a guideline on plagiarism, they have only to do with FA and DYK. Have you beefed up those guidelines so that editors interested in those achievements are aware of the requirements? The most needed here is along the lines of "use of PD text may affect article assessment". People reading this are looking for information about plagiarism, not how to get pretty stars for their userpage.
  • You're changing the message. When PD text is copied verbatim, it must be attributed, either in the edit summary or elsewhere. If it's not, that is the very definition of plagiarism. That may not have been worded emphatically enough in the existing text, but it looks to me to have been diluted further.

Please revisit your changes. I'm inclined at this point to just revert them and start over to address some of the concerns raised above. And please be aware of tl;dr - we really need to keep things concise to have an effective message. Franamax (talk) 00:23, 27 February 2009 (UTC)

Thanks Franamax for your comments. I don't know what you mean by "tl;dr", by the way. I'll respond point by point here, rather than within your comments.
    • Length. Yes, what I wrote is perhaps now too long for the purpose here. I think that providing some positive and negative examples of where PD text has and has not been welcomed in wikipedia is important. This could be relegated to a separate article, perhaps "Wikipedia:Plagiarism/Past use of public domain text in wikipedia"? I would object to your simply removing these examples, but I do think it could be appropriate for you or someone else to go ahead and split some out to a new article, linked from here.
    • PD is acceptable? Acceptable for what purpose, where? I am trying to get away from too-broad statements that PD text is unconditionally "welcome". I don't think it is unconditionally acceptable, either, in real practice, and on various policy and editorial grounds. Editors are coming here for guidance who want to add in big blocks of PD text, and they need to be told, here, that it is not always welcome or "acceptable", although it may be legal in terms of copyright law. If you want to say that PD text is often acceptable, instead of saying it is often welcome, that is okay by me though.
    • Saying that PD text has often been used to create new articles is okay by me, that is accurate. It needs to be said that is often not welcomed, though.
    • I didn't actually see the EB1911 mention. That is mentioned only in a section further below. At least that section mentions big campaigns to add PD material can be controversial. I think that section should be integrated into this one. It perhaps could be said that adding any new PD material can be controversial, because it can be seen as the start of a campaign to add a lot of PD material. It is not necessary to give too many examples, others can be relegated to a separate linked article. But, I think giving at least 2 examples to convey that the PD sources vary, and the acceptability of adding them varies, is really needed.
    • Discussing DYK and FA criteria is very relevant here, I think. People should not be encouraged to add PD text without being given some idea that their additions will likely be removed, eventually or immediately. It should also be suggested that there is an immediate reason (getting DYK recognition) to put the PD material in blockquotes or quotation marks, which is one acceptable treatment. This could be suggested more briefly, though, I agree.
    • I did not mean to dilute the message that PD text must be attributed. Please restore the stronger language where necessary. I do think the previous version too strongly suggested adding PD material with a PD attribution template, in lieu of adding PD material using regular quoting and sourcing. That is a valid alternative, often preferred by editors in some areas, and it needs to be suggested as a viable option.
    • I would rather you tried to work with what I added than revert to the previous version. Don't you agree that I added some legitimate, important points and relevant examples that clarify matters? I do agree it should be done with less wordiness. doncram (talk) 01:28, 27 February 2009 (UTC)
Generally, yes. :) And looking over the details of your response, yes. I've just spent the last two hours going over the last two months of changes and I have some notes that might amount to a major reorganization of sections and paragraphs, with the aim of incorporating recent views and making it all more readable. On the other hand, I could well fail in the attempt. Should I accept this mission, it would likely start with moving back beyond your changes of today, with the aim of re-incorporating the intent of your changes. Or not. Seems we're singing from the same musical score though. :) It's just that the normal editing process has not resulted yet in a coherent document. Franamax (talk) 03:26, 27 February 2009 (UTC)
Okay, go ahead and put in a complete rewrite, or edit in place, either way. I appreciate that you've received my input, thanks. doncram (talk) 04:43, 27 February 2009 (UTC)

This section is now disharmonious with the free-licensed section. Why do we have so many caveats for PD text and so few for GFDL text? Calliopejen1 (talk) 19:31, 1 March 2009 (UTC)

I agree, actually. The guideline could do with a complete rewrite. doncram (talk) 19:58, 3 March 2009 (UTC)

He's apparently added all this text to this page to retroactively win a dispute he had (which he'd already won by making me leave Wikipedia, I guess this was the coup de grace). Is this really what you people do with your lives? lol. --Miss Communication (talk) 00:33, 11 March 2009 (UTC)

Indeed, my actions here are an immediate response to, and follow-up on, recent interactions with Miss C, in which she invoked the "PD is welcome" message from here. I'd rather say I was taking Miss C's points about the guidance given here and elsewhere seriously, for the sake of future new editors, than that I was trying to "win" an argument with her. doncram (talk) 02:19, 11 March 2009 (UTC)

possible development of wp:pd

Perhaps much or all of this discussion on the use of public domain material should be included in a positive Wikipedia content guideline on using public domain material, rather than under the negative label of plagiarism. And it seems onerous for wp:Plagiarism to cover all proper practices; it should be more about identifying attribution problems and how to contest or remedy them, I think. I think that wp:PD, which is labelled as a content guideline page, should carry a lot of the burden for describing proper options for public domain material use. I've opened discussion at Wikipedia talk:Public domain#where are the guidelines? towards redeveloping that page to serve this purpose. doncram (talk) 19:58, 3 March 2009 (UTC)

Creative "use of force" ...

Wikipedia articles may embed full verbatim texts only when those sources are licensed PD or GFDL. GFDL texts must be attributed to the author. But what about all other kinds of free-licensed text (mostly CC-BY and CC-BY-SA)? According to some people I have met, CC-BY text cannot be copied verbatim into Wikipedia, but needs to be rearranged in order to avoid a charge of misusing the original license. Brief explanation: the Italian Army has officially stated to Wikipedia that most of the content on the Army website is licensed under CC-BY. Some Wikipedians on it.wiki do not consider text released under the CC-BY licence fully compatible with Wikipedia's GFDL policy, as they consider that the CC-BY licence "would force" the status of Wikipedia articles to CC-BY too, which is incompatible with Wikipedia's present GFDL choice. How can we deal with this, and where can we find useful information to help resolve this dispute? --EH101 (talk) 18:43, 28 February 2009 (UTC)

Hmm, I can see arguments on both sides of this. I'm going to copy it over to WP:Media copyright questions and see if anyone there has an opinion. See Wikipedia:Media_copyright_questions#CC-BY_vs._GFDL Franamax (talk) 22:10, 28 February 2009 (UTC)

Direct translation and plagiarism

Hello, could someone add a cautionary note about direct translation and copyrighted derivative works? Since an issue relating to plagiarism based on a direct translation from a foreign language is being discussed at ANI, I think this issue needs to be addressed on the essay or guideline page. Thanks.--Caspian blue 19:15, 5 March 2009 (UTC)

Dispatches article on plagiarism

Please see Wikipedia:Wikipedia Signpost/2009-04-13/Dispatches - the Signpost's Dispatches' article on plagiarism. Carcharoth (talk) 19:30, 13 April 2009 (UTC)

Attribution templates

Useful reading material at WP:FCDW/Plagiarism.

These templates should all be deleted and this practice of wholesale copy/pasting from other sources into Wikipedia with only a notice at the bottom that the text was ripped from another source, even though that source is "public domain", should be stopped and future practice of it strongly discouraged. This is something that should probably come down from the level of the Wikimedia Foundation itself. Cirt (talk) 05:36, 15 April 2009 (UTC)

Cirt, it may be a long slog through the talk archives here, but there is extensive discussion on the wiki-historical practice of incorporating PD text and how exactly to manage the transition to mercilessly edited text. Many thousands of our articles are based on PD sources, EB1911 is an excellent example. When you advocate deleting the templates, you imply deleting the PD text also. Beyond the loss of content, this becomes impossible for well-integrated articles.
More generally, we have no policy restriction to prevent incorporating free sources. An imperfect analogy is with PD images or audio clips - we use and reuse them all the time. You can compare for instance to Durova's vast work "mercilessly editing" historic images to improve their presentation quality. She states the PD source and notes the changes made. The new work becomes part and parcel of the article. The analogous process with text is no different - we are free to incorporate PD text and improve it. However, there is a large discussion as to how exactly we go about that.
And even more generally, if you explicitly say "I copied this" - it's demonstrably not plagiarism. Franamax (talk) 08:00, 15 April 2009 (UTC)
Cirt: Why? Stifle (talk) 14:28, 15 April 2009 (UTC)
This is clearly a big issue and I am not sure myself what the best answer is or the best way to address this - I just wanted to bring this up on this page because it is a discussion the community should have, to avoid plagiarism. Essentially, read WP:FCDW/Plagiarism for more info. Perhaps it is difficult to address the troubling situation of the many articles that copy/paste from other sources without indicating which specific parts of their text are copied verbatim - but maybe going forward we should stop this practice (copy/pasting verbatim text from other sources without attributing each specific part of that text) and instead develop a better way to use these public domain sources, so as to avoid plagiarizing them. Cirt (talk) 08:28, 16 April 2009 (UTC)
I have to note that there's already healthy dissent with that portion of the dispatch at its talk page. Although I didn't write that section, I am one of the dispatch's authors, and I myself found the question that Colin raised here pertinent. Of course, I've also advocated at that talk page altering these templates to indicate that language may have evolved to change the views of the original, but that's not a pressing plagiarism issue; at some point, I'll raise that at the templates' talk. --Moonriddengirl (talk) 11:17, 16 April 2009 (UTC)
"...altering these templates to indicate that language may have evolved to change the views of the original." I agree with that. Note that, I'm firmly on the side of Wikipedians who do not want to treat free content the same as non-free content when considering charges of plagiarism. Plagiarism is the presentation of another's work as one's own; reuse of externally-produced free content is fine so long as proper attribution is given. We just need to agree on what is proper. We have a goal to produce an encyclopedia that presents the sum of human knowledge. Nowhere in our goals do I see that we need to do that alone. --mav (talk) 00:49, 17 April 2009 (UTC)
My preferred solution is twofold:
  • First, I think that we should state as best practice that unquoted PD-text is inserted verbatim (or as close as possible) with a single well-indicated edit. "As close as possible" leaves open normal wiki-formatting and fixing that Elizabethan tall-f vs. "s" thing, but precludes injecting originally authored phrases in the PD insertion. Normal editing proceeds from that point, correcting inaccuracies, adding ref's, sentence-wise rewrites - but the original and exact PD copy is preserved in the history, so there can be no doubt as to the authorship. In fact, I'd favour making this an absolute requirement, but others may disagree, see for instance this (possibly tl;dr) talk thread.
  • Second, I favour a triple-barrel approach for PD attribution: 1) clearly indicate in the edit summary that you are placing PD text not written by yourself; 2) Indicate on the article talk page that you are adding PD text, with a link to the article edit and the source, either online, ISBN link, or best effort notation; 3) Place a suitable PD attribution template on the article page (I think the page bottom is just fine).
  • Third, the PD attribution templates would ideally be modified to allow inclusion of edit links to allow the casual reader to easily identify exactly which pieces of PD text were incorporated. Tracing the subsequent changes is just part of the normal wiki-hunt.
  • Fourth, I think that additions of PD text to relatively mature articles should be deprecated or strongly discouraged. Not that I've ever seen that happen, but just in case. However, when a valuable source enters the public domain and can be used to initiate a bunch of new articles, I say "heck yes!" - I'm thinking here about insects and alligators; if we can expand our coverage, why not do it?
I'll leave my "twofold" solution comment at the top, just to show why I didn't choose accountancy as a career. :) Feelings are quite strong on this and I make no claim on my preferred solution being optimal - but I think we can all find a middle ground here. Franamax (talk) 02:11, 17 April 2009 (UTC)
I agree with all of this except the 4th. Often an older PD source will be found which can add sufficiently important information even to a mature article. Often articles are created from scratch without even looking for PD text. A much greater amount of PD material is now of course becoming available on the net, and can be easily used. New PD sources are of course created continually as such things as US government publications and open access material are published. (Sources not in the public domain in the US will not become PD for many years, under current legislation, so we need not deal with that now.) I am concerned that a great many of our "reasonably mature" articles, even many classed as Good articles, could be greatly improved by what may be available in the PD. DGG (talk) 18:22, 19 April 2009 (UTC)
I take your point. What I was thinking of was the potential effect on structure and tone of a mature article when you drop in a chunk of text that was written in 1905. I have no problem whatsoever with adding new information, but I think extra care has to be taken to properly integrate it.
For instance, if a text on Silver maples is released by the US Forestry Service, it's not appropriate to blanket copy it into the existing article. It will largely duplicate what's already there, with a different structure and tone of writing. IMO that would actually subtract from the quality of the article. However, directly copying a section, let's say "Common pathogens of the silver maple", which is not currently in our article - yes, that would be a net benefit I guess, since it rapidly expands our article and makes the free text available for improvement through the normal editing process. I'll back off the "strongly discourage" bit in favour of "use extra care" - how's that? Franamax (talk) 19:17, 19 April 2009 (UTC)

Promotion to Guideline status

Another editor has promoted this to guideline status but I have reverted as it seems that there are too many loose ends above and no clear process of acclamation to indicate that there is consensus for this guideline in this form. Colonel Warden (talk) 07:16, 24 April 2009 (UTC)

All guidelines and policies are subject to additional changes. Nor is a 'clear process of acclamation' normally required. This has been in proposal stage for long enough, has been fairly stable, and there is minimal opposition to the notion that Wikipedia ought to include plagiarism within its formal policy and guideline structure. It's somewhat of an embarrassment to the project not to have it. Nonetheless, I will open a request for comment on the proposal. Note that formal RFC is not a requirement for guideline promotion. DurovaCharge! 18:24, 24 April 2009 (UTC)
No, but consensus is. And right now there is no consensus for the guideline in its present form. 189.105.47.84 (talk) 20:00, 24 April 2009 (UTC)
Since this is the only edit by this IP address, presumably you have a main account? DurovaCharge! 20:49, 24 April 2009 (UTC)
I fail to see the relevance of my status, but no. I just happen to have a dynamic IP. 189.105.47.84 (talk) 20:53, 24 April 2009 (UTC)
From Wikipedia:Requests_for_arbitration/Privatemusings#Sockpuppetry: Sockpuppet accounts are not to be used in discussions internal to the project, such as policy debates. Since there is no visible evidence of any other edit history outside this discussion, your declaration that the proposal lacks consensus might be ignored by uninvolved editors. DurovaCharge! 21:02, 24 April 2009 (UTC)
How convenient! Except this is not a sockpuppet account. I've been here a long time, editing with my IP address, which just happens to be dynamic (dynamic IPs are the norm in my part of the world). Also, people don't need your permission to ignore me if they so wish. If you don't want to engage me, then that's your prerogative, but don't try to discredit me for no reason other than that you disagree with what I've said. You should know better. 189.105.47.84 (talk) 21:12, 24 April 2009 (UTC)
Perhaps you should consider registering for an account. There are many advantages, including establishing a reputation by which other editors may know you (for more on that, see Wikipedia:Register#Reputation and privacy.) As to consensus, the RfC will determine that. --Moonriddengirl (talk) 21:17, 24 April 2009 (UTC)
Agree with Moonriddengirl. Alternatively, you could link to prior IPs that you used to demonstrate a consistent edit history. DurovaCharge! 23:55, 24 April 2009 (UTC)

Guidelines on charges of plagiarism

There have been calls to clarify when charging a Wikipedia editor with plagiarism is fair or not. I think this can be clarified in a guideline by following the framework for evaluating plagiarism by academics set forth by Roger Clarke in the published version, "Plagiarism by Academics: More Complex Than It Seems", and in a preprint version with more weblinks, "Plagiarism by Academics - A More Complex Issue Than It Seemed". I highly recommend reading Clarke's discussion of the ways in which plagiarism is culturally bound (a more Western concept), of how music and other artistic expression often includes explicit copying or more veiled references that might be spoiled by heavy-handed referencing, and his other fascinating thoughts.

I am working with the basic definition that plagiarism is a state of under-attribution of a work. In simple terms, a work is plagiarized if the degree of attribution present is inadequate, relative to what is reasonably expected for the work by most readers of the work and/or by the typical reader of the work. We can obviously disagree on what is the right degree of attribution required, so we will often disagree on whether a work is plagiarized or not. Where we can recognize that disagreements are honest and reasonable, I think we should usually avoid calling another Wikipedia editor a plagiarist. Also, I think we should usually comment on the article or the action taken ("the article is plagiarized" or "the edit taken by this editor amounts to plagiarism") rather than comment on the person (be slow to say "the editor is a plagiarist"). But I think that we should embrace the language of calling out plagiarism when we see it, where the term is apt.

Clarke sets forth that plagiarism by academics is more or less serious according to five factors. Applying those to wikipedia setting, I think that plagiarism is more serious in wikipedia when it appears towards one end of each of five scales:

1. intentional vs. accidental: when the plagiarism is more intentional than accidental

2. salience, or nature of the work: on the high end, when the plagiarism appears in an article that is included in a Wikipedia 1.0 or whatever version of wikipedia that is released on CDs, or slated to be published that way, or if it is in a Featured Article, a Good Article, or an article nominated for FA, GA, or DYK. On the low end, when an article is very rough, new, in a Userspace sandbox, or marked with {{Underconstruction}}

3. claim of credit: on the high end, when the extent to which originality is claimed is higher (as when the article is nominated for FA, GA, or DYK, or when an editor claims on her/his userpage that s/he wrote or contributed to the article). The claim of credit can be explicit or implicit.

4. nature of incorporated material: when it is higher on the scale provided by Clarke:

   1.  verbatim or near-verbatim copying of:
          * an entire work (e.g. a book, book chapter or article);
          * a substantial part of a work (e.g. a section; or the diagram, image or
             table around which an entire work revolves);
          * segments of substantial size (e.g. paragraphs);
          * segments of moderate size (e.g. sentences);
          * novel or significant segments of small size (e.g. clauses, phrases, 
             expressions, and neologisms); 
   2. copying of ideas that are highly original;
   3. paraphrasing of segments of substantial size, without new contributions;
   4. paraphrasing of segments of moderate size, without new contributions;
   5. verbatim or near-verbatim copying of unremarkable segments of small size (e.g. 
       clauses, phrases, expressions, and neologisms);
   6. paraphrasing of segments of small size, without new contributions;
   7. copying of ideas that are somewhat novel;
   8. paraphrasing of segments of substantial or moderate size, but which include 
       new contributions;
   9. copying of the structure of the document, or of the argument, or of the sequence 
       of information presentation or 'plot';
  10. copying of ideas that are long standing. 

(copied directly from Clarke, http://www.rogerclarke.com/SOS/Plag0602.html This passage is copyrighted material of Roger Clarke, per http://www.rogerclarke.com/CNotice.html. I believe that this is fair use to state this much here, but I am trying to contact Clarke about that. If not, then this might technically be copyvio, although it would not be plagiarism because it is explicitly attributed. doncram (talk) 23:19, 2 May 2009 (UTC))

5. nature of attribution provided: when the clarity of attribution is lower. For example, when there is no attribution, or when just a general PD template is present, rather than explicit footnotes following each specific idea in material copied plus use of explicit quotation marks for any creative wording.

Therefore, if I see an article involving copy-pasted material, 1) where the principal editor is experienced and known to be aware of guidelines, 2&3) where the editor is putting the article forward for DYK credit, 4) where the article includes long verbatim copied passages of distinctive, original, creative material, and 5) credit to the source is only given as just an External link, then I will call that blatant plagiarism. If the editor has repeatedly done this, despite being called out on it before, then eventually it is fair to call the editor a serial plagiarist.

On the other hand, if I see an article involving copy-pasted material where 1) the editor is a newbie, 2) the article has just been started in a sandbox, 3) the editor has not bragged about it anywhere, 4) there has been extensive reworking of the material into paraphrases which no longer include distinctive wording (except for quoted phrases), 5) there is explicit footnoting and use of quotations where appropriate, then I will not call that plagiarism. doncram (talk) 22:56, 2 May 2009 (UTC)

Self Plagerism

I am new to the community although I have made a substantial number of edits and corrections over the years where I had information.

Forgive me if I transgress on the culture, but I think that one aspect of plagiarism is being overlooked.

That is the case where one plagiarizes from oneself. Here is an actual example that I faced a little while ago when editing an article.

The article had very little detail about important aspects and did not even include some critical material. I had authored a book which covered some of the missing material. Investing the time to write original material to fix the article was more of a time commitment than I was willing to make at the time. However, it would take little time to just take appropriate passages from my book and include them in the article.

I considered the idea of quoting myself, but in trying to keep faith with the non-commercial philosophy of Wikipedia, I decided against it. In a way it was a promotion of my book, which, if purchased, would yield me royalty money. Further, it might seem like self-aggrandizement.

I did not think that there was a copyright issue as I owned the copyright to the original book, and if I wanted to use the material and make it available subject to the Wikipedia license, that is my legal right.

However, I did use copyrighted material and did NOT give any attribution.

I believe that the policy under consideration should have a general exception where the contributor is using material which he has written and owns.

In contrast, I have also added material to articles where I have provided my own publication as source documentation. But I have reserved this for documenting a quotation where my material was one of the major references, if not the major reference, in the field.

As Wikipedia matures, experts and authorities who were initially very skeptical of the Wikipedia concept have altered their opinion and see the benefits as far outweighing the shortcomings. It would seem that this is a good thing. I, for example, have been paid by the Encyclopaedia Britannica, the Encyclopedia Americana, and The New Book of Knowledge to write "authoritative" articles. For those areas where I have some expertise, it is desirable to contribute to Wikipedia. But when I sign my name to an article, I am responsible for the accuracy of the article. Where the article is unsigned and I or some other expert might be one of many contributors to it, there needs to be some guidance about quoting oneself.

LDEBarnard (talk) 18:08, 3 May 2009 (UTC)

With respect to using copyrighted material without giving attribution, please see Wikipedia:Donating copyrighted materials. You should provide verification of permission for the text. --Moonriddengirl (talk) 18:21, 3 May 2009 (UTC)