Wikipedia talk:Plagiarism/Archive 3

Latest comment: 14 years ago by Franamax in topic Promotion to guideline

Accidental plagiarism in a wiki environment

The peculiar nature of the multi-editor wiki environment makes a rare form of accidental plagiarism possible. Let me see if I can lay out the steps that produce this:

  • (1) Editor A paraphrases source Z to add a paragraph or section to an article, and cites the reference. During this paraphrasing process, they present the same information in a different way, and deliberately omit some details mentioned in the source (for fear of engaging in wholesale copying from the source).
  • (2) Editor B comes along later and reads the article and reads source Z. They see that some of what is in source Z is not mentioned in the article, and so they add this information to the article, citing source Z.
  • (3) Editor C comes along and thinks that the section written mostly by editors A and B doesn't flow very well, so editor C rewrites the section to make it flow better and changes some of the words used as well, unintentionally changing the article to use some of the words used in source Z.
  • (4) Editor D comes along and reads the article and then goes to check source Z to verify what has been written in a particular section (the one jointly written over time by editors A, B and C). Editor D notices that this particular section of the article is very similar to the cited passage in source Z, and suspects plagiarism has taken place.

Has plagiarism taken place? Is this a copyright violation? Is this a problem unique to Wikipedia? What went wrong? My view is that editors B and C should have read source Z before making the changes they did, but in practice it is hard to hold people to that. I'm uncertain whether this phenomenon of articles regressing from paraphrasing back towards the wording used in the source, due to the nature of editing distributed over time and different people, is rare or not. But I believe it is certainly possible. Carcharoth (talk) 00:23, 3 May 2009 (UTC)

I think that kind of process is indeed possible. I think we should be clear that the article is plagiarized, though, however it got there, if its current state is such that it is under-attributed relative to reasonable expectations for what attribution should be. I think we have to leave editor intent out of a useful definition of whether a wikipedia article is plagiarized or not. Also, it is further irrelevant whether editors in the previous edit history were very clear with their edit labels and their original paste-in or not. The prior edit history is irrelevant. We need to be able to judge an article as it appears, comparing to the sources from which it was developed. You've described a situation where the article is plagiarized, although I agree no editor should be called a plagiarist for their role in it getting there. It is our job as a collective of editors to try to prevent such situations from arising frequently though. For example, I believe that paste-ins of PD text plus use of PD templates will often tend to lead to plagiarized article situations, while a different guideline on how to introduce PD text would work better in avoiding future plagiarized state situations. doncram (talk) 00:54, 3 May 2009 (UTC)
Possible, yes, but highly improbable and not worth worrying about. --SmokeyJoe (talk) 11:58, 12 May 2009 (UTC)

Plagiarism or excellent article?

Our FA on Rabindranath Tagore extensively cites a single work, Dutta & Robinson. Is this plagiarism, or is it just an excellent article, based on the most authoritative source available? Jayen466 19:22, 3 May 2009 (UTC)

If the source is cited, it is not plagiarism. There may be other problems, but not "plagiarism". --SmokeyJoe (talk) 12:00, 12 May 2009 (UTC)
NO, just citing the source is not enough to avoid plagiarism. The way the sourcing is done has to give "adequate" credit for the nature of the work, the degree to which you use ideas and actual words from it, and so on. For example, if you include verbatim passages, you may cite the source in a footnote, but you would further need to give credit for wording by using explicit quote marks around those verbatim passages, to convey credit for the actual wording as well as for the ideas. Plagiarism happens when credit given is less than credit due. This is already discussed in this discussion thread, but I respond here to SmokeyJoe because I think this is important to understand. doncram (talk) 23:00, 12 May 2009 (UTC)
Plagiarism happens when credit is not attributed. An obscured or hidden citation might not be considered a citation. Compound uses of the same source complicate things, which is the case in your example. If a component, the quote, is not cited, e.g. implicitly by the use of quotes, then it was not cited. Starting to split hairs? In the example, drawing exclusively from a single source is not plagiarism if it is clear that the material is drawn from the source it is drawn from. The example is not a case of plagiarism (unless of course you are alleging unmarked quotes or slabs). --SmokeyJoe (talk) 07:20, 13 May 2009 (UTC)
Without access to the text, how is one to know? Google Books, unfortunately, doesn't offer a preview. :/ --Moonriddengirl (talk) 19:41, 3 May 2009 (UTC)
I thought there might be a potential problem of "substantial taking", independently of the quality of any paraphrasing, on any topic where there is really only one authoritative standard work. For example, this might be the one available, definitive biography of a minor historical figure, or perhaps also a book or paper by a highly-regarded theoretical physicist who is considered to have written the most authoritative work on a particular topic. For an apparent example of the former, see this GA: Hugh Trenchard, 1st Viscount Trenchard. More than two-thirds of its 150 footnotes are to Boyle 1962. Even assuming there is good paraphrasing, most of the article is clearly based on that work. By some of the academic standards we have discussed here, this is plagiarism. Jayen466 21:11, 3 May 2009 (UTC)
From a plagiarism standpoint, if the paraphrasing is adequate and attribution is good, I don't think we can have much of an issue. But given the length of this page, would you mind pointing out the specific point in this conversation that you think might make it so? :) --Moonriddengirl (talk) 21:21, 3 May 2009 (UTC)
Sure. A source you posted earlier gave a test for plagiarism (1) and a test for "substantial taking" (2). Substantial taking is one of the indicators of copyright infringement. The tests were worded as follows:

(1) You may not escape plagiarism even if you give attribution. This happens when you copy or paraphrase excessively so that the work as a whole is clearly not your own although you give attribution. You are presenting someone else’s work as your own. [...]
(2) How do you decide in practice whether there has been a ‘substantial taking’? One practical test is that your quotes or paraphrasing from a particular source should not be a substantial portion of your work or a substantial portion of the copyright work. This can be measured by quantity (number of words) or quality (relative importance in the copyright work and/or your work of the portion quoted/paraphrased).

Of course, substantial taking is just one half of the copyright infringement test. The other half is "fair use", also discussed in the cited text. I felt confident there until you pointed out to me that our licence allows any and all re-use of our texts, including commercial use. Jayen466 21:49, 3 May 2009 (UTC)
Shouldn't this be strictly a copyright test though? In my view, relying heavily on a single work is not plagiarism if you make clear what you are doing. This again brings up the theme here of the putative difference between academic plagiarism and "Wikipedia" plagiarism. Wikipedia-wise, if only one source is available to create an article, you still go ahead and create the article. If it's a copyrighted source, you get into the whole "substantial taking" and "fair use" thing - but think of it instead as basing a wiki-article on a book from 1895. Would that be plagiarism? This teases out the copyvio issues from the plagio issues. Franamax (talk) 22:16, 3 May 2009 (UTC)
I'd say using the 1895 book would be plagiarism if the book is not acknowledged. It should be cited like a normal source and, if used as a basis for long sections or indeed the entire article, acknowledged in the text. No pressing need to otherwise reformulate what it says though. E.g.: "According to the Victorian Encyclopedia of Entomology (1895), the Lesser Frisian Springtail is a unique member of the springtail family characterised by ..." Jayen466 00:49, 4 May 2009 (UTC)
(edit conflict) Ah, I see. Well, we should be clear on the distinction between copyright and plagiarism; the two may overlap, but (obviously) are not the same. Substantial similarity is a legal term with specific points of definition in copyright law. It's covered, albeit briefly, in our copyright FAQ. It isn't a test for plagiarism. (As an aside, I believe your "two tests" are conflating two separate parts of that document. Your first test isn't a test; it's drawn from page two (and is listed as "secondly.") It's just an authorial point. The actual "test" is from page 3, and it's step 1 of "TWO separate stages or tests in determining whether copyright infringement occurs.") (emphasis added).
As for copying or paraphrasing so excessively that the work is not your own even if attributed, which I suppose is where the concern arises about over-reliance on a source (and hence fear of substantial taking): I think we need to look at the standards of encyclopedias as set out by the American Historical Association, at least, in its Statement on Standards of Professional Conduct:

Of course, historical knowledge is cumulative, and thus in some contexts - such as textbooks, encyclopedia articles, broad syntheses, and certain forms of public presentation - the form of attribution, and the permissible extent of dependence on prior scholarship, citation, and other forms of attribution will differ from what is expected in more limited monographs. As knowledge is disseminated to a wide public, it loses some of its personal reference. What belongs to whom becomes less distinct. But even in textbooks a historian should acknowledge the sources of recent or distinctive findings and interpretations, those not yet a part of the common understanding of the profession.

Unlike many scholarly works, compendiums such as textbooks and encyclopedia articles are not presented as our own. Properly attributed facts are not a problem, even if over-reliance on language is. I would be uncomfortable including "distinctive findings and interpretations" without indicating where these came from, but basic facts with in-line citations seem fine.
Of course, your point 1 is quite within my own definition of plagiarism where it comes to close paraphrasing...insufficiently revising the creative elements of presentation. It may be a legal concern if the text is copyrighted, and it may also be plagiarism. It's just that I can't see a concern when it comes to sources of information...not if these are cited and (where necessary) attributed in text. --Moonriddengirl (talk) 22:35, 3 May 2009 (UTC)

(outdent) The document by Roger Clarke here, already referenced by Doncram above, includes a case study on plagiarism in textbooks. It states that "The literature search yielded a disappointing quantity and quality of guidance. The scope of most references was narrow, and very few directly addressed textbooks." It quotes the AHA document you quoted above, as well as an essay by the religious scholar, Hexham.

The Hexham quote Clarke gives may be of interest: "Many basic textbooks contain passages that come very close to plagiarism. So too do dictionaries and encyclopedia articles. In most of these cases the charge of plagiarism would be unjust because there are a limited number of ways in which basic information can be conveyed in introductory textbooks and very short articles that require the author to comment on well known issues and events like the outbreak of the French Revolution, or the conversion of St. Augustine, or the philosophical definition of justice. Further, in the case of some textbooks, dictionaries, newspaper articles and similar types of work both space and the demands of editors do not allow the full acknowledgment of sources or the use of academic style references. ... It ... therefore seems necessary to distinguish between academic and other types of writing and to ask what is the reader led to believe an author is doing. If a book or thesis contains academic footnotes, is written in an academic style, and is presented as a work of original scholarship, then it must be judged as such and measured against the accepted rules for citation" (bolding added).

Clarke adds, "Given how trenchant Hexham is in his condemnation of plagiarism in scholarly work, the distinction he draws between criteria for scholarly and textbook writing is telling."

Under Exhibit 1, Clarke then gives the following guidelines:


The approach to incorporation and attribution in a textbook should:

  • avoid citations intruding into the presentation in such a manner that they detract from the primary pedagogical objective;
  • avoid not only express claims of originality, but also implied claims, and language that could mislead the intended audience into inferring that the work is original; and
  • provide ready access to works on which the author has drawn heavily.

Generally, incorporation should avoid the use of quotation marks, because these intrude too much. On the other hand, the use of verbatim, near-verbatim and close-paraphrase passages imposes yet greater expectations on the author in relation to attribution.

In the case of generic attributions to well-known authors (e.g. Piaget, von Neumann, Newton), and of well-known and well-documented quotations used in section and chapter headings (e.g. Keats, Martin Luther King), it may be reasonable to name the author, but nominate no specific work. Generally, however, attribution should be achieved through one of the following mechanisms:

  • Harvard-style citation, perhaps without page numbers. This approach adds to the length of the text, but minimizes the interruption of the flow;
  • numbered footnotes or endnotes. These have much less impact on the length of the text, but are nonetheless disturbing to the reader because of the uncertainty as to whether the note contains information of relevance, and hence as to whether the break in concentration is warranted that is involved in a diversion to the note;
  • no citation within the text, but attribution to the source in notes at the end of each chapter or the book as a whole. A refinement to this approach is to include within each endnote a key to the page number and line number in the text where the source has been used;
  • mention of the name of the author at the beginning of the relevant segment of text, or perhaps within the relevant segment of text, and inclusion of a reference at an appropriate point elsewhere in the publication;
  • mention in the Preface or Introduction of the authors and works used as sources during the preparation of the book.

Precise descriptions of all works to which attribution is given need to be provided. The alternatives are listed below, commencing with the most preferable:

  • a Further Reading, Recommended Reading, and/or Primary Sources List at the end of each chapter or section, which contains all works that were drawn on during the preparation of that segment. Particularly important references can be supplemented with annotations;
  • a single Reference List at the end of the book, which contains all works that were drawn on during the preparation of the book;
  • a Bibliography at the end of the book, which contains both works that were drawn on during the preparation of the book and works that were not.

Again this notes that the use of close paraphrase should go hand in hand with more explicit attribution. Of course, numbered footnotes are standard here in WP, so Clarke's comments on their disturbing the reader are less relevant. In my view, any use of close paraphrase or direct quotation should be accompanied by (1) naming the author in the text (2) adding a footnote reference to the specific work and page number at the end of the quotation or paraphrase and (3) listing the work with full publication data among the References, if the article has a separate reference section. Clarke's suggestion not to use quotation marks for verbatim quotes seems appropriate enough for textbooks, but is inappropriate for WP, I think. WP should use quotation marks around direct quotations. Jayen466 23:40, 3 May 2009 (UTC)

Wikipedia should use quotation marks around all text copied from PD sources and it can never be subsequently modified? What form of "quotation mark" shall we use for media copied from PD sources and processed through image enhancement software?
This goes to the heart of the matter. Can we or can we not freely incorporate and modify text and media which have been placed in the public domain? Some say no, it must remain inviolable within quotations. Others say it can be freely modified so long as it is properly attributed. Remember, we are not talking about copyright violation here - we already have that subject covered. Franamax (talk) 00:15, 4 May 2009 (UTC)
I'm sorry, I mostly think of copyrighted sources in these discussions. I did not mean to give an opinion on how to treat PD text. If asked for one, I would say modify, rephrase and expand freely, but cite each individual sentence based on the PD source. In other words, cite it as if it were a copyrighted source, but you don't need to paraphrase or restate to prevent copyright infringement, and you don't need to place quotation marks if it is a verbatim quote from a PD source. However, try to mention the source in the text: "According to the 1911 Encyclopedia Britannica edition, ..." if it can be elegantly done. That would be my 2 cents. Jayen466 00:32, 4 May 2009 (UTC)

Clarke includes a section on "Necessary or Inherent Plagiarism":

The case study of textbook plagiarism in the previous section has relevance beyond textbooks alone. Parts of many other kinds of publications are intended "to make existing knowledge accessible" and to "address a particular market need" or "a particular audience." In particular, various sections of scholarly works such as refereed journal articles and conference papers, theses, and academic monographs, have an expository purpose, in relation to pre-existing knowledge.
The preliminary sections of many works comprise the recitation of existing bodies of theory, in order to set the stage for extensions to, criticisms of, and/or testing of, that theory. If such recitations stray too far from the words used by prior theorists, then the author of the new work would be subject to accusations of misrepresentation or at least inaccuracy. Hence it is very challenging to 'use one's own words' while being faithful to the sources. Paraphrasing and generic attributions are therefore tolerated. The context in any case implies that little or no originality is being claimed. There is accordingly tacit acceptance of practices that would otherwise be castigated as plagiarism.

Jayen466 05:49, 4 May 2009 (UTC)

RfC

Promote this to guideline? DurovaCharge! 18:44, 24 April 2009 (UTC)

  • Note: The results of this RFC should be considered support for a Plagiarism guideline in general, not for a particular version of this page during or at the end of the RFC:
    • Most supporting comments state support only for a general Plagiarism guideline.
    • Indeed, few people edited the page at all during the RFC, and the state of some parts of the page at the end of the RFC indicates that few people read it through. —Centrxtalk • 05:37, 21 May 2009 (UTC)

Reasons for

Plagiarism is not identical to copyright.

A plagiarism statement ought to have been policy years ago. This was overlooked because WP:PLAGIARISM was a redirect to other pages. First here, then here. The problem was that neither redirect defined plagiarism, and copyright is a separate concept. This page has been in proposal stage since June 2008 and was highlighted in the 13 April 2009 Wikipedia Signpost under the title "Let's get serious about plagiarism". Promotion to guideline is essential to establishing credibility for this project, and also essential for explaining proper citation requirements to plagiarists. It's high time to promote the page. DurovaCharge! 18:44, 24 April 2009 (UTC)

Reasons against

Wikipedia has survived without a policy or guideline on plagiarism for eight years. The legal requirements for submitted text and images are clearly dealt with in the WP:COPYRIGHT article. There is no concern with keeping this article on plagiarism in Wikipedia for reference, but upgrading it to a guideline is unnecessary. At around 4,500 words, this proposed guideline adds further bloat to the already bloated collection of guidelines in Wikipedia. The addition of yet another guideline may discourage new editors from editing and give the incorrect impression that Wikipedia is more focused on policies and guidelines than on producing a well-written encyclopedia. The Wikimedia wiki noted back in 2004 that it was desirable to avoid instruction creep. —Preceding unsigned comment added by Cedars (talk • contribs) 27 April 2009

Everything in this proposed guideline is covered by existing policies and guidelines, specifically WP:V and WP:CITE, thus we don't need to cover the same ground. (Discussed in the talk archive here and in other places). Included for completeness of discussion, not as my view. Franamax (talk) 02:50, 29 April 2009 (UTC)

Discussion

  • Yes, promote this to guideline. A guideline on plagiarism is long overdue; if Wikipedia is to be taken seriously as a work of scholarship, it needs to acknowledge scholarly standards. --Moonriddengirl (talk) 19:05, 24 April 2009 (UTC)
  • I agree that this should be a guideline. Plagiarized material brings disrepute on the encyclopedia.   Will Beback  talk  19:08, 24 April 2009 (UTC)
  • It should honestly be a policy, but one step at a time. Ottava Rima (talk) 19:19, 24 April 2009 (UTC)
The guideline is specifically needed to help new Wikipedia contributors. In several cases I have seen new contributors very much turned off by Wikipedia upon adding material and then being criticised for possibly plagiarizing. In the absence of a guideline, it is very confusing and dismaying for new users, who can believe they are contributing within all the rules but then encounter editors (me included) asserting that at least some types of public domain text are not wanted and/or that sourcing should be done differently. It is much better to be able to point new contributors politely to a guideline. And it is better to have some experienced editors focused on refining a central guideline, rather than having practice emerge out of conflicts with new users. doncram (talk) 19:13, 27 April 2009 (UTC)
  • (Scrambles for notes...) I'm not sure I'm totally comfortable with the current state of the proposal, and the recent Dispatch page seemed to turn up a minor division on the issue of quoting/not quoting PD text (for which read a possible vast yawning chasm). :) No objections to making it a guideline though; it might get appropriate attention that way and be fleshed out faster. Also, a guideline might facilitate establishment of a formal forum where plagiarism concerns can be brought. It's a hugely inflammatory term, since it carries with it the suggestion of dishonesty. You can call someone a troll or an idiot and they still have their dignity. Call them dishonest though... Hence the need for central treatment. Franamax (talk) 22:05, 24 April 2009 (UTC)
Strike my lack of objection, since I'm not happy with the extant text. I'm no longer convinced that this can be furthered with more success as an existing guideline; better to improve it as a proposal and resubmit. Franamax (talk) 03:55, 29 April 2009 (UTC)
  • Comment: a read through the archives is well worth the time, since some pretty long-term and experienced editors had input and perspective on almost every issue that may come up in future discussions. Franamax (talk) 22:05, 24 April 2009 (UTC)
  • Wikipedia policy development typically is descriptive, rather than prescriptive. Policy and guideline documents are generated to help people respond to recurring situations (rather than reinventing the wheel each time), to codify best practices (to keep track of what works), and to allow us to maintain some consistency across the project. Plagiarism is definitely a recurring, persistent problem. Having a coherent, cohesive guideline which describes our most effective strategies for dealing with plagiarism is long overdue. Give this one the rubber stamp. TenOfAllTrades(talk) 00:29, 25 April 2009 (UTC)
  • Absolutely. It's also worth saying that Carcharoth put a fair bit of work into this at one point. But the lack of a guideline on plagiarism has been a glaring absence here. --jbmurray (talk • contribs) 00:43, 25 April 2009 (UTC)
  • Yes, this is an important step towards a sensible guideline, though only a start. Until recently, plagiarism issues have been treated badly in Wikipedia; the guidelines have been inadequate and resources offered to editors almost nonexistent. This has opened the way to unnecessary copyright violations and, I suspect, a significant under-body of as yet unrecognised copyright violations. It has also resulted in some editors acting with a focus on and concern for the copyright issues, but treating other editors in careless and unproductive ways. --Geronimo20 (talk) 02:22, 26 April 2009 (UTC)
  • This is like having a referendum on funding children's hospitals, the kind where the voter information guides say "no contact information was provided" for the oppose side. There may be disagreements over minor details (how to fund the hospital), but essentially no one's going to argue against it. So yes, I support the proposal. Recognizance (talk) 02:24, 26 April 2009 (UTC)
  • Oppose. See above arguments. Cedars (talk) 14:08, 27 April 2009 (UTC)
  • Support. I've been unpleasantly surprised to find out how many people are unaware what plagiarism actually is and how to avoid it. Several times recently at FAC, reviewers have identified articles that were in part plagiarized from various sources - the editors writing those pages were clueless on why it was bad. We need a guideline to point to so that editors can be educated about this very serious issue. Karanacs (talk) 19:26, 27 April 2009 (UTC)
  • Wikipedia all too often displays profound intellectual laziness, if not a disturbing lack of ethics and respect for the original work of others. Expediency, efficiency and technical legality are common, yet unacceptable, excuses for such trespasses. To be somewhat topical, the AIG bonuses were legal - note, however, their stark departure from ethics. That Wikipedia has thus far "survived" is not relevant. That other guidelines may constitute bloat is not relevant. The proposed guideline is, frankly, not terribly well written; it is nevertheless meritorious. Editors, if any, deterred by a formal preclusion of plagiarism will not be missed by any project that values quality and intellectual propriety. Эlcobbola talk 19:54, 27 April 2009 (UTC)
Umm, cough, fishbone in my throat... The AIG bonuses were the outcome of contracts signed with persons before the current debacle, and those people performed to their contracts. I don't want to debate that issue here, but I dispute that ethics were involved per se. It's like saying that a rat is unethical for negotiating a maze. The rat should still get the reward, even if the maze has collapsed.
Other than that, I agree with your points, and we do need a framework to ensure that in fact no ethical breaches occur in our own little world. Franamax (talk) 02:42, 29 April 2009 (UTC)
The departure from ethics is not that they were paid, but that they were (in some cases and/or initially) retained by the payees. The point is that being legal is not necessarily tantamount to being ethical; the example I chose to communicate that may not be perfect, but it is sufficient to understand the spirit of the argument. Belabouring this point is not a productive or necessary use of our time. Эlcobbola talk 02:58, 29 April 2009 (UTC)
  • Support Per Elcobbola. A text-based reference work like Wikipedia must have a plagiarism policy. Awadewit (talk) 21:27, 27 April 2009 (UTC)
  • Support upgrading it to guideline status. –Juliancolton | Talk 01:22, 28 April 2009 (UTC)
  • Support. Among the supporters here we have prolific featured article writers who have taken time out from their own article writing to ensure that plagiarized material doesn't run on Wikipedia's main page, also volunteers who have poured long days into uncovering and undoing the contributions of prolific plagiarists. These are tedious tasks that they undertake because they care about this site's ethics and credibility. Some of this site's contributors are quite young and need an introduction to the concept of plagiarism. The few who would depart in a huff rather than comply would not only not be missed--their departure would lift a burden from our best editors' shoulders and leave them more time to create featured content. Consider this RfC a palpable expression of thanks to those who have worked to eliminate plagiarism; you know who you are. :) DurovaCharge! 15:43, 28 April 2009 (UTC)
  • Support I came here a long time ago planning to help build this page, and got sidetracked (and intimidated by how much work needed to be done). Since then it has grown into a great resource. And plagiarism is one of the main things that can hurt WP's reputation, so it's critical that we have something official to address it. rʨanaɢ talk/contribs 23:29, 28 April 2009 (UTC)
  • Support. Absolutely. --Moni3 (talk) 23:40, 28 April 2009 (UTC)
  • Support Definitely needed for credibility and because we need a guideline (I would support it as policy too) on one of the most important aspects of what not to do in article writing and researching. Dabomb87 (talk) 23:42, 28 April 2009 (UTC)
  • Support: I would much rather have a solid guideline or policy in place to help a new editor get truly proficient and lose ten casual editors who might plagiarize (causing more work for the rest of us) than the other way around. I understand the concern about instruction creep, but this feels like something that can't be helped. (Does anyone affected by instruction creep really go beyond the Nutshell, anyway?) I'm as opposed to biting the newbies as anyone (as I was bitten) but on the other hand: If people really want to contribute meaningfully to the project, they need to understand how it works (as I did). Scartol • Tok 00:13, 29 April 2009 (UTC)
  • Support: We need a guideline or policy because these are tricky issues, and we can't rely on common sense. The boundaries of plagiarism are drawn according to academic tradition and practice, and not everyone is familiar with these. As for biting the newbies, it is much more helpful to point someone to some useful guidelines and help them do the right thing than to simply delete their work when they do the wrong thing. Even for someone like me who is used to academic writing, it's also quite useful, because the informal rules in my field are not really the right ones for a general encyclopedia. --Amble (talk) 00:30, 29 April 2009 (UTC)
  • oppose Not another bloody guideline! I edit some quite technical articles that involve a lot of research, and WP's policies and guidelines already total more than I read up for several research-intensive articles. --Philcha (talk) 00:37, 29 April 2009 (UTC)
    • If the only objection to this is the overall number of guidelines then perhaps it can be joined with WP:COPYVIO at some point, since they are similar concepts.   Will Beback  talk  00:49, 29 April 2009 (UTC)
      • That's just gaming the metric implied in my previous comment. I oppose the addition of words or KB to WP's already bloated corpus of policies and guidelines. --Philcha (talk) 11:45, 29 April 2009 (UTC)
  • Support. Mere copies of existing public domain material should be archived onto Wikisource where it will not be tampered with, and the Wikipedia article can use it as a cite, rather than dumping the text into Wikipedia with a few minor tweaks and pretending it is our work, and then placing it under a copyright license, which is akin to Copyfraud. Thank goodness we are getting serious about this. John Vandenberg (chat) 01:02, 29 April 2009 (UTC)
  • Support As long as people think edits like this are "fixing copyright issues" we need this as a guideline. Ruhrfisch ><>°° 01:44, 29 April 2009 (UTC)
  • Support. We need this guideline to send the clear message that appropriating the words of another, regardless of the status such as the Public Domain designation, is not what creating an encyclopedia is about. —Mattisse (Talk) 02:02, 29 April 2009 (UTC)
  • Strong Oppose Should Wikipedia have a guideline on plagiarism? Yes. Should this be it? No. I have serious concerns about the text of this proposed guideline as it presently stands. First off, the tone is heavily biased against incorporating public domain texts into the project. I don't think I need to remind you all, however, that this has been a long-standing practice and, as long as adequate attribution is provided, there is nothing wrong with it. Which leads me to my second, and most important, point: the guideline conflates a lot of issues that have absolutely nothing to do with plagiarism. The whole section debating the merits of public domain sources should be removed, for instance. Finally, I'm a bit dismayed by the comments of some people here, which clearly show that they don't fully understand the distinction between copyright violation and plagiarism and might be supporting for the wrong reasons or under false assumptions. 189.105.99.200 (talk) 02:22, 29 April 2009 (UTC)
  • While I do think we need a guideline on plagiarism, I don't believe this proposal is ready. It is neither clear nor cohesive and the intro is way too large altogether. Overall it has too much material suited for Plagiarism and not enough material suited for a Wikipedia guideline.--BirgitteSB 03:28, 29 April 2009 (UTC)
I agree that there are many things wrong with it (for example, it spends far more time explaining what isn't plagiarism than what is), but I believe we can start with this as a draft. Is there anything here that you absolutely can't agree with? Awadewit (talk) 03:33, 29 April 2009 (UTC)
I really question the "media plagiarism" section. See how the term doesn't merit an article? And the "What is not plagiarism" section is a real problem for me. Most of all, there is a lack of focus on actual guidelines on how to appropriately copy text into Wikipedia. Guidelines should not be focused on the wrong way, and this proposal is. For example, a proper Wikipedia guideline would be titled "Avoiding plagiarism". I mean, we don't call it "Original Research" for a reason. --BirgitteSB 03:50, 29 April 2009 (UTC)
Without having looked into it closely, my first impression is that "media plagiarism" doesn't belong in here, and "What is not plagiarism" is unnecessary as that is essentially "what cannot be copyrighted". But taken as a whole, this page is good enough for a guideline. John Vandenberg (chat) 05:09, 29 April 2009 (UTC)
I am really in the descriptive camp, even more so for guidelines than for policies. And compared to my expectations of a guideline, this just isn't written as one. It is hardly the end of the world if it were promoted despite the problems. But on the other hand, I believe most people here don't just want a guideline, any guideline, on plagiarism. They want a truly useful guideline. --BirgitteSB 18:10, 29 April 2009 (UTC)
  • Oppose and leaning now toward a strong oppose. I've taken to heart the last two objecting editors (though I do wish 189.105 had signed in first), and I have to change my previous non-objection. I've worked on this for a while now, so I may be too close to the subject. Nevertheless, on re-reading of the current proposed guideline:
  • It is an absolute wall of text, it's way too long and meandering. This guideline is really only meant for two audiences: newish users who don't understand the concept and need concise guidance as to what is and what is not OK; and less-new users who are looking for guidance when they've spotted something iffy and need to quantify. It should be much shorter! From my experience: "it's a long story" - "put it into a small package"
  • The tone is now more aggressive than perhaps what was originally envisaged. I detect a more present-tense case, for instance "Material plagiarized...is not being properly presented" suggests this is something that just happened, whereas I would phrase "When you include material...you do not properly inform your audience" (apology if I wrote the original!).
  • Conversely, we went from "copied" to "borrowed" in headings and text. Sorry DCoetzee, but I disagree, if you borrow it, you give it back eventually, right? We're talking about "copying" material.
  • And to echo the IP above, the "Attributing"->"Public domain" section is just impenetrable for a new editor, but seems to say "it's welcome - but here are all the hundred reasons why it's not". That's just an impression, but I don't think that impression is consistent with our mission. No policy (or ethical or moral) imperative says that we can't freely use PD text within our actual articles PROVIDED that it is properly attributed.
  • Echoing above, this guideline should very clearly draw the distinction between copyvio and plagio, and it doesn't right now. Copyvio is copyvio; it already has its own procedures and it is (or was) noted right at the top. Copyvio shouldn't keep being discussed throughout the guideline.
  • And echoing again - we need to focus on the basics: what is plagio, how not to plagiarize, how to spot plagiarism, how to deal with it.
  • And to bring up another point that may not win me any friends, I'm dismayed by the current fixation on DYK and FA achievements. Those are wiki-internal subjects and completely irrelevant. A simple note to consult the relevant rules at those venues is sufficient. DYK/FA are important, but they can't get in the way of us adding content - they just help us to turn it into quality content.
Sorry for the length. In summary, I would propose non-promotion, and those editors who are still interested could take apart and reformulate this. Some of the incoherence is due to sporadic interest, so a sustained effort would be good. However I think that would be much more difficult should the current text be promoted as a guideline. Franamax (talk) 05:11, 29 April 2009 (UTC)
You are close to it, and I am close to it too, having modified what was an overall "PD is welcome" message to mention some of the "hundred reasons why" or places where PD is often not welcome, including the points about DYK and FA. I think that the whole issue about plagiarism is whether adequate attribution is provided or not, and there are important differences among us about what serves as adequate attribution, yet to be worked out. Plagiarism can be defined simply as situations where there is less clear attribution than is reasonably expected. And, DYK and FA articles are good examples where expectations on attribution are higher, in part because implicit claims of credit for writing by wikipedia editors (individually or as a collective) are more salient. I think the right thing to do now, though, is say, yes, this is a guideline now, and it is important to get consensus behind improving it. doncram (talk) 06:29, 29 April 2009 (UTC)
I understand what you're saying, doncram, and yet: "there are important differences among us about what serves as adequate attribution, yet to be worked out" - doesn't that preclude adoption of the current text? There is near-unanimous agreement that we should have a plagiarism guideline. I'm less sure of consensus to adopt this plagiarism guideline, in that I've not seen many comments on the specific text as opposed to approval of the general principle. Nevertheless, if promotion happens now, any future changes will be judged against the "consensus" version adopted right now as the guideline. I'm always wary of the "now or never" approach that puts stones on the ground - they're devilish hard to move around later. Franamax (talk) 07:55, 29 April 2009 (UTC)
Understood. Why don't we all conclude that the apparent consensus is indeed that there should be a guideline at least. You're a principal author of the current version, and are uncomfortable with it. I myself objected to the overly broad welcome of PD text in previous versions, and edited it to reduce that, but I recognize the current text (perhaps especially where i contributed) is not general enough or otherwise appropriate for a good guideline. I certainly accept there could be room for a good rewrite, perhaps by someone else altogether. I wonder if the authors of the Signpost article, namely Awadewit, Elcobbola, Jbmurray, Kablammo, Moonriddengirl and Tony1, could get it together to do a rewrite / proposal for a guideline. I happened to think that the Signpost article was ambitiously titled (as "Let's get serious about plagiarism"), but then it did not actually come through with any strong advocacy on what wikipedia should do about the issue. How about we ask and/or give those authors a chance to come through with a serious proposal here? doncram (talk) 09:51, 29 April 2009 (UTC)
I wasn't aware that they needed any particular invitation. Franamax (talk) 10:07, 29 April 2009 (UTC)
  • Comment: Revisions: to address some of the concerns that this proposed guideline is too long, I have done some restructuring. I did it all in one go so that it can be easily reverted and compared: [1]. I moved some of the material from the lead into a new section so that the lead would be more concise. I have restructured the "Public domain text" section, including removing several examples of how to format quotations, since a reference to the proper style guides seems to suffice and we don't have to reinvent the wheel. I've tried to focus it more narrowly on plagiarism rather than addressing all matters that might relate to incorporating public domain text, but I have still attempted to note other concerns that had been previously raised (such as whether material is reliable or neutral). --Moonriddengirl (talk) 12:23, 29 April 2009 (UTC)
    Two goes. :) [2]. --Moonriddengirl (talk) 18:56, 29 April 2009 (UTC)
  • Comment - I think Wikipedia needs a plagiarism guideline, but this proposal still needs serious work. It contains a lot of contradictory information and is quite confusing, IMO, as to what constitutes plagiarism and what should be done about it. Kaldari (talk) 18:27, 29 April 2009 (UTC)
  • Oppose. Agree with Kaldari. Too long, too confusing. I can see merit in such a guideline, but this is not it. One specific point: The issue of how attribution affects plagiarism is not as clear as I would like, especially since attribution can be in-text attribution ("Smith has written that ..."), a footnote reference citing Smith's work, or both. If I write, for example, "Smith, writing in Rolling Stone magazine, expressed the opinion that Amanda Palmer was the most original new act he'd heard this decade, citing her idiosyncratic vocal style and eccentric dress sense", footnote ref to Rolling Stone given, then I can hardly be accused of plagiarism, can I? Or do I now have to write "vocal style" in quotation marks, and "eccentric", and "dress sense" too? Or should I reformulate Smith's opinion to the extent that it does not sound anything like what Smith wrote any more? The alternative is to only give authors' opinions in full verbatims. If we quote half a dozen commentators, that will be tedious reading. Jayen466 21:58, 30 April 2009 (UTC)
  • In a word... yes. If you use someone's exact phrasing, quotation marks are required. At least that's the way I was taught. Recognizance (talk) 05:55, 1 May 2009 (UTC)
Recognizance, of course I can reuse the entire quote in quotation marks, and that may often be a good way to go, if I want to communicate colour. But assume Smith wrote: "Her vocal style is idiosyncratic! Her dress sense eccentric! I love her! Amanda Palmer's the most original new artist I've heard in the entire decade." Would you argue that an editor who had inserted the summary I wrote above ("Smith, writing in Rolling Stone magazine, expressed the opinion that Amanda Palmer was the most original new act he'd heard this decade, citing her idiosyncratic vocal style and eccentric dress sense") should have put quotation marks around each word that also occurs in the original source? That would look like this:

Smith, writing in Rolling Stone magazine, expressed the opinion that Amanda Palmer was "the most original new" act he'd "heard" this "decade", citing her "idiosyncratic" "vocal style" and "eccentric" "dress sense".

Jayen466 07:48, 1 May 2009 (UTC)
  • As some of the editors above said, there is a difference between writing an academic essay and writing a Wikipedia article. One should be a product of your original thought. It should contribute something new. The other most definitely should not be a product of your original thought. To the contrary, WP:V requires that any material inserted by editors should be "directly supported" by the source, without the addition of any original analysis by yourself whatsoever. As such, the entire Wikipedia concept is built on what would be plagiarism in the academic context.
  • I would rather have more (and more prominent) guidance in WP on how much paraphrasing is necessary to avoid copyright infringement than a guideline on avoiding plagiarism. Any such discussion will also need to address how big a portion of the cited work has been paraphrased. For example, it is my belief, based on this posted earlier, that a close paraphrase of twenty sentences from a 300-page book is no reason for concern from a copyright point of view. (At any rate, I think close paraphrasing is far less of a problem than linking to copyright-infringing sites.)
  • It is obviously inappropriate to build a whole article on a single copyrighted source, reflecting the structure of the source in the article structure, reproducing 50% of the intellectual content and original thought in that source, and I am not advocating that. But on the other hand, consider that if we mention an author in our text and/or cite their work, that is also an advertisement for them, and exposure to a huge population of potential buyers out there. Google books today routinely shows a big part of people's books in Preview, and so does amazon. I really think that worrying about a close paraphrase of a few sentences from a book is somewhat disproportionate. Jayen466 07:38, 1 May 2009 (UTC)
the entire Wikipedia concept is built on what would be plagiarism in the academic context. That's not quite true. There's an organisation called Annual Reviews that publishes ... annual reviews of progress in various sciences (e.g. Halanych, K.M. (2004). "The new view of animal phylogeny". Annual Review of Ecology, Evolution, and Systematics 35: 229-25). I don't know its rules, but the content is a review of umpteen scientists' work, although as far as I can see it allows a little more POV than WP, in emphasis rather than pure content. --Philcha (talk)
I meant in the academic context of writing an essay, thesis etc. Jayen466 09:06, 1 May 2009 (UTC)
Good to see, at last, some common sense being injected into this debate. An antidote to the lazy and closed mindsets of some academic contributors who treat Wikipedia as though we are making submissions to Nature. Let's hope this is the start of some sanity. Thank you Jayen466 :) --Geronimo20 (talk) 09:18, 1 May 2009 (UTC)
(reindent to address this point) I actually agree with a good deal of what Jayen said in terms of the differences between academia and Wikipedia, but I still maintain that "eccentric" is an example of something you'd put in quotation marks in the example given. Recognizance (talk) 18:33, 1 May 2009 (UTC)
(unindent) I don't see the common sense in asking for more guidance on avoiding copyright violation, to the extent that is different from plagiarism, in a discussion on a guideline for plagiarism. Plagiarism is different from copyvio. But, Jayen does have a good point that writing for an encyclopedia is different than writing for Nature or other academic publications. Indeed encyclopedia articles are not supposed to be original, and the implied claim of credit is lower, and the reasonable expectation of readers for providing exactness in all sourcing is lower. In an encyclopedia article, there should be relatively less footnoting and quoting, and relatively more paraphrasing. What plagiarism is, is providing less attribution than is reasonably expected for the given medium. Standards for attribution of an encyclopedia article are in fact lower than for top academic journals. The guideline should be written to cover that. P.S. The most intelligent discussion I ever read on plagiarism was: Roger Clarke, 2006, "Plagiarism by Academics: More Complex Than It Seems", Journal of the Association for Information Systems Clarke provides scales for evaluating the seriousness of plagiarism as an offense, according to five factors: "whether the plagiarism is intentional or accidental, the nature of the new work, the extent to which originality is claimed in the new work, the nature of the incorporated material, and the nature of the attribution provided." For the nature of the new work, he means nature of publication, from formal refereed papers down to unpublished, informal materials. Intermediate categories would be scholarly books, textbooks, informational brochures, newspapers, trade publications, casual publications in student newspapers or email lists or blogs. 
I think encyclopedia articles would be in the middle, are a lot like textbooks, where you do not see much footnoting: it gets in the way for readers who are trying to learn, and there is little/no implicit claim of originality. doncram (talk) 14:22, 1 May 2009 (UTC)
Perhaps ironically, precisely because of its injunction against "original research," but also in part because of concerns about its lack of credentials, Wikipedia articles (at least those that it presents as its "best work" at FA) are in fact much more heavily footnoted than typical academic articles. --jbmurray (talkcontribs) 16:16, 1 May 2009 (UTC)
WP is not an encyclopedia like any other, and therefore I disagree with the notion that sourcing standards should be less exacting here.
WP disclaims originality through the WP:NOR policy. It aims to offer readers an overview of what reliable sources have written. It is important to remember that this overview is compiled by a random and self-organizing (i.e. unsupervised) set of contributors, comprising mostly minors and lay people, along with a very few genuine subject matter experts. That is why WP needs footnotes. ;) They help to demonstrate "lack of originality".
I think the demand for paraphrasing should be restricted to what is legally required to avoid copyright infringement. Editors need clear guidelines on that: the need to use quotation marks for verbatims, an explanation how much is okay to quote or paraphrase closely, etc.
Beyond that, I see value in encouraging proficient writers to paraphrase well, so we arrive at a professional result in our best work. But paraphrasing should not be demanded beyond what is legally required for copyright reasons. This will enable everyone to contribute. It will also help to maintain accuracy with those editors less adept at paraphrasing. And it takes care of cases like the above one with the music critic, where it is arguably desirable to use the words the source used. Jayen466 16:12, 1 May 2009 (UTC)
Jayen, I studied writing in graduate school where a course in relevant law was required curriculum. Our instructor was a lawyer and the textbook she wrote for our course became a minor bestseller. We were very interested in knowing the amount of paraphrasing that is legally required to avoid copyright infringement and she, being quite good at her branch of law, could not answer. There isn't a mathematical formula; if it goes to trial it's a bit haphazard how decisions come out, which is one reason wise people avoid coming close to that line. Yet that belongs in a discussion of the copyright violation policy; plagiarism isn't a legal concept. It appears your quarrel is with that policy, not with this proposal. DurovaCharge! 17:41, 1 May 2009 (UTC)
I don't think you quite follow. One of the external links in the proposed guideline is to a set of pages on a Duke University site, explaining what plagiarism is. The set includes this page here: [3]. As you can see, it says: "A paper composed mostly or entirely of paraphrases from other authors is very likely to be described as 'patchworking' (discussed later in this tutorial). Even if you have cited every paraphrase correctly, you've forgotten to include your own analysis!" Basing our idea of plagiarism on such sources just isn't quite correct, because in Wikipedia's case, such a "patchwork" is precisely what we are aiming for. We don't want editors' own analysis. If I am writing an essay for my university course, producing a "patchwork" is something to be avoided. See [4]. If I am writing a WP article, producing a "patchwork" is what I'm supposed to be doing. It is absolutely clear that copyrights must not be infringed. But the academic concept of plagiarism, on which parts of this guideline seem to be based, does not fit the context of WP. WP editors are not trying to establish reputations as independent scholars and researchers, which is what the university system is designed to produce. The rest is governed by our copyright policy. So what does this proposed guideline add? Jayen466 18:13, 1 May 2009 (UTC)
I understand quite well: you read a paper on plagiarism and asked about its legal implications. That is not a fruitful avenue of query. Avoidance of plagiarism does not require breach of WP:NOR. The Wikimedia Foundation is an educational charity; its projects aim at respectability. It would be incompatible with that mission to take credit for other people's work even if that work is in the public domain. DurovaCharge! 19:21, 1 May 2009 (UTC)
I don't recall talking about the use of PD work. Some aspects of what I was talking and thinking about were competently discussed above, under #Paraphrasing_considered_harmful, by Arch dude and Moonriddengirl. Beyond that, I am all in favour of attributing and naming sources, incl. for public domain work. We are probably talking at cross purposes, so let's just leave it there for now. Jayen466 19:36, 1 May 2009 (UTC)
  • Strong support - As per Moonriddengirl and agreeing with Ottava, it should be policy. Dougweller (talk) 17:14, 1 May 2009 (UTC)
  • Support This should be either policy or guideline. One on plagiarism is long overdue, and the Wikipedia community should be informed on what exactly plagiarism is and why it should be avoided at all times. Timmeh! 22:09, 6 May 2009 (UTC)

Problematic passages

Along with Franamax above, I find the references to FA and DYK out of place in a proposed guideline. Next, here some examples of wordings in the existing version that seem confusing or unhelpful:

  • "In some cases, it is not necessary to cite a source or sources. For example, stating "common knowledge" may not be plagiarism (though in certain circumstances, it may be)." – It may, or may not. Wanna flip a coin? Removes confidence and certainty, rather than inspiring it.
  • "An easy way to test for plagiarism of online sources is to cut and paste passages into a search engine. Exact matches or near matches may be plagiarism." – We are clearly talking about unsourced material – otherwise there is no need to use a search engine to find the source. Copyright violation and verifiability are the primary policy issues here, and are already addressed in the relevant policies. Redundant.
  • "The names of some such programs and services for which Wikipedia has articles may be found at Category:Plagiarism detectors. Wikipedia does not endorse any of these or certify their accuracy." – Then why mention them? Too much information.
  • "It can also be useful to do a direct comparison between cited sources and text within the article, to see if text has been plagiarised, including too-close paraphrasing of the original." – If the material is attributed to cited sources, it is not plagiarism. It may still be a copyvio, based on fair-use considerations, proportion of material taken, presence or absence of quotation marks around direct quotations, etc. We already have a policy for that.
  • "An editor's reputation may also be beneficial in helping to evaluate plagiarism." – No. If multiple FA author X applies close paraphrase in service of his POV, it is okay. But if his novice POV opponents do it, then it isn't. A new gambit for content disputes: "Your stuff was plagiarism! I've deleted it."
  • "Sometimes material from a copyrighted work is copied into Wikipedia with minimal rewriting. This may still be a violation of copyright as a derivative work, and the same concerns about plagiarism would apply if the phrases, concepts and ideas in the copied material are not attributed to the original author. If the text follows closely enough on the original in structure, presentation, and phrasing to raise copyright concerns, handle it as a copyright violation. If it does not, address it as plagiarism." – Is it just me? It all seems so ... hypothetical.
  • "Direct copying of copyrighted works may be a copyright violation." – May be??

Overall, the proposed guideline says a lot that is already spelt out in WP:V, WP:NOR, WP:CS, WP:COPYVIO, etc., only it says it in a way that is much less clear. Instead of just saying, "Unsourced material is bad", it says "unsourced material is plagiarism". And yes, there is useful stuff here too, but right now the reader has to work too hard to extract it. Jayen466 21:35, 1 May 2009 (UTC)

I think the "editor's reputation" bit is a nice way of saying that newer editors sometimes don't understand how the site works. If you read something suspiciously elegant and (say) Casliber or Geogre wrote it, you'll probably save yourself a lot of time by not following it up. If the editor name shows up in red (or they've already been caught copy-pasting), it may be worth investigation and some gentle correction. Of course, if any editor is found to be plagiarising, no matter how many articlestars they have, their reputation is going to take a big hit.
At the risk of being told once again that I just don't understand, I'll say that plagiarism and copyvio are two sides of the same coin. However, one is a moral issue and one is a legal issue. This -guideline- needs to minimize its concern with copyvio. However, adequate paraphrasing is an equivalent concern in both areas. The difference is that extensive verbatim copying of PD and GFDL text is acceptable - provided that it is properly attributed. Same goes for media. We originally largely focussed on how to handle free text, but the mission may have crept along the way.
"Common knowledge" can be evaluated as plagiarism in much the same way as copyvio: copy-pasting a company address or list of directors is neither copyvio nor plagio. Same with a list of moons in the solar system. The borderline is when you copy a unique style of organization of common knowledge - say, ordered by how often each moon is blue. In that case, we would require more than a footnote if you copy-paste the "list of moons by frequency of blueness" and would want a PD-attribution template. That's my view anyway.
Generally agree with your other points. Franamax (talk) 22:13, 1 May 2009 (UTC)
Generally agree with Jayen and Franamax, except for Franamax's conclusion, which is too confident that adding a PD-attribution template is helpful. Why on earth not give a specific footnote attribution of the PD source for blue moons, rather than tarring the entire article with a vague suggestion that anything and everything in the article might be copy-pasted from the PD source? Original research and unresearched or poor claims may be present or may creep into the article, and seem to be supported by the too-general PD template. The indiscriminate use of PD templates lowers the quality (defined as fitness for use, citability, etc.) of Wikipedia articles. No one could or should quote a Wikipedia article, featured or otherwise, that has a PD source: are you quoting the collective of Wikipedia editors, or are you quoting an idiosyncratic yet PD source whose material was pasted into the article? Why on earth not give the specific attribution for a specific part of an article, when you know what the specific source is? This repeats some comments I have made in some previous discussions cited on this talk page or its archives; sorry about that. So Franamax and I, anyhow, differ about the utility of PD templates. doncram (talk) 00:26, 2 May 2009 (UTC)
There is no reason why the use of PD templates cannot be accompanied by footnote attribution. I'm supportive of a three-way attribution method myself: attribution template, footnote and edit summary note. Also, so long as the article is properly cited, I don't see how it would be too difficult to tell which parts were imported and which weren't. (I do like the idea of linking to the revision that inserted the imported text in the attribution template, though.) 189.105.83.163 (talk) 13:46, 2 May 2009 (UTC)
There are editors who charge other editors with "plagiarism" at the drop of an unparaphrased phrase, even though the text in question has been attributed and otherwise paraphrased. If editors are to be subjected to this serious charge, with its implications of fraud, lying and theft, then there needs to be a policy determining when it is appropriate. There should be sanctions to prevent editors carelessly using the term as a destructive instrument of abuse. --Geronimo20 (talk) 23:34, 1 May 2009 (UTC)
I am still concerned that we are using and citing definitions of plagiarism that apply to the academic arena, which teaches people to do original research. Some of these definitions fly directly in the face of what all our policies tell our editors they should do. I think if an editor researches a source and conveys what it says, in a properly attributed manner that respects the intellectual property rights of the source author, we should say "thank you" rather than laying them open to charges of "intellectual plagiarism".
The standards we should apply should be based on those used in newspaper reporting, rather than those used in academia. That is a better equivalence. I will do some research on the kind of rephrasing quality newspapers do when referring to the content of copyrighted works and post examples on the WP:Close paraphrasing talk page. Jayen466 10:24, 2 May 2009 (UTC)
Further commentary and examples of paraphrasing in the New York Times and The Independent posted here. Jayen466 19:47, 2 May 2009 (UTC)
While I'm unsure why newspapers would be more appropriate for us as a model than, say, textbooks and encyclopedias, rather than doing research by looking for examples of close paraphrasing in existing publications (granting that not all journalists, even at prominent publications, are quite in line with standards), why not look for professional publications that address close paraphrasing in journalism? Would we balance your examples with examples where it doesn't happen? --Moonriddengirl (talk) 20:06, 2 May 2009 (UTC)
As mentioned below, academic definitions of plagiarism stress the importance of demonstrating independent thought. That is not really an issue here, because of what Wikipedia is. That is why I thought that newspaper standards might be closer to our situation than academic definitions of plagiarism. But yes, professional publications addressing close paraphrasing in journalism might be useful. It also seems possible that attitudes to paraphrasing differ from country to country – one of the books we recommend has an interesting section about France which I dipped into. Another editor earlier on mentioned Annual Reviews summarising recent research – it would be interesting to see how they go about this, whether they reuse the researchers' own original expressions, or whether they paraphrase extensively, avoiding close paraphrase. The compilation of such reviews, too, parallels our work in some respects. Jayen466 21:36, 2 May 2009 (UTC)
But textbooks & encyclopedias don't, and they are, I would think, a far closer analogue to us than newspapers. Not that this might matter, if we can't find sources addressing the ethics of plagiarism in any of them. Attitudes towards paraphrasing differ not only from country to country, but from discipline to discipline, which does make creating suitable guidelines a challenge. Wikipedia may be forging new ground, even among encyclopedias, textbooks & newspapers, both because of our inclusiveness (no professional code of ethics already created for us) and our lack of review. --Moonriddengirl (talk) 22:11, 2 May 2009 (UTC)
Some of our articles on popular culture (bands etc.) are more like newspaper reporting. For articles on scientific topics I agree textbooks and encyclopedias are the better model. Jayen466 20:21, 3 May 2009 (UTC)
Jayen, I agree with your example of unsuitable commentary above and just went on a search-and-destroy mission to eliminate the "patchworking" reference - but I can't find it! Can you point out the link-chain that led you to this? IMO it is completely wrong as far as our mission here goes. Franamax (talk) 20:34, 2 May 2009 (UTC)
It's from one of the resources given under Further Reading: "Duke University Libraries. "Citing Sources: Documentation Guidelines for Citing Sources and Avoiding Plagiarism". Duke University Libraries, (last modified) 2 June 2008. Web. 12 Mar. 2009. (Provides hyperlinked "Citation Guides" pertaining to the most commonly-used citation guidelines, including parenthetical referencing; includes: APA, Chicago, CSE, MLA, and Turabian style guidelines; such style guides define plagiarism and how to avoid it.)" Most of these resources given under Further Reading, from what I saw, take a similar line to this one – i.e. stressing the importance of demonstrating independent thought. Jayen466 21:18, 2 May 2009 (UTC)
  • First off, I've just wandered through the -guideline- and done some trimming, rewording &c, including some modification of MRG's previous excellent work. So the hedge has changed shape a bit and as always, review of my edits is welcome!
  • doncram, again we agree and disagree at the same time. Above or maybe elsewhere you can find my proposals that the PD-attrib templates be modified to clearly indicate the exact diff where PD-text was inserted. This is crucial to me - show which exact text was copied from a free source. The subsequent changes are down to your wiki-skills, no more or less than if you want to see how much your own or my words have changed since we originally added them. It comes down to whether you know how article history works. As an object lesson, track the genesis of what I originally wrote here and what exists now (I'm fairly confident I can claim continuing authorship for "the", "they" and "also":).
  • more to don, yes, definitely copy-pasting in free text "degrades" the quality of our articles. I have no argument with you there, there is almost no case where insertion of PD-text does not degrade articles in terms of tone or style. However that doesn't apply to articles which haven't been created yet, and I don't think it applies to stubs either. Massive expansion is part of what we do here, painful as it may be for the perfectionists (BTW, I am a confirmed perfectionist, so you may be preaching to the choir :)
  • Further, I think I detect a subtle bias toward FA/GA/DYK in your comments. I'm not sure, and I wholeheartedly agree with you in any case that the article improvement and editor recognition processes are very important to what we do here. I just don't think they should have primacy over article expansion. We're nowhere close to finished yet, all we've done so far is all the Pokemon characters. As always, I'm thinking trees and insects. I disagree with you (quelle surprise:) that a PD-attrib template "degrades" an article. Rather, it is the actual content of the article that degrades it. To me, as long as the original PD-injection is clearly identified, we can always reach the point in the editorial process where that template can be removed. That is actually one of the unresolved topics of discussion here - can you and/or when can you ever remove a PD-attrib template? We never actually got agreement on that.
  • Geronimo, yes absolutely the charge of plagiarism is fraught with implication and needs to be handled in a very sensitive way. There is some bold text in the lede (which I've lately tweaked a bit) to emphasize that point. Do you wish some even stronger text? Perhaps you could propose the specifics here, or start a new section? Franamax (talk) 02:24, 2 May 2009 (UTC)


  • I could defend either position, that you can remove a PD template when you use a more specific reference that specifically cites all the material used, and includes the date of the original publication of the material, or that it should remain permanently as a warning that some of the material may be outdated. But this applies to such things as the old EB; there are also current PD sources from the US government or the like which do not have the same objection. The one thing which is not acceptable is the use of such sources without exact indication of the material which has been copied. DGG (talk) 17:38, 3 May 2009 (UTC)
  • Support I've come across plagiarism before a few times and it would be useful to have a guideline that specifically discusses the issue. In practical terms it seems to do little or nothing to change existing policies. Some sections could be better written but that's a detail that needn't hold up its promotion to guideline. Of course it should really be a policy, but one step at a time! AndrewRT(Talk) 00:06, 10 May 2009 (UTC)
  • Support. Looks good. Is clearly helpful to beginning editors. Definitely a good idea to give it prominence. It doesn't have to be perfect to label as {{guideline}}. --SmokeyJoe (talk) 11:53, 12 May 2009 (UTC)

Problems with new changes

This is a major change and should be discussed on the talk page first. There is no consensus to removing such a large portion of text, especially when it deals with definitions which are necessary. Ottava Rima (talk) 05:31, 21 May 2009 (UTC)

Why are multiple definitions that read like they were taken out of an eighth-grade student handbook necessary? How do multiple, long-winded, redundant definitions help Wikipedia editors to avoid, prevent, and fix plagiarism, rather than detracting from plainly described information about copying and sourcing? —Centrxtalk • 05:47, 21 May 2009 (UTC)
(ec) I object to this edit. The model is bold, revert, discuss. Centrx made a very bold edit, which was reverted with a detailed edit summary and a request to take it to talk. The proper thing to do at that point is to discuss the proposed changes, not reinstate them with a terse accusation that the criticism was wrong. The proposed change fails to distinguish between plagiarism and copyvio, and makes a mistaken assumption that plagiarism cannot be OR. Plagiarized text certainly can violate WP:SYNTH. DurovaCharge! 05:48, 21 May 2009 (UTC)
You reverted numerous distinct types of changes, and restored a blatantly unintelligible version of some text. Either this page is a guideline which, as was repeatedly averred in the RFC, needs serious improvement; or it is not, because most of the support in the RFC was general support for any Plagiarism guideline, not for this page. —Centrxtalk • 05:55, 21 May 2009 (UTC)
(Edit conflict)
1. The new text distinguishes between plagiarism and copyvio as much as the old text, and adds new explanation of issues with copyright problems.
2. The new text says nothing about original research. Original research is not more relevant to plagiarized material than to any other material. Note that you may be assigning undue relevance to plagiarism because the Wikipedia:No original research page uses article text about plagiarism as an example, almost like lorem ipsum. —Centrxtalk • 06:02, 21 May 2009 (UTC)
I think I have inferred what you mean about distinguishing between plagiarism and copying, and addressed it by restoring some of the previous text, in this edit. —Centrxtalk • 06:29, 21 May 2009 (UTC)

Some current issues

I'm glad this has become a guideline and think it can be very useful. I've read through it and there are a few things that jump out at me as possibly problematic or needing clarification (basically listed in order of importance in my view):

  1. I think we need to start with clearcut (to the extent that's possible) definitions. There was a "definitions" section previously which was recently removed. I think that was problematic as written, but I still think we need something like that at the outset before we start talking about "Why plagiarism is a problem."
  2. Specifically with respect to number 1, I think we need to clearly lay out, perhaps with an example, the problem with material that has been "insufficiently adapted into original language," as it's described in the intro. That's a complex question and I think we need to get into it a little bit and also explain that there is some ambiguity there (but probably not as much as some people think, and one should err very much on the side of caution). The recent post-RFA brouhaha has made clear that some folks think changing a word or three means it's not plagiarism, and I think we need to be direct that that is not the case.
  3. In the "how to respond" section we rightly note that "failure to properly attribute text may be intentional, but it is often inadvertent." I think this point also needs further emphasis (indeed we might want to include it toward the top of the page in a "definitions" section). While there may be different ideas in different locales/disciplines about whether or not "unintentional" (as in non-malicious) copying is truly "plagiarism," I think we need to say definitively here on Wikipedia that it is, and that inadvertent plagiarism is still a serious issue. Let's strongly emphasize that the onus is on the editor adding material to make sure that it is not plagiarized, and that in a sense when it comes to plagiarism ignorantia juris non excusat (although we might not want to word it that strongly, or maybe we do). I don't mean to say that someone who simply didn't understand what plagiarism is gets permanently banished to the Wiki doghouse, but rather that we need to stress that plagiarism (intentional or not) is unacceptable here and editors need to be familiar with what plagiarism is and is not as they edit.
  4. The whole copyvio vs. plagiarism thing is confusing. What are we trying to say there? On the one hand we say "Direct copying of copyrighted works may be a copyright violation. Doing so without attribution is also plagiarism" which suggests that copyvios can be plagiarism at the same time (which I think is a simple fact), but in the next section we say "Plagiarism doesn't have to be immediately removed, unlike copyright violations," which suggests that it cannot. First of all I don't understand the reason for the last sentence - plagiarism should most certainly be removed immediately, even if it is somehow not technically a copyvio, because it's a horrible idea to have plagiarized text sitting in an article. But again I'm having trouble understanding the logical relationship we are trying to describe between plagiarism and copyright violations. To some degree this gets into legal issues which I would not pretend to fully understand (what we really need here is a certified expert in intellectual property rights), but for starters I'm wondering if someone can explain the general point or points that section is trying to get across. If it's basically the same thing as can be found in the second paragraph of our Plagiarism article (and assuming that's a fair summary), we might just want to say something like that.
  5. From my understanding what constitutes "plagiarism" is very often a culturally specific question, and "Western" conceptions of plagiarism may not hold throughout the world. We'd need sources for that idea but if that's true I think we should say that, and also obviously say that we're using definition X (or X,Y, and Z) when we say "plagiarism." The fact that the Wikimedia Foundation is based in the United States might also have some bearing on how we approach this, but I'm not sure.

Apologies for the lengthy note, but these are some of the issues that pop into my head and I'd like to hear what others think about them. --Bigtimepeace | talk | contribs 07:46, 21 May 2009 (UTC)

  • Yes, where did that "Plagiarism defined" section go? It was good, I'll fish around for it. You have to understand that this has been an ideological battleground for the last year or so...
  • Moonriddengirl is enthused about Dcoetzee's essay on "Close paraphrasing", not sure where it's located. Using a direct example is problematic though, since it will tend to stigmatize one editor, and people will often tend to take the example as the rule. For the same reason, we fight to keep WP:IAR as just the 12 words. It's very much a matter of opinion. There are many sources giving the academic definition, which unfortunately don't translate directly to the wiki.
  • While you were writing the above, I was putting wording back in to the lead about how plagiarism can be done unintentionally. Until people are aware of the concept, they're, well, unaware of the concept. We do need (I think) to emphasize the process of education here, as opposed to accusation. A lot of editors are genuinely unaware that it's wrong to copy text or images. My personal rule of thumb is "how often do you use the keyboard when you're editing compared to using the mouse to copy/paste?".
  • Copyvio versus plagio has always plagued this guideline. On the one hand, the courts give precedent on example cases - but those are copyvio cases. On the other hand, most external discussion on plagio is in the academic context, where the emphasis is on original research and synthesis - the one thing we most disallow here. If you can improve the wording, go at it. We all know where the undo button is... :)
  • Cultural concepts aren't just West/elsewhere. A new generation in the West consider free copying and reuse just fine. (And it is, Lessig and Stallman have lots to say on the topic) I really think we need to adopt an internal definition of what is acceptable, subject obviously to US law so the servers can be operated (and Korea and Holland where the mirrors are) This needs to be an "English Wikipedia" definition, and even then, attitudes vary widely among even the people who've commented here.
There, I won't even apologize for the long reply. :) Edit away! Franamax (talk) 08:56, 21 May 2009 (UTC)
Thanks, that's very informative, and way shorter than what I wrote! I restored the "definitions" section since that was only recently removed, though I think it needs tinkering/expansion. As to "close paraphrasing" and an example, were we to use one I think it should be made-up, not something based on Wiki edits. Obviously you're right that there is the issue of taking the example for the rule, but I think some specific illustration of what is not good is probably worth that risk. As to the unintentional plagiarism issue I think we need to strike a balance between education, as you quite rightly put it, and "not-knowing-doesn't-make-it-not-a-problemness" (to coin a phrase, and avoid the term "accusations"). The recent experience with a new admin showed there were a number of editors who seem to think not knowing the rules about plagiarism isn't all that bad, and I think we somehow need to say that you really, really need to know about that, but at the same time we won't totally freak out on you if you don't. I think that's a balance we can strike. Right now I have no smart ideas about plagio v. copyvio as it were, though maybe something will come to me. And I completely agree with your last point about cultural concepts (indeed I've read my Lessig). I spoke too generally there, but my point was simply that there are places where "traditional" (for lack of a better word) "Western" (for lack of a better word) concepts about plagiarism do not at all prevail. Obviously it's a lot more complex than that (and it's been years since I read about this topic so I have no specifics offhand), but we might want to acknowledge that there are entire societies where a more Lessig-like approach is the norm (as is not the case in the U.S.) and that there is some sort of cultural specificity to the ideas about plagiarism which we are putting forth in this policy. That's not a huge deal though.
Thanks again for your reply, and I will stick around to work on this, but not for a while since I'm leaving town in a couple of days for a couple of weeks and have a lot to do before then (working on this doesn't make the cut I'm afraid, it's just that Durova - wittingly or not! - semi-guilted me into making at least some comment here which is all I can do for now :-) ). --Bigtimepeace | talk | contribs 09:28, 21 May 2009 (UTC)
The essay is at Wikipedia:Close paraphrasing. :) There is a made-up example in that essay, and there was one that I made up at the talk page of that essay that was not incorporated (perhaps because it sucked; perhaps because nobody ever read it. Since there were no responses, I'll never know. Reading over it at this later date, I think my "close paraphrase" example may have been too subtle). --Moonriddengirl (talk) 10:52, 21 May 2009 (UTC)

Media plagiarism

Here's another concern, in the "How to find plagiarism->Media plagiarism" section. The current wording reads a bit like a manual on how to poke beans up your nose. Can we trim this to just a descriptive bit about how EXIF data persists and leave out the stuff on specific methods to spot abuse and how easy it is to alter EXIF data? Franamax (talk) 12:18, 21 May 2009 (UTC)

No objections from me. :) --Moonriddengirl (talk) 12:20, 21 May 2009 (UTC)

Requesting clarification on math equations

First off, please look above at Bigtimepeace's concerns, they are very germane to this overall guideline.

My narrow concern here is the wording in "What is not plagiarism", where it says "Simple mathematical calculations...". This bothers my inner geek. Any math equation is what it is. It can't be paraphrased or rewritten. The symbols used and the form of the equation are "written in stone" as it were. Math equations must be copied verbatim.

The problem arises when we discuss methods of attribution. A footnote marker (the blue number in square brackets) at the end of a marked-up math series is improper notation (maybe), since it indicates raising to a power of the footnote number. And we already have many thousands of marked-up equations included without direct attribution.

So my question is: what guidance can we give here for math and science editors who wish to include specific equations, which by definition must be copied verbatim? Franamax (talk) 09:18, 21 May 2009 (UTC)

I can't believe I'm even attempting to respond to this since I've utterly enjoyed not doing math since (inexplicably) passing the AP Calc test over a dozen years ago, but a chance conversation I had with a math prof a few weeks back might shed some light on this. He said that, with at least some frequency, published math papers and the like copy other published stuff pretty much wholesale (I assume with attribution) when a formula (or something - crap I'm already in over my head here!) is expressed particularly well. The basic point of the conversation, oddly enough, was how wildly different plagiarism standards were in mathematics and history (my field). Apparently they're far less strict in the former, which suggests that here on Wikipedia we could perhaps feel comfortable having fairly complex equations that lack a citation, as opposed to just simple ones. But please don't take my word for that since this is essentially hearsay, and since my mathematical knowledge regressed many years ago to a 7th or 8th grade level. --Bigtimepeace | talk | contribs 09:39, 21 May 2009 (UTC)

My experience is that formulas, equations, and calculations are not attributed when they're part of the general background knowledge of a field. They may still be associated with the discoverer's name, so that I would refer to the Heisenberg equation or Gauss's law but not feel the need to cite any specific written source. When the work is novel, or not widely known, the source should be mentioned and cited in the text, and the equations copied nearly verbatim (some rearrangement and use of different symbols are OK, to match the conventions of the article). Sometimes, textbooks or other reference works are cited. These don't imply any kind of credit, and are placed in the text leading up to the equation, without being mentioned by name in the text. --Amble (talk) 16:38, 21 May 2009 (UTC)

Based on the input of both editors above, I have now changed the sentence to read "Mathematical formulas which are part of the general background knowledge of a field". This seems to accord well with actual practice on-wiki, in which named equations are generally prefaced as such. It's possible that the wording should be extended to mandate the use of names for identified equations, but I'm not going to add it, and I think that those mathematically-inclined will already be diligent about such matters. Thanks for your advice! Franamax (talk) 04:21, 23 May 2009 (UTC)

Public domain text

In this section, we give different instructions for material from "public domain" and "free" sources. Free content requires an attribution template, public domain does not. However, we don't explain the difference between public domain and free content, and it is all under the heading "public domain text". As a result, the section is completely confusing to those who don't know all of this already. Of course, the next section then is about "Text available under a free license", which is obviously something else again than "free content". Once more, the difference is not explained. Even more confusion, leaving the hapless reader stranded. Jayen466 15:45, 21 May 2009 (UTC)

(edit conflict)The position of this document seems to be that all free content--public domain or otherwise--requires attribution, either through an attribution template or a note. Given that the position is that credit is a moral requirement, where not a legal one, that seems consistent to me. --Moonriddengirl (talk) 15:49, 21 May 2009 (UTC)
Moonriddengirl, here is the contradictory section:

A good practice to use when copying free content verbatim is to indicate in the edit summary the source of the material. Further changes such as modernizing language and correcting errors should be done in separate edits after the original insertion of text. This allows later editors the ability to make a clear comparison between the original source text and the current version in the article. In addition to the edit summary note, be sure to attribute the material by using an attribution template or by writing your own note in the reference section of the article indicating that language has been used verbatim. For an example, see the references section in planetary nomenclature, which uses a large amount of text from the Gazetteer of Planetary Nomenclature. Whether adding text verbatim, summarizing, paraphrasing or making explicit quotations, regular referencing should be added to provide both attribution and verifiability.
An advantage of public domain sources is that longer quotations are more acceptable than those from non-public sources, which may run afoul of "fair use" copyright limitations. (See the guidelines on quotation for information on formatting quotes.) In this case, an attribution template is not separately needed. A standard inline citation is sufficient.

I make of this:
  1. "When using free content verbatim be sure to attribute the material using an attribution template."
  2. "The advantage of public domain sources is that longer quotations are more acceptable and that an attribution template is not separately needed."
This is really not good. Jayen466 15:55, 21 May 2009 (UTC)

Under "Generating many articles from a free source", we give as an example the 1911 Britannica. However, that work is public domain. Jayen466 15:55, 21 May 2009 (UTC)

"...or by writing your own note in the reference section of the article indicating that language has been used verbatim." You stopped bolding a little early. :) (I have myself indicated verbatim duplication in inline citation.) --Moonriddengirl (talk) 16:07, 21 May 2009 (UTC)
I am afraid it is still a shambles. Is there a difference between public domain content and free content, or not? And do we want people to use an attribution template, their own note in the reference section, or an in-line reference? Jayen466 16:24, 21 May 2009 (UTC)
The word "or" suggests we will allow them to make their own choice. That seems acceptable to me. I gather you want us to nail it down? --Moonriddengirl (talk) 16:47, 21 May 2009 (UTC)
Without wishing to distract you further :P, it just seems like muddle-headed writing, where the writer doesn't know what they want to say, or where there have been several people contributing, each wanting to say something different.
To recap: assuming that no material difference is implied between "public domain" and "free content" in that section, we are saying right now that "for free content, people should be sure to use an attribution template or their own note in the reference section, but the good thing about free content is that an attribution template is not required, and an in-line citation is enough." As a guideline that people are supposed to follow, this is almost Kafkaesque in its opaqueness.
If on the other hand there is a relevant difference between "public domain" and "free content", then we have to tell the reader how to distinguish one from the other, so they know what they are dealing with in any given case, and which guideline to apply. For instance, the intent may have been that "free content" should refer to copyleft materials as well as to public domain materials (where copyright has expired). In that case, it would be best to have one set of instructions for copyleft and one set for public domain, and name the section "Free content" rather than "Public domain text". Jayen466 17:47, 21 May 2009 (UTC)
On the last paragraph, "advantage of public domain sources...", the key word there is quotations, meaning that you can copy a whole paragraph or chapter or entire work and put it inside quote marks. You can't do that with a copyrighted work generally, it becomes a substantial taking. That's a different thing than copying in PD text without quote marks, which is also allowed but needs to be attributed differently. I guess the wording could use a tweak to make it clearer and the PD/free content back-and-forth is a bit confusing too. Franamax (talk) 19:21, 22 May 2009 (UTC)

Page focus

This page seems to me to be trying to do too much, and overlong and confusing as a result. Is it a How To guide for users not sure how to avoid plagiarism? A general repository of Stuff To Do With Plagiarism? It seems to me it would be better split into Wikipedia:Avoiding plagiarism and Wikipedia:Dealing with plagiarism. The latter in particular overlaps a lot with things like Wikipedia:Copyright violations and Wikipedia:FAQ/Copyright. But at least the former could then focus on guiding newbies on how to avoid plagiarism - what to copy (or not), how, and how to attribute/cite. Rd232 talk 15:12, 22 May 2009 (UTC)

Hello, I think the problem is you have too many pages and they are all highly technical. Do you have a simple page about copy-pasting and paraphrasing that essentially just tells the editor they need to use quotation marks? :) 217.189.255.152 (talk) 16:26, 22 May 2009 (UTC)

If the idea is that plagiarism will be understood to not be the charge by which you are expelled from university, but rather the notion of giving less credit than is due, then Wikipedia:Avoiding plagiarism seems like the reasonable thing to have. And it should be made official policy, not just some guideline to be overruled by common sense. Ideally, it would be short and concise, something you could reasonably expect that every editor should read. 217.189.255.152 (talk) 19:18, 22 May 2009 (UTC)

I like the idea of getting to a short, clear policy-type statement, written towards becoming wikipedia's official policy, and that we could expect every editor to read. I think it would be useful now to divide this guideline into three items: the guideline (to become a policy someday), the how-to-avoid-plagiarism friendly guide for new editors (along the lines of the recent Signpost article, akin to many university guidelines for students but adapted to writing for wikipedia), and the plagiarism problems / issues / enforcement area. doncram (talk) 19:59, 22 May 2009 (UTC)
Since this has just become a guideline, we probably should let it settle in for a while and let all the comments come here. I like the idea of splitting off a noticeboard right now, as discussed in the section just above.
The guideline is long and a bit meandering; it reflects the views of quite a few very experienced editors with varying thoughts on the matter, built up over the last 10 months or so. People's natural tendency when trying to help is to add text, so there you go. The page was, though, originally intended to be structured on the lines of "what is plagiarism", "how to avoid plagiarism", "how to spot plagiarism", "what to do about plagiarism" so that curious editors could jump quickly to the sections where they were seeking information. That sense may have been lost over time, but we can regain it just as easily by restructuring this page as by splitting it into two.
That said, I do like the idea of a friendly essay on "How not to plagiarize" for newer editors and those unfamiliar with the concept. This could reflect the community's current views on the subject, without needing to address the mass-imports done in the past. I'll warn you though, there's a lot of different views on the right way to do it. Franamax (talk) 01:54, 23 May 2009 (UTC)

Here's my idea of an "intro" page: User:Franamax/Test essay. It's totally descriptive (needs links) and written from a first-person perspective, using the simplest phrasing I can manage. That is what I envision as an introduction for people who haven't yet grasped the concept of plagiarism. It doesn't explain why plagiarism is wrong, but it's my best shot at a practical guide on how to not be a plagiarist. Edits there and talk comments are welcome if it looks even remotely usable. Franamax (talk) 06:32, 23 May 2009 (UTC)

Oops! I guess I spent the last few hours too close to the subject. That test essay would be a candidate for WP:Avoiding plagiarism or some such simple presentation. Franamax (talk) 06:49, 23 May 2009 (UTC)

What is not plagiarism

This section is full of weasel words and circular reasoning. "For example, stating "common knowledge" may not be plagiarism (though in certain circumstances, it may be). Such cases in which plagiarism does not occur may include: common knowledge."

This text is not ready. Jayen466 16:04, 21 May 2009 (UTC)

Feel free to propose a revision. --Moonriddengirl (talk) 16:10, 21 May 2009 (UTC)
It would help knowing what the text is trying to say. Jayen466 16:19, 21 May 2009 (UTC)
If you want to know that, this is where to go. Otherwise, it can simply be revised. --Moonriddengirl (talk) 16:32, 21 May 2009 (UTC)
I think on the whole I liked the old version better. ;) And we should not be using footnotes if what is in the footnotes is essential information. Here a proposed revision:

The purpose of citation is to provide educated conclusions drawn on a subject which back up a statement. In some cases it is not necessary to provide citation, and the reuse of such material is not considered plagiarism. Some such cases include:

  • Factual information found in infoboxes that is common knowledge. (Many infobox parameters do, however, include a feature enabling notes of sources of information to be placed in infoboxes where appropriate.)
  • Simple, non-creative lists of information, such as a list of song titles on an album or actors appearing in a film. If creativity has gone into the selection of elements (in terms of which facts are included and in which order they are listed), then reproducing the list without attributing it as a quotation constitutes plagiarism (and may in any event constitute a copyright violation, see Feist Publications v. Rural Telephone Service).
  • Statements of common knowledge, as long as you remember that using another person's words to discuss a topic that is common knowledge still qualifies as plagiarism. If an outside source uses a particular wording to state a fact that is common knowledge, any verbatim quote of that source that exceeds a length of four or five words must be attributed (author, work, page number) and placed in quotation marks; replicating another person's exact wording without attribution is plagiarism no matter what the content of that wording was.
  • Simple mathematical calculations which can easily be reproduced.
  • Simple logical deductions. Complex logical deductions, in contrast, may require a citation.

Please review. Jayen466 16:56, 21 May 2009 (UTC)

First, I would prefer to discuss one point at a time myself. In that spirit, I'm looking at common knowledge. I'm not comfortable with giving any length for verbatim duplication of material without attribution. An apt phrase may be shorter than four or five words, but if it is a striking or novel combination of words even if being used to convey common knowledge, it needs quotation & attribution (see, for example, [6], with a two word apt phrase example of "psychological gentrification.") Otherwise, I think it's a good direction. Now I have to get back to History of the Jews in Poland! Off with me! No distractions. :) (P.S. I'm not blaming you for my distractions. I'm chiding me.) --Moonriddengirl (talk) 17:10, 21 May 2009 (UTC)
Perhaps it would be best to say "any distinctive and original wording of that source must be attributed" rather than giving a number of words. The problem is like you say that two words like "psychological gentrification" may be original and creative, whereas seven words like "the Secretary-General of the United Nations" or "Frank Zappa and the Mothers of Invention" are not. Jayen466 17:55, 21 May 2009 (UTC)
I like that. Now, to the open. I'm afraid I don't quite follow what you mean here: "The purpose of citation is to provide educated conclusions drawn on a subject which back up a statement." Can you clarify? (Done with my task! For now! Yay!) --Moonriddengirl (talk) 18:34, 21 May 2009 (UTC)
Thanks for identifying the one bit in the old version that was definitely inferior to what we have now. :) So now we would have:

The purpose of citation is to provide sources of information supporting a statement about a subject. In some cases it is not necessary to provide citation, and the reuse of such material is not considered plagiarism. Here are some examples where attribution may not be required:

  • Factual information found in infoboxes that is common knowledge. (Many infobox parameters do, however, include a feature enabling notes of sources of information to be placed in infoboxes where appropriate.)
  • Simple, non-creative lists of information, such as a list of song titles on an album or actors appearing in a film. If creativity has gone into the selection of elements (in terms of which facts are included and in which order they are listed), then reproducing the list without attributing it as a quotation constitutes plagiarism (and may in any event constitute a copyright violation, see Feist Publications v. Rural Telephone Service).
  • Statements of common knowledge, as long as you remember that using another person's words to discuss a topic that is common knowledge still qualifies as plagiarism. Any distinctive and original wording of that source must be attributed (author, work, page number) and placed in quotation marks; replicating another person's exact wording without attribution is plagiarism no matter what the content of that wording was.
  • Simple mathematical calculations which can easily be reproduced.
  • Simple logical deductions. Complex logical deductions, in contrast, may require a citation.

Can we drop that in then? JN466 21:55, 22 May 2009 (UTC)

The idea of putting some things into footnotes is to keep the main text concise. That way people will have an actual chance of getting through the whole thing, and they can always check the footnotes for a topic of particular interest to them. In theory, everything on this page is essential information, so it becomes difficult to include it all in the main body without creating a vast wall of text. Franamax (talk) 01:05, 23 May 2009 (UTC)
I'm afraid I'm still going one step at a time. :) "The purpose of citation is to provide sources of information supporting a statement about a subject. In some cases it is not necessary to provide citation, and the reuse of such material is not considered plagiarism. Here are some examples where attribution may not be required:" That first sentence doesn't seem unnecessarily complex to you? It seems to me that essentially what's being said is "The purpose of citation is to provide sources for information." "supporting a statement about a subject" seems unnecessary. For that matter, the next bit seems like it could all be abbreviated. We could melt the two first sentences into one: "It isn't always necessary to cite sources to avoid plagiarism. Here are some examples where attribution may not be required:" --Moonriddengirl (talk) 01:30, 23 May 2009 (UTC)
I don't think we need to explain the purpose of citation in this already lengthy guideline. Your rewording of the second and third sentences I like very much. [7] JN466 09:56, 23 May 2009 (UTC)
Comparing what we've got in the guideline now with what we've got above, it seems like this is all resolved, yes? If not and there's more to talk about, please let me know. :D --Moonriddengirl (talk) 13:19, 23 May 2009 (UTC)
Yes, resolved. Pleasure doing business with you. :) JN466 16:03, 23 May 2009 (UTC)

HELP: fair use? plagiarism? copyright?

Could someone who knows about this stuff comment here? Note that I don't accuse the user of plagiarism; it is obvious s/he believes the text is sufficiently attributed, but this is the only page I found that actually tries to explain that copy-pasting other people's creative work is not fine. Since the user in question does not believe in anons, and removed the copyvio template I had placed on the page, I ask the knowledgeable people involved here to clear up this situation. 217.189.255.152 (talk) 08:09, 22 May 2009 (UTC)

Several persons have gone to comment at the user talk page and to edit that one article. It's a bigger problem though. Taking one other contribution by that user, at random, I see in the article Nicolas Rashevsky, written only by this user, that the user wrote this sentence: "His later efforts focused on topology of biological systems and the formulation of fundamental principles in biology and hierarchical organization of organisms and human societies.[1]".
Reference
  1. ^ Planet Math page

That sentence, at least, is verbatim, an exact copy of the sentence in the source, and in my view it is plagiarism not to credit the source adequately by quoting it. Planet Math.Org releases the text under GFDL, but to me it is obvious that wikipedia needs to quote and credit Planet Math. Note, the short Planet Math source article itself has 9 substantial references. We are relying upon the Planet Math writers' credibility in accurately reporting information that they attribute to those sources. We cannot ourselves just paste in their sources as we have not consulted them directly. We are relying upon the secondary/tertiary work of the Planet Math writers. I didn't check other sentences or other contributions by this user. doncram (talk) 15:11, 22 May 2009 (UTC)

    Just to say I'm impressed by the quick response. 217.189.255.152 (talk) 16:23, 22 May 2009 (UTC)

    The Planet Math thing was easy to fix by using a proper attribution template. There is a difference between incorporating and citing, and the attribution template gives proper credit. But citing is also used for fact-checking, so one may need both? 217.189.255.152 (talk) 17:46, 22 May 2009 (UTC)
    I don't agree that slapping on an attribution template solves the problem there, though it brings the degree of crediting up a notch, and that is a level that many have found acceptable. The effect in this page is that pasted in text written by the planet math GFDL writing collective, minus its references, is put into wikipedia. The same text written by the wikipedia GFDL writing collective, with no sources, would not be acceptable: it would be challenged as unreferenced or perhaps OR. In fact, without consulting the actual sources, we cannot write such a text. So, I think we want to rely upon the authority of the Planet Math writers, and credit them/blame them/hold them accountable, by putting the verbatim text in quotes. Why not hold the wikipedia editor who adds the text responsible for giving credit where due for the wording of the text? It is much harder, later, to try to figure out which words came from which source. I recognize that others think the attribution templates are more helpful than I do. doncram (talk) 07:30, 23 May 2009 (UTC)
    This is a situation I haven't seen before. If I'm reading it right, we have a peer GFDL site writing articles based on specific cited sources, and we're contemplating either importing the text without the sources (which is OK by the technical definition, since we're free to reuse GFDL material in any way, so long as it is properly attributed; but it now becomes unsourced material); or we're importing text which includes references that no wiki editor has actually reviewed for relevance or accuracy. The second option seems wholly unpalatable, thus the best option would seem to be for the original editor to insert the properly-attributed GFDL text sans-sources and take their chances with tagging for V or OR. Or take the simple route and use quote marks. Or they, or anyone else, could read the actual sources and make a determination that they are accurate. In any case, I don't think we can rely on sources from a different GFDL site. However, let's look at the similar situation inter-wiki: if an article is translated from de:wiki or fr:wiki, do we accept the sources at face value? Franamax (talk) 08:20, 23 May 2009 (UTC)
    Yep, that's the situation, at least for the portion of the article which is copy-pasted from Planet Math. I agree that pasting in their sources without reading them would be unpalatable. Clarke, in his writing on plagiarism that I've cited in another discussion section, regards that as especially misleading and heinous. Consider if they were copy-pasting from us. I would be offended or irritated or displeased if they just copied our word-for-word text, without using quote marks to convey it was we who wrote it. I would be perfectly satisfied if they credited us by quote marks, relying upon the wikipedia editor collective's reputation that what we say is accurate (based on our own sources). I think we can rely upon the Planet Math GFDL writing collective at least by quoting from them (and clearly giving them credit/responsibility for the translation of sources into their wording). Eventually we might be able to evaluate how reliable a source they are. Actually, a concern we should have is what their plagiarism policy is: are they clear that verbatim text must be put into quotes? I wouldn't want to paste GFDL text into our adventure, only to discover it was plagiarized (copied verbatim) from one of their sources. Likewise, others should be able to rely upon wikipedia as a valid, quotable source that doesn't plagiarize (does not copy verbatim without quoting). doncram (talk) 08:54, 23 May 2009 (UTC)
    P.S. I hope you don't expect me to investigate the edit history of the Planet Math collective, if that is available, in order to look for edit summary comments that may provide hints that their text is copied verbatim from some other source. If it turns out that that collective's plagiarism policy allows for attribution hidden in their equivalent of Talk pages or edit history, then that certainly reduces the practical reliability and value of their product. Likewise, I think wikipedia readers should be able to rely upon what is displayed in a wikipedia article, not search through edit history for hints that all is not what it appears to be. All attribution should be WYSIWYG, IMHO. doncram (talk) 09:04, 23 May 2009 (UTC)

    Promotion to guideline

    The RfC above has (by my count) 26 supports and 6 opposes. That's over 80% in favor. Due to subsequent events including a drive to desysop an administrator, it's necessary to regularize matters by establishing this as a formal guideline. Editors who have misgivings about the current wording are welcome to improve it. Yet it isn't a tenable situation to ask for a desysop over violating a proposal.[8][9] The concerns are substantive; it's time to formalize this. DurovaCharge! 15:56, 20 May 2009 (UTC)

    Can't talk. Eating popcorn and reading, reading, reading...
    Oh my gosh, what a mess. Despite my misgivings about the wording, I agree that it's time to give this some weight and work out the details over time. Support promotion. Franamax (talk) 20:13, 20 May 2009 (UTC)
    Thank you. Maybe after a brief hiccup this will all work out well: more commitment to taking plagiarism seriously and better screening for RfA candidates. Last summer, the episode that prompted this proposal led to serious improvements in DYK screening. That's been a long term positive; let's hope this turns out for the best also. DurovaCharge! 23:25, 20 May 2009 (UTC)

    Frankly I don't think this matter has been thought through enough. Wikipedia is not an academic publication and it is far from clear to me that notions of "plagiarism" really apply. 26 editors think this guideline is a good idea, but that does not give them or anyone else the right to impose their views on the other editors (100k, 1M, I don't know) who have not voted for it. Nor can this novel guideline be applied retrospectively. If it applies at all it should only apply to articles written after 20 May 2009.

    In addition the guideline conspicuously fails to define plagiarism. The nearest thing to a definition is "Wikipedia needs proper attribution of those sources to ensure that readers are aware of who actually authored the material they see and read." But

    1. Readers of Wikipedia are perfectly aware that the concept of "who actually authored the material" does not really apply to Wikipedia articles. All they know, or care about, is that at least one editor thought that the words they are reading should be inserted and either no editors thought they should be deleted or that if they did others disagreed and the current version has the words in.
    2. Most edits are done anonymously so in most cases readers have no idea who actually authored the material they see and read. NBeale (talk) 14:16, 23 May 2009 (UTC)
    This was an RfC, publicized as all other policy and guideline RfCs at Wikipedia:Requests for comment/Policies and open for participation for anyone who cared to contribute. In terms of retrospective application, this isn't the first Wikipedia process page to address (and forbid) plagiarism. Since 2007, it has been briefly covered at copyright problems. The guideline used to conspicuously define plagiarism, but a few days ago another contributor to this page removed that definition. --Moonriddengirl (talk) 14:36, 23 May 2009 (UTC)
    That is all very well, but the fact is that 99.99% of Wikipedia Editors will never have heard of this matter until a handful of zealots come around and interfere with articles that they have written, trying to apply this guideline retrospectively. It really will not do, with an RfC vote of 26:6. NBeale (talk) 05:54, 24 May 2009 (UTC)
    What problems do you actually foresee arising from this guideline, and attempts to enforce it? Rd232 talk 10:06, 24 May 2009 (UTC)
    That's the way Wikipedia works. Those with an interest in policy and guideline are expected to keep up with what's going on themselves. And, again, plagiarism has long been forbidden. Rd232, the problems arising from this guideline and attempts to enforce it, I would guess, are viewable at User talk:NBeale and at Talk:Marie Galway. --Moonriddengirl (talk) 11:06, 24 May 2009 (UTC)
    Most of those problems are violations of Wikipedia:Copyrights. The ban on copy-pasting from non-free sources wasn't invented by this proposal. I added a section that summarizes the already existing policies about the insertion of text into Wikipedia. I'm not sure about NBeale, but in my browser the first thing under the save page button is the sentence: "Do not copy text from other websites without a GFDL-compatible license." Maybe there should be a checkbox, even better, a captcha where you would have to type in variations of that sentence before you can insert something. I'm joking, obviously, but people seem to be immune to such disclaimers. CopoCop (talk) 20:48, 25 May 2009 (UTC)
    One should note, in addition to the 26 assenting editors at the RFC, the several dozen editors who have contributed in the discussion and editing of the proposed guideline over the last 10 months. These include many experienced editors (two of whom were later elected as arbitrators), hardly a tiny group of zealots acting in secret. I think it's fair to say that the guideline represents a consensus description of the standards currently expected by the community. This guideline was not started as an effort to impose novel practice from without, but simply as a means to codify existing community expectations, and to do so in a centralized place that would act as a reference beyond that single sentence at WP:COPY. Franamax (talk) 23:20, 25 May 2009 (UTC)

    the scylla and charybdis

    Considering the passion some editors seem to have developed for enforcing this new guideline, we perhaps ought to be aware of the danger of inhibiting contributions if editors are trapped between this guideline and the contradictory imperative to avoid original research. Editors are often rebuked for making statements that aren't supported by the source, and if they're also wary of re-phrasing sentences from the source while keeping the same meaning, they might just decide not to say anything at all. While it has to be a good thing to publicise the need to avoid legal issues, and we always want to credit sources, perhaps we ought to be wary of making plagiarism as big a deal as it is in sections of academia. An alternative solution might be to loosen up on OR, but I don't personally think that's in the best interests of the encyclopaedia. FeydHuxtable (talk) 23:31, 20 May 2009 (UTC)

    That misconception is a reason to clarify the plagiarism guideline. It's easy to restate concepts in their own words without crossing the line into original research. Editors who have difficulty doing that are usually inexperienced as writers, and would do well to seek the assistance of more experienced contributors. Alternatively, passages that are difficult to paraphrase can simply be quoted--but the use of quotation marks is very important. DurovaCharge! 00:16, 21 May 2009 (UTC)
    Well, actually, it isn't easy, according to Clarke (cited above): "If such recitations stray too far from the words used by prior theorists, then the author of the new work would be subject to accusations of misrepresentation or at least inaccuracy. Hence it is very challenging to 'use one's own words' while being faithful to the sources. Paraphrasing and generic attributions are therefore tolerated. The context in any case implies that little or no originality is being claimed." [10]
    While your count may be correct, I actually had the impression the subsequent discussion might have affected some people's view on this. I think it would have been more appropriate to check for consensus again. Jayen466 15:32, 21 May 2009 (UTC)
    "It's easy to restate..." - sure, provided that the restatement remains frozen in time. But I also have to keep in mind other editors who might want to improve my less-than-perfect writing and, inadvertently, change the original author's statement. They may revert it to this author's brilliant prose (the least evil) or, worse, remove or alter the whole point of citing the source. So for the sake of future editors, if the source is not readily available online, citing as close to the original source as possible should be preferred. NVO (talk) 07:07, 25 May 2009 (UTC)
    Then the edit history will reflect those subsequent contributors' actions, and any resulting plagiarism would be theirs rather than yours. Your comment outlines one reason why Wikipedia's good articles and featured articles have moved toward heavy sentence-by-sentence citations, yet that's more an issue for WP:V than here. DurovaCharge! 07:27, 25 May 2009 (UTC)
    I recently noticed 2-3 cite refs per sentence being used in start-class texts; one day it will be each word. The problem is, good-faith copyeditors don't have the sources, nor the time to check them. Even when everyone acts in good faith, there seems to be no safeguard against a collaborative fubar, short of a full FA review. NVO (talk) 11:06, 25 May 2009 (UTC)