Talk:RIS (file format)

Expansion

If someone would be willing to flesh this page out a bit to describe what the abbreviations mean, it would be much appreciated.

Thanks! -- Preceding unsigned comment added by 76.182.117.65 (talk) 03:27, 7 May 2008 (UTC)

RIS?

RIS? (213.86.160.200 (talk) 17:10, 24 November 2009 (UTC))

Removal of list of abbreviations

Please don't remove the list of RIS tag abbreviations from this page; it is a valuable resource that is difficult to find elsewhere. The tags are used for import and export between various reference software packages. For instance, the reference manager Mendeley does not generate and export .RIS files with the correct tags. Using this page as a reference, I can batch-edit the RIS file tags and then successfully import the file into another application, saving hours of manual editing. Thanks -- Preceding unsigned comment added by 132.156.149.150 (talk) 17:58, 14 November 2011 (UTC)
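
The batch edit of RIS tags described above could be scripted roughly as follows. This is only a minimal sketch: it assumes each RIS line has the usual "two-character tag, two spaces, hyphen, value" layout, and the function name `remap_tags` and the tag mapping (T1 to TI, A1 to AU) are hypothetical illustrations, not a mapping that any particular program is known to need.

```python
import re

# Assumed RIS line layout: a two-character tag, two spaces, a hyphen, then the value,
# e.g. "TY  - JOUR". Lines that do not match (continuation lines) pass through unchanged.
TAG_LINE = re.compile(r"^([A-Z][A-Z0-9])  -(.*)$")


def remap_tags(ris_text, mapping):
    """Return ris_text with every tag found in `mapping` replaced by its new tag."""
    out = []
    for line in ris_text.splitlines():
        match = TAG_LINE.match(line)
        if match and match.group(1) in mapping:
            line = f"{mapping[match.group(1)]}  -{match.group(2)}"
        out.append(line)
    return "\n".join(out)


# Hypothetical sample record and mapping, for illustration only.
sample = "TY  - JOUR\nT1  - An example title\nA1  - Doe, Jane\nER  - "
print(remap_tags(sample, {"T1": "TI", "A1": "AU"}))
```

Only the tag at the start of a line is rewritten, so a value that happens to contain text like "T1" is left alone.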

Conflict between two different sets of RIS tags

Warning of plan to make major edit: I intend to edit this page to provide an alternate list of the RIS tags. There appears to be a conflict between the version currently displayed (which includes C1-C8 as custom fields) vs. the list available from the Reference Manager user manual (which includes U1-U5 for custom fields instead), and also available through the series of pages on the Reference Manager website[1]. I believe this alternate list is more authoritative and more widely used than the one that appears on the current Wikipedia page. And indeed, it looks like the article originally used some mangled version of this tag list (the one that includes U1-U5) back in 2008[2].

The best approach, I think, will be to include both lists of tags and to identify one list as more widely used, but to keep both lists there until further clarification can be made. If there is anyone following this article, then please let me know your thoughts on my plan. Otherwise, I will probably proceed with the major edit sometime over the next 7 days or so. PKiff (talk) 19:13, 10 September 2012 (UTC)

WP:NOTMANUAL. There's no reason to try to fully document the format. Thomson-Reuters established and maintains the specification, and so the Reference Manager website should probably be the authoritative reference, but note that (i) they do change the spec and (ii) you're looking at an out-of-date version. The spec is now hosted at [1], and this describes the C[1-8] custom fields. There appears to be some precedent for using the "custom" fields in a semi-common way (for journal articles, T-R uses C2 for PMCID, for example) and for reserving the U[1-?] fields for "user-defined" use that programs don't try to enforce. But, in any case, I would not make a radical change based on out-of-date info. --Karnesky (talk) 00:28, 12 September 2012 (UTC)
I agree there is no reason to fully document the format, but I think the current listing of the format is misleading, and is not based on more up-to-date info. I am aware of the other version of the RIS spec that you cite, but I am not convinced that Thomson-Reuters owns the RIS format spec, nor am I convinced that they are actively maintaining it. I think the specification dates back to before Thomson-Reuters was in the bibliographic software game. A product called simply "Reference Manager" was created by a company named Research Information Systems (RIS), which later morphed into the Institute for Scientific Information. We're talking about the early 1990s here, the DOS and Windows 3.1 era, possibly earlier than that. Many other bibliographic software packages started using the RIS format as a standard in the 1990s, and at that time there were no C[1-8] fields. I don't think the authoritative version of the spec has substantially changed since that time, despite the existence of this alternate version of the spec that T-R has made available. What I am referencing as "authoritative" are also documents supposedly from T-R and the Reference Manager website. Reference Manager 12, which is the most recent version of Reference Manager, and which is currently available through T-R, has a user manual in PDF format, available from the same refman website where the C[1-8] spec resides.[1],[2] The Reference Manager user manual lists the RIS format with U[1-5], and makes no mention of C[1-8]. And if you actually export an RIS file from Reference Manager 12, then your custom fields are tagged U[1-5]. My theory is that the C[1-8] fields come from how Endnote tags its custom fields when it exports into the RIS format.
But both Endnote and Reference Manager have been owned by the same company (T-R, and before that someone else) for some time now, and so there are actually two competing, current specs that differ in whether U[1-5] or C[1-8] is used for custom fields, and both have external references to support their validity. I might be wrong about all this: I admit to finding the whole matter rather frustrating and confusing. But if I'm right, then leaving the Wikipedia entry with just the Endnote version of custom fields as C[1-8] leaves an inaccurate list of tags, which may actually create errors for someone attempting to figure out their export/import tags in RIS format. PKiff (talk) 20:34, 17 September 2012 (UTC)
You are, more or less, correct about the origins of RIS, but not the current status. Reference Manager, the ISI, and most associated intellectual property were acquired by Thomson Scientific in 1992. Thomson merged with Reuters in 2008. The "authoritative" specification has always been linked to from [2] (the Reference Manager site!), and this link changed semi-recently from the older pages you link to, to a zip file containing a PDF and an MS Excel description of the format. These descriptions are, more or less, the current way that RIS is implemented in the maintained Thomson Reuters products (and surely these can be considered the current reference implementations). The most recent Reference Manager manual is dated 2008, which is before the revised spec that is on the website. Thomson Reuters, while selling and supporting Reference Manager, is not actively developing it. Other T-R formats, such as Endnote XML, are even worse from an ever-changing implementation/specification standpoint, even though they exist primarily in only a single product line. The C fields have been adopted by Zotero, Bibutils (which denotes the U fields as 'Deprecated?', along with A1,BT,CP,CT,DI,ED,ID,JA,J1,JF,JO,M2,N2,T1,VO,Y1), and other reference managers. Yes, they're new...but I'd say there are more entities than EndNote that use them. Of course this gets awfully close to WP:OR, and I don't think the article would benefit from a description of the U and C tags. But you may want to have your Drupal plugin import both, but export just the C tags in the future. --Karnesky (talk) 13:18, 18 September 2012 (UTC)
OK, thanks for taking the time to explain all this in more detail and to do some follow-up research. While I'm still not entirely convinced about the status of where one should look to find the "authoritative" version of the RIS format, or whether or not T-R can actually make "authoritative" changes to the format, I find your points about these fields having been adopted by Zotero and Bibutils persuasive. And you are right that at this point the debate is becoming WP:OR and probably doesn't belong in Wikipedia. I'll leave the article as it stands. Cheers. PKiff (talk) 04:58, 19 September 2012 (UTC)
This is somewhat misleading, as Zotero will use these tags, but the developers recognize that some of them are not part of the standard. For example, see the comments on JO here. I'd rather see some comment in this entry on how the standard has been expanded in a non-standard way. -- Eponymous-Archon (talk) 13:31, 15 August 2016 (UTC)

Versions of the specification

It might be useful to note that the RIS specification was changed near the end of 2011, from a version that specified a single set of tags[3] to a (partly backwards-incompatible) version that specifies a different set of tags for each article type[4]. -- Preceding unsigned comment added by Hubpedia (talk • contribs) 15:32, 15 July 2013 (UTC)

Issues with the 2011 version

Source: refman.com on archive.org

  • The RIS-type "DATA" occurs twice: once on sheet "Computer Program" together with RIS-type "COMP", and once alone on sheet "Data File". Some of the properties are identical, some exist on only one of the two sheets, and some clash (that is, the same propertyCode has a different propertyLabel). To solve this, I would use the union of both sheets, and let sheet "Data File" win the clashes.
  • On sheet "Newspaper Article", two properties occur more than once.
    • "AN - ":"Accession Number" occurs twice for RIS-type "NEWS"
    • "SP - ":"Pages" occurs twice for RIS-type "NEWS"
  • 85 PropertyLabels start (erroneously) with a space.
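
The workaround proposed above for the duplicated "DATA" type (take the union of both sheets and let "Data File" win any clash), together with stripping the erroneous leading spaces, might look like the sketch below. The function name `merge_sheets` and all codes and labels in the sample dictionaries are made-up placeholders, not values copied from the actual spreadsheet.

```python
def merge_sheets(computer_program, data_file):
    """Union of two {propertyCode: propertyLabel} dicts; `data_file` wins clashes.

    Leading whitespace is stripped from every label along the way.
    """
    merged = {code: label.lstrip() for code, label in computer_program.items()}
    # Entries from the "Data File" sheet are applied last, so they overwrite clashes.
    merged.update({code: label.lstrip() for code, label in data_file.items()})
    return merged


# Hypothetical rows for illustration only (not taken from the real sheets).
computer_program = {"TI": "Title", "A1": " Programmer", "C1": "Computer"}
data_file = {"TI": "Title", "A1": "Investigators", "DB": " Name of Database"}

for code, label in sorted(merge_sheets(computer_program, data_file).items()):
    print(code, "=", label)
```

Applying the "Data File" entries last is what makes that sheet win; swapping the two `update` steps would give the other sheet priority instead.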

Some statistics on the 2011 version

Source: refman.com on archive.org

  • 2142 distinct combinations of RIS-Types and RIS-Properties
  • 15 PropertyCodes exist for ALL RIS-types (The PropertyLabel of those PropertyCodes is not necessarily the same for all RIS-types)
    • AB, AD, DO, DP, KW, L1, L4, LA, N1, PY, RN, TA, TI, TT, UR
  • 3 PropertyCodes exist for only 1 RIS-type
    • "C8" only for "GEN",
    • "CT" only for "FIGURE",
    • "SV" only for "CHAP"
  • There are 275 distinct PropertyLabels (ignoring leading white space)
  • 3 PropertyCodes exist for all RIS-types but 1
    • "AN" not specified for "UNPB",
    • "CA" not for "FIGURE",
    • "LB" not for "NEWS"
  • There are 8 PropertyLabels with a separator (for some RIS-types)
    • "A4":"Department/Division",
    • "C3":"Size/Length",
    • "C4":"Attorney/Agent",
    • "C5":"Format/Length",
    • "PB":"Library/Archive",
    • "SN":"ISBN/ISSN",
    • "SN":"ISSN/ISBN",
    • "VL":"Volume/Storage Container"

EdgarSchouten (talk) 15:15, 6 December 2019 (UTC)
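
Counts like the ones listed above could be reproduced from a table of RIS-types and their properties along the following lines. This is a sketch under assumptions: the function name `code_coverage`, the nested-dictionary layout, and the three-type toy table are all hypothetical, and the toy table is far smaller than the real spreadsheet, so its numbers are not the ones quoted above.

```python
from collections import defaultdict


def code_coverage(spec):
    """Summarise a {ris_type: {property_code: property_label}} table."""
    types_by_code = defaultdict(set)
    labels = set()
    for ris_type, props in spec.items():
        for code, label in props.items():
            types_by_code[code].add(ris_type)
            labels.add(label.lstrip())  # ignore leading white space
    n_types = len(spec)
    return {
        "combinations": sum(len(props) for props in spec.values()),
        "in_all_types": sorted(c for c, t in types_by_code.items() if len(t) == n_types),
        "in_one_type": sorted(c for c, t in types_by_code.items() if len(t) == 1),
        "in_all_but_one": sorted(c for c, t in types_by_code.items() if len(t) == n_types - 1),
        "distinct_labels": len(labels),
    }


# Hypothetical toy table for illustration; labels are placeholders.
toy_spec = {
    "GEN": {"TI": "Title", "AN": "Accession Number", "C8": "Custom 8"},
    "NEWS": {"TI": "Title", "AN": " Accession Number"},
    "UNPB": {"TI": "Title of Work"},
}
print(code_coverage(toy_spec))
```

With only two RIS-types the "only one type" and "all but one" groups would coincide, which is why the toy table uses three.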

Character Encoding

It should be explained which character encoding is assumed for RIS. Can some programs automatically detect that encoding so that they work e.g. with UTF-8? Or should one always use ISO-8859-1, as some web pages seem to suggest?

RPaschotta (talk) 14:09, 2 November 2023 (UTC)
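
One way a program might "automatically detect" the encoding asked about above is to try UTF-8 first and fall back to ISO-8859-1. This is a heuristic sketch only, not something taken from any RIS specification or known to be what a particular reference manager does; the function name `decode_ris` is hypothetical.

```python
def decode_ris(raw):
    """Decode RIS file bytes; return (text, name of the encoding that was used).

    Heuristic: valid UTF-8 (with or without a byte order mark) is taken as UTF-8;
    anything else is read as ISO-8859-1, which can decode any byte sequence.
    """
    try:
        # "utf-8-sig" also accepts plain UTF-8 and drops a leading BOM if present.
        return raw.decode("utf-8-sig"), "utf-8"
    except UnicodeDecodeError:
        return raw.decode("iso-8859-1"), "iso-8859-1"


line = "TI  - Caf\u00e9 culture"
print(decode_ris(line.encode("utf-8")))
print(decode_ris(line.encode("iso-8859-1")))
```

The fallback never fails, so it cannot tell ISO-8859-1 apart from other single-byte encodings such as Windows-1252; a file in one of those would be decoded without error but possibly with wrong characters.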
