Wikipedia:WikiProject CRUK/July & August Report

Diagram of three different types of blood cell - One of the CRUK images now on Commons
Diagram of what is in blood
Diagram showing stage 1 Hodgkin's lymphoma
Diagram showing the part of the stomach removed for a Bilroth 1
Diagram showing where melanoma is most likely to develop

Cancer Research UK ‘Wikipedian in Residence’ project - Second bimonthly update – for July and August 2014

Wellcome People Award no. WT103116AIA. September 8, 2014

Introduction

Cancer Research UK's Wellcome-funded Wikipedian in Residence, John Byrne (User:Wiki CRUK John in this role), started on 1 May 2014 on a four-day-a-week contract due to run until mid-December. The overall aim of the project is to forge links and dialogue between CRUK, Wikipedia, and the wider cancer research community, and so begin to improve the cancer-related content on Wikipedia.

This report covers July and August 2014, taking the project just past its midway point.

The project focuses on four areas:

1. Research: a short project to understand a) how people use Wikipedia when searching for cancer information online, and b) how they rate the information on Wikipedia before and after article improvements.
2. Content: forging links between CRUK experts, clinicians and Wikipedians, to improve Wikipedia content. We will initially focus on improving four cancer type articles – Pancreatic cancer, Oesophageal cancer, Lung cancer, and Brain tumour – since these are the four cancers of ‘unmet need’ identified in the CRUK research strategy.
3. Creative Commons: investigating the possibility of releasing CRUK content to Wikimedia Commons under a Creative Commons license.
4. Training: To increase awareness of how Wikipedia works, and how to edit and review articles, across CRUK staff, researchers and more widely.

A further objective is to disseminate lessons learnt during the project. Progress so far:

Objective 1 – Research

We have been refining our ideas as to both methodology and the technical practicalities. Two pieces of research are planned, one quantitative and the other qualitative. The qualitative study will take a small group of subjects who are asked to research a cancer topic on the internet. They will be observed and tracked as they do so, and then interviewed by the researcher. We expect this will give new insights into how Wikipedia fits into such a search, suggesting answers to such questions as: how many sites do people look at? In what order? How long do they spend on Wikipedia and other pages? While such behaviours have been the subject of a good deal of general research, very little of it looks specifically at Wikipedia.

The quantitative study will use a much larger group supplied by an agency, and be conducted online. Again the subjects will be asked to research cancer questions, but the pages will be served to them. After the research they will be asked to complete a questionnaire about their experience of using the page.

We are continuing to work with Professor John Powell from the Oxford Internet Institute, and Dr Henry Potts from the UCL health informatics centre. A detailed protocol was produced in early September.

Objective 2 - Content

The process decided upon for article updates (as detailed in the last report) is:

I. John and CRUK info staff go through the page together, making notes as to areas of improvement, and highlighting key references/source material.
II. The output of this session is then posted into the WikiProject:Medicine talk page on Wikipedia.
III. WP:Med volunteer experts then work on the article in the light of this new material, with John filling in where necessary.
IV. We will then send the updated page to a clinical expert reviewer for input.
V. Repeat III

All target articles have now completed stages I and II, but none has completed stage III, which is rather disappointing by this mid-way point in the project.

Many of the key Wikipedia medical editors helping the project attended Wikimania 2014, which intensified the usual summer lull. Even allowing for this, the pace of progress has generally been slower than hoped for, especially as the initial CRUK review confirmed that “Brain tumor” was in a significantly worse state than the other main target articles. This is partly because the topic covers a group of some 200 different types, some malignant and some not, which makes it tricky to decide what counts as a useful generalization.

Lung cancer is already a featured article and has been extensively revised and updated, though some further work is needed. The Oesophageal cancer and Pancreatic cancer pages have progressed within stage III, but less than hoped for by this point. Many sections are essentially complete, but others are not.

Meanwhile, there has been an unexpected addition: the article on Endometrial cancer. Independently, a Wikipedian in the US, User:Keilana (Emily), began working on this in May, and has given it a thorough rewriting and updating (339 edits to date). It has received a CRUK review from the CancerHelp team, and is now a candidate for Featured Article, which we currently expect to be successful. Emily attended the pre-Wikimania event at CRUK, and has also been one of the editors placing CRUK images on articles.

Objective 3 – Creative Commons

A group of 390 CRUK images were uploaded on July 30th (see the Commons category), the actual upload kindly being done by User:Fae, a Wikimedia volunteer. These represent all the available diagrams created for CRUK’s patient-directed online cancer content managed by the CancerHelp team. The Wikipedia medical editors had been enthusiastic when told they were in the pipeline, and once they were uploaded moved quickly to place them in articles. Within a month 176 were already used in Wikipedia articles, 14 twice, which is a phenomenally quick uptake. They are SVG format files, which means that where appropriate the text labels in the images can easily be translated for use in other languages. That hasn’t happened yet, but in time it will.

John has done relatively little placement in articles, to avoid COI issues, but has done a lot of work “tagging” or placing the images in all their correct categories on Wikimedia Commons, which is necessary for them to be used to the maximum. The typical image from this group should ideally be in about 5-6 categories, so around 2,000-2,400 tagging operations are needed. Well over half of these have been done, concentrating initially on the largest tagging groups. Completing the task will be more time-consuming, as the tags needed become more obscure and apply to fewer images.
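The tagging estimate can be checked with a short calculation. This is only an illustrative sketch: the figures (390 images, 5 to 6 categories each) come from this report, while the function and variable names are invented for the example.

```python
def tagging_operations(image_count: int, categories_per_image: int) -> int:
    """Return the number of category tags needed if every image
    is placed in the same number of categories."""
    return image_count * categories_per_image


# Figures taken from this report; the names are illustrative only.
IMAGE_COUNT = 390

low_estimate = tagging_operations(IMAGE_COUNT, 5)
high_estimate = tagging_operations(IMAGE_COUNT, 6)

print(f"Estimated tagging operations: {low_estimate} to {high_estimate}")
```

The result, 1,950 to 2,340 operations, is consistent with the rounded range quoted above.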

The usage in articles should increase further over time, both on English and other Wikipedias, and in other contexts beyond the Wikimedia projects (a contact at NHS England expressed interest when she heard of the release, as the NHS Choices site has very few such images).

The viewing figures on Wikipedia for these articles will be high, as the images are now on most major cancer articles. Monthly stats should be available (with a certain delay) through a “BaGLAMa” report, for which the application has been made. However, the last set to run was for June 2014, with July in production as at September 4th.

The CRUK Governance Panel, set up to confirm that other images are suitable for release, has had its first meeting. Only certain types of unproblematic images will be involved; there are many reasons why other types are not available for release, ranging from model/patient confidentiality, uncertainty over copyright status, and agreements with journals who have published research, to showing the previous CRUK brand identity, a fatal flaw as far as the Brand team are concerned.

The idea of releasing the images is to allow use not just on Wikipedia, but anywhere else. One advantage of this is that it will save CancerHelp from having to respond to the many requests for permission to reuse images that it already gets.

A further batch of images of different types has been identified and should be uploaded in the next few weeks, with other types of images being looked at. The implications of the existing model release forms for some types of video material are being considered. Changes may be made to these for works created in the future.

The new open licenses are Creative Commons Attribution-Share Alike 4.0 International, and the arrangements by which they are formally placed and authenticated to Wikimedia are designed to remain in place after the end of the project. There is a dedicated Commons user account User:Cancer_Research_UK_uploader, and a letter from CRUK authorizes the licenses on images it uploads (and the initial batch by Fae). During the project the user account is being operated by John, but after he leaves it will be handed on to Henry Scowcroft, or someone else in CRUK.

Unfortunately most of the cutting-edge false-colour microscopy images produced at the London Research Institute are covered by a marketing arrangement (with the Wellcome Library) and so are not available, but we hope to be given samples outside the agreement for release on open licenses.

Objective 4 – Training

Further presentations have been made to CRUK staff at Angel, but the only full workshop planned for this summer period was postponed after a number of people had to drop out. There will be at least three workshop sessions in September, two at Angel and the first full session at CRUK’s London Research Institute, where unfortunately the only training room with computers takes just 6 trainees. The programme of shorter presentations to staff at CRUK’s Angel HQ is now complete.

Dissemination

Wikimania 2014, the annual international conference, which this year was held in London over three days at the Barbican Centre, was attended by over 2,000 people from around the world. These included a number of the regular medical editors, and the conference provided an important opportunity for John, who attended all three days, to meet and enthuse them.

The day before the main conference, CRUK hosted a meeting with medical editors of Wikipedia which was a great success (see the event page), with Wikipedian attendees from 5 continents and CRUK people from several departments.

The programme concentrated on using accessible language, with a presentation by Henry, and a discussion of some specific issues CRUK staff had identified with some of the headings in the recommended layout and contents for articles on medical conditions at Wikipedia's Manual of Style for medical articles. The need for changes was readily agreed by those present, and is now being discussed by the community online – changes look certain to happen. At the meeting we also explained the different types of online information available from CRUK.

In the main conference programme Henry and John co-presented with others in two talks, with John presenting a further one. These were: “Medical information online; Wikipedia's place in the ecosystem”, with Dr Henry Potts and User:Doc James, a Canadian Wikipedian doctor; “What does a Wikipedian in Residence in the scientific sector do?”, with the Wikipedian in Residence for the Cochrane Collaboration in the US; and “Face to face editing training: is it worth the effort?” (John with two others).

There was a large group of mainly medical editors discussing issues together, and sometimes editing, for much of the conference. There were also a number of other mentions of the CRUK project by other presenters, all favourable. The project has strong supporters in the medical editing community, and has attracted plenty of attention more widely.

A CRUK press release on the project rather got pushed out by the monkey selfie story, and a round of interviews with Jimmy Wales.

At the start of September the first issue of a Wikipedia Newsletter was sent to a list of 73 CRUK staff contacts at Angel, with a different “edition” going to 16 people at the London Research Institute. This updates progress on the project, and has links to talk page discussions where CRUK expertise would be especially valuable.

Summary

  • We have released a substantial body of content onto Wikimedia Commons, which is being extremely well used.
  • We have firmed up ideas for research.
  • We have considerably improved several articles.