Template talk:GNF Protein box/Archive 9


minor updates

{{edit protected}} Can an admin update this page to the latest sandbox version? Only updates are to use en dash in the genome location and to use the new BioGPS URL. Thanks... AndrewGNF (talk) 17:22, 8 August 2011 (UTC)

Done. --Closedmouth (talk) 05:46, 20 August 2011 (UTC)

IDs

I would like to suggest a change to the ID section of the current box: along with references such as GeneCard, add a reference to the Atlas of Genetics and Cytogenetics in Oncology and Haematology database for the gene. Example: MLL and MLL. Disclaimer: I met the owner of the Atlas some time ago and started in early September to help him set up a fundraising system. Though I hold a diploma that was somewhat about genetics, I must say the gene pages, as well as this box and most of the discussion on this talk page, look like Chinese to me. As such, I am not able to really establish whether it would be highly relevant or only mildly relevant at the moment and given our policies. But it seems worth considering to me. Another link [1]. I will add a bit to the English page if I can.

Any feedback is welcome :)

Whilst I am at it.... List of human genes... all the links to GeneCard are broken :(

Anthere (talk)

Hey there, sorry it took some time to get back to you. I've fixed the GeneCard links on List of human genes - thanks for pointing that out. Regarding the Atlas: I checked out a couple of pages on it and it seems that the majority of pages don't include significantly more information than the article/templates on Wikipedia. The Atlas pages that do have more extensive annotations may be good sources for enriching the corresponding articles on Wikipedia, but I'm not sure linking to the Atlas in every template would be useful at this point. Finally, if you have any suggestions on how to make the template easier to understand, that'd be great :) You may want to look at the Gene Wiki project portal and the discussions on there as well. Best, Pleiotrope (talk) 22:26, 28 September 2011 (UTC)
Also, I may have simply misunderstood how to use the Atlas. If there's something I missed that would be specifically important, please let me know. Thanks, Pleiotrope (talk) 22:30, 28 September 2011 (UTC)

Making the template easier to understand.... ROFL. I agree, it is awful to understand; but then, I tend to have trouble understanding any template generally :)))) I try to avoid editing any infobox as long as I can avoid doing so. I am sorry, I really do not think I can help you on this one :(

Yeah, I did a sampling of the gene database as well, and I agree that linking it by default in every template might defeat the goal and not help the reader as many gene pages do not yet bring much information. Maybe later, but not yet. What I will do is to add a couple of links to 3-4 genes they indicated had cool content, then show them how to do that themselves on other gene pages *when* their own information is rich enough to really bring a benefit.

Thanks Pleiotrope.

Anthere (talk) 14:12, 29 September 2011 (UTC)

Collapsible table and Allen Brain Atlas

Hello. I am new to this. Sorry if I am missing points.

1. Currently, only "Available structures" and "Gene Ontology" are shown as collapsible tables. Is it possible to make "Identifiers", "RNA expression pattern", and "Orthologs" collapsible as well? I want to show more than one gene on the same page (in our own gene wiki effort in Japanese), and the current version is a bit long vertically.

2. The Allen Brain Atlas is a collection of in situ hybridization data in brain tissue. I wonder if it is possible to include a link to the data in "RNA expression pattern".

Thanks. Yasunori 16:32, 8 November 2011 (JST)

I agree that the "RNA expression pattern" section probably should be made collapsible. The identifiers section is very short and contains critical information that should not be collapsed in my opinion. The orthologs section is a bit longer but again contains critical information. The Allen Brain Atlas is interesting, but it may be a bit too specialized since it only deals with one tissue, brain. RNA expression data, while valuable, may in some cases mislead since there is often a poor correlation between RNA and protein expression. Hence a link to The Human Protein Atlas would be worth considering. Boghog (talk) 21:26, 28 January 2012 (UTC)

PDB links

Currently PDB links are based on the PDB accession codes and also appear for the most part restricted to human proteins. Furthermore these links are not updated very often. A better solution would be to query the Protein Data Bank using the UniProt accession numbers for the human protein. Both the RCSB PDB and the PDBe support such queries (see User_talk:A2-33 for more details). An even better solution would be to use the HomoloGene accession number so that the structures of orthologs from other species would also be returned, but unfortunately the external databases currently do not support such queries.

I have created {{Homologene2uniprot}}, which returns a list of corresponding UniProt codes. These codes in turn can be used to query the external databases (see for example the protein box in P53). I think it would be a good idea to replace the static lists of PDB links in the GNF Protein box with dynamic queries to the RCSB PDB and PDBe. A fallback procedure could be implemented that would return links in the following order:

  1. If a Homologene parameter is defined in the template, then create links to the RCSB PDB and PDBe based on the UniProt IDs returned from the {{Homologene2uniprot}} template
  2. If the Homologene parameter is not defined but the human and/or mouse UniProt parameters are, use these to create the RCSB PDB and PDBe query links
  3. If neither the Homologene nor UniProt parameters are defined, use the current PDB parameter to create the links

Does this sound reasonable? If so, I will try to prototype this in the sandbox. Boghog (talk) 21:49, 28 January 2012 (UTC)
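The fallback order proposed above can be sketched in Python roughly as follows. This is only an illustration and not the template's actual wikicode: the function name, the parameter keys `Hs_Uniprot` and `Mm_Uniprot`, the in-memory dictionary standing in for {{Homologene2uniprot}}, and the choice to fall through to the next step when the mapping has no entry are all assumptions.

```python
def pick_pdb_link_source(params, homologene_to_uniprot):
    """Return (source, ids) following the three-step fallback order.

    params: dict of template parameters (key names are hypothetical).
    homologene_to_uniprot: dict standing in for {{Homologene2uniprot}}.
    """
    # Step 1: Homologene parameter defined -> use the mapped UniProt IDs.
    homologene = params.get("Homologene")
    if homologene:
        mapped = homologene_to_uniprot.get(homologene, [])
        if mapped:  # assumption: an empty mapping falls through to step 2
            return ("homologene", list(mapped))

    # Step 2: human and/or mouse UniProt parameters defined.
    uniprot_ids = [params[k] for k in ("Hs_Uniprot", "Mm_Uniprot") if params.get(k)]
    if uniprot_ids:
        return ("uniprot", uniprot_ids)

    # Step 3: fall back to the static PDB parameter (a list of PDB codes).
    return ("pdb", list(params.get("PDB", [])))
```

The returned source label would then decide whether query links or the old static links get rendered; how that rendering is done in the template is left open here.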

This is a good idea in my opinion. It will ensure that an up-to-date list of structures is available from these pages. A2-25 (talk) 10:01, 30 January 2012 (UTC)
Thanks again for providing the complete Homologene to UniProt ID mappings. I have updated both the {{Homologene2uniprot}} and {{Homologene2PDBe}} templates using the complete mappings. I have also implemented the new Homologene/UniProt based PDB (including the fall back procedure proposed above) in the {{GNF Protein box/sandbox}}. Examples of the new links can be seen here (right hand side examples). Does this look OK? Boghog (talk) 18:19, 5 February 2012 (UTC)
Thank you very much; the new template looks great. It may be better to show the PDB links instead of a "show/hide" box. I see that when there is a structure image, sometimes there are links based on the PDB ID code, and on some pages there are no links. It would be better to have consistent behavior. A2-25 (talk)
It would also help to know pages that do not have PDB images. I can make these images available to update all pages where we can find PDB structures. A2-25 (talk) —Preceding undated comment added 20:21, 5 February 2012 (UTC).
Good suggestion about not collapsing the PDB links. The original version of the GNF Protein box did not collapse these links. The "show/hide" feature was added later because many of these lists were very long. I agree that with the new compact PDB query links, it is no longer necessary to collapse this section.
The reason why there is unfortunately no consistency in what is stored in the image_source parameter is that these were added in several waves of bot edits. The first set of graphics were downloaded from the RCSB PDB. Many but not all of these figures were replaced with much higher quality PyMOL ray traced images (see discussion). Finally, some of these figures and captions were added manually by individual editors who have a wide range of editing styles. Perhaps we could have another bot run to clean up the image_source parameters. I need to think more about this. Boghog (talk) 06:57, 6 February 2012 (UTC)
It's a huge improvement, giving us an up-to-date PDB search rather than a (limited) list of anonymous ID codes. Thanks for all your work on this. I guess the default of having the structure list hidden is mostly unneeded now that there won't be a long ID code list to elongate the page? A2-33 (talk) 22:40, 5 February 2012 (UTC)
Thanks and thank you for providing the Homologene to UniProt mappings. Assembling the mapping was the most difficult part. As stated above, I think we no longer need to collapse the PDB links section. Boghog (talk) 06:57, 6 February 2012 (UTC)

Automated polishing of PDB images

Many of the protein structure images using this template seem like they could benefit from some automated polishing. For example, a structural image like the one for MMP9 could be made transparent by downloading the associated PDB file (http://www.pdb.org/pdb/download/downloadFile.do?fileFormat=pdb&compression=NO&structureId=1ITV) and following something along the lines of the instructions here. Although it would take a while for the computer running the script, it may also be nice to ray trace the images of each protein. Emw2012 (talk) 19:06, 23 September 2009 (UTC)

Great idea. I've added it to our running ideas list. If someone wants to volunteer to take this project on, speak up! Cheers, AndrewGNF (talk) 20:57, 23 September 2009 (UTC)
I may be able to help here. Could you point me to the code that would need to be modified to add this feature? Emw2012 (talk) 21:38, 26 September 2009 (UTC)
Great! Actually, I think the easiest thing would be for you to create a web service that takes as input a PDB ID, and outputs an image. Once that's complete, then we can decide what to do with it. We can wrap it in a bot to automatically replace a bunch of the default images that we uploaded (leaving alone the images that were uploaded or changed by human users). Or we can create an on-demand web interface, similar to Diberri's template tool. Or we can do a mass upload over to WikiCommons. Anyway, creating a standalone tool gives us greater flexibility in terms of how we use it later. Plus, I'm trying to avoid having one centralized monolith of code, since so much of the overall effort comes from small contributions from many people. When/if you need more feedback, I suggest we take followup discussion to the Gene Wiki discussion page. (Oh, and if you don't have access to a web server, we can certainly host your code, or you might look into Google's App Engine.) Cheers, AndrewGNF (talk) 00:46, 27 September 2009 (UTC)
Were these changes implemented? If not I can help with both images and server hosted at EBI. There was some discussion about image borders below in the page. A2-25 (talk)
For a continuation of this discussion, see here. A large number of the images originally obtained from the RCSB PDB were updated with high quality ray traced PyMOL images. However as far as I am aware, a web server was never implemented. Boghog (talk) 13:18, 12 February 2012 (UTC)
Thank you for the update. I had a look and if it is of any use we can create a server to serve the images. We anyway create images in pymol for each PDB entry so it will not be very difficult to create a server to serve these images. A2-25 (talk) 15:14, 12 February 2012 (UTC)

Edit request on 12 February 2012

Sync with {{GNF Protein box/sandbox}} to include changes as discussed here and here. (Also see testcases for test of requested changes.) Thanks. Boghog (talk) 13:10, 12 February 2012 (UTC)

As I understand it, you're asking to take out the grey border, and to replace the links in the PDB section with some other links where the PDB details can be looked up. But I noticed AndrewGNF raised an issue since this request was made. Although this has been discussed, as a non-biologist this all goes completely over my head. So my question is, are we still happy to go ahead with the change or are there any other issues that still need to be addressed? Tra (Talk) 01:35, 18 February 2012 (UTC)
I've done the border, but it looks like you're still discussing the PDB stuff. Re-enable {{editprotected}} whenever you're ready. Tra (Talk) 09:09, 18 February 2012 (UTC)
Thanks for taking out the grey border. Andrew expressed some valid concerns. So we will submit a second request once we work out a mutually agreeable solution. Cheers. Boghog (talk) 09:13, 18 February 2012 (UTC)

PDB links: take 2

Sorry for chiming in late here. I have to admit I'm not completely sold on mapping through Uniprot and homologene. Aren't we just creating another set of mapping files that needs to be maintained? I'm not passionately against it, but I'm not convinced this solves the problem either... I'll also note that Pleiotrope is close to making the PBB bot sync on a much more real-time basis with http://mygene.info, which in turn imports PDB mappings from Ensembl. Cheers, AndrewGNF (talk) 19:22, 16 February 2012 (UTC)

Hi Andrew. I think the critical point here is that the UniProt → PDB mappings are maintained by the external RCSB PDB and PDBe databases and there is no need for us to maintain these mappings. Furthermore, Homologene → UniProt mappings change relatively slowly over time and already include the vast majority of human and rodent proteins. What will change over time is the addition of more species. One thing that is not at all clear to me at this point is whether the present PDB lists are restricted to human proteins or also include orthologs. If the present lists are restricted to human proteins, a far better solution would be to query the RCSB PDB and PDBe databases using the human UniProt code. This would produce a continuously up-to-date list that would require no maintenance at all. Adding the mouse UniProt accession number to the query would return up-to-date human and mouse structures. Using the Homologene mapping would in addition return structures for all homologs. The best long-term solution would be if the external RCSB PDB and PDBe databases supported Homologene searches, but unfortunately they do not as yet support such queries. Analogous maintenance-free PDB searches are supported by the {{infobox enzyme}} and {{infobox protein family}} templates based on EC and Pfam accession number queries respectively. The proposed changes to the {{GNF Protein box}} template are low maintenance and would result in a far wider coverage of orthologous proteins than is presently provided. Boghog (talk) 20:34, 16 February 2012 (UTC)
Hi. The list that I created at the EBI includes mappings to UniProt for all proteins mentioned in the Homologene data. A protein is left out of the list only when we could not find a UniProt accession number for it. This process is run weekly, so as UniProt accessions become available they will be added to the list. A2-25 (talk) 21:19, 16 February 2012 (UTC)
I think the Ensembl mapping to PDB is based on UniProt<->PDB mapping so it will not be different to the list that I create. A2-25 (talk) 21:23, 16 February 2012 (UTC)
There are two issues here. First is the use of a mapping vs the current situation. Even if the mapping were not maintained, it would be a huge improvement on the current situation where there is no mapping at all and it would still result in providing more up-to-date data. For example, I've just added a structure search to Template:PBB/31 which had no structures listed even though one was published in 2008. Template:PBB/28 had around 35 structures listed, but there are currently 63 in the PDB archive. These are by no means isolated cases as I'm sure we all know.
A separate question is whether or not orthologs should be included in the search, which in turn determines whether a Homologene (or similar) mapping needs to be employed or if just the UniProt accession can be used. Personally, I think it is valuable to list homologous proteins. Most boxes seem to have only the human protein structures, but myoglobin did also include pig. Perhaps an ideal solution (and I'm not suggesting it's implemented now) is to have two search links, one for human and one for orthologs.
What was the rationale for including mouse ortholog info (Entrez Uniprot acc etc) rather than any other orthologous species? This might inform the 'ortholog or not' debate. Mouse proteins are the second most populous eukaryotic entries in the PDB (after human) though A2-33 (talk) 23:29, 16 February 2012 (UTC)
Just to clarify, I strongly support including all orthologs in the PDB searches since the scope of these Gene Wiki articles encompasses not only the human gene/protein but also orthologs. Furthermore, I don't think it is necessary to provide separate links for human vs other species, at least for the RCSB PDB link, since the query results are tabulated by species and one can easily drill down to select structures from only one species if desired (see for example RCSB PDB Hemoglobin, beta). Boghog (talk) 04:14, 17 February 2012 (UTC)
There is an additional advantage of replacing long lists of PDB links to individual structures with query links. With the query links, we can easily include links to both the PDBe and RCSB PDB. This eliminates any arguments over which database is best to link to. Providing links to more than one database becomes very cumbersome if each individual structure is linked. Boghog (talk) 07:31, 17 February 2012 (UTC)
I agree that having both links eliminates all arguments, especially since the Worldwide Protein Data Bank (wwPDB) was established: there is no "official" PDB site, and both RCSB and PDBe are part of wwPDB and maintain the PDB archive. Both sites provide different services and have a different focus. There are useful features on both sites. In future, if wwPDB makes an official site, the ID codes could be linked to that site, but users will still benefit from having an up-to-date search. A2-25 (talk) 16:54, 17 February 2012 (UTC)
Thanks to you both for the detailed replies. I think something in me likes seeing lists of PDB IDs. Is it true that when no PDB ID exists, you still see the link to RCSB and PDBe? If so, is it also true that the majority of gene pages will not have any relevant structures (whether or not we include orthologs)? If both are true, then I worry a bit about having two links that pretty much go to blank results pages. At least in the current system, you don't have to click through to see if any structural info is available. (The missing links of course are a perfectly valid issue, but one that can be addressed with a bot.) Should we consider a hybrid display where both the known PDB IDs and the RCSB/PDBe links are shown? Thanks for indulging me here... Cheers, AndrewGNF (talk) 05:31, 18 February 2012 (UTC)
Good point about providing large numbers of query links that don't return any structure hits. A simple solution to this problem is to include a logical parameter similar to the current IUPHAR parameter. The PDB query links would only be displayed if the PDB_exist parameter were set to "yes". A bot could go through the PBB templates and add PDB_exist parameters as appropriate. A maintenance bot could be run once a week to examine newly released entries in the PDB database to see if any of these represent the first structure solved for a particular protein. If so, the bot could then add the PDB_exist parameter to the corresponding PBB box.
At the other extreme of the spectrum are proteins for which hundreds of structures exist. For these cases, very long lists of PDB links are not very helpful. Ideally these links should be organized by species, resolution, sequence length, etc., and the problem becomes compounded if links to more than one database are desired. The RCSB PDB already does a very good job organizing the structures (see again for example Hemoglobin, beta). It would be very difficult if not impossible to provide similarly formatted output using Wikipedia templates. Boghog (talk) 07:45, 18 February 2012 (UTC)
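The weekly PDB_exist maintenance pass suggested above might look something like the following Python sketch. It assumes the bot already has, for each PBB template, the UniProt accessions it covers, plus the set of accessions that have at least one structure in the latest PDB release. The function name and the data layout are hypothetical, and fetching the release data and editing the pages are deliberately left out.

```python
def templates_needing_pdb_exist(templates, uniprot_ids_with_structures):
    """Return names of templates whose PDB_exist flag should be set to "yes".

    templates: dict mapping a template name to a dict with
        "uniprot_ids" (iterable of accessions) and an optional "PDB_exist".
    uniprot_ids_with_structures: accessions with a structure in the PDB.
    """
    released = set(uniprot_ids_with_structures)
    needing = []
    for name, fields in templates.items():
        if fields.get("PDB_exist") == "yes":
            continue  # already flagged, nothing to do
        if released & set(fields.get("uniprot_ids", ())):
            needing.append(name)  # first structure found for this protein
    return sorted(needing)
```

Templates that are already flagged are skipped, so running the pass every week only touches boxes whose first structure has just appeared.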
I am not sure that a hybrid system displaying both direct and query PDB links would add much (and in fact would be redundant). Of course, we still can keep the current PDB parameter to allow for special cases where display of direct PDB links is desired for some reason. Also, were you intending to provide structure links for human, human + mouse, or all species? I think the last option is best since it is not uncommon that for a given protein, there may be no human structure available but a structure for a non-human ortholog is. Boghog (talk) 09:57, 18 February 2012 (UTC)
I'm completely non-technical and have no idea how much work it would be to construct a bot to add the links when structures are available, but for what it's worth I'm not averse to lots of links which don't provide any structure hits. This still provides information, as clicking a link that returns "There are no PDB entries matching your query" tells me that there are not yet any structures available. Absence of a link means either there are no structures, or that no-one has got round to putting them on the WP page, and I have to go digging to find out which.
Like Boghog, I don't think including the lists of PDB codes, even in addition to the search links, is useful as they're not informative. Gone are the days when one could request an ID code of one's choosing! A2-33 (talk) 10:46, 18 February 2012 (UTC)
I think it is much more useful to give up-to-date information. I also think both RCSB and PDBe will improve their interfaces, as that is what they do all the time. Over the years the way PDB data is presented to users has evolved as both these sites have implemented new ways to access data or tools for analysing data. I am not sure that replicating it in Wikipedia is a good option. If it is included in Wikipedia pages it will have to be updated as the thinking on "structure representation" evolves and user demands evolve. The two sites also try to give much more information than just a list of structures, as discussed above. So listing PDB ID codes is not a good idea in my opinion. I very much support the idea of adding query links so users get the latest both in terms of structures and also the data representation/information. A2-25 (talk) 12:49, 18 February 2012 (UTC)
It is not at all difficult to include only those Homologene IDs for which there is a PDB entry in the mapping file that I create. I did have such a file but thought it would be better to give all Homologene IDs, because PDB entries are released every week and we would then need to update the mapping in Wikipedia every week if we included only those UniProt accessions that have PDB entries. I am not an expert in how bots work in this scenario, but if it helps I can also create a server that returns UniProt accession numbers for a given Homologene ID and also provides a flag to say whether there are PDB entries. Let me know if that is a solution and I will implement a server. A2-25 (talk) 12:49, 18 February 2012 (UTC)
Sorry for the lengthy gaps in communication -- I'm away at a conference at the moment. I think the PDB data are pretty similar to the Gene Ontology annotations in the infobox. There often are a lot of them (hence the show/hide button) and they are constantly updated by GO annotators (so there sometimes is a gap between what is in the infobox and what is available from the source database). I rather like the solution that we implemented there. There is a reasonably up-to-date listing directly in the infobox, and then there are links to do real-time searches at Amigo and QuickGO. In truth, the fact that we aren't exactly synced with the source databases doesn't bother me all that much. What's there is enough to give the reader a flavor of what's known as far as GO annotations, but Wikipedia will never be the source for serious researchers. I think those same characteristics are true for the PDB IDs too.
There are two differences that I can see between PDBs and GO annotations. First, as A2-25 points out, PDB IDs aren't inherently useful, whereas GO annotations are. Second, pretty much all genes have GO terms, while that isn't true (I don't think) for PDB IDs. The first issue argues for hiding PDB IDs, whereas the second factor argues for not having _only_ the search links. I'm still conflicted about how to weight these...
On Boghog's bot suggestion, I do think a bot is the right answer here. But rather than have a bot update a flag for whether there are any valid PDBs available, why not just insert the valid PDB IDs directly so we're never more than a week out of date? Then we could adjust the template so that _if_ PDB IDs are listed then the RCSB/PDBe search links are shown, and then the actual PDB IDs can be hidden by default behind a show/hide link. Thoughts on this idea? Cheers, AndrewGNF (talk) 04:11, 19 February 2012 (UTC)

Andrew, I think your last suggestion is a good compromise as long as orthologs are included in the list of individual PDB links. As mygene.info supports homologene queries, this would appear possible, but I would like to confirm that this is what you had in mind.

As a side note, I think in certain situations, Wikipedia can be used for serious research. A Homologene based PDB search is one example. A bioinformatician could probably quickly write a script to do such a search, but for other consumers of this data such as structural biologists and computational chemists, performing such a search would be cumbersome and time consuming. Finally, I would like to point out that interactions between Wikipedia editors and external database administrators have been very productive. Requests from Wikipedia editors have led to some significant enhancements in the search capabilities of the external databases that are useful for serious research. See for example this request and this and this response. Boghog (talk) 09:58, 19 February 2012 (UTC)

I agree with Andrew that having a PDB ID code is nice because it gives immediate access to a structure, while having a search gives access to up-to-date information and representation. So my suggestion would be to ensure there is an image for the protein whenever there are PDB ID codes, and to link the PDB ID code to one of the PDBe, RCSB or PDBsum sites. This will give immediate access to a structure, and then instead of listing more PDB ID codes, we have search links without the hide/show. This way we solve the problem of giving access to a structure without going through the search. We can choose which image to give as follows: take the combination of highest resolution (if there is an X-ray structure) and maximum coverage of the UniProt sequence (i.e. to avoid giving the structure of a domain when the structure of a bigger construct is available). We then take one of the assembly images for that PDB ID code and show that. I can create a server that gives this kind of information and the associated image. This way I think we can give a meaningful structure image and not have to list all PDB ID codes, which as someone has pointed out have no meaning, unlike GO accessions. Any comments? A2-25 (talk) 10:01, 19 February 2012 (UTC)
I do not think PDB id codes are same as GO id's because each GO id has a meaning and is a concept, while PDB id code is an experiment could be on the same protein or different protein - the id code does not say anything about it. So having a list of PDB id codes is not same as having a list of GO ids and its description. A2-25 (talk) 10:20, 19 February 2012 (UTC)
As I stated above, I think including a list of individual PDB links is redundant, but as long as their display is collapsed by default, I do not have any strong objections to including them. Regardless of whether individual PDB links are included or not, I strongly support A2-25's suggestion of making sure that a relevant image is provided along with a description of the structure (species, domain(s), etc.) and a PDBe link in the image caption. Maximum coverage of the UniProt sequence in my opinion is far more important than the resolution of the structure, especially considering that cartoon diagrams are used to depict the structure and the display of side chains is normally hidden. Of course, all other things being equal, choose the higher resolution structure. If you can set up a server to provide such structures, that would be fantastic. We can set up a bot on our end to add these images. Boghog (talk) 10:39, 19 February 2012 (UTC)
I agree that UniProt coverage is more important than resolution. If everyone agrees I will set up the server and let you know. I am planning to do the following - Given a homologene id the server will search for the most complete (based on UniProt coverage) structure and give the PDB id code and a link to the image from PDBe. In addition it will give the taxonomy id and name for the selected pdb id code. Do you want me to include the Pfam domains in the return? It will also send a list of all other pdb id codes for the given homologene id. What other information might be useful? Is there a convention for format for such information or shall I come up with a format and let you know? A2-25 (talk) 16:29, 19 February 2012 (UTC)
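The selection rule agreed on above (prefer maximum UniProt coverage, then break ties by better resolution) could be sketched in Python as below. This is an illustration under assumptions, not the server's actual implementation: the `Structure` class, its field names, and the decision to rank structures without a resolution (for example NMR entries) last among equal-coverage candidates are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Structure:
    pdb_id: str
    coverage: float              # fraction of the UniProt sequence covered, 0..1
    resolution: Optional[float]  # in angstroms; None if not an X-ray structure


def best_structure(structures: List[Structure]) -> Optional[Structure]:
    """Pick the structure with the highest coverage; ties go to the
    lowest (best) resolution value."""
    if not structures:
        return None

    def rank(s: Structure):
        # Assumption: entries without a resolution sort after those with one.
        res = s.resolution if s.resolution is not None else float("inf")
        return (-s.coverage, res)

    return min(structures, key=rank)
```

With placeholder entries, a 90% coverage structure at 2.8 Å is preferred over a 40% coverage structure at 1.2 Å, and a second 90% coverage structure at 2.1 Å would win over both.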
Super, I like how this is shaping up. Selection of the "best" PDB image to show has always been an issue -- for lack of a better system, I think we default to showing a random structure (or the first alphabetically). A2-25, if you create a web service (JSON output preferred) that takes a Homologene ID and outputs a preferred image, then I think we can prioritize that one in our template code generation tool (click "show template code" -- example shown for CDK2). We will continue to mine the full list of PDBs from Ensembl (though truthfully, Boghog, I'm not sure how we handle orthologs. Will need to check on that later...)
One detail of the implementation to discuss. I don't think we have a bot that keeps PDB thumbnails in Wikimedia commons in sync with the PDB. So what happens when the "best" structure hasn't had its thumbnail uploaded? I seem to remember a previous editor wrote some custom rendering code to improve image quality -- Boghog, do you remember the details of this? I'm not sure if we have the bandwidth to implement an auto-upload on our end (but happy to provide repository access if someone wants to add that feature to the tool linked above). Cheers, AndrewGNF (talk) 19:39, 19 February 2012 (UTC)
Previous work by User:Emw who runs User:PDBbot to create high quality protein images is described here and here. Just to reiterate, I think including orthologs in the PDB list is very important. The PDB query links that we have set up return structures for orthologs and there are many cases where a human structure is not available but an ortholog is. If the PDB lists do not contain orthologs, the display of many PDB queries that would otherwise return structures would be suppressed. Boghog (talk) 20:54, 19 February 2012 (UTC)
Ok I will implement the server to give best structure and a link to image. I will also include taxonomy and pfam domain information. This will be returned as JSON and the query will be based on homologene id. The images link will be for assembly image that PDBe generates for their portfolio widget. So I am clear what we are going for -
  1. PDB image based on "best structure" discussion above. This image will have a legend with PDB id code linked to PDBe.
  2. A search link (as described above by boghog) to PDBe and RCSB based on homologene <-> UniProt mapping i.e. I will maintain the file I have started generating.
  3. We will have list of PDB id codes with show/hide. This information will be obtained from Ensembl.
Let me know if I have misunderstood something otherwise I will go ahead and start working on the query server. A2-25 (talk) 21:09, 19 February 2012 (UTC)
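No response format had been fixed at this point in the discussion, so the following Python sketch only shows one way the JSON described above could be laid out. Every field name, the function name, and the placeholder values are assumptions made for illustration, not the format the server actually uses.

```python
import json


def build_response(homologene_id, best, other_pdb_ids):
    """Serialize a hypothetical 'best structure' answer as JSON.

    best: dict with the chosen structure's PDB ID code, image link,
          taxonomy ID and name, and optional Pfam domains.
    other_pdb_ids: remaining PDB ID codes for the same Homologene ID.
    """
    payload = {
        "homologene_id": homologene_id,
        "best_structure": {
            "pdb_id": best["pdb_id"],
            "image_url": best["image_url"],
            "taxonomy_id": best["taxonomy_id"],
            "taxonomy_name": best["taxonomy_name"],
            "pfam_domains": list(best.get("pfam_domains", [])),
        },
        "other_pdb_ids": sorted(other_pdb_ids),
    }
    return json.dumps(payload, indent=2)
```

A bot on the Wikipedia side could parse this with any JSON library and use the best structure for the image and caption, keeping the other ID codes for the collapsed list.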

This sounds perfect! Including these images, captions, and search links in Gene Wiki articles will be a huge improvement. The PDB id codes that you supply would presumably also include orthologs so that suppression of valid PDB search queries would not occur. Boghog (talk) 21:41, 19 February 2012 (UTC)

Yes, the PDB id codes will be based on the UniProt accession numbers obtained for each homologene id, so they will contain orthologs wherever a UniProt accession is available. — Preceding unsigned comment added by A2-25 (talkcontribs) 21:47, 19 February 2012 (UTC)

Help in generating GeneAtlas Image

Hi Andrew. I have been looking for a help file describing how to generate GeneAtlas expression pattern images (both full size and thumbnail). The protein in question is myelin-associated glycoprotein, which has an entry in BioGPS. Can you please point me to a place where I can look this up? Thanks. Yasunori Hayashi, RIKEN Brain Science Institute. — Preceding unsigned comment added by 113.146.100.23 (talk) 03:09, 26 February 2012 (UTC)

Hi Yasunori, the gene atlas image can be found at http://biogps.org/gene/4099 (the data for 216617_s_at seems better than for 217447_at). Unfortunately, I think we no longer have the code to create the thumbnails that are standard in the Gene Wiki infoboxes. Hmmm, we'll need to fix that sometime... Sorry... Cheers, AndrewGNF (talk) 00:48, 1 March 2012 (UTC)

Image borders

Is there a reason why the images of this have a 1 pixel grey border? Said borders are unusual in Wikipedia. The only purpose I personally use them for is images that have a white background and thus look weird without something separating them from the almost white background of infoboxes. However, molecule pictures should not have such a background.

I actually was coming here to perhaps remove the border and see how it looked, but I see the template is locked. My primary contributions to Wikipedia as of late are making it look aesthetically better.

I know this isn't a pressing concern, but I'd love a reply. — trlkly 17:40, 3 February 2012 (UTC)

I have removed the border in the sandbox and an example can be seen here (right hand side examples). I don't have any strong feelings one way or the other about which looks better. I would be interested in hearing what others think. Boghog (talk) 18:22, 5 February 2012 (UTC)

I don't have any strong views on this either. A2-33 (talk) 21:05, 8 February 2012 (UTC)

Thanks, guys. I just now saw this on my watchlist. I'll go ahead and keep removing the white backgrounds that aren't present on other computer-generated images like that. I'm pretty sure their existence is why the border was added in the first place. — trlkly 06:47, 12 March 2012 (UTC)

PDB links: take 3

I have modified the {{GNF Protein box/sandbox}} so that PDB search links are only displayed if the PDB parameter is defined. I also modified the display to distinguish between search links and the individual PDB IDs and the result may be seen in the testcases (right hand side infoboxes). Please note that if the Homologene parameter has not been defined, then the search is instead based on the human UniProt accession number and the title of the link changes to "UniProt search". Ideally I would like to display the query links uncollapsed and the structure IDs collapsed (collapse a single row), but I do not know how to accomplish that. Does this look OK? Boghog (talk) 16:11, 3 March 2012 (UTC)
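The fallback described in the comment above (no links without a PDB value, an ortholog search when a Homologene id is known, otherwise a search on the human UniProt accession) can be sketched in Python. This is only an illustration of the logic, not the template code: the function name and the homologene_terms dictionary are hypothetical, the dictionary stands in for the Homologene lookup templates, and the base URLs are the ones used in the search links posted later in this thread.

```python
# Base URLs taken from the search links posted in this thread.
PDBE_SEARCH = "http://www.ebi.ac.uk/pdbe/searchResults.html?display=both&term="
RCSB_SEARCH = ("http://www.rcsb.org/pdb/search/smartSubquery.do"
               "?smartSearchSubtype=UpAccessionIdQuery&accessionIdList=")


def pdb_search_links(pdb, homologene=None, hs_uniprot=None, homologene_terms=None):
    """Return (label, pdbe_url, rcsb_url), or None when no links are shown.

    homologene_terms is a hypothetical dict mapping a homologene id to a
    ready-made query term; it stands in for the lookup templates.
    """
    # No PDB parameter: the search links are suppressed entirely.
    if not pdb:
        return None
    terms = homologene_terms or {}
    # Preferred case: a Homologene id with a known query term.
    if homologene and homologene in terms:
        term = terms[homologene]
        return ("Ortholog search", PDBE_SEARCH + term, RCSB_SEARCH + term)
    # Fallback: search on the human UniProt accession instead.
    if hs_uniprot:
        return ("Human UniProt search",
                PDBE_SEARCH + hs_uniprot, RCSB_SEARCH + hs_uniprot)
    return None
```

For example, pdb_search_links("1ABC", hs_uniprot="Q01524") falls through to the human UniProt search because no Homologene id is supplied; "1ABC" is a placeholder value, not a real structure reference.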

Can we have the search based on UniProt accession visible and only the PDB id codes hidden? Can you also have a look at -
The only issue with the JSON is that the URL for the image is not correct, but that is a minor thing to change. Can you let me know if the rest of the JSON looks OK? A2-25 (talk) 17:17, 3 March 2012 (UTC)
The JSON looks great! The only potential problem (and I apologize for not pointing this out before) is that there are apparently some human genes for which a homologene id has not been assigned (18,981 human genes have been grouped into homologene families, which would leave a few thousand ungrouped human genes). See for example DEFA6 (homologene Entrez gene ID 1671) for which there are two crystal structures available. To handle these cases, would it be possible to query the jsonizer server for a UniProt ID and return the best structure? Boghog (talk) 18:13, 3 March 2012 (UTC)
As stated above, I would really like to collapse only the PDB structure IDs and not the search links, but I have not figured out a way to do this. Are there any HTML table experts out there that could help? Boghog (talk) 18:38, 3 March 2012 (UTC)
We can have a method to search based on UniProt id. Will let you know once we have it implemented. A2-25 (talk) 21:17, 3 March 2012 (UTC)
I have checked and we have a method to get information based on UniProt accession number - http://wwwdev.ebi.ac.uk/pdbe-apps/jsonizer/mappings/Q01524/. Will this work for you? It does not give the best structure, but it gives all structures along with their resolution. I may be able to get UniProt coverage added as well (both the start and end residue in UniProt, and the number of residues). A2-25 (talk) 21:22, 3 March 2012 (UTC)
That comes close and given the relatively small percentage of structures involved, we can probably live with that. Boghog (talk) 23:18, 3 March 2012 (UTC)
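Since the UniProt-based query returns every structure rather than a single best one, a client would have to choose for itself. The following Python sketch shows one way to do that using the resolution and UniProt coverage mentioned above. The record keys (pdb_id, resolution, unp_start, unp_end) are hypothetical and not the server's actual JSON schema, and the ranking rule (widest coverage first, then finest resolution) is an assumption, not the "best structure" criterion agreed in the earlier discussion.

```python
def pick_best_structure(structures):
    """Pick one record from a list of dicts describing PDB structures.

    Each dict is assumed (hypothetically) to carry the keys
    'pdb_id', 'resolution', 'unp_start' and 'unp_end'.
    Returns None for an empty list.
    """
    if not structures:
        return None

    def sort_key(record):
        # Number of UniProt residues covered by this structure.
        covered = record["unp_end"] - record["unp_start"] + 1
        # Entries without a resolution (e.g. NMR) rank last on that criterion.
        resolution = record.get("resolution")
        if resolution is None:
            resolution = float("inf")
        # Widest coverage first, then the smallest resolution value.
        return (-covered, resolution)

    return min(structures, key=sort_key)
```

With two records that cover the same residue range, the one with the smaller resolution value is returned; a record covering more residues wins even if its resolution is coarser.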
I may be wrong, but the table is collapsible because of the class="collapsible collapsed" attribute on the table, so it should work if you create a new table without the collapsible class and move the search td into the new table. If you want, I can try it out in the sandbox and see if it works. A2-25 (talk) 21:36, 3 March 2012 (UTC)
  • We can certainly create two separate tables for the PDB links. But logically, related links should be kept in the same table. Otherwise the PDB wiki link would need to be duplicated. Boghog (talk) 23:05, 3 March 2012 (UTC)
  • In fact, I already tried splitting the PDB links into two tables in this version, but reverted it because it didn't look right. Boghog (talk) 23:11, 3 March 2012 (UTC)
  • Another option is to collapse the table only if the list of PDB IDs is very long (e.g., class= {{str ≥ len | {{{PDB}}} | 38 | "collapsible collapsed" | "collapsible" }}), but that is not an ideal solution either. Boghog (talk) 23:32, 3 March 2012 (UTC)
  • Another option is to add a table under td for list of id codes with style collapsible. e.g.
html code
<table style="border: none; padding: 0; margin: 0; width: 100%; text-align: left">
<tr style="background-color: #ddd; text-align: center">
<th colspan=2>Available structures</th>
</tr>
<tr>
<th rowspan = "2" style="background-color: #c3fdb8; width:43px">[[Protein Data Bank|PDB]]</th>
<td colspan = "2" style="background-color: #eee"> {{#if:{{{Homologene|}}}|
Ortholog search: [http://www.ebi.ac.uk/pdbe/searchResults.html?display=both&term={{Homologene2PDBe|{{{Homologene}}}}} PDBe], [http://www.rcsb.org/pdb/search/smartSubquery.do?smartSearchSubtype=UpAccessionIdQuery&accessionIdList={{Homologene2uniprot|{{{Homologene}}}}} RCSB]|
{{#if:{{{Hs_Uniprot|}}}|
Human UniProt search: [http://www.ebi.ac.uk/pdbe/searchResults.html?display=both&term={{{Hs_Uniprot}}} PDBe], [http://www.rcsb.org/pdb/search/smartSubquery.do?smartSearchSubtype=UpAccessionIdQuery&accessionIdList=::{{{Hs_Uniprot}}} RCSB]
}}
}}
</td>
</tr>
<tr>
<td>
<table class="collapsible collapsed" style="border: none; padding: 0; margin: 0; width: 100%; text-align: left">
<tr style="background-color: #ddd; text-align: center">
<th colspan=2>List of PDB id codes</th>
</tr>
<tr>
<td colspan = "2" style="background-color: #eee">
Structure IDs: {{{PDB}}}
</td>
</tr>
</table>
</td></tr>
</table>

A2-25 (talk) 09:23, 4 March 2012 (UTC)

Much better! I have implemented your suggestion in the sandbox and the result may be seen in the testcases (right hand side infoboxes). Thanks for the suggestion. Boghog (talk) 09:38, 4 March 2012 (UTC)
That looks good in my opinion. So are we now waiting for the UniProt search to have coverage in JSON? A2-25 (talk) 10:55, 4 March 2012 (UTC)
This is definitely an improvement. At the moment I see a few templates where the PDB ids have been fully replaced by the search links. Your test cases show the search links encoded into the template while the PDB ids remain as field values; when ProteinBoxBot updates the page, those search links will be replaced with current PDB ids. As the search links will eventually be part of the template, I believe this is acceptable behavior, but I can modify the bot to avoid overwriting those fields if needed. Pleiotrope (talk) 23:29, 7 March 2012 (UTC)
Yes, it might be better to modify the bot as it isn't always providing current ID codes. In the case of Template:PBB/8736, the update replaced the links with 1 PDB ID, even though there are 4 structures from this UniProt ID available. In the case of Template:PBB/1586 the bot populates the structures section with a PDB code for a theoretical model which is no longer present in the PDB (and has not been for 10 years), but ignores the 2 experimentally determined structures. A2-33 (talk) 17:39, 9 March 2012 (UTC)

It seems many Protein boxes got updated automatically and the search links were removed. Is this temporary because of automatic updates? A2-25 (talk) 17:11, 9 March 2012 (UTC)

Regarding the out-of-date PDB IDs, the bot pulls the information from a gene information collating service called mygene.info. If the PDBs are out-of-date or incorrect, I'll let the developer know and we can get that fixed. And A2-25, yeah, the bot just updates automatically. To prevent bot overwrites, you can add a {{nobots}} template somewhere in the wikitext and PBB will ignore it, though in this case, aren't the search links going to be written into the template code itself, complementary to the ID codes? In any case, if you prefer, I can have PBB ignore the PDB fields entirely until this is resolved. Pleiotrope (talk) 18:15, 11 March 2012 (UTC)
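The opt-out described above can be illustrated with a short Python sketch. It is not ProteinBoxBot's code: both function names are hypothetical, the sketch only recognises a bare {{nobots}} tag and none of the parameterised {{bots|...}} forms, and it assumes the PDB field sits on a single "| PDB = ..." line of the template.

```python
import re

# Matches a bare {{nobots}} tag, ignoring case and inner whitespace.
NOBOTS = re.compile(r"\{\{\s*nobots\s*\}\}", re.IGNORECASE)
# Matches a single-line "| PDB = value" field (an assumed layout).
PDB_FIELD = re.compile(r"(\|\s*PDB\s*=\s*)([^\n]*)")


def bot_may_edit(wikitext):
    """Return False when the page opts out of bot edits via {{nobots}}."""
    return NOBOTS.search(wikitext) is None


def update_pdb_field(wikitext, new_value):
    """Replace the PDB field value unless the page carries {{nobots}}."""
    if not bot_may_edit(wikitext):
        return wikitext  # leave hand-edited pages untouched
    # A function replacement avoids backslash escapes in new_value.
    return PDB_FIELD.sub(lambda m: m.group(1) + new_value, wikitext, count=1)
```

A page containing {{nobots}} is returned unchanged, so search links added by hand in the PDB field would survive a bot run.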
Since everyone now appears satisfied with the look of the PDB links in the sandbox, I will now request that this feature be added to the production version. Boghog (talk) 19:00, 11 March 2012 (UTC)
I am assuming that the server does not require any changes. Thanks to Boghog and everyone else for helping out. A2-25 (talk) 20:10, 13 March 2012 (UTC)
The server is a somewhat separate question from enabling the links. I think what Andrew had in mind is to use your server to add PDB images for new Gene Wiki articles, which I think is great. In addition, there are existing Gene Wiki articles for which PDB structures are available, but no image has currently been added to the article. Furthermore, there are non-optimal images (images based on structures of a small fragment of the protein despite the availability of much more complete structures). Finally, the image captions need to be improved so that it is clear what species the structure represents and what domains the structure contains (see this discussion). I am busy in real life at the moment, but my intention is to use your server and BogBot to address the last three concerns. But my efforts need to be coordinated with both User:Emw and User:Pleiotrope. The output of your server at the moment looks perfect, but we might request tweaks after we get started. Boghog (talk) 20:58, 13 March 2012 (UTC)

Edit request on 11 March 2012

Sync with {{GNF Protein box/sandbox}} (diff) to include PDB query links under the "Available structures" section of the infobox as discussed here. (Also see testcases for test of requested changes.) Thanks. Boghog (talk) 19:00, 11 March 2012 (UTC)

  Done Tra (Talk) 21:46, 15 March 2012 (UTC)

Adding ChEMBL Links

Hi, after the success of adding ChEMBL compound links to Wikipedia, we would like to extend our links to the protein pages. How would we go about doing this and what would we have to provide? We are linked to UniProt, so we can give you a CSV file mapping UniProt IDs to ChEMBL IDs. It would be great to start creating reciprocal links to our pages, such as: http://www.ebi.ac.uk/chembldb/target/inspect/CHEMBL276 thanks, Louisa (ChEMBL team) Louisajb (talk) 11:16, 22 February 2012 (UTC)

In principle this is of interest. These links would provide information about ligands that interact with proteins of pharmacological interest and would be complementary to the IUPHAR links that we already provide. In the vast majority of cases, we do not have separate Gene Wiki pages for orthologs of the same protein (e.g., we have only one article for CHRM1). In order to keep things simple, my suggestion is that the links be limited to human orthologs (e.g., link to human CHRM1 instead of rat CHRM1). Assuming there are no strong objections from others, we could add links from Gene Wiki articles back to the ChEMBL protein targets. What we would need is a mapping of human UniProt IDs to ChEMBL protein targets. Boghog (talk) 17:09, 25 February 2012 (UTC)
I have added the ChEMBL links to the sandbox and the new links can be seen here (the two right hand side infoboxes). Boghog (talk) 17:27, 25 February 2012 (UTC)
I agree this will be very useful information to add and make available from these pages. Thank you very much. A2-25 (talk) 20:45, 25 February 2012 (UTC)
This looks great. Where shall I send the mapping to? I can create this as xls, txt, csv etc. Thanks again Louisajb (talk) 15:19, 29 February 2012 (UTC)
One minor change, please could we have the full ChEMBL_ID number shown in the box. I am aware that it would look like this 'ChEMBL: CHEMBL2569' but it would keep the ChEMBL ID consistent with the compound pages and maybe prevent user confusion? Thanks again. Louisajb (talk) 16:16, 29 February 2012 (UTC)
You can send the links to me (boghog at me dot com, where "at" and "me" are replaced by "@" and "." respectively) and I can run a bot to add these links to the PBB templates. Boghog (talk) 06:24, 1 March 2012 (UTC)
Hi, have you been able to add these links? I have tried to find some but I can't seem to see them. Thanks Louisajb (talk) 12:07, 2 May 2012 (UTC)
Sorry for the delay. User:BogBot has now started to add these links. Boghog (talk) 04:42, 4 May 2012 (UTC)
Not a problem, I was just checking. The links look brilliant. Thank you! Louisajb (talk) 16:52, 8 May 2012 (UTC)
Hi, is there a list somewhere that I can get hold of and use to create reciprocal links from ChEMBL to these pages? I used the UniProt ID and ChEMBL ID, but not all of the UniProt IDs will link out to their protein pages. Thanks Louisajb (talk) 10:16, 18 June 2012 (UTC)

More than one EC number

Hi. There are some enzymes for which more than one EC number can be defined. But as the template is now, the EC field does not seem to accept two or more numbers. The link shows an error. Can you fix this?

Please see this example: Serine_racemase.

Thank you very much.

Yasunori Hayashi. RIKEN Brain Science Institute

Correct me if I am wrong, but I believe that currently the ECnumber parameter is intended to support one and only one EC number. I do think it would be a good idea to allow the ECnumber parameter to handle more than one EC number since many proteins have more than one enzymatic activity. However I don't think there is an easy way of implementing this because of limitations in the template scripting language. Perhaps we could define another parameter called "ECnumbers" that would accept any number of {{EC number}} templates in analogy to the way the PDB parameter is currently implemented. If we change the current ECnumber parameter, then a bot would need to go through all the PBB templates to update the ECnumber to use {{EC number}} templates. Boghog (talk) 14:11, 1 April 2012 (UTC)
Some time ago, I modified the bot to handle multiple EC numbers, but I see now that it breaks the link. Boghog, if you want to make the changes to the template sandbox and get them pushed through, I'll modify PBB accordingly. Pleiotrope (talk) 19:45, 3 April 2012 (UTC)
Thanks Pleiotrope. Modifying the ECnumber parameter would be the best solution. I have edited the ECnumber parameter in the sandbox template so that it will now accept as an argument any number of {{KEGG enzyme 2}} templates instead of the current direct hard wired link. See the testcases to verify this works. If everyone is in agreement, I will request that an administrator update the template. Cheers. Boghog (talk) 05:41, 4 April 2012 (UTC)
I went back a revision to see the ECnumber change and it looks good. If this is still present in the sandbox I'm all for updating the template. Thanks for getting that working! Pleiotrope (talk) 19:21, 6 April 2012 (UTC)
The proteins in the current testcases only have one EC number. I entered three numbers in the test case just to verify that it works and it does but I went back to one number in the testcases because I did not want to create any confusion. The sandbox version will still take any number of EC templates. Boghog (talk) 21:01, 6 April 2012 (UTC)
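The bot-side conversion discussed in this thread (rewriting a plain ECnumber value as a list of templates, one per EC number) can be sketched in Python. This is only an illustration under assumptions: the function name is hypothetical, the input is assumed to be free text containing EC numbers separated by commas or spaces, and the template name is a parameter because both {{EC number}} and {{KEGG enzyme 2}} were proposed above.

```python
import re

# Four dot-separated fields; the last three may be digits or a dash
# (partially specified EC numbers such as 3.4.-.-).
EC_PATTERN = re.compile(r"\d+(?:\.(?:\d+|-)){3}")


def ec_field_to_templates(value, template="EC number"):
    """Turn a field such as '5.1.1.18, 4.3.1.17' into template calls."""
    numbers = EC_PATTERN.findall(value)
    # One template invocation per EC number, in the original order.
    return ", ".join("{{%s|%s}}" % (template, number) for number in numbers)
```

A field holding a single EC number yields a single template call, so the conversion behaves the same for existing one-number entries.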

Edit request on 4 May 2013

A request was made above to allow more than one EC number. This was never implemented as proposed above because it would require changes both to the template and to the data in the transcluded templates. The new WP:Lua based Module:String now provides a much more elegant solution that only requires changes to the template and not to the data.

The following change made in the sandbox allows linking to more than one EC number. The sandbox test cases demonstrate that the multiple EC number linking implemented in the sandbox is functional. The change is also backwards compatible (i.e., it still works with a single EC number, see second row in test cases). Boghog (talk) 08:35, 4 May 2013 (UTC)

  Done — Martin (MSGJ · talk) 12:04, 9 May 2013 (UTC)

Categories

This template adds articles that use it to content categories, so if it is used on user pages they then turn up in content categories. See Category:Genes on chromosome 20 for example. I thought the use of automatically adding categories using templates was discouraged. -- Alan Liefting (talk - contribs) 05:38, 12 June 2012 (UTC)

It's discouraged, but in the absence of any heads-up at all it's not productive simply to remove it unilaterally. The project should be informed if all of these categories are to be depopulated so that someone can start adding them manually. Chris Cunningham (user:thumperward) (talk) 08:33, 12 June 2012 (UTC)
In this particular case, automatic addition of gene categories makes a lot of sense. These templates were created as part of the Gene Wiki project whose purpose is to create Wikipedia articles for genes coding human proteins. Each of these genes is unambiguously located on a single human chromosome and therefore each gene can unambiguously be assigned to a specific gene category. We could run a bot to explicitly add these categories to the individual articles, but what is the point? The end result (i.e., the population of the categories) would be identical. The proposed change creates a lot of unnecessary work. The only downside to the present system is that certain sandboxes have ended up in categories. These sandboxes have not been used in some time and should be deleted. Boghog (talk) 05:17, 13 June 2012 (UTC)
Deleting the sandboxes is fine by me since it achieves the aim of getting user pages out of content categories. -- Alan Liefting (talk - contribs) 05:52, 13 June 2012 (UTC)

I understand that the volume of articles involved makes categorization via template more palatable than usual in this case, but there are still something like 75 non-mainspace pages in Category:Human proteins as a result. Can we get some category suppression in place to avoid categorizing non-mainspace pages via the template? Maralia (talk) 17:03, 28 January 2013 (UTC)

I've updated this template and also {{GNF Ortholog box}} so that they don't categorize user pages anymore. -- WOSlinker (talk) 17:37, 9 May 2013 (UTC)
Brilliant, thank you! Cheers, Andrew Su (talk) 16:31, 15 May 2013 (UTC)

URL Edit request

I am a developer for the HGNC and we have changed a few of our URLs on our site. The old URLs still work thanks to redirects but we would like the URL for the HGNC gene symbol request pages in the GNF protein box changed as seen in the sandbox (diff).

The changes we have made have been noted in our New features and changes page

Many thanks,

Kristian Gray 12:08, 27 January 2014 (UTC) — Preceding unsigned comment added by KrisGray (talkcontribs)

Template-protected edit request on 7 March 2014

I am a developer for the HGNC and we have changed a few of our URLs on our site. The old URLs still work thanks to redirects but we would like the URL for the HGNC gene symbol request pages in the GNF protein box changed. These are the changes we would like to see:

| label4 = Symbol | data4 = {{{Symbol}}}{{{AltSymbols}}}

changed to

| label4 = Symbol | data4 = {{{Symbol}}}{{{AltSymbols}}}

Many thanks, Kristian Gray



As we are the official group (HUGO Gene Nomenclature Committee HGNC) that approves gene symbols how do we achieve "consensus" for the incorrect link on the Symbol label to change? Many thanks for the fix on the URL to our site.

Kris Gray

Kristian Gray 15:03, 7 March 2014 (UTC)

  Partly done: Human Genome Organisation is a valid non-redirect target, so you will need a consensus to change that, but I have updated the external link for you per your request and made a few other minor formatting changes to show that it is an external link and properly show AltSymbols as (AltSymbols) — {{U|Technical 13}} (tec) 16:16, 7 March 2014 (UTC)
I might suggest bringing it up on the talk page for Human Genome Organisation as an RfC on whether HUGO Gene Nomenclature Committee HGNC should be the primary topic as Human Genome Organisation appears to be a disambiguation page. — {{U|Technical 13}} (tec) 17:51, 7 March 2014 (UTC)

Edit request, 17 April 2014

The word "ontology" in Gene Ontology should be decapitalized per Gene ontology since it's not a proper noun. Brandmeistertalk 13:43, 17 April 2014 (UTC)

  Done Jackmcbarn (talk) 19:20, 17 April 2014 (UTC)


Requested move 21 July 2014

The following discussion is an archived discussion of a requested move. Please do not modify it. Subsequent comments should be made in a new section on the talk page. Editors desiring to contest the closing decision should consider a move review. No further edits should be made to this section.

The result of the move request was: no consensus. Proposed titles created as redirects. Jenks24 (talk) 10:53, 13 August 2014 (UTC)



Template:GNF Protein box → Template:Infobox GNF protein – Per other infobox templates and sentence casing (or is "GNF Protein" a proper noun?). Relisted. Jenks24 (talk) 12:03, 6 August 2014 (UTC) Relisted. Jenks24 (talk) 03:56, 29 July 2014 (UTC) Sardanaphalus (talk) 11:30, 21 July 2014 (UTC)

Survey

Feel free to state your position on the renaming proposal by beginning a new line in this section with *'''Support''' or *'''Oppose''', then sign your comment with ~~~~. Since polling is not a substitute for discussion, please explain your reasons, taking into account Wikipedia's policy on article titles.

Discussion

Any additional comments:
  • Comment@Sardanaphalus: Per consistency between infoboxes, this renaming proposal is probably a good idea, but I would encourage you to directly contact the bot operator @Andrew Su: Boghog (talk) 18:24, 21 July 2014 (UTC)
  • Comment -- As the operator of the bot that maintains these templates and the person who I think originally created the template, I would support a name change. However, "GNF" should be removed. The function here is in principle the same as {{Infobox protein}}, so perhaps "Infobox protein 2" or "Infobox protein full"? Cheers, Andrew Su (talk) 18:35, 21 July 2014 (UTC)
A couple of alternative suggestions:
I would oppose merging with {{infobox protein}}. The latter is much more compact and is useful to include on protein family pages, whereas {{GNF Protein box}} is appropriate for gene/protein specific pages. Boghog (talk) 22:21, 29 July 2014 (UTC)

The above discussion is preserved as an archive of a requested move. Please do not modify it. Subsequent comments should be made in a new section on this talk page or in a move review. No further edits should be made to this section.