Talk:Phylogenetic inference using transcriptomic data

	This article is within the scope of WikiProject Molecular Biology, a collaborative effort to improve the coverage of Molecular Biology on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.
	This article has not yet received a rating on the importance scale.
	This article is supported by the Genetics task force (assessed as Mid-importance).
	This article is supported by the Computational Biology task force (assessed as Low-importance).

Evolutionary biology Mid‑importance

	This article is part of WikiProject Evolutionary biology, an attempt at building a useful set of articles on evolutionary biology and its associated subfields such as population genetics, quantitative genetics, molecular evolution, phylogenetics, and evolutionary developmental biology. It is distinct from the WikiProject Tree of Life in that it attempts to cover patterns, process and theory rather than systematics and taxonomy. If you would like to participate, there are some suggestions on this page (see also Wikipedia:Contributing FAQ for more information) or visit WikiProject Evolutionary biology.
Mid	This article has been rated as Mid-importance on the project's importance scale.

Untitled

1. It would be better to explain that transcriptome data are one of the most important sources for phylogeny/phylogenomics, alongside other sources such as genomic data, gene-family-based data, captured nucleotide data, and expressed sequence tags. If transcriptomes are used for phylogeny, explain which factors should be taken into account, e.g. tissue type, developmental stage, environmental conditions, and depth of coverage.

2. The last section, "Inferring phylogenetic relationships", is the most important section of this article, but it needs more information. It should explain how transcriptomes are used in the major areas of phylogenomics (comparative transcriptomics is applied in gene expression/functional genomics, character evolution, key innovations, and RNA-Seq phylogenetics), with examples (many research articles and external sources are available).

3. Public databases: There are some more sources worth mentioning, e.g. the 1KP project for plant transcriptomes and the 1KITE project for 1,000 insect transcriptomes.

4. In the current article, "Orthology" links to the Homology page, but that page does not include the information you want to provide here about the distinction between orthology and paralogy.

5. Say more about methods of predicting orthology: start with BLAST, add the clustering or building step, and give more information on pHMMs, HaMStR, etc.

6. A large number of algorithms are used for orthology detection and assembly. The article mentions some programs under the section "Computationally inferring orthologs". However, it would be better to briefly introduce the algorithms first and then mention the programs that apply them.

7. Assembly involves a lot of terminology (e.g. single-end, paired-end, mate pair, k-mer, scaffold) that would be useful to mention with links to definitions. Please also provide links for the following terms that you have already mentioned: E-value, percent alignment.

8. Use some diagrams to explain the assembly methods and the bioinformatics workflow. Many bioinformatics steps and tools are involved in transcriptome analysis. The main steps include obtaining raw data, assembly, annotation, orthology detection, compilation of multiple taxa, alignment, and phylogenetic inference. A diagram showing the main steps and the software/tools needed for each would be beneficial for readers.

Overall, this is an important topic that needs to be covered in Wikipedia, the main sections included in the article are satisfactory, and the writing is great! I think the points above will also be useful to you.

Pamarasinghe 10 (talk) 20:09, 2 March 2017 (UTC)

Feedback from Emily

Emilysessa (talk) 16:39, 15 April 2017 (UTC)

Very nice, comprehensive page! You've covered a really impressive amount of information.

Some comments:

The Sequence Acquisition section needs to be reorganized a bit. First, sequence acquisition starts with something before assembly: actually prepping and sequencing your DNA or RNA. But you start this section with assembly, so I feel like you're jumping ahead a few steps. I think you should add a section, even if it's only a few sentences, that outlines the basic approaches used to gather transcriptomic data, and that's what actually comes under Sequence Acquisition. You could even just say that most of this page deals with reads obtained from RNAseq, and link to the page for RNAseq, and be done with it. But this is important: sequence acquisition does not start with assembly, so you need to tell us where the actual sequences are coming from. Also, Public Databases should be included in this new section; this is a way to acquire raw RNA data, besides RNAseq, that can then be used in assembly, so it should also come before the section on assembly. I would then give Assembly its own heading at the level of Sequence Acquisition, and below it.

Under assembly, mapping assembly is most commonly called reference-guided assembly; please include that name there.

To the sentence "However, proper identification of gene-level constructs may be complicated by recent duplications, paralogs or gene fusions", here you need to add alternative splicing again.

In the Approaches section, you have one combined link to "Phylogenetic analyses and sequence alignment"; these should be separated because each has its own page.

Check the approaches section for citations; there are a few sentences that need them.

The Multiple Sequence Alignment and Phylogenetic analysis sections... I'm actually not totally sure why you included these. There are already extensive pages on Wikipedia on both topics, and the information you include here is not specific just to transcriptome data. I would like to see you edit these, removing much of what you have (sorry!) so that it only focuses on details that are specific to transcriptomics. Otherwise, I would remove these sections from the page, since they are really not appropriate to include under this topic. Ditto with Minimizing Bias; not really sure why that's here, as none of it is transcriptomic-specific. One way you could improve these sections would be by reviewing methods that are specifically phylogenomic in nature, and meant to deal with large datasets that consist of large numbers of genes. These include things like ASTRAL and SVDquartets. You could say a few sentences about how these methods differ from typical phylogenetic inference, and then link to the pages for them, if they exist.

Finally, the page right now is showing up as an orphan, meaning nothing links to it. Please fix this; certainly the general assembly or transcriptome pages would be appropriate ones on which to find a place to link to your page. You've put in so much work, let's make sure other readers can find it!

Great job overall!

Thanks. I believe I've made all suggested edits. I ended up removing the Minimizing Bias and Phylogenetic analysis sections altogether, as I didn't have any transcriptome-specific points in either. I added a number of citations to the Approaches section and believe everything is accounted for. Please let me know if anything was missed or if you have more suggestions.

Lboat (talk) 18:29, 17 April 2017 (UTC)
