How to perform taxonomic analysis of 16s rRNA NGS .fastq files?

Question

Mattana Pongsopon on 29 Mar 2019

Direct link to this question

https://de.mathworks.com/matlabcentral/answers/453232-how-to-perform-taxonomic-analysis-of-16s-rrna-ngs-fastq-files

Commented: Mattana Pongsopon on 2 Apr 2019

I have raw files from next-generation sequencing of 16S rRNA in .fastq format, and I want to analyse them to obtain the OTUs and the relative taxonomic abundance of all the microbial species present in the sample.

Thank you.

Answer 1

Tim DeFreitas on 29 Mar 2019

Direct link to this answer

https://de.mathworks.com/matlabcentral/answers/453232-how-to-perform-taxonomic-analysis-of-16s-rrna-ngs-fastq-files#answer_368130

A complete answer to this question is outside the scope of a single MATLAB Answers post. I suggest reading some published papers on the various approaches to reconstructing phylogeny from 16S rRNA. Here is one such paper, though there are many others: https://academic.oup.com/nar/article/36/18/e120/1070009

In general, you will need to perform the following series of steps:

1. Obtain reference sequences of the 16S gene (likely in FASTA format) for each of the microbial species you wish to test for. These can likely be obtained from public databases such as the NCBI: https://www.ncbi.nlm.nih.gov/gene/?term=16s%20rrna. For particular sequences of interest, you can retrieve them in MATLAB using getgenbank.
2. Assign each of your input reads to its closest species match. There are several methods for doing so. One way is to use blastlocal, with the FASTA reference sequences from step 1 as the database and your FASTQ reads as the queries. The relative abundance of each species can be inferred from the number of matches to each of your reference sequences.
3. To construct a taxonomy, you must then perform a multiple alignment of the 16S gene for each of your observed species (likely a subset of your references from step 1), and construct a phylogenetic tree using the distances between the sequences. In MATLAB, this can be done with multialign, seqpdist, and seqlinkage. The definition of an OTU is not set in stone, but in general it is a set of very similar sequences. From the phytree created with seqlinkage, you can construct OTUs by providing a similarity threshold using cluster(phytree).
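As a rough illustration of the last two steps, here is a minimal MATLAB sketch, assuming the Bioinformatics Toolbox is installed. The reference sequences, species names, and per-read assignments (hitIdx) are made-up toy data standing in for real 16S references and real BLAST results, and the 0.03 distance threshold (roughly 97% identity) is a commonly used convention rather than something prescribed above. The 'p-distance' method and 'average' linkage are likewise just one reasonable choice among several.

```matlab
% Toy stand-ins for real 16S reference sequences (hypothetical data).
refs = struct('Header',   {'speciesA', 'speciesB', 'speciesC', 'speciesD'}, ...
              'Sequence', {'ACGTACGTGGCTAGCTAGGA', ...
                           'ACGTACGTGGCTAGCTTGGA', ...
                           'ACGAACTTGGCAAGCTAGCA', ...
                           'TCGAACTTGGCAAGGTAGCA'});

% Step 2 (result assumed here): for each read, the index of its
% best-matching reference, e.g. as parsed from a BLAST report.
hitIdx = [1 1 2 3 3 3 4 1 2 3];

% Relative abundance = share of reads assigned to each reference.
counts       = accumarray(hitIdx(:), 1, [numel(refs) 1]);
relAbundance = counts / sum(counts);

% Step 3: multiple alignment, pairwise distances, and a tree.
aligned = multialign(refs);
dist    = seqpdist(aligned, 'Method', 'p-distance', 'Alphabet', 'NT');
tree    = seqlinkage(dist, 'average', aligned);

% Group leaves into OTUs: with the default criterion, leaves in one
% cluster are at most 0.03 apart (assumed threshold, about 97% identity).
otu = cluster(tree, 0.03);

% Report one row per reference sequence.
for k = 1:numel(refs)
    fprintf('%s\tOTU %d\t%.2f\n', refs(k).Header, otu(k), relAbundance(k));
end
```

With real data, refs would come from fastaread or getgenbank and hitIdx from your read-assignment step; it is worth experimenting with the threshold and distance method, since both change how many OTUs you get.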

Feel free to ask more specific questions about any of these steps in a follow-up question. If you need broader help with constructing a pipeline to do this analysis, we do offer consulting.

Hope this helps,

-Tim

Mattana Pongsopon on 2 Apr 2019

Hi Tim,

Thank you so much for your clear guidelines. I will work through them and see if I need further help.

Best,

Mattana
