How to correct nt2aa to skip codons with gaps?
Hello!
I am using MATLAB to analyze a large number of gene sequences in a .fasta file. Part of my analysis then requires the amino acid sequences coded by the genes. I am using the nt2aa function in MATLAB. However, at least one of the sequences has a gap in at least one of its codons (A-A). As such, I am receiving the following error:
"Error using nt2aa (line 116) The sequence includes a codon A-A containing a gap. Gaps are supported only when a complete codon is made up of gaps (---)."
Any suggestions as to how I may be able to get around this? I am very hesitant to start messing with MATLAB's nt2aa function.
Thank you in advance for all of your time and attention!
Best,
Kendall
Answers (1)
I don't know nt2aa, but I just had a quick look. Do you want to:
- Modify nt2aa so that it eliminates codons with gaps? I am not sure what the license says about that, but I guess it could be done.
- Find a specialist who could tell you how to do it correctly with the Bioinformatics Toolbox? In that case, you might want to check what folks from the newsgroup have to say. It is certainly possible, maybe even with nt2aa itself, as it seems to have features for managing ambiguous sequences.
- Build a solution yourself to pre-process or post-process your codons/AA chains?
If you are game for the last option, we can discuss a solution along the lines of this post.
For example, if you have your codons in a cell array like
NT = {'AAA','AAT','AAG','A-T','AGC','--G'} ;
you can easily find cells that contain a codon with one or more '-':
>> hasDash = cellfun(@(x)any(x=='-'), NT)
hasDash = 0 0 0 1 0 1
and remove these cells:
>> NTclean = NT ; % In case you want to keep
>> NTclean(hasDash) = [] % the original cell array.
NTclean = 'AAA' 'AAT' 'AAG' 'AGC'
Then you can feed nt2aa with the 'cleaned' version of NT:
>> AAclean = nt2aa(NTclean)
AAclean = 'K' 'N' 'K' 'S'
If you wanted to insert empty cells in AAclean afterwards at locations where there were codons with gaps (to have a record), you could do as follows:
>> buffer = 1:numel(NT) ;
>> validId = buffer(~hasDash) ;
>> AA = cell(1, numel(NT)) ;
>> AA(validId) = AAclean(:)
AA = 'K' 'N' 'K' [] 'S' []
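If the sequences come as plain character vectors (as they typically do when read from a .fasta file) rather than as a cell array of codons, the same idea can be sketched directly on the string. The snippet below is only a sketch under a few assumptions: the variable names (seq, codons, partial, seqMasked) and the example sequence are made up for illustration, the reading frame is assumed to start at the first character, and any trailing 1 or 2 leftover nucleotides are dropped. It replaces every codon that contains some, but not all, gaps with '---', which, according to the error message quoted in the question, is the one gap form nt2aa accepts.

```matlab
% Hypothetical example sequence: AAA AAT AAG A-T AGC --G
seq = 'AAAAATAAGA-TAGC--G';

% Keep only complete codons (assumes reading frame 1).
nCodons = floor(numel(seq) / 3);
codons  = reshape(seq(1:3*nCodons), 3, []);   % one codon per column

% Count gaps per codon and flag codons with 1 or 2 gaps.
nGaps   = sum(codons == '-', 1);
partial = nGaps > 0 & nGaps < 3;

% Turn partially gapped codons into fully gapped ones.
codons(:, partial) = '-';
seqMasked = codons(:).';                      % back to a row vector

% Translate only if the Bioinformatics Toolbox function is available.
if exist('nt2aa', 'file')
    disp(nt2aa(seqMasked))
end
```

Here partial plays the same role as hasDash above, so it can also serve as a record of which codon positions were masked. To drop those codons instead of masking them, codons(:, partial) = [] could be used in place of the assignment to '-'.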
Cheers,
Cedric