How do I make a cell with the following contents?

Mohannad Abboushi
Mohannad Abboushi on 15 Jan 2017
Commented: Guillaume on 16 Jan 2017
I am writing a program that takes a string s representing a single strand of DNA and returns the amino acid sequence of the longest gene it finds. A gene is defined as a reading frame that starts with an AUG codon and ends with one of the UAA, UAG, or UGA codons.
I tried making a cell array of the different "frames", but since they are not the same length I can't put them into an array. How do I work around this? Here's my code:
function [ptn]=Seq_transcribe2(x)
y=seq_transcribe1(x);
frames={};
frames={x(1:end) x(2:end) x(3:end) y(1:end) y(2:end)
y(3:end)};
starts=[];
stops=[];
allorfs={};
for i=1:3:numel(frames)-2
codon= frames([i i+1 i+2])
if codon=='AUG'
starts(end+1)=codon;
if strcmp(codon,'UAA') || strcmp(codon,'UAG') || strcmp(codon,'UGA')
stops(end+1)=codon;
end
stops= find(stops>starts,1)
lengthofthisstart=stops-starts
allorfs{end+1}=frame(starts:stops-1)
  2 Comments
Niels
Niels on 15 Jan 2017
It would be helpful if you added the error message.
Mohannad Abboushi
Mohannad Abboushi on 15 Jan 2017
Error using vertcat
Dimensions of matrices being concatenated are not consistent.

Accepted Answer

Guillaume
Guillaume on 15 Jan 2017
If I understood correctly, a simple way to find all genes would be:
[genesequences, starts, stops] = regexp(x, 'AUG.*?(UAA|UAG|UGA)', 'match', 'start', 'end');
And the longest sequence is of course:
[~, longestidx] = max(stops - starts);
longestsequence = genesequences{longestidx}
  2 Comments
Arthur Goldsipe
Arthur Goldsipe on 16 Jan 2017
I think you need a slight change to account for the fact that all codons are 3 characters long:
[genesequences, starts, stops] = regexp(x, 'AUG(...)*?(UAA|UAG|UGA)', 'match', 'start', 'end');
Guillaume
Guillaume on 16 Jan 2017
Oh yes. As I know nothing about genes and codons, I didn't know that the number of characters between the start and end codons must be a multiple of three, but I should have inferred that from the original code.
Thanks.
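
Putting the answer and the codon-length correction together, a minimal sketch might look like the following. The strand x here is a made-up example, not data from the question, and the isempty guard is an addition for the case where no gene is found. Note that regexp reports non-overlapping matches, so a gene that starts inside an earlier match would not be listed.

```matlab
% Hypothetical example strand (not from the original post).
x = 'GGAUGAAAUAGCCAUGGCUGCAUGACC';

% AUG, then as few whole codons as possible, then a stop codon.
[genes, starts, stops] = regexp(x, 'AUG(...)*?(UAA|UAG|UGA)', ...
                                'match', 'start', 'end');

if isempty(genes)
    longest = '';                        % no gene found in this strand
else
    [~, longestidx] = max(stops - starts);
    longest = genes{longestidx};         % char vector of the longest match
end
```

With this example strand, the pattern finds two candidate genes and picks the second one, which is three characters longer than the first.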

More Answers (1)

Niels
Niels on 15 Jan 2017
If I understood you correctly, your problem is in one of the following lines:
frames={x(1:end) x(2:end) x(3:end) y(1:end) y(2:end)
allorfs{end+1}=frame(starts:stops-1)
If so, I can't replicate your problem. In cell arrays, the length of the elements is irrelevant:
>> a=1:3;
>> b=1:4;
>> c=1:5;
>> cell={a b c}
cell =
1×3 cell array
[1×3 double] [1×4 double] [1×5 double]
%=================================
>> a={}
a =
0×0 empty cell array
>> a{end+1}=1
a =
cell
[1]
>> a{end+1}=2
a =
1×2 cell array
[1] [2]
>> a{end+1}=[2 1]
a =
1×3 cell array
[1] [2] [1×2 double]
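
The same holds for character vectors of different lengths, which is the case in the question. The short sketch below uses a made-up strand s (an assumption for illustration) and also shows the difference between the two kinds of indexing: curly braces return the contents of a cell, while parentheses, as in frames([i i+1 i+2]) from the question's code, return a smaller cell array rather than characters.

```matlab
% Hypothetical short strand, for illustration only.
s = 'AUGGCCUAA';

% Three reading frames of different lengths in one cell array.
frames = {s(1:end), s(2:end), s(3:end)};

lens   = cellfun(@numel, frames);   % length of each frame
second = frames{2};                 % curly braces: the char vector itself
sub    = frames(2);                 % parentheses: a 1x1 cell holding it
codon  = second(1:3);               % first codon of the second frame
```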
  2 Comments
Mohannad Abboushi
Mohannad Abboushi on 15 Jan 2017
I feel like there's got to be an easier way to do this
Guillaume
Guillaume on 15 Jan 2017
If the following line
frames={x(1:end) x(2:end) x(3:end) y(1:end) y(2:end)
y(3:end)};
is indeed written on two lines, then yes, MATLAB is going to issue a concatenation error, since the line break inside the braces is interpreted as a vertical concatenation: the first row has five elements and the second has only one.
frames={x(1:end) x(2:end) x(3:end) y(1:end) y(2:end) y(3:end)};
or
frames={x(1:end) x(2:end) x(3:end) y(1:end) y(2:end) ...
y(3:end)};
would fix the error.
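
To make the row-break behaviour concrete, here is a small sketch. The strings x and y are placeholders (the output of seq_transcribe1 is not shown in the question, so y is an assumption). With the continuation marker the braces produce a single row; without it, a line break starts a new row, which only works when every row has the same number of elements.

```matlab
% Placeholder strands; y stands in for the output of seq_transcribe1.
x = 'AUGGCCUAA';
y = 'UUAGGCCAU';

% "..." continues the row, so this is a 1x6 cell array.
frames = {x(1:end) x(2:end) x(3:end) ...
          y(1:end) y(2:end) y(3:end)};

% Without "...", the line break acts like a semicolon: a 2x3 cell array.
% It is accepted here only because both rows have three elements.
twoRows = {x(1:end) x(2:end) x(3:end)
           y(1:end) y(2:end) y(3:end)};

sizeFrames  = size(frames);
sizeTwoRows = size(twoRows);
```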
