Getting strings to combine multiple times

Matthew Zehner on 28 Apr 2016
Commented: Walter Roberson on 28 Apr 2016
I have a school project in which I have to group a DNA sequence of 550437 letters into codons. At the moment I have it stored one letter per cell, in 550437 cells. I have to show how many times AAA, ATC, and CGG occur in that sequence without overlap, and I also have to show the locations of the first 10 occurrences of each. I tried reshaping from 550437x1 to 183479x3, but the letters are not grouped in order from left to right: column 1 holds the first 183479 letters, column 2 the next 183479, and column 3 the final 183479. I would like either to group every 3 cells into one cell, or to get numeric indices telling me where each selected sequence occurs. Here is what I have so far to count how many times each sequence occurs. What I can't figure out is how to find where the first 10 instances of each occur.
x=1;
i=1; % Loop index for AAA
h=1; % Loop index for ATC
t=1; % Loop index for CGG
AAAmatch=0; % Letters matched in the current window (3 means an exact AAA match)
ATCmatch=0; % Letters matched in the current window (3 means an exact ATC match)
CGGmatch=0; % Letters matched in the current window (3 means an exact CGG match)
AAAcount=0; % Counter for AAA matches
ATCcount=0; % Counter for ATC matches
CGGcount=0; % Counter for CGG matches
% Count AAA matches over the entire sequence (intended to be without overlap)
for i=1:length(DNA)-2
    if strcmp(DNA(i),'A')
        AAAmatch=AAAmatch+1;
    end
    if strcmp(DNA(i+1),'A')
        AAAmatch=AAAmatch+1;
    end
    if strcmp(DNA(i+2),'A')
        AAAmatch=AAAmatch+1;
    end
    if AAAmatch==3
        AAAcount=1+AAAcount;
    end
    AAAmatch=0;
end
% Count ATC matches over the entire sequence (intended to be without overlap)
for h=1:length(DNA)-2
    if strcmp(DNA(h),'A')
        ATCmatch=ATCmatch+1;
    end
    if strcmp(DNA(h+1),'T')
        ATCmatch=ATCmatch+1;
    end
    if strcmp(DNA(h+2),'C')
        ATCmatch=ATCmatch+1;
    end
    if ATCmatch==3
        ATCcount=1+ATCcount;
    end
    ATCmatch=0;
end
% Count CGG matches over the entire sequence (intended to be without overlap)
for t=1:length(DNA)-2
    if strcmp(DNA(t),'C')
        CGGmatch=CGGmatch+1;
    end
    if strcmp(DNA(t+1),'G')
        CGGmatch=CGGmatch+1;
    end
    if strcmp(DNA(t+2),'G')
        CGGmatch=CGGmatch+1;
    end
    if CGGmatch==3
        CGGcount=1+CGGcount;
    end
    CGGmatch=0;
end
Thoughts?
  1 Comment
Azzi Abdelmalek on 28 Apr 2016
You can make your question clearer and briefer by posting a small example with the expected result. You can also add some explanation.


Answers (1)

Walter Roberson on 28 Apr 2016
Consider using strfind(). You will need to add some logic to detect a potential overlap between the final character of one match and the first character of the next. For example, if the sequence contains 'AAAA', then strfind() for 'AAA' returns both 1 and 2; that is, strfind() does not care about overlaps. Still, strfind() gives you candidate positions that you can then winnow.
What would you want the result to be if the sequence contained 'AAATCGG'? Is that one AAA and one CGG, or is it one ATC?
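One possible way to winnow the candidates is sketched below. It is only an illustration under stated assumptions: the sequence is already a char vector (the short seq here is made-up test data, not the real sequence), and "without overlap" is taken to mean that an accepted match may not reuse letters of the previously accepted match of the same codon. It does not settle the 'AAATCGG' question, because each codon is screened independently.

```matlab
% Made-up test sequence; the real data would be the full DNA char vector.
seq = 'AAAATCAAACGGAAAAAA';
codon = 'AAA';

% All start positions of the codon, overlapping matches included.
candidates = strfind(seq, codon);

% Greedy left-to-right pass: accept a candidate only if it starts at or
% after the first letter not used by the previously accepted match.
keep = false(size(candidates));
nextFree = 1;
for k = 1:numel(candidates)
    if candidates(k) >= nextFree
        keep(k) = true;
        nextFree = candidates(k) + numel(codon);
    end
end

starts = candidates(keep);                 % non-overlapping start positions
count = numel(starts);                     % number of non-overlapping matches
firstTen = starts(1:min(10, count));       % locations of the first 10 (or fewer)
```

For this test sequence, strfind() reports candidates at 1, 2, 7, 13, 14, 15 and 16, and the pass keeps 1, 7, 13 and 16. As an alternative, regexp(seq, codon) scans left to right and consumes each match, so it should return non-overlapping start positions directly.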
  2 Comments
Matthew Zehner on 28 Apr 2016
Edited: Matthew Zehner on 28 Apr 2016
I've tried strfind. Since I'm working with cells that each hold a single letter, it doesn't work. I need to find AAA, ATC, and CGG individually. strfind only returns [1] if there is a match, or [] otherwise, and I only get that result if I search for a single letter rather than the 3 letters together. I don't get numeric positions as I would with a normal string such as DNA='ATCAAACGGATCAACGTACAGTCATAC'; that would work rather easily. But since I have an array with over half a million cells, strfind just tells me whether each cell contains the letter I'm looking for. It doesn't tell me the positions.
Walter Roberson on 28 Apr 2016
Use horzcat(DNA{:}); the result will be a string.
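A minimal sketch of that conversion, followed by one way to group the letters into codons, is given below. The small DNA cell array is a made-up stand-in for the real 550437x1 one, and the sketch assumes the reading frame starts at the first letter, which is only one interpretation of "without overlap". reshape() fills column by column, so the letters are reshaped into 3 rows and then transposed to give one codon per row in sequence order.

```matlab
% Made-up stand-in for the real cell array of single letters.
DNA = {'A';'T';'C';'A';'A';'A';'C';'G';'G';'A';'T';'C'};

% Join the single-letter cells into one char vector: 'ATCAAACGGATC'.
seq = horzcat(DNA{:});

% Keep only whole codons, fill a 3-by-N array column by column,
% then transpose so that each row is one codon in sequence order.
nCodons = floor(numel(seq) / 3);
codons = reshape(seq(1:3*nCodons), 3, []).';

% Compare every row with the codon of interest.
isATC = ismember(codons, 'ATC', 'rows');
ATCcount = nnz(isATC);                      % number of in-frame ATC codons

% Codon numbers of the matches and the matching letter positions in seq.
codonIdx = find(isATC);
letterPos = 3*(codonIdx - 1) + 1;
firstTen = letterPos(1:min(10, numel(letterPos)));
```

For this example the rows of codons are ATC, AAA, CGG and ATC, so ATC is found at codons 1 and 4, that is, at letters 1 and 10. The same comparison can be repeated with 'AAA' and 'CGG'.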
