Cody

Problem 79. DNA N-Gram Distribution

Created by Cody Team in Community

Given a string s and a number n, find the most frequently occurring n-gram in the string. An n-gram may begin at any position in the string, so successive n-grams overlap. This comes up in DNA analysis, where the 3-base reading frame for a codon can begin at any point in the sequence.

So for

 s = 'AACTGAACG' 

and

 n = 3 

we get the following n-grams (trigrams):

 AAC, ACT, CTG, TGA, GAA, AAC, ACG

Since AAC appears twice and every other trigram appears only once, the answer, hifreq, is AAC. There will always be exactly one highest-frequency n-gram.
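The worked example can be sketched in code. What follows is a minimal illustration in Python, not a Cody submission and not the author's reference solution. The function name most_frequent_ngram is hypothetical, and the sketch assumes len(s) >= n, so that at least one n-gram exists.

```python
from collections import Counter

def most_frequent_ngram(s, n):
    """Return the n-gram occurring most often in s, counting overlapping windows.

    Hypothetical helper for illustration; assumes len(s) >= n.
    """
    # Slide a window of width n across s: len(s) - n + 1 start positions.
    ngrams = [s[i:i + n] for i in range(len(s) - n + 1)]
    counts = Counter(ngrams)
    # most_common(1) returns a one-element list holding the (ngram, count)
    # pair with the highest count.
    hifreq, _ = counts.most_common(1)[0]
    return hifreq

hifreq = most_frequent_ngram('AACTGAACG', 3)
print(hifreq)  # AAC
```

Because the problem guarantees a single highest-frequency n-gram, the sketch does not need a tie-breaking rule.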

This problem was originally inspired by a MATLAB Newsgroup discussion.

Solution Stats

50.74% Correct | 49.26% Incorrect
Last solution submitted on Mar 19, 2019
