Statistical properties of DNA sequences
We review evidence supporting the idea that the DNA sequence in genes containing non-coding regions is correlated, and that the correlation is remarkably long range: indeed, nucleotides thousands of base pairs distant are correlated. We do not find such a long-range correlation in the coding regions of the gene. We resolve the problem of the “non-stationarity” feature of the sequence of base pairs by applying a new algorithm called detrended fluctuation analysis (DFA). We address the claim of Voss that there is no difference in the statistical properties of coding and non-coding regions of DNA by systematically applying the DFA algorithm, as well as standard FFT analysis, to every DNA sequence (33 301 coding and 29 453 non-coding) in the entire GenBank database. Finally, we briefly describe some recent work showing that the non-coding sequences have certain statistical features in common with natural and artificial languages. Specifically, we adapt to DNA the Zipf approach to analyzing linguistic texts. These statistical properties of non-coding sequences support the possibility that non-coding regions of DNA may carry biological information.
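The abstract names detrended fluctuation analysis but does not spell out its steps. The following is a minimal sketch of the general DFA idea, not the authors' implementation. It assumes a purine/pyrimidine "DNA walk" mapping (C, T to +1; A, G to -1), linear detrending in non-overlapping boxes, and a least-squares slope of log F(n) against log n as the scaling exponent. The function names and box sizes are illustrative choices.

```python
import math
import random


def dna_walk_steps(seq):
    # Assumed mapping: pyrimidine (C, T) -> +1, purine (A, G) -> -1.
    # Characters other than A, C, G, T are skipped.
    return [1 if base in "CT" else -1 for base in seq.upper() if base in "ACGT"]


def dfa_fluctuation(steps, n):
    """RMS fluctuation F(n) of the integrated walk around per-box linear trends."""
    mean = sum(steps) / len(steps)
    profile, total = [], 0.0
    for s in steps:  # integrate the mean-removed steps into a walk
        total += s - mean
        profile.append(total)

    n_boxes = len(profile) // n  # non-overlapping boxes; any remainder is dropped
    xs = range(n)
    x_mean = (n - 1) / 2
    sxx = sum((x - x_mean) ** 2 for x in xs)

    sq_sum = 0.0
    for b in range(n_boxes):
        box = profile[b * n:(b + 1) * n]
        y_mean = sum(box) / n
        # Least-squares line through the box: the local trend.
        slope = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, box)) / sxx
        sq_sum += sum(
            (y - (y_mean + slope * (x - x_mean))) ** 2 for x, y in zip(xs, box)
        )
    return math.sqrt(sq_sum / (n_boxes * n))


def dfa_exponent(steps, box_sizes):
    """Slope of log F(n) versus log n over the given box sizes (each >= 4)."""
    xs = [math.log(n) for n in box_sizes]
    ys = [math.log(dfa_fluctuation(steps, n)) for n in box_sizes]
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    num = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    den = sum((x - x_mean) ** 2 for x in xs)
    return num / den


if __name__ == "__main__":
    # Synthetic, uncorrelated sequence for illustration only (not GenBank data).
    rng = random.Random(0)
    seq = "".join(rng.choice("ACGT") for _ in range(20000))
    print(dfa_exponent(dna_walk_steps(seq), [8, 16, 32, 64, 128]))
```

In the DFA literature, an exponent near 0.5 is read as the absence of long-range correlation, while values clearly above 0.5 indicate persistent long-range correlation. The synthetic sequence here is uncorrelated by construction, so it illustrates only the first case.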
Year of publication: | 1995 |
---|---|
Authors: | Peng, C.-K. ; Buldyrev, S.V. ; Goldberger, A.L. ; Havlin, S. ; Mantegna, R.N. ; Simons, M. ; Stanley, H.E. |
Published in: | Physica A: Statistical Mechanics and its Applications. - Elsevier, ISSN 0378-4371. - Vol. 221.1995, 1, p. 180-192 |
Publisher: | Elsevier |
Similar items by person
- Statistical mechanics in biology: how ubiquitous are long-range correlations?
  Stanley, H.E., (1994)
- Stanley, H.E., (1992)
- Anomalous fluctuations in the dynamics of complex systems: from DNA and physiology to econophysics
  Stanley, H.E., (1996)