Deoxyribonucleic acid (/diˈɒksiˌraɪboʊnjʊˌkliːɪk, -ˌkleɪɪk/; DNA) is a molecule that carries the genetic instructions used in the growth, development, functioning and reproduction of all known living organisms and many viruses. DNA and ribonucleic acid (RNA) are nucleic acids; alongside proteins, lipids and complex carbohydrates (polysaccharides), nucleic acids are one of the four major types of macromolecules that are essential for all known forms of life. Most DNA molecules consist of two biopolymer strands coiled around each other to form a double helix.
The two DNA strands are called polynucleotides since they are composed of simpler monomer units called nucleotides. Each nucleotide is composed of one of four nitrogen-containing nucleobases, namely cytosine (C), guanine (G), adenine (A) or thymine (T), together with a sugar called deoxyribose and a phosphate group. The nucleotides are joined to one another in a chain by covalent bonds between the sugar of one nucleotide and the phosphate of the next, resulting in an alternating sugar-phosphate backbone. The nitrogenous bases of the two separate polynucleotide strands are bound together by hydrogen bonds, according to base pairing rules (A with T and C with G), to make double-stranded DNA. The total number of DNA base pairs on Earth is estimated at 5.0 × 10^37, with a combined mass of about 50 billion tonnes. In comparison, the total mass of the biosphere has been estimated to be as much as 4 trillion tons of carbon (TtC).
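As a rough consistency check on the two estimates above, the mass implied for a single base pair can be worked out directly. The sketch below assumes the base-pair figure is read as 5.0 × 10^37 and that "tonnes" means metric tonnes; Avogadro's constant is a standard value and does not come from the text.

```python
# Rough sanity check: what mass per base pair do the two estimates imply?
AVOGADRO = 6.022e23        # particles per mole (standard constant, not from the text)
BASE_PAIRS = 5.0e37        # assumed reading of the base-pair estimate
TOTAL_MASS_TONNES = 50e9   # 50 billion tonnes
GRAMS_PER_TONNE = 1e6      # assumes metric tonnes


def grams_per_base_pair(total_tonnes, base_pairs):
    """Average mass of one base pair, in grams."""
    return total_tonnes * GRAMS_PER_TONNE / base_pairs


mass_g = grams_per_base_pair(TOTAL_MASS_TONNES, BASE_PAIRS)
molar_mass = mass_g * AVOGADRO  # grams per mole of base pairs

print(f"{mass_g:.2e} g per base pair")
print(f"{molar_mass:.0f} g/mol per base pair")
```

The result is on the order of 10^-21 g per base pair, or about 600 g/mol, which is in line with commonly quoted average masses for a DNA base pair, so the two figures are at least mutually consistent.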
DNA stores biological information. The DNA backbone is resistant to cleavage, and both strands of the double-stranded structure store the same biological information. This information is replicated as and when the two strands separate. A large part of DNA (more than 98% for humans) is non-coding, meaning that these sections do not serve as patterns for protein sequences.
The two strands of DNA run in opposite directions to each other and are thus antiparallel. Attached to each sugar is one of four types of nucleobases (informally, bases). It is the sequence of these four nucleobases along the backbone that encodes biological information. RNA strands are created using DNA strands as a template in a process called transcription. Under the genetic code, these RNA strands are translated to specify the sequence of amino acids within proteins in a process called translation.
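Because the strands are antiparallel and follow the pairing rules (A with T, C with G), the sequence of one strand determines the other. A minimal sketch, assuming sequences are written as plain strings in the conventional 5' to 3' direction; the function name `reverse_complement` is illustrative only:

```python
# Pairing rules from the text: A pairs with T, C pairs with G.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(strand: str) -> str:
    """Return the partner strand, also read 5'->3'.

    The partner runs in the opposite direction, so the input is
    reversed before each base is swapped for its complement.
    """
    return "".join(COMPLEMENT[base] for base in reversed(strand.upper()))


print(reverse_complement("ATGCCG"))  # CGGCAT
```

Applying the function twice returns the original sequence, which reflects the fact that both strands carry the same information.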
Within eukaryotic cells, DNA is organized into long structures called chromosomes. During cell division these chromosomes are duplicated in the process of DNA replication, providing each cell with its own complete set of chromosomes. Eukaryotic organisms (animals, plants, fungi and protists) store most of their DNA inside the cell nucleus and some of their DNA in organelles, such as mitochondria or chloroplasts. In contrast, prokaryotes (bacteria and archaea) store their DNA only in the cytoplasm. Within the eukaryotic chromosomes, chromatin proteins such as histones compact and organize DNA. These compact structures guide the interactions between DNA and other proteins, helping control which parts of the DNA are transcribed.
DNA was first isolated by Friedrich Miescher in 1869. Its molecular structure was first identified by James Watson and Francis Crick at the Cavendish Laboratory within the University of Cambridge in 1953; their model-building efforts were guided by X-ray diffraction data acquired by Raymond Gosling, who was a post-graduate student of Rosalind Franklin. DNA is used by researchers as a molecular tool to explore physical laws and theories, such as the ergodic theorem and the theory of elasticity. The unique material properties of DNA have made it an attractive molecule for material scientists and engineers interested in micro- and nano-fabrication. Among notable advances in this field are DNA origami and DNA-based hybrid materials.
The structure of DNA
DNA is a two-stranded molecule that appears twisted, giving it a unique shape referred to as the double helix.
Each of the two strands is a long sequence of nucleotides, individual units that are each made of:
- a phosphate molecule
- a sugar molecule called deoxyribose, containing five carbons
- a nitrogen-containing region
There are four types of nitrogen-containing regions, called bases:
- adenine (A)
- cytosine (C)
- guanine (G)
- thymine (T)
The order of these four bases forms the genetic code, which provides our instructions for life.
The bases of the two strands of DNA are stuck together to create a ladder-like shape. Within the ladder, A always sticks to T, and G always sticks to C to create the "rungs." The length of the ladder is formed by the sugar and phosphate groups.
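The ladder picture can be checked mechanically: every rung must be an A-T or G-C pair. The sketch below is illustrative only; it assumes the two strands are given as equal-length strings already lined up rung by rung, and the helper names `mismatched_rungs` and `draw_ladder` are hypothetical.

```python
# Valid rungs of the ladder: A with T, and G with C (in either order).
VALID_RUNGS = {("A", "T"), ("T", "A"), ("G", "C"), ("C", "G")}


def mismatched_rungs(left: str, right: str) -> list:
    """Return the positions where the aligned bases do not form a valid rung."""
    if len(left) != len(right):
        raise ValueError("strands must be the same length")
    rungs = zip(left.upper(), right.upper())
    return [i for i, rung in enumerate(rungs) if rung not in VALID_RUNGS]


def draw_ladder(left: str, right: str) -> str:
    """Draw one rung per line, for example 'A-T'."""
    return "\n".join(f"{a}-{b}" for a, b in zip(left.upper(), right.upper()))


print(draw_ladder("ATGC", "TACG"))
print(mismatched_rungs("ATGC", "TACG"))  # [] means every rung is valid
print(mismatched_rungs("ATGC", "TAGG"))  # [2]: G cannot pair with G
```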
Packaging DNA: Chromatin and chromosomes
Most DNA lives in the nuclei of cells and some is found in mitochondria, which are the powerhouses of the cells.
Because we have so much DNA (2 meters in each cell) and our nuclei are so small, DNA has to be packaged incredibly neatly.
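The 2-meter figure can be roughly reproduced with simple arithmetic. The numbers below are assumptions that do not appear in the text: a human genome of about 3.2 billion base pairs, two copies of the genome per cell, and a spacing of about 0.34 nanometers between neighboring base pairs along the helix.

```python
# Back-of-the-envelope estimate of the DNA length in one human cell.
# All three constants are commonly quoted approximations, not values from the text.
HAPLOID_BASE_PAIRS = 3.2e9     # base pairs in one copy of the genome
COPIES_PER_CELL = 2            # one copy inherited from each parent
RISE_PER_BASE_PAIR_NM = 0.34   # nanometers of helix length per base pair


def dna_length_m(base_pairs, copies, rise_nm):
    """Total stretched-out length in meters."""
    return base_pairs * copies * rise_nm * 1e-9  # 1 nm = 1e-9 m


length_m = dna_length_m(HAPLOID_BASE_PAIRS, COPIES_PER_CELL, RISE_PER_BASE_PAIR_NM)
print(f"{length_m:.1f} m")  # 2.2 m
```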
Strands of DNA are looped, coiled and wrapped around proteins called histones. In this coiled state, it is called chromatin.
Chromatin is further condensed, through a process called supercoiling, and it is then packaged into structures called chromosomes. When duplicated and condensed, these chromosomes form the familiar "X" shape.
Each chromosome contains one DNA molecule. Humans have 23 pairs of chromosomes or 46 chromosomes in total.
Chromosome 1 is the largest and contains roughly 2,000 protein-coding genes. The smallest is chromosome 21, with roughly 200 to 300 protein-coding genes.
What is a gene?
Each length of DNA that codes for a specific protein is called a gene. For instance, one gene codes for the protein insulin, the hormone that helps control levels of sugar in the blood. Humans have around 20,000–30,000 genes, although estimates vary.
Protein-coding sequences account for less than 2 percent of our DNA; the remainder is less well understood. Much of this remaining DNA is thought to be involved in regulating transcription and translation.
How does DNA create proteins?
For genes to create a protein, there are two main steps:
- Transcription: The DNA code is copied to create messenger RNA (mRNA). RNA is a copy of DNA, but it is normally single-stranded. Another difference is that RNA does not contain the base thymine (T), which is replaced by uracil (U).
- Translation: The mRNA is read by a ribosome and translated into a chain of amino acids, with transfer RNA (tRNA) delivering the amino acid that matches each part of the message.
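The transcription step can be sketched in a few lines. This is a simplification that assumes the input is the coding strand of the gene, written 5' to 3', so that the mRNA has the same sequence with U in place of T; in the cell, the enzyme actually reads the opposite (template) strand. The function name `transcribe` is illustrative.

```python
def transcribe(coding_strand: str) -> str:
    """Return the mRNA sequence for a DNA coding strand.

    The mRNA matches the coding strand except that every
    thymine (T) is replaced by uracil (U).
    """
    return coding_strand.upper().replace("T", "U")


print(transcribe("ATGGTGTAA"))  # AUGGUGUAA
```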
mRNA is read in three-letter sections called codons. Each codon codes for a specific amino acid or building block of a protein. For instance, the codon GUG codes for the amino acid valine.
The genetic code specifies 20 standard amino acids.
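Reading codons can be illustrated with a lookup table. The sketch below uses only a small excerpt of the standard genetic code (the full table has 64 codons), reads from the first base rather than searching for a start codon, and stops at the first stop codon. The names `CODON_TABLE` and `translate` are illustrative.

```python
# Small excerpt of the standard genetic code; the full table has 64 entries.
CODON_TABLE = {
    "AUG": "Met",  # methionine, also the usual start codon
    "GUG": "Val",  # valine, the example from the text
    "UUU": "Phe",  # phenylalanine
    "GGC": "Gly",  # glycine
    "AAA": "Lys",  # lysine
    "UAA": "Stop",
    "UAG": "Stop",
    "UGA": "Stop",
}


def translate(mrna: str) -> list:
    """Translate an mRNA string codon by codon until a stop codon."""
    mrna = mrna.upper()
    amino_acids = []
    usable_length = len(mrna) - len(mrna) % 3  # ignore an incomplete final codon
    for i in range(0, usable_length, 3):
        codon = mrna[i:i + 3]
        if codon not in CODON_TABLE:
            raise KeyError(f"codon {codon} is not in this partial table")
        amino_acid = CODON_TABLE[codon]
        if amino_acid == "Stop":
            break
        amino_acids.append(amino_acid)
    return amino_acids


print(translate("AUGGUGUUUUAA"))  # ['Met', 'Val', 'Phe']
```

With three-letter codons over four bases there are 4 × 4 × 4 = 64 possible codons, which is why several codons can map to the same amino acid.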