Primary Protein Structure and Sequencing: Study Notes for Biochemistry

Study Guide - Smart Notes

Tailored notes based on your materials, expanded with key definitions, examples, and context.

Primary Protein Structure

Introduction to Protein Structure

The structure and function of proteins are dictated by their sequence of amino acids, known as the primary structure. Understanding this sequence is fundamental for biochemists, as it underpins protein folding, function, and interactions. Misfolding or mutations in the primary structure can lead to diseases and altered biological activity.

Primary structure: Linear sequence of amino acids from N-terminus to C-terminus.
Secondary structure: Local spatial arrangement (e.g., α-helix, β-sheet).
Tertiary structure: Three-dimensional folding of a single polypeptide.
Quaternary structure: Association of multiple polypeptide chains.

Function is dictated by protein structure!

Examples of Protein Structure Levels

Each level of protein structure contributes to the overall function and stability of the protein. For example, insulin consists of two chains (A and B) connected by disulfide bonds, illustrating both primary and quaternary structure.

Insulin A and B chain sequence with disulfide bonds

Protein Sequencing

Importance of Protein Sequencing

Protein sequencing reveals the exact order of amino acids, enabling the study of gene-to-protein relationships, drug design, enzyme mechanisms, protein engineering, evolutionary biology, and analytical techniques. Sequencing is foundational for understanding protein function and dysfunction.

Gene-to-protein relationship: Links genomics to proteomics.
Drug design: Targets specific residues or motifs.
Enzyme mechanisms: Identifies catalytic residues.
Protein engineering: Enables rational design of new proteins.
Evolutionary insight: Sequence comparison reveals homology.

Steps in Protein Sequencing

Sequencing a protein involves several key steps:

Purification: Isolate the protein.
Determine number of polypeptides: Analyze N- and C-termini.
Cleavage of disulfide bonds: Separate chains and prevent refolding.
Amino acid composition: Hydrolyze and analyze liberated amino acids.
Fragmentation: Use endopeptidases or chemical reagents to break protein into smaller peptides.
Sequencing fragments: Sequence each fragment and align overlapping regions.

End Group Analysis

Determining the number of peptides in a protein is achieved by analyzing the N- and C-termini. Dansyl chloride reacts with amines at the N-terminus and lysine side chains, allowing identification of terminal residues.

Graph showing major quantities of amino acids released over time for different starting peptides

Amino Acid Composition Analysis

The amino acid composition is determined by complete hydrolysis of the protein, followed by quantitative analysis. Acid hydrolysis destroys some amino acids, while base hydrolysis affects others. An amino acid analyzer is used to quantitate residues after derivatization.

Amino acid analyzer chromatogram showing retention times for different amino acids

Common amino acids: Leu, Ala, Gly, Ser, Val, Glu, Ile.
Rare amino acids: His, Met, Cys, Trp.
Structural proteins: Often have repeating sequences (e.g., collagen).

Fragmentation of Proteins

Proteins are fragmented using specific endopeptidases (e.g., trypsin, chymotrypsin) or chemical reagents (e.g., cyanogen bromide). Trypsin cleaves after positively charged residues (R, K), while chymotrypsin targets bulky hydrophobic residues (F, W, Y). Cyanogen bromide cleaves after methionine (M).

Diagram showing CNBr and trypsin cleavage sites in a peptide sequence

Assembly of Protein Sequence

To assemble the complete sequence, fragments generated by different cleavage methods are aligned based on overlapping regions. Logical deduction and double-checking are essential to ensure accuracy.

Protein Mass Spectrometry

Mass Spectrometry Techniques

Mass spectrometry (MS) is a powerful tool for sequencing proteins. Proteins are digested into fragments (often by trypsin), which are then analyzed by electrospray ionization or MALDI-TOF. The mass/charge (m/z) ratio of fragments is measured, and tandem MS can further fragment ions for sequence determination.

Electrospray: Produces multiply charged ions for MS analysis.
MALDI-TOF: Uses laser ionization and time-of-flight measurement.
Tandem MS: Provides sequence tags for database identification.

Peptide Sequence Tag Notation

Peptide fragments are labeled based on where the charge is retained: a, b, c (N-terminus) and x, y, z (C-terminus). Subscripts indicate the number of residues, and superscripts denote neutral losses (e.g., * for ammonia, ° for water).

Peptide backbone fragmentation notation for mass spectrometry

Protein Databases

Protein Database Utility

Protein sequences are stored in databases (e.g., UniProt) for comparison, identification, and study. Each protein entry includes an accession number, organism, known mutations, and modified amino acids. Database searching is essential for modern proteomics and bioinformatics.

Accession number: Unique identifier for each protein.
Sequence comparison: Reveals homology and functional domains.
Mutation tracking: Links sequence changes to phenotypic effects.