Multiscale modeling of protein self-assembly

Multiscale theoretical approaches have become indispensable in understanding protein self-assembly, a complex process spanning multiple length and time scales. These computational methods integrate quantum mechanics (QM), molecular dynamics (MD), and continuum models to capture the hierarchical nature of protein interactions, from atomic-level electronic structure to mesoscale collective behavior. This article explores how these approaches elucidate three key phenomena: amyloid formation, virus capsid assembly, and chaperone-mediated folding.

At the finest scale, quantum mechanical calculations provide insight into the electronic interactions that govern protein folding and aggregation. Density functional theory (DFT), usually with a dispersion correction, and ab initio methods quantify the energetics of hydrogen bonding, van der Waals contacts, and electrostatic interactions that drive early-stage misfolding events. For amyloidogenic peptides like Aβ42, QM calculations on peptide fragments estimate β-sheet propensity by analyzing dihedral angle preferences and backbone solvation effects. These calculations indicate that aromatic residues such as phenylalanine form π-stacking interactions that help nucleate amyloid fibrils. However, QM alone cannot treat large systems or long timescales, which necessitates coarser-grained methods.
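As a minimal sketch of the geometric analysis behind such dihedral preferences, the following Python functions compute a torsion angle from four atomic positions and test whether a (φ, ψ) pair falls in a β-sheet-like region of the Ramachandran map. The function names and the rectangular bounds of the β-region are illustrative assumptions, not values taken from any particular QM study.

```python
import numpy as np

def dihedral_deg(p0, p1, p2, p3):
    """Signed torsion angle (degrees) defined by four points p0-p1-p2-p3."""
    p0, p1, p2, p3 = (np.asarray(p, dtype=float) for p in (p0, p1, p2, p3))
    b0 = p0 - p1
    b1 = p2 - p1
    b2 = p3 - p2
    b1 = b1 / np.linalg.norm(b1)          # unit vector along the central bond
    # Components of the outer bonds perpendicular to the central bond.
    v = b0 - np.dot(b0, b1) * b1
    w = b2 - np.dot(b2, b1) * b1
    x = np.dot(v, w)
    y = np.dot(np.cross(b1, v), w)
    return float(np.degrees(np.arctan2(y, x)))

def is_beta_region(phi_deg, psi_deg):
    """Crude rectangular test for the beta region (bounds are an assumption)."""
    phi_ok = -180.0 <= phi_deg <= -45.0
    psi_ok = psi_deg >= 90.0 or psi_deg <= -150.0   # psi wraps around at 180
    return phi_ok and psi_ok
```

For a backbone, φ would be evaluated on the atoms C(i-1), N(i), Cα(i), C(i) and ψ on N(i), Cα(i), C(i), N(i+1). A real propensity estimate would weight each conformation by its computed energy rather than apply a hard cutoff.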

Molecular dynamics bridges the gap between atomic detail and biologically relevant scales. All-atom MD simulations of amyloid precursors, such as tau or α-synuclein, track conformational changes over microseconds and identify intermediate states prone to aggregation. Force fields such as CHARMM36 and the AMBER family describe protein-water interactions, which are crucial for simulating aggregation kinetics. Coarse-grained MD, for example with the MARTINI model, pushes accessible timescales toward milliseconds and shows how amyloid fibrils grow via monomer addition. Together, these simulations predict that fibril elongation rates depend on secondary structure stability, with β-sheet content increasing from 20% in monomers to over 60% in mature fibrils.
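β-sheet percentages of this kind are typically obtained by assigning a secondary-structure label to every residue in each simulation frame and counting the strand labels. A minimal sketch, assuming the per-residue labels are already available as a string in the DSSP one-letter alphabet (where E marks an extended strand and B an isolated β-bridge); the function name is hypothetical:

```python
BETA_LABELS = {"E", "B"}  # DSSP codes for extended strand and isolated beta-bridge

def beta_fraction(ss_labels):
    """Fraction of residues whose DSSP label is a beta-type code."""
    if not ss_labels:
        raise ValueError("need at least one residue label")
    n_beta = sum(1 for label in ss_labels if label in BETA_LABELS)
    return n_beta / len(ss_labels)
```

Averaging this fraction over the frames of a trajectory gives a time-resolved measure of how much β-structure a monomer or fibril model contains.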

Continuum models further extend the scope to micrometer scales and beyond. Reaction-diffusion equations model the spatial distribution of amyloid plaques in neural tissue, incorporating parameters like diffusion coefficients (1–10 μm²/s for Aβ peptides) and aggregation rate constants (10⁶–10⁷ M⁻¹s⁻¹). Phase-field theories capture the liquid-to-solid transition of protein condensates, predicting fibril morphology as a function of concentration and pH. These models align with clinical observations of amyloid deposition patterns in neurodegenerative diseases.
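A reaction-diffusion description of this kind can be sketched in one dimension. The example below is a minimal illustration under several stated assumptions that are not part of the models described above: monomers diffuse along a line with no-flux boundaries, they are consumed by a fixed, uniform concentration of fibril ends (so the second-order rate constant reduces to a pseudo-first-order rate), and time stepping uses a simple explicit Euler scheme. The function and parameter names are hypothetical.

```python
import numpy as np

def simulate_monomer_field(m0, D, k_plus, ends_conc, dx, dt, n_steps):
    """Explicit finite-difference solution of dm/dt = D d2m/dx2 - k_plus*ends_conc*m.

    m0         initial monomer concentration profile (M)
    D          diffusion coefficient (um^2/s)
    k_plus     monomer addition rate constant (1/(M s))
    ends_conc  concentration of fibril ends, held fixed (M) -- an assumption
    dx, dt     grid spacing (um) and time step (s)
    """
    if D * dt / dx**2 > 0.5:
        raise ValueError("explicit scheme unstable: need D*dt/dx**2 <= 0.5")
    k_eff = k_plus * ends_conc                     # pseudo-first-order rate (1/s)
    m = np.asarray(m0, dtype=float).copy()
    for _ in range(n_steps):
        padded = np.pad(m, 1, mode="edge")         # mirror edges: no-flux boundaries
        laplacian = (padded[:-2] - 2.0 * m + padded[2:]) / dx**2
        m = m + dt * (D * laplacian - k_eff * m)
    return m
```

With D = 5 μm²/s and a 1 μm grid, the stability condition requires a time step of at most 0.1 s. Taking k_plus = 10⁶ M⁻¹s⁻¹ and an assumed fibril-end concentration of 1 nM gives an effective consumption rate of 10⁻³ s⁻¹.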

Virus capsid assembly presents a distinct challenge because of its symmetry and cooperative interactions. All-atom MD simulations of capsid proteins, such as those of hepatitis B virus, reveal that subunit flexibility enables conformational switching during assembly. Coarse-grained models, like the patchy particle approach, demonstrate how weak protein-protein interactions (1–5 kBT) lead to nearly error-free assembly: bonds this weak are reversible, so misbound subunits can detach and rebind correctly rather than becoming kinetically trapped. Continuum elasticity theories, which treat the capsid as a thin shell, predict that capsid stability depends on bending rigidity (10–50 kBT) and spontaneous curvature (0.05–0.1 nm⁻¹). Multiscale simulations show that misassembly occurs when subunit concentration exceeds 1 mM or when interaction strengths deviate by more than 15% from optimal values.
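Energies quoted in units of kBT can be hard to compare with calorimetric data, so a short conversion helps. The sketch below converts an energy in kBT to kJ/mol and evaluates the corresponding Boltzmann factor; the function names and the default temperature of 300 K are assumptions made for illustration.

```python
import math

GAS_CONSTANT = 8.314462618  # J/(mol K), Boltzmann constant times Avogadro's number

def kbt_to_kj_per_mol(energy_kbt, temperature_k=300.0):
    """Convert an energy expressed in units of kBT into kJ/mol."""
    return energy_kbt * GAS_CONSTANT * temperature_k / 1000.0

def boltzmann_factor(energy_kbt):
    """Relative statistical weight exp(-E/kBT) of a state lying E above another."""
    return math.exp(-energy_kbt)
```

At 300 K, 1 kBT is about 2.5 kJ/mol, so the 1–5 kBT contact energies correspond to roughly 2.5–12.5 kJ/mol. A single contact of a few kBT has a Boltzmann factor that is small but far from negligible, which is consistent with individual bonds breaking and re-forming readily during assembly.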

Chaperone-mediated folding introduces external regulation into the self-assembly process. All-atom MD simulations of GroEL-GroES complexes reveal how ATP hydrolysis (releasing roughly 20 kBT per ATP molecule) drives conformational changes that unfold misfolded proteins. Coarse-grained models quantify the iterative annealing mechanism, showing that substrates typically require 3–7 cycles to reach the native state. Continuum theories describe chaperone action as a kinetic proofreading process, in which folding rates increase from 0.01 s⁻¹ without chaperones to 1 s⁻¹ with them, a hundredfold acceleration. These models predict that chaperone efficiency peaks at physiological temperature (310 K) and declines sharply above 320 K.
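The cycle counts in the iterative annealing picture can be related to a per-cycle success probability. The sketch below assumes, purely for illustration, that every chaperone cycle is an independent trial with the same probability of producing the native state; real substrates need not behave this way, and the function names are hypothetical.

```python
def expected_cycles(p_fold):
    """Mean number of cycles until folding when each cycle succeeds with p_fold."""
    if not 0.0 < p_fold <= 1.0:
        raise ValueError("p_fold must lie in (0, 1]")
    return 1.0 / p_fold          # mean of a geometric distribution

def fraction_folded(p_fold, n_cycles):
    """Fraction of substrate molecules folded after n_cycles independent cycles."""
    if not 0.0 <= p_fold <= 1.0 or n_cycles < 0:
        raise ValueError("need 0 <= p_fold <= 1 and n_cycles >= 0")
    return 1.0 - (1.0 - p_fold) ** n_cycles
```

Under this assumption, an average of 3–7 cycles corresponds to a per-cycle folding probability between about 0.33 and 0.14.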

Integrating these methods raises challenges in passing parameters between levels and in matching scales at their interfaces. QM/MM hybrid schemes couple electronic structure calculations with classical MD but require careful treatment of the boundary region, particularly where covalent bonds cross it. Markov state models discretize MD trajectories into conformational states and estimate the transition probabilities between them, which yields a master equation that can predict assembly pathways over seconds. Machine learning accelerates these workflows by identifying reaction coordinates from high-dimensional simulation data, reducing computational cost by up to 90%.
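The Markov state model step can be sketched with NumPy alone. The example assumes the trajectory has already been discretized into integer state labels, and it uses plain row normalization of the transition counts, which is the simplest estimator and does not enforce detailed balance as dedicated MSM packages usually do. All function names are hypothetical.

```python
import numpy as np

def count_matrix(dtraj, n_states, lag=1):
    """Count transitions i -> j separated by `lag` frames in a discrete trajectory."""
    counts = np.zeros((n_states, n_states))
    for i, j in zip(dtraj[:-lag], dtraj[lag:]):
        counts[i, j] += 1
    return counts

def transition_matrix(counts):
    """Row-normalize counts into transition probabilities (non-reversible estimate)."""
    row_sums = counts.sum(axis=1, keepdims=True)
    if np.any(row_sums == 0):
        raise ValueError("every state needs at least one outgoing transition")
    return counts / row_sums

def stationary_distribution(T):
    """Left eigenvector of T with eigenvalue 1, normalized to sum to 1."""
    evals, evecs = np.linalg.eig(T.T)
    idx = np.argmin(np.abs(evals - 1.0))
    pi = np.real(evecs[:, idx])
    return pi / pi.sum()

def implied_timescales(T, lag_time):
    """Relaxation times -lag_time / ln|lambda_i| for the non-stationary eigenvalues."""
    evals = np.sort(np.abs(np.linalg.eigvals(T)))[::-1]
    return -lag_time / np.log(evals[1:])
```

The implied timescales are what allow such a model to reach beyond the length of any single trajectory: slow relaxation processes appear as eigenvalues close to 1, even when each contributing simulation is short.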

Validation remains critical, with theoretical predictions tested against experimental structures deposited in databases such as the Protein Data Bank. For amyloid systems, simulations correctly reproduce fibril diameters of 5–15 nm and twist periodicities of 80–120 nm. Capsid models accurately predict T-number transitions in viral shells, while chaperone simulations match experimental folding yields to within 10%.

Future directions include adaptive resolution schemes that dynamically adjust model fidelity and enhanced sampling techniques to capture rare assembly events. These advances will deepen understanding of pathological aggregation, viral infection mechanisms, and protein homeostasis, guiding therapeutic strategies at the molecular level.