Multiple Evolutionary Mechanisms Reduce Protein Aggregation

The folding of polypeptides into stable globular protein structures requires protein sequences with a relatively high hydrophobicity and secondary structure propensity. These biophysical properties, however, also favor protein aggregation via the formation of intermolecular beta-sheets and, as a result, globular structure and aggregation are inextricable properties of polypeptides. Aggregates that are enriched in beta-sheet structures have been found in diseased tissues in association with at least twenty different human disorders, and the effects of aggregation on protein function include simple loss of function but often also a gain of toxicity. Given both the ubiquity and the potentially lethal consequences of protein aggregation, negative selective pressure acts strongly to minimize aggregation. Various evolutionary strategies keep aggregation in check, including (1) the optimisation of the thermodynamic stability of the protein, which precludes aggregation by burying the aggregation-prone regions in solvent-inaccessible parts of the structure, (2) segregation between folding nuclei and aggregation nuclei within a protein sequence, (3) the placement of so-called gatekeeper residues at the flanks of aggregating segments, which reduce the aggregation rate of (partially) unfolded proteins, and (4) molecular chaperones that target aggregation-nucleating sequences directly, thereby further suppressing aggregation in a cellular environment. In this review we describe the intrinsic features built into protein sequence and structure that protect against aggregation.


INTRODUCTION
Misfolding and the associated aggregation of proteins have been the object of intensive study in the last decade, as they appear to be the molecular basis of neurodegenerative disorders such as Alzheimer's and Parkinson's disease, and of other diseases such as type 2 diabetes [1]. To date, circa 40 disorders have been linked to protein aggregation [2]. Aggregation is unavoidable in globular proteins, because nonnative conformations can be adopted during or immediately after synthesis, under stress conditions, or as a consequence of mutations or proteolysis. Although it seems that almost all proteins are able to form aggregates when expressed at high concentrations in vitro, they differ substantially in their intrinsic propensity to do so under physiological conditions [3]. Most importantly, aggregation is nucleated by short sequence segments with specific physical properties, and the amino acid residues involved in aggregation are usually segregated in the primary structure from the residues that are critical for proper folding [4]. The major contributors to aggregation propensity have been identified as hydrophobicity, net charge and propensity to form secondary structure, i.e. a predisposition for beta-sheet formation and an aversion to alpha-helical structures [4]. The identification of these determinants of aggregation facilitated the development of prediction algorithms that assess the effect of mutations on aggregation, identify the regions in the protein sequence that promote aggregation, and quantify the aggregation rates of unfolded proteins [5][6][7][8][9]. These computational methods have enabled large-scale analyses of the aggregation behavior of full proteomes [10][11][12], which have confirmed the ubiquity of aggregation propensity in proteomes of all kingdoms of life. Protein aggregation represents an enormous burden for cellular organisms: stress is imposed on the cell not only by the loss of function of the individual aggregating proteins, but also by the energy consumed by the ATP-dependent protection mechanisms of the protein quality control machinery. Hence, proteomes are subject to strong evolutionary pressure to minimize aggregation [13].
The different mechanisms hindering effective protein aggregation in the cell are illustrated in Fig. (1) with the example of alpha-1-antitrypsin (A1AT). The deficiency of antitrypsin has been associated with aggregation of this protein and results in liver dysfunction [14]. The protein has two predicted aggregation-prone regions, which are buried in the correctly folded form of the protein (i). These two regions are flanked by so-called gatekeeper residues, which in the unfolded state of the protein prevent self-association through charge repulsion or steric hindrance (ii). In addition to these two mechanisms embedded in the protein's sequence and structure, the cell has developed a highly advanced protein quality control system (iii) [15]. A large variety of chaperones in the cell hinder the formation of aggregates, not only by shielding the aggregation-nucleating regions in the nascent chain, but also by sequestering unfolded proteins from other identical proteins, and by untangling partial aggregates [16]. In this review we will focus on the intrinsic protein characteristics that counteract aggregation.

AGGREGATION-PRONE SEQUENCES ARE BURIED INSIDE PROTEIN STRUCTURES
Protein folding and aggregation are competing conformational reactions. As a result, the first defense mechanism against aggregation is the stability of the native protein conformation itself: in a folded protein the backbone is locked in the tertiary structure of the protein and is therefore not accessible to form the inter-chain hydrogen bonds that are a determining factor in cross-beta aggregated structures [17]. Cooperativity in folding is related to resistance against aggregation [18], and studies of the folding of a computationally designed protein suggest that the smooth folding pathways of small polypeptides are the result of negative selection against aggregation, and not a general property of proteins that fold into a unique stable structure [19].
Although it has been discovered that globular native structures can also form aggregates through intermolecular β-strand interactions at the edges of individual β-sheets [20] or through three-dimensional domain swapping [21][22][23], it is still universally accepted that unfolded or partially unfolded proteins generally have a higher propensity to aggregate than the fully native states [24].
In a large-scale study using experimentally determined stability measurements of 2351 mutations in globular proteins, Serrano and colleagues showed that stability is the main evolutionary pressure in the absence of other factors such as binding and catalysis [25]. Their analysis revealed that misfolding is avoided primarily by selection for stability, and also that avoiding misfolding-prone sequences compromises stability, emphasizing the inextricable tie between protein structure and aggregation. However, the maintenance of an aggregating segment within a sequence does not have negative consequences if the aggregation load is not too high. There exists a "permissive" window for aggregation: highly aggregating sequences are prevented but moderately aggregating ones are tolerated [26]. This is confirmed in proteome-wide studies of aggregation propensities, where the majority of the proteins have low predicted aggregation scores and only a small portion have a very high tendency to aggregate [10][11][12].
The relation between the tolerance for aggregation-prone regions and the burial of these regions in folded proteins is underlined by the differences in aggregation propensities between globular and intrinsically disordered proteins (IDPs). Proteome-wide studies of aggregation propensities found that globular proteins from the all-alpha, all-beta and mixed alpha/beta SCOP classes have similar levels of aggregation propensity, while natively unstructured proteins show much lower average aggregation loads [10,27]. In a Monte Carlo simulation of small hydrophobic peptides with and without disordered flanks, Abeln & Frenkel showed that disordered flanks next to aggregating regions even prevent aggregation [28]. Small hydrophobic peptides without disordered flanks aggregated, while the peptides with unstructured flanks were stable as monomers or small micelle-like clusters. The disordered flanks have no effect on the native function of the motif, i.e. binding energy is not affected.

POINT MUTATIONS CAN MODULATE PROTEIN STABILITY AND AGGREGATION TENDENCY
The amyloidogenicity of a protein can be reduced by stabilization of the native structure (reviewed in [29]); conversely, many mutations associated with increased aggregation have been shown to destabilize the native structure. This has been shown experimentally for several (disease-related) proteins, such as transthyretin in amyloidosis [30] and Cu/Zn-superoxide dismutase (SOD1) in amyotrophic lateral sclerosis (ALS) [31]. In the latter case, Oliveberg and co-workers could link experimentally determined stability differences of apo-SOD1 to the survival time of ALS patients [32]. The stability change for ALS-associated mutations that do not alter the net charge of SOD1 shows a high correlation with survival time (Fig. 2). An additional large-scale study showed that the combination of increased aggregation propensity and decreased protein stability can account for 69% of the variability in familial ALS patient survival times [31]. The link between increased destabilization and more aggressive disease development can also be found among transthyretin mutations in amyloidosis [30], where the rate of tetramer dissociation needed for amyloid formation influences both disease penetrance and age of onset.
Chiti and coworkers examined the interplay between decreased stability and increased aggregation in vivo: the solubility in the cell of mutants of the N-terminal domain of the Escherichia coli HypF protein (HypF-N) was compared with the effect of the mutations on the stability of the protein. HypF-N has been shown to convert in vitro to amyloid fibrils that are morphologically similar to those found in amyloid disease [33]. HypF-N variants carrying destabilizing mutations aggregate after expression, whereas mutants with stability similar to the wild-type protein remain soluble in the E. coli cytosol [34]. Although these studies show that destabilisation is the major factor contributing to misfolding, it is certainly not the only factor. Destabilisation does not always imply misfolding and vice versa, as demonstrated by mutations that affect aggregation independently of stability [35,36].

GATEKEEPER RESIDUES DISRUPT STRETCHES OF HYDROPHOBIC RESIDUES TO MINIMIZE AGGREGATION PROPENSITY
The term structural gatekeeper was first introduced in the context of the two-state folding pathway of protein S6 [37], to denote residues that steer the folding process by blocking certain paths. It was later introduced in the context of Aβ amyloid peptide aggregation by the same researchers as "charged side chains that prevent aggregation by interrupting contiguous stretches of hydrophobic residues in the primary sequence" [38]. A computational analysis of the aggregation properties of 26 proteomes by Rousseau and coworkers [12] with the TANGO algorithm [6] revealed a strong enrichment of charged residues (arginine, lysine, aspartate and glutamate) and proline at the flanks of aggregation-prone regions. Their study showed that 90% of aggregation-prone regions are capped with at least one gatekeeper residue, with a bias for positively charged residues at regions with the highest aggregation propensities. A similar result was obtained by Chiti and co-workers in the analysis of the human proteome [10] with a different computational method [7]. In accordance with the aforementioned study by Rousseau et al., they found that Arg, Lys and Pro had higher frequencies at the flanks of regions with high aggregation propensity. In a follow-up study of the human proteome, Rousseau et al. investigated the composition of the three amino acid positions before and after aggregation-prone regions [11]. Due to the long-range effects of electrostatic interactions, the boundaries of aggregation-nucleating zones may not be strictly defined. The elevated usage of the 5 previously identified gatekeepers (P, R, K, D, E) on the direct flanks of aggregation-prone regions (Fig. 3A) was confirmed in the three C-terminal and three N-terminal flanking positions (Fig. 3B). The enrichment was most prominent for the charged residues and less pronounced for proline. Another feature of gatekeeper motifs that was highlighted in this study is the use of multiple gatekeepers: nearly 75% of all aggregation-nucleating regions in the human proteome use two or more gatekeepers. The type of gatekeeper used varies between single and multiple gatekeeper motifs: when a single gatekeeper residue is used, proline is used most often, but its usage decreases with the introduction of more gatekeepers. Using multiple gatekeepers may be a protection mechanism against mutation: redundancy in the gatekeeper motif reduces the risk of initiating aggregation by a single point mutation.
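The flank analysis described above can be illustrated with a minimal Python sketch. It assumes that the boundaries of an aggregation-prone region are already known (for instance from a predictor such as TANGO, which is not reimplemented here); the function names, the toy sequence and the region coordinates are hypothetical and serve only to show how gatekeepers in the flanks are counted and how an enrichment ratio (flank frequency over background frequency) is formed.

```python
# The five gatekeeper residues named in the text: Pro, Arg, Lys, Asp, Glu.
GATEKEEPERS = set("PRKDE")


def flank_gatekeepers(sequence, start, end, flank=3):
    """Return the gatekeeper residues found in the N- and C-terminal flanks
    of the region sequence[start:end] (0-based, end exclusive)."""
    n_flank = sequence[max(0, start - flank):start]
    c_flank = sequence[end:end + flank]
    return ([aa for aa in n_flank if aa in GATEKEEPERS],
            [aa for aa in c_flank if aa in GATEKEEPERS])


def enrichment(flank_residues, background, residue):
    """Ratio of a residue's frequency in the flanks to its frequency in the
    background set; a ratio above 1 indicates enrichment in the flanks."""
    flank_freq = flank_residues.count(residue) / len(flank_residues)
    background_freq = background.count(residue) / len(background)
    return flank_freq / background_freq


# Hypothetical toy sequence with a hydrophobic stretch at positions 4-9.
toy = "GSKRVVIVAIEDPT"
n_side, c_side = flank_gatekeepers(toy, 4, 10)
print(n_side, c_side)                      # gatekeepers on either side
print(enrichment("SKREDP", toy, "K"))      # Lys enrichment in the 3-residue flanks
```

In a real analysis the background would be the full data set of sequences rather than a single toy peptide, and the ratio would be computed per flank position, as in Fig. 3A and 3B.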

GLOBAL NET CHARGE AND STERIC HINDRANCE PROTECT AGAINST AGGREGATION
In addition to safeguarding the flanks of aggregation nuclei, charged residues and structure breakers such as glycine and proline provide protection over the entire protein sequence. For instance, the use of multiple structure breakers to oppose aggregation was also found in Huntingtin, whose aggregation is associated with Huntington's disease, where the polyglutamine stretch is flanked by a proline-rich region that keeps aggregation in check [39]. Other studies show examples of prolines [40] and glycines [41] that were evolutionarily conserved to modulate aggregation. In the investigation of three highly conserved prolines in fibronectin type III domains (Fig. 3C), no obvious structural or functional role could be assigned to these conserved residues. The stability of alanine mutants of these three prolines in the 10th domain of human fibronectin was similar to that of the wild-type domain, but the aggregation rate of the mutant proteins was significantly higher than that of the original domain [40]. Analogous results were obtained in the study of conserved glycine residues in human muscle acylphosphatase (AcP, Fig. 3D) [41]: mutating these glycines to alanine does not affect stability more than mutating non-conserved positions, but it does accelerate amyloid formation of AcP. Furthermore, an earlier extensive mutation study of the same enzyme already demonstrated that the aggregation behaviour of AcP could be modified by the mutation of single amino acids. More specifically, the authors showed an inverse correlation between the net charge of the protein and its aggregation rate [42]. This anticorrelation between charge and aggregation is also illustrated in amyotrophic lateral sclerosis, where the majority of disease-related SOD1 mutations reduce the net charge of the protein [43]. Oliveberg et al. performed a computational analysis of 100 ALS-associated mutations in SOD1 and showed that, in comparison with other well-described disease-related genes and mutations, the charge bias of SOD1 is significantly higher. In a similar comparison of SOD1 with all disease-related proteins in the SwissProt database with more than 50 causal mutations, the average charge difference of SOD1 mutations was ranked second. The generality of these protection mechanisms has been shown in mutational studies of several proteins, where the introduction of charged residues, proline and glycine resulted in reduced aggregation kinetics or compromised stability of the formed aggregates [34,42,44,45]. The protection against aggregation provided by these charged residues and structure breakers is also employed in intrinsically disordered proteins, where a higher proline content [46] and a higher net charge [27,47] contribute to lower aggregation.
In addition to its involvement in the pathology of misfolding disorders, protein aggregation also poses a problem in vitro, in biotechnology and biomedical research. Building further on the aforementioned observations that charges oppose aggregation, Liu and colleagues set out to design supercharged versions of naturally occurring proteins [48]. By replacing solvent-exposed residues of a monomeric (green fluorescent protein, GFP), a dimeric (glutathione-S-transferase, GST) and a tetrameric protein (streptavidin, SAV) with charged amino acids, they demonstrated that supercharging can yield correctly folded variants of the natural protein. The design of GST and SAV mutants was performed with an automated mutagenesis strategy: residues were ranked by solvent exposure calculated from the crystallographic structure, and the most exposed residues were then replaced by lysine for positive supercharging, or by glutamate or aspartate for negative supercharging. These supercharged variants displayed their native functionality in vitro but also remained soluble in conditions that normally cause the proteins to aggregate. This approach might solve some of the unwanted behavior of de novo designed proteins, and may contribute to the adaptation of natural proteins to thrive in non-natural conditions, such as increased temperature or the presence of denaturing chemical additives [49]. These combined results suggest a strong evolutionary pressure on the flanks of aggregation-prone regions, and confirm the use of structural gatekeepers as a universal mechanism against aggregation.
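The automated ranking step of the supercharging strategy can be sketched as follows. This is only an illustration under stated assumptions, not the procedure of [48]: the per-residue solvent exposure values are supplied by the caller (here as hypothetical numbers), the functions `net_charge` and `supercharge` are names introduced for this example, and the net charge is a crude count that ignores histidine and the chain termini.

```python
def net_charge(seq):
    """Crude net charge: +1 for each Lys/Arg, -1 for each Asp/Glu.
    Histidine and the termini are ignored (simplifying assumption)."""
    positive = sum(seq.count(aa) for aa in "KR")
    negative = sum(seq.count(aa) for aa in "DE")
    return positive - negative


def supercharge(seq, exposure, n_mutations, positive=True):
    """Replace the n most solvent-exposed residues with Lys (positive
    supercharging) or Glu (negative supercharging).

    exposure: one solvent-exposure value per residue; in practice this would
    be computed from a crystal structure, here it is simply an input list.
    """
    if len(seq) != len(exposure):
        raise ValueError("one exposure value is required per residue")
    target = "K" if positive else "E"
    # Rank positions from most to least exposed.
    ranked = sorted(range(len(seq)), key=lambda i: exposure[i], reverse=True)
    mutated = list(seq)
    for i in ranked[:n_mutations]:
        mutated[i] = target
    return "".join(mutated)


# Hypothetical six-residue example with made-up exposure values.
wild_type = "ASTDLV"
exposure = [0.9, 0.1, 0.8, 0.7, 0.0, 0.2]
variant = supercharge(wild_type, exposure, n_mutations=2)
print(variant, net_charge(wild_type), net_charge(variant))
```

A practical design would additionally exclude residues that are conserved or functionally important, a filter that is deliberately left out of this sketch.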

CHAPERONE BINDING IS MODULATED BY GATEKEEPER RESIDUES
However different in mechanism, most chaperones display a remarkable resemblance in substrate specificity and prefer binding to hydrophobic stretches flanked by positive charges. This has been shown by affinity studies for Hsp70 [50,51], Hsp90 [52], and many other chaperones [53][54][55]. Although Hsp60 substrate specificity studies on GroEL have not revealed such clear charge preferences, it is suggested that proteins with negative charges fold rapidly by repulsion forces in the negatively charged cage [56]. Application of the substrate specificity preferences for DnaK and trigger factor developed by Bukau and co-workers [50,54] to the aggregating segments of the Escherichia coli proteome showed that together these two chaperones target almost 100% of the strongly aggregating sequences in E. coli [12]. This suggests that chaperones recognize aggregation-prone regions by the double criterion of a hydrophobic stretch flanked by (mostly positive) charges. The high prevalence of these motifs in proteomes in the various kingdoms of life suggests that the evolutionary pressure on proteomes to counteract aggregation with charges and structure breakers also shaped the specificity of chaperones to recognize these patterns. These findings are in accordance with the observation that intrinsically disordered proteins, which have significantly lower aggregation propensities than globular proteins [27], also bind less to chaperones [57].

GATEKEEPER MUTATIONS CONTRIBUTE TO HUMAN DISEASE
Although gatekeeper residues appear to be very effective in guarding stretches of aggregation-prone residues, they also imply a risk for disease development. As seen in several examples, such as mutations of tau [58], the Alzheimer β-peptide [59] and α-synuclein [60], mutating a single amino acid can substantially change aggregation propensity and can have dramatic effects on disease etiology. The TANGO algorithm was used to study the difference in aggregation caused by known human disease mutations and neutral single nucleotide polymorphisms (SNPs) from the UniProt database [11]. The two main observations of this analysis were that i) the distribution of differences in the TANGO aggregation scores for disease mutations showed more extreme differences and a smaller fraction of neutral changes than the distribution for the neutral SNPs; and ii) the fraction of disease mutations that cause a significant increase of protein aggregation due to the disruption of a gatekeeper motif was almost twice as large as the fraction of these mutations found among SNPs (3.5% of the disease mutations versus 1.9% of the SNPs). These findings suggest that gatekeeper residues are indeed crucial for correct protein function and that disruption of the gatekeeper pattern introduces a risk of disease.

NEGATIVE SELECTION AGAINST UNWANTED SELF-ASSOCIATION IN CORRECTLY FOLDED PROTEINS
There are some examples where proteins with native structure, more specifically β-sheet rich proteins, can form aggregates through intermolecular interactions of peripheral β-strands [20]. These external β-strands pose a risk of self-interaction and thus β-aggregation, but the placement of β-bulges, the superposition of short loops, helices or distorted β-strands on the peripheral strand, and other ways of distorting the β-structure are used to prevent peripheral strands from aggregating with those of other molecules [20]. Two types of (functional) self-interactions of identical or homologous sequences that can be found in nature are homo-oligomeric complex formation and domain repeats in multidomain proteins. Especially in the former case, formation of functional homo-oligomers and of nonfunctional aggregates are competing processes, as has been shown in studies of the c-Src SH3 domain [61]. Using protein-protein interaction data of fly, yeast and worm, Chen & Dokholyan showed that proteins with native self-interaction patterns (such as homo-oligomeric complex formation) have overall lower aggregation scores than proteins without these patterns [62], suggesting negative selection against aggregation in these proteins. Dobson and co-workers investigated multidomain constructs of immunoglobulin domains in human cardiac titin and the ability of these homologous domains to co-aggregate [63]. Their conclusion was that the efficiency of co-aggregation decreases with decreasing sequence identity, with a lower bound at 30-40% sequence identity. Further computational analysis of homologous domains in large multidomain proteins (i.e. the immunoglobulin and fibronectin type III superfamilies) showed that the sequence identity between repeats remains largely below this threshold. Comparison of the sequence identity between adjacent and non-adjacent domain pairs also revealed that there is a higher evolutionary pressure on adjacent domains: sequence identity between adjacent pairs is significantly lower.
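The comparison of adjacent and non-adjacent domain pairs can be made concrete with a small Python sketch. It assumes that the domain sequences have already been aligned to equal length, with "-" marking gaps; the alignment step itself, the function names and the toy sequences are assumptions introduced for illustration and are not taken from [63].

```python
from itertools import combinations


def percent_identity(a, b):
    """Percent identity of two pre-aligned sequences of equal length.
    Columns containing a gap ('-') in either sequence are skipped."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    aligned = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not aligned:
        return 0.0
    matches = sum(x == y for x, y in aligned)
    return 100.0 * matches / len(aligned)


def adjacent_vs_nonadjacent(domains):
    """Split all pairwise identities of an ordered list of domains into
    pairs that are neighbours in the chain and pairs that are not."""
    adjacent, nonadjacent = [], []
    for i, j in combinations(range(len(domains)), 2):
        identity = percent_identity(domains[i], domains[j])
        (adjacent if j - i == 1 else nonadjacent).append(identity)
    return adjacent, nonadjacent


# Hypothetical toy "domains", listed in their order along the chain.
adjacent, nonadjacent = adjacent_vs_nonadjacent(["AAAA", "AABB", "AAAB"])
print(adjacent, nonadjacent)
```

Each identity value can then be compared with the 30-40% threshold mentioned above, and the two lists can be compared with a statistical test of choice.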

DETAILED FEATURE ANALYSIS OF AGGREGATING PROTEINS REVEALS ADDITIONAL SELECTION AGAINST AGGREGATION
Besides the apparent evolutionary pressure related to folding, gatekeeper patterns, chaperone binding and native self-interaction, various studies have revealed additional evidence for selection against aggregation-prone segments. Simple patterns that favor aggregation, such as the alternation of polar and non-polar stretches, are rare in natural proteins [64]. This is a first indication that not only the amino acid composition itself is a determinant of aggregation propensity, but also the order of the residues. Further proof was provided in a detailed study using horse heart apomyoglobin (apoMb) [26]. The core of the amyloid fibrils formed by apoMb is the region spanning residues 7 to 18. Independently of the full-length protein, the N-terminal region of apoMb (residues 1-29) is soluble at neutral pH but self-assembles into fibrils at pH 2. Keeping the same amino acid composition and length, four scrambled versions of the N-terminus were designed and their aggregation properties were investigated. The naturally occurring sequence is at the lower boundary of aggregation. Comparing the aggregation profiles of the scrambled sequences with those of 745 peptides from the globin family homologous to the apoMb N-terminus showed that the former had significantly higher aggregation tendencies than their natural counterparts, confirming that the prevention of aggregation has been a driving force in protein evolution. Another piece of evidence that corroborates this evolutionary pressure was provided by investigating the aggregation propensities of essential versus non-essential proteins in Saccharomyces cerevisiae and Caenorhabditis elegans [62]. Essential genes were defined as those genes whose knockdown led to lethality. Both in yeast and worm it was shown that essential proteins have lower aggregation propensity than non-essential ones, which is consistent with a higher evolutionary pressure on essential proteins.
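The idea of a scrambled control, in which composition and length are kept but residue order is randomized, can be sketched in Python as follows. The peptide below is a hypothetical toy sequence, not the apoMb N-terminus, and the longest run of hydrophobic residues is used purely as a stand-in order-dependent property; it is not the aggregation score used in [26], and the hydrophobic residue set is an assumption of this example.

```python
import random

# Assumed hydrophobic residue set for this toy example.
HYDROPHOBIC = set("AVILMFWC")


def scramble(seq, seed):
    """Return a shuffled copy of seq with identical composition and length.
    A fixed seed makes the scrambled control reproducible."""
    rng = random.Random(seed)
    residues = list(seq)
    rng.shuffle(residues)
    return "".join(residues)


def longest_hydrophobic_run(seq):
    """Length of the longest contiguous stretch of hydrophobic residues,
    a simple property that depends on residue order, not composition."""
    best = current = 0
    for aa in seq:
        current = current + 1 if aa in HYDROPHOBIC else 0
        best = max(best, current)
    return best


native = "KVVIEALDKF"  # hypothetical peptide
controls = [scramble(native, seed) for seed in range(4)]
for peptide in [native] + controls:
    print(peptide, longest_hydrophobic_run(peptide))
```

Because every control has the same composition as the input, any difference in the order-dependent property between the original and the scrambled peptides can be attributed to residue order alone.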

EXPRESSION LEVELS AND SUBCELLULAR LOCALIZATION ARE OTHER DETERMINANTS IN PROTECTION AGAINST AGGREGATION
As almost all proteins can be driven to aggregate when overexpressed in vitro, the high divergence in the expression levels of proteins in the cell is another determinant of the risk of aggregation [17]. Vendruscolo et al. examined the in vivo expression levels, as measured by DNA microarray technology, of 12 human proteins for which experimentally determined aggregation rates were available. The expression levels of these human genes were anti-correlated with the aggregation rates of the corresponding proteins in vitro [65]. This suggests that polypeptide chains have co-evolved with their cellular environments to be as soluble as is needed to effectively perform their functional role. These results are in accordance with previous observations that even small perturbations of expression levels can have dramatic pathological consequences in misfolding diseases [66]. In eukaryotes, not only the individual expression levels of proteins but also the overall biochemical properties of different cellular compartments can vary greatly. Studies on the differences in aggregation propensity between proteins of different subcellular locations in the yeast [67] and human proteome [10] agree on the observation that the aggregation propensity of secreted and ER proteins is on average higher than that of proteins in intracellular districts such as the nucleus and ribosome (Fig. 4). This evolutionary pressure against aggregation in these intracellular compartments is expected because, on the one hand, overall protein concentrations are high there and, on the other hand, these compartments have been shown to contain a large portion of unfolded molecules [68].

CONCLUSION
Since the biophysical properties underlying the correct folding of globular proteins and the formation of protein aggregates are alike, the two processes are inescapably linked. The combination of the generality of the aggregation propensity of globular proteins and the putative detrimental effects of protein aggregation on the cell has resulted in negative selective pressure to minimize aggregation. In this review we have described the intrinsic features of protein sequence and structure that keep aggregation in check. The main contributor to the avoidance of aggregation is correct folding: aggregation nuclei are buried within the hydrophobic core of globular proteins. However, during the lifetime of a protein (partial) unfolding cannot always be avoided. Charge repulsion and steric hindrance are used to disrupt the formation of intermolecular β-sheets by placing so-called structural gatekeepers (aspartate, glutamate, lysine, arginine, proline) at the flanks of aggregation nuclei. The use of these charged residues and structure breakers to minimize aggregation is not only observed at the flanks of aggregation nuclei: global net charge and the conservation of well-placed prolines and glycines can also limit the aggregation propensity of a protein. The charged-hydrophobic-charged pattern that characterizes the regions with high aggregation propensity flanked by gatekeepers is recognized by molecular chaperones, and optimizes chaperone binding to potentially dangerous motifs. Negative selection of aggregation-prone regions in multimeric proteins and within protein families further illustrates the evolutionary pressure against unwanted self-association. Selective pressure within the cell can vary between the different cellular compartments, in relation to variability in concentration, partial unfolding, and the presence of chaperones in these compartments. Modulation of the protection level against aggregation in these varying situations can be achieved by combining different types of protection. This type of redundancy in protection can also serve as a fail-safe if mutation disrupts one of the protection mechanisms.

Fig. (1). The different strategies used to oppose the formation of protein aggregates. The structure and sequence shown are those of alpha-1-antitrypsin (AAT), the deficiency of which, caused by aggregation, is associated with liver disease [14]. 1) Folding buries the sticky regions in the core of the protein. 2) Well-placed gatekeeper residues prevent self-association by charge or steric repulsion and therefore inhibit the aggregation process. 3) The protein quality control system has evolved to oppose and reverse aggregation. For example, members of the Hsp70 family recognize the positively charged residues at the flanks of aggregation-nucleating regions. In the case of secreted AAT, quality control is performed in the ER by the Hsp70 family member BiP [14].

Fig. (2). Interplay between stability and net charge determines the age of onset in familial ALS. The average survival time after diagnosis is plotted as a function of the protein stability changes (ΔΔG) in SOD1. ΔΔG for ALS-associated mutations that do not alter the net charge of SOD1 shows a high correlation with survival time (R = 0.91). Increasing the net charge of the protein causes a shift toward longer survival time, whereas decreasing the charge has the opposite effect. Reprinted with permission from [32].

Fig. (3). Disruption of aggregation motifs by polar residues and structure breakers. A+B. Enrichment of gatekeeper residues at the flanks of aggregating regions. The ratio of the amino acid frequency in the flanks to the frequency of amino acids in the full data set is shown for each gatekeeper type, considering one position (A) or three positions (B) before and after each aggregation-nucleating region. All gatekeepers (P, R, K, D, E) are enriched in the flanks (ratio >1). The pattern is very distinct at the first position before and after the regions, but the broader flanks also show this enrichment. When taking into account three positions (B) we also see an enrichment of histidine (H) and asparagine (N), and to a lesser extent glycine (G) and glutamine (Q). Adapted from [11]. C+D. Opposition of aggregation by conserved structure breakers. C. Aggregation of fibronectin type III domains is limited by conserved proline residues. Adapted from [40]. D. Conserved glycines in human muscle acylphosphatase slow down the formation of aggregates. Adapted from [41].

Fig. (4). Ranking of the aggregation propensity in different subcellular regions in human and yeast. The average aggregation propensities of proteins in different subcellular locations in Homo sapiens and Saccharomyces cerevisiae are very similar: the aggregation propensity of proteins in intracellular districts such as the nucleus and ribosome is much lower than that of secreted proteins or those located in the endoplasmic reticulum. Adapted from [10] and [67].