Non-permanent presentation

How the TATA Box Selects its Protein Partner

Nina Pastor, Leonardo Pardo and Harel Weinstein

Department of Physiology & Biophysics,
Mount Sinai School of Medicine, New York, NY 10029
Laboratorio de Medicina Computacional, Unidad de Bioestadística,
Facultad de Medicina, Universidad Autónoma de Barcelona,
08193 Bellaterra, Barcelona, España

pastor@inka.mssm.edu
ikbe0@cc.uab.es
hweinstein@inka.mssm.edu

  • Introduction
  • Methods
  • Results
  • Summary of findings
  • Conclusions
  • References
  • Acknowledgements

    I. Introduction

    The TATA box-binding protein (TBP) is a basal transcription factor absolutely required for transcription by the three nuclear RNA polymerases. In the case of genes transcribed by RNA polymerase II, TBP is responsible for promoter recognition and binds directly to DNA in the minor groove. The structural characteristics of the complexes between TBPs and various TATA sequences, elucidated recently by X-ray crystallography (PDB), suggest that both direct readout mechanisms and dynamic determinants, such as the ease of deformation, may be functional in the formation of these complexes.

    TBP binds to DNA in the minor groove of an AT rich sequence (consensus TATA T/A A T/A X), causing unwinding, bending and compression of the major groove without disrupting the hydrogen bonds between base pairs. In the crystal structures of the complexes (PDB codes 1YTB (yeast TBP)1, plant TBP2, 1CDW (human TBP)3, 1TGH (human TBP)4, 1VOL (TBP/TFIIB/DNA)5, 1YTF (TBP/TFIIA/DNA)6), the DNA conformation returns to B-DNA abruptly, immediately outside the eight base pairs in contact with TBP. These complexes seem to be an extreme example of induced fit. The large difference in conformation of the bound DNA compared to B-DNA suggests that the energy penalty of the conformational change might be a selectivity determinant. Favorable sequences might have average geometries biased towards forming a complex with TBP; alternatively, but not exclusively, favorable sequences might have weaker stacking energies, and be more amenable to the type of distortions found in the complexes.

    The aim of this work is to gain an understanding of the structural characteristics that TBP exploits in its DNA substrates, and to discern whether there are DNA sequences that are predisposed for attaining the particular geometric requirements imposed by TBP binding. In the absence of high resolution structures in solution of DNA oligomers that contain TATA box sequences, we have carried out molecular dynamics simulations and analyzed the conformational properties of seven DNA dodecamers whose sequences (Table 1) include three functional TATA boxes (mlp, 6t and at), two sequences recognized by mutant TBPs7,8 (2c and 7g), a reversed TATA box (r28) and a negative control (gc).

    Table 1.
    Simulated Sequences
    mlp          C     T A T A A A A G     G  G  C 
    2c           C     C A T A A A A G     G  G  C 
    6t           C     T A T A T A A G     G  G  C 
    7g           C     T A T A A G A G     G  G  C 
    r28          C     T T T T A T A G     G  G  C 
    at           A     T A T A T A T A     T  A  T 
    gc           G     C G C G C G C G     C  G  C 
    
    bp step             1 2 3 4 5 6 7              
    bp           1     2 3 4 5 6 7 8 9     10 11 12
    
    
    * bp steps 1 and 7 are the sites of insertion of Phe residues by TBP.
    * bp step 4 corresponds to the dyad of the TATA element, and is the only step recognized by hydrogen bonding to the side chains of Asn and Thr residues from TBP.

    We carried out molecular dynamics simulations using the CHARMM23 potential (ref. 9), with explicit water molecules (TIP3) and sodium ions, using periodic boundary conditions and a spherical cutoff for the nonbonded interactions. To assess the convergence of the results and their independence from the force field used, mlp was also simulated with the AMBER 4.1 potential (ref. 10) using Ewald sums for the nonbonded interactions, and with the CHARMM23 potential with a different set of initial velocities.

    II. Methods

    The seven DNA dodecamers were built in fiber B-DNA conformation (see Table 3 below) using QUANTA (Molecular Simulations Inc. 1992); the 5' phosphate groups at the end of the strands were removed. The Na+ counterions were placed at a distance of 5Å from the P atom along the O-P-O bisector; one sodium ion was added per phosphate, for a total of 22 Na+.

    For the CHARMM simulations, the DNA and the sodium ions were solvated in InsightII (Biosym Technologies 1993). The final simulation system, including the DNA dodecamer, 22 Na+ ions and > 3400 TIP3 water molecules, was enclosed in a hexagonal prism of 72Å length with a 24Å side. For the replica of mlp run in AMBER 4.1, the dodecamer and sodium ions were solvated in AMBER 4.1 with an 11Å water shell (> 4000 TIP3P water molecules). The final simulation system was enclosed in a square prism (63.1Å X 45.7Å X 44.7Å), whose dimensions were adjusted by running at constant pressure and temperature to ensure the right density.

    The molecular dynamics simulations were run with the CHARMM program in the NVE ensemble, using the CHARMM23 all atom potential, the Verlet integrator and periodic boundary conditions. SHAKE was applied to all hydrogen-containing bonds. A cutoff value of 13Å was used, with the shift and switch functions for the electrostatic and van der Waals interactions, respectively. Keeping the DNA and sodium ions fixed, the water was equilibrated for 36 ps. Subsequently, the whole system was energy minimized, and then it was heated from 0 to 300K in 10 ps. Equilibration was carried out for 30 ps with a time step of 2 fs. For the production run of 510 ps (1260 ps for mlp(a)), the time step was reduced to 1.5 fs. A 2D r.m.s.d. plot for mlp(a) is shown in Figure 1.

    An independent mlp(b) run was started in parallel using a different seed for assigning velocities for the heating of the whole system. Heating (10.5 ps), equilibration (31.5 ps) and production (510 ps) were done with a time step of 1.5 fs, using the Leapfrog Verlet integrator. Structures from the trajectories were saved every 0.075 ps.

    The molecular dynamics simulation of mlp run in AMBER 4.1 was carried out in the NVT ensemble, using the AMBER 4.1 all atom potential, the Verlet integrator and periodic boundary conditions. SHAKE was applied to all hydrogen-containing bonds. A cutoff value of 9Å was used for the van der Waals interactions, and the electrostatic interactions were treated with the PME algorithm. In this run, water was heated for 15 ps and equilibrated for 85 ps; then, the system was energy minimized, heated from 0 to 300K in 15 ps, and equilibrated for 50 ps. The production run was extended to a total of 1 ns, with all the phases carried out with a time step of 2 fs. Structures from this trajectory were saved every 0.1 ps.
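
    As a quick check on the amount of data implied by the protocol above, the sketch below counts the structures saved in each production run and the number of integration steps between saves. The function names are illustrative, and the count assumes that the initial structure of the production phase is not included.

```python
from fractions import Fraction


def frames_saved(production_ps, save_interval_ps):
    """Structures saved during a production run (initial frame not counted)."""
    return int(Fraction(production_ps) / Fraction(save_interval_ps))


def steps_between_saves(save_interval_ps, time_step_fs):
    """Integration steps between two consecutive saved structures."""
    return int(Fraction(save_interval_ps) * 1000 / Fraction(time_step_fs))


# Values taken from the text; decimal strings keep the arithmetic exact.
charmm_frames = frames_saved("510", "0.075")      # 510 ps runs
mlp_a_frames = frames_saved("1260", "0.075")      # 1260 ps run of mlp(a)
amber_frames = frames_saved("1000", "0.1")        # 1 ns AMBER 4.1 run

charmm_stride = steps_between_saves("0.075", "1.5")
amber_stride = steps_between_saves("0.1", "2")

print(charmm_frames, mlp_a_frames, amber_frames, charmm_stride, amber_stride)
```

    With these inputs the 510 ps runs yield 6800 structures, the 1260 ps run 16800, and the AMBER run 10000; in both programs a structure is written every 50 steps.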

    The radial distribution functions for Na+-Na+, P-Na+ and P-P, for the mlp runs done with CHARMM and AMBER 4.1, are compared from 0 to 20Å (Figure 2) and amplified around the cutoff distance (Figure 3). There is no apparent accumulation of pairs at the cutoff distance of the electrostatic potential (12Å), suggesting that there are no gross problems with the spherical cutoff and shifting functions used in CHARMM.
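
    The kind of check described above can be sketched as a histogram of pair distances between two groups of atoms; an artifact of the spherical cutoff would show up as an excess of counts in the bin containing the cutoff distance. This is only a minimal illustration for a single frame: it ignores periodic images and omits the shell-volume normalization of a true radial distribution function, and the function name and coordinates are hypothetical.

```python
import math
from itertools import product


def pair_distance_histogram(group_a, group_b, r_max=20.0, bin_width=1.0):
    """Count A-B pair distances per bin (unnormalized, no periodic images).

    Pairs at zero distance are skipped, so an atom is never paired with
    itself when the same group is passed twice (each pair is then counted
    twice, once in each order).
    """
    n_bins = int(r_max / bin_width)
    counts = [0] * n_bins
    for a, b in product(group_a, group_b):
        r = math.dist(a, b)
        if 0.0 < r < r_max:
            counts[int(r / bin_width)] += 1
    return counts


# Hypothetical coordinates in Angstrom: one P atom and three Na+ ions.
phosphorus = [(0.0, 0.0, 0.0)]
sodium = [(5.0, 0.0, 0.0), (0.0, 12.5, 0.0), (0.0, 0.0, 30.0)]

counts = pair_distance_histogram(phosphorus, sodium)
print(counts)
```

    In practice the histogram would be accumulated over all saved structures and inspected in the bins around the cutoff.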

    DNA conformational analysis was carried out with the CURVES algorithm implemented in Dials and Windows of the MD Toolchest. Since this algorithm performs a global fit to the DNA axis, the reported angles and displacements depend on the DNA length. To allow for a comparison of the local base pair step geometry between the different simulations and the NMR and crystal structures, all the DNA oligomers were disassembled into their constituent base pair steps. Data were collected for each base pair step, except for those at the ends of the oligomers.

    III. Results

    a. what does DNA look like in the complex with TBP?

    From the eight available independent structures of TBP/DNA complexes, we took the eight base pairs of the TATA box, calculated the base pair step conformational parameters for each step, and averaged them among the structures. The results, summarized in Table 2, show the ranges in these parameters in the crystal structures that are therefore compatible with forming a complex with TBP.

    Table 2.
    step     shift       slide       rise       tilt        roll       twist
    -------------------------------------------------------------------------
      1     0.2±0.4    -1.9±0.4     4.9±0.4   -0.6±2.8    40.0±4.0   19.8±2.1
      2    -1.3±0.5    -1.5±0.1     3.4±0.2    1.7±1.7    16.5±2.4   16.4±1.1
      3     0.2±0.3     1.3±0.1     3.3±0.1    3.4±1.5     7.6±2.5   25.5±1.4
      4     0.3±0.3     1.2±0.3     3.4±0.2   -1.8±0.9    26.2±2.7    8.1±5.3
      5    -0.4±0.2     2.0±0.2     3.5±0.3    1.0±2.1    25.9±3.6   22.4±2.8
      6     0.4±0.2     1.2±0.4     3.3±0.2    0.8±3.8    24.2±3.5   22.7±3.1
      7    -0.1±0.6     0.8±0.8     5.4±0.5    1.7±3.7    44.6±5.3   22.5±5.1
    -------------------------------------------------------------------------
    
    Points to note:
    a) high rise at steps 1 and 7 (sites of Phe insertion)
    b) overall positive roll, greatest at steps 1 and 7
    c) overall unwinding, especially at step 4 (TATA dyad)

    b. what does general sequence DNA look like?

    In order to determine which of the conformational parameters of TBP-bound DNA are different from those of free DNA in solution, and also as a measure of how reliable the simulations are, we calculated the conformational parameters for each base pair step (except the end steps) along the production phase of all the simulations, and averaged them. We compared them to the conformation of NMR-derived structures (PDB), high resolution crystal structures (NDB) and fiber diffraction models of DNA.

    Table 3.
    source    shift       slide       rise       tilt        roll      twist 
    -------------------------------------------------------------------------
    charmm   0.2±0.8    -1.2±0.8    3.3±0.5    2.2±6.5    3.1±13.1   32.2±5.6
    mlp(a)   0.2±0.7    -1.4±0.8    3.3±0.5    2.1±6.2    2.4±11.9   31.9±5.1
    mlp(b)   0.2±0.7    -1.4±0.7    3.3±0.5    2.5±5.9    3.2±12.0   31.9±5.0
    amber    0.3±0.6    -1.5±0.6    3.4±0.4    3.0±5.9    2.7± 8.4   30.5±4.5
    -------------------------------------------------------------------------
    -------------------------------------------------------------------------
    NMR      0.7±0.4    -0.9±0.5    3.1±0.3    7.9±3.8    3.9± 7.1   33.2±2.6
    NDB-A    0.4±0.6    -1.9±0.4    3.4±0.4    3.3±4.1    6.5± 6.2   30.4±4.4
    NDB-B    0.4±0.5     0.1±0.9    3.3±0.2    4.8±3.3    0.6± 6.4   36.2±6.7
    -------------------------------------------------------------------------
    -------------------------------------------------------------------------
    fiber A    -0.3       -2.0        3.2       -2.8        10.5       30.7  
    fiber B     0.0       -0.6        3.3        0.4        -2.6       35.9  
    -------------------------------------------------------------------------
    
    Noteworthy observations are:
    a) all the simulations yield very similar values for the geometries of general sequence DNA, regardless of the potential used, the protocol for handling the electrostatic interactions, the integrator for the equations of motion, and the initial distribution of velocities.
    b) the simulation results are more similar to NMR derived structures and A-DNA crystal structures (NDB-A), with negative slide and low twist, than to B-DNA crystal structures (NDB-B).
    c) the simulations reproduce adequately the anisotropy of motion of general sequence DNA: the standard deviation of roll is larger than that of tilt or twist.

    Following Olson and coworkers11, we define for further use the "thermally accessible range of conformations for general sequence DNA" as the interval included between the mean ± one standard deviation in the charmm entry of Table 3.

    c. do the dodecamers differ in their average structures?

    From the 2D rmsd plots of the simulations of the dodecamers, we identified the longest time interval with a low rmsd (< 1.8Å, see the square marked by a black line in Figure 1). An average structure was calculated for each dodecamer representing the structure in that interval. Figure 4 shows the average structure for the last 750 ps of simulation of mlp, and the helix axis for that structure, compared to the superimposed axes calculated for the average structures of mlp, 6t, at and gc. It is impossible to separate the axis of gc, which is a sequence poorly bound by TBP, from those of known binding DNA sequences (mlp, 6t, at).

    d. what does each DNA tetrad look like?

    As the differences among the dodecamer properties were not discernible from their averaged structures, we looked at the properties of their constituent base pair steps. The dodecamers were broken down into base pair steps, and these were pooled and classified according to their 5' and 3' flanking nucleotides.
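
    The decomposition described above can be sketched as follows. The function name is illustrative; the labels follow the notation used in Table 4 (central step in capitals, flanking nucleotides in lower case). The sketch reads one strand only and skips the end steps; it does not attempt to pool a tetrad with its equivalent on the complementary strand.

```python
def interior_tetrads(sequence):
    """Label every base pair step, except the end steps, with its flanks.

    For a step between positions i and i+1 the label is the 5' neighbour
    (lower case), the two bases of the step (upper case) and the 3'
    neighbour (lower case), e.g. 'cTAt'.
    """
    seq = sequence.upper()
    tetrads = []
    for i in range(1, len(seq) - 2):
        five_prime = seq[i - 1].lower()
        step = seq[i:i + 2]
        three_prime = seq[i + 2].lower()
        tetrads.append(five_prime + step + three_prime)
    return tetrads


# Sequences from Table 1.
mlp_tetrads = interior_tetrads("CTATAAAAGGGC")
at_tetrads = interior_tetrads("ATATATATATAT")

print(mlp_tetrads)
print(sorted(set(at_tetrads)))
```

    For mlp this gives nine tetrads, from cTAt at the first interior step to gGGc at the last; the alternating at sequence contributes only aTAt and tATa.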

    Table 4.
    Average Base Pair Step Conformational Parameters
    tetrad     shift     slide     rise     tilt     roll     twist
    ----------------------------------------------------------------
    aAAa        0.1      -1.4      3.2       2.8      6.1     31.4
    aAAg        0.2      -1.4      3.2       1.6     -2.1     32.7
    tAAa        0.3      -1.3      3.3       2.7      5.5     31.6
    tAAg        0.1      -1.3      2.9      -3.2     10.0     28.4
    ----------------------------------------------------------------
    aAGa       -0.4      -1.1      3.7       0.7     -8.8     34.2
    aAGg        0.0      -1.7      3.3       0.6      2.2     31.7
    gAGg       -0.4      -0.8      3.3      -6.2      6.2     34.0
    tAGg       -0.5      -0.8      3.4      -5.1      1.8     34.7
    ----------------------------------------------------------------
    aGAg        0.0      -1.2      3.0       0.6      7.5     30.1
    ----------------------------------------------------------------
    aGGg        0.2      -2.2      3.5       3.0      2.1     32.9
    gGGc        0.4      -2.1      3.3       4.7     -6.8     32.7
    ----------------------------------------------------------------
    ----------------------------------------------------------------
    cCAt        1.1      -1.2      3.6       4.3     -1.7     32.7
    ----------------------------------------------------------------
    gCGc        0.3      -0.5      3.9       1.8     -4.8     35.1
    ----------------------------------------------------------------
    aTAa        0.0      -0.8      3.8       2.7      2.1     33.9
    aTAt        0.8      -0.8      3.5       5.5     -0.7     34.4
    cTAt        0.6      -1.0      3.5       3.8      0.7     33.5
    ----------------------------------------------------------------
    ----------------------------------------------------------------
    cATa       -0.4      -0.9      2.7      -3.1     18.4     24.4
    tATa        0.1      -1.0      3.0       1.9      9.2     30.7
    ----------------------------------------------------------------
    cGCg        0.0      -0.7      2.7       0.1     21.5     27.4
    ----------------------------------------------------------------
    
    The entries highlighted in the original table (marked there by blinking text, which is not reproduced here) correspond to average values that are outside the thermally accessible range defined in Table 3.
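
    The comparison against the thermally accessible range can be reproduced from the tabulated numbers. The sketch below takes the charmm row of Table 3 (mean ± one standard deviation) and flags, for a few rows of Table 4, the parameters whose average falls outside that interval. The function name and the small tolerance used for values that sit exactly on a rounded boundary are choices made for this illustration.

```python
# Mean and standard deviation from the charmm row of Table 3.
CHARMM = {
    "shift": (0.2, 0.8), "slide": (-1.2, 0.8), "rise": (3.3, 0.5),
    "tilt": (2.2, 6.5), "roll": (3.1, 13.1), "twist": (32.2, 5.6),
}

# A few rows of Table 4 (same parameter order as above).
TABLE4 = {
    "cATa": (-0.4, -0.9, 2.7, -3.1, 18.4, 24.4),
    "tATa": (0.1, -1.0, 3.0, 1.9, 9.2, 30.7),
    "cGCg": (0.0, -0.7, 2.7, 0.1, 21.5, 27.4),
    "aTAa": (0.0, -0.8, 3.8, 2.7, 2.1, 33.9),
    "gCGc": (0.3, -0.5, 3.9, 1.8, -4.8, 35.1),
}

EPS = 1e-9  # tolerance so that boundary values count as inside


def outside_thermal_range(tetrad):
    """Names of the parameters whose average lies outside mean +/- 1 SD."""
    flagged = []
    for (name, (mean, sd)), value in zip(CHARMM.items(), TABLE4[tetrad]):
        if value < mean - sd - EPS or value > mean + sd + EPS:
            flagged.append(name)
    return flagged


for tetrad in TABLE4:
    print(tetrad, outside_thermal_range(tetrad))
```

    For these rows the sketch flags rise, roll and twist for cATa, rise and roll for cGCg, rise for gCGc, and nothing for tATa or aTAa (whose rise of 3.8 lies on the upper boundary).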

    The Table identifies variations in base pair step geometry that are sequence dependent and are likely to be relevant for TBP binding: YR steps display the highest rise, while RY steps display the lowest rise and twist, and highest positive roll.
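
    The YR/RY distinction can be checked directly against Table 4. The sketch below classifies the central step of a tetrad as purine (R) or pyrimidine (Y) and averages the tabulated rise over the YR and RY tetrads only; the function names are illustrative and the RR steps are left out for brevity.

```python
PURINES = set("AG")
PYRIMIDINES = set("CT")


def step_class(step):
    """Return 'YR', 'RY', 'RR' or 'YY' for a two-base step such as 'TA'."""
    letters = []
    for base in step.upper():
        if base in PURINES:
            letters.append("R")
        elif base in PYRIMIDINES:
            letters.append("Y")
        else:
            raise ValueError(f"unexpected base: {base!r}")
    return "".join(letters)


# Rise values (in Angstrom) of the YR and RY tetrads in Table 4.
RISE = {
    "cCAt": 3.6, "gCGc": 3.9, "aTAa": 3.8, "aTAt": 3.5, "cTAt": 3.5,
    "cATa": 2.7, "tATa": 3.0, "cGCg": 2.7,
}


def mean_by_class(values):
    """Average a per-tetrad quantity over the class of the central step."""
    groups = {}
    for tetrad, value in values.items():
        groups.setdefault(step_class(tetrad[1:3]), []).append(value)
    return {cls: sum(v) / len(v) for cls, v in groups.items()}


mean_rise = mean_by_class(RISE)
print(mean_rise)
```

    With the tabulated values the mean rise is about 3.66 for the YR tetrads and about 2.8 for the RY tetrads.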

    e. which base pair step properties are essential for TBP recognition?

    Having found differences in the geometry of base pair steps, we looked for the parameters that could be used by TBP to distinguish its binding sites. From the averages and standard deviations in Table 2, we built 99% confidence intervals for each geometrical parameter and for each step. We compared these confidence intervals with the thermally accessible interval defined in Table 3 for general sequence DNA simulated with the CHARMM potential, and selected those parameters/step combinations whose confidence interval does not overlap with the thermally accessible interval; this procedure selects those properties that have values in the TBP-bound DNA, but cannot be adopted frequently (i.e., are outside the thermally accessible interval), and are sequence dependent. The resulting list is shown in Table 5.
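
    A minimal sketch of this selection step is given below. The text does not state how the 99% confidence intervals were constructed; the sketch assumes a normal approximation around the mean, mean ± z·SD/√n, with n = 8 crystal structures, and that assumption, like the function names, is not taken from the original. Only a handful of parameter/step combinations from Table 2 are included.

```python
import math
from statistics import NormalDist

N_STRUCTURES = 8                       # independent TBP/DNA structures
Z_99 = NormalDist().inv_cdf(0.995)     # two-sided 99% (assumed normal)

# Mean and SD from the charmm row of Table 3.
THERMAL = {
    "shift": (0.2, 0.8), "slide": (-1.2, 0.8), "rise": (3.3, 0.5),
    "tilt": (2.2, 6.5), "roll": (3.1, 13.1), "twist": (32.2, 5.6),
}

# Mean and SD for selected (parameter, step) entries of Table 2.
BOUND = {
    ("rise", 1): (4.9, 0.4), ("slide", 1): (-1.9, 0.4),
    ("shift", 2): (-1.3, 0.5), ("roll", 2): (16.5, 2.4),
    ("twist", 3): (25.5, 1.4),
}


def confidence_interval(mean, sd, n=N_STRUCTURES):
    """Assumed 99% interval for the mean of n structures."""
    half_width = Z_99 * sd / math.sqrt(n)
    return mean - half_width, mean + half_width


def is_candidate(parameter, step):
    """True if the interval does not overlap the thermal range (mean +/- SD)."""
    lo, hi = confidence_interval(*BOUND[(parameter, step)])
    mean, sd = THERMAL[parameter]
    return hi < mean - sd or lo > mean + sd


for key in BOUND:
    print(key, is_candidate(*key))
```

    Under this assumption, rise at step 1 and shift at step 2 are selected, whereas slide at step 1, roll at step 2 and twist at step 3 overlap the thermally accessible interval and are not.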

    For each parameter/step combination in Table 5, we identified the tetrads that were the most or the least likely to acquire the conformation found in the crystal complexes with TBP. Using the confidence intervals as a filter over all the conformations generated in the simulations, we counted the number of times that a given geometrical parameter of each tetrad fell inside the corresponding confidence interval. The frequency with which each tetrad visits the crystal conformations was further rated for significance with a chi squared test. This procedure identified the tetrads that fell most often within the range of properties found in the crystal structures (best tetrad in Table 5) and those that did so least often (worst tetrad in Table 5).
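
    The counting and significance rating can be illustrated with the sketch below. It counts how many sampled values fall inside an interval and computes a Pearson chi squared statistic comparing the tetrads against their pooled hit frequency. The exact form of the test used in this work is not specified, so this is one plausible reading; the function names, the sample values and the counts are hypothetical.

```python
def visit_counts(values, lo, hi):
    """Return (hits, misses): samples inside and outside [lo, hi]."""
    hits = sum(1 for v in values if lo <= v <= hi)
    return hits, len(values) - hits


def chi_squared(table):
    """Pearson chi squared for {tetrad: (hits, misses)} vs the pooled rate.

    Assumes the pooled hit frequency is strictly between 0 and 1, so that
    no expected count is zero.
    """
    total_hits = sum(hits for hits, _ in table.values())
    total = sum(hits + misses for hits, misses in table.values())
    p_hit = total_hits / total
    statistic = 0.0
    for hits, misses in table.values():
        n = hits + misses
        for observed, expected in ((hits, n * p_hit), (misses, n * (1 - p_hit))):
            statistic += (observed - expected) ** 2 / expected
    return statistic


# Hypothetical rise samples filtered with an illustrative interval.
sample_counts = visit_counts([4.6, 3.2, 5.0, 4.4], 4.5, 5.3)

# Hypothetical hit/miss counts for two tetrads.
statistic = chi_squared({"tetrad_A": (30, 70), "tetrad_B": (10, 90)})
print(sample_counts, statistic)
```

    The statistic would then be compared with the chi squared distribution for the appropriate number of degrees of freedom (number of tetrads minus one in this layout).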

    Table 5.
    Possible Base Pair Step Properties Contributing to Selectivity
    bp parameter           best         worst  
    / bp step             tetrad        tetrad 
    -------------------------------------------
      rise  1              aTAa         tata   
      roll  1              cgcg         gggc   
      twist 1              cata         gcgc   
    -------------------------------------------
      shift 2              tagg         atat   
      twist 2              cATa         aggg   
    -------------------------------------------
      slide 3              aTAt         aggg   
    -------------------------------------------
      slide 4              atat         aggg   
      roll  4              cgcg         gggc   
      twist 4              atat         aggg   
    -------------------------------------------
      slide 5              aTAt         tata   
      roll  5              cata         gggc   
      twist 5              cata         gcgc   
    -------------------------------------------
      slide 6              atat         tATa   
      roll  6              cATa         gggc   
      twist 6              cgcg         gcgc   
    -------------------------------------------
      slide 7              gcgc         aggg   
      rise  7              aTAa         tata   
      roll  7              cgcg         gggc   
    -------------------------------------------
    

    Entries highlighted in red in the original presentation (which appear here as the best tetrad entries written with the central base pair step in capitals: aTAa, cATa, aTAt) are those for which the best tetrad is a step that has actually been found in a crystal with TBP; these base pair step properties are therefore very likely to be used as selectivity determinants.
    The entry highlighted in purple (which appears here as the worst tetrad tATa, for slide at step 6) is a property for which the worst tetrad is a step that has been crystallized with TBP; consequently, such a step property cannot be a selectivity determinant.

    IV. Summary of findings

    In order to gain an understanding of the DNA properties that make TATA boxes a good substrate for TBP, we have first characterized the geometric parameters of DNA base pair steps that are consistent with forming a complex with TBP (Table 2). To distinguish which of these geometric requirements are outside the thermally accessible range for free DNA, besides the obvious requirements for low twist and a large positive roll, we carried out extensive MD simulations of DNA dodecamers in aqueous solution. With these simulations we defined the dynamic range for general sequence DNA (Table 3), which turned out to be fairly independent of the force field used, the treatment of electrostatic interactions, the integration algorithm for the equations of motion, and the initial distribution of velocities.

    We failed to find any difference between the average conformation of a dodecamer with a sequence that is not recognized by TBP (gc) and those of known TBP binding sites (mlp, 6t, at). We next asked whether our simulations were capable of displaying sequence dependent variations in DNA local structure (Table 4), and found differences between YR and RY steps, especially in rise, roll and twist.

    Comparing the thermally accessible range of conformations of general sequence DNA (Table 3) with the range of conformations consistent with complexation with TBP (Table 2) we selected those geometrical parameters specific to the bound DNA that are outside the thermal range. These are candidates for being selectivity determinants. To further narrow down the list of candidates, we probed all the conformations generated during the production phase of the simulations for their ability to reach the conformations found in TBP-bound DNA, and graded each tetrad according to the frequency of visits in these ranges. Those properties that selected a step that has been found in a crystal with TBP are very likely to be selectivity determinants (Table 5, red entries). Conversely, the property that selected as worst tetrad a step that has been found in a complex with TBP cannot be a selectivity determinant.

    V. Conclusions

    Properties of DNA that might be used by TBP to select its binding site have been identified as: positive slide, positive roll, low twist and high rise (Table 5, red entries).

    Best sequences for TBP binding have been identified as: alternating YR sequences, because YR steps have the highest rise (for the kink sites at steps 1 and 7), and RY steps have the low twist and high positive roll needed throughout the recognition element (Table 2 and Table 4).

    The specific structural and dynamic characteristics that predispose a DNA sequence for selective interaction with TBP were shown to be identifiable at the level of base pair steps. They show clear departures from the properties of general sequence DNA. The underlying molecular interactions that produce the special properties of these steps are the subject of continuing investigations.

    VI. References

    1. Kim, Y., Geiger, J.H., Hahn, S. and Sigler, P.B. Crystal structure of a yeast TBP/TATA-box complex. Nature 1993, 365, 512-520.

    2. Kim, J.L. and Burley, S.K. 1.9Å resolution refined structure of TBP recognizing the minor groove of TATAAAAG. Nature Structural Biology 1994, 1, 638-652.

    3. Nikolov, D.B., Chen, H., Halay, E.D., Hoffmann, A., Roeder, R.G. and Burley, S.K. Crystal structure of a human TATA box-binding protein/TATA element complex. Proc. Natl. Acad. Sci. U.S.A. 1996, 93, 4862-4867.

    4. Juo, Z.S., Chui, T.K., Leiberman, P.M., Baikalov, I., Berk, A.J. and Dickerson, R.E. How proteins recognize the TATA box. J. Mol. Biol. 1996, 261, 239-254.

    5. Nikolov, D.B., Chen, H., Halay, E.D., Usheva, A.A., Hisatake, K., Lee, D.K., Roeder, R.G. and Burley, S.K. Crystal structure of a TFIIB-TBP-TATA-element ternary complex. Nature 1995, 377, 119-128.

    6. Tan, S., Hunziker, Y., Sargent, D.F. and Richmond, T.J. Crystal structure of a yeast TFIIA/TBP/DNA complex. Nature 1996, 381, 127-134.

    7. Arndt, K.M., Ricupero, S.L., Eisenmann, D.M. and Winston, F. Biochemical and genetic characterization of a yeast TFIID mutant that alters transcription in vivo and DNA binding in vitro. Mol. Cell. Biol. 1992, 12, 2372-2382.

    8. Arndt, K.M., Wobbe, C.R., Ricupero-Hovasse, S., Struhl, K. and Winston, F. Equivalent mutations in the two repeats of yeast TATA-binding protein confer distinct TATA recognition specificities. Mol. Cell. Biol. 1994, 14, 3719-3728.

    9. MacKerell Jr., A.D., Wiórkiewicz-Kuczera, J. and Karplus, M. An all-atom empirical energy function for the simulation of nucleic acids. J. Am. Chem. Soc. 1995, 117, 11946-11975.

    10. Cornell, W.D., Cieplak, P., Bayly, C.L., Gould, I.R., Merz Jr., K.M., Ferguson, D.M., Spellmeyer, D.C., Fox, T., Caldwell, J.W. and Kollman, P.A. A second generation force field for the simulation of proteins, nucleic acids and organic molecules. J. Am. Chem. Soc. 1995, 117, 5179-5197.

    11. Olson, W.K. Simulating DNA at low resolution. Curr. Opinion in Struct. Biol. 1996, 6, 242-256.