*************************************************************************
*                                                                       *
*                           The ECEPP Package                           *
*                                                                       *
*************************************************************************

What the Package Does
---------------------

The program performs the following calculations:

 1) Single energy evaluation.
 2) Single energy minimization.
 3) Energy evaluation of multiple input conformations.
 4) Energy minimization of multiple input conformations.
 5) Monte Carlo search using a generalized MCM (EDMC) algorithm.
 6) Production of an energy map for a pair of dihedral angles.
 7) RMS deviation analysis.
 8) Variable Target Function procedure for structure determination.

Getting Started and Compiling the Eceppak Package
-------------------------------------------------

See the file "README" in the main eceppak directory.

How To Run this program
-----------------------

- The script to run the program is called recepp.s. When you source the
  cshrc file, an alias is set up to select the script for the correct
  architecture. The scripts are stored in the proper subdirectory of
  eceppak/Scripts.

  To run the program you must supply a set of arguments; the number of
  arguments depends on the architecture. To get precise information
  about the arguments that should be used, type

      recepp.s

  IMPORTANT: If "recepp.s" is not recognized, you need to source the
  cshrc file. If the command does not execute properly, check your cshrc
  file; it may have been set up incorrectly. See the previous point,
  "To Start", for this setup.

What's New
----------

* The old set of ECEPP input files has been replaced by a more flexible
  file structure.

* The main input file now contains a series of cards that define the
  type of run and its parameters.

* The residue data file has been enhanced. This file contains the ECEPP/3
  residues and other non-standard ones. There are 72 residues (including
  N-methyl residues), and new end groups are defined. The file is found
  under eceppak/Data/Residue/rsdata.
  Among the changes introduced in rsdata are:

  (a) Data on loop-closing pairs were added. The program uses a general
      treatment for these pairs (introduced by A. Liwo).
  (b) N-methyl residues are included.
  (c) Hydration atom types were added to the description of atoms (old
      hrs.data).
  (d) The description of 1-4 interactions is included in a more general
      format.
  (e) C' was replaced by C, and NP in PRO and HPRO was replaced by N, to
      increase compatibility with the PDB format.
  (f) The atom type of protons in COOH groups (ASP, GLU, meASP, meGLU and
      the carboxyl-terminal end group) was changed to type 1 (as in
      ECEPP/3, no H-bonding allowed).

* Hydration parameters for different surface models are provided under
  the subdirectory eceppak/data/Hydration_files. The SRFOPT set
  (srfopt.set) of parameters is the default. Other sets can be used by
  modifying the recepp.s script
  (eceppak/Script/$ARCHITECTURE/recepp.s).

Examples
--------

The input files provided as examples (directory eceppak/Test) give an
idea of the calculations the program can do. There are several
subdirectories, corresponding to the different types of runs eceppak can
perform.

FILE(S)            EXPLANATION
-------            -----------

enk_sol.inp        Calculation of surface solvation energy.
                   To execute type:
                   "recepp.s ENERGY enk_sol ENK_sol dummy dummy"

enk_checkgrad.inp  Checking gradient calculation.
                   "recepp.s CHECKGRAD enk_checkgrad ENKGRAD dummy dummy"

enk_sp.inp         Calculate energy using a soft-sphere potential.
                   "recepp.s ENERGY enk_sp ENKSP dummy dummy"

enk.inp            EDMC run.
                   "recepp.s EDMC enk enk_out dummy dummy"

mebmt.inp          Minimization (with output from minimizer).
                   "recepp.s MINIMIZE mpa1ot MPA1OT dummy dummy"

avian.inp          ECEPP/3 and solvation energy.
                   "recepp.s ENERGY avian AVIAN dummy dummy"

cala6.inp          Cyclic peptide and solvation energy.
                   "recepp.s ENERGY cala6 CALA6 dummy dummy"

hisp1.inp          EDMC run with two possible states for PRO (UP and
                   DOWN) and HIS (HID and HIE) residues.
"recepp.s EDMC hisp1 HISP1 dummy dummy" cys1.inp Input sequence with 1-letter code. "recepp.s ENERGY cys1 CYS1 dummy dummy" three_let.inp Input sequence with 3-letters code. "recepp.s ENERGY three_let THREE_LET dummy dummy" CPEP.inp Energy minimization of multiple input conformations. outo.CPEP set of conformations to be minimized. "recepp.s MINIMIZE CPEP CPEPout CPEP dummy" ala_map.inp Energy map. ala_rms1.inp RMS deviation analysis; generation of a reference conformation. outo.ala_rms Input conformations for comparison in ECEPP format. ala_HELIX.pdb Input for reference conformation generation in PDB format To execute type: "recepp.s RMS_FIT ala_rms1 ala_rms1 ala_rms ala_HELIX" As a result you get, among others, a file xray.ala_HELIX that could be save for future use. ala_rms2.inp RMS deviation analysis; comparison of a conformation (file in pdb format) with the reference one. ala135.pdb Input conformation for comparison in PDB format (with end groups). xray.ala_HELIX Reference conformation for comparison (in ECEPP format). To execute type: "recepp.s RMS_FIT ala_rms2 ALA_RMS2 ala135 ala_HELIX" timbck.inp Calculate upper and lower bounds for distance constraints tim.pdb runs from a pdb file. "recepp.s BOUNDS timbck TIMBCK tim" vtf_tim.inp Example of a run using the Variable Target Function procedure. outo.vtf_tim Usually constraints come from NMR experiments bounds.timbck " recepp.s VTF vtf_tim VTFOUT dummy timbck" tim_sp.inp Example of a Monte Carlo run combining distance constraints bounds.timbck and a soft-sphere potential (NMR refinement). Output files for comparison with your results are provided in directory test_output. NOTE: We have noticed that large differences can occur between EDMC runs in different architectures. This appears to be related to machine precision. In general, a single energy calculation will tell you if the ECEPP/3 energy function is working correctly. For EDMC runs, check if the program leads to a sequence of improved energies. 
*******************
*     TABLE 1     *
*******************

Conventions:
-----------

Residues can be specified using the ECEPP list number, a three-letter
code or a one-letter code.

----------------------------------------------------------------------
                   ECEPP    ECEPP     3-letters   1-letter
RESIDUE            LIST No. KIND      code        code
----------------------------------------------------------------------
ALANINE               1       -1      ALA         A
ASPARTIC ACID         2       -2      ASP         D
CYSTINE               3       -3      CYS         C_
GLUTAMIC ACID         4       -4      GLU         E
PHENYLALANINE         5       -5      PHE         F
GLYCINE               6        6      GLY         G
HISTIDINE (HID)       7       -7      HIS         H
ISOLEUCINE            8       -8      ILE         I
LYSINE                9       -9      LYS         K
LEUCINE              10      -10      LEU         L
METHIONINE           11      -11      MET         M
ASPARAGINE           12      -12      ASN         N
PROLINE-DOWN         13       13      PRO         P
GLUTAMINE            14      -14      GLN         Q
ARGININE             15      -15      ARG         R
SERINE               16      -16      SER         S
THREONINE            17      -17      THR         T
VALINE               18      -18      VAL         V
TRYPTOPHAN           19      -19      TRP         W
TYROSINE             20      -20      TYR         Y
CYSTEINE             21      -21      CYX         C
HYDROXYPRO-DOWN      22      -22      HPD         P<
NORLEUCINE           23      -23      NOR         N<
ORNITHINE            24      -24      ORN         O
HISTIDINE (HIE)      25      -26      HIE         H-
BENZYL-ASPARTATE     26      -30      BZD         B<
ORNITHINE +          27      -25      OR+         O+
HISTIDINE+ (HIP)     28      -27      HI+         H+
LYSINE +             29      -28      LY+         K+
ARGININE +           30      -29      AR+         R+
ASPARTIC ACID -      31      -31      AS-         D-
GLUTAMIC ACID -      32      -32      GL-         E-
PROLINE-UP           33       13      PRU         P%
AZETIDIN             34       13      AZE         P*
HYDROXYPRO-UP        35      -22      HPU         P>
TYROSINE -           36      -36      TY-         Y-
AMINOBUTYRIC ACI     37      -33      ABU         Z<
AMINOISOBUTYRIC      38      -38      AIB         Z>
SERINOLA             39      -39      SLA         S<
allo-ISOLEUCINE      40      -40      AIL         I*
AMINOBUTYRIC LOO     41      -41      ASU         U<
SXRAYIN1             42      -42      SXY         X
SLLXRAYIN            43      -43      SLX         X*
GLUTAMIC LOOP        44      -44      GLP         E_
LYSINE LOOP          45      -45      LYP         K_
DAB LOOP             46      -46      DAB         B_
GLYCINE LOOP         47       47      GYP         G_
LEUCINE LOOP         48      -48      LEP         L_
ASPARTIC LOOP        49      -49      ASX         D_
M-DUMMY50(mGLY)      50      -50      M50         @50
MeALANINE            51      -51      M-A         @A
MeASPARTIC ACID      52      -52      M-D         @D
MeCYSTINE            53      -53      M-C         @C_
MeGLUTAMIC ACID      54      -54      M-E         @E
MePHENYLALANINE      55      -55      M-F         @F
SARCOSINE            56      -56      SAR         @G
MeHISTIDINE          57      -57      M-H         @H
MeISOLEUCINE         58      -58      M-I         @I
MeLYSINE             59      -59      M-K         @K
MeLEUCINE            60      -60      M-L         @L
MeMETHIONINE         61      -61      M-M         @M
MeASPARAGINE         62      -62      M-N         @N
MeDUMMY63            63      -63      M63         @63
MeGLUTAMINE          64      -64      M-Q         @Q
MeARGININE           65      -65      M-R         @R
MeSERINE             66      -66      M-S         @S
MeTHREONINE          67      -67      M-T         @T
MeVALINE             68      -68      M-V         @V
MeTRYPTOPHAN         69      -69      M-W         @W
MeTYROSINE           70      -70      M-Y         @Y
Me-BMT               71      -71      BMT         @Z
MeORNITHINE          72      -72      MOR         @O
----------------------------------------------------------------------
                   ECEPP    ECEPP     3-letters   1-letter
END GROUPS         LIST No. KIND      code        code
----------------------------------------------------------------------
AMINO - H2            1        1      H2N         H
AMINO - H3+           2        2      H3N         H+
AMINO -CH3            3        3      CH3         M
AMINO-COCH3           4       -4      ACE         A
FORMYL                5       -5      FYL         F
END-PRO,CIS-H         6       -6      CHP         P-
END-PRO,TRANS-H       7       -7      THP         P
END-H2+-PRO           8       -8      AHP         P+
PYROGLUTAMIC          9       -9      PGL         G
AMINO (CYCLIZING     10       10      HN-         H_
CARBOXYL - COOH      11      -11      CXH         O
CARBOXYL - O         12       12      OCC         O-
CARBOXYL-CH3         13       13      CCC         L
CARBOXYL-NH2         14      -14      NCC         N
CARBOXYL-NHCH3       15      -15      NME         C
N, N - DIMETHYL      16      -16      DME         D
METHYL ESTER         17      -17      MES         T
ETHYL ESTER          18      -18      EES         E
AMINO-T-BOC          19       -9      BOC         B
CARBOXYL(CYCLIZI     20       20      CXL         O_
MPA (HALF S-S)       21      -21      MPA         R_
DMP (HALF S-S)       22      -22      DMP         D_
CPP(AX) (HALF S-     23      -23      CPP         C_
CARBOXYL-CH2F        24       24      CHF         S
OCA(AX) (HALF S-     25      -25      OCA         A_
OCA(EQ) (HALF S-     26      -26      OCE         E_
SCA(AX) (HALF S-     27      -27      SCA         S_
SCA(EQ) (HALF S-     28      -28      SCE         T_
CPP(EQ) (HALF S-     29      -29      CPE         F_
DANSYL               30      -30      DAN         W
CARBOXYL             31       31      CXX         X
AMINO-CYNAMONIC      32      -32      CYN         Y
________________________________________________________________________

Note:
----
`@' is used to indicate N-methyl residues.
`_' is generally used to indicate a bridging residue (e.g. C_ indicates
    CYSTINE).
`+' and `-' are used to indicate a charged residue (e.g. K+ indicates a
    charged lysine residue).

Description of the input file:
-----------------------------

The general input to the program is given through a file containing a
set of instructions. The program uses a parser to read these
instructions. The parser reads and interprets the first 78 characters of
a line. No distinction is made between lower-case and upper-case
letters. The symbols # and ! are used to indicate the beginning of a
comment.
When either of these symbols is encountered, the parser ignores the rest
of the line.

Instructions related to a given procedure are grouped into so-called
"Data Groups". A Data Group is identified by a main keyword that has the
symbol '$' as its first character, e.g. $EDMC, $CNTRL. The keyword $end
(or $END) must also be present, indicating the end of the Data Group. Any
word between the main keyword and $end is considered an instruction.

This is an example of a Data Group:

    $CNTRL
    runtyp=Energy
    $end

The following list contains the Data Groups already defined in ECEPPAK:

    $BOUNDS, $BOUND_DEF, $BRIDGE, $CNTRL, $DIST_CONST, $EDMC, $FFIELD,
    $GEOM, $GRID, $MINIM, $REGIONS, $RMSFIT, $SCAN, $SELEC_PDB, $SEQ,
    $SPEC, $ENERCALC, $VTF, $WINDOWS, $OVERLAP_GRP and $OMCIS.

Three of the Data Groups are essential; without them the program will
abort. They are $CNTRL, $SEQ and $GEOM.

    $CNTRL  is used mainly to indicate the type of calculation the user
            wants to perform.
    $SEQ    provides the sequence of the molecule under study.
    $GEOM   contains the set of internal variables (dihedral angles) of
            the initial conformation.

Description of the Data Groups
------------------------------

$CNTRL

  This data group defines the type of calculation the user would like to
  carry out. A few instructions common to different modules are also
  defined here. The data group is essential; the program will not
  proceed if it is not found. Keywords of this data group are:

KEYWORD           ARGUMENT          DESCRIPTION
------            -------           -----------
RUNTYP =                            Define the type of calculation.
                  ENERGY            - Compute energy.
                  CHECKGRAD         - Check analytical gradient vs.
                                      numerical.
                  MINIMIZE          - Carry out energy minimization.
                  EDMC              - Carry out EDMC/MCM Monte Carlo
                                      search.
                  RMS_FIT           - Compute RMS deviations and fitting.
                  BOUNDS            - Compute upper and lower bounds from
                                      a reference conformation and
                                      generate a constraint file for
                                      future use.
                  VTF               - Carry out a Variable Target
                                      Function study.
VERBOSE                             Print all information available.
CHISCAN                             Carry out a systematic search, with
                                    energy minimization, for low-energy
                                    conformations of side-chain dihedral
                                    angles. RUNTYP = MINIMIZE must be
                                    specified. The set of dihedral angles
                                    to be scanned must be specified using
                                    the data group $SCAN. NSTEP must also
                                    be specified.
NSTEP =           number            Number of steps for the side-chain
                                    search using the CHISCAN option,
                                    e.g. if nstep=6 the angles will be
                                    searched in increments of 60 degrees.
PRINT_CART                          Request printing of Cartesian
                                    coordinates.
OUTFORMAT =                         Format required for the output file
                                    containing the Cartesian coordinates.
                  ECEPP             ECEPP format.
                  PDB               PDB format.
                  AMBER             AMBER (history) format.
                  CNDO              CNDO format.
                  CA_PDB            PDB (with CA only) format.
                  SEL_PDB           PDB (for selected atoms only) format.
                                    These atoms must be specified within
                                    the $SELEC_PDB data group.
FILE =            name_of_file      Filename of the output Cartesian
                                    file. In the case of multiple
                                    conformations, a sequence of files
                                    will be written as name_of_fileNNN.*,
                                    where NNN is an integer from 000 to
                                    999.
NO_HYDRG_IN_PDB                     Omit printing H atoms in PDB files.
NRES =            number            Number of residues in the specified
                                    molecule. It is not essential; the
                                    program computes this value from the
                                    sequence (see $SEQ data group).
RES_CODE =                          Specifies the input format of the
                                    sequence.
                  ECEPP             ECEPP numbers are used. Default.
                  THREE_LETTER      Sequence specified using a
                                    three-letter code.
                  ONE_LETTER        Sequence specified using a one-letter
                                    code.
VAR_ANGLES =                        Used to define the set of variables.
                  ALL               All dihedral angles are variable.
                                    Default.
                  BACK              The backbone dihedral angles are
                                    variable.
                  SIDE              The side-chain dihedral angles are
                                    variable.
                  SPEC              Variable dihedral angles are
                                    specified through the $SPEC data
                                    group.
                  NONE              All dihedral angles are fixed.
                  PHPS              Only PHI and PSI backbone dihedral
                                    angles.
                  BKSD              Backbone dihedral angles.
VAR_RES =         number            Used to define as variables a group
                                    of dihedral angles from specific
                                    residues.
                                    VAR_RES represents the number of
                                    residues that contain variable
                                    dihedral angles. The specific
                                    residues (sequence positions) are
                                    entered through the $SPEC data group.
                                    The set of dihedral angles to be
                                    varied is defined by selecting a
                                    proper value of VAR_ANGLES.
                                    NOTE: Since the keyword VAR_RES works
                                    in combination with VAR_ANGLES,
                                    VAR_ANGLES cannot be set to SPEC.
TIME =            number            Estimated CPU time of the run. The
                                    program will end when this time limit
                                    is reached. Default is 10.0**10 sec.
EMINIMA =         number            Used to avoid printing high-energy
                                    conformations during multiple
                                    evaluations of energies or
                                    minimizations. Works in conjunction
                                    with the data groups $ENERCALC or
                                    $VTF.

NOTE: The following keywords are kept in the $CNTRL data group for
consistency with the previous version, but their use here is not
recommended. They were incorporated into other data groups.

SURFACE_OUT                         Print exposed surface for atoms. The
                                    keyword SOLVATION= SURFACE must be
                                    specified in data group $FFIELD.
MULT_CONF =                         This flag is used to request energy
                                    evaluation or minimization of
                                    multiple conformations.
                  READ              Conformations are read from a file
                                    (outo.*). The name of the input file
                                    is passed to the program through the
                                    recepp.s script as the 4th argument.
                  RANDOM            Generate conformations from random
                                    sets of dihedral angles. In this
                                    case, MAXIT and SEED must be
                                    specified.
                                    NOTE: The options of this keyword are
                                    equivalent to the keywords READ_CONF
                                    and RAND_START in the $ENERCALC and
                                    $VTF data groups.
MAXIT =           number            Maximum no. of randomly generated
                                    conformations. Used with
                                    MULT_CONF=RANDOM and SEED.
SEED =            number            Seed for the random number generator.
                                    Used with MULT_CONF=RANDOM and MAXIT.
REFERENCE                           Used to stop EDMC when the Zimmerman
                                    code of an accepted conformation
                                    matches that of the conformation
                                    provided as reference. It can now be
                                    specified in $EDMC. If used during
                                    energy evaluation (or minimization)
                                    or VTF, it prints the Zimmerman code
                                    of the conformations.
                                    This option is also available
                                    (recommended use) in the data groups
                                    $ENERCALC or $VTF using the keyword
                                    ZIMMERMAN_CODE.

$BOUND_DEF

  This data group works in combination with runtyp= BOUNDS (see $CNTRL
  keyword) and the data group $BOUNDS. The specific keywords of this
  data group are:

KEYWORD           ARGUMENT          DESCRIPTION
------            -------           -----------
TYPE_INPUT =
                  PDB_NO_ENDG       Default. Input file is PDB with no
                                    end groups.
                  PDB_WITH_ENDG     Input file is PDB with end groups.
DELT_R =                            Upper and lower bounds can be
                                    obtained by:
                  PERCENTAGE        A- adding and subtracting a
                                    percentage (PERCENT) of the actual
                                    distance (R) to the computed value of
                                    R, i.e. upper bound =
                                    R + (PERCENT/100)*R. Default.
                  FIXED             B- adding and subtracting a fixed
                                    value (FIXVAL) to the actual
                                    distances.
FIXVAL =          number            See explanation for DELT_R.
PERCENT =         number            See explanation for DELT_R.
WEIGHT =          number            Weight associated with the
                                    constraints.
IGNORE_H                            Don't stop if H cannot be identified.
MAXDIST =         number            Used to reduce the number of
                                    constraints. Only specified atoms
                                    separated by distances smaller than
                                    MAXDIST will be used (default is
                                    100000.0).
MINDIST =         number            Used to reduce the number of
                                    constraints. Only specified atoms
                                    separated by distances greater than
                                    MINDIST will be used (default is
                                    0.0).
FIRST_RESIDUE =   number            This keyword allows a portion of a
                                    PDB file to be read and used for the
                                    generation of distance constraints.
                                    FIRST_RESIDUE should correspond to
                                    the PDB number of the first residue
                                    in the sequence.
                                    Note: the sequence must be specified
                                    sequentially and no residues should
                                    be missing.
RESIDUE_GAP =     number            Distances will be computed for
                                    residues separated in sequence by
                                    RESIDUE_GAP or more residues (default
                                    is 0).

$BOUNDS

  This data group works in combination with runtyp= BOUNDS (see $CNTRL
  keyword) and the data group $BOUND_DEF. The group does not have
  specific keywords. It is used to enter the names of the atoms for
  which distance constraints are requested and the weight assigned to
  the constraint.
  Example: compute bounds between CA atoms and give them a weight of
  10.0:

      CA CA 10.0

$BRIDGE

  This data group is used to define the linkage between bridging
  residues. It requires the specification of pairs of numbers
  corresponding to the positions in the sequence of the bridging
  residues. The program recognizes residues that form bridges;
  consequently, there is no need to specify how many there are.

$DIST_CONST

  This data group is used to define the set of distance constraints. It
  works in combination with one of the following keywords:

    a- RUNTYP= VTF in the $CNTRL data group, or
    b- CONSTR_MOV in the $EDMC or $FFIELD data groups.

  The specific keywords of this data group are:

KEYWORD           ARGUMENT          DESCRIPTION
------            -------           -----------
N1PAIR =          number            - Number of bounds read using the
                                      atom number as identification. A
                                      tedious procedure, but needed from
                                      time to time.
N2PAIR =          number            - Number of bounds read using
                                      specific alpha-numeric characters
                                      for the atoms and the corresponding
                                      residue.
RESN1_IS_ONE                          This flag is used to introduce
                                      distance constraints associated
                                      with a sequence without end groups,
                                      i.e. the first full residue is
                                      numbered as 1 (the usual case for
                                      constraints obtained from a typical
                                      PDB file). ECEPP ALWAYS assumes
                                      that the chain has end groups.
                                      Consequently, sequence numbering is
                                      usually shifted by one (+1) from
                                      the PDB sequencing. The flag should
                                      be omitted (default) if the residue
                                      numbers in the distance-constraint
                                      file are the same as in ECEPP. (The
                                      sequence number is used to identify
                                      the atoms in subroutine CLASS.)
DIST_WEIGHT =     number            - A constant with units of
                                      kcal/mol/A that converts the "Sum
                                      of Squares of Errors" into energy.
                                      (WEI)
ADAPT_WEI                           - This and the following keywords are
                                      used by the EDMC method
                                      (experimental). ADAPT_WEI indicates
                                      that the weight assigned to the
                                      distance energy term, EDIS, should
                                      be adapted during the course of a
                                      conformational search. The goal is
                                      to control the value of the
                                      distance energy term during a
                                      simulation.
                                      This keyword should be specified in
                                      combination with:
                                      (a) PERCENT_WEI; or
                                      (b) PERCENT_WEI, DELTA_PERC_WEI,
                                          MAX_WEI and MIN_WEI.
PERCENT_WEI =     real_number       - Defines the 'expected' ratio
                                      between EDIS and the sum of the
                                      remaining energy terms. If
                                      DELTA_PERC_WEI is omitted, the
                                      algorithm will try to keep this
                                      ratio approximately constant during
                                      the run.
DELTA_PERC_WEI =  number              This flag is used to modulate the
                                      effect of the distance-constraint
                                      energy term on the search. It works
                                      in the following manner:
                                      DELTA_PERC_WEI/MAXIT is added to or
                                      subtracted from the initial
                                      PERCENT_WEI during the course of
                                      the run. In this way the algorithm
                                      tries to enforce the distance
                                      constraints (when DELTA_PERC_WEI is
                                      positive) while it proceeds toward
                                      lower energies; the search is
                                      directed toward constraint
                                      satisfaction. If DELTA_PERC_WEI is
                                      negative, on the other hand, the
                                      constraints become less important
                                      as the run evolves and the search
                                      is guided by the ECEPP/3 energy
                                      terms.
MAX_WEI =         number              Maximum allowed value for
                                      DIST_WEIGHT. Works in conjunction
                                      with PERCENT_WEI.
MIN_WEI =         number              Minimum allowed value for
                                      DIST_WEIGHT. Works in conjunction
                                      with PERCENT_WEI.
SOFT_SWITCH =     number              Use a linear distance-constraint
                                      function when the actual distance,
                                      d, is greater than the upper bound
                                      plus the specified number. From
                                      Feng Ni (BRI, Montreal).
SOFT_SLOPE =      number              Value of the slope of the linear
                                      function. From Feng Ni (BRI,
                                      Montreal).
NUMBER_OF_GROUPS = number             Indicates the number of groups
                                      (sets of protons) with overlapping
                                      resonances. This value, when
                                      specified, should be greater than
                                      one (1). From Feng Ni (BRI,
                                      Montreal).

$EDMC

  This data group works in combination with runtyp= edmc (see $CNTRL
  keyword). It is used to define parameters and different alternatives
  for the Monte Carlo search. The EDMC method is a procedure for
  searching the conformational space of a polypeptide.
  It is based on a Monte Carlo approach that combines minimization of
  the potential energy with a predictive algorithm that attempts to
  produce suitable rotations that lead to better energies. The specific
  keywords of this data group are:

KEYWORD           ARGUMENT          DESCRIPTION
------            -------           -----------
MCM                                 - Carries out a Monte Carlo with
                                      energy Minimization search rather
                                      than the search available through
                                      the EDMC method. It is a special
                                      case of EDMC in which all the
                                      perturbations are produced
                                      randomly.
MOTION =
                  CRANKSHAFT        - (the default) Backbone dihedral
                                      angles are associated in rotatable
                                      pairs, [psi(i-1), phi(i)] (where i
                                      is the residue in the i-th position
                                      in the sequence). When a member of
                                      a given pair is selected for a
                                      change, say a rotation 'delta', an
                                      opposite rotation, '-delta', is
                                      added to the second dihedral angle.
                                      This type of movement tends to
                                      preserve the global conformation of
                                      a folded polypeptide while changing
                                      the local conformation.
                  PIELA               Varies one backbone angle at a time
                                      (makes large changes).
                  LAMBDA              Varies the angles of rotation of
                                      peptide groups about virtual-bond
                                      (CA-CA) axes. Doesn't change the
                                      backbone shape much, but rather
                                      optimizes the orientation of
                                      peptide groups.
CONSTR_MOV                            Indicates that distance constraints
                                      should be used. See the $DIST_CONST
                                      data group to find out how to
                                      introduce distance constraints.
BACKUP =          number              Time interval, in seconds, at which
                                      restart information is punched
                                      (default 3600 s).
RESTART                               Flag to indicate that the program
                                      should continue a previous search.
                                      The program will look automatically
                                      for a backup file.
MAXIT =           number              Maximum number of steps (accepted
                                      conformations) in MCM/EDMC.
RAND_START                            Start from a randomly generated
                                      conformation. This keyword requires
                                      SEED to be defined.
OMEGA_180                             Works with RAND_START. Keep the
                                      omegas at 180.
RAND_TO_ELEC =    number              Pre-defined ratio of random to
                                      electrostatic sampling; default
                                      0.1. RAND_TO_ELEC=1.0 is equivalent
                                      to the flag MCM.
MAX_REPM =        number            - Maximum number of repetitions of a
                                      conformation.
MAX_RAND =        number            - Maximum number of random-prediction
                                      trials.
MAX_EL =          number            - Maximum number of
                                      electrostatic-prediction trials
                                      within an iteration.
MAX_THERMAL =     number            - Maximum number of thermal
                                      movements.
EFINAL =          number            - Target energy. This provides a way
                                      to stop the search when EFINAL is
                                      reached. Default is a very large
                                      negative number.
TEMP =            number            - Temperature used during normal
                                      stages of the search. The default
                                      is to run simulations at a constant
                                      temperature. However, there are two
                                      other alternatives: THERMAL_SHOCK
                                      and ADAPT_TEMP.
THERMAL_SHOCK                       - Thermal-shock Monte Carlo scheme.
                                      The system is suddenly "heated".
                                      Keywords that need to be specified
                                      are:
  T_LOW =         number            - Lower bound of temperature.
  T_UP =          number            - Upper bound of temperature.
  NTEMP =         number            - Number of steps in which the system
                                      is heated from T_LOW to T_UP.
ADAPT_TEMP                          - Adaptive temperature scheme. If
                                      NHEAT=NCOOL=1, we have
                                      THERMAL_SHOCK.
  NHEAT =         number            - Number of heating steps.
  NCOOL =         number            - Number of cooling steps.
  T_LOW =         number            - Lower bound of temperature.
  T_UP =          number            - Upper bound of temperature.
NPRINT_ELEC =     number            - Print electrostatic diagnostics
                                      every NPRINT_ELEC accepted
                                      conformations.
OMPROB =          number            - The a priori probability that an
                                      attempt is made to convert a cis
                                      peptide bond to a trans bond. The
                                      default is 5000, which means that
                                      the program will first attempt to
                                      make all the peptide bonds trans.
HISP_CHANGE =     number            - The probability that, in a given
                                      iteration, the program attempts to
                                      change the conformations of HIS and
                                      PRO in the sequence from PRO-UP to
                                      PRO-DOWN (or vice versa), or from
                                      HIE to HID (or vice versa)
                                      (default ??).
CONST_SEQ                           - The program will not change the
                                      protonation form of histidine or
                                      the internal geometry of proline.
TYPE_BKTK =                         - Defines the set of dihedral angles
                                      altered during backtracking (during
                                      heating of the system).
                  BACK              - Only backbone dihedral angles can
                                      be moved.
                  ALL               - All dihedral angles can be moved.
MAX_VAR_BKTK =    number            - Maximum number of variables that
                                      can be changed simultaneously
                                      during backtracking.
REGION_SAMP =                       - Use the set of sampling regions
                                      specified for specific amino acids.
                  UNIFORM           - Use uniform sampling through the
                                      specified regions.
                  NONUNIFORM        - Sample through the specified
                                      regions using the provided weights.
SEED =            number            - Initialization of the random-number
                                      generator. Any negative number.
PRINT_SAMPLED                       - Print "extra" information from
                                      sampling.
NWIND =           number            - Number of "windows" containing the
                                      specifications of the "bombing
                                      ranges", i.e. the ranges of the
                                      residues whose angles will be
                                      targeted by the random/
                                      electrostatic sampling procedure.
                                      The angles of the other residues
                                      will change only during
                                      minimizations; no changes will be
                                      made to them during sampling. This
                                      option is useful if you have made a
                                      point mutation in a large protein
                                      and want to establish quickly the
                                      effect of this mutation on the
                                      conformation. In such a case it is
                                      good to "bomb" only the mutated
                                      residue, instead of wasting
                                      "munitions" on the whole protein.
                                      Default is to "bomb" the whole
                                      molecule.
MAX_BCKB_REP =    number              The maximum number of times that
                                      the same backbone conformation can
                                      be accepted. When this limit is
                                      reached, newly generated
                                      conformations having the same
                                      Zimmerman code will be rejected,
                                      unless they improve on the current
                                      global minimum. Default value is
                                      20.
PROMET                                The omegas of Pro and N-Met
                                      residues will be searched with
                                      probabilities similar to those for
                                      PHIs and PSIs.
NPRINT_CONSTR =   number            - Print information about distance
                                      constraints every NPRINT_CONSTR
                                      accepted conformations.
REFERENCE                             Used to stop EDMC when the
                                      Zimmerman code of an accepted
                                      conformation matches that of the
                                      conformation provided as reference
                                      (initial conformation in file
                                      *.inp).

$ENERCALC

  This data group is used to request energy evaluation or energy
  minimization of one or many conformations.
  The specific keywords of this data group are:

KEYWORD           ARGUMENT          DESCRIPTION
------            -------           -----------
SINGLE_CONF                         - Carry out the procedure using as
                                      input the conformation provided in
                                      data group $GEOM.
READ_CONF                           - Carry out the procedure starting
                                      from the set of conformations
                                      provided in a separate input file
                                      (outo format).
RAND_START                          - Carry out the procedure starting
                                      from a set of randomly generated
                                      conformations.
OMEGA_180                             Works with RAND_START. Keep the
                                      omegas at 180.
MAXIT =           number            - Maximum no. of randomly generated
                                      conformations.
SEED =            number            - Seed for the random number
                                      generator.
REGION_SAMP =                       - Use the set of sampling regions
                                      specified for specific amino acids.
                  UNIFORM           - Use uniform sampling through the
                                      specified regions.
                  NONUNIFORM        - Sample through the specified
                                      regions using the provided weights.
BACKUP =          number            - This keyword should allow the
                                      procedure to be stopped cleanly.
                                      Not implemented yet.
RESTART                             - This keyword should allow the
                                      procedure to be restarted. Not
                                      implemented yet.
NO_MINIMIZATION                     - Used to check the energy terms
                                      related to the distance
                                      constraints. No VTF minimization is
                                      carried out.
CONSTR_MOV                          - This keyword indicates that
                                      distance constraints are used in
                                      the calculation. The key can be
                                      included, optionally, in the
                                      $FFIELD data group.
ZIMMERMAN_CODE                      - This option is used to print the
                                      Zimmerman code of the
                                      conformation(s).

$FFIELD

  Specific information about the force field used. The specific keywords
  of this data group are:

KEYWORD           ARGUMENT          DESCRIPTION
------            -------           -----------
FORCE_FIELD =
                  ECEPP             - ECEPP/3 force field (the default).
                  SIMPLE_POTENTIAL  - Max Vasquez's quartic potential for
                                      VDW distances.
                  AMBER             - Not implemented yet.
                  DISCOVER          - Not implemented yet.
                  CHARMM            - Not implemented yet.
SOLVATION =                         - Compute solvation energy (the
                                      default is NO solvation).
                  SURFACE           - Use surface-solvation models
                                      developed by J. Vila and
                                      R. Williams.
                  VOLUME            - Use volume-solvation model
                                      developed by Joe Augspurger
                                      (S_PAR_FILE=volume.set must be
                                      specified).
                  ELECTROSTATIC     - Not implemented yet. Intended to
                                      compute electrostatic solvation
                                      using the DELPHI program (B. Honig,
                                      Columbia Univ.).
                  ALL               - SURFACE + ELECTROSTATIC.
SURFACE_OUT                         - Print exposed surface for atoms.
                                      The keyword SOLVATION= SURFACE must
                                      be specified.
NO_SOLV_MIN                         - Used with SOLVATION to indicate
                                      that the solvation energy should be
                                      added to the total energy after
                                      energy minimization of a
                                      conformation, but not used during
                                      the energy minimization process.
RAD_FILE =        character_variable
                                    - Input file with radii parameters
                                      for different solvation types.
S_PAR_FILE =      character_variable
                                    - Input file with solvation
                                      parameters for different solvation
                                      types.
                                      SURFACE-HYDRATION FILES: srfopt.set
                                      (default), jrf.set, oons.set,
                                      solprmNW.nmr, optsl27.rall.
                                      VOLUME-HYDRATION FILE: volume.set.
OM_TRANS                            - Impose a special one-fold potential
                                      on all omega angles to keep them
                                      trans; this goes with the keyword
                                      FORC.
FORC =            number            - The torsional constant; the default
                                      value is 100.
NO_TORSIONALS                       - Omit torsional terms of the
                                      potential function.
THERMO
TSTART =          number
TEND =            number
NSTEP =           number
CONTACT_ENE =     number            - Defines the contact energy when
                                      using the simplified potential.
                                      Used with
                                      FORCE_FIELD=SIMPLE_POTENTIAL.
PH =              number            - pH value. Not used in the present
                                      version.
RES_DBASE =       character_variable
                                    - Used on some architectures (SUN) to
                                      define the residue data file, or to
                                      select a file other than the
                                      default "rsdata".
                                      Note: In general, the residue data
                                      file is specified in the script
                                      file recepp.s.
CUTOFF =                            - Used to define a cutoff in the
                                      energy terms.
                  NONE                Default.
                  BLOCK               Used when a set of dihedral angles
                                      is kept fixed during the
                                      computations. In that case, the
                                      CUTOFF keyword can be used to omit
                                      the calculation of 1-4 and 1-5
                                      interactions that don't vary during
                                      energy minimization.
                  DISTANCE_CA         Not implemented yet.
OVER_CUTOFF =     number            - Used to pre-minimize a conformation
                                      using a simple potential function
                                      until every single term of the
                                      energy is lower than the value
                                      specified by "number".
NON_OVERLAP_ENER                      Logical flag to request printing of
                                      the energy of a conformation after
                                      relief of atomic overlaps, when the
                                      conformation is subjected to energy
                                      minimization using the
                                      simple-potential function. Should
                                      be specified with OVER_CUTOFF or
                                      FORCE_FIELD=SIMPLE_POTENTIAL.
VARDIEL                             - Use a distance-dependent dielectric
                                      constant. Implementation of Feng Ni
                                      (BRI, Montreal).

NOTE: The following keyword is kept in the $FFIELD data group for
consistency with the previous version, but its use here is not
recommended. It was incorporated into other data groups.

CONSTR_MOV                          - This keyword indicates that
                                      distance constraints are used in
                                      the calculation. The key can be
                                      included, optionally, in the $EDMC
                                      data group.

$GEOM

  This is another essential data group, used to define the initial
  conformation of the molecule. The program will not proceed if the data
  group is not found. The data group should contain the LIST OF DIHEDRAL
  ANGLES IN A FORMATTED INPUT (15f8.3). One line per residue (or end
  group) is necessary, or the program will terminate with an error.
  Blank lines are permitted. In this case, all dihedral angles will be
  set to zero, except when random generation of the starting
  conformation is requested.

$GRID

  This data group can work in combination with RUNTYP=ENERGY or RUNTYP=
  MINIMIZE. It generates an energy grid (a two-dimensional energy map).
  If RUNTYP= MINIMIZE is specified, the program carries out the
  following procedure:

    1. The dihedral angles you define for the Phi-Psi map are kept fixed
       during minimization.
    2. The program minimizes the energy using the remaining variable
       dihedral angles.

  The program scans two dihedral angles (ANG1 and ANG2) starting from
  the values specified in FROM1 and FROM2, respectively.
There are two alternative ways of specifying the scan:

  a- Give the final values of the dihedral angles (TO1 and TO2) and the
     number of steps (N1 and N2).
  b- Give the step size (STEP1 and STEP2) and the number of steps (N1 and
     N2).

The specific keywords of this data group are:

KEYWORD          ARGUMENT               DESCRIPTION
------           -------                -----------
ANG1             = character_variable - Name used to describe the first
                                        dihedral angle. Allowed names are:
                                        PHI(n), PSI(n), OME(n), CHI(n),
                                        TAU(n), where n is the residue number.
ANG2             = character_variable - Name of the second dihedral angle.
FROM1            = number             - Initial value of the first dihedral
                                        angle.
FROM2            = number             - Initial value of the second dihedral
                                        angle.
TO1              = number             - Final value of the first dihedral
                                        angle.
TO2              = number             - Final value of the second dihedral
                                        angle.
STEP1            = number             - Step size of the first dihedral angle.
STEP2            = number             - Step size of the second dihedral
                                        angle.
N1               = number             - Number of steps for the first dihedral
                                        angle.
N2               = number             - Number of steps for the second
                                        dihedral angle.
OMSCAN-OK                             - Used to confirm scanning over an omega
                                        dihedral angle.

$MINIM

This data group is used to modify a few parameters of the minimization
routines of Gay (SUMSL, SMSNO).

The specific keywords of this data group are:

KEYWORD          ARGUMENT               DESCRIPTION
------           -------                -----------
MINIMIZER        =
                   SUMSL              - Use the unconstrained minimization
                                        solver with analytical gradient.
                   SMSNO              - Use the unconstrained minimization
                                        solver with numerical gradient.
MAXFUN           = number             - Maximum number of function evaluations
                                        allowed.
MAXIT            = number             - Maximum number of iterations allowed.
MAXSTEP          = number             - Maximum value for V(RADFAC).
VTNER1           = number             - Helps decide when to check for FALSE
                                        convergence [V(26)].
ABSTOL           = number             - The absolute function convergence
                                        tolerance [V(31)].
RELTOL           = number             - The relative function convergence
                                        tolerance [V(32)].
DSCALE           =                    - Not implemented yet.
                   NONE
                   FIXED
                   VARIABLE
DVALUE           = number             - Initialization value of the scale
                                        vector D.
FULL_PRINT                            - Controls SUMSL printing.
PRINT_RES_XG                          - Print the values of the X's, the
                                        gradient and the D's on return.
PRINT_STAT                            - Print a summary of statistics.
PRINT_INITIAL_X                       - Print the initial X's and D's.

$OMCIS

This data group is used to define the residues for which the reference
conformation of the peptide bond is cis.

Format:

   NRES res1 res2 ... resk

where NRES is the number of residues for which the cis conformation of the
peptide bond is taken as the reference, and res1, res2, ..., resk are the
positions of those residues in the sequence.

$OVERLAP_GRP

This data group is used in a version under development. It is used within
the VTF procedure to define sets of atoms with overlapping resonances. The
format is as follows:

   IGP  GLB  NG   IR1 G1    IR2 G2    IR3 G3   .........  IRn Gn
    1   HX1   1    17 HN
    2   HX2   5     7 HM0     6 HM1     6 HM2    15 HM1    15 HM2
    3   HX3   2     6 HM0    15 HM0

$REGIONS

This data group is used to define the sampling regions for amino acids in
a Monte Carlo type of search. The sampling can be UNIFORM or NONUNIFORM
(these are the values of the keyword REGION_SAMP, defined in data groups
$VTF and $EDMC).

If the sampling is UNIFORM, the input is specified using the following
format:

   residue_no. region1 region2 ..... regionM

If the sampling is NONUNIFORM, the input is specified as:

   residue_no. region1 weight1 region2 weight2 ..... regionM weightM

where:
  - residue_no. belongs to { 2, inumrs }.
  - regionI is one of the 16 regions of the PHI-PSI map using Zimmerman's
    code (A, A*, ..., H*), or any of the four Popov regions: H-, H+ (HELIX)
    and S-, S+ (SHEET).
  - weightI is an integer indicating the weight used to generate the
    sampling probability for the associated region.

Example of UNIFORM sampling:

   3 A A* C C*

Example of NONUNIFORM sampling:

   3 A 40 A* 10 C 30 C* 10

A continuation line should be indicated with the symbol '\'.

$RMSFIT

This data group is used to compare one or multiple conformations with a
reference conformation. It works in combination with the keyword
RUNTYP = RMS_FIT.
This module calculates atomic rms deviations, rms distance deviations and
radii of gyration, and is able to produce fittings of conformations.

The program reads different types of reference and input files. By default,
it tries to read the reference conformation from a file named
xray.NAME_REF (where NAME_REF is a name provided by the user; NAME_REF is
passed through an argument of the script that runs the program). When this
file does not exist, the keyword GENERATE_REF should be used to generate
it. By default, the reference conformation is generated from the set of
dihedral angles provided as input in the $GEOM data group. If a
conformation given in PDB format is going to be used as the reference, the
user should use the keyword TYPE_REF with the appropriate argument to
indicate this.

The specific keywords of this data group are:

KEYWORD          ARGUMENT               DESCRIPTION
------           -------                -----------
GENERATE_REF                          - Used to indicate the generation of a
                                        reference (or target) conformation in
                                        ECEPP format.
TYPE_REF         =                    - Indicates the input format of the file
                                        containing the reference (or target)
                                        conformation:
                   ECEPP              - ECEPP dihedral angles provided with
                                        the $GEOM data group in the *.inp
                                        file. Default.
                   PDB_NO_ENDG        - Typical PDB file where residue No. 1
                                        is the first full residue. No end
                                        groups.
                   PDB_WITH_ENDG      - Other files written in PDB format
                                        where the first and last residues are
                                        end groups.
TYPE_INPUT       =                    - Indicates the format of the file to be
                                        used as input (conformation(s) under
                                        study). Acceptable input formats are:
                   ECEPP              - ECEPP dihedral angles using `outo'
                                        format.
                   PDB_NO_ENDG        - Classical PDB file where residue No. 1
                                        is the first full residue. No end
                                        groups.
                   PDB_WITH_ENDG      - Other files written in PDB format
                                        where the first and last residues are
                                        end groups.
IGNORE_H                              - Tells the program to ignore mismatches
                                        in the H atom names when reading PDB
                                        files.
INIT_RES         = number             - Initial residue used in the
                                        calculation.
IFIN_RES         = number             - Final residue used in the calculation.
ALL_HVY_ATOMS                         - Calculate the rms of all heavy atoms
                                        of the specified residues.
ALPHA_CARBONS                         - Calculate the rms of the CA atoms of
                                        the specified residues.
BACKBONE                              - Calculate the rms of the backbone
                                        atoms (including CB) of the specified
                                        residues.
SIDE_CHAIN                            - Calculate the rms of the side-chain
                                        heavy atoms of the specified residues.
DISTANCE_RMS                          - Produce an additional report of the
                                        distance rms deviations of the input
                                        conformation(s) with respect to the
                                        reference conformation.
CA_TRACE                              - Works in conjunction with the keyword
                                        ALPHA_CARBONS. It is a flag to request
                                        the generation of a series of aligned
                                        pdb files with the CA traces.
PDB_ALIGN_HVY                         - Write a pdb file using alignment of
                                        all the heavy atoms.
PDB_ALIGN_CA                          - Write a pdb file using alignment of
                                        the CA atoms.
PDB_ALIGN_BACK                        - Write a pdb file using alignment of
                                        the backbone atoms.
PDB_ALIGN_SIDE                        - Write a pdb file using alignment of
                                        the side-chain heavy atoms.
METHOD           =                    - Defines the algorithm used to
                                        calculate the RMS:
                   GOLUB              - Golub method. The default.
                   KABSCH             - Kabsch method. This requires some IMSL
                                        routines.
FIRST_RESIDUE    = number             - Indicates that the first residue of
                                        the PDB reference file is numbered as
                                        `number' instead of 1.
ADOPT_REF_SEQ                         - Tells the program to adopt the
                                        sequence of the reference conformation
                                        when the conformations read have a
                                        different (or incompatible) sequence.

$SCAN

Scan carries out a systematic search over a set of specified dihedral
angles. Angles should be specified in the following way (free format):

   residue_no. no_of_dih_angles no_first_dih ... no_last_dih

$SELEC_PDB

This data group works in combination with the keywords PRINT_CART and
OUTFORMAT = SEL_PDB (in the $CNTRL data group). It is used to define the
set of atoms included in the output pdb file. A free format is used to
enter the atom numbers (integers).

$SEQ

This is the last ESSENTIAL data group. It is used to define the sequence
of the molecule.
There are three different ways in which the sequence can be defined:
(a) through ECEPP residue numbers (LIST); (b) using a three-letter code; or
(c) using a one-letter code. The keyword RES_CODE (in the $CNTRL data
group) is used to select among these options. If this keyword is omitted,
the program will attempt to read the sequence as ECEPP residue numbers.

Rules:
-----
(a) ECEPP residue numbers are read using free format (default). Numbers
    are integers defined as ECEPP LIST numbers. Check column 2 of Table I
    for the correct assignment. A blank space is required between numbers.
(b) Three-letter code. These are character variables defined in column 3
    of Table I. A blank space is required between words.
(c) One-letter code. These are character variables defined in column 4 of
    Table I. No blank space is required between descriptors (letters,
    usually).

$SPEC

This data group is used to specify the set of variable dihedral angles.
Its usage depends on the values of the keywords VAR_ANGLES and VAR_RES
($CNTRL data group).

(a) When VAR_ANGLES = SPEC is specified in the $CNTRL data group:

    1- The keyword VAR_RES should not be present in the $CNTRL data group.
    2- The $SPEC data group is obligatory, and it must contain the
       following specifications (in free format, one line per residue):

          res_num num_var num_1st_var ... num_last_var

       where:
          res_num is the sequence number of the residue containing
             variable dihedral angles;
          num_var is the number of variable dihedral angles in the
             residue;
          num_1st_var, ..., num_last_var is a list of numbers (integers)
             that point to the specific variable dihedral angles in the
             residue. The list must contain `num_var' integers.

(b) When VAR_RES = number_of_residues (number_of_residues is an integer)
    is specified in the $CNTRL data group:

    1- VAR_ANGLES can be given ANY value (all, back, bksd, etc.) with the
       EXCEPTION of `SPEC'.
    2- The $SPEC data group is required, and it must contain the sequence
       numbers of the residues whose dihedral angles will be defined as
       the variables. The residues should be given as a list of integers
       in free format.

(c) If VAR_RES and VAR_ANGLES are both omitted in the $CNTRL data group,
    or VAR_RES is omitted and VAR_ANGLES is set to a value different from
    SPEC, then the $SPEC data group is not required.

$VTF

This data group is used to define the parameters for a Variable Target
Function (VTF) calculation (see, as a reference, Vasquez and Scheraga,
J. Biomol. Struct. & Dyn. Vol. 5(4), 757-784 (1988)). It works in
combination with the keyword RUNTYP = VTF (data group $CNTRL) and the data
group $DIST_CONST.

The specific keywords of this data group are:

KEYWORD          ARGUMENT               DESCRIPTION
------           -------                -----------
SINGLE_CONF                           - Carry out the procedure using as input
                                        the conformation provided in the data
                                        group $GEOM.
READ_CONF                             - Carry out the procedure starting from
                                        the set of conformations provided in a
                                        separate input file (outo format).
RAND_START                            - Carry out the procedure starting from
                                        a set of randomly generated
                                        conformations.
OMEGA_180                             - Works with RAND_START. Keep the omegas
                                        at 180.
CONST_SEQ                             - The program will not change the
                                        protonation form of histidine or the
                                        internal geometry of proline.
MAXIT            = number             - Maximum number of randomly generated
                                        conformations.
SEED             = number             - Seed for the random number generator.
RANK_ORDER       = number             - Determines the way the distances are
                                        ordered for the minimization steps
                                        during an iteration (IORDER). There
                                        are three possibilities:
                                        RANK_ORDER = 0: order by range, `a la
                                        Braun-Go', i.e. rank one ==> distance
                                        between nearest-neighbor residues;
                                        rank two ==> distance between second
                                        nearest-neighbor residues; etc.
                                        RANK_ORDER = 1: keep the same order as
                                        in the input distances.
                                        RANK_ORDER = 2: order by growing from
                                        the N-terminus.
MAX_RANK                              - Parameter used to control the VTF
                                        procedure.
                                        Usually, the procedure is carried out
                                        by starting from random conformations
                                        and introducing the distance
                                        constraints up to MAX_RANK (typically
                                        10). From this run, a set of
                                        conformations is selected and a second
                                        run is carried out with the full set
                                        of distances (i.e. the final rank is
                                        equal to the number of residues in the
                                        chain).
VTF_BY_RANK                           - Indicates that distances should be
                                        included using the established ranks.
                                        Otherwise, the procedure introduces a
                                        few distances per minimization.
                                        NOTE: It is generally recommended to
                                        use the keyword VTF_BY_RANK.
STEP_RANK        = [-]number          - STEP_RANK > 0 defines the increment of
                                        the rank for sequential minimizations
                                        within an iteration of the VTF
                                        procedure (IFLOV). Additionally,
                                        STEP_RANK different from zero implies
                                        that torsional energy terms will NOT
                                        be included in the energy minimization
                                        and disulfide bridge (DSB) information
                                        will NOT be used. Closing DSBs at the
                                        beginning of the VTF procedure
                                        interferes greatly with the
                                        possibility of satisfying distance
                                        constraints for smaller ranks.
                                        Consequently, it is recommended to add
                                        the DSBs as an extra set of distance
                                        constraints. Since ECEPP assigns very
                                        high weights to force DSBs, adding
                                        them as additional distance
                                        constraints allows you to experiment
                                        with different values of the weights.
                                        If STEP_RANK < 0 is given, all the
                                        distance constraints will be included
                                        at once. With STEP_RANK = 0 the VTF
                                        procedure INCLUDES torsional energy
                                        and DSB information, and proceeds by
                                        including distance constraints with a
                                        rank increment of 1.
BIG_VIOLATION    = number             - Used to determine which conformations
                                        are reasonable (and should be saved)
                                        after the whole generation process is
                                        finished. If the maximum violation is
                                        greater than BIG_VIOLATION (+ 10%),
                                        the conformation is rejected. All the
                                        conformations VTF produces should be
                                        reasonable.
STEPS_ON_ERROR   = number             - At every step of the VTF procedure, we
                                        check whether the maximum violation
                                        exceeds BIG_VIOLATION.
                                        After 'n' consecutive steps of the VTF
                                        procedure that exceed BIG_VIOLATION
                                        (with n = STEPS_ON_ERROR), the
                                        generation process is aborted and a
                                        new trial is started. The idea is to
                                        save time otherwise spent on useless
                                        energy minimization. A conformation
                                        that does not satisfy the distance
                                        criteria after 'n' steps has little
                                        chance of reorganizing later, when
                                        additional distances are added.
                                        STEPS_ON_ERROR should not be very
                                        small, since some conformations with
                                        distances that exceed BIG_VIOLATION at
                                        a certain stage of generation can
                                        reorganize later on, as additional
                                        distances are included in the
                                        minimization. (From my experience,
                                        values for STEPS_ON_ERROR of 15 to 20
                                        seem to work better.)
REGION_SAMP      =                    - Use the set of sampling regions
                                        specified for specific amino acids.
                   UNIFORM            - Use uniform sampling through the
                                        specified regions.
                   NONUNIFORM         - Sample through the specified regions
                                        using the provided weights.
BACKUP           =                    - Intended to allow the procedure to be
                                        stopped cleanly. Not implemented yet.
RESTART                               - Intended to allow the procedure to be
                                        restarted. Not implemented yet.
NO_MINIMIZATION                       - Used to check the energy terms related
                                        to the distance constraints. No VTF
                                        minimization is carried out.
ZIMMERMAN_CODE                        - Print the Zimmerman code of the
                                        conformation(s).

$WINDOWS

This data group contains the ranges of residues whose dihedral angles will
be changed during sampling. There are as many non-empty lines as there are
ranges. Each line contains the following two integers, read in free
format: iw1 (the first residue of the range) and iw2 (the last residue of
the range).


Description of the Data included for each Residue in rsdata
_____________________________________________________________

A few changes to the original ECEPP/3 residue data file have been made to
add flexibility to the program. The goal is to minimize the coding of
instructions that are residue-specific.
example:

ISOLEUCINE ILE I 0 0 0.000 F
19 4-0.9437972 0.3305252-0.0993245 0.9950551
-8 4
1.35 3 1 4
1.35 3 1 5
1.35 3 1 6
1.35 3 1 7
0.350197-0.548499-0.759283 3 5 3 8 0 0
-0.379217 0.310917-0.871508 5 9 3 11 1 1
0.999962-0.005163-0.007059 5 10 3 14 0 -1
0.350197-0.548498-0.759284 10 16 3 17 0 0
                           N    14 22 -4.59 11  7 10
-0.4226  0.9063  0.0       HN    2  2  2.27  7  4  6
 1.4530  0.0     0.0       CA    9  7  0.82 17 11 16
 1.7797 -0.4805  0.9222    HA    1  0  0.26 11  7 10
 1.9888 -0.8392 -1.1617    CB    9  7 -0.06  0 17 19
 1.9587  1.4440  0.0       C     7 14  5.80 11  8 10
 1.1648  2.3835  0.0       O    17 26 -4.95  8  0  0
 1.6625 -1.8692 -1.0179    HB    1  0  0.32 17 11 16
 1.4086 -0.3635 -2.4951    CG2   6  5 -0.96 17 14 16
 3.5188 -0.8471 -1.1725    CG1   6  6 -0.25  0 11 13
 0.3225 -0.4528 -2.4713    HG2   1  0  0.32 14  0  0
 1.6840  0.6781 -2.6602    HG2   1  0  0.32 14  0  0
 1.8059 -0.9770 -3.3037    HG2   1  0  0.32 14  0  0
 3.8906  0.1742 -1.2551    HG1   1  0  0.19  0 17 19
 3.8906 -1.2463 -0.2289    HG1   1  0  0.19  0 17 19
 4.0546 -1.6863 -2.3342    CD1   6  5 -0.96  0  0  0
 3.7005 -2.7124 -2.2348    HD1   1  0  0.32  0  0  0
 3.7005 -1.2695 -3.2771    HD1   1  0  0.32  0  0  0
 5.1444 -1.6749 -2.3183    HD1   1  0  0.32  0  0  0
 3.2771  1.5756  0.0

Description of first line:

   (TITL(L,I),L=1,4),ARES(I),ONE_LET(I),NFATO(I),
   QQQ_READ(I),PK0_READ(I),NMETR

 - (TITL(L,I),L=1,4)  Residue name.
 - ARES(I)            Three-letter-code residue identifier, used for
                      sequence definition.
 - ONE_LET(I)         One-letter-code residue identifier, used for
                      sequence definition.
 - NFATO(I)           Indicates whether the 3 initial atoms (N, HN, and CA)
                      of the first full residue should be generated using
                      the data from the amino end (NFATO=0) or the data
                      from the residue (NFATO=1). In particular, this
                      assignment affects the charges of these atoms.
 - QQQ_READ(I)        Net charge of the ionized residue (used in specific
                      versions of the code).
 - PK0_READ(I)        pKa0 of the ionizable group (used in specific
                      versions of the code).
 - NMETHYL            Logical variable indicating whether this is an
                      N-methylated residue.

Description of second line:

   NATOMS(I),NCHI(I),SNTH2(I),CSTH2(I),SDEL(I),CDEL(I)

   Same as in ECEPP/3 manual.

Description of third line:

   KNDRES(I),NT,NGEOM(I),NTOR(I)

 - KNDRES(I) and NGEOM(I) same as in ECEPP/3 manual.
 - NTOR(I) is the number of torsional terms that are associated with
   EXPLICIT dihedral angles, while NT is the TOTAL number of torsional
   terms associated with a residue, i.e. including the possible angles of
   the bridge formed by this residue. The parameters of the IMPLICIT
   torsional angles (i.e. those which will be calculated from the
   Cartesian coordinates after a bridge is formed) are stored in the
   arrays after the parameters of the explicit angles.

Description of 4th to 7th lines:

   AR(J,I),NBB(J,I),NSS(J,I),NANG(J,I)

   Same as in ECEPP/3 manual.

Description of 8th to 11th lines:

   (CHIANG(L,J,I),L=1,3),NDPT1(J,I),NDPT2(J,I),NUM(J,I),LRT1(J,I),
   IBRNCH(J,KINDI),ISHFK(J,KINDI)

 - CHIANG, NDPT1, NDPT2, NUM and LRT1 same as in ECEPP/3 manual. LRT1 is
   used in this program (not in the original ECEPP/3).
 - IBRNCH  The program now handles more than one branch on the side
   chains. If there is a branch, it is defined specifically (IBRNCH = 1)
   for the bond that branches out.
 - ISHFK  For compatibility with the IUPAC conventions (ECEPP reads the
   torsional angles following this convention), a variable ISHFK is
   defined for each bond to indicate whether there is a shift of the bond
   definition given in rsdata. In some cases, like ILE, the organization
   of the rsdata file for generation purposes (in ECEPP/3) requires a
   different arrangement of the bond numbers. In the specific case of ILE,
   lines 9 and 10 indicate that the dihedral angle inputs for bonds 2 and
   3 have to be exchanged.

Description of 12th to 31st lines:

   (XOORD(L,J-1,I),L=1,3),ALPHA(J,I),LTYPE(J,I),NTYPE(J,I),CHG(J,I),
   NSN15(J,I),NSN14(J,I),NFN14(J,I)

 - XOORD, ALPHA, LTYPE, CHG, NSN15, NSN14 and NFN14 same as in ECEPP/3
   manual.
 - NTYPE  Atom type for surface solvation models.
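
The record layouts described above can be illustrated with a small reader.
The following Python sketch is not part of eceppak; it only shows how the
first line and the 8th to 11th lines of the ILE example could be split into
the fields named in this section. The function and class names are
hypothetical, the field order is taken from the descriptions above, and the
Fortran field widths used by the real program are not given here, so the
sketch assumes that the last six items of the first line are separated by
blanks and that numbers run together in a line (e.g. 0.350197-0.548499)
can be separated at their sign characters.

```python
import re
from typing import NamedTuple, Tuple

# Matches reals such as -0.548499 or integers such as -1. Exponent notation
# is not handled, and two adjacent positive numbers with no blank between
# them cannot be separated (assumption: this does not occur in these lines).
_NUMBER = re.compile(r"[-+]?\d*\.\d+|[-+]?\d+")


class ResidueHeader(NamedTuple):
    name: str        # TITL
    ares: str        # three-letter code
    one_let: str     # one-letter code
    nfato: int
    qqq_read: float
    pk0_read: float
    nmethyl: bool


class BondRecord(NamedTuple):
    chiang: Tuple[float, float, float]
    ndpt1: int
    ndpt2: int
    num: int
    lrt1: int
    ibrnch: int
    ishfk: int


def parse_header(line: str) -> ResidueHeader:
    """Split the first line of a residue entry, reading fields from the right
    so that a residue name containing blanks stays in one piece."""
    name, ares, one_let, nfato, qqq, pk0, flag = line.rsplit(None, 6)
    return ResidueHeader(name.strip(), ares, one_let, int(nfato),
                         float(qqq), float(pk0),
                         flag.upper() in ("T", ".TRUE."))


def parse_bond(line: str) -> BondRecord:
    """Split one of the 8th to 11th lines: 3 reals followed by 6 integers."""
    tokens = _NUMBER.findall(line)
    if len(tokens) != 9:
        raise ValueError("expected 9 numbers, found %d" % len(tokens))
    chiang = (float(tokens[0]), float(tokens[1]), float(tokens[2]))
    ndpt1, ndpt2, num, lrt1, ibrnch, ishfk = (int(t) for t in tokens[3:])
    return BondRecord(chiang, ndpt1, ndpt2, num, lrt1, ibrnch, ishfk)


if __name__ == "__main__":
    print(parse_header("ISOLEUCINE ILE I 0 0 0.000 F"))
    print(parse_bond("0.999962-0.005163-0.007059 5 10 3 14 0 -1"))
```

Reading the bond lines with a number pattern instead of a blank-separated
split is the one design choice worth noting: in the ILE example the three
CHIANG values touch each other, so a plain split would return them as a
single token.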


How to Build a File with Distance and/or Dihedral Angle Constraints (bounds.*)
______________________________________________________________________________

A distance constraint energy term can be used in the calculations. The
algorithm used in this program is a modification of the one originally
implemented in Max Vasquez's VTF (Vasquez, M. & Scheraga, H. A. 1988.
"Variable-Target-Function and Build-up procedures for the calculation of
protein conformation. Application to bovine pancreatic trypsin inhibitor
using limited simulated nuclear magnetic resonance data." J. Biomol.
Struct. Dyn. vol. 5, 757-784).

The functional form is:

   Econs = WEI_ENE * Sum [ wei(j) * (| rj - R |)^2 ];   j in {pairs}

   for rj > R with R an upper bound, or rj < R with R a lower bound,

where rj is the actual interproton distance;
      R is either an upper bound or a lower bound;
      wei is a factor (weight) used to make the constraint more (or less)
         relevant with respect to the others;
      WEI_ENE is a factor that weights the distance energy term with
         respect to the other energy terms (electrostatic, torsional,
         etc.).

Distance constraints are included in the calculations in the following
manner:

1.- Use the $DIST_CONST data group (NOT $BOUNDS) to specify the number of
    constraints and to set up other parameters.

2.- Generate a file (bounds.FILENAME) containing the information for each
    constraint, as in the following examples. There are two alternative
    ways to describe the constraints:

    a.- Using the ECEPP numbers of the specific atoms. The information
        should be written on one line per constraint (78 characters or
        less), and given in free format as:

           mol1 iatm1 mol2 iatm2 lowb upb weight

        where mol1   is the molecule containing the first atom (integer).
              iatm1  is the first atom defining the constraint (integer).
              mol2   is the molecule containing the second atom (integer).
              iatm2  is the second atom defining the constraint (integer).
              lowb   is the lower bound (real).
              upb    is the upper bound (real).
              weight is the weighting factor (real).
        example:

           1 34 1 51 1.900 5.000 10.0

    b.- Using residue numbers and atom names (the file is a FORMATTED
        one):

           mol1 res1 iatm1 mol2 res2 iatm2 lowb upb weight

        where mol1 is the molecule containing the first atom,
              res1 is the residue containing the first atom,
        and the remaining fields specify the atoms defining the distance,
        the upper and lower bounds, and a parameter (a weight) for the
        constraint.

        examples:

           1  1 HCA 1  1 HCB -1.000 3.000 10.0
           1  1 CA  1 29 CA   7.186 7.942 10.000

A line starting with ! is considered a comment. See the example file
bounds.timbck.

You should enter the number of constraints used in $DIST_CONST as
N1PAIR = mmm and N2PAIR = nnn, where N1PAIR and N2PAIR are the numbers of
constraints specified using formats (a) and (b), respectively.

NOTE: if a lower bound is -1.000, then VDW contact is assumed.

DIHEDRAL ANGLE CONSTRAINTS can also be included in the simulations. The
functional form of the penalty energy is the same one used for the
distance constraints (formula written above). The dihedral angle
constraints are included in the 'bounds.*' file as follows:

  i.   The word DIHEDRAL must come after the last distance constraint.
  ii.  The next line should contain a number (real) that represents the
       conversion factor (or penalty weight), WEIDIH (equivalent to
       WEI_ENE in the formula above).
  iii. Each subsequent line contains the description of one dihedral angle
       constraint with the following information: residue number, dihedral
       angle number, expected mean value, maximum deviation, and specified
       weight value.

example:

   DIHEDRAL
   100.00
   2 1 -40 40 1000.00
   5 2 -60 20 1000.00

where WEIDIH = 100.00. Two dihedral angle constraints are included:

   first:  residue 2, dihedral angle 1 (phi) is forced to adopt a value of
           -40 deg. with an allowed deviation of 40 (allowed values are
           those within the interval [-80, 0]).
   second: residue 5, dihedral angle 2 (psi) is forced to adopt values
           within the interval [-80, -40].


Random Number Generators
------------------------

The program uses two random number generators. The serial version uses the
VRND program (Prof. Ken Wilson). The parallel version uses PRNG (Prof. Mal
Kalos).

PRNG (parallel random number generator) is freely available by anonymous
ftp. It is easy to install on any 64-bit machine such as the SGI PC.

CTC staff can get it without ftp'ing: cp
/afs/theory/archive/ftp/pub/utilities/prng.tar.Z to wherever you want to
build it.

There are also two Makefiles in the eceppak PRNG directory,
Makefile.DEC8400 and Makefile.IBMSP2. Either of these makefiles should be
changed appropriately for the specific architecture on which the user
intends to install the program.

If you are not CTC staff, here is how you can get the tar file:

   ftp ftp.tc.cornell.edu
   log in as user anonymous
   give your email address as password
   cd pub/utilities
   get prng.tar.Z

-----------------------------------------------------------

IF the IMSL libraries are not available on your computer:

Edit the file orient1.F and comment out the lines:

   #ifdef AIX
         CALL DEVCSF (3,RTR,3,EIGVAL,T,3)
         IJUMP=1
   #endif

Also, remove "-limsl" from the "make" file, so that

   LIBS = -L/usr/local/lib -limsl

reads:

   LIBS = -L/usr/local/lib

Finally, recompile the program. Only the "GOLUB" option will work for the
calculation of rms deviations.