************************************************************************
*                                                                       *
*                           The ECEPP Package                           *
*                                                                       *
*************************************************************************

What the Package Does
---------------------

The program performs the following calculations:
  1) Single energy evaluation.
  2) Single energy minimization.
  3) Energy evaluation of multiple input conformations.
  4) Energy minimization of multiple input conformations.
  5) Monte Carlo search using a generalized MCM (EDMC) algorithm.
  6) Energy map for a pair of dihedral angles.
  7) RMS deviation analysis.
  8) Variable Target Function procedure for structure determination.

Getting Started and Compiling the Eceppak Package
-------------------------------------------------
See the file "README" in the main eceppak directory.

How To Run this program
-----------------------

 - The script that runs the program is called recepp.s. When you source the
   cshrc file, an alias is set up that selects the script for the correct
   architecture. The scripts are stored in the corresponding subdirectory of
   eceppak/Scripts.

   The program takes a set of arguments; the number of arguments depends on
   the architecture. To see exactly which arguments are expected, type
                 recepp.s

   IMPORTANT: If "recepp.s" is not recognized, you need to source the cshrc file.
              If the command does not execute properly, check your cshrc file;
              it may have been set up incorrectly. See the setup instructions
              referenced under "Getting Started and Compiling the Eceppak
              Package" above.

What's New
----------

* The old set of ECEPP input files has been replaced by a more flexible
    file structure.
* The main input file now contains a series of cards that define the type
    of run and its parameters.

* The residue data file has been enhanced.
    This file contains the ECEPP/3 residues and other non-standard ones.
    There are 72 residues (including N-methyl residues), and new end groups
    defined.
    The file is found under eceppak/Data/Residue/rsdata.

    Among the changes introduced in rsdata are:
    (a) Data on loop closing pairs was added. The program uses a general
        treatment for these pairs (introduced by A.Liwo).
    (b) It includes N-methyl residues.
    (c) Hydration atom types were added in the description of atoms
        (old hrs.data).
    (d) Description of 1-4 interactions is included in a more general format.
    (e) C' was replaced by C, NP in PRO and HPRO was replaced by N to increase
        compatibility with PDB format.
    (f) Atom type of protons in COOH groups (ASP, GLU, meASP, meGLU and the
        carboxyl end group) was changed to type 1 (as in ECEPP/3, no H-bonding
        allowed).

* Hydration parameters for different surface models are provided under
    the subdirectory eceppak/data/Hydration_files. The SRFOPT set (srfopt.set)
    of parameters is defined as the default. Other sets can be used by
    modifying the recepp.s script (eceppak/Script/$ARCHITECTURE/recepp.s).

Examples
--------
   The input files provided as examples (directory eceppak/Test)
   give an idea of the calculations the program can perform.
   The subdirectories there correspond to the different types of
   runs eceppak can perform.


FILE(S)                       EXPLANATION
-------                       -----------

enk_sol.inp       Calculation of surface solvation energy.
                  To execute type:
                  "recepp.s ENERGY enk_sol ENK_sol dummy dummy"

enk_checkgrad.inp Checking Gradient calculation.
                  "recepp.s CHECKGRAD enk_checkgrad ENKGRAD  dummy dummy"

enk_sp.inp        Calculate energy using a soft-sphere potential.
                  "recepp.s ENERGY enk_sp ENKSP dummy dummy"

enk.inp           EDMC run.
                  "recepp.s EDMC enk enk_out dummy dummy"

mebmt.inp         Minimization (with output from minimizer).
                  "recepp.s MINIMIZE mpa1ot MPA1OT dummy dummy"

avian.inp         ECEPP/3 and solvation energy.
                  "recepp.s ENERGY avian AVIAN dummy dummy"

cala6.inp         Cyclic peptide and solvation energy.
                  "recepp.s ENERGY cala6 CALA6 dummy dummy"

hisp1.inp         EDMC run with two possible states for PRO (UP and DOWN)
                  and HIS (HID and HIE) residues.
                  "recepp.s EDMC hisp1 HISP1 dummy dummy"

cys1.inp          Input sequence with 1-letter code.
                  "recepp.s ENERGY cys1 CYS1 dummy dummy"

three_let.inp     Input sequence with 3-letter code.
                  "recepp.s ENERGY three_let THREE_LET dummy dummy"

CPEP.inp          Energy minimization of multiple input conformations.
outo.CPEP         set of conformations to be minimized.
                  "recepp.s MINIMIZE CPEP CPEPout CPEP dummy"

ala_map.inp       Energy map.

ala_rms1.inp      RMS deviation analysis; generation of a reference
                  conformation.
outo.ala_rms      Input conformations for comparison in ECEPP format.
ala_HELIX.pdb     Input for reference conformation generation in PDB format.
                  To execute type:
                  "recepp.s RMS_FIT ala_rms1 ala_rms1 ala_rms ala_HELIX"
                  As a result you get, among others, a file xray.ala_HELIX
                  that can be saved for future use.

ala_rms2.inp      RMS deviation analysis; comparison of a conformation
                  (file in pdb format) with the reference one.
ala135.pdb        Input conformation for comparison in PDB format (with end groups).
xray.ala_HELIX    Reference conformation for comparison (in ECEPP format).
                  To execute type:
                  "recepp.s RMS_FIT ala_rms2 ALA_RMS2 ala135 ala_HELIX"

timbck.inp        Calculate upper and lower bounds for distance constraints
tim.pdb           from a PDB file.
                  "recepp.s BOUNDS timbck TIMBCK tim"

vtf_tim.inp       Example of a run using the Variable Target Function procedure.
outo.vtf_tim      Usually the constraints come from NMR experiments.
bounds.timbck     "recepp.s VTF vtf_tim VTFOUT dummy timbck"

tim_sp.inp        Example of a Monte Carlo run combining distance constraints
bounds.timbck     and a soft-sphere potential  (NMR refinement).
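
The example commands above share one pattern: run type, input file stem,
output name, and up to two further file stems, with the literal word "dummy"
filling unused slots. A minimal Python sketch that assembles such a command
line follows. The function name recepp_command and the padding to five
arguments are assumptions drawn from the examples only; the real number of
arguments depends on the architecture, so check the usage message printed by
recepp.s before relying on it.

```python
import shlex

def recepp_command(runtyp, inp, out, *extra, n_args=5):
    """Build a recepp.s command line (hypothetical helper).

    Assumption: unused trailing arguments are filled with 'dummy',
    as in the examples; n_args=5 is taken from those examples.
    """
    args = [runtyp, inp, out, *extra]
    if len(args) > n_args:
        raise ValueError("more arguments than the script is assumed to take")
    args += ["dummy"] * (n_args - len(args))
    # Quote each argument so unusual file names survive the shell.
    return "recepp.s " + " ".join(shlex.quote(a) for a in args)

print(recepp_command("ENERGY", "enk_sol", "ENK_sol"))
print(recepp_command("MINIMIZE", "CPEP", "CPEPout", "CPEP"))
```

The helper only builds the string; running it still requires that the cshrc
file has been sourced so that recepp.s is available.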

Output files for comparison with your results are provided in directory
test_output.
NOTE: We have noticed that large differences can occur between EDMC runs
on different architectures. This appears to be related to machine precision.
In general, a single energy calculation will tell you if the ECEPP/3 energy
function is working correctly. For EDMC runs, check if the program leads
to a sequence of improved energies.

                         *******************
                         *     TABLE 1     *
                         *******************
Conventions:
-----------
Residues can be specified using the ECEPP list number, a three-letter code or a
one-letter code.

----------------------------------------------------------------------
                             ECEPP    ECEPP        3-letter  1-letter
    RESIDUE                 LIST No.   KIND           code     code
----------------------------------------------------------------------

ALANINE                      1        -1               ALA      A
ASPARTIC ACID                2        -2               ASP      D
CYSTINE                      3        -3               CYS      C_
GLUTAMIC ACID                4        -4               GLU      E
PHENYLALANINE                5        -5               PHE      F
GLYCINE                      6         6               GLY      G
HISTIDINE (HID)              7        -7               HIS      H
ISOLEUCINE                   8        -8               ILE      I
LYSINE                       9        -9               LYS      K
LEUCINE                     10       -10               LEU      L
METHIONINE                  11       -11               MET      M
ASPARAGINE                  12       -12               ASN      N
PROLINE-DOWN                13        13               PRO      P
GLUTAMINE                   14       -14               GLN      Q
ARGININE                    15       -15               ARG      R
SERINE                      16       -16               SER      S
THREONINE                   17       -17               THR      T
VALINE                      18       -18               VAL      V
TRYPTOPHAN                  19       -19               TRP      W
TYROSINE                    20       -20               TYR      Y
CYSTEINE                    21       -21               CYX      C
HYDROXYPRO-DOWN             22       -22               HPD      P<
NORLEUCINE                  23       -23               NOR      N<
ORNITHINE                   24       -24               ORN      O
HISTIDINE (HIE)             25       -26               HIE      H-
BENZYL-ASPARTATE            26       -30               BZD      B<
ORNITHINE +                 27       -25               OR+      O+
HISTIDINE+ (HIP)            28       -27               HI+      H+
LYSINE +                    29       -28               LY+      K+
ARGININE +                  30       -29               AR+      R+
ASPARTIC ACID -             31       -31               AS-      D-
GLUTAMIC ACID -             32       -32               GL-      E-
PROLINE-UP                  33        13               PRU      P%
AZETIDIN                    34        13               AZE      P*
HYDROXYPRO-UP               35       -22               HPU      P>
TYROSINE -                  36       -36               TY-      Y-
AMINOBUTYRIC ACI            37       -33               ABU      Z<
AMINOISOBUTYRIC             38       -38               AIB      Z>
SERINOLA                    39       -39               SLA      S<
allo-ISOLEUCINE             40       -40               AIL      I*
AMINOBUTYRIC LOO            41       -41               ASU      U<
SXRAYIN1                    42       -42               SXY      X
SLLXRAYIN                   43       -43               SLX      X*
GLUTAMIC LOOP               44       -44               GLP      E_
LYSINE LOOP                 45       -45               LYP      K_
DAB LOOP                    46       -46               DAB      B_
GLYCINE LOOP                47        47               GYP      G_
LEUCINE LOOP                48       -48               LEP      L_
ASPARTIC LOOP               49       -49               ASX      D_
M-DUMMY50(mGLY)             50       -50               M50      @50
MeALANINE                   51       -51               M-A      @A
MeASPARTIC ACID             52       -52               M-D      @D
MeCYSTINE                   53       -53               M-C      @C_
MeGLUTAMIC ACID             54       -54               M-E      @E
MePHENYLALANINE             55       -55               M-F      @F
SARCOSINE                   56       -56               SAR      @G
MeHISTIDINE                 57       -57               M-H      @H
MeISOLEUCINE                58       -58               M-I      @I
MeLYSINE                    59       -59               M-K      @K
MeLEUCINE                   60       -60               M-L      @L
MeMETHIONINE                61       -61               M-M      @M
MeASPARAGINE                62       -62               M-N      @N
MeDUMMY63                   63       -63               M63      @63
MeGLUTAMINE                 64       -64               M-Q      @Q
MeARGININE                  65       -65               M-R      @R
MeSERINE                    66       -66               M-S      @S
MeTHREONINE                 67       -67               M-T      @T
MeVALINE                    68       -68               M-V      @V
MeTRYPTOPHAN                69       -69               M-W      @W
MeTYROSINE                  70       -70               M-Y      @Y
Me-BMT                      71       -71               BMT      @Z
MeORNITHINE                 72       -72               MOR      @O
----------------------------------------------------------------------

                             ECEPP    ECEPP        3-letter  1-letter
END GROUPS                   LIST No.   KIND           code     code
----------------------------------------------------------------------

AMINO - H2                   1         1               H2N      H
AMINO - H3+                  2         2               H3N      H+
AMINO -CH3                   3         3               CH3      M
AMINO-COCH3                  4        -4               ACE      A
FORMYL                       5        -5               FYL      F
END-PRO,CIS-H                6        -6               CHP      P-
END-PRO,TRANS-H              7        -7               THP      P
END-H2+-PRO                  8        -8               AHP      P+
PYROGLUTAMIC                 9        -9               PGL      G
AMINO (CYCLIZING            10        10               HN-      H_
CARBOXYL - COOH             11       -11               CXH      O
CARBOXYL - O                12        12               OCC      O-
CARBOXYL-CH3                13        13               CCC      L
CARBOXYL-NH2                14       -14               NCC      N
CARBOXYL-NHCH3              15       -15               NME      C
N, N - DIMETHYL             16       -16               DME      D
METHYL ESTER                17       -17               MES      T
ETHYL ESTER                 18       -18               EES      E
AMINO-T-BOC                 19        -9               BOC      B
CARBOXYL(CYCLIZI            20        20               CXL      O_
MPA (HALF S-S)              21       -21               MPA      R_
DMP (HALF S-S)              22       -22               DMP      D_
CPP(AX) (HALF S-            23       -23               CPP      C_
CARBOXYL-CH2F               24        24               CHF      S
OCA(AX) (HALF S-            25       -25               OCA      A_
OCA(EQ) (HALF S-            26       -26               OCE      E_
SCA(AX) (HALF S-            27       -27               SCA      S_
SCA(EQ) (HALF S-            28       -28               SCE      T_
CPP(EQ) (HALF S-            29       -29               CPE      F_
DANSYL                      30       -30               DAN      W
CARBOXYL                    31        31               CXX      X
AMINO-CYNAMONIC             32       -32               CYN      Y

________________________________________________________________________
Note:
----
`@' is used to indicate N-methyl residues.
`_' is generally used to indicate a bridging residue (e.g. C_ indicates
CYSTINE).
`+' and `-' are used to indicate a charged residue (e.g. K+ indicates
charged lysine residue).
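
As an illustration of how the three naming conventions of Table 1 relate to
each other, here is a small Python lookup sketch. It copies only a handful of
residue rows from the table; the dictionary and function names are
hypothetical, and the sketch assumes the sequence has already been split into
individual code tokens (how tokens are delimited in the input is not covered
here).

```python
# A few residue rows copied from Table 1: one-letter code -> (three-letter, list no.)
# Extend from the table as needed; end groups use a separate numbering.
TABLE1_SUBSET = {
    "A":  ("ALA", 1),
    "D":  ("ASP", 2),
    "C_": ("CYS", 3),    # CYSTINE (bridging)
    "G":  ("GLY", 6),
    "K":  ("LYS", 9),
    "C":  ("CYX", 21),   # CYSTEINE
    "K+": ("LY+", 29),   # charged lysine
    "@A": ("M-A", 51),   # N-methyl alanine
    "@G": ("SAR", 56),   # sarcosine
}

def to_three_letter(tokens):
    """Translate one-letter tokens to three-letter codes (case-insensitive)."""
    return [TABLE1_SUBSET[t.upper()][0] for t in tokens]

def to_list_numbers(tokens):
    """Translate one-letter tokens to ECEPP list numbers."""
    return [TABLE1_SUBSET[t.upper()][1] for t in tokens]

print(to_three_letter(["A", "K+", "c_", "@G"]))
print(to_list_numbers(["A", "K+", "c_", "@G"]))
```

A token that is not in the subset raises KeyError, which makes a missing
table row obvious rather than silently skipped.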

Description of the input file:
-----------------------------
The general input to the program is given through a file containing
a set of instructions, which the program reads with a parser.
The parser reads and interprets the first 78 characters of each line. No
distinction is made between lower-case and upper-case letters.
The symbols # and ! indicate the beginning of a comment.
When either of these symbols is encountered, the parser ignores the
rest of the line.
Instructions related to a given procedure are collected into
so-called "Data Groups". A Data Group is identified by a main keyword
whose first character is the symbol '$', e.g. $EDMC or $CNTRL.
The keyword $end (or $END) must also be present to mark the end of the
Data Group.
Any word between the main keyword and $end is considered an
instruction.
This is an example of a Data Group:

$CNTRL
runtyp=Energy
$end
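
The parsing rules described above (78-character limit, # and ! comments,
case-insensitive keywords, groups opened by a $ keyword and closed by $end)
can be sketched in a few lines of Python. This is not the ECEPPAK parser, only
an illustration of the stated rules; the function name is hypothetical, and
the sketch assumes that the group keyword and $end each sit on their own line.

```python
def parse_data_groups(text):
    """Return {GROUP_KEYWORD: [instruction lines]} for an input file.

    Illustration only. Assumption: '$NAME' and '$end' each occupy
    their own line; anything else on such a line is ignored.
    """
    groups = {}
    current = None
    for raw in text.splitlines():
        line = raw[:78]                    # only the first 78 characters are read
        for mark in "#!":                  # a comment runs to the end of the line
            line = line.split(mark, 1)[0]
        line = line.strip()
        if not line:
            continue
        first = line.split()[0].upper()    # keywords are case-insensitive
        if first == "$END":
            current = None
        elif first.startswith("$"):
            current = first
            groups[current] = []
        elif current is not None:
            groups[current].append(line)   # keep instructions as written
    return groups

sample = "$CNTRL\nrunTyp=Energy  ! type of run\n# a comment line\n$end\n"
print(parse_data_groups(sample))
```

Instruction lines are stored as written rather than upper-cased, so values
such as file names keep their original spelling.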

The following list contains the Data Groups already defined in ECEPPAK:

$BOUNDS, $BOUND_DEF, $BRIDGE, $CNTRL, $DIST_CONST, $EDMC, $FFIELD,
$GEOM, $GRID, $MINIM, $REGIONS, $RMSFIT, $SCAN, $SELEC_PDB, $SEQ,
$SPEC, $ENERCALC, $VTF, $WINDOWS, $OVERLAP_GRP and $OMCIS.


Three of the Data Groups are considered essential and without them
the program will abort. They are: $CNTRL, $SEQ and $GEOM.

$CNTRL is used mainly to indicate the type of calculation the user
      wants to perform.

$SEQ  provides the sequence of the molecule under study.

$GEOM contains the set of internal variables (dihedral angles) of the
      initial conformation.
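
Because the program aborts when any of the three essential Data Groups is
missing, it can be convenient to check an input file before submitting a run.
The following Python sketch does that; it is a hypothetical pre-flight helper,
not part of ECEPPAK, and it only looks for the group keywords using the
comment and 78-character rules given earlier.

```python
import re

ESSENTIAL_GROUPS = {"$CNTRL", "$SEQ", "$GEOM"}

def missing_essential_groups(text):
    """Return the sorted list of essential Data Groups absent from text."""
    found = set()
    for raw in text.splitlines():
        # Keep the first 78 characters, cut at the first comment symbol.
        line = re.split(r"[#!]", raw[:78], maxsplit=1)[0].strip().upper()
        if line.startswith("$"):
            name = line.split()[0]
            if name != "$END":
                found.add(name)
    return sorted(ESSENTIAL_GROUPS - found)

print(missing_essential_groups("$CNTRL\nruntyp=Energy\n$end\n"))
```

An empty list means all three groups were found; it says nothing about
whether their contents are valid.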

Description of the Data Groups
------------------------------
$CNTRL
This Data group is used to define the type of calculation the user would like to
carry out. Also, there are a few instructions, common to different modules, that
are defined here. The data group is essential. The program will not proceed
if the data group is not found.

Keywords of this data group are:
   KEYWORD        ARGUMENT                DESCRIPTION
   ------         -------                 -----------

   RUNTYP      =                       Define the type of calculation.

                 ENERGY               -Compute energy.
                 CHECKGRAD            -Check analytical gradient vs. numerical.
                 MINIMIZE             -Carry out energy minimization.
                 EDMC                 -Carry out EDMC/MCM Monte Carlo search.
                 RMS_FIT              -Compute rms deviations and fitting.
                 BOUNDS               -Compute upper and lower bounds from
                                        a reference conformation and generate
                                        a constraint file for future use.
                 VTF                  -Carry out a Variable Target Function
                                         study.

   VERBOSE                             Print all information available.

   CHISCAN                             Carry out a systematic search with energy
                                       minimization for low-energy conformations
                                       of the side-chain dihedral angles. The
                                       keyword RUNTYP = MINIMIZE is required.
                                       The set of dihedral angles to be scanned
                                       should be specified using the data group
                                       $SCAN. NSTEP should also be specified.

           NSTEP = number              Number of steps for the side-chain search
                                       using the CHISCAN option, e.g. if NSTEP=6
                                       the angles will be searched in increments
                                       of 60 degrees.

   PRINT_CART                          To request printing of Cartesian coord.

       OUTFORMAT   =                    Format required for the output file
                                        containing the Cartesian coordinates.
                     ECEPP              ECEPP format.
                     PDB                PDB format.
                     AMBER              AMBER (history) format.
                     CNDO               CNDO format.
                     CA_PDB             PDB format (CA atoms only).
                     SEL_PDB            PDB format (selected atoms only).
                                        These atoms should be specified within
                                        the $SELEC_PDB data group.

       FILE        =  name_of_file      Filename of the output Cartesian file.
                                        In the case of multiple conformations, a
                                        sequence of files will be written
                                        as name_of_fileNNN.*, where NNN is an
                                        integer from 000 to 999.

       NO_HYDRG_IN_PDB                  Omit printing H atoms in PDB files

   NRES        = number                 Number of residues in the specified
                                        molecule. It is not essential; the
                                        program will compute this value from
                                        the sequence (see $SEQ data group).

   RES_CODE    =                        Specifies the input format of the sequence.
                 ECEPP                  ECEPP numbers are used. Default.
                 THREE_LETTER           Sequence specified using a three-letter code.
                 ONE_LETTER             Sequence specified using a one-letter code.

   VAR_ANGLES  =                        Used to define the set of variables.
                 ALL                    All dihedral angles are variable. Default.
                 BACK                   Only the backbone dihedral angles are variable.
                 SIDE                   Only the side-chain dihedral angles are variable.
                 SPEC                   Variable dihedral angles are specified through
                                        the $SPEC data group.
                 NONE                   All dihedral angles are fixed.
                 PHPS                   Only the PHI and PSI backbone dihedral angles.
                 BKSD                   Backbone dihedral angles.

   VAR_RES     =  number                Used to define as variables a group of
                                        dihedral angles from specific residues.
                                        VAR_RES represents the number of residues that
                                        contain variable dihedral angles.
                                        The information of the specific residues
                                        (sequence position) is entered through
                                        the $SPEC data group.
                                        The set of dihedral angles to be varied is
                                        defined by selecting a proper value of VAR_ANGLES.
                                        NOTE: Since the keyword VAR_RES works in combination
                                        with VAR_ANGLES, VAR_ANGLES cannot be set to SPEC.

   TIME        =  number                Estimated CPU time of the run. Program
                                        will end when this time limit is reached.
                                        Default is  10.0**10 sec.

   EMINIMA     =  number                Used to avoid printing high-energy
                                        conformations during multiple evaluation
                                        of energies or minimizations.
                                        Works in conjunction with the data groups
                                        $ENERCALC or $VTF.
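
The relation between NSTEP and the scan increment given for the CHISCAN option
above (NSTEP=6 gives 60 degrees) is simply 360/NSTEP. A minimal Python sketch
of the resulting grid for one dihedral angle follows. The function name and
the starting value of -180 degrees are assumptions made for illustration; the
actual starting angle used by the program is not stated here.

```python
def chiscan_angles(nstep, start=-180.0):
    """Return the nstep scan values for one dihedral angle, in degrees.

    Assumption: the scan starts at `start` and covers a full turn;
    only the increment 360/nstep comes from the keyword description.
    """
    if nstep <= 0:
        raise ValueError("nstep must be a positive integer")
    increment = 360.0 / nstep
    return [start + k * increment for k in range(nstep)]

print(chiscan_angles(6))
```

With several angles listed in $SCAN, a systematic search over all of them
would grow as nstep to the power of the number of angles, which is worth
keeping in mind when choosing NSTEP.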

NOTE: The following keywords are still accepted in the $CNTRL data group for
consistency with previous versions, but their use is not recommended. They have
been incorporated into other data groups.

   SURFACE_OUT                          Print exposed surface for atoms.
                                        The keyword SOLVATION= SURFACE must be
                                        specified in datagroup $FFIELD

   MULT_CONF    =                       This flag is used to request energy
                                        evaluation or minimization of multiple
                                        conformations.
                  READ                  Conformations are read from file (outo.*).
                                        The name of the input file is passed to
                                        the program through the recepp.s script
                                        as the 4th argument.
                  RANDOM                Generate conformations from random sets
                                        of dihedral angles. In this case, MAXIT
                                        and SEED must be specified.
                                        NOTE: The options of this keyword are
                                        equivalent to keywords READ_CONF and
                                        RAND_START in $ENERCALC and $VTF data groups.

          MAXIT       = number          Maximum no. of randomly generated conformations.
                                        Used with MULT_CONF=RANDOM and SEED.

          SEED       =  number          Seed for random number generator. Used with
                                        MULT_CONF=RANDOM and MAXIT.

   REFERENCE                            Used to stop EDMC when the Zimmerman
                                        code of an accepted conformation
                                        matches that of the conformation
                                        provided as reference. This can now
                                        be specified in $EDMC.
                                        If used during energy evaluation (or
                                        minimization) or VTF, it will print the
                                        Zimmerman code of the conformations. This
                                        option is also available (recommended use)
                                        in data groups $ENERCALC or $VTF using
                                        the keyword ZIMMERMAN_CODE.


$BOUND_DEF
This data group works in combination with runtyp= BOUNDS  (see $CNTRL keyword)
and the data group $BOUNDS.

The specific keywords of this data group are:

   KEYWORD        ARGUMENT                DESCRIPTION
   ------         -------                 -----------

     TYPE_INPUT =
                  PDB_NO_ENDG        Default. input file is PDB with
                                     no end groups.
                  PDB_WITH_ENDG      input file is PDB with end groups.

     DELT_R     =                   Upper and lower bounds can be obtained by:
                  PERCENTAGE        A- adding to and subtracting from the actual
                                     distance (R) a percentage (PERCENT) of R,
                                     i.e. upper bound = R + (PERCENT/100)*R and
                                     lower bound = R - (PERCENT/100)*R. Default.
                  FIXED             B- adding and subtracting a fixed value
                                     (FIXVAL) to the actual distances.


     FIXVAL = number                See explanation for DELT_R.

     PERCENT = number               See explanation for DELT_R.

     WEIGHT = number                Weight associated to the constraints.

     IGNORE_H                       Don't stop if H cannot be identified.

     MAXDIST = number               Is used to reduce the number of constraints.
                                    Only specified atoms separated by distances
                                    smaller than MAXDIST will be used.
                                     (default is 100000.0).

     MINDIST = number               Is used to reduce the number of constraints.
                                    Only specified atoms separated by distances
                                    greater than MINDIST will be used
                                     (default is 0.0).

     FIRST_RESIDUE = number          This keyword allows a portion of a PDB
                                     file to be read and used for the
                                     generation of distance constraints.
                                     FIRST_RESIDUE should correspond to the
                                     PDB number of the first residue in the
                                     sequence. Note: the sequence must be
                                     specified sequentially and no residues
                                     should be missing.

    RESIDUE_GAP = number             Distances will be computed for residues
                                     separated in sequence by RESIDUE_GAP or
                                     more residues (default is 0).
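
The arithmetic behind DELT_R, MINDIST, MAXDIST and RESIDUE_GAP described above
can be written out as a short Python sketch. The function names are
hypothetical and this is not the program's code; it only restates the rules:
bounds are R plus or minus either a percentage of R or a fixed value, and a
pair is kept when its distance lies between MINDIST and MAXDIST and its
sequence separation is at least RESIDUE_GAP.

```python
def distance_bounds(r, delt_r, percent=None, fixval=None):
    """Return (lower, upper) bounds for a reference distance r."""
    mode = delt_r.upper()
    if mode == "PERCENTAGE":
        delta = (percent / 100.0) * r     # PERCENT of the actual distance
    elif mode == "FIXED":
        delta = fixval                    # FIXVAL, same units as r
    else:
        raise ValueError("DELT_R must be PERCENTAGE or FIXED")
    return r - delta, r + delta

def keep_pair(r, res_i, res_j, mindist=0.0, maxdist=100000.0, residue_gap=0):
    """Apply the MINDIST, MAXDIST and RESIDUE_GAP filters to one atom pair."""
    return mindist < r < maxdist and abs(res_i - res_j) >= residue_gap

print(distance_bounds(8.0, "PERCENTAGE", percent=25.0))
print(distance_bounds(8.0, "FIXED", fixval=0.5))
print(keep_pair(8.0, 3, 10, maxdist=12.0, residue_gap=4))
```

PERCENT and FIXVAL are passed explicitly because no default values are given
for them in the keyword list.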

$BOUNDS
This data group works in combination with runtyp= BOUNDS  (see $CNTRL keyword)
and the data group $BOUND_DEF. The group does not have specific keywords. It
is used to enter the names of the atoms for which distance constraints are
requested and the weight assigned to each constraint.
Example: compute bounds between CA atoms and give them a weight of 10.0

CA CA  10.0
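
For scripts that generate or check $BOUNDS entries, a line such as the one
above can be read as two atom names followed by a weight. A minimal Python
sketch, assuming exactly three whitespace-separated fields per line (the
function name is hypothetical, and other line layouts are not handled):

```python
def parse_bounds_line(line):
    """Split a $BOUNDS entry into (atom_a, atom_b, weight).

    Assumption: the line holds exactly two atom names and one weight.
    """
    atom_a, atom_b, weight = line.split()
    return atom_a.upper(), atom_b.upper(), float(weight)

print(parse_bounds_line("CA CA  10.0"))
```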


$BRIDGE
This data group is used to define the linkage between bridging residues.
The data group requires the specification of pairs of numbers corresponding to
the position in sequence of the bridging residues. The program recognizes
residues that form bridges. Consequently, there is no need to specify the
number of them.

$DIST_CONST
This data group is used to define the  set of distance constraints.
It works in combination with one of the following keywords:
   a- RUNTYP= VTF in the $CNTRL data group, or
   b- CONSTR_MOV in $EDMC or $FFIELD data groups.

The specific keywords of this data group are:

   KEYWORD        ARGUMENT                DESCRIPTION
   ------         -------                 -----------

     N1PAIR        = number      - Number of bounds read using atom number
                                   as identification.  A tedious procedure
                                   but needed from time to time.

     N2PAIR        = number      - Number of bounds read using specific
                                  alpha-numeric characters for the atoms
                                  and corresponding residue.

     RESN1_IS_ONE                  This flag is used to introduce distance constraints
                                   associated with a sequence without end groups, i.e.
                                   the first full residue is numbered as 1 (usual case
                                   of constraints obtained from a typical PDB file).
                                   ECEPP ALWAYS assumes that the chain has end groups.
                                   Consequently sequence numbering is usually shifted
                                   by one (+1) from the PDB sequencing.
                                   The flag should be omitted (default) if the residue
                                   numbers in the distance-constraint file are the same
                                   as in ECEPP.  (The sequence number is used to identify
                                   the atoms in subroutine CLASS).

     DIST_WEIGHT    = number      - A constant with units of kcal/mol/A that converts
                                   the "Sum of Squares of Errors" into energy. (WEI)

     ADAPT_WEI                    - This and the following keywords are used by the
                                    EDMC method (experimental). ADAPT_WEI is used to
                                    indicate that the weight assigned to the distance
                                    energy term, EDIS, should be adapted during the
                                    course of a conformational search.
                                    The goal is to control the value of the distance
                                    energy term during a simulation. This keyword
                                    should be specified in combination with:
                                    (a) PERCENT_WEI;  or
                                    (b) PERCENT_WEI,  DELTA_PERC_WEI, MAX_WEI and
                                    MIN_WEI.


     PERCENT_WEI    = real_number - Defines the 'expected' ratio between EDIS and the
                                    sum of the remaining energy terms.
                                    If the DELTA_PERC_WEI is omitted, the algorithm
                                    will try to keep this ratio approximately constant
                                    during the run.

     DELTA_PERC_WEI = number        This flag is used to modulate the effect of the
                                    distance constraint energy term on the search.
                                    Works in the following manner:
                                    DELTA_PERC_WEI/MAXIT will be added or subtracted
                                    from the initial PERCENT_WEI during the course of
                                    the run. In this way the algorithm tries to enforce
                                    the distance constraints (when DELTA_PERC_WEI is
                                    positive) while it proceeds toward lower energies.
                                    The search will be directed toward constraint
                                    satisfaction.
                                    If DELTA_PERC_WEI is negative, on the other hand,
                                    the constraints will be less important as the run
                                    evolves and the search will be guided by the
                                    ECEPP/3 energy terms.

     MAX_WEI         = number       Maximum allowed value for DIST_WEIGHT; works in
                                    conjunction with PERCENT_WEI.

     MIN_WEI         = number       Minimum allowed value for DIST_WEIGHT; works in
                                    conjunction with PERCENT_WEI.

     SOFT_SWITCH    = number        Use a linear distance constraint function when
                                    the actual distance, d, is greater than the
                                    upper bound plus the specified number.
                                    From Feng Ni (BRI, Montreal).


     SOFT_SLOPE     = number        Value of the slope of the linear function.
                                    From Feng Ni (BRI, Montreal).

     NUMBER_OF_GROUPS = number      Indicates the number of groups (sets of protons)
                                    with overlapping resonances. This value, when
                                    specified, should be greater than one (1).
                                    From Feng Ni (BRI, Montreal).
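
The effect of DELTA_PERC_WEI described above can be illustrated with a short
Python sketch. This is not the ECEPP implementation. It assumes that one
increment of DELTA_PERC_WEI/MAXIT is applied per accepted step, and that the
adapted weight is simply clipped to the interval [MIN_WEI, MAX_WEI]. The
function names are hypothetical.

```python
def percent_schedule(percent_wei, delta_perc_wei, maxit):
    """Target EDIS ratio at steps 0..maxit.

    Assumption: one increment of delta_perc_wei / maxit per accepted step.
    """
    if maxit < 1:
        raise ValueError("maxit must be at least 1")
    increment = delta_perc_wei / maxit
    return [percent_wei + step * increment for step in range(maxit + 1)]


def clamp_weight(weight, min_wei, max_wei):
    """Keep an adapted DIST_WEIGHT inside [min_wei, max_wei] (assumed rule)."""
    return max(min_wei, min(max_wei, weight))


# A positive DELTA_PERC_WEI makes the constraints more important over the run.
schedule = percent_schedule(percent_wei=10.0, delta_perc_wei=5.0, maxit=5)
clipped = clamp_weight(7.5, min_wei=1.0, max_wei=5.0)
```

With a negative DELTA_PERC_WEI the same schedule decreases, which corresponds
to the constraints becoming less important as the run evolves.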

$EDMC
This data group works in combination with runtyp= edmc  (see $CNTRL keyword).
This data group is used to define  parameters and different alternatives for
the Monte Carlo search.
The EDMC method is a procedure for searching the conformational space of a
polypeptide. It is based on a Monte Carlo approach that combines minimization
of the potential energy and a predictive algorithm that attempts to produce
suitable rotations that lead to better energies.

The specific keywords of this data group are:

   KEYWORD        ARGUMENT                DESCRIPTION
   ------         -------                 -----------

     MCM                         - Carries out a Monte Carlo with energy
                                   Minimization search rather than the search
                                   available through the EDMC method.
                                   It is a special case of EDMC, in which all
                                   the perturbations are produced randomly.

     MOTION    =
                 CRANKSHAFT      - (the default) - backbone dihedral angles
                                   are associated in rotatable pairs,
                                   [psi(i-1), phi(i)] (where i is the position
                                   of the residue in the sequence).
                                   When a member of a given pair is selected
                                   for a change, say a rotation 'delta', then
                                   an opposite rotation, '-delta', is added
                                   to the second dihedral angle. This type
                                   of movement tends to preserve the global
                                   conformation of a folded polypeptide while
                                   changing the local conformation.

                 PIELA             Varies one backbone angle at a time (makes
                                   large changes)

                 LAMBDA            Varies the angles of rotation of peptide
                                   groups about virtual bond (CA-CA) axes.
                                   Does not change the backbone shape much, but
                                   rather optimizes the orientation of peptide
                                   groups.

     CONSTR_MOV                    Indicates that distance constraints should
                                   be used. See $DIST_CONST keyword to find
                                   out how to introduce  distance constraints.

     BACKUP     = number           Time interval in seconds in which restart
                                   information is punched.  (default 3600 s)

     RESTART                       Flag to indicate that the program should
                                   continue a previous search. The program will
                                   look automatically for a backup file.

     MAXIT     = number            Maximum number of steps (accepted
                                   conformations) in MCM/EDMC

     RAND_START                    Start from a randomly-generated conformation.
                                   This keyword requires SEED to be defined.

     OMEGA_180                     Works with RAND_START. Keep the omega's at 180.

     RAND_TO_ELEC     = number      Pre-defined ratio of random to electrostatic
                                    sampling; default 0.1. RAND_TO_ELEC=1.0 is
                                    equivalent to the flag MCM.

     MAX_REPM     = number        - Maximum number of repetitions of a
                                    conformation.

     MAX_RAND     = number        - Maximum number of random-prediction trials.

     MAX_EL       = number        - Maximum number of electrostatic-prediction
                                    trials within an iteration.

     MAX_THERMAL     = number     - Maximum number of thermal movements.

     EFINAL     = number          - Target Energy. This represents a way to
                                    stop the search when EFINAL is reached.
                                    default is a very large negative number.

     TEMP     = number            - Temperature used during normal stages
                                    of the search.
                                    By default, simulations are run at
                                    a constant temperature. However, there
                                    are two other alternatives:
                                    'Thermal_shock' and 'adapt_temp'.

     THERMAL_SHOCK                - Thermal shock Monte Carlo scheme. The
                                    system is suddenly "heated". Keywords that
                                    need to be specified are:

          T_LOW     = number      - lower bound of temperature.

          T_UP     = number       - upper bound of temperature.

          NTEMP     = number      - Number of steps in which the system is
                                    heated from T_LOW  to T_UP.

     ADAPT_TEMP                   - Adaptive temperature scheme.
                                    If NHEAT=NCOOL=1, we have THERMAL_SHOCK.

        NHEAT     = number        - Number of heating steps.

        NCOOL     = number        - Number of cooling steps.

        T_LOW     = number        - lower bound of temperature.

        T_UP     = number         - upper bound of temperature.

     NPRINT_ELEC     = number     - printing of electrostatic diagnosis
                                    every NPRINT_ELEC accepted conformations.

     OMPROB     = number          - The a priori probability that the program
                                    attempts to convert a cis peptide bond to a
                                    trans bond. The default is 5000, which means
                                    that the program will first attempt to
                                    make all the peptide bonds trans.

     HISP_CHANGE = number         - The probability that, in a given iteration,
                                    the program attempts to change the
                                    conformations of HIS and PRO in the sequence
                                    from PRO-UP to PRO-DOWN (or vice versa), or
                                    from HIE to HID (or vice versa) (default ??).

     CONST_SEQ                    - The program will not change the protonation
                                    form of histidine and the internal geometry
                                    of proline.

     TYPE_BKTK       =            - Defines the set of dihedral angles altered
                                    during backtracking (during heating of the
                                    system).

                       BACK       - Only backbone dihedral angles can be moved.

                       ALL        - All dihedral angles can be moved.

     MAX_VAR_BKTK = number        - Maximum number of variables that can be
                                    changed simultaneously during backtrack.

     REGION_SAMP     =            - Use the set of sampling regions specified
                                    for specific amino acid.

                       UNIFORM    - Use uniform sampling through specified
                                    regions

                       NONUNIFORM - Sample through specified regions using
                                    provided weights.

     SEED     = number            - Initialization of the random-number
                                    generator. Any negative number

     PRINT_SAMPLED                - Print "extra" information from sampling.

     NWIND     = number           - Number of "windows" containing the
                                    specifications of the "bombing ranges", i.e.
                                    the ranges of the residues whose angles
                                    will be targeted by random/electrostatic
                                    sampling procedure. The angles of the other
                                    residues will only change during minimiza-
                                    tions; no changes will be made in them
                                    during sampling. This option is useful, if
                                    you made a point mutation in a large
                                    protein and want to establish quickly the
                                    effect of this mutation on conformation.
                                    In such a case it is good to "bomb" only
                                    the mutated residue, instead of wasting
                                    "munitions" on the whole protein. Default
                                    is to "bomb" the whole molecule.

     MAX_BCKB_REP = number          The maximum number of times that the same
                                    backbone conformation can be accepted. When
                                    this limit is attained, newly generated
                                    conformations having the same Zimmerman code
                                    will be rejected, unless they improve on
                                    the current global minimum. Default value is 20.

     PROMET                         The omegas of Pro and N-Met residues will be searched
                                    with similar probabilities as for PHIs and PSIs.


     NPRINT_CONSTR   = number     - printing of information about distance constraints
                                    every NPRINT_CONSTR accepted conformations.

     REFERENCE                      Used to stop EDMC when the Zimmerman
                                    code of an accepted conformation
                                    matches the one corresponding to the
                                    conformation provided as reference (initial
                                    conformation in file *.inp).
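
The CRANKSHAFT motion described under MOTION above can be sketched in Python
as follows. This is only an illustration of the paired rotation of
[psi(i-1), phi(i)], not the code used by ECEPP. The function names, the
0-based residue index and the wrapping of angles into [-180, 180) are
assumptions made for the example.

```python
def wrap_angle(angle):
    """Map an angle in degrees into the interval [-180, 180)."""
    return (angle + 180.0) % 360.0 - 180.0


def crankshaft_move(phi, psi, i, delta):
    """Rotate psi[i-1] by +delta and phi[i] by -delta (degrees).

    phi and psi are lists indexed by residue (0-based, an assumption).
    New lists are returned; the inputs are left unchanged.
    """
    if not 1 <= i < len(phi):
        raise ValueError("i must identify a residue with a predecessor")
    new_phi, new_psi = list(phi), list(psi)
    new_psi[i - 1] = wrap_angle(psi[i - 1] + delta)
    new_phi[i] = wrap_angle(phi[i] - delta)
    return new_phi, new_psi


phi = [-60.0, -60.0, -60.0]
psi = [-40.0, -40.0, -40.0]
new_phi, new_psi = crankshaft_move(phi, psi, i=1, delta=30.0)
```

Because the two rotations are equal and opposite, only the peptide unit
between the two residues is reoriented in this picture, which is consistent
with the remark that the global conformation tends to be preserved.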

$ENERCALC
  This data group is used to request energy evaluation or energy minimization
  of one or more conformations.

The specific keywords of this data group are:

   KEYWORD        ARGUMENT                DESCRIPTION
   ------         -------                 -----------

     SINGLE_CONF                  - Carry out the procedure using as input the
                                    conformation provided in data group $GEOM

     READ_CONF                    - Carry out the procedure starting
                                    from the set of conformations provided
                                    in a separate input file (outo format).

     RAND_START                   - Carry out the procedure starting
                                    from a set of randomly-generated
                                    conformations.

     OMEGA_180                     Works with RAND_START. Keep the omega's at 180.

     MAXIT       = number         - Maximum no. of randomly generated conformations.

     SEED   =  number             - Seed for the random number generator.

     REGION_SAMP     =            - Use the set of sampling regions specified
                                    for specific amino acid.

                       UNIFORM    - Use uniform sampling through specified
                                    regions

                       NONUNIFORM - Sample through specified regions using
                                    provided weights.

     BACKUP      =   number       - This keyword is intended to allow the
                                    procedure to be stopped cleanly. Not
                                    implemented yet.

     RESTART                      - This keyword is intended to allow the
                                    procedure to be restarted. Not implemented
                                    yet.

     NO_MINIMIZATION              - Used to check the energy terms related to
                                    the distance constraints. No VTF minimization
                                    is carried out.

     CONSTR_MOV                   - This keyword is used to indicate that distance
                                    constraints are  used in the calculation. The key
                                    can be included, optionally, in the $FFIELD data
                                    group.

     ZIMMERMAN_CODE               - This option is used to print the Zimmerman Code
                                     of the conformation(s).

$FFIELD
 - Specific information about the force field used.

The specific keywords of this data group are:

   KEYWORD        ARGUMENT                DESCRIPTION
   ------         -------                 -----------

      FORCE_FIELD    =

                        ECEPP              - ECEPP/3 force field (the default).
                        SIMPLE_POTENTIAL   -  Max Vasquez's quartic potential
                                              for VDW distances.
                        AMBER              - Not implemented yet.
                        DISCOVER           - Not implemented yet.
                        CHARMM             - Not implemented yet.

      SOLVATION      =                     - Compute solvation energy.
                                            (the default is NO solvation)

                        SURFACE            - Use surface-solvation models
                                             developed by J. Vila and R. Williams.

                        VOLUME             - Use volume-solvation model developed by
                                             Joe Augspurger (S_PAR_FILE=volume.set
                                             must be specified).

                        ELECTROSTATIC      - Not implemented yet. Is intended
                                             to compute electrostatic solvation
                                             using  the DELPHI program
                                             (B. Honig, Columbia Univ.).

                        ALL                - SURFACE + ELECTROSTATIC

          SURFACE_OUT                      - Print exposed surface for atoms.
                                             The keyword SOLVATION= SURFACE must be
                                             specified.

      NO_SOLV_MIN                          - Used with SOLVATION to indicate
                                             that solvation energy should
                                             be added to the total energy after
                                             energy-minimization of a
                                             conformation, but not used during
                                             the energy minimization process.

      RAD_FILE       = character_variable  - Input file with radius parameters
                                             for different solvation types.

      S_PAR_FILE     = character_variable  - input file with solvation parameters
                                             for different solvation  types.
                                             SURFACE-HYDRATION FILES:srfopt.set (default),
                                             jrf.set,oons.set,solprmNW.nmr,optsl27.rall.
                                             VOLUME-HYDRATION FILE:volume.set.

      OM_TRANS                             - Impose a special one-fold potential
                                             on all omega angles to keep them
                                             trans; this goes with the keyword
                                             FORC.

            FORC           = number        - The torsional constant;
                                             the default value is 100

      NO_TORSIONALS                        - Omit torsional terms of the
                                             potential function.

      THERMO

      TSTART         = number

      TEND           = number

      NSTEP          = number

      CONTACT_ENE    = number              - Defines the contact energy when
                                             using the simplified potential.
                                             Used with FORCE_FIELD=SIMPLE_POTENTIAL

      PH             = number              - pH  value. Not used in the present version.

      RES_DBASE      = character_variable  - Used on some architectures (SUN) to define
                                            the residue data file, or to select a different
                                            file than the default "rsdata".
                                            Note: In general, the residue data file is
                                            specified in the script file recepp.s.

      CUTOFF         =                     - Used to define cutoff in the energy terms.

                         NONE                default.

                         BLOCK               Used when a set of dihedral angles are kept
                                             fixed during the computations. In that case,
                                             the CUTOFF keyword can be used to omit
                                             the calculations of 1-4 and 1-5 interactions
                                             that don't vary during energy minimization.

                         DISTANCE_CA         Not implemented, yet.

      OVER_CUTOFF    = number              - Used to pre-minimize a conformation
                                             using a simple potential function until
                                             every single term of the energy is lower
                                             than the value specified by "number".

      NON_OVERLAP_ENER                     - Logical flag to request printing of the
                                             energy of a conformation after relief of
                                             atomic overlaps when the conformation is
                                             subjected to energy minimization using the
                                             simple-potential function.
                                             Should be specified with OVER_CUTOFF or
                                             FORCE_FIELD=SIMPLE_POTENTIAL.

      VARDIEL                              - Use a distance-dependent dielectric
                                             constant. Implementation of Feng
                                             Ni (BRI, Montreal).

NOTE: The use of the following keywords in the $FFIELD data group is kept for
consistency with previous versions but is not recommended. They have been
incorporated into other data groups.

      CONSTR_MOV                           - This keyword is used to indicate
                                             that distance constraints are
                                             used in the calculation. The
                                             key can be included, optionally,
                                             in the $EDMC data group.

$GEOM
This is another essential data group, used to define the initial conformation
of the molecule. The program will not proceed if the data group is not found.
The data group should contain the LIST OF DIHEDRAL ANGLES IN FORMATTED INPUT
(15f8.3). One line per residue (or end group) is necessary; otherwise the
program will terminate with an error. Blank lines are permitted. In this case,
all dihedral angles will be set to zero, except when random generation of the
starting conformation is requested.
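
To make the (15f8.3) layout concrete, the following Python sketch writes and
reads one line of up to 15 dihedral angles in fields of 8 characters. It is
only an illustration of the fixed-width format. The function name and the
treatment of blank fields as zero are assumptions, and Fortran's implied
decimal point for fields written without a '.' is not handled.

```python
def parse_geom_line(line, width=8, max_fields=15):
    """Split one fixed-width (15f8.3) line into a list of floats.

    Assumption: a blank field inside the line is read as 0.0.
    """
    line = line.rstrip("\n")
    values = []
    for k in range(max_fields):
        field = line[k * width:(k + 1) * width]
        if not field:          # the line ended before this field
            break
        values.append(float(field) if field.strip() else 0.0)
    return values


# Write three angles in f8.3 fields, then read them back.
geom_line = "".join(f"{value:8.3f}" for value in (-57.0, -47.0, 180.0))
angles = parse_geom_line(geom_line)
```

Writing the line with the same field width that is used for reading avoids
the column-alignment mistakes that are easy to make when typing the angles by
hand.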

$GRID
This data group works in combination with RUNTYP=ENERGY or
RUNTYP= MINIMIZE. It generates an energy grid (a two-dimensional energy map).
If RUNTYP= MINIMIZE is specified, the program will carry out the following
procedure:
1. The dihedral angles you define for the Phi-Psi map are kept fixed
   during minimization.
2. The program minimizes the energy with respect to the remaining variable
   dihedral angles.

The program scans two dihedral angles (ANG1 and ANG2) starting from
the values specified in FROM1 and FROM2, respectively. There are two
alternative ways of specifying the scan:
a- Give the final values of the dihedral angles (TO1 and TO2) and the
   number of steps (N1 and N2).
b- Give the step size (STEP1 and STEP2) and the number of steps (N1 and N2).
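
The two alternatives can be compared with a small Python sketch that lists the
values visited for one dihedral angle. It is not taken from the program. It
assumes that N steps produce N+1 grid points including both end points, and
the function name is hypothetical.

```python
def grid_values(start, n_steps, stop=None, step=None):
    """Angles scanned for one dihedral, from FROM with either TO or STEP.

    Assumption: n_steps steps give n_steps + 1 points, end points included.
    """
    if n_steps < 1:
        raise ValueError("n_steps must be at least 1")
    if (stop is None) == (step is None):
        raise ValueError("give exactly one of stop (TO) or step (STEP)")
    if step is None:
        step = (stop - start) / n_steps
    return [start + k * step for k in range(n_steps + 1)]


# Alternative a: FROM1=-180, TO1=180, N1=12
by_final_value = grid_values(-180.0, 12, stop=180.0)
# Alternative b: FROM1=-180, STEP1=30, N1=12
by_step_size = grid_values(-180.0, 12, step=30.0)
```

Under this assumption both specifications describe the same scan, so a full
map over two angles would contain (N1+1)*(N2+1) energy evaluations.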

The specific keywords of this data group are:

   KEYWORD        ARGUMENT                DESCRIPTION
   ------         -------                 -----------

      ANG1      = character_variable       - Name used to describe the first
                                             dihedral angle. Allowed names
                                             are: PHI(n),PSI(n),OME(n),CHI(n),
                                             TAU(n), where n is the residue number.

      ANG2      = character_variable       - Name of the second dihedral angle.

      FROM1      = number                  - Initial value of first dihedral
                                              angle.

      FROM2      = number                  - Initial value of second dihedral
                                              angle.

      TO1        = number                  - Final value of first dihedral
                                              angle.

      TO2        = number                  - Final value of second dihedral
                                              angle.

      STEP1      = number                  - step size of first dihedral angle.

      STEP2      = number                  - step size of second dihedral angle.

      N1         = number                  - Number of steps for first dihedral
                                              angle.

      N2         = number                  - Number of steps for second
                                              dihedral angle.

      OMSCAN-OK                            - Used to confirm scanning over an
                                             omega dihedral angle.

$MINIM
This data group is used to modify a few parameters of the minimization program
of Gay (SUMSL, SMSNO).

The specific keywords of this data group are:

   KEYWORD        ARGUMENT                DESCRIPTION
   ------         -------                 -----------

     MINIMIZER    =
                    SUMSL                  - Use the unconstrained minimization
                                             solver with analytical gradient.

                    SMSNO                  - Use the unconstrained minimization
                                             solver with numerical gradient.

     MAXFUN       = number                 - Maximum number of function
                                              evaluations allowed.

     MAXIT        = number                 - Maximum number of iterations
                                             allowed.

     MAXSTEP      = number                 - Maximum value for V(RADFAC).

     VTNER1       = number                 - Helps decide when to check for
                                             FALSE convergence [V(26)].

     ABSTOL       = number                 - The absolute function convergence
                                             tolerance [V(31)].

     RELTOL       = number                 - The relative function convergence
                                             tolerance [V(32)]

     DSCALE       =                        -  Not implemented, yet.

                    NONE

                    FIXED

                    VARIABLE

     DVALUE       = number                  -  Initialization value  of the
                                               scale vector D.

     FULL_PRINT                             -  Controls SUMSL printing.

     PRINT_RES_XG                           -  Prints out values of X's,
                                               gradient and D's on  return.

     PRINT_STAT                             -  Prints out summary of statistics.

     PRINT_INITIAL_X                        -  Print initial X's and D's.

$OMCIS
This data group is used to define the residues for which the reference
conformation for the peptide bond is cis.
Format:
NRES     res1   res2  ...  resk
where NRES is the number of residues for which the cis conformation of the
peptide bond is taken as the reference; and res1, res2,....resk are the numbers
representing the position  of the residue in the sequence.
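
As an illustration of this format, the Python sketch below reads one such line
and checks that NRES agrees with the number of residue positions that follow.
The function name and the example line are hypothetical; the program's own
reader may behave differently.

```python
def parse_omcis(line):
    """Read 'NRES res1 res2 ... resk' and return the residue positions."""
    fields = [int(token) for token in line.split()]
    if not fields:
        raise ValueError("empty $OMCIS line")
    nres, residues = fields[0], fields[1:]
    if nres != len(residues):
        raise ValueError(f"NRES={nres} but {len(residues)} residues listed")
    return residues


# Two residues (positions 5 and 12) have a cis reference conformation.
cis_residues = parse_omcis("2     5   12")
```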

$OVERLAP_GRP
This data group is used in a version under development.
It is used within the VTF procedure to define sets of atoms with
overlapping resonances.
The format is as follows:
IGP GLB NG IR1 G1  IR2 G2  IR3 G3 .........IRn Gn
  1 HX1  1  17 HN
  2 HX2  5   7 HM0   6 HM1   6 HM2  15 HM1  15 HM2
  3 HX3  2   6 HM0  15 HM0
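
The example lines above can be read with the following Python sketch. The
meaning of the columns is inferred from the example and is an assumption: IGP
is taken as the group index, GLB as the group label, NG as the number of
members, and each (IR, G) pair as a residue number and an atom name. The
function name is hypothetical.

```python
def parse_overlap_group(line):
    """Read 'IGP GLB NG IR1 G1 ... ' into (index, label, members)."""
    tokens = line.split()
    index, label, n_members = int(tokens[0]), tokens[1], int(tokens[2])
    pairs = tokens[3:]
    if len(pairs) != 2 * n_members:
        raise ValueError("NG does not match the number of (IR, G) pairs")
    members = [(int(pairs[k]), pairs[k + 1]) for k in range(0, len(pairs), 2)]
    return index, label, members


group = parse_overlap_group("  2 HX2  5   7 HM0   6 HM1   6 HM2  15 HM1  15 HM2")
```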

$REGIONS
This data group is used to define the sampling regions for amino acids in
a Monte Carlo type of search.
The sampling can be UNIFORM or NONUNIFORM (these are arguments of the keyword
REGION_SAMP in data groups $VTF and $EDMC).
If the sampling is UNIFORM, the input is specified using the following format:
residue_no.   region1 region2 .....   regionM
If the sampling is NONUNIFORM, the input is specified as:
residue_no.   region1 weight1  region2  weight2 .....   regionM weightM

where:
- residue no. belongs to  { 2, inumrs}
- regionI is one of the 16 regions of the PHI-PSI map using Zimmerman's
  code (A, A*, ..., H*), or any of the four POPOV regions: H-, H+ (HELIX) and
  S-, S+ (SHEET).
- weightI is an integer indicating the weight used to generate the sampling
  probability for the associated region.

Example of UNIFORM sampling:
3   A  A* C  C*
Example of NONUNIFORM sampling:
3   A 40  A* 10  C 30  C* 10

A continuation line should be indicated with the symbol '\'
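
The relation between weights and sampling probabilities can be sketched in
Python as follows. The sketch assumes that each probability is the weight
divided by the sum of the weights for that residue, which the text does not
state explicitly. Continuation lines are not handled, and the function name is
hypothetical.

```python
def parse_regions(line, nonuniform):
    """Read one $REGIONS line and return (residue_no, {region: probability}).

    Assumption: probability = weight / sum of weights; UNIFORM lines give
    every listed region the same weight.
    """
    tokens = line.split()
    residue, rest = int(tokens[0]), tokens[1:]
    if nonuniform:
        if len(rest) % 2 != 0:
            raise ValueError("expected region/weight pairs")
        regions = rest[0::2]
        weights = [int(weight) for weight in rest[1::2]]
    else:
        regions = rest
        weights = [1] * len(rest)
    total = sum(weights)
    return residue, {r: w / total for r, w in zip(regions, weights)}


residue, uniform = parse_regions("3   A  A* C  C*", nonuniform=False)
_, weighted = parse_regions("3   A 40  A* 10  C 30  C* 10", nonuniform=True)
```

For the NONUNIFORM example the weights sum to 90, so under this assumption
region A would be sampled with probability 40/90.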

$RMSFIT
This data group is used for comparison of one or multiple conformations with
a reference one. It works in combination with the keyword RUNTYP= RMS_FIT.
This module calculates atomic rms deviations, rms distance deviations, and
radii of gyration, and can perform fitting of conformations.
The program reads different types of reference and input files. By default,
it tries to read the reference conformation from a file named xray.NAME_REF
(where NAME_REF is a name provided by the user. NAME_REF is passed through an
argument of the script that runs the program).
When this file does not exist, the keyword  GENERATE_REF should be used to
generate it. As a default for generation of the reference conformation, the
set of dihedral angles provided as input in the $GEOM data group is used.
If a conformation given in PDB format is going to be used as reference,
the user should use the keyword TYPE_REF with the appropriate argument to
indicate this.

The specific keywords of this data group are:

   KEYWORD        ARGUMENT                DESCRIPTION
   ------         -------                 -----------

     GENERATE_REF                      - Used to indicate the generation of a
                                         reference  (or target) conformation
                                         in ECEPP format.

     TYPE_REF        =                 - Indicates the type of input format of
                                         the file containing the reference (or
                                         target) conformation:

                       ECEPP             ECEPP dihedral angles provided with the
                                         $GEOM data group in *.inp file. Default.

                       PDB_NO_ENDG       Typical PDB where residue No.1 is
                                         the first full residue. No end groups.

                       PDB_WITH_ENDG     Other files written in PDB format
                                         where first and last residues are
                                         end groups.

     TYPE_INPUT      =                 - Used to indicate the type of format
                                         of the file to be used as input
                                         (conformation(s) under study).
                                         Acceptable input formats are:

                       ECEPP             ECEPP dihedral angles using `outo'
                                         format.

                       PDB_NO_ENDG       Classical PDB where residue No.1 is
                                         the first full residue. No end groups.

                       PDB_WITH_ENDG     Other files written in PDB format
                                         where first and last residues are
                                         end groups.

     IGNORE_H                          - Tells the program to ignore
                                         mismatches in the H atom
                                         names when reading PDB files.

     INIT_RES        = number          - Initial residue used in the calculation.
     IFIN_RES        = number          - Final residue used in the calculation.

     ALL_HVY_ATOMS                     - Calculate rms of all heavy atoms of
                                         the specified residues.

     ALPHA_CARBONS                     - Calculate rms of CA atoms of the
                                         specified residues.

     BACKBONE                          - Calculate rms of backbone atoms
                                         (including CB) of the specified
                                         residues.

     SIDE_CHAIN                        - Calculate rms of side-chain heavy atoms
                                         of the specified residues.

     DISTANCE_RMS                      - Produces an additional report of the
                                         distance rms deviations for the input
                                         conformation(s) with respect to the
                                         reference conformation.

     CA_TRACE                          - Works in conjunction with the keyword
                                         ALPHA_CARBONS. It is a flag to request
                                         the generation of a series of aligned
                                         pdb files with the CA traces.

     PDB_ALIGN_HVY                     - Write a pdb file using alignment of
                                         all the heavy atoms.

     PDB_ALIGN_CA                      - Write a pdb file using alignment of
                                         the CA atoms.

     PDB_ALIGN_BACK                    - Write a pdb file using alignment of
                                         the backbone atoms.

     PDB_ALIGN_SIDE                    - Write a pdb file using alignment of
                                         the side-chain heavy atoms.

     METHOD          =                 - Defines the type of algorithm used
                                         to calculate RMS:

                       GOLUB           - Golub method. The default.

                       KABSCH          - Kabsch method. This requires some
                                         IMSL routines.

     FIRST_RESIDUE   = number          - This keyword is used to indicate that the
                                         first residue of the PDB reference file
                                         is numbered as `number' instead of 1.

     ADOPT_REF_SEQ                     - Tells the program to adopt the
                                         sequence of the reference
                                         conformation when the read conformations
                                         have a different (or incompatible) sequence.
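
The DISTANCE_RMS report described above can be illustrated with a minimal
sketch. It assumes the usual definition of a distance rms deviation (the root
mean square difference between corresponding interatomic distances in the two
conformations); the exact definition used by the program is not stated here,
and the function name distance_rms is hypothetical.

```python
import math
from itertools import combinations


def distance_rms(coords, ref_coords):
    """Distance rms deviation between two conformations.

    coords, ref_coords: sequences of (x, y, z) tuples listing the same
    atoms in the same order. No superposition is needed, because only
    internal distances are compared.
    """
    if len(coords) != len(ref_coords):
        raise ValueError("conformations must contain the same number of atoms")
    if len(coords) < 2:
        raise ValueError("at least two atoms are needed")
    total = 0.0
    npairs = 0
    for i, j in combinations(range(len(coords)), 2):
        d = math.dist(coords[i], coords[j])              # distance in the input
        d_ref = math.dist(ref_coords[i], ref_coords[j])  # distance in the reference
        total += (d - d_ref) ** 2
        npairs += 1
    return math.sqrt(total / npairs)
```

Because only internal distances enter the sum, a rigid translation or
rotation of either conformation leaves the result unchanged.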

$SCAN
Scan carries out a systematic search of a set of specified dihedral
angles. Angles should be specified in the following way (free format).

 residue_no.  no_of_dih_angles  no_first_dih ... no_last_dih

$SELEC_PDB
This data group works in combination with the keywords
PRINT_CART and OUTFORMAT= SEL_PDB (in the $CNTRL data group).
The data group is used to define the set of atoms included in the
output pdb file. A free format is used to enter atom numbers (integer).

$SEQ
This is the last ESSENTIAL data group. It is used to define the sequence
of the molecule.  There are three different ways in which the sequence
can be defined:
(a) through ECEPP residue numbers (LIST); (b) using a three-letter code;
or (c) using a one-letter code.
The keyword  RES_CODE (in $CNTRL data group) is used to specify the options
described previously. If this keyword is omitted, the program will attempt
to read the sequence as ECEPP residue numbers.

Rules:
-----
(a) ECEPP residue numbers are read using free format (default). Numbers
    are integers defined as ECEPP LIST numbers.  Check column 2 of Table I
    for correct assignment. A blank space is required between numbers.

(b) Three-letter code. These are character variables defined in column 3
    of Table I. A blank space is required between words.

(c) One-letter code. These are character variables defined in column 4
    of Table I. No blank space is required between descriptors (letters,
    usually).
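
The three tokenization rules can be sketched as follows. This is only an
illustration of how the separators differ between the formats; the mode
labels 'list', 'three' and 'one' and the function name split_sequence are
hypothetical (the actual values accepted by RES_CODE are not listed in this
section), and no lookup against Table I is attempted.

```python
def split_sequence(text, code="list"):
    """Split the text of a $SEQ data group into residue descriptors.

    code is a hypothetical label for the three formats:
      'list'  - ECEPP LIST numbers, blank-separated integers (rule a)
      'three' - three-letter codes, blank-separated words (rule b)
      'one'   - one-letter codes, no separator required (rule c)
    """
    if code == "list":
        return [int(token) for token in text.split()]
    if code == "three":
        return text.split()
    if code == "one":
        # Blanks are optional in this format, so they are simply dropped.
        return [char for char in text if not char.isspace()]
    raise ValueError(f"unknown sequence code: {code!r}")
```

For example, split_sequence("AG I", "one") yields the three descriptors
A, G and I, while the same residues in three-letter form need a blank
between words.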

$SPEC
This data group is used to specify the set of variable dihedral angles.
The usage of this card depends on the values of the keywords VAR_ANGLES and
VAR_RES ($CNTRL data group).
(a) When VAR_ANGLES = SPEC is specified in the $CNTRL data group,
   1- VAR_RES should not be present in the $CNTRL data group.
   2- The $SPEC data group is obligatory, and it must contain the following
      specifications (in free format and one line per residue):

        res_num    num_var  num_1st_var ...  ... num_last_var

      where:
       res_num is the sequence number of the residue containing variable
         dihedral angles;
       num_var is the number of variable dihedral angles in the residue;
       num_1st_var, ..., num_last_var is a list of numbers (integers) that
         point to the specific variable dihedral angles in the residue.
         The list must contain `num_var' integers.

(b) When VAR_RES = number_of_residues  (number_of_residues is an integer)
    is specified in the $CNTRL data group,
    1- VAR_ANGLES can be given ANY value (all, back, bksd, etc.)  with the
       EXCEPTION of `SPEC'.
    2- The $SPEC data group is required, and it must contain the sequence numbers
       of the residues for which the set of dihedral angles will be defined
       as the variables. The residues should be given as a list of integers in
       free format.

(c) If  VAR_RES and VAR_ANGLES are both omitted in the $CNTRL data group,
    or VAR_RES is omitted and VAR_ANGLES is set to a value different from
    SPEC, then the $SPEC data group is not required.
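
As an illustration of case (a), the following minimal sketch reads one
$SPEC line and checks that the list contains exactly `num_var' integers.
The function name parse_spec_line is hypothetical, and the sketch assumes
plain blank-separated integers rather than the full Fortran free-format rules.

```python
def parse_spec_line(line):
    """Parse one $SPEC line: res_num num_var num_1st_var ... num_last_var.

    Returns (res_num, [angle numbers]). Raises ValueError when the number
    of listed angles does not match num_var.
    """
    fields = [int(token) for token in line.split()]
    if len(fields) < 2:
        raise ValueError(f"expected at least res_num and num_var: {line!r}")
    res_num, num_var, *angles = fields
    if len(angles) != num_var:
        raise ValueError(
            f"residue {res_num}: num_var is {num_var} "
            f"but {len(angles)} angle numbers were given"
        )
    return res_num, angles
```

For instance, the line "3 2 1 2" declares that residue 3 has two variable
dihedral angles, numbers 1 and 2.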

$VTF
   This data group is used to define the parameters for a Variable Target Function
   (VTF) calculation (for reference, see Vasquez and Scheraga, J. Biomol. Struct. &
   Dyn. Vol 5(4) 757-784 (1988)). It works in combination with the keyword
   RUNTYP = VTF (data group $CNTRL) and the data group $DIST_CONST.

The specific keywords of this data group are:

   KEYWORD        ARGUMENT                DESCRIPTION
   ------         -------                 -----------

     SINGLE_CONF                  - Carry out the procedure using as input the
                                    conformation provided in data group $GEOM

     READ_CONF                    - Carry out the procedure starting
                                    from the set of conformations provided
                                    in a separate input file (outo format).

     RAND_START                   - Carry out the procedure starting
                                    from the set of randomly-generated
                                    conformations.

     OMEGA_180                    - Works with RAND_START. Keep the omega's at 180.

     CONST_SEQ                    - The program will not change the protonation
                                    form of histidine and the internal geometry
                                    of proline.

     MAXIT       = number         - Maximum no. of randomly generated conformations.

     SEED   =  number             - Seed for the random number generator.

     RANK_ORDER     = number      - Determines the way the distances
                                    are ordered for the minimization
                                    steps during an iteration (IORDER).
                                    There are three possibilities:
                                    RANK_ORDER= 0,  Order by range,
                                      `a la Braun-Go', i.e.
                                    rank one ==> distance between nearest-neighbor
                                    residues;
                                    rank two ==> distance between second nearest-neighbor
                                    residues; etc.;
                                    RANK_ORDER= 1, Keep same order as in
                                      input distances;
                                    RANK_ORDER= 2, Order by growing from
                                      N-terminus.

     MAX_RANK                     - Parameter used to control the VTF
                                    procedure. Usually, the procedure is
                                    carried out by starting from random
                                    conformations and introducing the
                                    distance constraints up to MAX_RANK
                                    (typically 10). From this run, a set
                                    of conformations is selected and a
                                    second run is carried out with the full
                                    set of distances (i.e. the final rank is
                                    equal to the number of residues in the
                                    chain).

     VTF_BY_RANK                  - Indicates that distances should be included using
                                    the established ranks. Otherwise, the procedure
                                    introduces a few distances per minimization.
                                    NOTE: It is generally recommended to use the
                                    keyword VTF_BY_RANK.


     STEP_RANK = [-]number        - STEP_RANK > 0 defines the increment of the rank
                                    for sequential minimizations within an iteration
                                    of the VTF procedure (IFLOV). Additionally,
                                    STEP_RANK different from zero implies that
                                    torsional energy terms will NOT be included in the
                                    energy minimization and disulfide bridge (DSB)
                                    information will NOT be used.
                                    Closing the DSB at the beginning of the VTF procedure
                                    interferes greatly with the possibility of satisfying
                                    distance constraints for smaller ranks. Consequently, it is
                                    recommended to add the DSB as an extra set of distance
                                    constraints. Since ECEPP assigns very high weights to
                                    force DSB, adding the DSB as additional
                                    distance constraints allows you to experiment with
                                    different values of the weights.
                                    If STEP_RANK < 0 is given, all the distance
                                    constraints will be included at once.
                                    With STEP_RANK=0  the VTF procedure INCLUDES
                                    torsional energy, DSB information and proceeds
                                    including distance constraints with a rank increment of 1.

     BIG_VIOLATION = number       - Used to determine which conformations are reasonable
                                    after the whole generation process is finished
                                    (i.e., which conformations should be saved). If the maximum
                                    violation is greater than BIG_VIOLATION (+ 10%),
                                    the conformation is rejected. All the conformations VTF
                                    produces should be reasonable.

     STEPS_ON_ERROR = number      - At every step of the VTF procedure, the program checks whether
                                    the maximum violation exceeds BIG_VIOLATION. After 'n'
                                    consecutive steps of the VTF procedure that exceed
                                    BIG_VIOLATION (with n=STEPS_ON_ERROR), the generation process
                                    is aborted and a new trial is started. The idea is to avoid
                                    spending time on useless energy minimization. A conformation
                                    that does not satisfy the distance criteria after "n" steps
                                    has little chance of reorganizing later, when additional
                                    distances are added. STEPS_ON_ERROR should not be very small,
                                    since some conformations with distances that exceed
                                    BIG_VIOLATION at a certain stage of generation can reorganize
                                    later on, as additional distances are included in the
                                    minimization. (From my experience, values for STEPS_ON_ERROR
                                    of 15 to 20 seem to work better).

     REGION_SAMP     =            - Use the set of sampling regions specified
                                    for specific amino acids.

                       UNIFORM    - Use uniform sampling through specified
                                    regions

                       NONUNIFORM - Sample through specified regions using
                                    provided weights.

     BACKUP          =            - This keyword is intended to allow the
                                    procedure to be stopped cleanly.  Not implemented yet.

     RESTART                      - This keyword is intended to allow the
                                    procedure to be restarted.  Not implemented yet.

     NO_MINIMIZATION              - Used to check energy terms related to the
                                    distance constraints. No VTF minimization
                                    is carried out.

     ZIMMERMAN_CODE               - This option is used to print the Zimmerman Code
                                     of the conformation(s).
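
The interplay between BIG_VIOLATION and STEPS_ON_ERROR described above can
be sketched as follows. This is an illustration of the stated logic, not the
program's code: the function name run_trial is hypothetical, the list of
per-step maximum violations stands in for the actual minimizations, and
"(+ 10%)" is interpreted as a final threshold of 1.1 * BIG_VIOLATION, which
is an assumption.

```python
def run_trial(max_violations, big_violation, steps_on_error):
    """Decide the fate of one VTF trial.

    max_violations: maximum distance violation observed after each step.
    Returns (accepted, steps_done).
    """
    if not max_violations:
        raise ValueError("at least one step is required")
    consecutive = 0  # consecutive steps exceeding BIG_VIOLATION
    for step, violation in enumerate(max_violations, start=1):
        if violation > big_violation:
            consecutive += 1
            if consecutive >= steps_on_error:
                # Abort early: the trial is unlikely to reorganize later.
                return False, step
        else:
            consecutive = 0  # the conformation recovered; reset the counter
    # Final check at the end of generation (assumed 10% tolerance).
    accepted = max_violations[-1] <= 1.1 * big_violation
    return accepted, len(max_violations)
```

With a small STEPS_ON_ERROR the first return is reached quickly, which
saves minimization time but can discard trials that would have recovered.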

$WINDOWS
This data group contains the ranges of residues whose dihedral angles will be
changed during sampling. There is one non-empty line per range. Each line
contains the following two integers, read in free format:

iw1 (the first residue of the range); iw2 (the last residue of the range).
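
A minimal sketch of reading such ranges is shown below. The function name
parse_windows is hypothetical, empty lines are skipped as the description
implies, and the check that iw1 does not exceed iw2 is an assumption about
what constitutes a valid range.

```python
def parse_windows(lines):
    """Read (iw1, iw2) residue ranges, one per non-empty line."""
    windows = []
    for line in lines:
        if not line.strip():
            continue  # only non-empty lines define ranges
        fields = line.split()
        if len(fields) < 2:
            raise ValueError(f"expected two integers: {line!r}")
        iw1, iw2 = int(fields[0]), int(fields[1])
        if iw1 > iw2:
            # Assumed sanity check, not taken from the program.
            raise ValueError(f"first residue after last residue: {line!r}")
        windows.append((iw1, iw2))
    return windows
```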

Description of  the Data included for each Residue in rsdata
_____________________________________________________________

A few changes in the original ECEPP3 residue data file have been included to add
flexibility to the program. The goal is to minimize the coding of instructions
that are residue-specific.
example:

ISOLEUCINE          ILE  I    0    0   0.000    F
   19    4-0.9437972 0.3305252-0.0993245 0.9950551
  -8   4
   1.35       3    1    4
  1.35        3    1    5
  1.35        3    1    6
  1.35        3    1    7
      0.350197-0.548499-0.759283       3  5  3         8  0   0
     -0.379217 0.310917-0.871508       5  9  3        11  1   1
      0.999962-0.005163-0.007059       5 10  3        14  0  -1
      0.350197-0.548498-0.759284      10 16  3        17  0   0
                                     N  14 22  -4.59   11    7 10
   -0.4226    0.9063    0.0          HN  2  2   2.27    7    4  6
    1.4530    0.0       0.0          CA  9  7   0.82   17   11 16
    1.7797   -0.4805    0.9222       HA  1  0   0.26   11    7 10
    1.9888   -0.8392   -1.1617       CB  9  7  -0.06    0   17 19
    1.9587    1.4440    0.0          C   7 14   5.80   11    8 10
    1.1648    2.3835    0.0          O  17 26  -4.95    8    0  0
    1.6625   -1.8692   -1.0179       HB  1  0   0.32   17   11 16
    1.4086   -0.3635   -2.4951       CG2 6  5  -0.96   17   14 16
    3.5188   -0.8471   -1.1725       CG1 6  6  -0.25    0   11 13
    0.3225   -0.4528   -2.4713       HG2 1  0   0.32   14    0  0
    1.6840    0.6781   -2.6602       HG2 1  0   0.32   14    0  0
    1.8059   -0.9770   -3.3037       HG2 1  0   0.32   14    0  0
    3.8906    0.1742   -1.2551       HG1 1  0   0.19    0   17 19
    3.8906   -1.2463   -0.2289       HG1 1  0   0.19    0   17 19
    4.0546   -1.6863   -2.3342       CD1 6  5  -0.96    0    0  0
    3.7005   -2.7124   -2.2348       HD1 1  0   0.32    0    0  0
    3.7005   -1.2695   -3.2771       HD1 1  0   0.32    0    0  0
    5.1444   -1.6749   -2.3183       HD1 1  0   0.32    0    0  0
    3.2771    1.5756    0.0

Description of first line:
  (TITL(L,I),L=1,4),ARES(I),ONE_LET(I),NFATO(I), QQQ_READ(I),PK0_READ(I), NMETR

- (TITL(L,I),L=1,4) Residue name.
- ARES(I)     Three-letter-code residue identifier, used for sequence definition.
- ONE_LET(I)  One-letter-code residue identifier, used for sequence definition.
- NFATO(I)    Indicates whether the 3 initial atoms (N, HN, and CA) of the first
              full residue should be generated using the data from the amino-end
              (NFATO=0) or the data from the residue (NFATO=1). In particular,
              this assignment affects the charges of these atoms.
- QQQ_READ(I) Net charge of the ionized residue (used in specific versions
              of the code).
- PK0_READ(I) pKa0 of the ionizable group (used in specific versions of the
              code).
- NMETHYL     Logical variable indicating whether this is an N-methylated residue.

Description of second line:
 NATOMS(I),NCHI(I),SNTH2(I),CSTH2(I),SDEL(I), CDEL(I)
Same as in ECEPP/3 manual.

Description of third line:
 KNDRES(I),NT,NGEOM(I),NTOR(I)
- KNDRES(I) and NGEOM(I) same as in ECEPP/3 manual.
- NTOR(I) is the number of torsional terms that are associated with EXPLICIT
  dihedral angles, while NT is the TOTAL number of the torsional terms associated
  with a residue, i.e. including the possible angles of the bridge formed by
  this residue.  The parameters of the IMPLICIT torsional angles (i.e. those which
  will be calculated from the Cartesian coordinates after a bridge is formed) are
  stored in the arrays after the parameters of the explicit angles.

Description of 4th to 7th lines:
AR(J,I),NBB(J,I),NSS(J,I),NANG(J,I)
Same as in ECEPP/3 manual.

Description of 8th to 11th lines:
(CHIANG(L,J,I),L=1,3),NDPT1(J,I),NDPT2(J,I), NUM(J,I),LRT1(J,I), IBRNCH(J,KINDI),
ISHFK(J,KINDI)
- CHIANG, NDPT1, NDPT2, NUM and LRT1  same as in ECEPP/3 manual.  LRT1 is used in
      this program (not in the original ECEPP/3).
- IBRNCH The program now handles more than one branch on the side-chains.  If there
       is a branch, this is defined specifically (IBRNCH =1) for the bond that branches
       out. Also, for compatibility with the IUPAC conventions (ECEPP reads
       the torsional angles following this convention), a variable ISHFK is defined
       for each bond to indicate if there is a shift of the bond definition given
       in rsdata.  In some cases, like ILE, the organization of the rsdata file for
       generation purposes (in ECEPP/3) requires a different arrangement of the
       bond numbers. In the specific case of ILE, lines 9 and 10 indicate that
       the dihedral angle inputs for bonds 2 and 3 have to be exchanged.

Description of 12th to 31st lines:
 (XOORD(L,J-1,I),L=1,3),ALPHA(J,I),LTYPE(J,I), NTYPE(J,I),CHG(J,I),NSN15(J,I),
NSN14(J,I),NFN14(J,I)

- XOORD, ALPHA, LTYPE, CHG, NSN15, NSN14 and NFN14 same as in ECEPP/3 manual.
- NTYPE atom type for surface solvation models.

How to Build a File with Distance and/or Dihedral Angle Constraints (bounds.*)
______________________________________________________________________________

A distance constraint energy term can be used in the calculations.
The algorithm used in this program is a modification of the one
originally implemented in Max Vasquez's VTF (Vasquez, M. & Scheraga, H. A.
1988. "Variable-Target-Function and Build-up procedures for the calculation
of protein conformation. Application to bovine pancreatic trypsin inhibitor
using limited simulated nuclear magnetic resonance data."
J. Biomol. Struct. Dyn. vol. 5, 757-784).

The functional form is:
             Econs = WEI_ENE * Sum [ wei(j) * (| rj - R |)^2 ];
                            j in {pairs}

                        for rj > R (R an upper bound) or rj < R (R a lower bound).
where
       rj is the actual interproton distance.
       R  is either, an upper bound or a lower bound.
       wei is a factor or weight used to make the constraint more (or less)
           relevant with respect to others.
       WEI_ENE is a factor that weights the distance energy term with
       respect to other energy terms ( like electrostatic, torsional,etc.)
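
The penalty term can be sketched in a few lines. The sketch assumes that a
pair contributes only when its distance lies outside the interval
[lower bound, upper bound] and contributes nothing otherwise; it does not
model the special VDW-contact case (lower bound of -1.000). The function
name constraint_energy and the tuple layout are hypothetical.

```python
def constraint_energy(pairs, wei_ene=1.0):
    """Distance constraint energy, Econs.

    pairs: iterable of (r, lowb, upb, weight) where r is the current
    distance and lowb/upb are the lower and upper bounds.
    wei_ene: overall weight of the term relative to other energy terms.
    """
    total = 0.0
    for r, lowb, upb, weight in pairs:
        if r > upb:
            total += weight * (r - upb) ** 2    # upper bound violated
        elif r < lowb:
            total += weight * (lowb - r) ** 2   # lower bound violated
        # distances inside [lowb, upb] add no penalty
    return wei_ene * total
```

For example, a distance of 6.0 against bounds of 1.9 and 5.0 with weight
10.0 violates the upper bound by 1.0 and contributes 10.0 * 1.0^2 = 10.0
before the WEI_ENE scaling.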

Distance constraints are included in the calculations in the following
manner.
1.- Use the $DIST_CONST data group (NOT the $BOUNDS) to specify the
number of constraints and set up other parameters.

2.- Generate a file (bounds.FILENAME) containing the information for each
constraint as in the following examples. There are two alternative ways to
describe the constraints:

  a.- Using the ECEPP number for the specific atoms.  The information should
be written in one line per constraint (78 characters or less), and given in
a free format as:
              mol1 iatm1      mol2 iatm2     lowb     upb      weight
where
     mol1 is the molecule containing the first atom (integer).
     iatm1 is the first atom defining the constraint (integer).
     mol2 is the molecule containing the second atom (integer).
     iatm2 is the second atom defining the constraint (integer).
     lowb is the lower bound (real).
     upb is the upper bound (real).
     weight is the weighting factor (real).

example
               1     34      1     51      1.900       5.000        10.0

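   b.- Using the residue number and the atom name for the specific atoms:
           1    1    HCA     1     1    HCB   -1.000   3.000     10.0
          mol1  res1  atm    mol2  res2  atm    low-b    upp-b   weight
where
mol1 is the molecule containing the first atom
res1 is the residue containing the first atom
atm is the name of the atom (mol2, res2 and the second atm describe the
second atom in the same way).
If the lower bound is -1.000, then VDW contact is assumed.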
example:
        mol1 res1 iatm1 mol2 res2 iatm2     lowb   upb     weight
(the file is a FORMATTED one).
           1   1 CA      1  29 CA      7.186   7.942  10.000

This line specifies the atoms defining the distance, the lower and upper
bounds, and a parameter (a weight) for the constraint.

A line starting with ! is considered a comment.
See the example file bounds.timbck
You should enter the number of constraints used in $DIST_CONST as
N1PAIR= mmm and N2PAIR= nnn, where N1PAIR and N2PAIR are the numbers
of constraints specified using formats (a) and (b), respectively.

NOTE:  if a  lower-bound is -1.000, then VDW contact is assumed.
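
Reading constraints written in format (a) can be sketched as below. This is
a minimal illustration: the names AtomPairBound and parse_format_a are
hypothetical, fields are assumed to be separated by blanks only, and lines
whose first non-blank character is ! are treated as comments (whether
leading blanks are allowed before the ! is an assumption).

```python
from typing import NamedTuple


class AtomPairBound(NamedTuple):
    mol1: int
    iatm1: int
    mol2: int
    iatm2: int
    lowb: float
    upb: float
    weight: float

    @property
    def vdw_contact(self):
        # A lower bound of -1.000 signals that VDW contact is assumed.
        return self.lowb == -1.0


def parse_format_a(lines):
    """Parse format (a) lines: mol1 iatm1 mol2 iatm2 lowb upb weight."""
    bounds = []
    for line in lines:
        text = line.strip()
        if not text or text.startswith("!"):
            continue  # skip blank lines and comments
        fields = text.split()
        if len(fields) != 7:
            raise ValueError(f"expected 7 fields, got {len(fields)}: {line!r}")
        mol1, iatm1, mol2, iatm2 = (int(f) for f in fields[:4])
        lowb, upb, weight = (float(f) for f in fields[4:])
        bounds.append(AtomPairBound(mol1, iatm1, mol2, iatm2, lowb, upb, weight))
    return bounds
```

The length of the returned list is the count that would be entered as the
number of format (a) constraints in $DIST_CONST.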

DIHEDRAL ANGLE CONSTRAINTS can also be included in the simulations.
The functional form for the penalty energy is the same one used
for the distance constraints (formula written above).
The dihedral angle constraints are included in the 'bounds.*' file
as follows:
i.  The word DIHEDRAL must come after the last distance constraint.
ii. The next line should contain a number (real) that represents the
    conversion factor (or penalty weight), WEIDIH (equivalent to
    WEI_ENE in formula above).
iii. Each subsequent line contains a description of  a dihedral angle
     constraint with the following information:
     residue number,  dihedral angle number, expected mean value, maximum
      deviation,  and specified weight value.
example:

DIHEDRAL
100.00
         2        1          -40      40     1000.00
         5        2          -60      20     1000.00

where:
  WEIDIH = 100.00.
  Two dihedral angle constraints are included:
  first: residue 2, dihedral angle 1 (phi) is forced to adopt a value of
  -40 deg. with an allowed deviation of 40 deg. (allowed values are those
  within the interval [-80, 0]);
  second: residue 5, dihedral angle 2 (psi) is forced to adopt values within
  the interval [-80, -40].
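
The intervals quoted in this example follow directly from the mean value
and the maximum deviation, as the sketch below shows. The penalty function
applies the same quadratic form as for distances; measuring the deviation
on the 360-degree circle is an assumption, and the names allowed_interval
and dihedral_penalty are hypothetical.

```python
def allowed_interval(mean, max_dev):
    """Interval of unpenalized dihedral angle values, in degrees."""
    return (mean - max_dev, mean + max_dev)


def dihedral_penalty(angle, mean, max_dev, weight, weidih):
    """Quadratic penalty for a dihedral angle outside its allowed interval."""
    # Signed difference wrapped into [-180, 180) (assumed periodic handling).
    diff = (angle - mean + 180.0) % 360.0 - 180.0
    excess = abs(diff) - max_dev
    if excess <= 0.0:
        return 0.0  # inside the allowed interval
    return weidih * weight * excess ** 2
```

With the values from the example file, allowed_interval(-40, 40) gives
(-80, 0) and allowed_interval(-60, 20) gives (-80, -40).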

Random Number Generators
------------------------
The program uses two random number generators. The serial version
uses the VRND program (Prof. Ken Wilson).

The parallel version uses PRNG (Prof. Mal Kalos).
PRNG (parallel random number generator) is freely available
by anonymous ftp.
It is easy to install on any 64-bit machine such as the SGI PC.

CTC staff can get it without ftp'ing.

Copy /afs/theory/archive/ftp/pub/utilities/prng.tar.Z to wherever you
want to build it.  There are also two Makefiles in the eceppak PRNG
directory, Makefile.DEC8400 and Makefile.IBMSP2.
Either of these makefiles should be adapted to the specific
architecture where the user intends to install the program.

If you are not CTC staff, here's how you can get the tar file:

ftp ftp.tc.cornell.edu
log in as user anonymous
give email address as password
cd pub/utilities
get prng.tar.Z

-----------------------------------------------------------
If the IMSL libraries are not available on your computer:

Edit the file orient1.F and comment out the lines:

#ifdef AIX
      CALL DEVCSF (3,RTR,3,EIGVAL,T,3)
      IJUMP=1
#endif

Also, remove "-limsl" from the "make" file:

LIBS = -L/usr/local/lib -limsl

should read:

LIBS = -L/usr/local/lib

Finally, recompile the program.

Only the "GOLUB" option will work for calculations of rms deviations.