A typical PDB format file contains atomic coordinates for a diverse collection of proteins, small molecules, ions, and water. Each atom is entered as a line of information that starts with a keyword: either ATOM or HETATM. By tradition, the ATOM keyword identifies atoms in proteins or nucleic acids, and the HETATM keyword identifies atoms in small molecules. Following this keyword comes a list of information about the atom, including its name, its number in the file, the name and number of the residue it belongs to, a one-letter code that specifies the chain (in oligomeric proteins), its x, y, and z coordinates, and an occupancy and a temperature factor.
For an example of these atom records, see PDB entry 2Z3E.
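Each field in an atom record occupies a fixed range of columns, so a record can be read by slicing the line. The following is a minimal Python sketch that assumes the standard fixed-column layout from the wwPDB format specification. The function name `parse_atom_record` is hypothetical. For brevity, it skips optional fields such as the alternate location indicator, insertion code, and element symbol. The example record uses illustrative values and is not taken from a real entry.

```python
def parse_atom_record(line: str) -> dict:
    """Split one ATOM or HETATM line into fields using fixed PDB columns.

    Slices are 0-based and end-exclusive, so columns 7-11 become line[6:11].
    Alternate location, insertion code, and element symbol are not parsed.
    """
    record = line[0:6].strip()
    if record not in ("ATOM", "HETATM"):
        raise ValueError(f"not an atom record: {record!r}")
    return {
        "record": record,                   # cols 1-6: ATOM or HETATM
        "serial": int(line[6:11]),          # cols 7-11: atom number in the file
        "name": line[12:16].strip(),        # cols 13-16: atom name
        "res_name": line[17:20].strip(),    # cols 18-20: residue name
        "chain": line[21],                  # col 22: one-letter chain ID
        "res_seq": int(line[22:26]),        # cols 23-26: residue number
        "x": float(line[30:38]),            # cols 31-38: x coordinate (angstroms)
        "y": float(line[38:46]),            # cols 39-46: y coordinate
        "z": float(line[46:54]),            # cols 47-54: z coordinate
        "occupancy": float(line[54:60]),    # cols 55-60
        "temp_factor": float(line[60:66]),  # cols 61-66: temperature (B) factor
    }


# Illustrative record (made-up values, not from a real entry)
atom = parse_atom_record(
    "ATOM      1  N   MET A   1      11.104   6.134  -6.504  1.00  0.00"
)
print(atom["name"], atom["res_name"], atom["chain"], atom["z"])  # N MET A -6.504
```

Slicing by column rather than splitting on whitespace matters because neighboring fields, such as large coordinates, can run together without a separating space.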
This information gives you a lot of control when exploring the structure. For instance, most molecular graphics programs let you select portions of the molecule using these fields, such as a chain letter or a residue name, and color them selectively.
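Selecting part of a molecule amounts to filtering atom records on these fields. The sketch below uses a hypothetical helper, `select_atoms`. It is not the API of any particular graphics program; a viewer would simply apply a color to whatever lines such a selection returns.

```python
def select_atoms(lines, record=None, chain=None, res_name=None):
    """Return the ATOM/HETATM lines that match every criterion that is not None.

    record:   "ATOM" or "HETATM"
    chain:    one-letter chain ID (column 22)
    res_name: residue name such as "HOH" (columns 18-20)
    """
    selected = []
    for line in lines:
        rec = line[0:6].strip()
        if rec not in ("ATOM", "HETATM"):
            continue  # skip TER, MODEL, header records, and so on
        if record is not None and rec != record:
            continue
        if chain is not None and line[21] != chain:
            continue
        if res_name is not None and line[17:20].strip() != res_name:
            continue
        selected.append(line)
    return selected
```

For example, `select_atoms(lines, chain="B")` picks out one chain, and `select_atoms(lines, record="HETATM")` picks out the small molecules, ions, and water.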
Biological molecules are hierarchical, building from atoms to residues to chains to assemblies. Coordinate files contain ways to organize and specify molecules at all of these levels. As described above, the atom names and residue information are included in each atom record. The higher-order information is identified by keywords that separate blocks of atom records, such as TER and MODEL.
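One way to mirror this hierarchy in code is to nest atoms by chain and then by residue. This minimal sketch assumes a single model and ignores insertion codes and alternate locations. The name `build_hierarchy` is hypothetical.

```python
from collections import defaultdict


def build_hierarchy(lines):
    """Nest atom names as {chain: {(residue number, residue name): [atom names]}}."""
    tree = defaultdict(lambda: defaultdict(list))
    for line in lines:
        if line[0:6].strip() not in ("ATOM", "HETATM"):
            continue  # only atom records carry chain/residue/atom fields
        chain = line[21]                                   # col 22
        residue = (int(line[22:26]), line[17:20].strip())  # cols 23-26 and 18-20
        tree[chain][residue].append(line[12:16].strip())   # cols 13-16
    # Convert the nested defaultdicts into plain dicts for easier printing
    return {chain: dict(residues) for chain, residues in tree.items()}
```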
Protein and nucleic acid chains are specified by the TER keyword, as well as a one-letter designation in the coordinate records. The chains are included one after another in the file, separated by a TER record to indicate that the chains are not physically connected to each other. Most molecular graphics programs look for this TER record so that they don’t draw a bond to connect different chains.
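The sketch below groups atom records into blocks and closes a block at each TER record, so that a viewer could restrict chain bonds to atoms within a single block. It is a simplified illustration: `split_chains` is a hypothetical helper, and any atoms after the last TER (often ligands and water) are returned as a final group.

```python
def split_chains(lines):
    """Group ATOM/HETATM lines into blocks, closing a block at each TER record."""
    blocks, current = [], []
    for line in lines:
        rec = line[0:6].strip()
        if rec in ("ATOM", "HETATM"):
            current.append(line)
        elif rec == "TER":
            if current:  # ignore a TER that follows no atoms
                blocks.append(current)
            current = []
    if current:
        # Atoms after the last TER, often ligands and water
        blocks.append(current)
    return blocks
```

A viewer following this scheme would only consider bonds between atoms in the same block, so the last residue of one chain is never joined to the first residue of the next.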
PDB format files use the MODEL keyword to indicate multiple molecules in a single file. This keyword was originally introduced to archive coordinate sets that include several different models of the same structure, such as the structural ensembles obtained from NMR analysis. When you view these files, you will see dozens of similar molecules all superimposed. The MODEL keyword is now also used in biological assembly files to separate the many symmetry-related copies of the molecule that are generated from the asymmetric unit.
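In the file, each model opens with a MODEL record that carries a model serial number and closes with an ENDMDL record. Below is a minimal sketch of separating the models. It assumes the model number occupies columns 11-14 of the MODEL record, as in the wwPDB specification. The name `split_models` is hypothetical, and as a further assumption, atoms that appear outside any MODEL block (as in a typical single-model file) are assigned to model 1.

```python
def split_models(lines):
    """Return {model number: [ATOM/HETATM lines]} using MODEL and ENDMDL records.

    Atoms outside any MODEL ... ENDMDL block are assigned to model 1
    (an assumption that covers ordinary single-model files).
    """
    models = {}
    current = None  # list collecting atoms of the currently open model, if any
    for line in lines:
        rec = line[0:6].strip()
        if rec == "MODEL":
            number = int(line[10:14])  # cols 11-14: model serial number
            current = models.setdefault(number, [])
        elif rec == "ENDMDL":
            current = None
        elif rec in ("ATOM", "HETATM"):
            target = current if current is not None else models.setdefault(1, [])
            target.append(line)
    return models
```

For an NMR ensemble, `len(split_models(lines))` gives the number of superimposed models. For a biological assembly file, it gives the number of copies generated from the asymmetric unit.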