Chemical Database Manager

This chapter presents the mychemdb_manager tool, a chemical database manager for handling chemical databases with MySQL and Mychem.

mychemdb_manager

The Mychem software is useful when working with chemical databases. In order to facilitate the creation and the management of such databases, mychemdb_manager, a Python program, is distributed with the Mychem code. This script is a command line interface that permits to create or update a chemical database.

The mychemdb_manager script can be found in the scripts directory from Mychem. It is a Python program released under the new BSD license. It requires the pymysql Python module. This module is provided by most GNU/Linux distributions. It can also be installed using pip.

The usage of the script is simple. A help can be displayed by using the -h option:

usage: mychem_manager [-h] [-v] -H HOST -U USER -D DATABASE [-P] [-n NAME_TAG]
                      [-l LOG_FILE] [-p PREFIX] [-a | -r] [-V]
                      sdfile

mychem_manager load a file in MDL SDF format into a MySQL database and creates
a chemical cartridge with Mychem.

positional arguments:
  sdfile                Name of the MDL SDF file containing the chemical-data
                        to load into the MySQL database.

optional arguments:
  -h, --help            show this help message and exit
  -v, --version         show program's version number and exit
  -H HOST, --host HOST  Name of the MySQL host.
  -U USER, --user USER  User for login to the MySQL server.
  -D DATABASE, --db DATABASE
                        Name of the MySQL database to use
  -P, --password        Specify if a password is required to connect to MySQL.
  -n NAME_TAG, --nametag NAME_TAG
                        Name of the tag used in the MDL SDF file to define the
                        name of the chemical compound.
  -l LOG_FILE, --logfile LOG_FILE
                        Name of the log file to send logging output to.
  -p PREFIX, --prefix PREFIX
                        Prefix added to the default table names.
  -a, --append          Specify if the data should be added to the existing
                        mychem tables.
  -r, --replace         Specify if the new data should replace existing data.
  -V, --verbose         Enable verbose debug messages.

The first step is to create a database for storing the chemical tables. In this documentation, the database will be named mychem, but any other name can be used (the -D option permit to set the database name).

--
-- Database creation
--

CREATE DATABASE `mychem`;

Once the database is created, it is possible to load the MDL SDF file with the mychemdb_manager script:

$ python mychemdb_manager -D mychem -H localhost -U user -P database.sdf
Enter MySQL password:
The MDL SDFFile has been successfully loaded.

The previous command will create the following tables:

  • mychem_compounds - It contains the compound’s name and two timestamps (when the entry is created and when the entry is updated).

  • mychem_1D_structures - It contains the 1D representation of the compounds (SMILES and InChI code).

  • mychem_3D_structures - It contains the 3D structure of the compounds in MDL Molfile format.

  • mychem_bin_structures - It contains the binary representation (fp2 and obserialized object) of the compounds.

    Note

    It is possible to use another prefix than mychem by using the -t option.

If all these tables are already existing, you have to choose either to append data (-a option) or to replace existing data (-r option).