Resolvers
cholla_chem uses a variety of resolvers to convert chemical names to SMILES. Resolvers can be initialized and passed to the function resolve_compounds_to_smiles as a list to customize how compounds are resolved to SMILES. If no resolvers are passed, the following default resolvers will be used:
- PubChemNameResolver('pubchem_default', resolver_weight=2),
- OpsinNameResolver('opsin_default', resolver_weight=3),
- ManualNameResolver('manual_default', resolver_weight=10),
- StructuralFormulaNameResolver('structural_formula_default', resolver_weight=2)
- InorganicShorthandNameResolver('inorganic_shorthand_default', resolver_weight=2)
Passing Resolvers to resolve_compounds_to_smiles:
Initialize resolvers with a name (required), and resolver_weight (optional):
from cholla_chem import resolve_compounds_to_smiles
from cholla_chem import (
OpsinNameResolver,
PubChemNameResolver,
CIRpyNameResolver
)
opsin_resolver = OpsinNameResolver(
resolver_name='opsin',
resolver_weight=4
)
pubchem_resolver = PubChemNameResolver(
resolver_name='pubchem',
resolver_weight=3
)
cirpy_resolver = CIRpyNameResolver(
resolver_name='cirpy',
resolver_weight=2
)
resolved_smiles = resolve_compounds_to_smiles(
['2-acetyloxybenzoic acid'],
[opsin_resolver, pubchem_resolver, cirpy_resolver],
detailed_name_dict=True
)
"{'2-acetyloxybenzoic acid': {
'SMILES': 'CC(=O)Oc1ccccc1C(=O)O',
'SMILES_source': ['opsin', 'pubchem', 'cirpy'],
'SMILES_dict': {
'CC(=O)Oc1ccccc1C(=O)O': ['opsin', 'pubchem', 'cirpy']
},
'additional_info': {}
}}"
OpsinNameResolver
This resolver uses OPSIN for name-to-SMILES conversion. The code is adapted from py2opsin. This resolver can be configured with the following arguments:
Arguments:
- allow_acid (bool, optional): Allow interpretation of acids. Defaults to False.
- allow_radicals (bool, optional): Enable radical interpretation. Defaults to False.
- allow_bad_stereo (bool, optional): Allow OPSIN to ignore uninterpretable stereochemistry. Defaults to False.
- wildcard_radicals (bool, optional): Output radicals as wildcards. Defaults to False.
Default weight for 'weighted' SMILES selection method: 3
from cholla_chem import OpsinNameResolver
opsin_resolver = OpsinNameResolver(
resolver_name='opsin',
resolver_weight=3,
allow_acid=False,
allow_radicals: True,
allow_bad_stereo: False,
wildcard_radicals: False
)
resolved_smiles = resolve_compounds_to_smiles(
compounds_list=['2-acetyloxybenzoic acid'],
resolvers_list=[opsin_resolver]
)
PubChemNameResolver
This resolver uses PubChem for name-to-SMILES conversion. The code is adapted from PubChemPy to implement batching with the Power User Gateway XML schema to significantly speed up SMILES resolutions.
Default weight for 'weighted' SMILES selection method: 2
from cholla_chem import PubChemNameResolver
pubchem_resolver = PubChemNameResolver(
resolver_name='pubchem',
resolver_weight=2
)
resolved_smiles = resolve_compounds_to_smiles(
compounds_list=['acetone'],
resolvers_list=[pubchem_resolver]
)
CIRpyNameResolver
This resolver uses the python library CIRpy, a Python interface for the Chemical Identifier Resolver (CIR) by the CADD Group at the NCI/NIH.
Default weight for 'weighted' SMILES selection method: 1
from cholla_chem import CIRpyNameResolver
cirpy_resolver = CIRpyNameResolver(
resolver_name='cirpy',
resolver_weight=1
)
resolved_smiles = resolve_compounds_to_smiles(
compounds_list=['acetone'],
resolvers_list=[cirpy_resolver]
)
ChemSpiPyNameResolver
This resolver uses the python library ChemSpiPy, a Python interface for the ChemSpider API by the RSC. This resolver must be initialized with a ChemSpider API key, which can be obtained here.
Default weight for 'weighted' SMILES selection method: 3
from cholla_chem import ChemSpiPyNameResolver
chemspider_resolver = ChemSpiPyNameResolver(
resolver_name='chemspider',
resolver_weight=3,
chemspider_api_key='CHEMSPIDER_API_KEY'
)
resolved_smiles = resolve_compounds_to_smiles(
compounds_list=['acetone'],
resolvers_list=[chemspider_resolver]
)
ManualNameResolver
This resolver uses a dataset of manually curated names and their corresponding SMILES, especially focused on common names that are incorrectly resolved by other resolvers (e.g. 'NaH').
Default weight for 'weighted' SMILES selection method: 10
from cholla_chem import ManualNameResolver
manual_resolver = ManualNameResolver(
resolver_name='manual',
resolver_weight=10
)
resolved_smiles = resolve_compounds_to_smiles(
compounds_list=['NaH'],
resolvers_list=[manual_resolver]
)
ManualNameResolver can also be initialized with a custom dictionary mapping chemical names to SMILES:
from cholla_chem import ManualNameResolver
custom_name_dict = {'Foobar': 'c1ccccc1'}
manual_resolver = ManualNameResolver(
resolver_name='manual',
resolver_weight=10,
provided_name_dict=custom_name_dict
)
resolved_smiles = resolve_compounds_to_smiles(
compounds_list=['Foobar'],
resolvers_list=[manual_resolver]
)
StructuralFormulaNameResolver
This resolver converts simple structural chemical formulas (e.g. 'CH3CH2CH2COOH') to SMILES.
Default weight for 'weighted' SMILES selection method: 2
from cholla_chem import StructuralFormulaNameResolver
structural_formula_resolver = StructuralFormulaNameResolver(
resolver_name='structural_formula',
resolver_weight=2
)
resolved_smiles = resolve_compounds_to_smiles(
compounds_list=['CH3CH2CH2COOH'],
resolvers_list=[structural_formula_resolver]
)
InorganicShorthandNameResolver
This resolver converts inorganic chemical formulas (e.g. '[Cp*RhCl2]2') to SMILES.
Default weight for 'weighted' SMILES selection method: 2
from cholla_chem import InorganicShorthandNameResolver
inorganic_shorthand_resolver = InorganicShorthandNameResolver(
resolver_name='inorganic_shorthand',
resolver_weight=2
)
resolved_smiles = resolve_compounds_to_smiles(
compounds_list=['[Cp*RhCl2]2'],
resolvers_list=[inorganic_shorthand_resolver]
)
Custom Resolvers
This library also supports using custom resolvers. To use a custom resolver, import the base class ChemicalNameResolver, and create a subclass with the format shown below. The name_to_smiles method is used to resolve compound names to SMILES. In this example, the method resolves names using a simple lookup dictionary, but it can be also used to call an API, use other name-to-SMILES libraries, run an algorithm, etc. This method must return a tuple of dictionaries, where the first dictionary maps chemical names (strings) to SMILES (strings). The second dictionary returns information (e.g. errors in the resolution process) to the detailed_name_dict by mapping chemical names (strings) to some message (strings).
from cholla_chem import resolve_compounds_to_smiles
from cholla_chem import ChemicalNameResolver
class MyCustomResolver(ChemicalNameResolver):
"""
My custom resolver.
"""
def __init__(self, resolver_name: str, resolver_weight: float = 1):
super().__init__("example", resolver_name, resolver_weight)
def name_to_smiles(
self,
compound_name_list: List[str]
) -> Tuple[Dict[str, str], Dict[str, str]]:
"""
Lookup chemical names from a dict.
"""
lookup_dict = {
'benzene': 'c1ccccc1'
}
resolved_names_dict = {}
additional_info_dict = {}
for compound_name in compound_name_list:
resolved_smiles = lookup_dict.get(compound_name, '')
resolved_names_dict[compound_name] = resolved_smiles
if not resolved_smiles:
additional_info_dict[compound_name] = 'Some info message.'
return resolved_names_dict, additional_info_dict
my_custom_resolver = MyCustomResolver(
resolver_name='example',
resolver_weight=1
)
resolved_smiles = resolve_compounds_to_smiles(
compounds_list=['benzene', 'aspirin'],
resolvers_list=[my_custom_resolver],
detailed_name_dict=True
)
"{'benzene': {
'SMILES': 'c1ccccc1',
'SMILES_source': ['example'],
'SMILES_dict': {
'c1ccccc1': ['example']
},
'additional_info': {}
},
'aspirin': {
'SMILES': '',
'SMILES_source': [],
'SMILES_dict': {},
'additional_info': {
'example': 'Some info message.'
}
}}"