SMILES Selection

When resolvers disagree on the SMILES for a given compound, a SMILES selection method determines which SMILES is returned as the "best" one. Set the smiles_selection_mode argument of resolve_compounds_to_smiles to one of the following (default: 'weighted'):

  • 'consensus': Pick the SMILES string returned by the most resolvers. Tie-breaker: lexicographical order of the canonical SMILES.
  • 'ordered': Pick the SMILES returned by the highest-priority resolver that produced one. Priority follows the order of the resolvers in the resolvers_list argument of resolve_compounds_to_smiles (highest to lowest).
  • 'weighted': Each resolver has a weight. The weights of the resolvers that returned a given SMILES are summed, and the SMILES with the highest total is picked. Custom weights can be assigned at resolver initialization. See Resolvers for default weights.
  • 'rdkit_standardized': Pick the SMILES that is most standardized by RDKit. Penalizes SMILES with more fragments, formal charges, radicals, and isotopes.
  • 'fewest_fragments': Pick the SMILES with the fewest fragments (fragments are separated by '.').
  • 'longest_smiles': Pick the longest SMILES.
  • 'shortest_smiles': Pick the shortest SMILES.
  • 'random': Pick a random SMILES.
  • 'highest_symmetry': Pick the SMILES with the highest symmetry.
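
As a rough illustration of the 'consensus' and 'weighted' rules in the list above, the sketch below reimplements them in plain Python. It is not the library's own code. It assumes the candidates arrive as a dict mapping each SMILES to the list of resolvers that returned it, that the SMILES strings are already canonical, and that ties in the weighted case are broken lexicographically (the tie-breaker is documented above only for 'consensus'). The resolver names, weights, and function names are hypothetical.

```python
from typing import Dict, List

def pick_consensus(smiles_dict: Dict[str, List[str]]) -> str:
    """Return the SMILES backed by the most resolvers.

    Ties are broken by lexicographic order of the SMILES string.
    """
    return min(smiles_dict, key=lambda s: (-len(smiles_dict[s]), s))

def pick_weighted(
    smiles_dict: Dict[str, List[str]],
    weights: Dict[str, float],
    default_weight: float = 1.0,  # assumption: fallback for unlisted resolvers
) -> str:
    """Return the SMILES with the highest summed resolver weight."""
    totals = {
        smiles: sum(weights.get(name, default_weight) for name in resolvers)
        for smiles, resolvers in smiles_dict.items()
    }
    # Assumption: lexicographic tie-breaker, mirroring the 'consensus' rule.
    return min(totals, key=lambda s: (-totals[s], s))

# Hypothetical resolver names and weights, for illustration only.
candidates = {
    "CC(=O)Oc1ccccc1C(=O)O": ["resolver_a", "resolver_b"],
    "CC(=O)OC1=CC=CC=C1C(=O)O": ["resolver_c"],
}
weights = {"resolver_a": 1.0, "resolver_b": 1.0, "resolver_c": 5.0}

consensus_choice = pick_consensus(candidates)
weighted_choice = pick_weighted(candidates, weights)
```

With these inputs the two rules disagree: two resolvers outvote one under 'consensus', while the single heavily weighted resolver (5.0 against a combined 2.0) wins under 'weighted'.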

Custom SMILES selection functions

You can also pass a function as smiles_selection_mode to resolve_compounds_to_smiles to apply a custom selection rule. For example:

from cholla_chem import resolve_compounds_to_smiles
from typing import Dict, List, Tuple

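def reverse_alphabetical(
    smiles_dict: Dict[str, List[str]],
    **kwargs
) -> Tuple[str, List[str]]:
    """Pick the SMILES that sorts last alphabetically."""
    # The keys of smiles_dict are the candidate SMILES strings.
    smiles = sorted(smiles_dict.keys())[-1]
    # Return the chosen SMILES together with the list stored under it.
    return smiles, smiles_dict[smiles]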

resolved_smiles = resolve_compounds_to_smiles(
    compounds_list=['aspirin'], 
    smiles_selection_mode=reverse_alphabetical
)
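
A custom function is mainly useful when you want to combine criteria that the built-in modes apply separately. The sketch below follows the same signature as reverse_alphabetical and prefers the SMILES with the fewest fragments, then the shortest one, then the lexicographically smallest. The function name, the sample data, and the resolver names are hypothetical. It also assumes that the list stored under each SMILES holds the names of the resolvers that returned it.

```python
from typing import Dict, List, Tuple

def fewest_fragments_then_shortest(
    smiles_dict: Dict[str, List[str]],
    **kwargs
) -> Tuple[str, List[str]]:
    """Prefer fewer fragments, then shorter strings, then alphabetical order."""

    def rank(smiles: str) -> Tuple[int, int, str]:
        # Fragments in a SMILES string are separated by '.'.
        n_fragments = smiles.count(".") + 1
        return (n_fragments, len(smiles), smiles)

    best = min(smiles_dict, key=rank)
    return best, smiles_dict[best]

# Hypothetical candidates; the function can be checked without any resolver.
example = {
    "CC(=O)[O-].[Na+]": ["resolver_a"],  # two fragments
    "CC(=O)O": ["resolver_b"],           # one fragment, 7 characters
    "OC(C)=O": ["resolver_c"],           # one fragment, 7 characters
}
chosen, sources = fewest_fragments_then_shortest(example)
```

Because the function only depends on the dictionary it receives, it can be tested on hand-written data like this before being passed as smiles_selection_mode in the same way as reverse_alphabetical.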