SMILES Selection
When resolvers disagree on the SMILES for a given compound, a SMILES selection method determines which SMILES is returned as the "best" one. Set the smiles_selection_mode argument of resolve_compounds_to_smiles to one of the following (default: 'weighted'):
- 'consensus': Pick the SMILES string returned by the most resolvers. Tie-breaker: lexicographical order of the canonical SMILES.
- 'ordered': Pick the SMILES returned by the highest-priority resolver that produced a result. Priority follows the order of the resolvers in the resolvers_list argument of resolve_compounds_to_smiles (highest to lowest).
- 'weighted': Sum the weights of the resolvers that returned each SMILES and pick the SMILES with the highest total. Custom weights can be assigned at resolver initialization. See Resolvers for default weights.
- 'rdkit_standardized': Pick the SMILES that is most standardized by RDKit. Penalizes SMILES with more fragments, formal charges, radicals, and isotopes.
- 'fewest_fragments': Pick the SMILES with the fewest fragments (fragments are separated by '.').
- 'longest_smiles': Pick the longest SMILES.
- 'shortest_smiles': Pick the shortest SMILES.
- 'random': Pick a random SMILES.
- 'highest_symmetry': Pick the SMILES with the highest symmetry.
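To make a few of these modes concrete, here is a minimal sketch in plain Python. It does not call cholla_chem or reproduce its internals: the resolver names, weights, and candidate SMILES are hypothetical, and the library's actual implementation may differ (for example, it may canonicalize SMILES before comparing them).

```python
from collections import Counter
from typing import Dict, List

# Hypothetical resolver output: resolver name -> SMILES it returned.
results: Dict[str, str] = {
    "resolver_a": "CC(=O)Oc1ccccc1C(=O)O",
    "resolver_b": "CC(=O)Oc1ccccc1C(=O)O",
    "resolver_c": "CC(=O)Oc1ccccc1C(=O)[O-].[Na+]",
}
# Hypothetical weights; these are not the library's defaults.
weights: Dict[str, float] = {"resolver_a": 1.0, "resolver_b": 1.0, "resolver_c": 5.0}


def consensus(results: Dict[str, str]) -> str:
    """Most votes wins; ties are broken by lexicographic order."""
    counts = Counter(results.values())
    return min(counts, key=lambda s: (-counts[s], s))


def ordered(results: Dict[str, str], resolvers_list: List[str]) -> str:
    """Return the SMILES from the first resolver in the list that has one."""
    for resolver in resolvers_list:
        if resolver in results:
            return results[resolver]
    raise ValueError("no resolver returned a SMILES")


def weighted(results: Dict[str, str], weights: Dict[str, float]) -> str:
    """Sum resolver weights per SMILES and return the highest total."""
    totals: Dict[str, float] = {}
    for resolver, smiles in results.items():
        totals[smiles] = totals.get(smiles, 0.0) + weights[resolver]
    return max(totals, key=totals.get)


def fewest_fragments(results: Dict[str, str]) -> str:
    """Return the SMILES with the fewest '.'-separated fragments."""
    return min(sorted(set(results.values())), key=lambda s: s.count("."))


print(consensus(results))
print(ordered(results, ["resolver_c", "resolver_a"]))
print(weighted(results, weights))
print(fewest_fragments(results))
```

With these made-up inputs the modes disagree: consensus and fewest_fragments select the free acid, while ordered and weighted select the sodium salt because resolver_c is listed first and carries the largest weight. This is why the choice of mode matters.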
Custom SMILES selection functions
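To use a custom selection function, pass a callable as smiles_selection_mode to resolve_compounds_to_smiles. The function receives a dictionary keyed by candidate SMILES and returns the selected SMILES together with its dictionary entry. For example: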
from typing import Dict, List, Tuple

from cholla_chem import resolve_compounds_to_smiles


def reverse_alphabetical(
    smiles_dict: Dict[str, List[str]],
    **kwargs,
) -> Tuple[str, List[str]]:
    # Keys are the candidate SMILES; pick the one that sorts last.
    smiles = sorted(smiles_dict.keys())[-1]
    return smiles, smiles_dict[smiles]


resolved_smiles = resolve_compounds_to_smiles(
    compounds_list=['aspirin'],
    smiles_selection_mode=reverse_alphabetical,
)
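A custom function can also combine several criteria. The sketch below follows the same signature as the example above and prefers the SMILES with the fewest fragments, then the shortest string. It runs on a hand-written dictionary so it can be tried without the library; the resolver names in the values are hypothetical, and treating the values as lists of resolver names is an assumption based on the type hints.

```python
from typing import Dict, List, Tuple


def fewest_fragments_then_shortest(
    smiles_dict: Dict[str, List[str]],
    **kwargs,
) -> Tuple[str, List[str]]:
    # Rank by fragment count, then length, then alphabetically so the
    # result is deterministic when candidates tie.
    smiles = min(smiles_dict, key=lambda s: (s.count("."), len(s), s))
    return smiles, smiles_dict[smiles]


# Hypothetical candidates: SMILES -> entries (assumed to be resolver names).
candidates: Dict[str, List[str]] = {
    "[Na+].[Cl-]": ["resolver_a"],
    "Cl": ["resolver_b"],
    "[Cl-]": ["resolver_c"],
}

selected, sources = fewest_fragments_then_shortest(candidates)
print(selected, sources)
```

To use it with the library, pass the function as smiles_selection_mode=fewest_fragments_then_shortest, in the same way as reverse_alphabetical.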