Getting Started
This guide covers all four AgaveChem mappers in depth, the map_reactions pipeline function, working with mapper results, scoring mappings, and advanced configuration options.
Overview of the mapping pipeline
AgaveChem provides four composable mappers that can be used independently or combined into a pipeline. When combined, mappers are applied in order of increasing complexity:
- Identical fragment mapper — maps spectator molecules and structurally unchanged fragments before any other mapper is invoked
- MCS mapper — assigns atom-map numbers to atoms whose local chemical environment is preserved across the reaction, yielding conservative partial maps
- Expert template mapper — applies a curated library of reaction SMIRKS templates to classify and fully map known reaction classes
- Neural mapper — an ALBERT-based model trained on labeled USPTO reactions, used for complete mapping at inference time
Each mapper operates on unmapped reaction SMILES and returns a ReactionMapperResult dictionary. A higher-priority mapper in the pipeline overrides the output of a lower-priority one.
The map_reactions function
The easiest way to map a batch of reactions is the top-level map_reactions function, which runs the MCS and template mappers by default:
from agave_chem import map_reactions
reactions = [
    "CC(Cl)(Cl)OC(C)(Cl)Cl.CC(=O)C(=O)O>>CC(=O)C(=O)Cl",
    "OCC(=O)OCCCO.Cl>>ClCC(=O)OCCCO",
]
results = map_reactions(reactions)
for r in results:
    print(r["original_reaction"])
    print(r["final_mapping"])
    print()
Parameters
- reaction_list (List[str]): list of unmapped reaction SMILES strings; a single string is also accepted
- mappers_list (List[ReactionMapper]): list of mapper instances to apply in order; defaults to [MCSReactionMapper("mcs_default"), TemplateReactionMapper("expert_default")]
- batch_size (int): number of reactions per processing batch (default: 500, max: 1000)
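The input handling described by these parameters can be sketched in plain Python. The helpers `normalize_reactions` and `iter_batches` are hypothetical names written for this illustration and are not part of AgaveChem. The documentation does not say whether an out-of-range batch_size is clamped or rejected; this sketch assumes it is rejected.

```python
from typing import Iterator, List, Union


def normalize_reactions(reaction_list: Union[str, List[str]]) -> List[str]:
    """Wrap a single reaction SMILES in a list; copy a list unchanged."""
    if isinstance(reaction_list, str):
        return [reaction_list]
    return list(reaction_list)


def iter_batches(reactions: List[str], batch_size: int = 500) -> Iterator[List[str]]:
    """Yield consecutive slices of at most batch_size reactions."""
    # Assumption: values outside 1..1000 raise instead of being clamped.
    if not 1 <= batch_size <= 1000:
        raise ValueError("batch_size must be between 1 and 1000")
    for start in range(0, len(reactions), batch_size):
        yield reactions[start:start + batch_size]


# "A>>B" is a placeholder string, not a real reaction.
batches = list(iter_batches(normalize_reactions(["A>>B"] * 1200), batch_size=500))
print([len(b) for b in batches])  # [500, 500, 200]
```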
Using the neural mapper in the pipeline
To use the neural mapper, pass it explicitly in mappers_list. Place it last so it acts as the highest-priority mapper:
from agave_chem import map_reactions, MCSReactionMapper, NeuralReactionMapper
results = map_reactions(
    reactions,
    mappers_list=[
        MCSReactionMapper("mcs"),
        NeuralReactionMapper("neural"),
    ],
)
Working with mapper results
Every mapper returns a ReactionMapperResult dictionary with the following keys:
| Key | Type | Description |
|---|---|---|
| original_smiles | str | The unmapped input reaction SMILES |
| selected_mapping | str | The chosen mapped reaction SMILES (empty string if mapping failed) |
| possible_mappings | Dict[str, List] | All candidate mappings produced before selection |
| mapping_type | str | Which mapper produced the result ("mcs", "template", "neural", etc.) |
| mapping_score | Any | Optional score attached by the mapper (may be None) |
| additional_info | List[Dict] | Mapper-specific metadata |
The map_reactions function returns AgaveChemMapperResult dictionaries with an additional final_mapping key (the best mapping across all mappers) and mapper_results (the per-mapper results list).
result = results[0]
print(result["final_mapping"]) # best mapping from the pipeline
print(result["original_reaction"]) # original unmapped SMILES
print(result["mapper_results"][0]["mapping_type"]) # which mapper produced it
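When post-processing a batch, it is often useful to separate reactions that were mapped from those that were not. The sketch below works on mock dictionaries shaped like the results described above, so it runs without AgaveChem installed. It assumes that final_mapping, like selected_mapping, is an empty string when mapping fails; `split_by_success` is a hypothetical helper, and the mapped SMILES is a made-up placeholder rather than real library output.

```python
from typing import Dict, List, Tuple

# Mock results; the keys follow the text above, the values are invented.
mock_results: List[Dict] = [
    {
        "original_reaction": "CC(=O)Cl.NC>>CC(=O)NC",
        "final_mapping": (
            "[CH3:1][C:2](=[O:3])Cl.[NH2:4][CH3:5]"
            ">>[CH3:1][C:2](=[O:3])[NH:4][CH3:5]"
        ),
        "mapper_results": [],
    },
    {"original_reaction": "not a reaction", "final_mapping": "", "mapper_results": []},
]


def split_by_success(results: List[Dict]) -> Tuple[List[Dict], List[Dict]]:
    """Separate results with a non-empty final_mapping from failures."""
    mapped = [r for r in results if r.get("final_mapping")]
    failed = [r for r in results if not r.get("final_mapping")]
    return mapped, failed


mapped, failed = split_by_success(mock_results)
print(len(mapped), "mapped;", len(failed), "failed")
for r in failed:
    print("unmapped:", r["original_reaction"])
```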
Identical fragment mapper
The identical fragment mapper handles fragments that appear structurally unchanged on both sides of the reaction. It is invoked automatically within map_reactions, but can also be used standalone:
from agave_chem import IdenticalFragmentMapper
mapper = IdenticalFragmentMapper("my_ifm")
result = mapper.map_reaction("CC(=O)O.[Na+].[Cl-]>>CC(=O)[O-].[Na+].[Cl-]")
print(result["selected_mapping"])
Identical fragments are assigned map numbers from a reserved range (starting at 500) to avoid collisions with downstream mappers.
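To see which atoms in a mapped reaction came from the reserved range, the map numbers can be pulled out with a regular expression. This is a minimal sketch, not library code: `split_map_numbers` is a hypothetical helper, the example string is hand-written for illustration, and the sketch assumes every number at or above 500 belongs to the reserved range, since no upper bound is stated.

```python
import re
from typing import List, Tuple

MAP_NUMBER = re.compile(r":(\d+)\]")  # atom-map number at the end of a bracket atom
RESERVED_START = 500  # start of the reserved range, per the text above


def split_map_numbers(mapped_smiles: str) -> Tuple[List[int], List[int]]:
    """Return (reserved, regular) map numbers found in a mapped SMILES."""
    numbers = sorted({int(n) for n in MAP_NUMBER.findall(mapped_smiles)})
    reserved = [n for n in numbers if n >= RESERVED_START]
    regular = [n for n in numbers if n < RESERVED_START]
    return reserved, regular


# Hand-written example, not actual mapper output.
example = (
    "[CH3:1][C:2](=[O:3])[OH:4].[Na+:500]"
    ">>[CH3:1][C:2](=[O:3])[O-:4].[Na+:500]"
)
reserved, regular = split_map_numbers(example)
print(reserved)  # [500]
print(regular)   # [1, 2, 3, 4]
```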
MCS mapper
The MCS mapper assigns atom-map numbers to atoms whose local chemical environment is preserved across the reaction. It uses a bond-radius environment fingerprinting scheme and is efficient even for large, multi-fragment reactions.
from agave_chem import MCSReactionMapper
mapper = MCSReactionMapper("my_mcs")
result = mapper.map_reaction("CC(Cl)(Cl)OC(C)(Cl)Cl.CC(=O)C(=O)O>>CC(=O)C(=O)Cl")
print(result["selected_mapping"])
Key parameters for map_reaction
- min_radius (int): smallest bond-radius to consider (default: 1)
- min_radius_to_anchor_new_mapping (int): below this radius, environments are only matched when they already contain at least one mapped atom; controls how close to the reactive center new anchor atoms can be seeded (default: 3)
- max_radius (Optional[int]): largest bond-radius to search; defaults to the size of the largest molecule
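The anchoring rule for min_radius_to_anchor_new_mapping can be written as a small predicate. This is only a reading of the parameter description above, not the mapper's actual implementation; `may_match_environment` is a hypothetical function, and the descending order in which radii are printed is an assumption made for display.

```python
def may_match_environment(
    radius: int,
    contains_mapped_atom: bool,
    min_radius_to_anchor_new_mapping: int = 3,
) -> bool:
    """Return True if an environment of this radius may be matched.

    At or above the anchoring radius any environment qualifies. Below it,
    only environments that already contain a mapped atom qualify.
    """
    if radius >= min_radius_to_anchor_new_mapping:
        return True
    return contains_mapped_atom


# Columns: radius, allowed without a mapped atom, allowed with a mapped atom.
for radius in (4, 3, 2, 1):
    print(radius, may_match_environment(radius, False), may_match_environment(radius, True))
```

With the default threshold of 3, environments of radius 1 or 2 can only extend an existing mapping, while radius 3 and above can seed a new one.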
Partial mapping
The MCS mapper produces partial maps for reactions where the reactive center cannot be unambiguously resolved. The selected_mapping will contain atom-map numbers only for atoms whose environment was confidently matched:
result = mapper.map_reaction("c1ccccc1Br.B(O)(O)c1ccccc1>>c1ccc(-c2ccccc2)cc1")
# Atoms in the biaryl core will be mapped; the coupling site atoms may not be
print(result["selected_mapping"])
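One way to gauge how partial a mapping is: estimate the fraction of atoms that carry a map number. The sketch below uses a regex tokenizer rather than a real SMILES parser, so it is a heuristic. It only recognizes bracket atoms and the organic-subset symbols, and it counts explicit `[H]` atoms as atoms. `mapped_fraction` is a hypothetical helper, and the partially mapped string is hand-written for illustration, not actual MCS mapper output.

```python
import re

# Bracket atoms first, then two-letter halogens, then one-letter symbols.
ATOM_TOKEN = re.compile(r"\[[^\]]+\]|Br|Cl|[BCNOPSFI]|[bcnops]")
MAPPED_ATOM = re.compile(r":\d+\]$")  # bracket atom ending in ":<number>]"


def mapped_fraction(smiles: str) -> float:
    """Heuristic fraction of atom tokens that carry an atom-map number."""
    atoms = ATOM_TOKEN.findall(smiles)
    if not atoms:
        return 0.0
    mapped = sum(1 for atom in atoms if MAPPED_ATOM.search(atom))
    return mapped / len(atoms)


# Hand-written partial mapping: five ring carbons mapped, coupling site unmapped.
partially_mapped = (
    "[cH:1]1[cH:2][cH:3][cH:4][cH:5]c1Br.OB(O)c1ccccc1"
    ">>[cH:1]1[cH:2][cH:3][cH:4][cH:5]c1-c1ccccc1"
)
reactant_side, product_side = partially_mapped.split(">>")
print(f"product atoms mapped: {mapped_fraction(product_side):.2f}")
```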
Expert template mapper
The expert template mapper applies a curated library of reaction SMIRKS templates to classify and fully map reactions that match known reaction classes.
from agave_chem import TemplateReactionMapper
mapper = TemplateReactionMapper("my_template")
result = mapper.map_reaction("CC(Cl)(Cl)OC(C)(Cl)Cl.CC(=O)C(=O)O>>CC(=O)C(=O)Cl")
print(result["selected_mapping"])
print(result["possible_mappings"]) # all candidate mapped SMILES and their templates
Constructor parameters
- mapper_name (str): unique name for this mapper instance
- mapper_weight (float): priority weight in a multi-mapper pipeline (default: 3)
- custom_smirks_patterns (List[Dict] | None): list of user-supplied SMIRKS patterns; each dict must have "name", "smirks", and "superclass_id" keys
- use_default_smirks_patterns (bool): whether to include the built-in template library (default: True)
- max_transforms (int): maximum number of tautomer transforms (default: 1000)
- max_tautomers (int): maximum number of tautomers to enumerate (default: 1000)
- use_mcs_mapping (bool): whether to use MCS internally to focus template matching on the probable reaction center (default: True)
Using custom SMIRKS patterns
You can extend or replace the built-in template library with your own reaction SMIRKS:
from agave_chem import TemplateReactionMapper
custom_patterns = [
    {
        "name": "My custom acylation",
        "smirks": "[C:1](=[O:2])[Cl:3].[N:4]>>[C:1](=[O:2])[N:4]",
        "superclass_id": None,
    }
]
mapper = TemplateReactionMapper(
    "custom_template",
    custom_smirks_patterns=custom_patterns,
    use_default_smirks_patterns=True,  # combine with built-in patterns
)
result = mapper.map_reaction("CC(=O)Cl.NC>>CC(=O)NC")
print(result["selected_mapping"])
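Because each pattern dict must carry the "name", "smirks", and "superclass_id" keys, it can help to check a pattern list before handing it to the mapper. The following is a minimal pre-flight check in plain Python. `check_patterns` is a hypothetical helper, not part of AgaveChem, and the ">>" test is only a shallow structural check, not SMIRKS validation.

```python
from typing import Dict, List

REQUIRED_KEYS = {"name", "smirks", "superclass_id"}


def check_patterns(patterns: List[Dict]) -> List[str]:
    """Return human-readable problems found in a list of pattern dicts."""
    problems = []
    for i, pattern in enumerate(patterns):
        missing = REQUIRED_KEYS - pattern.keys()
        if missing:
            problems.append(f"pattern {i}: missing keys {sorted(missing)}")
        smirks = pattern.get("smirks")
        # A reaction SMIRKS separates reactants from products with one ">>".
        if not isinstance(smirks, str) or smirks.count(">>") != 1:
            problems.append(f"pattern {i}: smirks must contain exactly one '>>'")
    return problems


good = [
    {
        "name": "My custom acylation",
        "smirks": "[C:1](=[O:2])[Cl:3].[N:4]>>[C:1](=[O:2])[N:4]",
        "superclass_id": None,
    }
]
bad = [{"name": "broken", "smirks": "[C:1]=[O:2]"}]

print(check_patterns(good))
for problem in check_patterns(bad):
    print(problem)
```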
Neural mapper
Overview
The neural mapper uses a supervised ALBERT-based model trained on labeled USPTO reactions. It is the recommended mapper for general-purpose atom mapping.
from agave_chem import NeuralReactionMapper
mapper = NeuralReactionMapper("my_neural")
result = mapper.map_reaction("CC(Cl)(Cl)OC(C)(Cl)Cl.CC(=O)C(=O)O>>CC(=O)C(=O)Cl")
print(result["selected_mapping"])
Neural mapper parameters
- mapper_name (str): unique name for this mapper instance
- mapper_weight (float): priority weight in a multi-mapper pipeline (default: 3)
- checkpoint_path (Optional[str]): path to a custom model checkpoint directory; defaults to the bundled pre-trained model
- use_supervised (bool): whether to use the supervised (fine-tuned) model head (default: True)
- sequence_max_length (int): maximum token sequence length (default: 512)
Mapping a batch
from agave_chem import NeuralReactionMapper
mapper = NeuralReactionMapper("neural_batch")
results = mapper.map_reactions([
    "CC(Cl)(Cl)OC(C)(Cl)Cl.CC(=O)C(=O)O>>CC(=O)C(=O)Cl",
    "OCC(=O)OCCCO.Cl>>ClCC(=O)OCCCO",
])
for r in results:
    print(r["selected_mapping"])
Mapping scorer
MappingScorer evaluates the quality of an atom-mapped reaction SMILES using a set of chemically motivated metrics.
from agave_chem import MappingScorer
scorer = MappingScorer()
score = scorer.score_mapping(
    "[CH3:1][C:2](=[O:3])[OH:4].[OH:5][CH2:6][CH3:7]>>[CH3:1][C:2](=[O:3])[O:5][CH2:6][CH3:7]"
)
print(score)
Scoring metrics
| Metric | Description |
|---|---|
| bond_energy_cost | Summed bond dissociation energy cost of all bond changes |
| num_bond_changes | Total number of bonds formed or broken |
| num_fragments | Number of disconnected fragments affected |
| stereo_changes | Number of stereocenters whose configuration changes |
| ring_changes | Number of ring opening or closing events |
Custom weights
Each metric weight can be adjusted at construction time:
scorer = MappingScorer(
    energy_penalty_weight=1.0,
    bond_change_weight=10.0,
    fragment_weight=20.0,
    stereo_weight=15.0,
    ring_weight=25.0,
)
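To build intuition for how the weights trade off against each other, the sketch below combines mock metric values as a weighted sum. Two things here are assumptions rather than documented behavior: that MappingScorer combines its metrics linearly, and that each weight pairs with the metric its name suggests. The metric values are invented, and `combined_score` is a hypothetical function.

```python
from typing import Dict

# Assumed pairing of metric names to constructor weights (inferred from names).
METRIC_TO_WEIGHT = {
    "bond_energy_cost": "energy_penalty_weight",
    "num_bond_changes": "bond_change_weight",
    "num_fragments": "fragment_weight",
    "stereo_changes": "stereo_weight",
    "ring_changes": "ring_weight",
}

weights: Dict[str, float] = {
    "energy_penalty_weight": 1.0,
    "bond_change_weight": 10.0,
    "fragment_weight": 20.0,
    "stereo_weight": 15.0,
    "ring_weight": 25.0,
}

# Invented metric values for a hypothetical mapping.
metrics: Dict[str, float] = {
    "bond_energy_cost": 2.5,
    "num_bond_changes": 2,
    "num_fragments": 1,
    "stereo_changes": 0,
    "ring_changes": 0,
}


def combined_score(metrics: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted sum of metrics (assumed form; lower means fewer changes)."""
    return sum(weights[METRIC_TO_WEIGHT[name]] * value for name, value in metrics.items())


print(combined_score(metrics, weights))  # 42.5
```

Under this assumed form, a single ring change (25.0) costs more than two ordinary bond changes (20.0), which shows how the weights rank different kinds of rearrangement.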
Composing mappers manually
Mappers can be composed into a custom pipeline by passing a mappers_list to map_reactions. The list is applied in order; the last mapper whose result is non-empty is used as the final_mapping:
from agave_chem import (
    map_reactions,
    MCSReactionMapper,
    TemplateReactionMapper,
    NeuralReactionMapper,
)
results = map_reactions(
    reactions,
    mappers_list=[
        MCSReactionMapper("mcs", mapper_weight=1),
        TemplateReactionMapper("template", mapper_weight=2),
        NeuralReactionMapper("neural", mapper_weight=3),
    ],
    batch_size=200,
)
Note that mapper names within a pipeline must be unique.
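The selection rule stated above (the last mapper with a non-empty result wins) and the unique-name requirement can be illustrated on mock data. This is a sketch of the described behavior, not the library's code: `check_unique_names` and `pick_final_mapping` are hypothetical helpers, the mapping strings are placeholders, and any role mapper_weight plays in selection is ignored here.

```python
from typing import Dict, List


def check_unique_names(names: List[str]) -> None:
    """Raise if any mapper name appears more than once."""
    duplicates = sorted({n for n in names if names.count(n) > 1})
    if duplicates:
        raise ValueError(f"duplicate mapper names: {duplicates}")


def pick_final_mapping(mapper_results: List[Dict]) -> str:
    """Return the selected_mapping of the last mapper with a non-empty result."""
    for result in reversed(mapper_results):
        if result.get("selected_mapping"):
            return result["selected_mapping"]
    return ""


# Mock per-mapper results in pipeline order; the strings are placeholders.
mock_mapper_results = [
    {"mapping_type": "mcs", "selected_mapping": "<mcs mapping>"},
    {"mapping_type": "template", "selected_mapping": "<template mapping>"},
    {"mapping_type": "neural", "selected_mapping": ""},  # this mapper failed
]

check_unique_names(["mcs", "template", "neural"])
print(pick_final_mapping(mock_mapper_results))  # <template mapping>
```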
Sanitizing and validating mapped reactions
All mapper classes expose shared utility methods via the ReactionMapper base class:
from agave_chem import MCSReactionMapper
from rdkit import Chem
mapper = MCSReactionMapper("util")
# Sanitize a mapped reaction SMILES (validates, optionally canonicalizes)
clean = mapper.sanitize_rxn_string(
    "[CH3:1][C:2](=[O:3])[OH:4].[OH:5][CH2:6]>>[CH3:1][C:2](=[O:3])[O:5][CH2:6]",
    expect_full_mapping=True,
    canonicalize=True,
    remove_mapping=False,
)
print(clean)
# Sanitize a molecule
mol = Chem.MolFromSmiles("CC(=O)O")
clean_mol = mapper.sanitize_molecule(mol, add_hs=False)
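Independently of sanitize_rxn_string, whose internal checks are not described here, a quick consistency check on atom-map numbers can catch common problems before downstream use. The sketch below uses only the standard library. `map_number_problems` is a hypothetical helper, and the two rules it applies (no repeated map number on either side, and every product map number present in the reactants) are generic atom-mapping conventions, not a statement about what AgaveChem enforces.

```python
import re
from collections import Counter
from typing import List

MAP_NUMBER = re.compile(r":(\d+)\]")  # atom-map number at the end of a bracket atom


def map_number_problems(mapped_rxn: str) -> List[str]:
    """Return a list of map-number inconsistencies in a mapped reaction SMILES."""
    parts = mapped_rxn.split(">")
    if len(parts) != 3:
        return ["expected a reaction SMILES of the form reactants>agents>products"]
    reactant_nums = [int(n) for n in MAP_NUMBER.findall(parts[0])]
    product_nums = [int(n) for n in MAP_NUMBER.findall(parts[2])]

    problems = []
    for side, nums in (("reactants", reactant_nums), ("products", product_nums)):
        repeated = sorted(n for n, count in Counter(nums).items() if count > 1)
        if repeated:
            problems.append(f"duplicate map numbers in {side}: {repeated}")
    orphaned = sorted(set(product_nums) - set(reactant_nums))
    if orphaned:
        problems.append(f"product map numbers absent from reactants: {orphaned}")
    return problems


consistent = (
    "[CH3:1][C:2](=[O:3])[OH:4].[OH:5][CH3:6]"
    ">>[CH3:1][C:2](=[O:3])[O:5][CH3:6]"
)
inconsistent = "[CH3:1][OH:1]>>[CH3:1][OH:7]"

print(map_number_problems(consistent))  # []
for problem in map_number_problems(inconsistent):
    print(problem)
```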