This thesis investigates the chemical space of commercially available compounds to assess how well it meets contemporary medicinal chemistry standards. A novel open-source toolkit for compound library design, named Synthons Interpreter or SynthI, was developed (the name was later changed to Synt-On due to trademark issues, but for convenience the name SynthI is used in the manuscript). The toolkit uses a synthon-based representation to link building blocks (BBs) with the fragments obtained through pseudo-retrosynthetic fragmentation of larger molecules. It relies on thirty-eight reaction rules for breaking chemical bonds, which yield a range of synthons, each carrying unique labels that link it to approximately 150 types of BBs. These labels encode the position and chemical character of reactive centers while keeping the structure valid, so that synthons can be handled as actual compounds. This approach not only facilitates the creation of synthetically feasible libraries but also enhances BB analysis in the context of medicinal chemistry.
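The idea of labels that mark reactive centers inside an otherwise valid structure can be illustrated with a minimal sketch. It assumes, purely for illustration, that a label is written as an integer atom-map number on a bracketed atom in a SMILES-like string, with 10 standing for an electrophilic center and 20 for a nucleophilic one. The names `LABEL_ROLES`, `reactive_centers`, and `can_combine` are hypothetical and are not the toolkit's API.

```python
import re

# Assumed, illustrative label scheme (not taken from the toolkit):
# the atom-map number on a bracketed atom names the role of that center.
LABEL_ROLES = {10: "electrophile", 20: "nucleophile"}

# Matches bracketed atoms such as [CH:10] or [NH2:20]:
# element symbol, optional hydrogen count, then the numeric label.
LABEL_PATTERN = re.compile(r"\[([A-Za-z][a-z]?)(H\d*)?:(\d+)\]")


def reactive_centers(synthon):
    """Return a list of (element, role) pairs for every labeled atom."""
    centers = []
    for match in LABEL_PATTERN.finditer(synthon):
        element, _hydrogens, label = match.groups()
        centers.append((element, LABEL_ROLES.get(int(label), "unknown")))
    return centers


def can_combine(synthon_a, synthon_b):
    """True if one synthon offers an electrophile and the other a nucleophile."""
    roles_a = {role for _, role in reactive_centers(synthon_a)}
    roles_b = {role for _, role in reactive_centers(synthon_b)}
    return (
        ("electrophile" in roles_a and "nucleophile" in roles_b)
        or ("nucleophile" in roles_a and "electrophile" in roles_b)
    )


# Toy synthons: an acyl-type electrophile and an amine-type nucleophile.
acyl = "C[CH:10]=O"
amine = "C[NH2:20]"
print(reactive_centers(acyl), reactive_centers(amine), can_combine(acyl, amine))
```

Because the label lives on an ordinary atom, the string remains a parseable structure, which is the property that lets synthons be treated like actual compounds in downstream analysis.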
SynthI was evaluated on Enamine's in-stock BB library for reagent classification, filtering, and scaffold analysis. Compounds from a list of recently approved drugs were fragmented, and the resulting synthetic pathways were compared with those reported in the literature; SynthI reproduced them in most cases, the exception being heterocyclization steps, which were not yet implemented. Libraries of analogs were also generated for selected drugs. A distinctive feature of SynthI's library design is its strong reliance on available BBs. Synthon-based library design thus enables the generation of collections of synthesizable compounds that remain structurally similar to the original molecule while still displaying diversity.
This work also extensively analyzed commercially available BBs from eMolecules in terms of market availability, quality, diversity, and relevance to current medicinal chemistry demands. To this end, biologically relevant molecules from the ChEMBL database were fragmented with SynthI, and the resulting synthons were compared with those generated from PBB (Purchasable Building Blocks), yielding a comprehensive analysis of PBB in the context of medicinal chemistry.
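The comparison described above can be pictured as a set-coverage calculation. The following is a minimal sketch, assuming that synthons from both sources are already available as canonical strings so that identical synthons compare equal. The function name `synthon_coverage` and the toy data are hypothetical and do not reproduce any result of the thesis.

```python
def synthon_coverage(reference_synthons, purchasable_synthons):
    """Fraction of reference synthons that also occur among purchasable ones.

    Returns (coverage, missing), where `missing` lists the reference
    synthons with no purchasable counterpart, sorted for stable output.
    """
    reference = set(reference_synthons)
    available = reference & set(purchasable_synthons)
    missing = sorted(reference - available)
    coverage = len(available) / len(reference) if reference else 0.0
    return coverage, missing


# Toy, made-up inputs standing in for synthons derived from bioactive
# molecules (reference) and from purchasable building blocks.
reference = ["synthon_A", "synthon_B", "synthon_C", "synthon_D"]
purchasable = ["synthon_A", "synthon_B", "synthon_D", "synthon_E"]

coverage, missing = synthon_coverage(reference, purchasable)
print(f"coverage = {coverage:.2f}, missing = {missing}")
```

Listing the uncovered synthons alongside the coverage value is a deliberate choice here: the gaps are what point to building blocks that are in demand but not on the market.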
The analysis revealed that the most prevalent classes of BBs, such as amines, acids, aryl halides, and aliphatic alcohols, correlate with the popularity of the corresponding reactions, such as amide formation, Pd-mediated couplings, Buchwald-Hartwig amination, and alkylation. However, the availability of a well-studied reaction is not the sole determinant of a reagent's market presence. For instance, sulfonate esters and secondary and (hetero)benzylic primary alkyl halides are less common because of their shorter shelf life, and the deficit of S-nucleophiles is attributed to challenging storage conditions. The scarcity of some reagents, such as SuFEx reagents and polyfunctional BBs, is linked to their recent introduction.