PyData Global 2022

The Pythonic Common Chemical Universe
12-02, 14:00–14:30 (UTC), Talk Track I

The virtual chemical universe is expanding rapidly as massive open-access databases such as the Enamine Database (20 billion), ZINC Database (2 billion), and PubMed Database (68 million) grow, and as cheminformatic tools to process, manipulate, and derive new compound structures are established. We present our open source knowledge graph, Global-Chem, written in Python, which distributes dictionaries of common chemical lists relevant to different sub-communities to the general public, answering questions such as: What is inside food? Cannabis? Sex products? Chemical weapons? Narcotics? Medical therapeutics?

To navigate new chemical space, we use our data as a reference index to keep track of common patterns of interest and to explore new chemicals that could theoretically be real. In our talk, we will present the chemical data, the rules governing the data and its integrity, and how to use our tools to understand the chemical universe with Python.


Selecting chemical compounds requires expertise, which is gained through experience and the study of a dedicated discipline. Each discipline typically has a set of common functional groups relevant to its community, which allows us to focus on compounds that are valuable. We do not need every compound, since many of them are not useful or not possible. In our talk, we describe how Global-Chem, an open source knowledge graph, was developed to help scientists in both academia and industry make their compounds of interest readily available to the scientific community as objects that can be accessed directly from Python.
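
To make the idea of "chemical lists as Python objects" concrete, here is a minimal sketch of a toy knowledge graph whose nodes each hold a dictionary mapping common names to SMILES strings. This is not Global-Chem's actual API: the `Node` class, its methods, and the node names (`common_chemistry`, `food`, `medical_therapeutics`) are hypothetical, and the three compounds are only illustrative entries.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """One node of a toy knowledge graph (hypothetical, not Global-Chem's API)."""

    name: str
    # Maps a common chemical name to its SMILES string.
    compounds: dict[str, str] = field(default_factory=dict)
    children: list[Node] = field(default_factory=list)

    def add_child(self, child: Node) -> Node:
        """Attach a child node and return it for convenience."""
        self.children.append(child)
        return child

    def find(self, name: str) -> Node | None:
        """Depth-first search for a node by name; None if absent."""
        if self.name == name:
            return self
        for child in self.children:
            hit = child.find(name)
            if hit is not None:
                return hit
        return None

    def all_compounds(self) -> dict[str, str]:
        """Merge this node's compounds with those of every descendant."""
        merged = dict(self.compounds)
        for child in self.children:
            merged.update(child.all_compounds())
        return merged


# Illustrative data only: a root node with two community-specific lists.
root = Node("common_chemistry")
root.add_child(Node("food", {
    "caffeine": "CN1C=NC2=C1C(=O)N(C(=O)N2C)C",
    "acetic acid": "CC(=O)O",
}))
root.add_child(Node("medical_therapeutics", {
    "aspirin": "CC(=O)Oc1ccccc1C(=O)O",
}))

food = root.find("food")
print(food.compounds["acetic acid"])
print(sorted(root.all_compounds()))
```

Keeping each list as a plain dictionary on a named node means a sub-community's compounds can be looked up directly by node name, while the tree structure lets related lists be collected together from any parent node.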


Prior Knowledge Expected

Previous knowledge expected

I am an odd blend of organic chemist, computer scientist, and philanthropist. Through education and research, I have been part of many chemistry labs, from natural products to inorganic metal frameworks to medicinal chemistry. Through my research in chemistry, and because most of my undergraduate friends were software engineers, I picked up computer science and took the classes. My time in industry has spanned a couple of high-throughput research hospitals, building life science tech startups, and real estate.

In my most recent startup, L7 Informatics, I learned to take on the roles of leader, scientist, and software engineer, integrating experiment workflows into our application and developing software to meet customers' needs. I then moved into a junior DevOps engineer role to understand large-scale infrastructure management.

I left the startup scene to revisit chemistry and computer science in the context of force fields, in the MacKerell Group within the PhD program at the University of Maryland School of Pharmacy. In my daily work, I blend different fields to expand chemical space and explore new avenues of technology, including quantum mechanics, data visualization, and cloud infrastructure.