Programming is only half the battle

Posted on April 19, 2021 by perez.ivan.e

Documentation always seems to get pushed to the side while innovation and achievements are promoted and venerated. However, the usefulness of your innovation hinges on how well you communicate your idea. Developing crobat, I find that nobody can contribute anything because there is no documentation; conversely, I can use other people’s APIs with some success because of their documentation. I created some documentation for crobat using LaTeX, but found https://swagger.io to be a very useful tool.

While innocuous, this has started me down the path of learning how to keep track of changes and further organize projects so that others can readily contribute.

If you would like to see the current state, check out the GitHub page for the manual and the current manual.pdf. Here is a quick screenshot of the function documentation that I made!

Originally published February 5, 2021

Half-baked: LSE Project Proposals

Posted on April 19, 2021 by perez.ivan.e

These past two months I’ve found out exactly how hard it is to write a project proposal in Math and Stats. My ideas focused on detection schemes similar to my thesis, but had elements of market microstructure. I hope you take a look!

LSE Statistics Proposal:

It’s about detecting adverse selection events, and learning about how the order book changes when a private signal makes it to informed traders.

LSE Mathematics Proposal:

It’s also about detecting adverse selection events, but it emphasizes the use of queuing theory to learn the mathematics behind the thresholds and constraints faced by investors when they cause these changes.

Originally published January 19, 2021

Pricing Model Calibration Through Stochastic Optimization – Stochastic Optimization Project

Posted on April 19, 2021 by perez.ivan.e

Hi, this is very overdue, but I thought I should upload my old stochastic optimization project. I did it as the final project for the course Stochastic Optimization by Computer Simulation (MATH 795-61) at the Graduate Center. The course (website) was taught by Prof. Felisa Vasquez-Abad, and it gave a statistical footing for the underpinnings of machine learning.

The project was about calibrating the parameters of the Heston stochastic volatility model to a time series of closing option prices. The work uses stochastic optimization to minimize the MSE between the observed option prices and the simulated option prices.
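As a rough illustration of the objective described above, here is a minimal sketch, not the project's actual code. It assumes a full-truncation Euler discretization of Heston, a European call payoff, and hypothetical names (heston_call_price, mse); the parameter tuple (kappa, theta, xi, rho, v0) and the Monte Carlo settings are illustrative choices. The optimizer itself is left out, since the post does not say which stochastic optimization scheme was used.

```python
import math
import random

def heston_call_price(params, s0, strike, rate, maturity,
                      n_paths=2000, n_steps=50, seed=0):
    """Monte Carlo price of a European call under Heston dynamics.

    Assumption: full-truncation Euler scheme; params = (kappa, theta, xi, rho, v0).
    """
    kappa, theta, xi, rho, v0 = params
    rng = random.Random(seed)  # fixed seed: same random draws for every parameter set
    dt = maturity / n_steps
    sqrt_dt = math.sqrt(dt)
    payoff_sum = 0.0
    for _ in range(n_paths):
        log_s, v = math.log(s0), v0
        for _ in range(n_steps):
            z1 = rng.gauss(0.0, 1.0)
            # Correlate the variance shock with the price shock.
            z2 = rho * z1 + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
            v_pos = max(v, 0.0)  # truncate negative variance
            log_s += (rate - 0.5 * v_pos) * dt + math.sqrt(v_pos) * sqrt_dt * z1
            v += kappa * (theta - v_pos) * dt + xi * math.sqrt(v_pos) * sqrt_dt * z2
        payoff_sum += max(math.exp(log_s) - strike, 0.0)
    return math.exp(-rate * maturity) * payoff_sum / n_paths

def mse(params, observations, s0, rate, **mc):
    """Mean squared error between observed and simulated option prices.

    observations is a list of (strike, maturity, observed_price) tuples.
    """
    errors = [
        (heston_call_price(params, s0, strike, rate, maturity, **mc) - price) ** 2
        for strike, maturity, price in observations
    ]
    return sum(errors) / len(errors)
```

A stochastic optimizer would then repeatedly evaluate mse at candidate parameter sets and move toward lower values. Reusing the same seed across evaluations keeps the simulation noise from swamping small differences between nearby parameter sets.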

I should make a demo on github but the project paper is great on its own.

github link

Originally published: September 25, 2020

launched crobat!

Posted on April 19, 2021 by perez.ivan.e

Hi, tonight I launched my surprise summer project, the Cryptocurrency Order Book Analysis Tool, aka crobat.

It’s pretty cool. It can give you a rough snapshot of the order book and, better yet, it can write down the events that occur in the Level 2 order book for Coinbase exchanges. This will be convenient for longitudinal studies of market microstructure, since it’s just taking off.
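To make the idea concrete, here is a minimal sketch of keeping a Level 2 book and logging events from a stream of updates. This is not crobat's API: the function names are hypothetical, and the update shape (side, price, size), where size is the new total at that price level and zero removes the level, is an assumption modelled on how Level 2 feeds are commonly delivered.

```python
from decimal import Decimal

def apply_l2_changes(book, changes):
    """Apply Level 2 updates to the book and return the resulting events.

    book    -- {"buy": {price: size}, "sell": {price: size}}
    changes -- iterable of (side, price, size) with price and size as strings
    """
    events = []
    for side, price, size in changes:
        price, size = Decimal(price), Decimal(size)
        previous = book[side].get(price, Decimal("0"))
        if size == 0:
            book[side].pop(price, None)  # level fully consumed or cancelled
        else:
            book[side][price] = size
        # A positive delta is added liquidity, a negative delta is removed liquidity.
        events.append({"side": side, "price": price, "delta": size - previous})
    return events

def best_bid_ask(book):
    """Return (best bid, best ask), using None for an empty side."""
    bid = max(book["buy"]) if book["buy"] else None
    ask = min(book["sell"]) if book["sell"] else None
    return bid, ask
```

Storing the signed change at each price level, rather than only the latest snapshot, is what makes the recorded stream usable for event-level studies later on.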

Follow the github link to learn more about it!

Originally published: September 24, 2020

A Study of CUSUM Statistics on Bitcoin Transactions

Posted on April 19, 2021 by perez.ivan.e

Hello all,

I’m putting out the current draft of my thesis beamer just in case anyone wants to check it out. I’m still working on a useful application and on incorporating CUSUM statistics for spreads, but this work has some cool insights.

Ivan

Abstract:

In this thesis, our objective is to study the relationship between transaction price and volume in the BTC/USD Coinbase exchange. In the second chapter, we develop a consecutive CUSUM algorithm to detect instantaneous changes in the arrival rate of market orders. We begin by estimating a baseline rate using the assumption of a local time-homogeneous compound Poisson process. Our observations lead us to reject the plausibility of a time-homogeneous compound Poisson model on a more global scale by using a chi-squared test. We thus proceed to use CUSUM-based alarms to detect consecutive upward and downward changes in the arrival rate of market orders. In the third chapter we identify active periods from the number of consecutive upward CUSUM alarms, leading to the classification of active versus inactive periods. Finally we use One-Way ANOVA to assess the level effect on price swings for periods classified as containing at least two or three consecutive CUSUM up alarms. We show that in these active periods, price swings are significantly larger.
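For readers who have not met CUSUM before, here is a minimal sketch of a CUSUM alarm for a change in a Poisson arrival rate. It is a textbook log-likelihood-ratio form with a restart after each alarm, not the thesis's exact statistic; the function name, the per-interval counts, and the threshold are all illustrative assumptions.

```python
import math

def cusum_alarms(counts, base_rate, alt_rate, threshold):
    """Return the indices at which the CUSUM statistic crosses the threshold.

    counts    -- number of market orders observed in each time interval
    base_rate -- expected count per interval under the baseline Poisson model
    alt_rate  -- expected count per interval after the change
                 (above base_rate for up alarms, below it for down alarms)
    """
    log_ratio = math.log(alt_rate / base_rate)
    drift = alt_rate - base_rate
    stat, alarms = 0.0, []
    for i, x in enumerate(counts):
        # Poisson log-likelihood ratio of one interval, accumulated and floored at 0.
        stat = max(0.0, stat + x * log_ratio - drift)
        if stat >= threshold:
            alarms.append(i)
            stat = 0.0  # restart so that consecutive alarms can be counted
    return alarms
```

For example, with a baseline of 2 orders per interval, an alternative of 4, and a threshold of 3, the counts [2, 2, 2, 2, 6, 6, 6, 6] raise alarms at indices 5 and 7. Runs of such consecutive up alarms are the kind of signal the abstract uses to label a period as active.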

Link: A Study of CUSUM Statistics on Bitcoin Transactions

Originally published: July 23, 2020

Introduction to Markov Processes – Foundations of Stochastic Finance

Posted on April 19, 2021 by perez.ivan.e

In this blog post I talk a little about Markov Processes as a foundation for the traditional BSM-OPM derivation. Since the RMarkdown API is deprecated, I can’t post the RMarkdown notebook directly on WordPress. I kindly ask you to check it out on github.

.pdf version:

Markov Processes – Foundations of Stochastic Finance

Originally published January 17, 2019

Process Route of Simeprevir(Olysio®)

Posted on April 19, 2021 by perez.ivan.e

Today I will be covering the process route of Simeprevir, a leading small molecule antiviral for Hepatitis C. Developed with Janssen Pharmaceuticals Inc., Rosenquist’s paper highlights many things about SAR and computer-aided drug design, and makes a great case study for medicinal chemistry students. One thing not overly emphasized in the paper was how they moved from a lab-scale lead optimization route to a process development route. They make short references to how their choices of substitute reagents made things safer in the pilot plant. Given my love for scale-up, I think we should talk a little more about the clever process techniques they employed. The route is broken up into 3 Schemes, and the lead optimization route will run concurrently with the process development route to highlight differences.

Before we start, let’s contextualize the lead optimization route. The methodology behind medicinal chemistry synthesis is analogous to combinatorial synthesis or large panel screening synthesis techniques. The reactions have to be ubiquitous, work across a wide range of substrates, and lend themselves to facile or minimal purification. The takeaway is that in order to do this synthesis we need peptide coupling, a robust method to asymmetrically get to the trans-cyclopentanone dicarboxylic acid, and a simple way to selectively peptide couple each carbonyl.

Scheme 1: Preparation of The Bicyclic Lactone Acid

The Lead Optimization Route:

The route starts with a 5-step process to get to the bicyclic lactone acid. They employ unscalable reactions such as a Diels-Alder. Reagents include sodium borohydride (pyrophoric when finely divided). They employ a clever asymmetric ester hydrolysis using Pig Liver Esterase (PLE). We must remember that PLE isn’t a cost-effective method for process synthesis, and would require an investment in a pilot-scale bioreactor and the somewhat rare pilot-scale biosynthesis production chemist.

The Process Development Route:

The synthesis begins from an unresolved mixture of cyclopentanone-3,4-dicarboxylic acid, cutting out ~3 steps from the Lead Optimization Route. They improve on the synthesis by performing a Raney Nickel hydrogenation, suspending the product as the triethylamine salt in water, and directly lactonizing, all in one pot. They perform a kinetic resolution using cinchonidine, and report that the solid has a shelf life of at least 3 years.

Scheme 2: Preparation of The Olefin Metathesis Partners

The Lead Optimization Route:

From the bicyclic lactone, the synthesis continues with a HATU/DIPEA peptide coupling off the free acid. Keeping with the straightforward and simple synthesis, they perform a LiOH/water hydrolysis to open up the lactone. With the second acid free, they peptide couple to install the cyclopropyl. After setting up the metathesis partners, they finish the synthesis by using a Mitsunobu to install the aryl substituent.

The Process Development Route:

The process route takes a similar approach. They opt to use 2-Ethoxy-1-ethoxycarbonyl-1,2-dihydroquinoline (EEDQ) and N-Methyl Morpholine (NMM) to peptide couple the n-methyl hexene, citing its relative safety compared to HATU. Next, they get clever with the preparation for ring opening. The lead optimization route ties up the free carboxylic acid with a peptide coupling. Remembering triphenylphosphine’s oxophilicity, you can glean that the process route needed to tie up the free hydroxyl of the acid, thus prompting the careful methanolysis of the bicyclic lactone acid. The aryl moiety is installed using the Mitsunobu again, and with a one-pot hydrolysis of the ester and a peptide coupling we get to the identical olefin metathesis setup.

Since it’s not obvious, let’s talk about the pot economy and purification methods for the two routes.

In the lead optimization route, they peptide couple and then hydrolyze in one pot, acid/base extract, peptide couple in a second flask, acid/base extract again, and run the Mitsunobu in a third flask.

The process route peptide couples in refluxing THF and acid/base extracts out the cinchonidine. Using the organic THF layer, they directly subject it to the methanolysis (note these extractions can be done in the same reaction flask if they use a jacketed flask). They drain the aqueous layer and extract in toluene. The close-to-dry toluene layer goes into a fresh reactor (because DIAD decomposes in water) to perform the Mitsunobu, and the product is collected in crystalline form. They hydrolyze the methyl ester using LiOH in the same reactor, drain out the aqueous layer, and peptide couple using EEDQ. They likely use Boc anhydride in excess after acid/base extracting out the used EEDQ. Here they must purify the Boc olefin and dilute to 0.05 M to perform the infinite dilution approach in Scheme 3.

Scheme 3: Ring Closing Metathesis (RCM) and getting to Simeprevir

Lead Optimization Route:

The RCM completes under classic conditions; everyone knows that substantial Schlenk technique is necessary to carry out a Grubbs-Hoveyda gen. 1 RCM. They hydrolyze the ethyl ester off the cyclopropyl group to afford an acid ready for peptide coupling. To activate the carbonyl, they cyclize into the oxazolinone, then add the cyclopropyl sulfonamide to reopen the ring, affording Simeprevir.

Process Development Route:

Given the difficulty of getting an RCM to work on the process scale, I direct you to a mini-review called Olefin metathesis on the process scale, which mentions the process route to Simeprevir. The Simeprevir paper alludes to “SHD techniques”, an “M1 catalyst”, and the unexplained Boc protection/deprotections of the RCM partners/products.

To get the RCM to work they employed the Ziegler Infinite Dilution. The method hinges on the lemma of competing conversion rates between the RCM product and oligomerization. Ziegler et al. show that if the RCM partners are added to a solution of catalyst at a rate identical to their conversion to the RCM product, there will be an infinitely dilute concentration of the starting material, and thus oligomerization will be disfavored.
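The kinetic argument can be put into numbers with a toy model. This is a minimal sketch under stated assumptions: ring closing is treated as first order in the diene (rate k_ring * c), oligomerization as second order (rate k_oligo * c**2), and the rate constants below are made up for illustration rather than taken from the Simeprevir work.

```python
import math

def cyclized_fraction(conc, k_ring=1e-2, k_oligo=1.0):
    """Instantaneous fraction of diene consumed by ring closing at concentration conc (M)."""
    return k_ring / (k_ring + k_oligo * conc)

def steady_state_conc(feed_rate, k_ring=1e-2, k_oligo=1.0):
    """Diene concentration (M) when slow addition balances consumption.

    Solves feed_rate = k_ring * c + k_oligo * c**2 for the positive root.
    """
    disc = k_ring ** 2 + 4.0 * k_oligo * feed_rate
    return (-k_ring + math.sqrt(disc)) / (2.0 * k_oligo)

# Everything charged at once at 0.05 M versus a slow feed of 1e-6 M/s.
batch = cyclized_fraction(0.05)
slow_feed = cyclized_fraction(steady_state_conc(1e-6))
```

With these made-up constants, only about a sixth of the diene ring closes when everything is charged at 0.05 M, while under the slow feed the standing concentration stays so low that nearly all of it does. That is the whole point of feeding at the rate of conversion.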

Higman’s review, in its paragraph on Simeprevir, clears up that SHD techniques meant that the unexplained Boc protection served to increase the concentration at which the RCM partners could be fed into the M1 Catalyst (a specialized ligand for GH RCM) solution. After the RCM, they hydrolyze using NaOH and EtOH, isolating and then activating the free carboxylic acid with EDCI in a new flask, likely affording the spiro-oxazolidinone intermediate. They then use DBU to couple the cyclopropylsulfonamide. Simeprevir was collected using a controlled crystallization.

Reflections: What can you take away from this synthesis?

What can we learn from this synthesis? Process chemists face different issues than traditional synthetic chemists. I think Janssen’s approach embodied the principles of using safer reagents, inventing cheaper methods of enantiomer resolution, and cleverly addressing the longstanding problem of unscalable olefin metathesis. After Simeprevir, many other drugs, including Paritaprevir (AbbVie) and Telaprevir (Vertex), have come to market using similar process syntheses. I hope to cover them in the next few weeks.

Originally published December 12, 2018

Intuitive Probabilistic Derivation of Black-Scholes

Posted on April 19, 2021 by perez.ivan.e

In this blog post I start the series “Every way to derive BSM-OPM”. Here I demonstrate the easiest, most intuitive way. Since the RMarkdown API is deprecated, I can’t post the RMarkdown notebook directly on WordPress. I kindly ask you to check it out on github. I basically cover Alexei Krouglov’s derivation.

.pdf version:

Intuitive Probabilistic Derivation of Black Scholes – Option Pricing Formula
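For orientation, the end point of any such derivation is the familiar closed form for a European call, which a probabilistic argument reaches as the discounted expected payoff of a lognormal terminal price under the risk-neutral measure. The sketch below only evaluates that standard formula; it does not reproduce the derivation in the notebook, and the function names are illustrative.

```python
import math

def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def bs_call(spot, strike, rate, vol, maturity):
    """Black-Scholes price of a European call on a non-dividend-paying asset."""
    d1 = (math.log(spot / strike) + (rate + 0.5 * vol ** 2) * maturity) / (vol * math.sqrt(maturity))
    d2 = d1 - vol * math.sqrt(maturity)
    # spot * N(d1) is the discounted expected stock value received on exercise;
    # strike * exp(-rT) * N(d2) is the discounted expected strike payment.
    return spot * norm_cdf(d1) - strike * math.exp(-rate * maturity) * norm_cdf(d2)
```

For example, bs_call(100, 100, 0.05, 0.2, 1.0) comes out at roughly 10.45.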

Originally published: November 20, 2018

(-)-Aflatoxin B1 (Trost 2003)

Posted on April 19, 2021 by perez.ivan.e

Trost’s synthesis of Aflatoxin B₁ might be a generic synthesis showcasing the Tsuji-Trost Asymmetric Allylic Alkylation (AAA) for a facile prep of the furobenzofuran core. It was a response to the known routes that used enzymatic kinetic resolution to afford the product. The paper goes on to showcase Dynamic Kinetic Asymmetric Transformation (DYKAT).

The Synthesis:

The synthesis starts with a Pechmann condensation. The 1967 synthesis also has this step. The coumarin is iodinated to make the handle for the Tsuji-Trost/Heck. The Tsuji-Trost Asymmetric Allylic Alkylation sets the stereocenter, thus inducing asymmetry for the rest of the synthesis. While not shown in the Chemistry By Design synthesis, there was supposed to be an intramolecular Heck coupling, which completes the prep of the furobenzofuran ring and sets the second stereocenter. The synthesis proceeds to form the cyclopentenone ring by performing a Friedel-Crafts off the aromatic coumarin scaffold. I questioned here why Trost did a DIBAL-H reduction to afford the vinyl ether instead of maybe a hydroboration and elimination. This paper talks about accessing different Aflatoxins, which differ based on chirality and substitution on that hemiketal stereocenter. After reduction, the hemiketal is acetylated with acetic anhydride and eliminated to afford (-)-Aflatoxin B₁.

Key Knowledge:

The big addition was taking a racemic γ-tert-butoxycarbonyl-2-butenolide and being able to alkylate asymmetrically. Trost talks about how there was previous work on Kinetic Asymmetric Transformation (KAT), where a chiral palladium complex would form different products depending on the chirality of the substrate. Naturally this facilitates separation of the junk product but leaves much to be desired, with a cap at 50% yield. Trost shows a DYKAT where the activated palladium interconverts from the “mis-matched” to the “matched” coordination and likely is caught in a transition state “valley” until the substrate is in this matched state.

Taking a look at the arguments presented for the facial interconversion mechanism, I was not impressed that they didn’t get definitive proof for their conjecture. They proposed a ligand displacement hypothesis and a furan aromatization. Their proof was that the yield decreased with increased palladium, which meant that the sigma complex was the favored one.

I think a better proof would have been to lock out migration by probing the change in ee with the binding of a Lewis acid, or attempting to run the reaction in the presence of a Mukaiyama silyl enol ether. Other options include having a chiral alkyl group for the γ-acyloxybutenolide and seeing how the enantiomeric excess changed. But I think this may not have added much to the already deep knowledge of the Tsuji-Trost reaction umbrella.

Reflections:

Thinking about the disconnects and my lack of instinct when it came to this synthesis, brings to light that I don’t have a flair for syntheses that hinge on transition metal chemistry outside the classic model systems. This synthesis reminds me that you can guess what kind of chemistry to expect, based on the difficult disconnects, the author, and the history of the class of compounds.

Originally published: November 12, 2018

The Quickest High

Posted on April 19, 2021 by perez.ivan.e

Looking through my old synthesis diary, I played a game to see how fast a route to THC I could make. I was quickly disappointed when the best I could reasonably do was 9 or 10 steps (6 after revising and reading different syntheses: Trost’s, Evans’s, Pinnick’s).

Alpha Bromination of Orange Flavor Ether using NBS and UV light
MOM Protection of Olivetol
In-situ Grignard reagent generation + Palladated Olivetol to do a fancy Kumada Coupling.
Total MOM Deprotection
Sodium ethyl sulfate Demethylation
ZnBr/MgSO₄ mediated S_N

The world record:

It simply takes (+)-p-mentha-2,8-dien-1-ol and olivetol in 1% BF₃·OEt₂ and anhydrous MgSO₄ in DCM at 273 K.

Simply put, Razdan does a retro-Friedel-Crafts using BF₃ etherate as the Lewis acid, then uses acidic conditions to drive the recombination forward. His paper’s goal was to elucidate the mechanism (mind you, this was 1974), and it had little stereocontrol.

These syntheses show how much your choice of starting materials and your ability to take advantage of simple reaction kinetics can improve the efficiency of your chemistry. Switching to Orange Flavor Ether from Terpineol saved me a step from my original route. And using olivetol instead of olivetolic acid saved me from doing a Krapcho decarboxylation. Razdan’s synthesis was fast because of his choice of a relatively obscure monoterpene, “(+)-p-mentha-2,8-dien-1-ol”. The terpene had an allyl cation equivalent eager to go through the retro-Friedel-Crafts. He shows that all the coupling prep that I did was somewhat unnecessary. So for your next synthesis, remember to choose your starting material wisely.

Originally published: November 8, 2018

Almost Surely

Statistics and Programming by Ivan

Monthly Archives: April 2021
