My Non-Linear Path to OpenAI - (Part 1 of 2)
A Journey in Endless Curiosity - The Pre-ChatGPT Era
I did not intend for RL Diary to become a place where I document my own life. I am making an exception with this post because I believe there is real value in writing this down for someone just starting out in AI. I recently joined OpenAI as a Solutions Architect and I find it an extraordinary privilege to work alongside people who are redefining how the world works and lives.
While this milestone feels deeply personal, I attribute it entirely to the generosity of people who built open resources, shared their time, and created the conditions for me to learn. That generosity carries a responsibility: to document what I did, make it legible, and pay forward the benefits of work I was able to build upon.
In this post, I have tried to faithfully capture my own journey chronologically, from not knowing how to write a single line of code to being able to understand ML models deeply and build state-of-the-art AI systems that scale. This is going to be a long read, but if you make it to the end, I promise you’ll walk away with a story that is uniquely my own and maybe a few hacks that you can adapt to help with your own journey.
One important note - There are a few affiliate links in this post to books on Amazon and courses on Coursera. If you choose to buy them, I’ll get a small commission (though you pay nothing extra). All proceeds will go to the Great Ormond Street Hospital Charity, a national specialist children’s hospital within the NHS, UK that treats children with rare, complex, and life-threatening conditions - a cause I deeply care about.
5 years ago - One Christmas Holiday
The starting point was the Christmas holidays of 2020. A few months earlier, I had left my role as an account executive at an engineering firm to pursue an interest in writing and content creation in a newsletter format. My focus at the time was personal finance and investing, particularly algorithmic trading strategies. I was interested in designing experiments using historical market data—building trading rules, backtesting them, and evaluating outcomes. One specific newsletter article I wanted to write was on investing in IPOs and whether they are indeed profitable over the long term.
Very quickly I ran into a hard constraint. Without the ability to analyse large datasets, this was infeasible. I could not load market data, manipulate price series, or encode trading rules in a way that could be tested. The missing capability was straightforward: I could not write code. Closing that gap—learning to write code well enough to work with data—became the necessary first step. That realisation marked the beginning of everything that followed.
I picked up a copy of ‘Learn Python Programming’ by Fabrizio Romano during the holiday break. At nearly 500 pages, all it took was getting through about 50 pages per day over the Christmas break. At the end of that, I felt comfortable enough to do basic operations in Python in a Jupyter notebook.
However, I quickly realised that in order to load large CSV files containing pricing data, I needed more than just the bare bones. Pandas and NumPy are two absolutely fundamental Python libraries for data analysis. I rapidly skimmed through Mastering Pandas for Finance by Michael Heydt and Python for Data Analysis by Wes McKinney during the ensuing two-week period to get familiar with the material.
Using only what I’d learned, I then managed to write code to ingest IPO listing data and run simple backtesting experiments. I found interesting insights in the historic data and wrote an article describing them to my readers.
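To give a flavour of what that first experiment looked like, here is a minimal sketch of the kind of analysis involved. The file name and columns (offer_price, day1_close, year1_close) are hypothetical stand-ins, not my actual dataset:

```python
import pandas as pd

# Hypothetical CSV with one row per IPO: ticker, listing date,
# offer price, and closing prices on day 1 and after one year.
ipos = pd.read_csv("ipo_listings.csv", parse_dates=["listing_date"])

# First-day "pop" and one-year buy-and-hold return for each IPO.
ipos["day1_return"] = ipos["day1_close"] / ipos["offer_price"] - 1
ipos["year1_return"] = ipos["year1_close"] / ipos["offer_price"] - 1

# A simple rule to backtest: buy every IPO at the offer price,
# hold for a year, and compare against the first-day pop.
print(ipos[["day1_return", "year1_return"]].describe())
print("Share profitable after one year:", (ipos["year1_return"] > 0).mean())
```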
In hindsight, this sounds very much like I’d only climbed a tiny molehill. However, the sense of achievement I felt at that point in time, and the gratitude for having learned something completely new, can hardly be described in words. Notably, none of this involved machine learning or AI; it was a clear demonstration to me that basic programming and data manipulation alone were already powerful enough to do meaningful work.
Tip 1: For anyone learning to write code for the first time, I highly recommend picking up a book and learning from it at the beginning. A book imposes structure, reduces distractions, and provides a coherent path through unfamiliar material. For a first pass at programming, that combination matters more than optimising for the novelty that digital resources offer. Beyond that, a book also pushes you to get familiar with IDEs and to type code manually rather than copying and pasting it from the browser.
Tip 2: I highly recommend starting with a big, bold problem or a burning curiosity in mind, and then working backwards to what you actually need to learn to solve that problem. This applies to any kind of learning. This is how I learnt to swim when I was 36! I wanted to go scuba diving to see the richest corals in the world. Only problem: I didn’t know how to swim. Solution - learn to swim. These big, audacious goals provide purpose - a much bigger one than learning for its own sake.
Over the following months, I dug up research papers on financial trading strategies and tried to recreate those experiments completely from the ground up. I worked on a rather diverse array of topics that interested me, without any constraints. In one experiment, I looked at statistical arbitrage and mean-reversion trading strategies. In another, I worked on thematic allocations and rebalancing strategies.
This was during Covid-19, at a time when Coursera was made available for free to everyone. I used the opportunity to swallow vast amounts of learning content in a very short span of time. One course, an introduction to portfolio construction and analysis with Python, really helped me learn the mechanics of Jupyter Notebook and become a super user of the tool. Plotting data with Matplotlib and creating visualisations with charts added another dimension to my learning experience.
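The kind of chart work this involved looks roughly like the sketch below, assuming a hypothetical prices.csv with a date column and a close column:

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical daily price series with 'date' and 'close' columns.
prices = pd.read_csv("prices.csv", parse_dates=["date"], index_col="date")

# Plot the close price with a 50-day moving average overlaid.
prices["close"].plot(label="Close")
prices["close"].rolling(50).mean().plot(label="50-day MA")
plt.legend()
plt.title("Price with 50-day moving average")
plt.show()
```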
At the expense of sounding old, I will say this - learning to code for the first time, independently, was very different pre-ChatGPT. I would spend hours on Stack Overflow looking for solutions to problems I faced and bugs I came across. Surprisingly enough, 99% of the time someone on the internet had faced the same issue, and someone else had a solution to it. The discovery cost associated with finding a solution was quite high, and it impeded my speed of learning. But there is an argument to be made that this information-search process led to better retention. Let me know what you think!
The enticing idea of machine learning and super-intelligence
Around this time, I also became interested in machine learning. The idea of an intelligent system with intelligence and computational abilities far superior to ours - able to analyse large volumes of data, find statistical patterns and correlations, and make decisions that would give me an edge over other stock traders - was naively very attractive. To explore this, I enrolled in an introductory machine learning course on Udemy. The curriculum was almost entirely centred on scikit-learn, a library that is extremely useful for a broad range of classical ML model training needs. That course is no longer available. For those interested, I instead recommend this one from IBM called “Machine learning with Python”.
This three-week crash course gave me practical fluency. I learned to train a range of models using the standard abstractions and boilerplate provided by scikit-learn, and—just as importantly—to navigate Python documentation efficiently: reading function signatures, understanding expected inputs and outputs, and integrating unfamiliar APIs into working code.
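That boilerplate pattern - split the data, fit a model, predict, score - looks something like this. A generic sketch using a built-in dataset, not the course’s exact material:

```python
from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# The standard scikit-learn abstraction: split, fit, predict, score.
X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)
print("Accuracy:", accuracy_score(y_test, model.predict(X_test)))
```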
What it did not provide was depth. I did not yet have a strong understanding of how more advanced models worked internally, the mathematical foundations behind them, or the subtleties embedded in data processing and training pipelines. That gap became increasingly apparent. I’ll return to how I addressed it shortly, but before that, it’s worth outlining the additional coursework I completed on Coursera.
These courses are the gold standard for anyone setting out to learn machine learning, and I very highly recommend them.
The math behind it all
While it is entirely possible to become an ML practitioner by simply using open-source libraries like scikit-learn, any study of ML is incomplete without at least a basic understanding of the mathematical foundations behind some of these training algorithms. Perhaps the most important thing is to build intuition for how these models work. Take, for instance, the KNN algorithm: the mechanics can quite simply be described as “tell me who your friends are and I’ll tell you who you are”. Simple mental models like these at the beginning made it possible for me to interpret results and debug unexpected behaviour.
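Here is that intuition in code: KNN labels a point by a majority vote among its k nearest neighbours. A minimal sketch using scikit-learn’s built-in iris dataset:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# KNN classifies a point by majority vote of its 5 nearest neighbours:
# "tell me who your friends are and I'll tell you who you are".
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, y_train)
print("Test accuracy:", knn.score(X_test, y_test))
```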
During this time, I was also preparing to sit my CFA exams. CFA Levels I and II are quite quant-heavy, covering a lot of material on linear and logistic regression, time-series modelling, autoregressive models, probability, and advanced statistics. This came in very handy while trying to understand the quantitative methods behind some of these prediction systems. But the minute I pivoted to deep learning, the CFA advantage vanished.
In 2021, deep learning was gaining significant traction in the research and ML community. The CFA Level 2 curriculum provides a high-level overview of neural networks and multi-layer perceptrons. While the mathematical intuition for how these models make predictions can remain elusive, the mechanics of how they are trained are very clear. We take inputs that are vectors in a multi-dimensional space, perform a series of numerical computations (both linear and non-linear), and evaluate the output prediction against the ground truth. A loss is computed based on this evaluation, and the model parameters are adjusted using something called back-propagation to make better predictions in the future - easy enough. But this simple explanation hides an enormous amount of subtlety. How does back-propagation work? What is a loss function? Is there only one kind of loss function? How are inputs encoded?
I came across a fairly unassuming website called neuralnetworksanddeeplearning.com. If I were learning neural networks for the first time today, I would still start on this website. The author takes a challenge that is easily solved today - recognising handwritten digits from the MNIST dataset - and provides incredibly detailed, step-by-step instructions for putting together a basic deep learning algorithm. Putting the code together and training my first deep learning model was an ecstatic moment. I remember staring at my terminal window for a very long time as each evaluation run printed prediction accuracy scores and the model became better, and better, and better.
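For a flavour of the full loop described above, here is a minimal PyTorch sketch - not the website’s NumPy implementation - with random tensors standing in for MNIST images:

```python
import torch
import torch.nn as nn

# Random tensors stand in for MNIST: 28x28 = 784 inputs, 10 digit classes.
X = torch.randn(512, 784)
y = torch.randint(0, 10, (512,))

model = nn.Sequential(nn.Linear(784, 30), nn.Sigmoid(), nn.Linear(30, 10))
loss_fn = nn.CrossEntropyLoss()          # one of many possible loss functions
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

for epoch in range(10):
    logits = model(X)                    # forward pass: linear + non-linear ops
    loss = loss_fn(logits, y)            # compare predictions with ground truth
    optimizer.zero_grad()
    loss.backward()                      # back-propagation computes gradients
    optimizer.step()                     # adjust parameters to reduce the loss
    print(f"epoch {epoch}: loss {loss.item():.4f}")
```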
The one dark region in that learning material, however, was back-propagation. I couldn’t make much sense of the gradient descent algorithm at all. Intuitively, it had an appeal to it, but the maths behind it was much less clear. The limitation was down to my lack of knowledge of multivariable calculus. The last time I had done any calculus was over a decade earlier, in the first and second years of university, and my linear algebra was also a bit rusty and needed brushing up. What ensued in my life over the next two months can be aptly described as me going down the rabbit hole of trying to conquer multivariable calculus and linear algebra! While the learning process was quite intellectually satisfying, the amount of material to get through was absolutely endless.
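If the calculus feels opaque, the core of gradient descent fits in a few lines. A toy example on a made-up two-variable function, not a real loss:

```python
# Gradient descent on f(w1, w2) = (w1 - 3)**2 + (w2 + 1)**2.
# The gradient points uphill; stepping against it walks us
# towards the minimum at (3, -1).
w1, w2, lr = 0.0, 0.0, 0.1
for step in range(50):
    grad_w1 = 2 * (w1 - 3)   # partial derivative with respect to w1
    grad_w2 = 2 * (w2 + 1)   # partial derivative with respect to w2
    w1 -= lr * grad_w1
    w2 -= lr * grad_w2
print(w1, w2)  # approaches (3.0, -1.0)
```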
Tip 3: Avoid what I did and do this instead - there really are only two resources that you need.
3blue1brown, perhaps the greatest math teacher in the digital world, has published a number of videos on multivariable calculus and linear algebra. This content is a terrific joy to consume. Start today!
Sal Khan of Khan Academy fame, the second greatest remote-first math teacher in the world, has excellent material on multivariable calculus available at this link.
Nevertheless, the material I learned gave me a remarkable foundation to build upon and helped me appreciate the pros and cons of deep learning. It proved critically useful for what came next - custom training deep learning models on problems I found interesting, using a ground-up approach with PyTorch.
I’ll close this section by emphasising a practical point. A deep, formal mastery of the underlying mathematics—while undeniably valuable—is not a prerequisite for applying many well-established machine learning algorithms to real problems. What matters far more early on is a clear visual understanding of how these systems are structured, combined with sound intuition for the mathematical ideas that drive them. In practice, that level of grounding is sufficient to experiment productively, interpret results, and build useful systems.
Enter Reinforcement Learning
I then ventured into RL. How I learnt reinforcement learning deserves a fully dedicated post of its own, but I will try to distil its core essence in the next two paragraphs or so.
RL is the most refreshing machine learning technique of all - at least, I think so. It requires no special effort to build an intuition about it: the core idea is that an untrained agent is deployed in a simulated environment where it makes decisions, gets them wrong most of the time, right some of the time, and learns from it all through repetition. Fascinating, isn’t it?
There was a lot of press coverage around this time about AlphaGo, an RL agent trained by Google DeepMind that achieved world-class performance. While this was impressive on its own, what was more impressive was the commentary from some of the Go experts, who suggested that this RL agent made very unusual plays that a human wouldn’t traditionally make. While absolutely smitten with AlphaGo, I needed a different game and a different challenge to learn RL. I used to play contract bridge quite a lot at university, although not very well. So I took it upon myself to train a reinforcement learning model from the ground up that could play contract bridge. Sutton and Barto’s Reinforcement Learning: An Introduction is a timeless classic and provides the most wonderful treatment of RL. I bought a copy of the book and went through it page by page.
Tip 4: I recommend really taking your time to learn the material from Sutton and Barto. The way I learned it:
I’d read through a whole chapter first.
Then I would go back to the most important algorithms and pseudocode provided in that chapter and implement them from scratch.
Every chapter has one or more toy problems, and the pseudocode describes the algorithm that can be used to solve it. Implementing these from scratch did two things: 1) it helped me get a grassroots understanding of the various RL algorithms; and 2) possibly more importantly, the feedback loop provided immense gratification and dopamine hits that I became addicted to - I still am, please call for help! A sketch of one such exercise follows below.
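As an illustration, here is the kind of exercise I mean: the epsilon-greedy action-value method from the bandits chapter, written from the book’s pseudocode against a made-up 10-armed testbed:

```python
import random

# A 10-armed bandit with made-up Gaussian reward means.
true_means = [random.gauss(0, 1) for _ in range(10)]
Q = [0.0] * 10   # action-value estimates
N = [0] * 10     # visit counts per arm
epsilon = 0.1

for t in range(10000):
    if random.random() < epsilon:
        a = random.randrange(10)                # explore a random arm
    else:
        a = max(range(10), key=lambda i: Q[i])  # exploit the best estimate
    reward = random.gauss(true_means[a], 1)
    N[a] += 1
    Q[a] += (reward - Q[a]) / N[a]              # incremental sample average

print("best true arm:     ", max(range(10), key=lambda i: true_means[i]))
print("best estimated arm:", max(range(10), key=lambda i: Q[i]))
```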
Now, this is not for the faint-hearted. Learning the material this way took well over three months, and I was doing this full-time! But this phase of my life laid the much-needed groundwork for later doing the reinforcement learning course at Stanford, as well as for applying RL to fine-tune LLMs earlier this year. My ability to read the RL papers that now seem to pop up almost daily, understand and critique their content, and quickly apply them to problems I am working on, I attribute fully to the learning approach I adopted with RL.
A fun exercise I did while going through the content was looking at equity portfolio allocations at different stages of one’s lifetime, and training an RL model to make an optimal allocation decision that maximised risk-weighted returns. The simple Python notebook I built is here and shows how much of your assets should be invested in equities depending on how many years of working life you have left. (Core finding: the RL agent says if you are 50 or younger, you should be fully invested in equities. Don’t hold me accountable.)
A second, equally useful resource I consulted was OpenAI’s Spinning Up. There are lots of helpful links in there for the RL enthusiast. It goes to show how deeply OpenAI’s roots are entrenched in RL. At the time, I overlooked the possibility that OpenAI might just change the world forever and that I might just end up working for them. But please can I be excused for not being able to predict the future :-)
At the end of the material, I needed to go away and build what I’d set out to build at the very outset. I picked up a copy of Deep Learning and the Game of Go and tried adapting their implementation of AlphaGo to the game of contract bridge. I underestimated the search space of the possible rollout paths for a card game compared to Go. Over the following several weeks, I spent an enormous amount of time learning Monte Carlo tree search (MCTS) methods and once again went down the rabbit hole, a different one this time, of learning about and trying to apply every search-space pruning algorithm out there in my code. While I say that as if it was a mistake, I should also mention that what I learnt about MCTS came to the rescue much later, when using RL to tune LLM-driven agentic systems to locate errors in conversational traces - in simple words, it wasn’t an utter waste of time.
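To make the MCTS idea concrete, here is a compact UCT sketch on a toy game of Nim (take one to three stones; whoever takes the last stone wins). Nim is a stand-in I have chosen because bridge’s search space is far too large for a few dozen lines - which is exactly where the pruning tricks above come in:

```python
import math
import random

def moves(stones):
    return [m for m in (1, 2, 3) if m <= stones]

def rollout(stones):
    """1 if the player to move from this state wins under random play."""
    turn = 0
    while True:
        stones -= random.choice(moves(stones))
        if stones == 0:
            return 1 if turn == 0 else 0
        turn = 1 - turn

class Node:
    def __init__(self, stones, parent=None):
        self.stones, self.parent = stones, parent
        self.children = {}               # move -> child node
        self.untried = moves(stones)
        self.visits, self.wins = 0, 0.0  # wins from the just-moved player's view

def uct_search(root_stones, iterations=5000):
    root = Node(root_stones)
    for _ in range(iterations):
        node = root
        # 1) Selection: walk down the tree via the UCB1 score.
        while not node.untried and node.children:
            node = max(node.children.values(),
                       key=lambda c: c.wins / c.visits
                       + math.sqrt(2 * math.log(node.visits) / c.visits))
        # 2) Expansion: try one new move from this node.
        if node.untried:
            m = node.untried.pop()
            node.children[m] = Node(node.stones - m, parent=node)
            node = node.children[m]
        # 3) Simulation: random playout from the new state.
        if node.stones == 0:
            reward = 1.0  # the player who just moved took the last stone
        else:
            reward = 1.0 - rollout(node.stones)
        # 4) Backpropagation: flip the perspective at each level.
        while node is not None:
            node.visits += 1
            node.wins += reward
            reward = 1.0 - reward
            node = node.parent
    return max(root.children, key=lambda m: root.children[m].visits)

print(uct_search(10))  # optimal play leaves a multiple of 4: expect 2
```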
This was also the time when I learned multi-processing, asynchronous execution, and parallelisation in Python in order to simulate all four players in a game of contract bridge. I guess the takeaway is that learning something for its own sake restricts the learning path to the recommended course structure. Taking a bigger problem or a morbid curiosity and trying to solve it by learning things, while providing no single straightforward curriculum, broadens the scope of inquiry.
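A toy sketch of the parallelisation pattern, with a stand-in evaluation function in place of a real bridge rollout:

```python
import random
from multiprocessing import Pool

# Each of the four bridge seats evaluates its candidate plays in a
# separate process. The scoring here is a random stand-in, not a
# real game simulation.
def evaluate_seat(seat):
    best = max(random.random() for _ in range(100))
    return seat, best

if __name__ == "__main__":
    with Pool(processes=4) as pool:
        results = pool.map(evaluate_seat, ["North", "East", "South", "West"])
    for seat, score in results:
        print(f"{seat}: best candidate score {score:.3f}")
```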
Language as the second frontier of intelligence
While I consider numeric data the first frontier of intelligence, language is clearly the second. Natural language processing (NLP) techniques were in their infancy when my learning journey started. Sentiment analysis was considered state of the art - how quickly the world around us changes! Researchers were fascinated by vector representations of words and how these representations helped deduce patterns in language at scale - sentiments, translations, word and entity relationships, and the full shebang. As someone with a keen interest in algorithmic trading strategies, I was drawn to NLP too. The problem I took upon myself was simple: if I could quantify the change in investor sentiment at the stock ($TICKER) level by scraping conversations from a web forum, Twitter, and newspaper articles in real time, then I could build a sentiment-led momentum trading strategy. Grand! How do I do this now?
The first challenge was scraping the information and cleaning it up. Say this after me: I will become a data pre-processing specialist first, before becoming a data scientist. No data, no data science. As my learning journey would have it, I spent the following several weeks learning to acquire and scrape data from websites and data providers (today, web scraping is a full-time job at several firms). I had to get comfortable handling external REST APIs, sending POST/GET requests, and using Selenium and Playwright to simulate headless web browser windows and such. Once the data was extracted, it had to be parsed with BeautifulSoup to get the useful stuff out of all the HTML and XML tags. And when that was all done, I had to build rule-based systems to extract the stock $TICKER and manually label data to train a sentiment analyser, which finally brought me to the most exciting and the simplest part of the whole process: actually training a sentiment model. It gives me great joy to say that all of this can now be done in a little under an hour with Codex and the OpenAI Responses API. I will say it again - how quickly the world around us changes!
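A condensed sketch of the first two steps - fetching and parsing - against a placeholder URL, with a simple regex standing in for my rule-based ticker extractor:

```python
import re
import requests
from bs4 import BeautifulSoup

# Fetch a (hypothetical) forum page and strip the HTML.
url = "https://example.com/stock-forum"  # placeholder URL
html = requests.get(url, timeout=10).text

soup = BeautifulSoup(html, "html.parser")
posts = [p.get_text(" ", strip=True) for p in soup.find_all("p")]

# Pull out candidate $TICKER mentions for manual labelling.
ticker_pattern = re.compile(r"\$[A-Z]{1,5}\b")
for post in posts:
    tickers = ticker_pattern.findall(post)
    if tickers:
        print(tickers, "->", post[:80])
```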
As you are well aware, NLP, NLU, and language generation are all solved problems now. If you are setting out to learn NLP, I’d guide you towards transformer-based generative models rather than the curriculum I followed when I first entered the field. I will provide much more commentary on generative language models in the next and final article in this series. Clustering and large-scale text classification algorithms are perhaps the only classical NLP techniques still of some use - and even this is only true if you are dealing with extremely large volumes of data. For the simpler use-cases, LLMs are the only way to go.
The other stuff
The single most important Python framework to learn for any aspiring data scientist or machine learning engineer is PyTorch. There are lots and lots of useful guides that can help you learn it. My recommendation, and the learning path I took, is to learn it in the context of solving a problem (surprised yet?).
I did venture out to learn convolutional neural networks (CNNs) by training a model to tell the difference between a cat and a dog from image inputs. CNNs are a whole branch of ML science on their own, but getting a preliminary understanding, with the main intent of learning PyTorch, worked really well for me.
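A minimal sketch of the kind of architecture involved - two convolutional blocks and a linear classifier over 64x64 RGB inputs; an illustration, not the exact network I trained:

```python
import torch
import torch.nn as nn

# Two conv blocks halve the spatial size twice (64 -> 32 -> 16),
# then a linear layer maps the flattened features to two classes.
model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Linear(32 * 16 * 16, 2),  # two classes: cat, dog
)

# Shape check with a random batch of four 64x64 images.
x = torch.randn(4, 3, 64, 64)
print(model(x).shape)  # torch.Size([4, 2])
```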
If you are ever after inspiration for ML/AI challenges to work on, Kaggle is the place to look. The dataset for the Dogs vs Cats challenge is here. The Titanic dataset here is where most ML careers were born.
The underdogs of the ML world, at least according to me, are recommender systems. They are very low-fuss and yet widely used in consumer apps. They bring in lots and lots of $$$. I am not as knowledgeable about recommenders as I’d like to be, but Andrew Ng’s Coursera course that I have linked above provides an excellent treatment of them.
I went through a steep, steep learning curve in managing Python environments, using terminal commands, setting up remote servers on AWS, setting up git and doing version control, installing CUDA drivers for GPUs, and the like. For an engineer, these are essentially friction in the process, but friction that we all have to learn to overcome. I do not recommend a curriculum for these. Just learn to overcome these obstacles as and when you face them.
If this write-up felt like a eulogy for a time that has come and gone, then that is exactly how I intended it. Chronologically, this was also the time when ChatGPT was released. And to OpenAI, I owe everything for how my own life changed after November 2022! More about that in the next and final piece.


