I left AI Research to build an AI Startup. Here is what I learned.
8 important lessons every AI Engineer needs to know to become a CTO
I've been researching AI algorithms at top labs such as Meta AI Research (FAIR) and the University of Oxford, and my work has 300+ citations, including from researchers at MIT.
I've also been bringing AI to thousands of e-commerce users with Suggestr's (YC W22) product.
The gap couldn't be larger!
Here are 8 important lessons every AI Engineer needs to know to become a CTO:
No one gives a fuck how large your model is, how innovative your approach is, whether the metric space is complete, or what the cross-validation split is. Basically, everything a reviewer would reject your paper over means nothing in real life. The only question the business asks is "will it solve the problem or not?" (and if yes: "what does it take to implement?")
The easiest way to make something work is to use something you understand very well and can deploy fast. Hence, the best solutions IRL are often simple algos, hard-coded heuristics, and off-the-shelf solutions like Hugging Face models. The fewer points of failure, the better.
Pro tip: very complex algorithms that are used as a black box to produce primitives (e.g. embeddings from transformers) work very well in combination with simple logic on top of them.
The easiest models to debug are the ones that are simple and that you understand very well. If an otherwise good model fails unpredictably in 2% of cases and crashes, it can't be used. A less accurate but bullet-proof heuristic will win.
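Here is a minimal sketch of that pattern, assuming the embeddings have already been computed by some pretrained transformer (for example a Hugging Face sentence-embedding model) and treated as a black box. The toy 3-d vectors, the product names, and the `IN_STOCK` set are hypothetical; the "simple logic" is just cosine similarity plus a few plain business rules.

```python
import math

# Hypothetical, hand-written embeddings standing in for transformer output.
EMBEDDINGS = {
    "red running shoes": [0.9, 0.1, 0.0],
    "blue running shoes": [0.8, 0.2, 0.1],
    "trail sneakers": [0.7, 0.3, 0.2],
    "leather handbag": [0.0, 0.2, 0.9],
}
# Hypothetical inventory: "blue running shoes" is out of stock.
IN_STOCK = {"red running shoes", "trail sneakers", "leather handbag"}


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def similar_products(query, k=2, min_score=0.5):
    """Black-box embeddings in, simple rules on top, ranked list out."""
    q = EMBEDDINGS[query]
    scored = [
        (name, cosine(q, vec))
        for name, vec in EMBEDDINGS.items()
        # Rule: never recommend the item itself or anything out of stock.
        if name != query and name in IN_STOCK
    ]
    # Rule: drop weak matches instead of padding the list with noise.
    scored = [(name, s) for name, s in scored if s >= min_score]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [name for name, _ in scored[:k]]


print(similar_products("blue running shoes"))
```

Every rule on top is a readable line you can explain to anyone, which is exactly what makes the combination easy to debug.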
There is no such thing as a controlled environment! Everything should work for any user at any time of the day. Forget about hyperparameters that have to be tuned for one specific setup, or narrow knowledge domains. The world is unlimited, and you should be prepared for that.
Pro tip: we did everything we could to move away from absolute values toward sorting, ranking, and percentiles, because these are the things that let you sleep tight at night.
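A minimal sketch of why this helps, assuming raw model scores arrive in batches: converting them to percentile ranks within the batch makes any downstream threshold independent of the score's scale, so a recalibrated or drifting model doesn't silently break a hard-coded cutoff. The function name and the "top 20%" rule are illustrative, not Suggestr's actual code.

```python
from bisect import bisect_left, bisect_right


def percentile_ranks(scores):
    """Map each raw score to its percentile rank (0-100) within the batch.

    Ties get the midpoint of their positions: (count below + 0.5 * count equal) / n.
    """
    ordered = sorted(scores)
    n = len(ordered)
    ranks = []
    for s in scores:
        below = bisect_left(ordered, s)
        equal = bisect_right(ordered, s) - below
        ranks.append(100.0 * (below + 0.5 * equal) / n)
    return ranks


def top_fraction(scores, min_percentile=80.0):
    """Keep items in the top 20% of the batch, whatever the raw scale is."""
    return [s for s, p in zip(scores, percentile_ranks(scores)) if p >= min_percentile]


raw = [10, 20, 30, 40]
rescaled = [s * 100 for s in raw]  # e.g. the model was retrained and now outputs bigger numbers
print(percentile_ranks(raw), percentile_ranks(rescaled))  # identical ranks
```

A rule like "score > 35" would have to be retuned after the rescale; the percentile rule keeps working unchanged.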
You should know the problem your user is solving better than you know your solution! Only by going deep into the domain and solving the problem from first principles will you be able to produce the best solution, account for all the edge cases, and see the big picture.
Throw away all the variables and parameters you don't understand. Here, less is more. For example, we recreated Collaborative Filtering from scratch and found the variations that worked best for our use case. It's the difference between understanding the code and merely knowing what it does. You should be able to explain your algorithm step-by-step without formulas.
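As an illustration of an algorithm you can explain step-by-step without formulas, here is a minimal item-item collaborative filtering sketch based on plain co-occurrence counts ("people who bought X also bought Y"). This is one simple variant chosen for clarity, not necessarily the one Suggestr ended up with; the basket data is hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations


def build_cooccurrence(baskets):
    """Step 1: for every pair of items bought together, count how often it happens."""
    co = defaultdict(Counter)
    for basket in baskets:
        # set() ignores duplicates inside one basket; sorted() makes pairs deterministic.
        for a, b in combinations(sorted(set(basket)), 2):
            co[a][b] += 1
            co[b][a] += 1
    return co


def recommend(co, item, k=3):
    """Step 2: recommend the items most often bought together with `item`."""
    return [other for other, _ in co.get(item, Counter()).most_common(k)]


# Hypothetical purchase history.
baskets = [
    ["shoes", "socks"],
    ["shoes", "socks", "laces"],
    ["shoes", "laces"],
    ["socks", "hat"],
    ["shoes", "socks"],
]
co = build_cooccurrence(baskets)
print(recommend(co, "shoes"))  # socks (3 shared baskets), then laces (2)
```

There is nothing here you can't explain to a non-technical founder, and every variation (normalising by popularity, time decay, filtering rules) is a small, understandable change to one of the two steps.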
Everything has its price: you can't research endlessly for the sake of research. R&D costs time and money, and in a startup you always lack both. The last 10% of accuracy takes 90% of the time, so concentrate on the first 10% of effort and focus on the impact/effort ratio rather than on accuracy or AUC. Use the elbow method to find the point where additional effort stops paying off.
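One simple way to apply the elbow method, sketched under assumptions: you track accuracy against effort spent (here, hypothetical weeks of work), normalise both axes, and pick the point farthest from the straight line joining the first and last measurements. The numbers below are made up for illustration.

```python
def elbow_index(efforts, scores):
    """Index of the point farthest from the chord joining the first and last points."""

    def scale(vals):
        # Normalise to [0, 1] so units (weeks vs. accuracy) don't skew the distances.
        lo, hi = min(vals), max(vals)
        return [(v - lo) / (hi - lo) if hi != lo else 0.0 for v in vals]

    xs, ys = scale(efforts), scale(scores)
    x0, y0, x1, y1 = xs[0], ys[0], xs[-1], ys[-1]
    dx, dy = x1 - x0, y1 - y0
    norm = (dx * dx + dy * dy) ** 0.5 or 1.0
    best_i, best_d = 0, -1.0
    for i, (x, y) in enumerate(zip(xs, ys)):
        # Perpendicular distance from (x, y) to the chord, via the 2-D cross product.
        d = abs(dy * (x - x0) - dx * (y - y0)) / norm
        if d > best_d:
            best_i, best_d = i, d
    return best_i


# Hypothetical effort/accuracy log.
weeks = [1, 2, 3, 4, 5, 6, 7, 8]
accuracy = [0.60, 0.75, 0.82, 0.85, 0.86, 0.865, 0.87, 0.872]
i = elbow_index(weeks, accuracy)
print(f"stop around week {weeks[i]} (accuracy {accuracy[i]})")
```

Past the elbow, each extra week buys a fraction of a point of accuracy, which is the 90%-of-the-time-for-the-last-10% trap in numbers.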
AI doesn't exist in a vacuum. To train the model you need data, to store the data you need storage, and to run inference you need compute, infrastructure, and backend logic. You can't just add a PyTorch model when your first one is in TensorFlow, because the dependencies will brutally kill you.
You can't use just any data, you can't take all the resources of the server, and you can't afford crashes, because they will take down the whole frontend (ideally, you don't build it like that). You should think about your neighbors, because a hole in the wall is a problem for both of you.
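A minimal sketch of keeping a model failure from reaching its neighbors, assuming the model is called through some function that may raise or return garbage: wrap it, log the error, and fall back to a bullet-proof heuristic. `safe_recommend` and the hard-coded `BESTSELLERS` list are hypothetical names for illustration, not a specific library's API.

```python
import logging

logger = logging.getLogger(__name__)

# Hypothetical fallback heuristic: a hard-coded list of best-selling items.
BESTSELLERS = ["socks", "shoes", "hat"]


def safe_recommend(model_fn, user_id, k=3):
    """Call the model, but never let its failure propagate to the caller (e.g. the frontend).

    Falls back to the heuristic if the model raises or returns nothing usable.
    """
    try:
        recs = model_fn(user_id)
        if recs:  # treat None or an empty result as a failure too
            return list(recs)[:k]
    except Exception:
        # Log with traceback so the failure is visible, but keep serving users.
        logger.exception("model failed for user %s, serving fallback", user_id)
    return BESTSELLERS[:k]


def broken_model(user_id):
    raise RuntimeError("out of memory")  # simulated crash


print(safe_recommend(broken_model, user_id=42))  # bestsellers instead of a 500 error
```

The fallback is less accurate than the model, but the page still renders, and the log tells you what to fix in the morning.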
AI doesn't exist at a single moment in time. It runs far more often than just during training and test runs. If it fails to run one night, you will wake up and rush to your laptop. Production algorithms need maintenance, and the less maintenance your model needs, the better.
In conclusion: a business doesn't value the Novelty, Coolness, or even Beauty of your algorithm. What it values is the Simplicity, Transparency, and Effectiveness of your solution, minus the Cost of deployment and maintenance.
The same, by the way, holds true for engineers 😉
Subscribe to have 100% uptime for your models and share with friends to be able to explain transformers without formulas.
You can reply and tag me on Twitter, the only platform where I engage.