The hardest thing about Machine Learning is NOT about training | Artificial Intelligence

The hardest thing about Machine Learning is NOT about training models or the math behind the algorithms!

In fact, that might be the easiest thing and, interestingly enough, it is the only thing that is taught in school or in textbooks.

One of the most difficult pieces is first to be able to frame a business problem as a machine learning solution. This requires strong problem-solving skills, which is often the reason why it is fashionable to hire PhDs for those roles. Often a constraint mandated by business needs will require you to dissect the mechanisms of an ML algorithm to adjust its behavior, invent something new, or implement an algorithm from a paper. In a company with more ML maturity, it is also important to consider the ML system design aspect. To me, that is one of the most interesting aspects of ML (it is also required during the FAANG interviews for ML roles), but you won’t find textbooks about it! ML is not about training a model, it is about building a solution.

Here are a few resources:
- Machine Learning Systems Design course at Stanford: https://stanford-cs329s.github.io/syllabus.html
- The ML booklet by Chip Huyen: https://huyenchip.com/machine-learning-systems-design/toc.html
- Grokking the Machine Learning Interview: https://www.educative.io/courses/grokking-the-machine-learning-interview

The second hard thing about ML is designing a good deployment strategy.
It is driven by a good ML system design, and it relates to what we call MLOps. How do you design an A/B testing pipeline? Do you use canary deployments to release your models? Do you need a feature store? Do you need to create an online training process? How do you continue to aggregate training data? How do you couple your serving pipeline to your training pipeline? How do you monitor your models, and what is the process if something goes wrong? Are your inferences fast enough? Do your data pipelines take CCPA or GDPR regulations into account? Holding everything together requires a lot of budget, headcount and skills, and believe me, that is tricky! I am aware of this course that addresses some of those issues: https://www.coursera.org/specializations/machine-learning-engineering-for-production-mlops
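
To make the canary deployment question a bit more concrete, here is a minimal Python sketch of one possible way to split traffic between a stable model and a new one. Everything in it is an assumption made for illustration: the function name `route_model`, the labels "stable" and "canary", and the 5% default are hypothetical, and a real setup would usually do this in the serving layer or load balancer rather than in application code.

```python
import hashlib


def route_model(user_id: str, canary_fraction: float = 0.05) -> str:
    """Pick which model version serves a user (hypothetical labels)."""
    if not 0.0 <= canary_fraction <= 1.0:
        raise ValueError("canary_fraction must be between 0 and 1")
    # Hash the user id so the same user always lands in the same bucket,
    # which keeps the experience consistent across requests.
    digest = hashlib.sha256(user_id.encode("utf-8")).hexdigest()
    # Map the first 32 bits of the hash to a value in [0, 1).
    bucket = int(digest[:8], 16) / 0x100000000
    return "canary" if bucket < canary_fraction else "stable"


if __name__ == "__main__":
    users = [f"user-{i}" for i in range(10_000)]
    share = sum(route_model(u) == "canary" for u in users) / len(users)
    print(f"share of users routed to the canary model: {share:.3f}")
```

The idea is to start with a small fraction, watch the monitoring metrics for the canary model, and raise the fraction only if nothing goes wrong. Hashing instead of random sampling is a design choice: it needs no stored state to keep each user on the same model.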

What remains is training a model, the easiest thing to learn about and automate! (Credit: Damien Benveniste, PhD)