Machine Learning Operations (MLOps): what is it?
MLOps is the discipline within Machine Learning and DevOps that is focused on getting Machine Learning and Data Science models into production and implementing observability on them. Simply stated, ML Engineers and Data Scientists need to be able to deploy their own models, reduce the time to production for their models and see what is happening on those models through logging, alarms and infrastructure metrics.
Why does this sound easier said than done?
Well, the bulk of MLOps work typically requires skills that most Machine Learning Engineers don’t have. It’s work that lies more strongly in the domain of Data Engineers and DevOps Engineers. The barriers are not only on the skill level, but also on the operational and management level. A Machine Learning Engineer might have the skill to deploy their models to production, but if they don’t have the access rights or the operational guardrails (logging, monitoring, alarms, rollbacks), the skills don’t matter.
Why is this important?
In some cases, as many as 90% of ML models don’t even make it into production. In our own experience and by our estimates, the figure is most likely closer to 30%, but that is still a massive time and money sink. This matters not only from a budget perspective: ML Engineers and Data Scientists consistently rank the deployment and consumption of their models as a top metric of work satisfaction.
What can you do?
We have to realise that ML Engineers have a very specific skillset and so do Data Engineers and DevOps Engineers. Strong DevOps Engineers will create the magic that allows your ML Engineers to deploy frequently without fear of breaking things. Strong Data Engineers will set up the best ETL or self-service ETL for ML Engineers and Data Scientists. Management’s job is to break down the barriers between all of these parties.
Let's get specific on solutions for some of the above.
Model Deployment & CI/CD:
Our favourite solution stack for AI Deployment & CI/CD is:
GitHub Actions
Airflow
Modelbit
AWS SageMaker and GCP Vertex AI are also great end-to-end tools, but they can become quite expensive.
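To make the "deploy frequently without fear of breaking things" idea concrete, here is a minimal sketch of a promotion gate with rollback, the kind of check a CI/CD job might run before releasing a model. Everything in it is hypothetical and for illustration only: the `Registry` and `ModelVersion` classes, the `accuracy` metric and the model names are assumptions, not part of GitHub Actions, Airflow or Modelbit.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ModelVersion:
    name: str
    accuracy: float  # hypothetical validation metric computed by the CI job


@dataclass
class Registry:
    """Toy in-memory registry; a real one would live in your deployment tool."""
    history: List[ModelVersion] = field(default_factory=list)

    @property
    def live(self) -> Optional[ModelVersion]:
        # The most recently promoted version is the one serving traffic.
        return self.history[-1] if self.history else None

    def promote(self, candidate: ModelVersion, min_gain: float = 0.0) -> bool:
        """Deploy the candidate only if it is at least min_gain better than the live model."""
        current = self.live
        if current is not None and candidate.accuracy < current.accuracy + min_gain:
            return False
        self.history.append(candidate)
        return True

    def rollback(self) -> Optional[ModelVersion]:
        """Drop the live version and fall back to the previous one, if there is one."""
        if len(self.history) > 1:
            self.history.pop()
        return self.live


registry = Registry()
registry.promote(ModelVersion("churn-v1", accuracy=0.81))
registry.promote(ModelVersion("churn-v2", accuracy=0.79))  # rejected: worse than v1
registry.promote(ModelVersion("churn-v3", accuracy=0.84))  # accepted
print(registry.live.name)  # churn-v3
registry.rollback()
print(registry.live.name)  # churn-v1
```

In practice the gate would compare metrics produced by your own evaluation step, and the rollback would call whatever your deployment platform exposes; the point is only that both steps are automated rather than left to the person deploying.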
Self-Service ETL:
dbt is a great tool to enable ML Engineers to get their own data at the right time. Databricks would be the Ferrari version, at a higher price point. It would be remiss of us not to mention the magic that is a proper Semantic Layer to enable aggregated insights for ML Engineers and Data Scientists.
AI Observability:
Grafana (which has a very generous free cloud tier) and Google Cloud Monitoring are our go-to tools for logging, observability and alarms.
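As a rough illustration of what "logging and alarms" on a model can mean, here is a small standard-library sketch that logs each prediction's latency and raises a warning when the rolling average crosses a threshold. The `LatencyMonitor` class, the `predict` stand-in and the 200 ms threshold are all assumptions made up for this example. In a real setup the log lines would be shipped to a tool such as Grafana or Google Cloud Monitoring, which would own the alerting.

```python
import logging
import time
from collections import deque
from statistics import mean

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(message)s")
log = logging.getLogger("model-observability")


class LatencyMonitor:
    """Hypothetical helper: tracks a rolling window of prediction latencies."""

    def __init__(self, threshold_ms: float, window: int = 50):
        self.threshold_ms = threshold_ms
        self.samples = deque(maxlen=window)  # oldest samples fall out automatically

    def record(self, latency_ms: float) -> bool:
        """Store one sample; return True if the rolling mean breaches the threshold."""
        self.samples.append(latency_ms)
        rolling = mean(self.samples)
        log.info("prediction latency_ms=%.1f rolling_mean_ms=%.1f", latency_ms, rolling)
        breached = rolling > self.threshold_ms
        if breached:
            log.warning(
                "ALARM rolling_mean_ms=%.1f exceeds threshold_ms=%.1f",
                rolling,
                self.threshold_ms,
            )
        return breached


def predict(features):
    """Stand-in for a real model call."""
    return sum(features) > 1.0


monitor = LatencyMonitor(threshold_ms=200.0, window=3)
start = time.perf_counter()
predict([0.4, 0.9])
elapsed_ms = (time.perf_counter() - start) * 1000
monitor.record(elapsed_ms)
```

The same pattern extends to other signals worth watching, such as error rates or the distribution of predictions, by recording a different number per request.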
If you're in the Cape Town area and find this insightful, share it with someone who needs it!