The Race To Solve The Data-ML Divide: Designing The Next Data-Centric AI Platform That Can *Actually* Help Drive Value With Gen-AI
Basically 👩🏻‍💻 New Job Alert 👩🏻‍💻
Gen-AI: Highlighting The Need For A Reconciliation Between Data & ML People
As many of my talk attendees¹ and Substack readers know, for a long time I've felt uneasy about how decoupled the data layer has been from the modeling layer.
We know the following:
- Great data is a necessary ingredient in great ML products, and there's an incredible amount of operational complexity in scaling a single, standalone unimodal application to one of many multimodal services as part of a comprehensive offering. As challenging as it is to scale ML in production with structured data, unstructured data like audio, images, text, geospatial, etc. brings additional challenges.
- Data can be a powerful differentiator & competitive advantage. But if everyone is training & serving on the same data, then there's no real difference between competitors.
- Unstructured data can be incredibly rich but has also been incredibly time & resource intensive to unlock. It requires a complex orchestration of labelers, data ops teams, and data engineers to build up the datasets, MLEs to develop & train models (even offline), and platform SWEs to deploy the model pipelines. Then, hopefully, the model performance gets logged someplace (maybe in an ELK stack) for a DS or MLE to manually review or script for evaluation, only to go back to the source in case there are issues with data quality or more labeled data is needed.
Some Titles Are Useful, All Are Wrong
At one point I was a growth hacker².
Then, 1-2 years later, a data analyst.
Then a data scientist. Then an early-stage startup “analytics engineer”.
And after many pivots & sideways slides, I became an MLOps Engineer. Who leveraged content to become a DevRel. That still codes & helps consult & build ML platforms.
And over the years I’ve received tons of messages asking:
“How do I break into X/Y/Z career?”
“What master’s program should I do?”
“Which job should I pick?”
“Which tools should I use?”
“Which certificates are more valuable?”
Maybe it’s because I’m now closer to “older-than-dirt” territory than “spring chicken”, but I have fewer & fewer concrete answers to that category of questions.
And yet it’s important: being able to communicate, define and quantify your value ensures you get your next job, your next client, build your brand, connect with your peers and partners, find your group.
But a trend that I’ve started noticing as part of my responsibilities as a DevRel is that some of the most viral projects, libraries and tools in the LLM and Gen-AI space weren’t, in fact, built by data scientists or ML engineers.
They were built by individuals or groups of individuals that had the technical capabilities & curiosity without necessarily the domain expertise.
Different Day, Same Problems? I Think Not.
One of the most ridiculous statements I’d heard a few years ago, from an influencer friend who shall not be named & shamed because they’re otherwise a great source of inspiration, was that the problem with MLOps was “tools”.
Orly?
Like, a lack of great tooling that solves the *exact* problems that each of the unique individuals, teams and orgs in the world need solved.
I’m willing to forgive that person because frankly, it’s been a while since they’d built and shipped an ML system.
It’s easier than ever to build an incredible project, ship it, and then scale it up.
Don’t want to spin up a ton of infra? Great, you have an abundance of options including serverless.
Don’t want to spend money on a beefy computer? Great, cloud-based IDEs that allow you to ship pre-virtualized & containerized apps written in your language of choice are available.
Don’t want to use Airflow or Kubernetes? Every single one of the major cloud providers and even a bunch of startups offer alternative orchestration and scheduling options.
If you’re a solopreneur, a startup, and even an “intrapreneur”, you don’t need to be doing the same things that your kindred in enterprise land are doing. Don’t worry, eventually they’ll catch up to what you’re doing because that’s the circle of life when it comes to disruption.
On That Note…
I’m excited to announce that I've joined Labelbox as their Head of AI Developer Relations!
We have a unique opportunity to bridge the many gaps between data people, ML people, and SWEs.
I'm excited to do my part in helping upskill, empower, and grow the next vanguard of innovators, builders, and makers of ML.
Up Next…
But most importantly, I’m excited to have the backing of a company like Labelbox to do more of what I’ve been doing and want to keep doing.
And of the exciting projects I have in the works, I can say that I’ll be back to writing regularly on LinkedIn and here about MLOps, Data-Centric AI, the nitty-gritty of building ML platforms & ML products, and how to navigate the evolving landscape of data & ML.
Additionally, I’m currently scheduled to speak at two virtual conferences in October:
- Lesbians Who Tech: “The Fun-Sized MLOps Stack from Scratch”, Oct 16, 1:30 PM - 2:00 PM PDT (30 Min)
- Ben’s Data Engineering And Machine Learning Summit 2023: “MLOps Beyond LLMs”, Oct 25, 10:00 AM - 11:00 AM MDT (45 Min)
For Ben’s conference, I’ll be (virtually) joining a number of friends.

Missed my Australia Data Eng Bytes talks? Want to take a crack at the slides? Check them out here: “MLOps Beyond LLM”, “The Full-Stack Data Scientist Is Still The Sexiest Job”, “Featurization & Feature Stores: A Crash Course In The ML Lifecycle & MLOps”
Video: “The Full-Stack Data Scientist Is Still The Sexiest Job”
Yeah, I know, still sounds like a job you’d only find in Insufferable Valley, but it is a real role with impact.