20 November 2024
Our EiR Dan Teodosiu and Margaux Wehr sat down with Justin Coffey, Director of Engineering at Google, for a fireside chat on scaling a Data & Analytics team.
Dan was formerly the CTO at Criteo, Europe’s largest ad tech firm, where Justin grew and led the Data & Analytics team. Justin joined Criteo two years before its IPO and saw the company grow more than 8x in eight years.
At Criteo, they were tracking over 4B devices and browsers, corresponding to close to half the Internet population. They served 4B personalised ads, and handled over 300B real-time bid requests per day. 100PB of data was stored in Hadoop (making it the largest cluster in Europe), and 3TB worth of new data was produced every single day. This was a scale comparable to Twitter or Netflix at the time.
With so much at stake, a best-in-class analytics team couldn’t have been more important. Along the way, Justin and his team learned valuable lessons about building and scaling a world-class Data & Analytics team, and you can find Justin’s top tips for start-ups below.
When should you start building an analytics team?
As early as possible, but think carefully about what – and who – you need.
Never underestimate the value of decision support. Especially when it comes to technical decision support, you need to be thinking about it from the early stage of a company’s growth. The more you can make difficult choices by cutting through the ambiguity of technical and product decisions with data, the better you’ll be.
That said, crawl before you walk. Avoid investing in advanced data science too soon: classic decision support, business intelligence, and old-school time series analysis get you a really long way. Because your tools are not going to be very mature, you’ll want to favour folks who can cut through the accidental complexity inherent in immature data analytics stacks. You should also hire people who are good with stats and visualisations so that the data you collect isn’t used to tell you the wrong things.
Even once your startup has reached 100 or 200 people, you’re probably still delivering off your initial “eight-page strategy deck”, trying to discover the one big thing that differentiates you in the market. It takes time to reach product-market fit, and getting there needs a fair amount of analyst time. You also don’t want your analytics to be a single point of failure, so think about a small analyst team of two to three people at the start.
Then, at around 200 or 300 employees, when you’re starting to explore derivatives of your initial business idea, that’s when you need to think about a rotating cast of people for data analytics jobs who are able to handle different types of missions.
At Criteo, the genesis of my job building and running Data & Analytics teams boiled down to just solving a scalability problem.
Justin Coffey, Director of Engineering, Google
What are the obvious pitfalls to avoid from day one?
There’s the proverbial “laptop analytics stack”. In the early days at Criteo, there was a local BI team literally building all of their automation on a single laptop. There was a freak-out moment when the cleaning staff came in one Friday night and closed the lid on said laptop and their entire automation pipeline went offline for the weekend. They all came in on Monday and nothing had run, and it was of course at the most inopportune time and caused a bunch of heartburn in the org.
On the other hand, you shouldn’t explicitly try to squash the “Cambrian explosion” of innovation in interesting automation that analysts will come up with. It’s a balance.
So you have to invest at the same time in centralising that automation as quickly as you can, so that you can have a well-lit path to something stable. When something is found to be mission-critical, you want to be able to get it into a safe place.
As you begin to scale your analytics team, what challenges should you expect to encounter?
As you scale your company, growing pains are inevitable. Here are some key learnings from Justin’s time at Criteo:
- Scaling up analytics hiring is complicated, especially across geographies. Identify the level of technical expertise you need and make sure the analysts and/or software engineers you hire have the necessary skill set. It sounds simple, but getting these hires wrong can be costly and time-consuming.
- Train your key BI analysts to build truly scalable queries and data models. Justin talked about the time he had to urgently resolve a query timeout that broke the daily commercial reporting the C-suite depended on.
- This was representative of a broader learning: you have to solve not only an education problem but also a path-to-production problem. At some point you need a roadmap, based on software engineering best practices, for versioning and testing queries, data model changes and entire dashboards before they are released to production.
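To make the "version and test queries" idea concrete, here is a minimal sketch of what a tested reporting query could look like. It is not Criteo's setup: the `orders` table, the `DAILY_REVENUE_SQL` query and the helper names are hypothetical, and an in-memory SQLite database stands in for whatever warehouse you actually run.

```python
import sqlite3

# Hypothetical reporting query. In practice it would live in version control
# next to its test, so every change is reviewed and checked before release.
DAILY_REVENUE_SQL = """
SELECT day, SUM(amount) AS revenue
FROM orders
GROUP BY day
ORDER BY day
"""


def make_fixture_db():
    """Build a tiny in-memory database with known rows (assumed schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (day TEXT, amount INTEGER)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?)",
        [("2024-01-01", 100), ("2024-01-01", 50), ("2024-01-02", 70)],
    )
    return conn


def run_daily_revenue(conn):
    """Run the versioned query and return its rows as a list of tuples."""
    return conn.execute(DAILY_REVENUE_SQL).fetchall()


def test_daily_revenue():
    """Fails if an edit to the query changes the result on the fixture data."""
    rows = run_daily_revenue(make_fixture_db())
    assert rows == [("2024-01-01", 150), ("2024-01-02", 70)]


if __name__ == "__main__":
    test_daily_revenue()
    print("daily revenue query OK")
```

A check like this can run automatically on every proposed change, so a broken query is caught before it reaches the dashboards that leadership relies on.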
How should you think about your data architecture and data catalogues?
Good data architecture means you’re going to have better outcomes. It’s easier to maintain and adapt, easier for analysts to interface with, and easier for the leadership team to understand how data is flowing at a macro level.
A centralised data catalogue can also be incredibly valuable (though granted, it might require 10x more work than you think!). A good data catalogue can tell people where the best data is, who owns it, and what the nature of the data is; it allows you to understand what data you can deprecate because, for example, you can see that nobody’s using it; it allows you to come up with sane migration plans and things like that. And it’s also critical from a policy perspective in our current and future regulatory environment: data provenance tracking becomes non-negotiable.
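As a rough illustration of what a catalogue records, the sketch below models one entry and a simple "nobody is using it" check. The field names, the 90-day threshold and the example datasets are all assumptions made for illustration; a real catalogue would collect usage and lineage automatically.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Optional


@dataclass
class CatalogueEntry:
    """One dataset in a hypothetical catalogue."""
    name: str
    owner: str                 # who to ask about the data
    description: str           # what the data is
    upstream: List[str]        # provenance: datasets this one is derived from
    last_read: Optional[date]  # last recorded read, None if never read


def deprecation_candidates(entries, today, max_idle_days=90):
    """Return names of datasets not read within max_idle_days (assumed policy)."""
    cutoff = today - timedelta(days=max_idle_days)
    return [
        e.name for e in entries
        if e.last_read is None or e.last_read < cutoff
    ]


# Made-up example entries.
catalogue = [
    CatalogueEntry("daily_revenue", "bi-team", "Revenue per day",
                   ["orders"], date(2024, 11, 18)),
    CatalogueEntry("legacy_clicks_v1", "ads-team", "Old click log format",
                   ["raw_clicks"], date(2024, 3, 1)),
    CatalogueEntry("tmp_export", "unknown", "One-off export",
                   [], None),
]

if __name__ == "__main__":
    print(deprecation_candidates(catalogue, today=date(2024, 11, 20)))
```

Even this small structure answers the questions raised above: who owns a dataset, where it came from, and whether anyone still reads it.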