How We’re Solving Data Discovery Challenges at Shopify

Another hyperscale tech company, another home-grown metadata product. I've written a lot about these over the years, and Shopify's Artifact seems very solid, although I'm not sure it differs much from open-source alternatives like Amundsen and DataHub. A couple of things I liked about it:

  1. The team actively measured success using customer surveys and made a meaningful impact on data workflows. Really impressive, and indicative of what this category of tooling can achieve.
  2. The post actually outlines how the team thought about the build/buy decision. This is the first time I've seen a team reflect publicly on that choice, and one of their reasons really resonated with me:
      "At Shopify, we have a wide range of data assets, each requiring its own set of metadata, processes, and user interaction. The tooling available in the market doesn’t offer support for this type of variety without heavy customization work."

Essentially: this product needs to integrate with everything to be useful, and given the complexity of the data infrastructure at any sufficiently large organization, an off-the-shelf tool will almost never achieve that without significant internal engineering work.
