We are looking into Purview for its Data Catalog and Data Map capabilities. There were multiple threads about a year ago with less-than-positive feedback. Have things changed since then? Has anyone implemented it in recent months, and how was your experience? We are looking at data sources such as ADLS, Synapse, Power BI, and a number of SaaS applications.
Be mindful of its integration capabilities with other data platforms. If you need to build a complete end-to-end picture and establish data lineage across platforms like MDM, Data Modeling, etc., you might run into some roadblocks.
My team tried Purview 2-3 years back, and it was quite a challenge to use the tool even within the Azure/Microsoft environment. In fact, we ended up providing a lot of product improvement suggestions to them. Since then, we selected and implemented Informatica's data catalog, which provides many connectors to bring in metadata from several systems in our ecosystem. The challenge we face is in embedding these tools in the business's regular use. Developers still depend on direct access to the systems and do not necessarily rely on catalogs for information. Within individual platforms, I see Palantir having a very robust data lineage tool, and now Databricks is also making strides with Unity Catalog. In the meantime, we have also created a custom solution that brings in metadata from all tools, adds some using LLMs, and provides that to our users. Comments on the thread from Heather Fara are spot on...
Hey, I have implemented it a couple of times, lightly, and attended a Microsoft training. I can tell you that for years it was a headache because tools and permissions kept moving each week. It has stabilized considerably. It's more affordable if you are an enterprise that is 365-based. It connects to anything, but you may have to purchase some connectors. It forces a governance model that is simple, and they put a lot of work into the metamodel, which I appreciate. While they call it all Purview, there are still a few things that are messy: the data lineage tools were developed separately from the data cataloging tools, so many artifacts remain and people can get lost if they don't know how to use the top search bar well. It's wonderful if you are handling DLP, records retention, privacy, etc., as you can manage everything in one place.
Purview requires a team and training to implement. Because it's still fairly new, there are few experts. If you are not 365-based, it makes little sense. The world is your oyster and there are loads of options.
I presently implement Collibra. There is a large talent pool to draw from. It's data stewardship focused, less IT-focused, and super flexible in terms of defining the governance framework and metamodels. We can quickly set up automated governance workflows and reports to keep the catalog tidy. These aren't things Purview does easily or at all.
If you are only looking for the basics, there are lightweight solutions that cost pennies in comparison (DataGalaxy, Talend, etc.).
Whatever your tool of choice, be sure to think very hard about your governance operating model and how you will sustain these capabilities over the long term. Few orgs need everything these tools offer. If you don't operationally invest in the administration and change management, these initiatives take off strong and then die. Determine what you actually want to improve operationally and base your tool selection on that only.
Purview for Data Catalog is known to have gaps when connecting to technologies outside the Microsoft stack. The bigger challenge is business adoption: most non-technical users find the user experience and navigation less effective. While it works for basic data documentation and integrates well with the MSFT stack, its limitations show quickly when working with non-MSFT technologies. Many leaders pick Purview because of the lower price, but that often comes at the cost of not being able to realize positive ROI.
Here are a few common challenges I’ve heard from people using Purview:
1. Basic documentation, with lots of false positives as the product keeps changing
2. Limited support for non-Microsoft data sources, and weak lineage across systems
3. User experience doesn’t meet the needs of business users; the layout isn’t fit for daily ops
4. Search isn’t intuitive; it is hard to connect and surface meaningful insights across data