Data is the essential ingredient of Machine Learning projects, and more and more companies rely on such systems for important decision processes. Missing or incorrect data can therefore have a negative impact on downstream business processes.
In this post I review some important concepts in data validation and showcase existing solutions designed specifically for Pandas DataFrames.
Why Data Quality?
Although most people agree on the importance of ensuring data quality, in my experience few organizations pay enough attention to it. This might be for the following reasons: