Profiling Actions API

class ds_capability.intent.feature_build_intent.FeatureBuildIntent(property_manager: ~ds_capability.managers.feature_build_property_manager.FeatureBuildPropertyManager, default_save_intent: bool = None, default_intent_level: [<class 'str'>, <class 'int'>, <class 'float'>] = None, order_next_available: bool = None, default_replace_intent: bool = None)

This class is for feature build intent actions that are bespoke to a particular use case but have broader reuse beyond it.

build_profiling(canonical: ~pyarrow.lib.Table, profiling: str, headers: [<class 'str'>, <class 'list'>] = None, d_types: [<class 'str'>, <class 'list'>] = None, regex: [<class 'str'>, <class 'list'>] = None, drop: bool = None, connector_name: str = None, seed: int = None, save_intent: bool = None, intent_level: [<class 'int'>, <class 'str'>] = None, intent_order: int = None, replace_intent: bool = None, remove_duplicates: bool = None) → Table

Data profiling is the process of analyzing data and creating useful summaries of it. The process yields a high-level overview that aids in discovering data quality issues, risks, and overall trends, and it can be used to identify errors, anomalies, or patterns within the data. Three types of data profiling are available: ‘dictionary’, ‘schema’ and ‘quality’.
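To make the idea of a summary concrete, here is a minimal sketch of what a ‘dictionary’-style profile might compute for each column: its most frequent Python type, its null count and its number of distinct values. This is an assumption for illustration only; `profile_dictionary` is a hypothetical helper working on a plain dict of lists, not the ds_capability implementation, and the real ‘dictionary’ output may contain different fields.

```python
from collections import Counter


def profile_dictionary(columns: dict) -> list:
    """Hypothetical 'dictionary'-style profile: one summary row per column.

    A sketch over a dict of lists, not the ds_capability implementation.
    """
    rows = []
    for name, values in columns.items():
        present = [v for v in values if v is not None]  # ignore nulls
        types = Counter(type(v).__name__ for v in present)
        rows.append({
            "attribute": name,
            # most frequent Python type among the non-null values
            "dtype": types.most_common(1)[0][0] if types else "null",
            "nulls": len(values) - len(present),
            "unique": len(set(present)),
        })
    return rows


table = {"age": [34, None, 51, 34], "city": ["Leeds", "York", None, None]}
for row in profile_dictionary(table):
    print(row)
# {'attribute': 'age', 'dtype': 'int', 'nulls': 1, 'unique': 2}
# {'attribute': 'city', 'dtype': 'str', 'nulls': 2, 'unique': 2}
```

The method itself returns its result as a pa.Table rather than a list of dicts.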

Parameters:
  • canonical – a direct or generated pa.Table. See context notes below

  • profiling – The profiling name. Options are ‘dictionary’, ‘schema’ or ‘quality’

  • headers – (optional) a filter of headers from the canonical table

  • d_types – (optional) a filter on data type for the canonical table, e.g. int, float, bool, object

  • regex – (optional) a regular expression to search the headers. For example, ‘^((?!_amt).)*$’ excludes headers containing ‘_amt’

  • drop – (optional) if True, drops the selected headers instead of keeping them

  • connector_name – (optional) a connector name where the outcome is sent

  • seed – (optional) this is a placeholder, here for compatibility across methods

  • save_intent – (optional) if the intent contract should be saved to the property manager

  • intent_level – (optional) the column name that groups intent to create a column

  • intent_order – (optional) the order in which each intent should run.
      - if None: defaults to -1
      - if -1: added to a level above any current instance of the intent section, level 0 if not found
      - if int: added to the level specified, overwriting any that already exist

  • replace_intent – (optional) if the intent method exists at the level, or default level:
      - True: replaces the current intent method with the new one
      - False: leaves it untouched, disregarding the new intent

  • remove_duplicates – (optional) removes any duplicate intent in any level that is identical
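The headers, d_types, regex and drop parameters together act as a header filter. The sketch below models that filter on a plain mapping of column name to type name. It assumes the three filters are combined with AND and that `drop=True` inverts the selection; `select_headers` is a hypothetical helper, not part of ds_capability, and the library's exact combination rules may differ.

```python
import re
from typing import Dict, List, Optional, Union

StrOrList = Optional[Union[str, List[str]]]


def _as_list(value: StrOrList) -> Optional[List[str]]:
    # accept a single string or a list, as the signature allows
    return [value] if isinstance(value, str) else value


def select_headers(schema: Dict[str, str], headers: StrOrList = None,
                   d_types: StrOrList = None, regex: StrOrList = None,
                   drop: bool = False) -> List[str]:
    """Hypothetical header filter; not the ds_capability implementation."""
    headers, d_types, regex = _as_list(headers), _as_list(d_types), _as_list(regex)
    # a header is selected only if it passes every filter that was given
    selected = [
        name for name, dtype in schema.items()
        if (headers is None or name in headers)
        and (d_types is None or dtype in d_types)
        and (regex is None or any(re.search(p, name) for p in regex))
    ]
    if drop:
        # drop=True keeps everything except the selected headers
        return [name for name in schema if name not in selected]
    return selected


schema = {"id": "int", "loan_amt": "float", "fee_amt": "float", "status": "object"}
print(select_headers(schema, regex="^((?!_amt).)*$"))      # ['id', 'status']
print(select_headers(schema, d_types="float"))             # ['loan_amt', 'fee_amt']
print(select_headers(schema, d_types="float", drop=True))  # ['id', 'status']
```

The negative lookahead `(?!_amt)` is checked at every character position, so the anchored pattern only matches headers in which ‘_amt’ never occurs.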

Returns:

a pa.Table
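The intent_order and replace_intent rules listed under Parameters can be hard to picture. The following simplified model assumes that the intents at one intent level are stored as a mapping of order number to {method name: parameters}; `place_intent` is hypothetical and the real contract structure in ds_capability may differ. It also assumes that, for an explicit order, replace_intent decides whether an existing entry for the same method is overwritten.

```python
from typing import Dict, Optional

Orders = Dict[int, Dict[str, dict]]


def place_intent(orders: Orders, method: str, params: dict,
                 intent_order: Optional[int] = None,
                 replace_intent: bool = False) -> Orders:
    """Hypothetical model of intent_order / replace_intent within one level."""
    if intent_order is None:
        intent_order = -1  # None defaults to -1
    if intent_order == -1:
        existing = [order for order, methods in orders.items() if method in methods]
        # one above the highest existing instance of this method, or 0 if none
        intent_order = max(existing) + 1 if existing else 0
    slot = orders.setdefault(intent_order, {})
    if method in slot and not replace_intent:
        return orders  # keep the current intent, disregard the new one
    slot[method] = params
    return orders


orders: Orders = {}
place_intent(orders, "build_profiling", {"profiling": "dictionary"})  # order 0
place_intent(orders, "build_profiling", {"profiling": "quality"})     # order 1
place_intent(orders, "build_profiling", {"profiling": "schema"}, intent_order=0)
print(orders[0])  # {'build_profiling': {'profiling': 'dictionary'}}
```

Passing replace_intent=True in the last call would instead overwrite order 0 with the ‘schema’ parameters.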