Profiling Actions API
- class ds_capability.intent.feature_build_intent.FeatureBuildIntent(property_manager: ~ds_capability.managers.feature_build_property_manager.FeatureBuildPropertyManager, default_save_intent: bool = None, default_intent_level: [<class 'str'>, <class 'int'>, <class 'float'>] = None, order_next_available: bool = None, default_replace_intent: bool = None)
This class holds feature build intent actions that are bespoke to a particular use case but have broader reuse beyond that use case.
- build_profiling(canonical: ~pyarrow.lib.Table, profiling: str, headers: [<class 'str'>, <class 'list'>] = None, d_types: [<class 'str'>, <class 'list'>] = None, regex: [<class 'str'>, <class 'list'>] = None, drop: bool = None, connector_name: str = None, seed: int = None, save_intent: bool = None, intent_level: [<class 'int'>, <class 'str'>] = None, intent_order: int = None, replace_intent: bool = None, remove_duplicates: bool = None) Table
Data profiling is the process of examining, analyzing, and creating useful summaries of data. The process yields a high-level overview which aids in the discovery of data quality issues, risks, and overall trends. It can be used to identify any errors, anomalies, or patterns that may exist within the data. There are three types of data profiling available: ‘dictionary’, ‘schema’ or ‘quality’.
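To make the idea of a profiling summary concrete, the following is a minimal standard-library sketch of the kind of per-column overview a profile might contain. It is not the implementation of build_profiling and does not reproduce its output format: the helper name sketch_profile, the sample columns, and the summary fields (types, nulls, distinct) are all assumptions chosen for illustration.

```python
def sketch_profile(columns: dict) -> list:
    """Summarise each column: value type names, null count and distinct count.

    Hypothetical helper for illustration only, not part of ds_capability.
    """
    report = []
    for name, values in columns.items():
        # Treat None as a missing value
        present = [v for v in values if v is not None]
        report.append({
            "column": name,
            "types": sorted({type(v).__name__ for v in present}),
            "nulls": len(values) - len(present),
            "distinct": len(set(present)),
        })
    return report


# Illustrative data, held as plain Python lists rather than a pa.Table
data = {
    "cust_id": [1, 2, 3, 3],
    "order_amt": [9.5, None, 12.0, 12.0],
    "region": ["north", "south", None, None],
}

profile = sketch_profile(data)
for row in profile:
    print(row)
```

A summary of this shape surfaces the quality issues the description mentions, such as missing values in order_amt and region or a repeated cust_id.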
- Parameters:
canonical – a direct or generated pa.Table. See context notes below
profiling – The profiling name. Options are ‘dictionary’, ‘schema’ or ‘quality’
headers – (optional) a filter of headers from the ‘other’ dataset
d_types – (optional) a filter on data type for the ‘other’ dataset. int, float, bool, object
regex – (optional) a regular expression to search the headers. For example, ‘^((?!_amt).)*$’ excludes headers containing ‘_amt’
drop – (optional) to drop or not drop the headers if specified
connector_name – (optional) a connector name where the outcome is sent
seed – (optional) this is a placeholder, here for compatibility across methods
save_intent – (optional) if the intent contract should be saved to the property manager
intent_level – (optional) the column name that groups intent to create a column
intent_order – (optional) the order in which each intent should run.
  - If None: defaults to -1
  - If -1: added to a level above any current instance of the intent section, level 0 if not found
  - If int: added to the level specified, overwriting any that already exist
replace_intent – (optional) if the intent method exists at the level, or default level
  - True: replaces the current intent method with the new
  - False: leaves it untouched, disregarding the new intent
remove_duplicates – (optional) removes any duplicate intent in any level that is identical
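The intent_order rules can be read as a small resolution step. The sketch below is one interpretation of that description, not the property manager's actual code: the function resolve_intent_order and the dictionary keyed by order number are hypothetical, and "a level above any current instance" is assumed to mean the next free order number.

```python
from typing import Dict, Optional


def resolve_intent_order(existing: Dict[int, dict],
                         intent_order: Optional[int] = None) -> int:
    """Return the order slot a new intent would occupy in one intent section.

    Hypothetical helper: `existing` maps order numbers to stored intents.
    """
    if intent_order is None:
        intent_order = -1  # None defaults to -1
    if intent_order == -1:
        # Assumed reading: one above the highest current order, or 0 if empty
        return max(existing) + 1 if existing else 0
    # An explicit int is used as given, overwriting whatever is stored there
    return intent_order


# Illustrative section already holding two intents
section = {0: {"profiling": "schema"}, 1: {"profiling": "quality"}}

slot = resolve_intent_order(section)        # appended above the existing entries
section[slot] = {"profiling": "dictionary"}

overwrite_slot = resolve_intent_order(section, 1)  # explicit order is kept
```

Under this reading, leaving intent_order unset always appends, while passing an explicit number targets an exact position and replaces whatever is stored there.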
- Returns:
a pa.Table
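The regex and drop parameters are easier to follow with a worked case. The sketch below uses only Python's re module to show how the example pattern ‘^((?!_amt).)*$’ selects headers that do not contain ‘_amt’. The helper select_headers and the sample header names are hypothetical, and treating drop as inverting the selection is an assumption about how the flag behaves, not confirmed behaviour of build_profiling.

```python
import re
from typing import List


def select_headers(headers: List[str], regex: str, drop: bool = False) -> List[str]:
    """Return headers matching `regex`, or the non-matching ones when drop is True.

    Hypothetical helper for illustration only, not part of ds_capability.
    """
    pattern = re.compile(regex)
    matched = [h for h in headers if pattern.search(h)]
    if drop:
        # Assumed meaning of drop: remove the matched headers, keep the rest
        return [h for h in headers if h not in matched]
    return matched


headers = ["cust_id", "order_amt", "tax_amt", "region"]

# The negative lookahead rejects any header containing '_amt'
kept = select_headers(headers, r"^((?!_amt).)*$")
print(kept)     # ['cust_id', 'region']

dropped = select_headers(headers, r"^((?!_amt).)*$", drop=True)
print(dropped)  # ['order_amt', 'tax_amt']
```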