Profiling Reports API
- class ds_capability.components.abstract_common_component.AbstractCommonComponent(property_manager: Any, intent_model: Any, default_save: bool | None = None, reset_templates: bool | None = None, template_path: str | None = None, template_module: str | None = None, template_source_handler: str | None = None, template_persist_handler: str | None = None, align_connectors: bool | None = None)
An abstract common component class that contains the methods shared across all capabilities. This allows all capability instances to share common behavior in initialization, connectivity management, reporting and running the component.
- static canonical_report(canonical: pyarrow.Table, headers: str | list = None, regex: str | list = None, d_types: list = None, drop: bool = None, stylise: bool = None, display_width: int = None, ordered: bool = None, basic_style: bool = None)
The Canonical Report is a data dictionary of the canonical, providing a reference view of the dataset’s attribute properties.
- Parameters:
canonical – the table to view
headers – (optional) specific headers to display
regex – (optional) a header regex, or list of regexes, selecting the headers to display. Regex matching is done using the Google RE2 library.
d_types – (optional) a list of pyarrow DataTypes whose columns are displayed, e.g. [pa.string(), pa.bool_()]
drop – (optional) if True, the selected headers are dropped and the remaining headers are displayed
stylise – (optional) if True, present the report stylised.
display_width – (optional) the width of the observational display
ordered – (optional) if True, the result is presented in header order
basic_style – (optional) if True, apply a basic style
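The headers, regex, d_types and drop parameters together determine which columns a report covers. The following is a stdlib-only sketch of that selection. It assumes that each supplied filter narrows the selection in turn and that drop=True returns the complement. select_headers and the string type names are hypothetical and are not part of ds_capability. Python's re module stands in for Google RE2.

```python
from __future__ import annotations

import re


def select_headers(
    schema: dict[str, str],
    headers: str | list[str] | None = None,
    regex: str | list[str] | None = None,
    d_types: list[str] | None = None,
    drop: bool = False,
) -> list[str]:
    """Hypothetical header filter; a sketch, not the ds_capability implementation."""
    # Accept a single string or a list, mirroring the str | list parameters
    if isinstance(headers, str):
        headers = [headers]
    if isinstance(regex, str):
        regex = [regex]
    selected = list(schema)  # start from every column, in schema order
    if headers is not None:
        selected = [c for c in selected if c in headers]
    if regex is not None:
        # Python's re stands in for Google RE2; the two differ for some patterns
        selected = [c for c in selected if any(re.search(p, c) for p in regex)]
    if d_types is not None:
        selected = [c for c in selected if schema[c] in d_types]
    if drop:
        # Invert the selection: the matched headers are removed, the rest kept
        selected = [c for c in schema if c not in selected]
    return selected


schema = {"id": "int64", "name": "string", "name_alt": "string", "active": "bool"}
print(select_headers(schema, regex="^name"))                  # ['name', 'name_alt']
print(select_headers(schema, d_types=["string"], drop=True))  # ['id', 'active']
```

Filtering by intersection keeps each parameter's effect predictable. The library may combine the filters differently.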
- static numeric_report(canonical: pyarrow.Table, headers: str | list = None, regex: str | list = None, d_types: list = None, drop: bool = None, stylise: bool = None)
The Numeric Report summarises the numeric attributes of the canonical, providing a reference view of their properties.
- Parameters:
canonical – the table to view
headers – (optional) specific headers to display
regex – (optional) a header regex, or list of regexes, selecting the headers to display. Regex matching is done using the Google RE2 library.
d_types – (optional) a list of pyarrow DataTypes whose columns are displayed, e.g. [pa.string(), pa.bool_()]
drop – (optional) if True, the selected headers are dropped and the remaining headers are displayed
stylise – (optional) if True, present the report stylised.
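The following is a rough model of what a per-column numeric summary might contain. It assumes each numeric column is described by its count, null count, minimum, maximum and mean. numeric_summary is a hypothetical helper that operates on plain Python lists. The statistics the actual report shows are not documented here and may differ.

```python
from __future__ import annotations

import statistics


def numeric_summary(columns: dict[str, list]) -> dict[str, dict]:
    """Hypothetical numeric summary; the statistics chosen here are assumptions."""
    summary = {}
    for name, values in columns.items():
        non_null = [v for v in values if v is not None]
        # Keep only purely numeric columns; bool is excluded because it subclasses int
        is_numeric = all(
            isinstance(v, (int, float)) and not isinstance(v, bool) for v in non_null
        )
        if not non_null or not is_numeric:
            continue
        summary[name] = {
            "count": len(non_null),
            "nulls": len(values) - len(non_null),
            "min": min(non_null),
            "max": max(non_null),
            "mean": statistics.mean(non_null),
        }
    return summary


columns = {
    "age": [30, None, 40, 50],
    "name": ["a", "b", "c", "d"],
    "flag": [True, False, True, True],
}
print(numeric_summary(columns))
```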
- static quality_report(canonical: pyarrow.Table, nulls_threshold: float | None = None, dom_threshold: float | None = None, cat_threshold: int | None = None, stylise: bool | None = None)
Analyses a dataset, passed as a pyarrow Table, and returns a quality summary.
- Parameters:
canonical – The table to view.
cat_threshold – (optional) The threshold for the maximum number of unique categories. Default is 60.
dom_threshold – (optional) The threshold limit for a dominant value. Default is 0.98.
nulls_threshold – (optional) The threshold limit for null values. Default is 0.9.
stylise – (optional) If True, the output is stylised.
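The three thresholds can be read as cut-offs for flagging columns. The following sketch applies them to plain Python lists using the documented defaults. It assumes these rules:

- a column is flagged for nulls when its null proportion exceeds nulls_threshold;
- a column is flagged as dominant when its most common non-null value exceeds dom_threshold of the non-null rows;
- a column is flagged for high cardinality when its distinct non-null values exceed cat_threshold.

quality_flags and the flag names are hypothetical, and the values must be hashable.

```python
from __future__ import annotations

from collections import Counter


def quality_flags(
    columns: dict[str, list],
    nulls_threshold: float = 0.9,
    dom_threshold: float = 0.98,
    cat_threshold: int = 60,
) -> dict[str, list[str]]:
    """Hypothetical quality check; the flag rules are assumptions, not the library's."""
    report = {}
    for name, values in columns.items():
        flags = []
        non_null = [v for v in values if v is not None]
        # Proportion of nulls in the column
        if values and (len(values) - len(non_null)) / len(values) > nulls_threshold:
            flags.append("nulls")
        # Share of the most common non-null value among the non-null rows
        if non_null:
            _, top = Counter(non_null).most_common(1)[0]
            if top / len(non_null) > dom_threshold:
                flags.append("dominant")
        # Number of distinct non-null values
        if len(set(non_null)) > cat_threshold:
            flags.append("high_cardinality")
        report[name] = flags
    return report


columns = {
    "constant": ["a"] * 100,
    "ids": list(range(100)),
    "balanced": ["x", "y"] * 50,
}
print(quality_flags(columns))
```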
- static schema_report(canonical: pyarrow.Table, headers: str | list = None, regex: str | list = None, d_types: list = None, drop: bool = None, stylise: bool = True, table_cast: bool = None)
Presents the current canonical schema.
- Parameters:
canonical – the table to view
headers – (optional) specific headers to display
regex – (optional) a header regex, or list of regexes, selecting the headers to display. Regex matching is done using the Google RE2 library.
d_types – (optional) a list of pyarrow DataTypes whose columns are displayed, e.g. [pa.string(), pa.bool_()]
drop – (optional) if True, the selected headers are dropped and the remaining headers are displayed
stylise – (optional) if True, present the report stylised.
table_cast – (optional) if True, try to cast each column to its type
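table_cast asks the report to try casting each column to its type. The following is a minimal sketch of that idea for a column of strings. It assumes the cast tries bool, then int, then float, and falls back to the original strings. try_cast is a hypothetical helper. pyarrow's own casting, such as pyarrow.compute.cast, and the library's actual inference rules may behave differently.

```python
from __future__ import annotations


def try_cast(values: list[str | None]) -> list:
    """Hypothetical cast: try bool, then int, then float; otherwise keep the strings."""
    non_null = [v for v in values if v is not None]
    # Boolean only when every non-null value reads as true/false
    if non_null and {v.strip().lower() for v in non_null} <= {"true", "false"}:
        return [None if v is None else v.strip().lower() == "true" for v in values]
    for caster in (int, float):
        try:
            return [None if v is None else caster(v) for v in values]
        except ValueError:
            continue  # this type does not fit every value; try the next one
    return list(values)  # leave the column as strings


print(try_cast(["1", "2", None]))
print(try_cast(["1.5", "2"]))
print(try_cast(["a", "1"]))
```

Trying the narrower types first means a column of whole numbers becomes int rather than float.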