Profiling Reports API

class ds_capability.components.abstract_common_component.AbstractCommonComponent(property_manager: Any, intent_model: Any, default_save: bool | None = None, reset_templates: bool | None = None, template_path: str | None = None, template_module: str | None = None, template_source_handler: str | None = None, template_persist_handler: str | None = None, align_connectors: bool | None = None)

An abstract common component class that contains the methods shared across all capabilities. This allows all capability instances to share common behavior in initialization, connectivity management, reporting and running the component.

static canonical_report(canonical: pyarrow.lib.Table, headers: str | list = None, regex: str | list = None, d_types: list = None, drop: bool = None, stylise: bool = None, display_width: int = None, ordered: bool = None, basic_style: bool = None)

The canonical report is a data dictionary of the canonical, providing a reference view of the dataset’s attribute properties.

Parameters:
  • canonical – the table to view

  • headers – (optional) specific headers to display

  • regex – (optional) a regular expression, or list of expressions, matching the headers to display. Matching uses the Google RE2 library.

  • d_types – (optional) a list of pyarrow DataTypes, e.g. [pa.string(), pa.bool_()]

  • drop – (optional) if True, the selected headers are dropped and the remaining headers are displayed

  • stylise – (optional) if True, present the report stylised.

  • display_width – (optional) the width of the observational display

  • basic_style – (optional) if True, apply a basic style

  • ordered – (optional) if the result should be in header order
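
The way headers, regex and drop combine can be pictured with a small standalone sketch. This is not the library's implementation: select_headers is a hypothetical helper, Python's re module stands in for RE2, and treating headers and regex as a union of selections is an assumption.

```python
import re


def select_headers(names, headers=None, regex=None, drop=False):
    """Return the column names a report would show, in their original order.

    Hypothetical helper for illustration only; the real selection logic
    lives inside the library and may differ.
    """
    # Accept a single string or a list for both selectors.
    headers = [headers] if isinstance(headers, str) else list(headers or [])
    patterns = [regex] if isinstance(regex, str) else list(regex or [])

    # Assumption: with no selector given, every column is shown.
    if not headers and not patterns:
        return list(names)

    # Assumption: explicit headers and regex matches are combined as a union.
    selected = set(headers)
    for pattern in patterns:
        selected.update(n for n in names if re.search(pattern, n))

    if drop:
        # drop=True inverts the selection: show everything that was not selected.
        return [n for n in names if n not in selected]
    return [n for n in names if n in selected]


columns = ["id", "age", "age_group", "city"]
print(select_headers(columns, regex="^age"))
print(select_headers(columns, headers=["id"], regex="^age", drop=True))
```

With drop left unset the selectors act as an include list; with drop=True the same selectors act as an exclude list.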

static numeric_report(canonical: pyarrow.lib.Table, headers: str | list = None, regex: str | list = None, d_types: list = None, drop: bool = None, stylise: bool = None)

The numeric report provides a reference view of the properties of the dataset’s numeric attributes.

Parameters:
  • canonical – the table to view

  • headers – (optional) specific headers to display

  • regex – (optional) a regular expression, or list of expressions, matching the headers to display. Matching uses the Google RE2 library.

  • d_types – (optional) a list of pyarrow DataTypes, e.g. [pa.string(), pa.bool_()]

  • drop – (optional) if True, the selected headers are dropped and the remaining headers are displayed

  • stylise – (optional) if True, present the report stylised.

static quality_report(canonical: Table, nulls_threshold: float | None = None, dom_threshold: float | None = None, cat_threshold: int | None = None, stylise: bool | None = None)

Analyses a dataset, passed as a pyarrow Table, and returns a quality summary.

Parameters:
  • canonical – The table to view.

  • cat_threshold – (optional) The threshold for the maximum number of unique categories. Default is 60.

  • dom_threshold – (optional) The threshold limit for a dominant value. Default is 0.98.

  • nulls_threshold – (optional) The threshold limit for null values. Default is 0.9.

  • stylise – (optional) If True, present the report stylised.
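
To make the three thresholds concrete, the following standalone sketch flags a single column against them. It is an illustration under stated assumptions, not the library's method: quality_flags is a hypothetical helper, the ratios are assumed to be fractions between 0 and 1, dominance is assumed to be measured over non-null values, and the comparisons are assumed to be strictly greater-than.

```python
from collections import Counter


def quality_flags(column, nulls_threshold=0.9, dom_threshold=0.98, cat_threshold=60):
    """Flag one column (a list of values, None meaning null) against the thresholds.

    Hypothetical helper for illustration; defaults mirror the documented ones.
    """
    total = len(column)
    values = [v for v in column if v is not None]

    # Share of the column that is null.
    null_ratio = (total - len(values)) / total if total else 0.0

    # Share of the non-null values taken by the single most frequent value.
    counts = Counter(values)
    dominant_ratio = max(counts.values()) / len(values) if values else 0.0

    return {
        "null_ratio": null_ratio,
        "dominant_ratio": dominant_ratio,
        "unique": len(counts),
        "too_many_nulls": null_ratio > nulls_threshold,
        "dominated": dominant_ratio > dom_threshold,
        "too_many_categories": len(counts) > cat_threshold,
    }


# A column that is almost entirely one value trips only the dominance flag.
flags = quality_flags(["a"] * 99 + [None])
print(flags["dominated"], flags["too_many_nulls"], flags["too_many_categories"])
```

Lowering a threshold makes the corresponding flag stricter, so a dom_threshold of 0.9 would also flag a column in which one value fills 95% of the non-null rows.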

static schema_report(canonical: pyarrow.lib.Table, headers: str | list = None, regex: str | list = None, d_types: list = None, drop: bool = None, stylise: bool = True, table_cast: bool = None)

Presents the current canonical schema.

Parameters:
  • canonical – the table to view

  • headers – (optional) specific headers to display

  • regex – (optional) a regular expression, or list of expressions, matching the headers to display. Matching uses the Google RE2 library.

  • d_types – (optional) a list of pyarrow DataTypes, e.g. [pa.string(), pa.bool_()]

  • drop – (optional) if True, the selected headers are dropped and the remaining headers are displayed

  • stylise – (optional) if True, present the report stylised.

  • table_cast – (optional) if True, attempt to cast each column to its type