Commons

class ds_capability.components.commons.Commons
static filter_columns(data: pa.Table, headers=None, d_types: list = None, regex: [str, list] = None, drop: bool = None) pa.Table

Returns a subset of columns based on the filter criteria. Filters are applied in the order d_types, headers, then regex.

Parameters:
  • data – the Canonical data to get the column headers from

  • d_types – (optional) a list of pyarrow DataTypes used to select columns by type

  • headers – (optional) a list of header strings to select from the column headers

  • regex – (optional) a regular expression to match against the column headers

  • drop – (optional) reverses the selection, so the selected columns are dropped and the rest are kept

Returns:

a pa.Table containing only the filtered columns

static filter_headers(data: pa.Table, headers: [str, list] = None, d_types: list = None, regex: [str, list] = None, drop: bool = None) list

Returns a list of headers based on the filter criteria. Filters are applied in the order d_types, headers, then regex. Data types are taken from pyarrow.types and should be given as a string or list of strings naming a type-checking function, for example ['is_integer', 'is_floating'].

Parameters:
  • data – the Canonical data to get the column headers from

  • d_types – (optional) a list of pyarrow.types function names used to select columns by type

  • headers – (optional) a list of header strings to select from the column headers

  • regex – (optional) a regular expression to match against the column headers

  • drop – (optional) reverses the selection, so the selected headers are dropped and the rest are kept

Returns:

a filtered list of headers

Raises:

TypeError if any of the types are not as expected

static list_diff(seq: list, other: list, symmetric: bool = True) list

Returns the difference between two lists, treating each list as a set of unique values. If symmetric is True, returns the elements found in only one of the two lists; if False, returns the elements of seq that are not in other.
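A minimal sketch of the documented list_diff semantics, assuming the values are hashable and that the result keeps first-seen order (the description does not say whether order is preserved). list_diff_sketch is a hypothetical name, not the library's implementation.

```python
def list_diff_sketch(seq: list, other: list, symmetric: bool = True) -> list:
    """Hypothetical sketch of list_diff: set-style difference that keeps first-seen order."""
    # dict.fromkeys removes duplicates while preserving insertion order
    first = list(dict.fromkeys(seq))
    second = list(dict.fromkeys(other))
    first_set, second_set = set(first), set(second)
    only_first = [x for x in first if x not in second_set]
    if not symmetric:
        return only_first
    only_second = [x for x in second if x not in first_set]
    return only_first + only_second


print(list_diff_sketch([1, 2, 3, 3], [2, 3, 4]))                   # [1, 4]
print(list_diff_sketch([1, 2, 3, 3], [2, 3, 4], symmetric=False))  # [1]
```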

static list_dup(seq: list) list

Returns the values that appear more than once in seq.
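One way to find duplicates is to count occurrences with collections.Counter and keep values seen more than once. This is a sketch under the assumption that each duplicate is reported once, in first-seen order; list_dup_sketch is a hypothetical name.

```python
from collections import Counter


def list_dup_sketch(seq: list) -> list:
    """Hypothetical sketch of list_dup: values that occur more than once, each reported once."""
    counts = Counter(seq)  # Counter keeps first-insertion order (Python 3.7+)
    return [value for value, n in counts.items() if n > 1]


print(list_dup_sketch(["a", "b", "a", "c", "b", "a"]))  # ['a', 'b']
```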

static list_equal(seq: list, other: list) bool

Checks whether two lists contain the same elements with the same frequencies, ignoring order.
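Comparing element frequencies while ignoring order is a multiset comparison, which collections.Counter expresses directly. A minimal sketch, assuming hashable elements; list_equal_sketch is a hypothetical name.

```python
from collections import Counter


def list_equal_sketch(seq: list, other: list) -> bool:
    """Hypothetical sketch of list_equal: same elements with the same counts, any order."""
    return Counter(seq) == Counter(other)


print(list_equal_sketch([1, 2, 2, 3], [2, 3, 1, 2]))  # True
print(list_equal_sketch([1, 2, 2], [1, 2]))           # False
```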

static list_formatter(value: Any) list

Converts a str, list, tuple, or array into a list.
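The main subtlety in such a converter is that a string must become a one-element list rather than a list of characters. The sketch below is hypothetical (list_formatter_sketch is an illustrative name) and covers only the types named above; wrapping any other value in a list is an assumption.

```python
from array import array


def list_formatter_sketch(value) -> list:
    """Hypothetical sketch of list_formatter for str, list, tuple, and array inputs."""
    if isinstance(value, str):
        # a single string becomes a one-element list, not a list of characters
        return [value]
    if isinstance(value, (list, tuple)):
        return list(value)
    if hasattr(value, "tolist"):
        # array.array and numpy.ndarray both provide tolist()
        return value.tolist()
    # Assumption: any other single value is wrapped in a list
    return [value]


print(list_formatter_sketch("age"))                  # ['age']
print(list_formatter_sketch(("age", "score")))       # ['age', 'score']
print(list_formatter_sketch(array("i", [1, 2, 3])))  # [1, 2, 3]
```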

static list_intersect(seq: list, other: list) list

Returns the intersection of two lists, treating each list as a set of unique values.
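A minimal sketch of a unique-value intersection, assuming hashable values and that results follow the order of seq (the description does not specify ordering); list_intersect_sketch is a hypothetical name.

```python
def list_intersect_sketch(seq: list, other: list) -> list:
    """Hypothetical sketch of list_intersect: shared values, each listed once."""
    other_set = set(other)
    # keep each shared value once, in the order it first appears in seq
    return [x for x in dict.fromkeys(seq) if x in other_set]


print(list_intersect_sketch([1, 2, 2, 3], [3, 2, 5]))  # [2, 3]
```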

static list_match(seq: list, pattern: str) list

Applies a regular expression to each element of seq and returns the matching elements.
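A sketch of regex filtering over a list. It uses re.search, which matches anywhere in the string; whether the library uses search or an anchored match is not stated, so anchor the pattern explicitly if that matters. list_match_sketch is a hypothetical name.

```python
import re


def list_match_sketch(seq: list, pattern: str) -> list:
    """Hypothetical sketch of list_match: keep the elements the pattern matches."""
    regex = re.compile(pattern)
    # re.search matches anywhere; use ^...$ in the pattern to anchor it
    return [item for item in seq if regex.search(str(item))]


print(list_match_sketch(["age", "score", "scale", "name"], "^sc"))  # ['score', 'scale']
```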

Performs a binary search for a value in a sorted list between two indices.
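The method name and signature for this binary search are missing from this page, so binary_search_sketch, its lo/hi parameters, and the -1 "not found" result are hypothetical. The sketch uses the standard library's bisect_left, which accepts lo and hi bounds and requires the list to be sorted within that range.

```python
from bisect import bisect_left


def binary_search_sketch(seq: list, value, lo: int = 0, hi: int = None) -> int:
    """Hypothetical sketch: index of value in the sorted slice seq[lo:hi], or -1 if absent."""
    if hi is None:
        hi = len(seq)
    # bisect_left returns the leftmost insertion point for value within [lo, hi)
    i = bisect_left(seq, value, lo, hi)
    if i < hi and seq[i] == value:
        return i
    return -1


data = [1, 3, 5, 7, 9]
print(binary_search_sketch(data, 7))        # 3
print(binary_search_sketch(data, 4))        # -1
print(binary_search_sketch(data, 1, 2, 5))  # -1 (1 lies outside the searched range)
```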

static list_union(seq: list, other: list) list

Returns the union of two lists, treating each list as a set of unique values.
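A minimal sketch of a unique-value union, assuming hashable values and first-seen order (ordering is not specified in the description); list_union_sketch is a hypothetical name.

```python
def list_union_sketch(seq: list, other: list) -> list:
    """Hypothetical sketch of list_union: every value from either list, once each."""
    # concatenating, then de-duplicating with dict.fromkeys, keeps first-seen order
    return list(dict.fromkeys([*seq, *other]))


print(list_union_sketch([1, 2, 2], [2, 3, 1]))  # [1, 2, 3]
```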

static list_unique(seq: list) list

Removes duplicates from a list while retaining the order of first occurrence.
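Order-preserving de-duplication can be done with dict.fromkeys, since dicts keep insertion order in Python 3.7+. A sketch assuming hashable values; list_unique_sketch is a hypothetical name.

```python
def list_unique_sketch(seq: list) -> list:
    """Hypothetical sketch of list_unique: drop repeats, keep first occurrences in order."""
    return list(dict.fromkeys(seq))


print(list_unique_sketch(["b", "a", "b", "c", "a"]))  # ['b', 'a', 'c']
```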

static table_append(t1: Table, t2: Table)

Appends all the columns of t2 to t1.

static table_report(t: pa.Table, head: int = None, index_header: [str, list] = None, bold: [str, list] = None, large_font: [str, list] = None)

Generates a stylised version of the pyarrow table t.