ruleminer package

Submodules

ruleminer.const module

Constants module.

ruleminer.encodings module

Encoding definitions.

ruleminer.metrics module

Metrics module.

ruleminer.metrics.calculate_metrics(len_results: dict = {}, metrics: list = [])

ruleminer.metrics.metrics(metrics: list = [])

ruleminer.metrics.required_variables(metrics: list = []): This function derives a set of variables that are needed to calculate the metrics.
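
As a rough illustration of how metrics can be derived from counts of the required variables, the sketch below computes three metrics from a dictionary of lengths. The metric names are borrowed from the add_results example in this reference; the formulas are the usual association-rule definitions and are an assumption here, not ruleminer's confirmed implementation. The name metrics_sketch is hypothetical.

```python
def metrics_sketch(len_results, metrics):
    """Compute rule metrics from counts (assumed formulas, not ruleminer's code)."""
    support = len_results.get("X and Y", 0)      # rows where X and Y both hold
    exceptions = len_results.get("X and ~Y", 0)  # rows where X holds but Y does not
    covered = support + exceptions               # rows where X holds
    formulas = {
        "absolute_support": lambda: support,
        "absolute_exceptions": lambda: exceptions,
        # confidence: share of the rows satisfying X that also satisfy Y
        "confidence": lambda: support / covered if covered else 0.0,
    }
    return {name: formulas[name]() for name in metrics}


result = metrics_sketch(
    {"X and Y": 45, "X and ~Y": 5},
    ["absolute_support", "confidence"],
)
print(result)
```

Only the counts that the requested metrics actually use need to be present, which is presumably why a separate step derives the required variables first.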

ruleminer.parser module

Parser module.

ruleminer.parser.dataframe_index(expression: str = '', required: list = []) → Dict[str, str]

Parse a rule expression and generate corresponding DataFrame index expressions.

This function takes a rule expression, such as ‘if A then B’, and a list of required variables (‘N’, ‘X’, ‘Y’, ‘~X’, ‘~Y’, ‘X and Y’, ‘X and ~Y’, ‘~X and ~Y’). It then generates corresponding DataFrame index expressions for each required variable based on the given expression.

Parameters:

expression (str) – A rule expression in the format ‘if A then B’.
required (List[str]) – A list of required variables for which to generate index expressions.

Returns:

A dictionary where keys are required variable names, and values are corresponding DataFrame index expressions.

Return type:

Dict[str, str]

Example

>>> expression = 'if ({"A"} > 0) then ({"B"} < 10)'
>>> required = ['X', 'Y', 'X and Y']
>>> result = ruleminer.dataframe_index(expression, required)
>>> print(result)
{
  'X': '__df__.index[((__df__["A"] > 0))]',
  'Y': '__df__.index[((__df__["B"] < 10))]',
  'X and Y': '__df__.index[((__df__["A"] > 0)) & ((__df__["B"] < 10))]'
}
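
The string construction shown in the example can be approximated with the standard library alone. The sketch below is an assumption-laden illustration, not the parser's actual pyparsing-based code: it splits the rule with a regular expression, rewrites column references, and wraps boolean masks in `__df__.index[...]`. The names to_pandas and index_expressions_sketch are hypothetical, and the variable 'N' is deliberately left out.

```python
import re


def to_pandas(expression):
    """Rewrite {"A"} column references as __df__["A"]."""
    return re.sub(r'\{("[^"]*")\}', r'__df__[\1]', expression)


def index_expressions_sketch(expression, required):
    """Build index-expression strings for an 'if X then Y' rule (illustrative only)."""
    match = re.fullmatch(r"\s*if\s+(.*?)\s+then\s+(.*?)\s*", expression)
    if match is None:
        raise ValueError("expected an expression of the form 'if A then B'")
    x, y = to_pandas(match.group(1)), to_pandas(match.group(2))
    # Boolean masks for each supported variable, expressed as strings.
    masks = {
        "X": f"({x})",
        "Y": f"({y})",
        "~X": f"~({x})",
        "~Y": f"~({y})",
        "X and Y": f"({x}) & ({y})",
        "X and ~Y": f"({x}) & ~({y})",
        "~X and ~Y": f"~({x}) & ~({y})",
    }
    return {name: f"__df__.index[{masks[name]}]" for name in required}


result = index_expressions_sketch(
    'if ({"A"} > 0) then ({"B"} < 10)', ["X", "X and Y"]
)
print(result["X and Y"])
```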

ruleminer.parser.dataframe_lengths(expression: str = '', required: list = []) → Dict[str, str]

Calculate lengths based on a rule expression and generate corresponding values.

This function takes a rule expression, such as ‘if A then B’, and a list of required variables (‘N’, ‘X’, ‘Y’, ‘~X’, ‘~Y’, ‘X and Y’, ‘X and ~Y’, ‘~X and ~Y’). It calculates lengths or sums of data based on the given conditional expression for each required variable.

Parameters:

expression (str) – A conditional expression in the format ‘if A then B’.
required (List[str]) – A list of required variables for which to calculate lengths or sums.

Returns:

A dictionary where keys are required variable names, and values are corresponding length or sum calculations.

Return type:

Dict[str, str]

Example

>>> expression = 'if ({"A"} > 0) then ({"B"} < 10)'
>>> required = ['X', 'Y', 'X and Y']
>>> result = ruleminer.dataframe_lengths(expression, required)
>>> print(result)
{'X': '((__df__["A"] > 0)).sum()', 'Y': '((__df__["B"] < 10)).sum()',
'X and Y': '(((__df__["A"] > 0)) & ((__df__["B"] < 10))).sum()'}
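
To make the meaning of these lengths concrete, the following sketch counts the rows that satisfy X, Y, and both, using plain Python rows instead of a DataFrame. It is the hand-written analogue of calling `.sum()` on a boolean mask; the sample data and the helper names x and y are made up for illustration.

```python
# Three sample rows standing in for a DataFrame (hypothetical data).
rows = [
    {"A": 1, "B": 5},
    {"A": 2, "B": 12},
    {"A": -1, "B": 3},
]


def x(row):
    """Antecedent of the rule: {"A"} > 0."""
    return row["A"] > 0


def y(row):
    """Consequent of the rule: {"B"} < 10."""
    return row["B"] < 10


# Summing booleans counts the rows for which the condition holds.
lengths = {
    "X": sum(x(r) for r in rows),
    "Y": sum(y(r) for r in rows),
    "X and Y": sum(x(r) and y(r) for r in rows),
}
print(lengths)
```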

ruleminer.parser.dataframe_values(expression: str = '')

Extract values from a Pandas DataFrame based on an expression.

This function constructs a Pandas DataFrame expression to extract values based on the provided expression. The expression is wrapped in square brackets to retrieve the values from the DataFrame.

Parameters:

expression (str) – An expression used to filter or access DataFrame values, e.g. "{Column_A > 0}".

Returns:

The Pandas DataFrame expression to retrieve values based on the provided expression.

Return type:

str

Example

>>> expression = '{"A"} > 0'
>>> result = ruleminer.dataframe_values(expression)
>>> print(result)
"__df__[(__df__["A"] > 0)]"

ruleminer.parser.function_expression()

Define a ruleminer function expression.

This function defines a function expression. It uses pyparsing to define the syntax for function calls with parameters, including basic mathematical operations and comparisons.

Returns: a function expression
Return type: pyparsing.core.Forward

Example

>>> expression = 'substr({"A"}, 1, 1)'
>>> result = ruleminer.function_expression().parse_string(expression)
>>> print(result)
['substr', ['{"A"}', ',', '1', ',', '1']]

ruleminer.parser.math_expression(base: Forward = None)

Define a ruleminer mathematical expression.

This function defines a mathematical expression. It uses pyparsing to define the syntax for function calls with parameters, including basic mathematical operations and comparisons.

Parameters: None
Returns: a mathematical expression
Return type: pyparsing.core.Forward

Example

>>> expression = '{"A"} + {"B"}'
>>> result = ruleminer.math_expression().parse_string(expression)
>>> print(result)
[['{"A"}', '+', '{"B"}']]

ruleminer.parser.pandas_column(expression: str = '')

Replace column names with Pandas DataFrame expressions.

This function takes a string containing column names enclosed in curly braces, e.g., {“A”}, and converts them to Pandas DataFrame syntax, e.g., __df__[“A”].

Parameters:

expression (str) – A string with column names enclosed in curly braces, e.g. {"A"}.

Returns:

The Pandas DataFrame expression with columns in the format __df__[“A”].

Return type:

str

Example

>>> expression = '{"A"}'
>>> result = ruleminer.pandas_column(expression)
>>> print(result)
"__df__["A"]"

ruleminer.parser.rule_expression()

Define a ruleminer rule expression.

This function defines a ruleminer rule expression. It uses pyparsing to define the syntax for conditions and rules.

Parameters: None
Returns: a ruleminer rule expression
Return type: pyparsing.core.Forward

Example

>>> expression = 'if ({"A"} > 0) then ({"B"} < 0)'
>>> result = ruleminer.rule_expression().parse_string(expression)
>>> print(result)
['if', ['{"A"}', '>', '0'], 'then', ['{"B"}', '<', '0']]

ruleminer.ruleminer module

Main module.

class ruleminer.ruleminer.RuleMiner(templates: list = None, rules: DataFrame = None, data: DataFrame = None, params: dict = None)

Bases: object

The RuleMiner object contains rules and data.

It uses three basic functions:

- update (rule expressions, rules or data)
- generate (rules from rule expressions and data)
- evaluate (results from rules)

add_results(rule_idx, rule_metrics, co_indices, ex_indices) → None

Add results for a rule to the results list.

This method adds results for a specific rule, including its metrics and indices, to the results list. It updates both the results for confirmations (co_indices) and exceptions (ex_indices). If no indices are provided for either category, an error message is logged.

Parameters:

rule_idx (int) – The index of the rule in the rule list.
rule_metrics (dict) – A dictionary of rule-specific metrics for evaluation.
co_indices (list or None) – A list of indices for confirmations.
ex_indices (list or None) – A list of indices for exceptions.

Returns:

This method updates the results list in-place.

Return type:

None

Example

>>> rule_index = 0
>>> metrics = {
...     'absolute_support': 50,
...     'absolute_exceptions': 5,
...     'confidence': 0.9
... }
>>> co_indices = [1, 2, 3, 4, 5]
>>> ex_indices = [10, 11, 12, 13]
>>> add_results(rule_index, metrics, co_indices, ex_indices)

add_rule(rule_id: str = '', rule_group: int = 0, rule_def: str = '', rule_status: str = '', rule_metrics: dict = {}, encodings: dict = {})

Add a rule with information to the discovered rule list.

This method adds a new rule to the discovered rule list. The rule is defined by a unique rule ID, a rule group, a rule definition (expression or description), a rule status, a dictionary of rule- specific metrics, and a dictionary of encodings used in the rule evaluation.

Parameters:

rule_id (str) – A unique identifier for the rule.
rule_group (int) – An integer representing the group or category to which the rule belongs.
rule_def (str) – The definition or expression of the rule.
rule_status (str) – The status of the rule.
rule_metrics (dict) – A dictionary of rule-specific metrics and their values.
encodings (dict) – A dictionary of variable encodings used in the rule.

Example

>>> my_rule = {
...     'rule_id': 'R001',
...     'rule_group': 1,
...     'rule_def': 'column_A > 10',
...     'rule_status': 'active',
...     'rule_metrics': {'coverage': 0.9, 'accuracy': 0.85},
...     'encodings': {}
... }
>>> add_rule(**my_rule)

apply_filter(metrics: dict = {}): This function applies the filter to the rule metrics (for example confidence > 0.75).

convert_template(template: dict = {}): Main function to convert templates to rules without data and regexes.

evaluate(data: DataFrame = None)

evaluate_code(expressions: dict = {}, dataframe: DataFrame = None, encodings: dict = {})

Evaluate a set of expressions and return their results.

This method evaluates a dictionary of expressions using the provided data frame, encodings, and additional variables from the evaluation context. The results of the expressions are stored in a dictionary and returned.

Parameters:

expressions (dict) – A dictionary of variable names as keys and expressions as values.
dataframe (pd.DataFrame) – The Pandas DataFrame containing the data for evaluation.
encodings (dict) – A dictionary of variable encodings or transformations for evaluation.

Returns:

A dictionary containing the results of evaluated expressions.

Return type:

dict
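
The general idea of evaluating a dictionary of expression strings against a data object can be sketched with `eval` and an explicit namespace. This is only an illustration under stated assumptions: a plain dict of lists stands in for the DataFrame, encodings are ignored, and evaluate_code_sketch is a hypothetical name rather than the method's real implementation.

```python
def evaluate_code_sketch(expressions, dataframe):
    """Evaluate each expression string with the data bound to the name __df__."""
    # Expose only what the expressions need; __builtins__ is emptied on purpose.
    namespace = {"__builtins__": {}, "sum": sum, "len": len, "__df__": dataframe}
    return {name: eval(code, namespace) for name, code in expressions.items()}


# A dict of lists stands in for a DataFrame here (assumption for the sketch).
data = {"A": [1, 2, -1], "B": [5, 12, 3]}
expressions = {
    "N": 'len(__df__["A"])',
    "X": 'sum(v > 0 for v in __df__["A"])',
}
results = evaluate_code_sketch(expressions, data)
print(results)
```

Because `eval` runs arbitrary code, an approach like this is only appropriate for expressions from a trusted source.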

generate()

generate_rules(template: dict)

Generate rules from data using a rule template.

This method generates rules based on a provided rule template. It uses the template expression to create a set of rules by substituting variable values and applying conditions. The resulting rules are evaluated and filtered based on specified metrics.

Parameters:

template (dict) – A dictionary containing the rule template with the following keys:
- “group” (int): The group identifier for the rules.
- “encodings” (dict): A dictionary of encodings for the rules.
- “expression” (str): The rule template expression.

Returns:

None

Example

>>> rule_template = {
...     "expression": 'if ({"A.*"} > 10) then ({"B.*"} == "X")',
...     "group": 1,
...     "encodings": {},
... }
>>> generator.generate_rules(rule_template)

Note

The method first parses the provided rule expression into ‘if’ and ‘then’ parts.
It generates rule candidates by substituting variables and applying conditions.
The candidates are evaluated, and the resulting rules are filtered using metrics.
The rules are added to the discovered rule list.

Temporary index name columns are added to the data to derive rules based on index names.
If the template expression is not in ‘if-then’ format, it is converted into such a format.
Substitutions are made for variable values, and rules are generated and evaluated.
Temporary index columns are removed from the data after rule generation.
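
The candidate-generation step described in the note can be pictured with a small standard-library sketch. It treats every `{"..."}` reference in the template as a regular expression over column names and produces one candidate rule per combination of matching columns. The function name expand_template and the column names are hypothetical, and value substitution, evaluation and metric filtering are not covered.

```python
import itertools
import re

COLUMN_REF = r'\{"(.*?)"\}'  # matches {"..."} and captures the inner pattern


def expand_template(expression, columns):
    """Return candidate rules with column patterns replaced by matching columns."""
    patterns = re.findall(COLUMN_REF, expression)
    # For each pattern, collect the column names it matches completely.
    matches = [[c for c in columns if re.fullmatch(p, c)] for p in patterns]
    candidates = []
    for combination in itertools.product(*matches):
        names = iter(combination)
        # Replace the references one by one, in order of appearance.
        candidates.append(
            re.sub(COLUMN_REF, lambda m: '{"' + next(names) + '"}', expression)
        )
    return candidates


template = 'if ({"A.*"} > 10) then ({"B.*"} == "X")'
for rule in expand_template(template, ["Aa", "Ab", "Bb", "C"]):
    print(rule)
```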

reformulate(expression: str = '', apply_tolerance: bool = False, positive_tolerance: bool = True)

Convert parameters, settings, and functions to Pandas code within an expression.

This method takes an input expression and converts specific parameters, settings, and functions into their equivalent Pandas code. It allows for custom transformations and conversions that are used in the evaluation of rules.

Parameters:

expression (str) – The input expression to be reformulated into Pandas code.
apply_tolerance (bool)
positive_tolerance (bool)

Returns:

The reformulated expression in Pandas code.

Return type:

str

Example

>>> expression = ['substr', ['{"A"}', ',', '1', ',', '1']]
>>> result = ruleminer.RuleMiner().reformulate(expression)
>>> print(result)
"({"A"}.str.slice(1,1))"

search_column_value(expr, column_value) → list

Search for column-value pairs in an expression.

This method recursively searches for column-value pairs within an expression and appends them to the provided list. It identifies column-value pairs by checking the structure of the expression.

Parameters:

expr (str or list) – The expression to search for column-value pairs.
column_value (list) – A list to store the identified column-value pairs.

Returns:

A list containing the discovered column-value pairs as tuples.

Return type:

list

Example

>>> expression = ['{"A"}', '==', '"b"']
>>> column_value_pairs = ruleminer.RuleMiner().search_column_value(
...     expression, [])
>>> print(column_value_pairs)
[('{"A"}', '"b"')]

Note

The method examines the structure of the expression and identifies column-value pairs by checking for specific patterns. It recursively traverses the expression to find such pairs and appends them to the provided list.
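
A recursive search of this kind might look like the sketch below. The pattern it looks for (a column reference, the operator `==`, and a quoted string) is inferred from the example above and is an assumption; the helper predicates are simplified stand-ins for is_column and is_string, and search_column_value_sketch is a hypothetical name.

```python
def is_column(s):
    """True for strings shaped like {"A"} (simplified stand-in)."""
    return isinstance(s, str) and len(s) > 4 and s.startswith('{"') and s.endswith('"}')


def is_string(s):
    """True for strings wrapped in matching single or double quotes."""
    return isinstance(s, str) and len(s) >= 2 and s[0] == s[-1] and s[0] in "'\""


def search_column_value_sketch(expr, column_value):
    """Append (column, value) tuples found anywhere in a nested expression."""
    if isinstance(expr, str):
        return column_value
    if len(expr) == 3 and is_column(expr[0]) and expr[1] == "==" and is_string(expr[2]):
        column_value.append((expr[0], expr[2]))
        return column_value
    for item in expr:
        # Descend into sub-expressions; plain tokens are skipped above.
        search_column_value_sketch(item, column_value)
    return column_value


parsed = ["if", ['{"A"}', "==", '"b"'], "then", ['{"B"}', ">", "0"]]
pairs = search_column_value_sketch(parsed, [])
print(pairs)
```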

setup_results_dataframe(): Helper function to set up the results dataframe.

setup_rules_dataframe(): Helper function to set up the rules dataframe.

split_rule(expression: str = '') → tuple

Split a rule expression into its ‘if’ and ‘then’ parts.

This method takes a rule expression and splits it into its ‘if’ and ‘then’ components. It uses regular expressions to identify these parts; if the ‘if’ part is absent, the entire expression is treated as the ‘then’ part. The resulting ‘if’ and ‘then’ parts are parsed and returned as lists.

Parameters:

expression (str) – The rule expression to be split.

Returns:

A tuple containing the following elements:

list: The parsed rule expression as a list.
list: The ‘if’ part of the rule as a parsed list (empty if not present).
list: The ‘then’ part of the rule as a parsed list.

Return type:

tuple

Example

>>> rule_expression = 'if ({"A"} > 10) then ({"B"} == "C")'
>>> parsed, if_part, then_part = split_rule(rule_expression)
>>> print(parsed)
['if', ['{"A"}', '>', '10'], 'then', ['{"B"}', '==', '"C"']]
>>> print(if_part)
[['{"A"}', '>', '10']]
>>> print(then_part)
[['{"B"}', '==', '"C"']]

Note

The method employs regular expressions to identify ‘if’ and ‘then’ parts, and if the ‘if’ part is not present, the entire expression is considered the ‘then’ part. The parsed results are returned as lists for further evaluation.
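
The regular-expression split can be sketched as follows. Unlike the method documented here, this illustration returns the two parts as raw strings rather than parsed lists, and the exact pattern is an assumption; split_rule_sketch is a hypothetical name.

```python
import re


def split_rule_sketch(expression):
    """Return (if_part, then_part) as strings; if_part is '' when there is no 'if'."""
    match = re.fullmatch(
        r"\s*if\s+(.*?)\s+then\s+(.*?)\s*", expression, flags=re.DOTALL
    )
    if match is None:
        # No 'if ... then ...' structure: the whole expression is the 'then' part.
        return "", expression.strip()
    return match.group(1), match.group(2)


if_part, then_part = split_rule_sketch('if ({"A"} > 10) then ({"B"} == "C")')
print(if_part)
print(then_part)
```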

substitute_group_names(expr: str = None, group_names_list: list = [])

Substitute group names in an expression.

This method substitutes placeholders in an expression with their corresponding group names. Group names are provided as a list, and placeholders in the expression are represented as ‘’, ‘’, and so on. The method replaces these placeholders with the group names from the list.

Parameters:

expr (str or list) – The expression or list of expressions to be processed.
group_names_list (list) – A list of group names to use as substitutions.

Returns:

The expression with placeholders replaced by group names.

Return type:

str or list

Example

>>> expression = "Column '' contains values from group ''"
>>> group_names = ['Group A', 'Numbers']
>>> result = ruleminer.RuleMiner().substitute_group_names(
...     expression, group_names)
>>> print(result)
"Column 'Group A' contains values from group 'Numbers'"

Note

The method can be applied to both strings and lists of expressions. It searches for placeholders in the format ‘’, ‘’, and so on, and substitutes them with the corresponding group names from the list.

substitute_list(expression: str = '', columns: list = [], values: list = [], column_substitutions: list = [], value_substitutions: list = [])

Substitute columns and values in an expression with their substitutions.

This method allows for the substitution of columns and values within an expression using the provided lists of column and value substitutions. It recursively processes the expression, replacing the first occurrence of a column or value with its substitution.

Parameters:

expression (str or list) – The input expression to be processed.
columns (list) – A list of original columns to be substituted.
values (list) – A list of original values to be substituted.
column_substitutions (list) – A list of column substitutions.
value_substitutions (list) – A list of value substitutions.

Returns:

A tuple containing the following elements:

str or list: The processed expression with substitutions.
list: The remaining columns for substitution.
list: The remaining values for substitution.
list: The remaining column substitutions.
list: The remaining value substitutions.

Return type:

tuple

Example

>>> expression = '({"A.*"} > 10) & ({"B.*"} == 20)'
>>> columns = ['{"A.*"}', '{"B.*"}']
>>> values = [10, 20]
>>> column_subs = ["Aa", "Bb"]
>>> value_subs = [30, 40]
>>> result = ruleminer.RuleMiner().substitute_list(expression,
...     columns, values, column_subs, value_subs)
>>> print(result)
('({"Aa"} > 10) & ({"B.*"} == 20)', ['{"B.*"}'], [10, 20], ['Bb'], [30, 40])

update(templates: list = None, rules: DataFrame = None, data: DataFrame = None, params: dict = None)

ruleminer.ruleminer.flatten(expression)

Recursively flatten a nested expression and return it as a string.

This function takes an expression, which can be a nested list of strings or a single string, and recursively flattens it into a single string enclosed in parentheses.

Parameters: expression (str or list) – The expression to be flattened.
Returns: The flattened expression as a string enclosed in parentheses.
Return type: str

Example

>>> expression = ["A", ["B", ["C", "D"]]]
>>> result = ruleminer.flatten(expression)
>>> print(result)
"(A(B(CD)))"

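The recursion behind flattening is short enough to show in full. The sketch below reproduces the documented example for nested lists of strings; it is an independent illustration rather than the module's source, and flatten_sketch is a hypothetical name.

```python
def flatten_sketch(expression):
    """Join a nested list of strings into one string, one pair of parentheses per list."""
    if isinstance(expression, str):
        return expression
    return "(" + "".join(flatten_sketch(item) for item in expression) + ")"


print(flatten_sketch(["A", ["B", ["C", "D"]]]))
```
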
ruleminer.ruleminer.flatten_and_sort(expression: str = '')

Recursively flatten and sort a nested expression and return it as a string.

This function takes an expression, which can be a nested list of strings or a single string, and recursively flattens and sorts it into a single string enclosed in parentheses. Sorting is applied to certain elements within the expression, such as mathematical operations, column references, and strings, based on their relationships and order of precedence.

Parameters: expression (str or list) – The expression to be flattened and sorted.
Returns: The flattened and sorted expression as a string enclosed in parentheses.
Return type: str

Example

>>> expression = ["max", ["C", "A"]]
>>> result = ruleminer.flatten_and_sort(expression)
>>> print(result)
"(max((CA)))"

>>> expression = ["C", "==", ["A", "+", "B"]]
>>> result = ruleminer.flatten_and_sort(expression)
>>> print(result)
"((A+B)==C)"

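One way to see why sorting is useful: if the operands of commutative operators are put in a fixed order, equivalent expressions flatten to the same string and can be recognised as duplicates. The sketch below implements only that idea. The set of operators treated as commutative is an assumption, function calls such as max are not given any special treatment, and flatten_and_sort_sketch is a hypothetical name.

```python
# Operators whose operands may be swapped without changing the meaning (assumed set).
COMMUTATIVE = {"==", "!=", "+", "*", "&", "|"}


def flatten_and_sort_sketch(expression):
    """Flatten a nested expression, ordering the operands of commutative operators."""
    if isinstance(expression, str):
        return expression
    parts = [flatten_and_sort_sketch(item) for item in expression]
    if len(parts) == 3 and parts[1] in COMMUTATIVE:
        # Sort the flattened operands so that 'C == A+B' and 'A+B == C' agree.
        left, right = sorted([parts[0], parts[2]])
        parts = [left, parts[1], right]
    return "(" + "".join(parts) + ")"


print(flatten_and_sort_sketch(["C", "==", ["A", "+", "B"]]))
print(flatten_and_sort_sketch([["B", "+", "A"], "==", "C"]))
```
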
ruleminer.ruleminer.is_column(s)

Check if a given string is formatted as a column reference.

This function checks if a string is formatted as a column reference, which typically consists of curly braces enclosing a double-quoted column name, e.g. {“A”}.

Parameters: s (str) – The string to be checked.
Returns: True if the string is formatted as a column reference, False otherwise.
Return type: bool

Example

>>> is_column('{"A"}')
True

>>> is_column('{"B"}')
True

>>> is_column("Not a column reference")
False

ruleminer.ruleminer.is_string(s)

Check if a given string is enclosed in single or double quotes.

This function checks if a string is enclosed in single (‘’) or double (“”) quotes, indicating that it is a string literal.

Parameters: s (str) – The string to be checked.
Returns: True if the string is enclosed in quotes, False otherwise.
Return type: bool

Example

>>> is_string("'Hello, World!'")
True

>>> is_string('"42"')
True

>>> is_string("Not a string")
False

ruleminer.utils module

Main module.

ruleminer.utils.fit_dataframe_to_ensemble(df: DataFrame = None, random_state: int = 0, max_depth: int = 1, n_estimators: int = 10, min_samples_split: int = 2, min_samples_leaf: int = 1, min_weight_fraction_leaf: float = 0.0): Fit and extract from an ensemble.

ruleminer.utils.fit_ensemble_and_extract_expressions(df: DataFrame = None, target: str = None, estimator: ABCMeta = None, base: ABCMeta = None, random_state: int = 0, max_depth: int = 2, n_estimators: int = 10, min_samples_split: int = 2, min_samples_leaf: int = 1, min_weight_fraction_leaf: float = 0.0, sample_weight: list = None)

ruleminer.utils.generate_substitutions(df: DataFrame = None, column_value: tuple = None, value_regex: str = None): Util function to retrieve values of dataframe satisfying a regex.

ruleminer.utils.tree_to_expressions(tree, features, target): Util function to derive rules from a decision tree (classifier or regressor).

Module contents

Top-level package for ruleminer.