ruleminer package#
Submodules#
ruleminer.const module#
Constants module.
ruleminer.encodings module#
Encoding definitions
ruleminer.metrics module#
Metrics module.
ruleminer.parser module#
Parser module.
- ruleminer.parser.dataframe_index(expression: str = '', required: list = []) → Dict[str, str] [source]#
Parse a rule expression and generate corresponding DataFrame index expressions.
This function takes a rule expression, such as ‘if A then B’, and a list of required variables (‘N’, ‘X’, ‘Y’, ‘~X’, ‘~Y’, ‘X and Y’, ‘X and ~Y’, ‘~X and ~Y’). It then generates corresponding DataFrame index expressions for each required variable based on the given expression.
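- Parameters:
expression (str) – The rule expression to parse, for example ‘if A then B’.
required (list) – The required variables for which index expressions are generated.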
- Returns:
A dictionary where keys are required variable names, and values are corresponding DataFrame index expressions.
- Return type:
Dict[str, str]
Example
>>> expression = 'if ({"A"} > 0) then ({"B"} < 10)'
>>> required = ['X', 'Y', 'X and Y']
>>> result = ruleminer.dataframe_index(expression, required)
>>> print(result)
{
    'X': '__df__.index[((__df__["A"] > 0))]',
    'Y': '__df__.index[((__df__["B"] < 10))]',
    'X and Y': '__df__.index[((__df__["A"] > 0)) & ((__df__["B"] < 10))]'
}
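To make the shape of these strings concrete, here is a minimal sketch that assembles index expressions from an antecedent and a consequent that have already been written against `__df__`. The helper `build_index_expressions` is hypothetical and not part of ruleminer, the use of `~` for negation is an assumption, and the output format is only modelled on the example above.

```python
def build_index_expressions(if_part: str, then_part: str, required: list) -> dict:
    """Hypothetical helper: build pandas index expressions as strings.

    if_part and then_part are conditions already written against __df__,
    for example '(__df__["A"] > 0)'.
    """
    x, y = f"({if_part})", f"({then_part})"
    # Assumed mapping from required names to boolean-mask expressions.
    masks = {
        "X": x,
        "~X": f"~{x}",
        "Y": y,
        "~Y": f"~{y}",
        "X and Y": f"{x} & {y}",
        "X and ~Y": f"{x} & ~{y}",
        "~X and ~Y": f"~{x} & ~{y}",
    }
    result = {}
    for name in required:
        if name == "N":
            result[name] = "__df__.index"  # every row of the DataFrame
        elif name in masks:
            result[name] = f"__df__.index[{masks[name]}]"
    return result


expressions = build_index_expressions(
    '(__df__["A"] > 0)', '(__df__["B"] < 10)', ["X", "Y", "X and Y"]
)
for name, code in expressions.items():
    print(name, "->", code)
```

The strings are only built here, not evaluated; evaluating them would require a DataFrame bound to the name `__df__`.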
- ruleminer.parser.dataframe_lengths(expression: str = '', required: list = []) → Dict[str, str] [source]#
Calculate lengths based on a rule expression and generate corresponding values.
This function takes a rule expression, such as ‘if A then B’, and a list of required variables (‘N’, ‘X’, ‘Y’, ‘~X’, ‘~Y’, ‘X and Y’, ‘X and ~Y’, ‘~X and ~Y’). It calculates lengths or sums of data based on the given conditional expression for each required variable.
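- Parameters:
expression (str) – The rule expression to parse, for example ‘if A then B’.
required (list) – The required variables for which length or sum calculations are generated.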
- Returns:
A dictionary where keys are required variable names, and values are corresponding length or sum calculations.
- Return type:
Dict[str, str]
Example
>>> expression = 'if ({"A"} > 0) then ({"B"} < 10)'
>>> required = ['X', 'Y', 'X and Y']
>>> result = ruleminer.dataframe_lengths(expression, required)
>>> print(result)
{
    'X': '((__df__["A"] > 0)).sum()',
    'Y': '((__df__["B"] < 10)).sum()',
    'X and Y': '(((__df__["A"] > 0)) & ((__df__["B"] < 10))).sum()'
}
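The generated strings end in `.sum()`, which on a boolean mask counts the rows that satisfy the condition. The sketch below reproduces that counting in plain Python so that it runs without pandas. The sample rows and the helper name `count_required` are made up for illustration and are not part of ruleminer.

```python
def count_required(rows, antecedent, consequent):
    """Hypothetical helper: count rows per required variable.

    antecedent and consequent are predicates that take one row (a dict).
    """
    x = [antecedent(row) for row in rows]
    y = [consequent(row) for row in rows]
    return {
        "N": len(rows),
        "X": sum(x),  # True counts as 1, False as 0
        "Y": sum(y),
        "X and Y": sum(a and b for a, b in zip(x, y)),
        "X and ~Y": sum(a and not b for a, b in zip(x, y)),
    }


# Made-up sample data for the rule: if A > 0 then B < 10
rows = [
    {"A": 1, "B": 5},
    {"A": 2, "B": 12},
    {"A": 0, "B": 3},
    {"A": 4, "B": 9},
]
counts = count_required(rows, lambda r: r["A"] > 0, lambda r: r["B"] < 10)
print(counts)
```

With these rows, three rows satisfy the antecedent, two of them also satisfy the consequent, and one is an exception.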
- ruleminer.parser.dataframe_values(expression: str = '')[source]#
Extract values from a Pandas DataFrame based on an expression.
This function constructs a Pandas DataFrame expression to extract values based on the provided expression. The expression is wrapped in square brackets to retrieve the values from the DataFrame.
- Parameters:
expression (str) – An expression used to filter or access DataFrame values, e.g. "{Column_A > 0}".
- Returns:
The Pandas DataFrame expression to retrieve values based on the provided expression.
- Return type:
Example
>>> expression = '{"A"} > 0'
>>> result = ruleminer.dataframe_values(expression)
>>> print(result)
"__df__[(__df__["A"] > 0)]"
- ruleminer.parser.function_expression()[source]#
Define a ruleminer function expression
This function defines a function expression. It uses pyparsing to define the syntax for function calls with parameters, including basic mathematical operations and comparisons.
- Returns:
a function expression
- Return type:
pyparsing.core.Forward
Example
>>> expression = 'substr({"A"}, 1, 1)'
>>> result = ruleminer.function_expression().parse_string(expression)
>>> print(result)
['substr', ['{"A"}', ',', '1', ',', '1']]
- ruleminer.parser.math_expression(base: Forward = None)[source]#
Define a ruleminer mathematical expression
This function defines a mathematical expression. It uses pyparsing to define the syntax for function calls with parameters, including basic mathematical operations and comparisons.
- Parameters:
None
- Returns:
a mathematical expression
- Return type:
pyparsing.core.Forward
Example
>>> expression = '{"A"} + {"B"}'
>>> result = ruleminer.math_expression().parse_string(expression)
>>> print(result)
[['{"A"}', '+', '{"B"}']]
- ruleminer.parser.pandas_column(expression: str = '')[source]#
Replace column names with Pandas DataFrame expressions.
This function takes a string containing column names enclosed in curly braces, e.g., {“A”}, and converts them to Pandas DataFrame syntax, e.g., __df__[“A”].
- Parameters:
expression (str) – A string with column names enclosed in curly braces, e.g. {"A"}.
- Returns:
The Pandas DataFrame expression with columns in the format __df__[“A”].
- Return type:
Example
>>> expression = '{"A"}'
>>> result = ruleminer.pandas_column(expression)
>>> print(result)
__df__["A"]
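The conversion described here can be illustrated with a regular expression. This is a minimal sketch under the assumption that every column reference has the form `{"name"}`; `to_pandas_column` is a hypothetical stand-in and may differ from how ruleminer performs the replacement.

```python
import re


def to_pandas_column(expression: str) -> str:
    """Hypothetical stand-in: rewrite {"name"} as __df__["name"]."""
    # Match an opening brace, a double-quoted name, and a closing brace,
    # and keep the quoted name (group 1) inside square brackets.
    return re.sub(r'\{("[^"]*")\}', r'__df__[\1]', expression)


converted = to_pandas_column('{"A"} > 0')
print(converted)
```

Anything outside a `{"..."}` reference, such as the comparison `> 0`, is left untouched.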
- ruleminer.parser.rule_expression()[source]#
Define a ruleminer rule expression
This function defines a ruleminer rule expression. It uses pyparsing to define the syntax for conditions and rules.
- Parameters:
None
- Returns:
a ruleminer rule expression
- Return type:
pyparsing.core.Forward
Example
>>> expression = 'if ({"A"} > 0) then ({"B"} < 0)'
>>> result = ruleminer.rule_expression().parse_string(expression)
>>> print(result)
['if', ['{"A"}', '>', '0'], 'then', ['{"B"}', '<', '0']]
ruleminer.ruleminer module#
Main module.
- class ruleminer.ruleminer.RuleMiner(templates: list = None, rules: DataFrame = None, data: DataFrame = None, params: dict = None)[source]#
Bases:
object
The RuleMiner object contains rules and data.
It uses three basic functions:
- update (rule expressions, rules or data)
- generate (rules from rule expressions and data)
- evaluate (results from rules)
- add_results(rule_idx, rule_metrics, co_indices, ex_indices) → None [source]#
Add results for a rule to the results list.
This method adds results for a specific rule, including its metrics and indices, to the results list. It updates both the results for confirmations (co_indices) and exceptions (ex_indices). If no indices are provided for either category, an error message is logged.
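- Parameters:
rule_idx – The index of the rule to which the results belong.
rule_metrics – The metrics of the rule.
co_indices – The indices of the confirmations of the rule.
ex_indices – The indices of the exceptions to the rule.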
- Returns:
This method updates the results list in-place.
- Return type:
None
Example
>>> rule_index = 0
>>> metrics = {
...     'absolute_support': 50,
...     'absolute_exceptions': 5,
...     'confidence': 0.9
... }
>>> co_indices = [1, 2, 3, 4, 5]
>>> ex_indices = [10, 11, 12, 13]
>>> add_results(rule_index, metrics, co_indices, ex_indices)
- add_rule(rule_id: str = '', rule_group: int = 0, rule_def: str = '', rule_status: str = '', rule_metrics: dict = {}, encodings: dict = {})[source]#
Add a rule with information to the discovered rule list.
This method adds a new rule to the discovered rule list. The rule is defined by a unique rule ID, a rule group, a rule definition (expression or description), a rule status, a dictionary of rule- specific metrics, and a dictionary of encodings used in the rule evaluation.
- Parameters:
rule_id (str) – A unique identifier for the rule.
rule_group (int) – An integer representing the group or category to which the rule belongs.
rule_def (str) – The definition or expression of the rule.
rule_status (str) – The status of the rule.
rule_metrics (dict) – A dictionary of rule-specific metrics and their values.
encodings (dict) – A dictionary of variable encodings used in the rule.
Example
>>> my_rule = {
...     'rule_id': 'R001',
...     'rule_group': 1,
...     'rule_def': 'column_A > 10',
...     'rule_status': 'active',
...     'rule_metrics': {'coverage': 0.9, 'accuracy': 0.85},
...     'encodings': {}
... }
>>> add_rule(**my_rule)
- apply_filter(metrics: dict = {})[source]#
Apply the filter to the rule metrics (for example, confidence > 0.75).
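The filter format is not documented in this section, so the following sketch assumes a filter given as a list of `(metric, operator, threshold)` triples. The names `OPERATORS` and `passes_filter` are hypothetical; the sketch only illustrates the idea of keeping a rule when its metrics satisfy a condition such as confidence > 0.75.

```python
import operator

# Assumed set of comparison operators; the real filter syntax may differ.
OPERATORS = {
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
    "==": operator.eq,
}


def passes_filter(metrics: dict, conditions: list) -> bool:
    """Hypothetical helper: True if every (metric, op, threshold) holds."""
    return all(
        OPERATORS[op](metrics[name], threshold)
        for name, op, threshold in conditions
    )


conditions = [("confidence", ">", 0.75)]
print(passes_filter({"confidence": 0.9}, conditions))  # True
print(passes_filter({"confidence": 0.5}, conditions))  # False
```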
- convert_template(template: dict = {})[source]#
Main function to convert templates to rules without data and regexes
- evaluate_code(expressions: dict = {}, dataframe: DataFrame = None, encodings: dict = {})[source]#
Evaluate a set of expressions and return their results.
This method evaluates a dictionary of expressions using the provided data frame, encodings, and additional variables from the evaluation context. The results of the expressions are stored in a dictionary and returned.
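- Parameters:
expressions (dict) – The expressions to evaluate.
dataframe (DataFrame) – The data frame against which the expressions are evaluated.
encodings (dict) – The encodings made available during evaluation.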
- Returns:
A dictionary containing the results of evaluated expressions.
- Return type:
dict
- generate_rules(template: dict)[source]#
Generate rules from data using a rule template.
This method generates rules based on a provided rule template. It uses the template expression to create a set of rules by substituting variable values and applying conditions. The resulting rules are evaluated and filtered based on specified metrics.
- Parameters:
template (dict) – A dictionary containing the rule template with the following keys:
“group” (int): The group identifier for the rules.
“encodings” (dict): A dictionary of encodings for the rules.
“expression” (str): The rule template expression.
- Returns:
None
Example
>>> rule_template = {
...     "expression": 'if ({"A.*"} > 10) then ({"B.*"} == "X")',
...     "group": 1,
...     "encodings": {},
... }
>>> generator.generate_rules(rule_template)
Note
The method first parses the provided rule expression into ‘if’ and ‘then’ parts.
It generates rule candidates by substituting variables and applying conditions.
The candidates are evaluated, and the resulting rules are filtered using metrics.
The rules are added to the discovered rule list.
Temporary index name columns are added to the data to derive rules based on index names.
If the template expression is not in ‘if-then’ format, it is converted into such a format.
Substitutions are made for variable values, and rules are generated and evaluated.
Temporary index columns are removed from the data after rule generation.
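The substitution step in the note above can be sketched as follows. This is an illustration only: it assumes each `{"..."}` in the template is a regular expression that must match a column name completely, and the helper `expand_template` is hypothetical. Evaluation against data and filtering on metrics are left out.

```python
import itertools
import re


def expand_template(template: str, column_names: list) -> list:
    """Hypothetical helper: substitute every {"regex"} with matching columns."""
    patterns = re.findall(r'\{"([^"]*)"\}', template)
    # For each pattern, collect the column names that match it completely.
    matches = [
        [name for name in column_names if re.fullmatch(pattern, name)]
        for pattern in patterns
    ]
    candidates = []
    for combination in itertools.product(*matches):
        names = iter(combination)
        # Replace the patterns from left to right with the chosen names.
        candidates.append(
            re.sub(r'\{"[^"]*"\}', lambda m: '{"%s"}' % next(names), template)
        )
    return candidates


rules = expand_template(
    'if ({"A.*"} > 10) then ({"B.*"} == "X")', ["A1", "A2", "B1"]
)
for rule in rules:
    print(rule)
```

Each printed line is one rule candidate; with the column names above there are two, one for `A1` and one for `A2`.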
- reformulate(expression: str = '', apply_tolerance: bool = False, positive_tolerance: bool = True)[source]#
Convert parameters, settings, and functions to Pandas code within an expression.
This method takes an input expression and converts specific parameters, settings, and functions into their equivalent Pandas code. It allows for custom transformations and conversions that are used in the evaluation of rules.
- Parameters:
- Returns:
The reformulated expression in Pandas code.
- Return type:
Example
>>> expression = ['substr', ['{"A"}', ',', '1', ',', '1']]
>>> result = ruleminer.RuleMiner().reformulate(expression)
>>> print(result)
"({"A"}.str.slice(1,1))"
- search_column_value(expr, column_value) → list [source]#
Search for column-value pairs in an expression.
This method recursively searches for column-value pairs within an expression and appends them to the provided list. It identifies column-value pairs by checking the structure of the expression.
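- Parameters:
expr – The expression to search.
column_value (list) – The list to which discovered column-value pairs are appended.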
- Returns:
A list containing the discovered column-value pairs as tuples.
- Return type:
list
Example
>>> expression = ['{"A"}', '==', '"b"']
>>> column_value_pairs = ruleminer.RuleMiner().search_column_value(expression, [])
>>> print(column_value_pairs)
[('{"A"}', '"b"')]
Note
The method examines the structure of the expression and identifies column-value pairs by checking for specific patterns. It recursively traverses the expression to find such pairs and appends them to the provided list.
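A recursive traversal of this kind might look like the sketch below. The exact pattern that ruleminer checks for is not spelled out here, so the sketch assumes a pair is a three-element list of the form `[column, "==", quoted string]`; `find_column_value_pairs` is a hypothetical name.

```python
def find_column_value_pairs(expr, pairs):
    """Hypothetical helper: collect (column, value) pairs from a parsed expression."""
    if isinstance(expr, str):
        return pairs
    # Assumed pattern: [column reference, "==", quoted string].
    if (
        len(expr) == 3
        and all(isinstance(item, str) for item in expr)
        and expr[0].startswith('{"') and expr[0].endswith('"}')
        and expr[1] == "=="
        and expr[2].startswith('"') and expr[2].endswith('"')
    ):
        pairs.append((expr[0], expr[2]))
        return pairs
    # Otherwise descend into every nested element.
    for item in expr:
        find_column_value_pairs(item, pairs)
    return pairs


parsed = ["if", ['{"A"}', "==", '"b"'], "then", ['{"B"}', ">", "0"]]
found = find_column_value_pairs(parsed, [])
print(found)
```

Only the comparison with a quoted string is reported; the numeric comparison on `{"B"}` does not fit the assumed pattern and is skipped.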
- split_rule(expression: str = '') → tuple [source]#
Split a rule expression into its ‘if’ and ‘then’ parts.
This method takes a rule expression and splits it into its ‘if’ and ‘then’ components. It uses regular expressions to identify these parts, and if the ‘if’ part is empty, it is assumed to be the entire rule expression. The resulting ‘if’ and ‘then’ parts are parsed and returned as lists.
- Parameters:
expression (str) – The rule expression to be split.
- Returns:
- A tuple containing the following elements:
list: The parsed rule expression as a list.
list: The ‘if’ part of the rule as a parsed list (empty if not present).
list: The ‘then’ part of the rule as a parsed list.
- Return type:
tuple
Example
>>> rule_expression = 'if ({"A"} > 10) then ({"B"} == "C")'
>>> parsed, if_part, then_part = split_rule(rule_expression)
>>> print(parsed)
['if', ['{"A"}', '>', '10'], 'then', ['{"B"}', '==', '"C"']]
>>> print(if_part)
[['{"A"}', '>', '10']]
>>> print(then_part)
[['{"B"}', '==', '"C"']]
Note
The method employs regular expressions to identify ‘if’ and ‘then’ parts, and if the ‘if’ part is not present, the entire expression is considered the ‘then’ part. The parsed results are returned as lists for further evaluation.
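The regular-expression step can be sketched as below. Unlike the method documented here, this hypothetical helper `split_if_then` returns the two parts as unparsed strings, and the pattern it uses is an assumption rather than the one in ruleminer.

```python
import re


def split_if_then(expression: str) -> tuple:
    """Hypothetical helper: return the raw 'if' and 'then' parts as strings."""
    match = re.fullmatch(
        r"\s*if\s+(.*?)\s+then\s+(.*?)\s*", expression, re.DOTALL
    )
    if match is None:
        # No 'if ... then ...' structure: everything is the 'then' part.
        return "", expression.strip()
    return match.group(1), match.group(2)


print(split_if_then('if ({"A"} > 10) then ({"B"} == "C")'))
print(split_if_then('{"A"} > 0'))
```

The second call shows the fallback: an expression without an ‘if’ part comes back with an empty first element.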
- substitute_group_names(expr: str = None, group_names_list: list = [])[source]#
Substitute group names in an expression.
This method substitutes placeholders in an expression with their corresponding group names. Group names are provided as a list, and placeholders in the expression are represented as ‘’, ‘’, and so on. The method replaces these placeholders with the group names from the list.
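- Parameters:
expr (str) – The expression containing placeholders.
group_names_list (list) – The group names to substitute for the placeholders.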
- Returns:
The expression with placeholders replaced by group names.
- Return type:
Example
>>> expression = "Column '' contains values from group ''"
>>> group_names = ['Group A', 'Numbers']
>>> result = ruleminer.RuleMiner().substitute_group_names(expression, group_names)
>>> print(result)
"Column 'Group A' contains values from group 'Numbers'"
Note
The method can be applied to both strings and lists of expressions. It searches for placeholders in the format ‘’, ‘’, and so on, and substitutes them with the corresponding group names from the list.
- substitute_list(expression: str = '', columns: list = [], values: list = [], column_substitutions: list = [], value_substitutions: list = [])[source]#
Substitute columns and values in an expression with their substitutions.
This method allows for the substitution of columns and values within an expression using the provided lists of column and value substitutions. It recursively processes the expression, replacing the first occurrence of a column or value with its substitution.
- Parameters:
expression (str or list) – The input expression to be processed.
columns (list) – A list of original columns to be substituted.
values (list) – A list of original values to be substituted.
column_substitutions (list) – A list of column substitutions.
value_substitutions (list) – A list of value substitutions.
- Returns:
- A tuple containing the following elements:
str or list: The processed expression with substitutions.
list: The remaining columns for substitution.
list: The remaining values for substitution.
list: The remaining column substitutions.
list: The remaining value substitutions.
- Return type:
tuple
Example
>>> expression = '({"A.*"} > 10) & ({"B.*"} == 20)'
>>> columns = ['{"A.*"}', '{"B.*"}']
>>> values = [10, 20]
>>> column_subs = ["Aa", "Bb"]
>>> value_subs = [30, 40]
>>> result = ruleminer.RuleMiner().substitute_list(expression, columns, values, column_subs, value_subs)
>>> print(result)
('({"Aa"} > 10) & ({"B.*"} == 20)', ['{"B.*"}'], [10, 20], ['Bb'], [30, 40])
- ruleminer.ruleminer.flatten(expression)[source]#
Recursively flatten a nested expression and return it as a string.
This function takes an expression, which can be a nested list of strings or a single string, and recursively flattens it into a single string enclosed in parentheses.
- Parameters:
expression (str or list) – The expression to be flattened.
- Returns:
The flattened expression as a string enclosed in parentheses.
- Return type:
str
Example
>>> expression = ["A", ["B", ["C", "D"]]]
>>> result = ruleminer.flatten(expression)
>>> print(result)
"(A(B(CD)))"
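The recursion can be written in a few lines. The sketch below is a re-implementation modelled on the example above, not the library's code; `flatten_expression` is a hypothetical name.

```python
def flatten_expression(expression) -> str:
    """Hypothetical re-implementation modelled on the documented example."""
    if isinstance(expression, str):
        # A single string is returned unchanged.
        return expression
    # Flatten every child, join the results, and wrap the group in parentheses.
    return "(" + "".join(flatten_expression(item) for item in expression) + ")"


flat = flatten_expression(["A", ["B", ["C", "D"]]])
print(flat)
```

Each nested list contributes one pair of parentheses, which is why the three levels of nesting in the input yield three pairs in the output.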
- ruleminer.ruleminer.flatten_and_sort(expression: str = '')[source]#
Recursively flatten and sort a nested expression and return it as a string.
This function takes an expression, which can be a nested list of strings or a single string, and recursively flattens and sorts it into a single string enclosed in parentheses. Sorting is applied to certain elements within the expression, such as mathematical operations, column references, and strings, based on their relationships and order of precedence.
- Parameters:
expression (str or list) – The expression to be flattened and sorted.
- Returns:
The flattened and sorted expression as a string enclosed in parentheses.
- Return type:
str
Example
>>> expression = ["max", ["C", "A"]]
>>> result = ruleminer.flatten_and_sort(expression)
>>> print(result)
"(max((CA)))"
>>> expression = ["C", "==", ["A", "+", "B"]]
>>> result = ruleminer.flatten_and_sort(expression)
>>> print(result)
"((A+B)==C)"
- ruleminer.ruleminer.is_column(s)[source]#
Check if a given string is formatted as a column reference.
This function checks if a string is formatted as a column reference, which consists of curly braces and double quotes, {“”}, enclosing a column name.
- Parameters:
s (str) – The string to be checked.
- Returns:
True if the string is formatted as a column reference, False otherwise.
- Return type:
bool
Example
>>> is_column('{"A"}')
True
>>> is_column('{"B"}')
True
>>> is_column("Not a column reference")
False
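A check of this kind can be expressed with two string tests. The following sketch assumes a column reference is any string that starts with `{"` and ends with `"}`; `looks_like_column` is a hypothetical name, and the library may apply stricter or looser rules.

```python
def looks_like_column(s: str) -> bool:
    """Hypothetical check: the string has the form {"name"}."""
    # The length test prevents the prefix and suffix from overlapping.
    return len(s) >= 4 and s.startswith('{"') and s.endswith('"}')


print(looks_like_column('{"A"}'))
print(looks_like_column("Not a column reference"))
```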
- ruleminer.ruleminer.is_string(s)[source]#
Check if a given string is enclosed in single or double quotes.
This function checks if a string is enclosed in single (‘’) or double (“”) quotes, indicating that it is a string literal.
- Parameters:
s (str) – The string to be checked.
- Returns:
True if the string is enclosed in quotes, False otherwise.
- Return type:
bool
Example
>>> is_string("'Hello, World!'")
True
>>> is_string('"42"')
True
>>> is_string("Not a string")
False
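The quote check can be sketched in the same spirit. This version assumes that the opening and closing quote must be the same character; `looks_like_string_literal` is a hypothetical name and not the library function.

```python
def looks_like_string_literal(s: str) -> bool:
    """Hypothetical check: the string is wrapped in matching quotes."""
    # At least two characters, same first and last character, and that
    # character is a single or a double quote.
    return len(s) >= 2 and s[0] == s[-1] and s[0] in ("'", '"')


print(looks_like_string_literal("'Hello, World!'"))
print(looks_like_string_literal('"42"'))
print(looks_like_string_literal("Not a string"))
```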
ruleminer.utils module#
Main module.
- ruleminer.utils.fit_dataframe_to_ensemble(df: DataFrame = None, random_state: int = 0, max_depth: int = 1, n_estimators: int = 10, min_samples_split: int = 2, min_samples_leaf: int = 1, min_weight_fraction_leaf: float = 0.0)[source]#
Fit an ensemble to a DataFrame and extract expressions from it.
- ruleminer.utils.fit_ensemble_and_extract_expressions(df: DataFrame = None, target: str = None, estimator: ABCMeta = None, base: ABCMeta = None, random_state: int = 0, max_depth: int = 2, n_estimators: int = 10, min_samples_split: int = 2, min_samples_leaf: int = 1, min_weight_fraction_leaf: float = 0.0, sample_weight: list = None)[source]#
Module contents#
Top-level package for ruleminer.