Rule grammar
The rule grammar explains the structure and components of rules used in ruleminer. It covers how to write conditions using logical and comparison operators, how mathematical expressions are formed, and the types of elements (like numbers, strings, and dataset columns) that can be used within those expressions.
General rule format
Rules follow this format:
-
if condition_1 then condition_2 -
or simply a
condition
Conditions
A condition is a wel-formed logical expression of comparisions with &, | and ~ and parentheses, for example cond_1 & (cond_2 | cond_3).
Comparisons
A single comparision consists of two mathematical expressions separated by a comparison operator, for example math_expr_1 > math_expr_2. The following comparison operators are availabe:
-
Standard:
>=,>,<=,<,!=,== -
List- and set-based:
inandnot in -
Range:
betweenandnot between, for example({"A"} between [2, 30]). -
Pattern:
matchandnot match, for exameple({"C0450"} match "^C\d+"). -
Substring:
containsandnot contains, for example({"C0450"} contains "type").
Mathematical expressions
These include numbers, functions, columns, lists, and use +, -, *, /, and **. They can be nested.
-
Functions: name (any case) followed by parameters in parentheses, e.g.
func(param1, param2) -
Parameter: can be a
list,condition, or another expression
The base elements are:
-
Columns: quoted string in braces, e.g.
{"Type"} -
Numbers: examples +3, -4.1, 2.1e-8 and 0.9e10
-
Quoted strings: text in double quotes, including letter, numbers, symbol: a-z A-Z 0-9 _ . , ; ; < > * = + - / \ ? | @ # $ % ^ & ( )
General mathematical functions
-
min(expr_1, expr_2, ... )returns the minimum of the parameters. -
max(expr_1, expr_2, ... )returns the maximum of the parameters. -
abs(expression)returns the absolute value of the expression. -
sum([expr_1, expr_2, ... ])returns the sum of the list of expressions.
Rounding functions
-
exact(expression)returns the expression without applying interval mathematics. -
round(expression, p)return the rounded values of the rows in the expression given precisionp -
floor(expression, p)return the truncated values of the rows in the expression given precisionp -
ceil(expression, p)return the values rounded up of the rows in the expression given precisionp
Statistical functions
-
quantile(expression, p)returns the quantilepof the expression. -
mean(expression)returns the mean of the expression. -
std(expression)returns the standard deviation of the expressions. -
corr("matrix", expr_1, expr_2, ... )returns the sum of correlations given coefficient matrixmatrixand the list of expressions.
String functions
-
substr(s, pos, len)returns the substring ofsfrom positionpos(starting at 1) with lengthlen -
split(s, sep, n)returns then-th element (starting at 1) of the list of elements of stringsseparated bysep
Conditional functions
-
sumif([a, b, ... ], cond)returns the sum of list [a, b, ...] given that conditioncondis satisfied, for example(sumif([{"Assets"}, {"Own_funds"}], {"Type"}=="life_insurer") > 0) -
sumif([a, b, ... ], [cond_a, cond_b, ... ])returns the sum of list [a, b, ...] given that the corresponding condition is satisfied. The length of the lists should be equal. -
countif([a, b, ... ], cond)returns the count of list [a, b, ...] given that conditioncondis satisfied -
countif([a, b, ... ], [cond_a, cond_b, ... ])returns the count of list [a, b, ...] given that the corresponding condition is satisfied. The length of the lists should be equal.
Functions for external data
table("table name", ["a", "b"]), returns a list of tuples from an external table. You can use it to check if a row or a set of column values exists in that table. Example:[{"A"}, {"B"}] in table("external_data", ["a", "b"])checks whether the values in columns A and B match any row in the external table.
Date and time functions
-
days(expression)returns the number of days of the expression, e.g.days({"C"}- {"D"})return the numbers of days between (date) columns{"C"}and{"D"} -
months(expression)return the number of months -
years(expression)returns the number of years
Full grammar in pyparsing
_lpar = pyparsing.Literal("(")
_rpar = pyparsing.Literal(")")
_lbra = pyparsing.Literal("[")
_rbra = pyparsing.Literal("]")
_sep = pyparsing.Literal(",")
_quote = pyparsing.Literal('"')
_number = pyparsing.Regex(r"[+-]?\d+(?:\.\d*)?(?:[eE][+-]?\d+)?")
_timedate_functions = pyparsing.one_of(
"day_name \
month_name \
days_in_month \
daysinmonth \
is_leap_year \
is_year_end \
dayofweek \
weekofyear \
weekday \
week \
is_month_end \
is_month_start \
is_year_start \
is_quarter_end \
is_quarter_start \
day \
month \
quarter \
year",
caseless=True,
)
_timedelta_functions = pyparsing.one_of("days months years", caseless=True)
_function = (
pyparsing.one_of(
"min \
max \
abs \
quantile \
sum \
substr \
split \
count \
sumif \
countif \
mean \
std \
exact \
corr \
round \
floor \
ceil \
table",
caseless=True,
)
| _timedelta_functions
| _timedate_functions
)
_for = pyparsing.one_of("for", caseless=True)
_in = pyparsing.one_of("in", caseless=True)
_empty = pyparsing.one_of(["None", '""', "pd.NA", "np.nan"])
_list_comprehension_var = pyparsing.Word(pyparsing.alphas)
_string = (
pyparsing.srange(r"[a-zA-Z0-9_.,:;<>*=+-/?|@#$%^&\[\]{}\(\)\\']")
+ " "
+ "\x01"
+ "\x02"
+ "\x03"
)
_quoted_string = pyparsing.Combine(_quote + pyparsing.Word(_string) + _quote)
_column = pyparsing.Combine("{" + _quote + pyparsing.Word(_string) + _quote + "}")
_addop = pyparsing.one_of("+ -")
_multop = pyparsing.one_of("* /")
_expop = pyparsing.Literal("**")
_compa_op = (
pyparsing.one_of(">= > <= < != == in between match contains", caseless=True)
| pyparsing.CaselessLiteral("not in")
| pyparsing.CaselessLiteral("not between")
| pyparsing.CaselessLiteral("not match")
| pyparsing.CaselessLiteral("not contains")
)
base_element = (
_quoted_string
| _column
| _number
| _empty
| _list_comprehension_var
)
math_expression = pyparsing.Forward()
condition_expression = pyparsing.infixNotation(
(math_expression + _compa_op + math_expression),
[
(
pyparsing.one_of(["not", "~"], caseless=True),
1,
pyparsing.opAssoc.RIGHT,
),
(
pyparsing.one_of(["and", "&"], caseless=True),
2,
pyparsing.opAssoc.LEFT,
),
(
pyparsing.one_of(["or", "|"], caseless=True),
2,
pyparsing.opAssoc.LEFT,
),
],
lpar=pyparsing.Literal("("),
rpar=pyparsing.Literal(")"),
)
param = pyparsing.Forward()
list_expression = pyparsing.Group(_lbra + param + (_sep + param)[...] + _rbra)
list_comprehension_expression = pyparsing.Group(
_lbra
+ pyparsing.Group(param)
+ _for
+ _list_comprehension_var
+ _in
+ list_expression
+ _rbra
)
param <<= (
list_comprehension_expression
| list_expression
| condition_expression
| math_expression
)
params = param + (_sep + param)[...]
function = pyparsing.Group(_function + pyparsing.Group(_lpar + params + _rpar))
atom = (
function
| list_expression
| base_element
| pyparsing.Group(_lpar + math_expression + _rpar)
)
factor = pyparsing.Group(atom + pyparsing.OneOrMore(_expop + atom)) | atom
term = pyparsing.Group(factor + pyparsing.OneOrMore(_multop + factor)) | factor
math_expression <<= term + pyparsing.ZeroOrMore(_addop + term)
_if = pyparsing.one_of("if", caseless=True)
_then = pyparsing.one_of("then", caseless=True)
rule_expression = (
_if + condition_expression + _then + condition_expression
| "if () then " + condition_expression
| "IF () THEN " + condition_expression
| condition_expression
)