History

1.0.3 (2026-6-7)

Small change to logging in parser.py
Added lxml >= 5.0 to requirements

1.0.2 (2026-3-24)

Implemented split-function with all/any instead of position to check whether all or any comma separated elements are in a list
updated to new version of pyparsing

1.0.1 (2026-2-10)

Bug fix for issue #67

1.0.0 (2025-5-23)

Bump to production version

0.3.18 (2025-5-9)

Added logging of results for strings, bools and datetimes
Updated docs
Minor bug fix for complex expressions

0.3.13 (2025-5-4)

Refactored grammar: infinite nesting, all functions caseless
Some refactoring in parser of lists and list comprehensions

0.3.12 (2025-5-2)

Added logging for comparisions without tolerance
Fix for bug parsing list comprehensions
Added power function to deal with interval arithmetic
Changed 'between'-function to a >= lower_bound and a <= upper_bound instead of internal Pandas function
Renamed interval functions by removing _
Updated documentation

0.3.9 (2025-4-25)

Minor change to 'between'-function: if interval is (a,b) then (min(a,b),max(a,b)) is used
Refactoring of comparisons logging (and fix for logging of > operators)
Added unittests for these changes
Minor buf fix for 'between'-function

0.3.6 (2025-4-15)

The right side of a 'between' comparison operator can now be a list of mathematical expressions
Added 'not match' as comparison operator.

0.3.4 (2025-4-11)

When using COUNTIF() with a single list as a parameter, all elements that evaluate to True are counted, like for example COUNTIF([K==1 FOR K in [{"A"}, {"B"}, {"C"}]]) counts the number of items that evaluate to 1.
Added 'table' function to check whether a tuple is in an external table, for example '([{"A"}, {"B"}] in TABLE("data", ["a", "b"]))' checks for each row in columns A and B whether the tuple is in the data DataFrame with columns a and b.

0.3.3 (2025-4-10)

Added 'between' and 'not between' as comparison operator to grammar
Enforced parameters of 'between' to lowercase

0.3.1 (2025-3-25)

Small fix in logging when comparing two columns

0.3.0 (2025-3-21)

Applied rounding to 8 decimals in interval numbers
Changed strictly higher-operator: (max_left >= min_right) & (min_left > max_right) (instead of max_left > min_right)
Changed strictly lower-operator: (min_left <= max_right) & (max_left < min_right) (instead of min_left < max_right)
deleted rule_status from rules and results dataframes

0.2.40 (2025-3-20)

Added unittests for logging of intermediate results
Fix in logging of ranges for == and !=
Fix for consistent use of double parentheses in parser
Changed exact function with one or two extra parameters to include lower and upper bounds
Refactored evaluator code

0.2.36 (2025-3-14)

Correction for logging and calculation of < and > comparisons
Logging: added ; as separator between comparison results
Logging: separated if and then comparisons
Grammar: an extra exact-function is allowed around nested functions

0.2.35 (2025-3-13)

Fix to accept NaT in tolerance function
Fix to not suppress parentheses in pyparsing.infixNotation (Kevin's curious case)
Fix to not apply tolerance for Pandas datetime64-types
Fix for nan values in corr-function

0.2.32 (2025-3-9)

Fix for abs-function where lower < 0 and upper > 0 (then the real lower bound is 0)
Correction for logging of <=, <, >=, > comparisons
Two bug fixes in corr-function

0.2.29 (2025-3-4)

Added <=, <, >=, >-operators that take tolerance into account
Applied sqrt in the corr-function
Added round, ceil and floor-functions
Reformatted comparison-logs: {LHS - RHS = diff} operator [LHS_min - LHS - RHS_max + RHS, LHS_max - LHS - RHS_min + RHS]

0.2.28 (2025-2-28)

Added 'contains' and 'not contains' comparison operators
Added intermediate results for unequalities
Refactoring for logging intermediate results
Added intermediate results for equalities and statistics

0.2.25 (2025-2-11)

Improved performance of evaluation (antecedent and consequent are calculated only once)
Added corr-function to calculate correlations
Refactored code
Added some preparations for logging rule evaluations
Improved logging info of evaluation of rules

0.2.22 (2025-1-28)

Fix in evaluation of unequal operators in combination with tolerances
Fix in evaluation of equal and unequal operators when writing debug info

0.2.20 (2025-1-14)

Added exact-function to grammar and parser to ignore tolerance of specific part of expression
When exponentials like x ** y are used, for x max(0, x) is substituted to prevent imaginary numbers (as x can be negative)
The tolerance dict can now contain None values:
- if params={'tolerance': {"A": None', 'default': { ... }}} then tolerance is not applied to column A and other columns are set to 'default'
- if params={'tolerance': {"default": None, "A": { ... }}} then tolerance is not applied as default except for column A
- the keys of the tolerance dict can contain regular expressions. They are matched with columns when converting the expressions
Added regex to dependency in pyproject.toml

0.2.19 (2024-12-19)

Added mean and std functions to grammar and parser
Changed parameter name 'evaluate_quantile' to 'evaluate_statistics' to cover mean and std
Changed subst parameter definitions: substr(x, a, b) is now translated to x.str.slice(a-1, a+b-1)

0.2.18 (2024-12-17)

Added multiply and divide functions when left and right side both contain columns (see usage documentation)
Changed grammar to separate plus, minus on one hand and multiplication, division on other hand into separate groups (needed for first point)
Added debugging information on lower and upper bounds when evaluating (in)equalities

0.2.17 (2024-12-10)

Fix in grammar when parsing mathematical expressions in function parameters
Fix for applying tolerances

0.2.16 (2024-12-8)

Added: 'not in' as comparison operator
Added: parameter 'apply_rules_on_indices' (default True) that allows indices to be included as columns when generating and evaluating rules (if set to False then the data DataFrame is not changed)
Output: dtypes of results DateFrame are enforced
Performance: changed the internal equal and unequal functions to improve performance for columns with string, bool and datetime64_ns
Logging: change to logging when all are not applicable
Improved parsing performance

0.2.13 (2024-12-3)

Changed tolerance function to be able to evaluate string values
Small fix when collecting results

0.2.12 (2024-12-2)

Refactoring
- Created separated CodeEvaluator class
- Created separated RuleParser class
- Streamlined parentheses process (resulting expressions have same parentheses as original)
Performance
- Deleted concats when collecting rules and results
Parameter 'results_datatype' can now be a dict that contains per column a list of results

0.2.11 (2024-11-27)

Parser of list comprehension now accepts a wider range of expressions
Fix when applying tolerances when denominator in rule expression is negative
Shortened internal function names

0.2.10 (2024-11-25)

Added dtype=float to np.sum operations to prevent divide by zero exceptions
In the results a row with not applicable is only included is there are no exceptions and no confirmations
Added days / months / years functions that return the number of days / months / years of an expression (the difference between two timedate64 columns)

0.2.9 (2024-11-18)

Fix for SUMIF when data DataFrame contains pd.NA's
Added date functions to extract day, month, quarter and year from a column
Added date functions day_name, month_name, days_in_month, daysinmonth, is_leap_year, is_year_end, dayofweek, weekofyear, weekday, week, is_month_end, is_month_start, is_year_start, is_quarter_end and is_quarter_start

0.2.8 (2024-11-15)

Added 'not applicable' as a default metric to be calculated (n - confirmations - exceptions)
Restructured the parsing of parentheses (deleted some unnecessary pyparsing.groups)
Fix for unnecessary parentheses around power function (**)
Fix for necessary parentheses around mathematical expressions
Adapted all unittests accordingly

0.2.7 (2024-11-13)

Better fix when applying tolerances with column names that contain strings
Tolerances are not applied to columns where dtype is string, bool or datetime64_ns
Logging 'finished' with rule_id per rule_id
Fix for data DataFrames that contain 'pd.libs.missing.NAType()'

0.2.5 (2024-11-08)

Fix when applying tolerances with column names that contain strings
When applying 'output_not_applicable' the results now contain the indices of the rows to which a rule does not apply
Added match logical operator to check if column that contains strings satisfies a regular expression

0.2.4 (2024-23-10)

Added parameters 'output_confirmations', 'output_exceptions', 'output_not_applicable' to specify which results to return
Fixed issue with parentheses in combination with max, min and abs functions
Added parameters 'rules_datatype' and 'results_datatype' for output in pandas or polars dataframes
Separated pandas parser code from general parser.py
Moved tolerance code to pandas parsing instead including it in rule definition

0.2.3 (2024-20-21)

Use zip for sumif and countif so list comprehensions need not to be evaluated
Suppressed brackets in parser and changed code accordingly
Some refactoring for readability

0.2.2 (2024-10-17)

Fix for sum, sumif, count and countif with list comprehensions

0.2.1 (2024-10-14)

Added list comprehension for countif
Fixed issue with evaluating first part of list comprehension
Second parameter of split starts at 1

0.2.0 (2024-10-14)

Converted docs to mkdocs and added ruff
Changed setup.py to pyproject.toml
Deleted (some) redundant parentheses in rule code
Restructured code to parse rule tree to rule code
Fixed incorrect parsing of negative numbers in rule expressions
Fixed issue when applying tolerance and decimal to comparisons with strings
Fixed issue with identifying empty strings in rule expressions

0.1.30 (2024-10-2)

Added functionality for countif and sumif
Bug fix for tolerances in combination with >=, <=, > and <
Bug fix for tolerances in formulas like A - B - C
Added tests for these bug fixes

0.1.26 (2023-4-22)

Added additional arguments estimate, base and sample_weights to fit_ensemble_and_extract_expressions function to use more than AdaBoost
Added decision tree functions to init.py

0.1.24 (2023-10-25)

Added sumif and improved tolerance functionality
Added nested conditions in functions

0.1.22 (2023-10-17)

Added tolerance functionality for !=, <, <=, > and >=
Updated docs
Changed sum to nansum
Added tolerance functionality for ==

0.1.20 (2023-8-29)

Small fixes in evaluating rules with syntax errors

0.1.19 (2023-8-27)

Small fixes rule conversion without data

0.1.18 (2023-8-8)

Dedicated function added for template to rule conversion without data
Exp sign changed from ^ to **

0.1.17 (2023-8-8)

Generate rules now runs without specified data

0.1.16 (2023-8-3)

Templates now do not necessarily have to contain a regex
Bug fix when evaluating rules that contain columns that do not exist
Templates now can start with 'if () then'

0.1.14 (2023-4-17)

Nested functions added
substr and in operators added

0.1.12 (2022-5-11)

Optimizations: metric calculations are done with boolean masks of DataFrame

0.1.4 (2022-1-26)

Evaluated columns in then part are now dependent on if part of rule
Rule with quantiles added (including evaluating intermediate results)
A number of optimization in rule generation process
Rule power factor metric added

0.1.1 (2021-11-23)

Added more documentation to the README text
Bug fixes wrt some complex expressions
Optimized rule generation process

0.1.0 (2021-11-21)

First release on PyPI.