pm4py.algo.filtering.pandas.attributes package

Submodules

pm4py.algo.filtering.pandas.attributes.attributes_filter module

This file is part of PM4Py (More Info: https://pm4py.fit.fraunhofer.de).

PM4Py is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

PM4Py is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with PM4Py. If not, see <https://www.gnu.org/licenses/>.

class pm4py.algo.filtering.pandas.attributes.attributes_filter.Parameters

Bases: enum.Enum

An enumeration.

ACTIVITY_KEY = 'pm4py:param:activity_key'

ATTRIBUTE_KEY = 'pm4py:param:attribute_key'

CASE_ID_KEY = 'pm4py:param:case_id_key'

DECREASING_FACTOR = 'decreasingFactor'

KEEP_ONCE_PER_CASE = 'keep_once_per_case'

POSITIVE = 'positive'

STREAM_FILTER_KEY1 = 'stream_filter_key1'

STREAM_FILTER_KEY2 = 'stream_filter_key2'

STREAM_FILTER_VALUE1 = 'stream_filter_value1'

STREAM_FILTER_VALUE2 = 'stream_filter_value2'

pm4py.algo.filtering.pandas.attributes.attributes_filter.apply(df: pandas.core.frame.DataFrame, values: List[str], parameters: Optional[Dict[Union[str, pm4py.algo.filtering.pandas.attributes.attributes_filter.Parameters], Any]] = None) → pandas.core.frame.DataFrame

Filter dataframe on attribute values (filter traces)

Parameters:
    df – Dataframe
    values – Values to filter on
    parameters – Possible parameters of the algorithm, including:
        Parameters.CASE_ID_KEY -> Case ID column in the dataframe
        Parameters.ATTRIBUTE_KEY -> Attribute we want to filter
        Parameters.POSITIVE -> Specifies if the filter should be applied including traces (positive=True) or excluding traces (positive=False)
Returns:	Filtered dataframe
Return type:	df
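
The trace-level behavior described above can be illustrated with a plain-pandas sketch. This is not PM4Py's implementation: the helper name filter_cases_by_values and the toy log are hypothetical, and the sketch assumes that a case matches as soon as one of its events carries one of the given values.

```python
import pandas as pd


def filter_cases_by_values(df, values, case_col="case:concept:name",
                           attr_col="concept:name", positive=True):
    """Hypothetical helper: keep (or drop) whole cases that contain a matching event."""
    # Cases with at least one event whose attribute value is in `values`
    matching_cases = df.loc[df[attr_col].isin(values), case_col].unique()
    in_matching_case = df[case_col].isin(matching_cases)
    return df[in_matching_case] if positive else df[~in_matching_case]


# Toy event log (made up for illustration)
log = pd.DataFrame({
    "case:concept:name": ["c1", "c1", "c2", "c2", "c3"],
    "concept:name": ["register", "pay", "register", "cancel", "register"],
})

kept = filter_cases_by_values(log, ["pay"])                     # all events of c1
dropped = filter_cases_by_values(log, ["pay"], positive=False)  # all events of c2 and c3
```

With positive=True every event of a matching case is returned, including events whose own value is not in the list.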

pm4py.algo.filtering.pandas.attributes.attributes_filter.apply_auto_filter(df, parameters=None)

Apply auto filter on activity values

Parameters:
    df – Dataframe
    parameters – Possible parameters of the algorithm, including:
        Parameters.ACTIVITY_KEY -> Column containing the activity
        Parameters.DECREASING_FACTOR -> Decreasing factor that should be passed to the algorithm
Returns:	Filtered dataframe
Return type:	df

Deprecated since version 2.2.11: This will be removed in 3.0.0.

pm4py.algo.filtering.pandas.attributes.attributes_filter.apply_events(df: pandas.core.frame.DataFrame, values: List[str], parameters: Optional[Dict[Union[str, pm4py.algo.filtering.pandas.attributes.attributes_filter.Parameters], Any]] = None) → pandas.core.frame.DataFrame

Filter dataframe on attribute values (filter events)

Parameters:
    df – Dataframe
    values – Values to filter on
    parameters – Possible parameters of the algorithm, including:
        Parameters.ATTRIBUTE_KEY -> Attribute we want to filter
        Parameters.POSITIVE -> Specifies if the filter should be applied including events (positive=True) or excluding events (positive=False)
Returns:	Filtered dataframe
Return type:	df
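
For contrast with the trace-level filter, an event-level filter only looks at each row on its own. The following plain-pandas sketch illustrates that reading of the description; the helper name filter_events_by_values and the toy log are hypothetical and this is not PM4Py's code.

```python
import pandas as pd


def filter_events_by_values(df, values, attr_col="concept:name", positive=True):
    """Hypothetical helper: keep (or drop) individual events by attribute value."""
    matches = df[attr_col].isin(values)
    return df[matches] if positive else df[~matches]


# Toy event log (made up for illustration)
log = pd.DataFrame({
    "case:concept:name": ["c1", "c1", "c2", "c2", "c3"],
    "concept:name": ["register", "pay", "register", "cancel", "register"],
})

only_register = filter_events_by_values(log, ["register"])
without_register = filter_events_by_values(log, ["register"], positive=False)
```

Cases are not kept whole here: a case can lose some of its events while the rest survive.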

pm4py.algo.filtering.pandas.attributes.attributes_filter.apply_numeric(df: pandas.core.frame.DataFrame, int1: float, int2: float, parameters: Optional[Dict[Union[str, pm4py.algo.filtering.pandas.attributes.attributes_filter.Parameters], Any]] = None) → pandas.core.frame.DataFrame

Filter dataframe on a numeric attribute interval (filter cases)

Parameters:
    df – Dataframe
    int1 – Lower bound of the interval
    int2 – Upper bound of the interval
    parameters – Possible parameters of the algorithm:
        Parameters.ATTRIBUTE_KEY => indicates which attribute to filter
        Parameters.POSITIVE => whether to keep (True) or remove (False) the traces containing such events
Returns:	Filtered dataframe
Return type:	filtered_df

pm4py.algo.filtering.pandas.attributes.attributes_filter.apply_numeric_events(df: pandas.core.frame.DataFrame, int1: float, int2: float, parameters: Optional[Dict[Union[str, pm4py.algo.filtering.pandas.attributes.attributes_filter.Parameters], Any]] = None) → pandas.core.frame.DataFrame

Apply a filter on events (numerical filter)

Parameters:
    df – Dataframe
    int1 – Lower bound of the interval
    int2 – Upper bound of the interval
    parameters – Possible parameters of the algorithm:
        Parameters.ATTRIBUTE_KEY => indicates which attribute to filter
        Parameters.POSITIVE => whether to keep (True) or remove (False) the events in the interval
Returns:	Filtered dataframe
Return type:	filtered_df
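
The two numeric filters differ only in granularity, which a small plain-pandas sketch can make concrete. The helper names, the amount column, and the toy data are hypothetical. The sketch assumes both interval bounds are inclusive, which the descriptions above do not state explicitly.

```python
import pandas as pd


def filter_events_by_range(df, low, high, attr_col, positive=True):
    """Hypothetical helper: keep (or drop) events whose value lies in [low, high]."""
    in_range = df[attr_col].between(low, high)  # inclusive on both ends (assumption)
    return df[in_range] if positive else df[~in_range]


def filter_cases_by_range(df, low, high, attr_col,
                          case_col="case:concept:name", positive=True):
    """Hypothetical helper: keep (or drop) cases having at least one event in [low, high]."""
    in_range = df[attr_col].between(low, high)
    matching_cases = df.loc[in_range, case_col].unique()
    in_matching_case = df[case_col].isin(matching_cases)
    return df[in_matching_case] if positive else df[~in_matching_case]


# Toy event log with a numeric attribute (made up for illustration)
log = pd.DataFrame({
    "case:concept:name": ["c1", "c1", "c2", "c2", "c3"],
    "amount": [50.0, 500.0, 20.0, 30.0, 100.0],
})

events_in_range = filter_events_by_range(log, 100.0, 1000.0, "amount")
events_out_of_range = filter_events_by_range(log, 100.0, 1000.0, "amount", positive=False)
cases_in_range = filter_cases_by_range(log, 100.0, 1000.0, "amount")
```

The case-level variant returns the 50.0 event of c1 as well, because c1 qualifies through its 500.0 event.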

pm4py.algo.filtering.pandas.attributes.attributes_filter.filter_df_keeping_activ_exc_thresh(df, thresh, act_count0=None, activity_key='concept:name', most_common_variant=None)

Filter a dataframe keeping activities exceeding the threshold

Parameters:
    df – Pandas dataframe
    thresh – Threshold to use to cut activities
    act_count0 – (If provided) Dictionary that associates each activity with its count
    activity_key – Column in which the activity is present
Returns:	Filtered dataframe
Return type:	df
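
A threshold filter of this kind can be sketched in plain pandas as follows. The helper name and toy log are hypothetical, and the sketch assumes that an activity whose count equals the threshold is kept, a boundary case the description above does not settle. The act_count0 and most_common_variant arguments are not modeled.

```python
import pandas as pd


def keep_activities_reaching_threshold(df, thresh, activity_col="concept:name"):
    """Hypothetical helper: keep events of activities occurring at least `thresh` times."""
    counts = df[activity_col].value_counts()
    frequent = counts[counts >= thresh].index  # ">=" is an assumption
    return df[df[activity_col].isin(frequent)]


# Toy event log (made up for illustration)
log = pd.DataFrame({
    "case:concept:name": ["c1", "c1", "c2", "c2", "c3"],
    "concept:name": ["register", "pay", "register", "cancel", "register"],
})

frequent_only = keep_activities_reaching_threshold(log, thresh=2)
```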

pm4py.algo.filtering.pandas.attributes.attributes_filter.filter_df_keeping_spno_activities(df: pandas.core.frame.DataFrame, activity_key: str = 'concept:name', max_no_activities: int = 25)

Filter a dataframe, keeping at most the specified number of activities

Parameters:
    df – Dataframe
    activity_key – Activity key in dataframe (must be specified if different from concept:name)
    max_no_activities – Maximum allowed number of activities
Returns:	Filtered dataframe
Return type:	df
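
One way to cap the number of distinct activities is to keep only the most frequent ones, sketched below in plain pandas. The helper name and toy log are hypothetical, and ranking by frequency is an assumption about how the cap is applied, since the description above only states the limit.

```python
import pandas as pd


def keep_top_activities(df, max_no_activities=25, activity_col="concept:name"):
    """Hypothetical helper: keep events of the `max_no_activities` most frequent activities."""
    # value_counts() sorts by descending frequency, so head() takes the top entries
    top = df[activity_col].value_counts().head(max_no_activities).index
    return df[df[activity_col].isin(top)]


# Toy event log (made up for illustration): register x3, pay x2, cancel x1
log = pd.DataFrame({
    "case:concept:name": ["c1", "c1", "c2", "c2", "c3", "c3"],
    "concept:name": ["register", "pay", "register", "cancel", "register", "pay"],
})

top_two = keep_top_activities(log, max_no_activities=2)
```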

pm4py.algo.filtering.pandas.attributes.attributes_filter.filter_df_on_attribute_values(df, values, case_id_glue='case:concept:name', attribute_key='concept:name', positive=True)

Filter dataframe on attribute values

Parameters:
    df – Dataframe
    values – Values to filter on
    case_id_glue – Case ID column in the dataframe
    attribute_key – Attribute we want to filter
    positive – Specifies if the filter should be applied including traces (positive=True) or excluding traces (positive=False)
Returns:	Filtered dataframe
Return type:	df

pm4py.algo.filtering.pandas.attributes.attributes_filter.filter_df_relative_occurrence_event_attribute(df: pandas.core.frame.DataFrame, min_relative_stake: float, parameters: Optional[Dict[Any, Any]] = None) → pandas.core.frame.DataFrame

Filters the event log keeping only the events having an attribute value which occurs:
- in at least the specified (min_relative_stake) percentage of events, when Parameters.KEEP_ONCE_PER_CASE = False
- in at least the specified (min_relative_stake) percentage of cases, when Parameters.KEEP_ONCE_PER_CASE = True

Parameters:
    df – Pandas dataframe
    min_relative_stake – Minimum fraction (expressed as a number between 0 and 1) of events or cases, depending on Parameters.KEEP_ONCE_PER_CASE, in which the attribute value should occur.
    parameters – Parameters of the algorithm, including:
        - Parameters.ATTRIBUTE_KEY => the attribute to use (default: concept:name)
        - Parameters.KEEP_ONCE_PER_CASE => decides the level of the filter to apply (set it to True to apply the filter on cases).
Returns:	Filtered Pandas dataframe
Return type:	filtered_df
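
The difference between the two counting modes is easiest to see on a small log. The sketch below is a plain-pandas illustration under stated assumptions, not PM4Py's implementation: the helper name, its keyword arguments, and the toy data are hypothetical, and a share exactly equal to min_relative_stake is assumed to pass.

```python
import pandas as pd


def filter_by_relative_occurrence(df, min_relative_stake, keep_once_per_case,
                                  attr_col="concept:name",
                                  case_col="case:concept:name"):
    """Hypothetical helper: keep events whose attribute value is frequent enough."""
    if keep_once_per_case:
        # Count each value at most once per case; compare against the number of cases
        counted = df[[case_col, attr_col]].drop_duplicates()
        total = df[case_col].nunique()
    else:
        # Count every event; compare against the number of events
        counted = df
        total = len(df)
    share = counted[attr_col].value_counts() / total
    frequent = share[share >= min_relative_stake].index
    return df[df[attr_col].isin(frequent)]


# Toy event log (made up for illustration): 4 cases, 8 events
log = pd.DataFrame({
    "case:concept:name": ["c1", "c1", "c1", "c2", "c2", "c3", "c4", "c4"],
    "concept:name": ["register", "pay", "pay", "register",
                     "cancel", "register", "register", "pay"],
})

by_events = filter_by_relative_occurrence(log, 0.4, keep_once_per_case=False)
by_cases = filter_by_relative_occurrence(log, 0.4, keep_once_per_case=True)
```

In this log "pay" covers 3 of 8 events (0.375) but 2 of 4 cases (0.5), so with a stake of 0.4 it is dropped by the event-level count and kept by the case-level count.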

Module contents
