pm4py.statistics.traces.generic.pandas package
Submodules
pm4py.statistics.traces.generic.pandas.case_arrival module
This file is part of PM4Py (More Info: https://pm4py.fit.fraunhofer.de).
PM4Py is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
PM4Py is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with PM4Py. If not, see <https://www.gnu.org/licenses/>.
class pm4py.statistics.traces.generic.pandas.case_arrival.Parameters
Bases: enum.Enum
An enumeration.
- ACTIVITY_KEY = 'pm4py:param:activity_key'
- ATTRIBUTE_KEY = 'pm4py:param:attribute_key'
- CASE_ID_KEY = 'pm4py:param:case_id_key'
- KEEP_ONCE_PER_CASE = 'keep_once_per_case'
- MAX_NO_POINTS_SAMPLE = 'max_no_of_points_to_sample'
- START_TIMESTAMP_KEY = 'pm4py:param:start_timestamp_key'
- TIMESTAMP_KEY = 'pm4py:param:timestamp_key'
pm4py.statistics.traces.generic.pandas.case_arrival.get_case_arrival_avg(df: pandas.core.frame.DataFrame, parameters: Optional[Dict[Union[str, pm4py.statistics.traces.generic.pandas.case_arrival.Parameters], Any]] = None) → float
Gets the average time elapsed between consecutive case starts.
Parameters:
- df – Pandas dataframe
- parameters – Parameters of the algorithm, including:
  - Parameters.TIMESTAMP_KEY -> attribute of the log to be used as timestamp
Returns: case_arrival_avg – Average time elapsed between case starts
Return type: float
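As an illustration of the quantity described above, the following plain-Python sketch computes the mean gap between consecutive case starts. It is not PM4Py's implementation and does not use pandas. The function name case_arrival_avg, the (case_id, timestamp) event tuples, and the choice of seconds as the unit are assumptions made for this example.

```python
from datetime import datetime
from statistics import mean


def case_arrival_avg(events):
    """Mean gap, in seconds, between consecutive case starts.

    `events` is an iterable of (case_id, timestamp) pairs; this layout is
    a hypothetical stand-in for the case ID and timestamp columns.
    """
    # The start of a case is its earliest timestamp.
    first_seen = {}
    for case_id, ts in events:
        if case_id not in first_seen or ts < first_seen[case_id]:
            first_seen[case_id] = ts

    starts = sorted(first_seen.values())
    if len(starts) < 2:
        return 0.0  # assumption: no gap can be measured with fewer than two cases

    gaps = [(later - earlier).total_seconds()
            for earlier, later in zip(starts, starts[1:])]
    return mean(gaps)


events = [
    ("c1", datetime(2024, 1, 1, 9, 0)),
    ("c1", datetime(2024, 1, 1, 9, 30)),
    ("c2", datetime(2024, 1, 1, 10, 0)),
    ("c3", datetime(2024, 1, 1, 12, 0)),
    ("c2", datetime(2024, 1, 1, 12, 30)),
]
avg_seconds = case_arrival_avg(events)  # starts at 09:00, 10:00, 12:00
```

Taking each case's latest timestamp instead of its earliest gives the analogous average between case completions.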
pm4py.statistics.traces.generic.pandas.case_arrival.get_case_dispersion_avg(df, parameters=None)
Gets the average time elapsed between consecutive case ends.
Parameters:
- df – Pandas dataframe
- parameters – Parameters of the algorithm, including:
  - Parameters.TIMESTAMP_KEY -> attribute of the log to be used as timestamp
Returns: case_dispersion_avg – Average time elapsed between the completion of cases
pm4py.statistics.traces.generic.pandas.case_statistics module
class pm4py.statistics.traces.generic.pandas.case_statistics.Parameters
Bases: enum.Enum
An enumeration.
- ACTIVITY_KEY = 'pm4py:param:activity_key'
- ATTRIBUTE_KEY = 'pm4py:param:attribute_key'
- BUSINESS_HOURS = 'business_hours'
- CASE_ID_KEY = 'pm4py:param:case_id_key'
- ENABLE_SORT = 'enable_sort'
- MAX_RET_CASES = 'max_ret_cases'
- MAX_VARIANTS_TO_RETURN = 'max_variants_to_return'
- SORT_ASCENDING = 'sort_ascending'
- SORT_BY_COLUMN = 'sort_by_column'
- TIMESTAMP_KEY = 'pm4py:param:timestamp_key'
- VARIANTS_DF = 'variants_df'
- WEEKENDS = 'weekends'
- WORKCALENDAR = 'workcalendar'
- WORKTIMING = 'worktiming'
pm4py.statistics.traces.generic.pandas.case_statistics.get_all_case_durations(df, parameters=None)
Gets all the case durations out of the log.
Parameters:
- df – Pandas dataframe
- parameters – Possible parameters of the algorithm
Returns: duration_values – List of all duration values
pm4py.statistics.traces.generic.pandas.case_statistics.get_cases_description(df: pandas.core.frame.DataFrame, parameters: Optional[Dict[Union[str, pm4py.statistics.traces.generic.pandas.case_statistics.Parameters], Any]] = None) → Dict[str, Dict[str, Any]]
Get a description of the traces present in the Pandas dataframe.
Parameters:
- df – Pandas dataframe
- parameters – Parameters of the algorithm, including:
  - Parameters.CASE_ID_KEY -> Column that identifies the case ID
  - Parameters.TIMESTAMP_KEY -> Column that identifies the timestamp
  - Parameters.ENABLE_SORT -> Enable sorting of traces
  - Parameters.SORT_BY_COLUMN -> Sort traces inside the dataframe using the specified column. Admitted values: startTime, endTime, caseDuration
  - Parameters.SORT_ASCENDING -> Set sort direction (boolean; if true, the sort direction is ascending, otherwise descending)
  - Parameters.MAX_RET_CASES -> Set the maximum number of returned traces
Returns: ret – Dictionary of traces associated with their start timestamp, end timestamp, and duration
Return type: Dict[str, Dict[str, Any]]
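The shape of such a case description, together with the sort and truncation options, can be sketched in plain Python. This is an independent illustration, not the library's code. The function describe_cases, its keyword arguments, and the (case_id, timestamp) input tuples are hypothetical. Only the field names startTime, endTime, and caseDuration come from the admitted sort values listed above, and expressing the duration in seconds is an assumption.

```python
from datetime import datetime


def describe_cases(events, sort_by="startTime", ascending=True, max_cases=None):
    """Map each case ID to its start time, end time, and duration.

    `events` is an iterable of (case_id, timestamp) pairs (hypothetical
    layout). Durations are reported in seconds (an assumption).
    """
    cases = {}
    for case_id, ts in events:
        info = cases.setdefault(case_id, {"startTime": ts, "endTime": ts})
        info["startTime"] = min(info["startTime"], ts)
        info["endTime"] = max(info["endTime"], ts)

    for info in cases.values():
        info["caseDuration"] = (info["endTime"] - info["startTime"]).total_seconds()

    # Order the cases by the chosen field, then keep at most `max_cases`.
    ordered = sorted(cases.items(),
                     key=lambda item: item[1][sort_by],
                     reverse=not ascending)
    if max_cases is not None:
        ordered = ordered[:max_cases]
    return dict(ordered)


events = [
    ("c1", datetime(2024, 1, 1, 9, 0)),
    ("c1", datetime(2024, 1, 1, 9, 30)),
    ("c2", datetime(2024, 1, 1, 10, 0)),
    ("c3", datetime(2024, 1, 1, 12, 0)),
    ("c2", datetime(2024, 1, 1, 12, 30)),
]
# The two longest cases, longest first.
summary = describe_cases(events, sort_by="caseDuration", ascending=False, max_cases=2)
```

Sorting before truncating means the limit keeps the top cases under the chosen ordering rather than an arbitrary subset.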
pm4py.statistics.traces.generic.pandas.case_statistics.get_events(df: pandas.core.frame.DataFrame, case_id: str, parameters: Optional[Dict[Union[str, pm4py.statistics.traces.generic.pandas.case_statistics.Parameters], Any]] = None) → List[Dict[str, Any]]
Get events belonging to the specified case.
Parameters:
- df – Pandas dataframe
- case_id – Required case ID
- parameters – Possible parameters of the algorithm, including:
  - Parameters.CASE_ID_KEY -> Column in which the case ID is contained
Returns: list_eve – List of events belonging to the case
Return type: List[Dict[str, Any]]
pm4py.statistics.traces.generic.pandas.case_statistics.get_first_quartile_case_duration(df, parameters=None)
Gets the first quartile of the case durations in the log.
Parameters:
- df – Pandas dataframe
- parameters – Possible parameters of the algorithm
Returns: value – First quartile value
pm4py.statistics.traces.generic.pandas.case_statistics.get_kde_caseduration(df, parameters=None)
Gets the kernel density estimate (KDE) of the case durations calculated on the dataframe.
Parameters:
- df – Pandas dataframe
- parameters – Possible parameters of the algorithm, including:
  - Parameters.GRAPH_POINTS -> number of points to include in the graph
  - Parameters.CASE_ID_KEY -> Column hosting the Case ID
Returns:
- x – X-axis values to represent
- y – Y-axis values to represent
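To make the returned x and y values concrete, here is a minimal Gaussian kernel density estimate written with the standard library only. It is a sketch under stated assumptions, not the estimator PM4Py uses. The function name kde_points, the evenly spaced grid between the smallest and largest duration, and the rule-of-thumb bandwidth (sample standard deviation times n^(-1/5)) are all choices made for this example.

```python
import math


def kde_points(durations, graph_points=200):
    """Return (xs, ys): a grid of durations and the estimated density on it."""
    n = len(durations)
    if n < 2 or graph_points < 2:
        raise ValueError("need at least two durations and two graph points")

    mean_d = sum(durations) / n
    std = math.sqrt(sum((d - mean_d) ** 2 for d in durations) / (n - 1))
    if std == 0:
        raise ValueError("all durations are identical; bandwidth would be zero")

    # Assumed rule-of-thumb bandwidth for one-dimensional data.
    bandwidth = std * n ** (-1 / 5)

    lo, hi = min(durations), max(durations)
    step = (hi - lo) / (graph_points - 1)
    xs = [lo + i * step for i in range(graph_points)]

    # Each observation contributes one Gaussian bump centred on itself.
    norm = 1.0 / (n * bandwidth * math.sqrt(2 * math.pi))
    ys = [norm * sum(math.exp(-0.5 * ((x - d) / bandwidth) ** 2) for d in durations)
          for x in xs]
    return xs, ys


# Hypothetical case durations in seconds: four short cases and one long one.
durations = [60.0, 120.0, 180.0, 240.0, 3600.0]
xs, ys = kde_points(durations, graph_points=50)
```

The number of graph points only controls how finely the curve is sampled; the bandwidth controls how smooth the curve is.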
pm4py.statistics.traces.generic.pandas.case_statistics.get_kde_caseduration_json(df, parameters=None)
Gets the kernel density estimate (KDE) of the case durations calculated on the log/dataframe, expressed as JSON.
Parameters:
- df – Pandas dataframe
- parameters – Possible parameters of the algorithm, including:
  - Parameters.GRAPH_POINTS -> number of points to include in the graph
  - Parameters.CASE_ID_KEY -> Column hosting the Case ID
Returns: json – JSON representing the graph points
pm4py.statistics.traces.generic.pandas.case_statistics.get_median_case_duration(df, parameters=None)
Gets the median case duration out of the log.
Parameters:
- df – Pandas dataframe
- parameters – Possible parameters of the algorithm
Returns: value – Median duration value
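For a list of case durations, the first quartile and the median can be illustrated with Python's statistics module. Quartile conventions differ between tools, and the inclusive interpolation method used here is an assumption, not necessarily the convention PM4Py follows. The duration values are made up for the example.

```python
from statistics import median, quantiles

# Hypothetical case durations in seconds.
durations = [0.0, 1800.0, 3600.0, 9000.0, 10800.0]

# quantiles(..., n=4) returns the three cut points Q1, Q2, Q3.
q1, q2, q3 = quantiles(durations, n=4, method="inclusive")

# The median is the middle value of the sorted durations.
med = median(durations)
```

With the inclusive method, the second cut point coincides with the median, which is a quick consistency check on the two statistics.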
pm4py.statistics.traces.generic.pandas.case_statistics.get_variant_statistics(df: pandas.core.frame.DataFrame, parameters: Optional[Dict[Union[str, pm4py.statistics.traces.generic.pandas.case_statistics.Parameters], Any]] = None) → Union[List[Dict[str, int]], List[Dict[List[str], int]]]
Get variants from a Pandas dataframe.
Parameters:
- df – Dataframe
- parameters – Parameters of the algorithm, including:
  - Parameters.CASE_ID_KEY -> Column that contains the Case ID
  - Parameters.ACTIVITY_KEY -> Column that contains the activity
  - Parameters.MAX_VARIANTS_TO_RETURN -> Maximum number of variants to return
  - Parameters.VARIANTS_DF -> If provided, avoid recalculation of the variants dataframe
Returns: variants_list – List of variants inside the Pandas dataframe
Return type: Union[List[Dict[str, int]], List[Dict[List[str], int]]]
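A variant is the ordered sequence of activities observed in a case, and variant statistics count how many cases share each sequence. The plain-Python sketch below illustrates that idea only. The function variant_statistics, the (case_id, timestamp, activity) tuples, and the "variant" and "count" dictionary keys are hypothetical and do not reproduce the structure PM4Py returns.

```python
from collections import Counter, defaultdict


def variant_statistics(events, max_variants=None):
    """Count cases per activity sequence, most frequent first.

    `events` is an iterable of (case_id, timestamp, activity) tuples
    (hypothetical layout). `max_variants` caps the number of results.
    """
    traces = defaultdict(list)
    # Order events by case and then by timestamp before building each trace.
    for case_id, _ts, activity in sorted(events, key=lambda e: (e[0], e[1])):
        traces[case_id].append(activity)

    counts = Counter(tuple(activities) for activities in traces.values())
    return [{"variant": variant, "count": count}
            for variant, count in counts.most_common(max_variants)]


# Integer timestamps keep the example short; the input is deliberately unordered.
events = [
    ("c2", 2, "C"), ("c1", 1, "A"), ("c3", 6, "B"), ("c1", 3, "C"),
    ("c2", 1, "A"), ("c3", 5, "A"), ("c1", 2, "B"), ("c3", 7, "C"),
]
all_variants = variant_statistics(events)
top_variant = variant_statistics(events, max_variants=1)
```

Using tuples for the activity sequences keeps them hashable, so they can serve directly as counter keys.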
pm4py.statistics.traces.generic.pandas.case_statistics.get_variants_df(df, parameters=None)
Get the variants dataframe from a Pandas dataframe.
Parameters:
- df – Dataframe
- parameters – Parameters of the algorithm, including:
  - Parameters.CASE_ID_KEY -> Column that contains the Case ID
  - Parameters.ACTIVITY_KEY -> Column that contains the activity
Returns: variants_df – Variants dataframe
pm4py.statistics.traces.generic.pandas.case_statistics.get_variants_df_and_list(df: pandas.core.frame.DataFrame, parameters: Optional[Dict[Union[str, pm4py.statistics.traces.generic.pandas.case_statistics.Parameters], Any]] = None) → Tuple[pandas.core.frame.DataFrame, Union[List[Dict[str, int]], List[Dict[List[str], int]]]]
(Technical method) Provides variants_df and variants_list out of the box.
Parameters:
- df – Dataframe
- parameters – Parameters of the algorithm, including:
  - Parameters.CASE_ID_KEY -> Column that contains the Case ID
  - Parameters.ACTIVITY_KEY -> Column that contains the activity
Returns:
- variants_df – Variants dataframe
- variants_list – List of variants sorted by their count
pm4py.statistics.traces.generic.pandas.case_statistics.get_variants_df_with_case_duration(df, parameters=None)
Get the variants dataframe from a Pandas dataframe, including the case duration.
Parameters:
- df – Dataframe
- parameters – Parameters of the algorithm, including:
  - Parameters.CASE_ID_KEY -> Column that contains the Case ID
  - Parameters.ACTIVITY_KEY -> Column that contains the activity
  - Parameters.TIMESTAMP_KEY -> Column that contains the timestamp
Returns: variants_df – Variants dataframe
Module contents