src.encoding package

Subpackages

Submodules

src.encoding.apps module

class src.encoding.apps.EncodingConfig(app_name, app_module)

Bases: django.apps.config.AppConfig

name = 'src.encoding'

src.encoding.boolean_frequency module

src.encoding.boolean_frequency.boolean(log, event_names, label, encoding)
Return type:DataFrame
src.encoding.boolean_frequency.frequency(log, event_names, label, encoding)
Return type:DataFrame
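
The two function names suggest the boolean and frequency trace encodings common in process mining. The following is a minimal sketch under that assumption, not the package's implementation: a frequency encoding counts how often each event name occurs in a trace, and a boolean encoding only records whether it occurs. The helpers `frequency_row` and `boolean_row` and the plain-list trace representation are hypothetical.

```python
from collections import Counter


def frequency_row(trace, event_names):
    """Count how often each known event name occurs in one trace."""
    counts = Counter(trace)
    return [counts[name] for name in event_names]


def boolean_row(trace, event_names):
    """Flag whether each known event name occurs at least once in one trace."""
    return [count > 0 for count in frequency_row(trace, event_names)]


# Hypothetical event names and trace, used only for illustration.
event_names = ["register", "check", "approve"]
trace = ["register", "check", "check"]

print(frequency_row(trace, event_names))  # [1, 2, 0]
print(boolean_row(trace, event_names))    # [True, True, False]
```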

src.encoding.common module

src.encoding.common.encode_label_log(run_log, encoding, job_type, labelling, event_names=None, additional_columns=None, fit_encoder=False)
src.encoding.common.encode_label_logs(training_log, test_log, job, additional_columns=None)

src.encoding.complex_last_payload module

src.encoding.complex_last_payload.complex(log, labelling, encoding, additional_columns)
Return type:DataFrame
src.encoding.complex_last_payload.last_payload(log, labelling, encoding, additional_columns)
Return type:DataFrame

src.encoding.encoder module

class src.encoding.encoder.Encoder(df, encoding)

Bases: object

encode(df, encoding)
Return type:None

src.encoding.encoding_container module

class src.encoding.encoding_container.EncodingContainer

Bases: src.encoding.encoding_container.EncodingContainer

Inner object describing encoding configuration.

static encode(df)
Return type:None
static init_label_encoder(df)
Return type:None
is_all_in_one()
Return type:bool
is_boolean()
Return type:bool
is_complex()
Return type:bool
is_zero_padding()
Return type:bool

src.encoding.encoding_parser module

class src.encoding.encoding_parser.DataEncoder(task, is_targets_dataset=False)

Bases: object

Support class for EncodingParser that performs the actual parsing and one-hot encoding.

class DataTypes

Bases: enum.Enum

possible data types for each column

CATEGORICAL = 'categorical'
NUMERIC = 'numeric'
build_encoders(data)

builds an encoder for each column

First the base headers are extracted (prefix_1 -> prefix, org:resources:Amount_1 -> org_resources:Amount), then a dictionary of LabelEncoders is built, keyed by base header. For numerical columns the minimum and maximum are stored instead of a LabelEncoder.

Parameters:data (DataFrame) – input dataframe
Return type:None
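
The two steps described for build_encoders can be sketched roughly as follows. This is an illustration under stated assumptions, not the method's actual code: a plain dict of value-to-class mappings stands in for a LabelEncoder, a `{column_name: values}` mapping stands in for the DataFrame, and the helper `base_header` and the sample column names are hypothetical.

```python
import re


def base_header(column):
    """Strip a trailing positional suffix such as `_1` from a column name."""
    return re.sub(r"_\d+$", "", column)


def build_encoders(columns):
    """Build one encoder per base header from a {column_name: values} mapping."""
    grouped = {}
    for name, values in columns.items():
        grouped.setdefault(base_header(name), []).extend(values)

    encoders = {}
    for header, values in grouped.items():
        if all(isinstance(value, (int, float)) for value in values):
            # Numerical data: keep only the limits needed for normalization.
            encoders[header] = {"min": min(values), "max": max(values)}
        else:
            # Categorical data: map each distinct value to an integer class.
            classes = sorted(set(values))
            encoders[header] = {value: index for index, value in enumerate(classes)}
    return encoders


# Hypothetical columns: two positions of a prefix and one numeric attribute.
columns = {
    "prefix_1": ["a", "b"],
    "prefix_2": ["b", "c"],
    "Amount_1": [10.0, 30.0],
}
encoders = build_encoders(columns)
print(encoders["prefix"])  # {'a': 0, 'b': 1, 'c': 2}
print(encoders["Amount"])  # {'min': 10.0, 'max': 30.0}
```

Grouping by base header means that every position of the same attribute shares one encoder, so a value receives the same class index wherever it appears in the prefix.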
encode_data(data, train=True)

encodes the input data

Performs the actual data encoding using the previously built encoders. Each column is encoded according to its type: categorical columns are mapped to class indices and numeric columns are normalized.

Parameters:
  • data (DataFrame) – input dataframe
  • train (bool) – flag indicating whether the input is a train dataframe or a test one
Return type:None
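
A minimal sketch of the per-type encoding that encode_data describes, assuming min-max scaling for numeric columns and an integer class lookup for categorical ones. The tagged-tuple encoder representation and the helper `encode_column` are hypothetical, and the effect of the train flag is not reproduced here.

```python
def encode_column(values, encoder):
    """Encode one column with an encoder that was built beforehand."""
    kind, payload = encoder
    if kind == "numeric":
        low, high = payload
        span = (high - low) or 1.0  # avoid dividing by zero for constant columns
        return [(value - low) / span for value in values]
    # Categorical: look up the integer class of each value.
    return [payload[value] for value in values]


# Hypothetical encoders, in the spirit of what build_encoders stores.
encoders = {
    "prefix": ("categorical", {"a": 0, "b": 1, "c": 2}),
    "Amount": ("numeric", (10.0, 30.0)),
}

print(encode_column(["b", "c"], encoders["prefix"]))    # [1, 2]
print(encode_column([10.0, 25.0], encoders["Amount"]))  # [0.0, 0.75]
```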

get_n_classes_x()

returns the number of training/test classes

Returns the highest number of classes in the encoded dataframe, plus 1 if there are numerical values. Each variable is laid out as [one-hot encoding, normalized_value], so a categorical variable becomes [0 0 0 1 0.0] while a numerical value becomes [0 0 0 0 0.263].

Returns:number of training/test classes + 1 (for numerical values)
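
The [one-hot encoding, normalized_value] layout can be illustrated with a short sketch. It assumes that every variable occupies a slot of the same width, namely the highest class count plus one trailing position for the normalized value. The helper names are hypothetical and plain lists stand in for arrays.

```python
def n_classes_x(class_counts, has_numeric):
    """Slot width: the highest class count, plus 1 if numeric values exist."""
    return max(class_counts) + (1 if has_numeric else 0)


def categorical_slot(class_index, width):
    """One-hot part set, trailing normalized-value position left at 0.0."""
    vector = [0.0] * width
    vector[class_index] = 1.0
    return vector


def numeric_slot(normalized_value, width):
    """One-hot part all zeros, normalized value in the trailing position."""
    vector = [0.0] * width
    vector[-1] = normalized_value
    return vector


# Hypothetical class counts for two categorical columns.
width = n_classes_x([4, 3], has_numeric=True)
print(width)                       # 5
print(categorical_slot(3, width))  # [0.0, 0.0, 0.0, 1.0, 0.0]
print(numeric_slot(0.263, width))  # [0.0, 0.0, 0.0, 0.0, 0.263]
```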
get_numerical_limits(header='label')

returns the numerical limits for the input header

returns the min and max value from the stored LabelEncoders, using header as index

Parameters:header – label associated with the data we want to extract min and max from
Returns:min and max values associated with the column header
to_one_hot(data)

one hot encoding

transforms the encoded data into the one-hot representation

Parameters:data (DataFrame) – input dataframe
Return type:ndarray
Returns:one-hot encoded array
class src.encoding.encoding_parser.EncodingParser(encoding, binary_target, task)

Bases: object

Parses the encoded datasets into a format suitable for the Keras models (floats in the 0-1 range, one-hot encodable classes, etc.) and provides a few minor utilities.

denormalize_predictions(predictions)

denormalizes the predictive_model predictions

denormalizes the predictions using the stored y min and max

Parameters:predictions (ndarray) – predictive_model predictions
Return type:ndarray
Returns:denormalized predictions
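
Assuming the targets were scaled to the 0-1 range with min-max normalization, denormalization is the inverse mapping `p * (max - min) + min`. The sketch below shows that round trip on plain lists. The function names and sample values are hypothetical, and the real method operates on an ndarray using the stored y min and max.

```python
def normalize(values, low, high):
    """Scale values into the 0-1 range using the given limits."""
    return [(value - low) / (high - low) for value in values]


def denormalize(predictions, low, high):
    """Map 0-1 predictions back to the original value range."""
    return [prediction * (high - low) + low for prediction in predictions]


# Hypothetical stored limits of the target column.
y_min, y_max = 10.0, 30.0

scaled = normalize([10.0, 25.0, 30.0], y_min, y_max)
print(scaled)                             # [0.0, 0.75, 1.0]
print(denormalize(scaled, y_min, y_max))  # [10.0, 25.0, 30.0]
```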
get_n_classes_x()
parse_targets(targets)

parses the target dataset

encodes the target dataset based on the encoding given in the init method. Stores min and max value/classes number based on the encoding

Parameters:targets (DataFrame) – input dataset
Return type:ndarray
Returns:parsed input dataset

parse_testing_dataset(test_data)

parses the test dataset

encodes the test dataset based on the encoding given in the init method

Parameters:test_data (DataFrame) – input dataset
Return type:ndarray
Returns:parsed input dataset

parse_training_dataset(train_data)

parses the training dataset

encodes the training dataset based on the encoding given in the init method

Parameters:train_data (DataFrame) – input dataset
Return type:ndarray
Returns:parsed input dataset

src.encoding.models module

class src.encoding.models.DataEncodings

Bases: enum.Enum

An enumeration.

LABEL_ENCODER = 'label_encoder'
ONE_HOT_ENCODER = 'one_hot'
class src.encoding.models.Encoding(id, data_encoding, value_encoding, add_elapsed_time, add_remaining_time, add_executed_events, add_resources_used, add_new_traces, features, prefix_length, padding, task_generation_type)

Bases: src.common.models.CommonModel

exception DoesNotExist

Bases: django.core.exceptions.ObjectDoesNotExist

exception MultipleObjectsReturned

Bases: django.core.exceptions.MultipleObjectsReturned

add_elapsed_time

A wrapper for a deferred-loading field. When the value is read from this object the first time, the query is executed.

add_executed_events

A wrapper for a deferred-loading field. When the value is read from this object the first time, the query is executed.

add_new_traces

A wrapper for a deferred-loading field. When the value is read from this object the first time, the query is executed.

add_remaining_time

A wrapper for a deferred-loading field. When the value is read from this object the first time, the query is executed.

add_resources_used

A wrapper for a deferred-loading field. When the value is read from this object the first time, the query is executed.

data_encoding

A wrapper for a deferred-loading field. When the value is read from this object the first time, the query is executed.

features

A placeholder class that provides a way to set the attribute on the model.

get_data_encoding_display(*, field=<django.db.models.fields.CharField: data_encoding>)
get_task_generation_type_display(*, field=<django.db.models.fields.CharField: task_generation_type>)
get_value_encoding_display(*, field=<django.db.models.fields.CharField: value_encoding>)
id

A wrapper for a deferred-loading field. When the value is read from this object the first time, the query is executed.

job_set

Accessor to the related objects manager on the reverse side of a many-to-one relation.

In the example:

class Child(Model):
    parent = ForeignKey(Parent, related_name='children')

Parent.children is a ReverseManyToOneDescriptor instance.

Most of the implementation is delegated to a dynamically defined manager class built by Django's create_reverse_many_to_one_manager().

labelledlog_set

Accessor to the related objects manager on the reverse side of a many-to-one relation.

In the example:

class Child(Model):
    parent = ForeignKey(Parent, related_name='children')

Parent.children is a ReverseManyToOneDescriptor instance.

Most of the implementation is delegated to a dynamically defined manager class built by Django's create_reverse_many_to_one_manager().

objects = <django.db.models.manager.Manager object>
padding

A wrapper for a deferred-loading field. When the value is read from this object the first time, the query is executed.

prefix_length

A wrapper for a deferred-loading field. When the value is read from this object the first time, the query is executed.

task_generation_type

A wrapper for a deferred-loading field. When the value is read from this object the first time, the query is executed.

to_dict()
Return type:dict
value_encoding

A wrapper for a deferred-loading field. When the value is read from this object the first time, the query is executed.

class src.encoding.models.TaskGenerationTypes

Bases: enum.Enum

An enumeration.

ALL_IN_ONE = 'all_in_one'
ONLY_THIS = 'only'
UP_TO = 'up_to'
class src.encoding.models.ValueEncodings

Bases: enum.Enum

An enumeration.

BOOLEAN = 'boolean'
COMPLEX = 'complex'
FREQUENCY = 'frequency'
LAST_PAYLOAD = 'lastPayload'
SIMPLE_INDEX = 'simpleIndex'
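
Because these enumerations carry the string values stored in the model's character fields, a member can be recovered from a stored value by calling the enum. The sketch below re-declares ValueEncodings locally so that it runs on its own. The `choices` list is an assumption about how such an enum might feed a Django field's choices, not the model's actual definition.

```python
from enum import Enum


class ValueEncodings(Enum):
    """Local copy of the documented members, for illustration only."""
    BOOLEAN = 'boolean'
    COMPLEX = 'complex'
    FREQUENCY = 'frequency'
    LAST_PAYLOAD = 'lastPayload'
    SIMPLE_INDEX = 'simpleIndex'


# Look up a member from the string value stored in the database.
member = ValueEncodings('lastPayload')
print(member.name)   # LAST_PAYLOAD
print(member.value)  # lastPayload

# Hypothetical (value, label) pairs in the shape Django expects for choices.
choices = [(item.value, item.name) for item in ValueEncodings]
print(choices[0])    # ('boolean', 'BOOLEAN')
```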

src.encoding.simple_index module

src.encoding.simple_index.add_trace_row(trace, encoding, labelling, event_index, column_len, attribute_classifier=None, executed_events=None, resources_used=None, new_traces=None)

Row in data frame

src.encoding.simple_index.simple_index(log, labelling, encoding)
Return type:DataFrame
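
As a rough illustration of a simple-index encoding (a sketch under assumptions, not this module's code): each of the first prefix_length events of a trace becomes one positional column, and shorter traces are filled with zeros when padding is enabled. The column naming `prefix_1`, `prefix_2`, ... follows the header example given for DataEncoder.build_encoders. The helper `simple_index_row` and its arguments are hypothetical.

```python
def simple_index_row(trace, prefix_length, padding=True):
    """Turn the first `prefix_length` events of a trace into positional columns."""
    prefix = list(trace[:prefix_length])
    if padding:
        # Zero padding: fill the missing positions of short traces with 0.
        prefix += [0] * (prefix_length - len(prefix))
    return {f"prefix_{position}": event
            for position, event in enumerate(prefix, start=1)}


# Hypothetical trace that is shorter than the requested prefix length.
row = simple_index_row(["register", "check"], prefix_length=3)
print(row)  # {'prefix_1': 'register', 'prefix_2': 'check', 'prefix_3': 0}
```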

Module contents