src.encoding package¶

Subpackages¶

Submodules¶

src.encoding.apps module¶

class src.encoding.apps.EncodingConfig(app_name, app_module)¶

Bases: django.apps.config.AppConfig

name = 'src.encoding'¶

src.encoding.boolean_frequency module¶

src.encoding.boolean_frequency.boolean(log, event_names, label, encoding)¶

Return type:	`DataFrame`

src.encoding.boolean_frequency.frequency(log, event_names, label, encoding)¶

Return type:	`DataFrame`
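For orientation, the sketch below shows what boolean and frequency encodings conventionally mean for an event trace. It is not this module's implementation: the helper names `boolean_row` and `frequency_row` and the list-of-strings trace are assumptions made for illustration, whereas the real functions take a log and return a `DataFrame`.

```python
from collections import Counter


def frequency_row(trace, event_names):
    """Count how often each known event name occurs in the trace."""
    counts = Counter(trace)
    return [counts[name] for name in event_names]


def boolean_row(trace, event_names):
    """Flag whether each known event name occurs at least once in the trace."""
    seen = set(trace)
    return [name in seen for name in event_names]


# Hypothetical event vocabulary and trace, used only for this example.
event_names = ["register", "check", "pay"]
trace = ["register", "check", "check"]

print(frequency_row(trace, event_names))
print(boolean_row(trace, event_names))
```

Both rows have one entry per event name, so traces of different lengths map to vectors of the same width.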

src.encoding.common module¶

src.encoding.common.encode_label_log(run_log, encoding, job_type, labelling, event_names=None, additional_columns=None, fit_encoder=False)¶

src.encoding.common.encode_label_logs(training_log, test_log, job, additional_columns=None)¶

src.encoding.complex_last_payload module¶

src.encoding.complex_last_payload.complex(log, labelling, encoding, additional_columns)¶

Return type:	`DataFrame`

src.encoding.complex_last_payload.last_payload(log, labelling, encoding, additional_columns)¶

Return type:	`DataFrame`

src.encoding.encoder module¶

class src.encoding.encoder.Encoder(df, encoding)¶

Bases: object

encode(df, encoding)¶

Return type:	`None`

src.encoding.encoding_container module¶

class src.encoding.encoding_container.EncodingContainer¶

Bases: src.encoding.encoding_container.EncodingContainer

Inner object describing encoding configuration.

static encode(df)¶

Return type:	`None`

static init_label_encoder(df)¶

Return type:	`None`

is_all_in_one()¶

Return type:	`bool`

is_boolean()¶

Return type:	`bool`

is_complex()¶

Return type:	`bool`

is_zero_padding()¶

Return type:	`bool`

src.encoding.encoding_parser module¶

class src.encoding.encoding_parser.DataEncoder(task, is_targets_dataset=False)¶

Bases: object

support class for EncodingParser, tasked with actual parsing/one-hot encoding

class DataTypes¶

Bases: enum.Enum

possible data types for each column

CATEGORICAL = 'categorical'¶

NUMERIC = 'numeric'¶

build_encoders(data)¶

builds an encoder for each column

First, the base headers are extracted (prefix_1 -> prefix, org:resources:Amount_1 -> org_resources:Amount); then a dictionary of LabelEncoders is built, one per base header. For numerical columns, the min and max values are stored instead of a LabelEncoder.

Parameters:	data (`DataFrame`) – input dataframe
Return type:	`None`
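The idea described above can be sketched in plain Python. This is an illustration under assumptions, not the class's code: `base_header` and `build_encoders_sketch` are hypothetical names, a dict of lists stands in for the `DataFrame`, a plain label-to-index dict stands in for a LabelEncoder, and only the stripping of the trailing index is shown.

```python
import re


def base_header(column):
    """Strip a trailing _<index> suffix, e.g. prefix_1 -> prefix."""
    return re.sub(r"_\d+$", "", column)


def build_encoders_sketch(columns):
    """Group columns by base header and build one encoder per group.

    `columns` maps a column name to its list of values.
    """
    grouped = {}
    for name, values in columns.items():
        grouped.setdefault(base_header(name), []).extend(values)

    encoders = {}
    for header, values in grouped.items():
        numeric = all(
            isinstance(v, (int, float)) and not isinstance(v, bool) for v in values
        )
        if numeric:
            # Numerical data: keep only the limits needed for normalization.
            encoders[header] = {"min": min(values), "max": max(values)}
        else:
            # Categorical data: map each distinct label to an integer class.
            classes = sorted(set(map(str, values)))
            encoders[header] = {label: i for i, label in enumerate(classes)}
    return encoders


# Hypothetical columns, used only for this example.
columns = {
    "prefix_1": ["a", "b"],
    "prefix_2": ["b", "c"],
    "amount_1": [10.0, 30.0],
}
encoders = build_encoders_sketch(columns)
print(encoders)
```

Sharing one encoder across all columns with the same base header means a given label gets the same class index at every prefix position.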

encode_data(data, train=True)¶

encodes the input data

Performs the actual encoding using the previously built encoders: categorical columns are mapped to class indices and numerical columns are normalized.

Parameters:	data (`DataFrame`) – input dataframe
	train (`bool`) – flag indicating whether the input is a train dataframe or a test one
Return type:	`None`

get_n_classes_x()¶

returns the number of training/test classes

Returns the highest number of classes in the encoded dataframe, plus 1 if there are numerical values. Each variable is laid out as [one-hot encoding, normalized_value], so a categorical variable becomes [0 0 0 1 0.0], whereas a numerical value becomes [0 0 0 0 0.263].

Returns:	number of training/test classes + 1 (for numerical values)
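The [one-hot encoding, normalized_value] layout can be sketched as follows, assuming every variable occupies the same width (number of classes plus one numeric slot). The function `encode_cell` is a hypothetical helper for illustration and is not part of `DataEncoder`.

```python
def encode_cell(n_classes, class_index=None, normalized_value=0.0):
    """Build one cell: n_classes one-hot slots followed by one numeric slot."""
    cell = [0.0] * n_classes + [0.0]
    if class_index is not None:
        # Categorical variable: set the class slot, leave the numeric slot at 0.
        cell[class_index] = 1.0
    else:
        # Numerical variable: all class slots stay 0, last slot holds the value.
        cell[-1] = normalized_value
    return cell


categorical = encode_cell(4, class_index=3)
numerical = encode_cell(4, normalized_value=0.263)

print(categorical)
print(numerical)
```

With four classes, both cells have five entries, which is why the reported width is the class count plus one.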

get_numerical_limits(header='label')¶

returns the numerical limits for the input header

returns the min and max value from the stored LabelEncoders, using header as index

Parameters:	header – label associated with the data we want to extract min and max from
Returns:	min and max values associated with the column `header`

to_one_hot(data)¶

one hot encoding

transforms the encoded data into the one-hot representation

Parameters:	data (`DataFrame`) – input dataframe
Return type:	`ndarray`
Returns:	one-hot encoded array

class src.encoding.encoding_parser.EncodingParser(encoding, binary_target, task)¶

Bases: object

parses the encoded datasets into a suitable format for the keras models (0-1 float range, one-hot encodable classes etc.), plus minor utils

denormalize_predictions(predictions)¶

denormalizes the predictive_model predictions

denormalizes the predictions using the stored y min and max

Parameters:	predictions (`ndarray`) – predictive_model predictions
Return type:	`ndarray`
Returns:	denormalized predictions
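A minimal sketch of denormalization, assuming the targets were scaled to the 0-1 range with min-max scaling. The scaling formula and the function names are assumptions; the documentation above states only that the stored y min and max are used.

```python
def normalize(values, y_min, y_max):
    """Scale values into the 0-1 range (assumed min-max scaling)."""
    return [(v - y_min) / (y_max - y_min) for v in values]


def denormalize(predictions, y_min, y_max):
    """Invert the min-max scaling using the stored limits."""
    return [p * (y_max - y_min) + y_min for p in predictions]


# Hypothetical limits, standing in for the values stored while parsing targets.
y_min, y_max = 10.0, 30.0

scaled = normalize([10.0, 20.0, 30.0], y_min, y_max)
restored = denormalize(scaled, y_min, y_max)
print(scaled)
print(restored)
```

The limits must come from the training targets; recomputing them from the predictions would map the outputs back onto the wrong scale.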

get_n_classes_x()¶

parse_targets(targets)¶

parses the target dataset

encodes the target dataset based on the encoding given in the init method. Stores min and max value/classes number based on the encoding

Parameters:	targets (`DataFrame`) – input dataset
Return type:	`ndarray`
Returns:	parsed input dataset

parse_testing_dataset(test_data)¶

parses the test dataset

encodes the test dataset based on the encoding given in the init method

Parameters:	test_data (`DataFrame`) – input dataset
Return type:	`ndarray`
Returns:	parsed input dataset

parse_training_dataset(train_data)¶

parses the training dataset

encodes the training dataset based on the encoding given in the init method

Parameters:	train_data (`DataFrame`) – input dataset
Return type:	`ndarray`
Returns:	parsed input dataset

src.encoding.models module¶

class src.encoding.models.DataEncodings¶

Bases: enum.Enum

An enumeration.

LABEL_ENCODER = 'label_encoder'¶

ONE_HOT_ENCODER = 'one_hot'¶

class src.encoding.models.Encoding(id, data_encoding, value_encoding, add_elapsed_time, add_remaining_time, add_executed_events, add_resources_used, add_new_traces, features, prefix_length, padding, task_generation_type)¶

Bases: src.common.models.CommonModel

exception DoesNotExist¶: Bases: django.core.exceptions.ObjectDoesNotExist

exception MultipleObjectsReturned¶: Bases: django.core.exceptions.MultipleObjectsReturned

add_elapsed_time¶: A wrapper for a deferred-loading field. When the value is read from this object the first time, the query is executed.

add_executed_events¶: A wrapper for a deferred-loading field. When the value is read from this object the first time, the query is executed.

add_new_traces¶: A wrapper for a deferred-loading field. When the value is read from this object the first time, the query is executed.

add_remaining_time¶: A wrapper for a deferred-loading field. When the value is read from this object the first time, the query is executed.

add_resources_used¶: A wrapper for a deferred-loading field. When the value is read from this object the first time, the query is executed.

data_encoding¶: A wrapper for a deferred-loading field. When the value is read from this object the first time, the query is executed.

features¶: A placeholder class that provides a way to set the attribute on the model.

get_data_encoding_display(*, field=<django.db.models.fields.CharField: data_encoding>)¶

get_task_generation_type_display(*, field=<django.db.models.fields.CharField: task_generation_type>)¶

get_value_encoding_display(*, field=<django.db.models.fields.CharField: value_encoding>)¶

id¶: A wrapper for a deferred-loading field. When the value is read from this object the first time, the query is executed.

job_set¶

Accessor to the related objects manager on the reverse side of a many-to-one relation.

In the example:

class Child(Model):
    parent = ForeignKey(Parent, related_name='children')

Parent.children is a ReverseManyToOneDescriptor instance.

Most of the implementation is delegated to a dynamically defined manager class built by Django's create_reverse_many_to_one_manager().

labelledlog_set¶

Accessor to the related objects manager on the reverse side of a many-to-one relation.

In the example:

class Child(Model):
    parent = ForeignKey(Parent, related_name='children')

Parent.children is a ReverseManyToOneDescriptor instance.

Most of the implementation is delegated to a dynamically defined manager class built by Django's create_reverse_many_to_one_manager().

objects = <django.db.models.manager.Manager object>¶

padding¶: A wrapper for a deferred-loading field. When the value is read from this object the first time, the query is executed.

prefix_length¶: A wrapper for a deferred-loading field. When the value is read from this object the first time, the query is executed.

task_generation_type¶: A wrapper for a deferred-loading field. When the value is read from this object the first time, the query is executed.

to_dict()¶

Return type:	`dict`

value_encoding¶: A wrapper for a deferred-loading field. When the value is read from this object the first time, the query is executed.

class src.encoding.models.TaskGenerationTypes¶

Bases: enum.Enum

An enumeration.

ALL_IN_ONE = 'all_in_one'¶

ONLY_THIS = 'only'¶

UP_TO = 'up_to'¶

class src.encoding.models.ValueEncodings¶

Bases: enum.Enum

An enumeration.

BOOLEAN = 'boolean'¶

COMPLEX = 'complex'¶

FREQUENCY = 'frequency'¶

LAST_PAYLOAD = 'lastPayload'¶

SIMPLE_INDEX = 'simpleIndex'¶
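Because these are standard `enum.Enum` classes, a stored string value can be turned back into a member by calling the class with it. The snippet below redefines `ValueEncodings` locally, with the values listed above, only so that it runs without the project installed.

```python
from enum import Enum


class ValueEncodings(Enum):
    """Local copy of the enumeration documented above, for illustration."""
    SIMPLE_INDEX = 'simpleIndex'
    BOOLEAN = 'boolean'
    FREQUENCY = 'frequency'
    COMPLEX = 'complex'
    LAST_PAYLOAD = 'lastPayload'


# Look a member up by its stored value, e.g. a string read from a database row.
encoding = ValueEncodings('lastPayload')
print(encoding.name, encoding.value)

# Unknown values raise ValueError, which makes bad configuration fail early.
try:
    ValueEncodings('unknown')
except ValueError as error:
    print(error)
```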

src.encoding.simple_index module¶

src.encoding.simple_index.add_trace_row(trace, encoding, labelling, event_index, column_len, attribute_classifier=None, executed_events=None, resources_used=None, new_traces=None)¶: Row in data frame

src.encoding.simple_index.simple_index(log, labelling, encoding)¶

Return type:	`DataFrame`
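As a rough illustration of a simple-index style row, the sketch below keeps the first `prefix_length` events of a trace, one per column, and optionally pads shorter traces with zeros. It is a simplified assumption about the general technique: `simple_index_row`, `simple_index_columns`, and their parameters are hypothetical and do not reproduce the signatures of `add_trace_row` or `simple_index`.

```python
def simple_index_columns(prefix_length):
    """Column names: the trace id followed by prefix_1 .. prefix_n."""
    return ["trace_id"] + [f"prefix_{i}" for i in range(1, prefix_length + 1)]


def simple_index_row(trace_id, events, prefix_length, zero_padding=True):
    """Encode one trace as its first `prefix_length` events.

    Returns None when the trace is too short and padding is disabled.
    """
    prefix = list(events[:prefix_length])
    missing = prefix_length - len(prefix)
    if missing > 0:
        if not zero_padding:
            return None
        # Pad short traces so every row has the same number of columns.
        prefix += [0] * missing
    return [trace_id] + prefix


print(simple_index_columns(3))
print(simple_index_row("t1", ["register", "check"], 3))
print(simple_index_row("t2", ["register", "check", "pay", "close"], 2))
```

Padding keeps short traces in the dataset at the cost of introducing an artificial value; dropping them keeps every column meaningful but reduces the number of rows.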

Module contents¶