src.encoding package
Subpackages
Submodules
src.encoding.apps module
src.encoding.boolean_frequency module
-
src.encoding.boolean_frequency.boolean(log, event_names, label, encoding)
Return type: DataFrame
-
src.encoding.boolean_frequency.frequency(log, event_names, label, encoding)
Return type: DataFrame
src.encoding.common module
-
src.encoding.common.encode_label_log(run_log, encoding, job_type, labelling, event_names=None, additional_columns=None, fit_encoder=False)
-
src.encoding.common.encode_label_logs(training_log, test_log, job, additional_columns=None)
src.encoding.complex_last_payload module
-
src.encoding.complex_last_payload.complex(log, labelling, encoding, additional_columns)
Return type: DataFrame
-
src.encoding.complex_last_payload.last_payload(log, labelling, encoding, additional_columns)
Return type: DataFrame
src.encoding.encoder module
src.encoding.encoding_container module
-
class src.encoding.encoding_container.EncodingContainer
Bases: src.encoding.encoding_container.EncodingContainer
Inner object describing encoding configuration.
-
static encode(df)
Return type: None
-
static init_label_encoder(df)
Return type: None
-
is_all_in_one()
Return type: bool
-
is_boolean()
Return type: bool
-
is_complex()
Return type: bool
-
is_zero_padding()
Return type: bool
src.encoding.encoding_parser module
-
class src.encoding.encoding_parser.DataEncoder(task, is_targets_dataset=False)
Bases: object
Support class for EncodingParser, tasked with the actual parsing and one-hot encoding.
-
class DataTypes
Bases: enum.Enum
Possible data types for each column.
-
CATEGORICAL = 'categorical'
-
NUMERIC = 'numeric'
-
build_encoders(data)
Builds an encoder for each column.
First the base headers are extracted (prefix_1 -> prefix, org:resources:Amount_1 -> org_resources:Amount), then a dictionary of LabelEncoders is built. Numerical columns store their min and max instead of a LabelEncoder.
Parameters: data (DataFrame) – input dataframe
Return type: None
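The header extraction and per-column encoder construction described above can be illustrated with a minimal, dependency-free sketch. It is not the project's implementation: `base_header`, the dict-of-lists input, and the plain value-to-index mapping (standing in for a LabelEncoder) are assumptions made for illustration, and any colon-to-underscore rewriting of header names is not reproduced.

```python
import re


def base_header(column: str) -> str:
    """Strip a trailing positional suffix such as `_1` (prefix_1 -> prefix)."""
    return re.sub(r"_\d+$", "", column)


def build_encoders(columns: dict) -> dict:
    """Build one encoder per base header (hypothetical stand-in).

    Categorical headers get a value -> integer mapping; numeric headers
    store only their min and max.
    """
    # Pool the values of all columns that share a base header.
    grouped = {}
    for name, values in columns.items():
        grouped.setdefault(base_header(name), []).extend(values)

    encoders = {}
    for header, values in grouped.items():
        numeric = all(
            isinstance(v, (int, float)) and not isinstance(v, bool)
            for v in values
        )
        if numeric:
            encoders[header] = {"min": min(values), "max": max(values)}
        else:
            classes = sorted(set(map(str, values)))
            encoders[header] = {c: i for i, c in enumerate(classes)}
    return encoders


columns = {
    "prefix_1": ["register", "check"],
    "prefix_2": ["check", "pay"],
    "Amount_1": [10.0, 250.0],
    "Amount_2": [40.0, 100.0],
}
encoders = build_encoders(columns)
```

Pooling the values of `prefix_1` and `prefix_2` under one header means every position of a trace shares the same class indexes.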
-
encode_data(data, train=True)
Encodes the input data.
Performs the actual encoding using the built encoders. Each column is encoded according to its type: categorical columns are mapped to classes, numerical columns are normalized.
Parameters:
- data (DataFrame) – input dataframe
- train (bool) – flag indicating whether the input is a training dataframe or a test one
Return type: None
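As a rough illustration of the two encodings mentioned above, the following sketch maps categorical values to class indexes and min-max scales numerical values. The function names are hypothetical, and the handling of values unseen during training (which the `train` flag may govern) is not documented here and is left out.

```python
def encode_categorical(values, classes):
    """Map each value to its class index; classes come from the training data."""
    index = {c: i for i, c in enumerate(classes)}
    return [index[v] for v in values]


def normalize_numeric(values, lo, hi):
    """Min-max scale values into [0, 1] using limits stored from training."""
    span = hi - lo
    if span == 0:
        # Constant column: avoid dividing by zero.
        return [0.0 for _ in values]
    return [(v - lo) / span for v in values]


encoded = encode_categorical(["check", "pay", "check"], ["check", "pay", "register"])
scaled = normalize_numeric([10.0, 130.0, 250.0], lo=10.0, hi=250.0)
```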
-
get_n_classes_x()
Returns the number of training/test classes.
Returns the highest number of classes for the encoded dataframe, adding 1 if there are numerical values. The structure is [one-hot encoding, normalized_value] for each variable, such that a categorical variable becomes [0 0 0 1 0.0] whereas a numerical value becomes [0 0 0 0 0 0.263].
Returns: number of training/test classes + 1 (for numerical values)
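The [one-hot encoding, normalized_value] layout can be sketched as follows. This is an illustration under assumptions rather than the library's code: `to_slot` is a hypothetical helper, and every slot here has the same width of `n_classes + 1`.

```python
def to_slot(n_classes, class_index=None, numeric_value=None):
    """Return one slot: n_classes one-hot positions plus one numeric position."""
    slot = [0.0] * (n_classes + 1)
    if class_index is not None:
        slot[class_index] = 1.0      # categorical: set the class position
    if numeric_value is not None:
        slot[-1] = numeric_value     # numerical: fill only the last position
    return slot


categorical = to_slot(4, class_index=3)
numeric = to_slot(4, numeric_value=0.263)
```

With four classes, each variable occupies five positions, which is where the "+ 1 (for numerical values)" in the return value comes from.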
-
get_numerical_limits(header='label')
Returns the numerical limits for the input header.
Returns the min and max values from the stored LabelEncoders, using header as index.
Parameters: header – label associated with the data we want to extract min and max from
Returns: min and max values associated with the column header
-
to_one_hot(data)
One-hot encoding.
Transforms the encoded data into the one-hot representation.
Parameters: data (DataFrame) – input dataframe
Return type: ndarray
Returns: one-hot encoded array
-
class src.encoding.encoding_parser.EncodingParser(encoding, binary_target, task)
Bases: object
Parses the encoded datasets into a suitable format for the Keras models (0-1 float range, one-hot encodable classes, etc.), plus minor utilities.
-
denormalize_predictions(predictions)
Denormalizes the predictive_model predictions.
Denormalizes the predictions using the stored y min and max.
Parameters: predictions (ndarray) – predictive_model predictions
Return type: ndarray
Returns: denormalized predictions
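Assuming the targets were min-max scaled into the 0-1 range, denormalization is the inverse of that scaling. The sketch below uses plain Python lists and a hypothetical `denormalize` helper; the actual method operates on an ndarray and reads the limits from the parser's stored state.

```python
def denormalize(predictions, y_min, y_max):
    """Invert min-max scaling: map values in [0, 1] back to [y_min, y_max]."""
    span = y_max - y_min
    return [p * span + y_min for p in predictions]


restored = denormalize([0.0, 0.5, 1.0], y_min=10.0, y_max=250.0)
```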
-
get_n_classes_x()
-
parse_targets(targets)
Parses the target dataset.
Encodes the target dataset based on the encoding given in the init method. Stores the min and max values or the number of classes, depending on the encoding.
Parameters: targets (DataFrame) – input dataset
Return type: ndarray
Returns: parsed input dataset
-
parse_testing_dataset(test_data)
Parses the test dataset.
Encodes the test dataset based on the encoding given in the init method.
Parameters: test_data (DataFrame) – input dataset
Return type: ndarray
Returns: parsed input dataset
-
parse_training_dataset(train_data)
Parses the training dataset.
Encodes the training dataset based on the encoding given in the init method.
Parameters: train_data (DataFrame) – input dataset
Return type: ndarray
Returns: parsed input dataset
src.encoding.models module
-
class src.encoding.models.DataEncodings
Bases: enum.Enum
An enumeration.
-
LABEL_ENCODER = 'label_encoder'
-
ONE_HOT_ENCODER = 'one_hot'
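For orientation, a standalone enum with the same two members behaves as shown below. It is redefined locally so the snippet runs without the project installed; how the project converts stored strings to members is not documented here, and lookup by value is simply standard `enum.Enum` behaviour.

```python
from enum import Enum


class DataEncodings(Enum):
    # Local copy of the documented members, for illustration only.
    LABEL_ENCODER = 'label_encoder'
    ONE_HOT_ENCODER = 'one_hot'


# Look a member up from its stored string value.
parsed = DataEncodings('one_hot')
stored = DataEncodings.LABEL_ENCODER.value
```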
-
class src.encoding.models.Encoding(id, data_encoding, value_encoding, add_elapsed_time, add_remaining_time, add_executed_events, add_resources_used, add_new_traces, features, prefix_length, padding, task_generation_type)
Bases: src.common.models.CommonModel
-
exception DoesNotExist
Bases: django.core.exceptions.ObjectDoesNotExist
-
exception MultipleObjectsReturned
Bases: django.core.exceptions.MultipleObjectsReturned
-
add_elapsed_time
A wrapper for a deferred-loading field. When the value is read from this object the first time, the query is executed.
-
add_executed_events
A wrapper for a deferred-loading field. When the value is read from this object the first time, the query is executed.
-
add_new_traces
A wrapper for a deferred-loading field. When the value is read from this object the first time, the query is executed.
-
add_remaining_time
A wrapper for a deferred-loading field. When the value is read from this object the first time, the query is executed.
-
add_resources_used
A wrapper for a deferred-loading field. When the value is read from this object the first time, the query is executed.
-
data_encoding
A wrapper for a deferred-loading field. When the value is read from this object the first time, the query is executed.
-
features
A placeholder class that provides a way to set the attribute on the model.
-
get_data_encoding_display(*, field=<django.db.models.fields.CharField: data_encoding>)
-
get_task_generation_type_display(*, field=<django.db.models.fields.CharField: task_generation_type>)
-
get_value_encoding_display(*, field=<django.db.models.fields.CharField: value_encoding>)
-
id
A wrapper for a deferred-loading field. When the value is read from this object the first time, the query is executed.
-
job_set
Accessor to the related objects manager on the reverse side of a many-to-one relation.
In the example:
class Child(Model):
    parent = ForeignKey(Parent, related_name='children')
Parent.children is a ReverseManyToOneDescriptor instance.
Most of the implementation is delegated to a dynamically defined manager class built by create_forward_many_to_many_manager() defined below.
-
labelledlog_set
Accessor to the related objects manager on the reverse side of a many-to-one relation.
In the example:
class Child(Model):
    parent = ForeignKey(Parent, related_name='children')
Parent.children is a ReverseManyToOneDescriptor instance.
Most of the implementation is delegated to a dynamically defined manager class built by create_forward_many_to_many_manager() defined below.
-
objects = <django.db.models.manager.Manager object>
-
padding
A wrapper for a deferred-loading field. When the value is read from this object the first time, the query is executed.
-
prefix_length
A wrapper for a deferred-loading field. When the value is read from this object the first time, the query is executed.
-
task_generation_type
A wrapper for a deferred-loading field. When the value is read from this object the first time, the query is executed.
-
to_dict()
Return type: dict
-
value_encoding
A wrapper for a deferred-loading field. When the value is read from this object the first time, the query is executed.
src.encoding.simple_index module
-
src.encoding.simple_index.add_trace_row(trace, encoding, labelling, event_index, column_len, attribute_classifier=None, executed_events=None, resources_used=None, new_traces=None)
Row in data frame
-
src.encoding.simple_index.simple_index(log, labelling, encoding)
Return type: DataFrame