src.encoding package¶
Subpackages¶
Submodules¶
src.encoding.apps module¶
src.encoding.boolean_frequency module¶
-
src.encoding.boolean_frequency.
boolean
(log, event_names, label, encoding)¶ Return type: DataFrame
-
src.encoding.boolean_frequency.
frequency
(log, event_names, label, encoding)¶ Return type: DataFrame
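As a rough illustration of how the two encodings differ, the sketch below builds one row per trace with one entry per event name: the boolean row records whether each event occurs, the frequency row records how often. The list-based trace and the helpers `boolean_row` and `frequency_row` are assumptions made for illustration; they are not the signatures of `boolean` and `frequency`, which take a log and return a DataFrame.

```python
from collections import Counter


def boolean_row(trace, event_names):
    """Return True/False per event name: does the event occur in the trace?"""
    seen = set(trace)
    return [name in seen for name in event_names]


def frequency_row(trace, event_names):
    """Return how many times each event name occurs in the trace."""
    counts = Counter(trace)  # missing names count as 0
    return [counts[name] for name in event_names]


event_names = ["register", "check", "pay"]  # hypothetical activity names
trace = ["register", "check", "check"]      # hypothetical trace

print(boolean_row(trace, event_names))    # [True, True, False]
print(frequency_row(trace, event_names))  # [1, 2, 0]
```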
src.encoding.common module¶
-
src.encoding.common.
encode_label_log
(run_log, encoding, job_type, labelling, event_names=None, additional_columns=None, fit_encoder=False)¶
-
src.encoding.common.
encode_label_logs
(training_log, test_log, job, additional_columns=None)¶
src.encoding.complex_last_payload module¶
-
src.encoding.complex_last_payload.
complex
(log, labelling, encoding, additional_columns)¶ Return type: DataFrame
-
src.encoding.complex_last_payload.
last_payload
(log, labelling, encoding, additional_columns)¶ Return type: DataFrame
src.encoding.encoder module¶
src.encoding.encoding_container module¶
-
class
src.encoding.encoding_container.
EncodingContainer
¶ Bases:
src.encoding.encoding_container.EncodingContainer
Inner object describing encoding configuration.
-
static
encode
(df)¶ Return type: None
-
static
init_label_encoder
(df)¶ Return type: None
-
is_all_in_one
()¶ Return type: bool
-
is_boolean
()¶ Return type: bool
-
is_complex
()¶ Return type: bool
-
is_zero_padding
()¶ Return type: bool
src.encoding.encoding_parser module¶
-
class
src.encoding.encoding_parser.
DataEncoder
(task, is_targets_dataset=False)¶ Bases:
object
Support class for EncodingParser, responsible for the actual parsing and one-hot encoding.
-
class
DataTypes
¶ Bases:
enum.Enum
possible data types for each column
-
CATEGORICAL
= 'categorical'¶
-
NUMERIC
= 'numeric'¶
-
build_encoders
(data)¶ Builds an encoder for each column.
First, the base headers are extracted (prefix_1 -> prefix, org:resources:Amount_1 -> org_resources:Amount); then a dictionary of LabelEncoders is built. Numerical data stores min and max instead of a LabelEncoder.
Parameters: data (DataFrame) – input dataframe
Return type: None
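The two steps described above (strip the positional suffix, then build one encoder per base header) can be sketched with the standard library alone. This is a minimal sketch under stated assumptions: `base_header` and `build_encoders_sketch` are hypothetical helpers, a plain dict stands in for the DataFrame, and a value-to-integer mapping stands in for a LabelEncoder. It is not the method's actual implementation.

```python
import re


def base_header(column):
    """Strip a trailing _<number> suffix, e.g. 'prefix_1' -> 'prefix'."""
    return re.sub(r"_\d+$", "", column)


def build_encoders_sketch(columns):
    """Build one encoder per base header from a {column name: values} dict.

    Categorical columns get a value -> integer class mapping;
    numeric columns store their (min, max) range instead.
    """
    grouped = {}
    for name, values in columns.items():
        grouped.setdefault(base_header(name), []).extend(values)

    encoders = {}
    for header, values in grouped.items():
        if all(isinstance(v, (int, float)) for v in values):
            encoders[header] = (min(values), max(values))
        else:
            encoders[header] = {v: i for i, v in enumerate(sorted(set(values)))}
    return encoders


columns = {  # hypothetical data
    "prefix_1": ["a", "b"],
    "prefix_2": ["b", "c"],
    "amount_1": [10.0, 30.0],
    "amount_2": [20.0, 50.0],
}
encoders = build_encoders_sketch(columns)
print(encoders["prefix"])  # {'a': 0, 'b': 1, 'c': 2}
print(encoders["amount"])  # (10.0, 50.0)
```

Grouping by base header means that every position of the same attribute shares one encoder, so a value seen at `prefix_1` maps to the same class at `prefix_2`.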
-
encode_data
(data, train=True)¶ Encodes the input data.
Performs the actual data encoding using the built encoders. Each column type gets the appropriate encoding: class encoding for categorical columns, normalization for numeric ones.
Parameters: - data (DataFrame) – input dataframe
- train (bool) – flag indicating whether the input is a training dataframe or a test one
Return type: None
-
get_n_classes_x
()¶ Returns the number of training/test classes.
Returns the highest number of classes for the encoded dataframe, adding 1 if there are numerical values. The structure is [one-hot encoding, normalized_value] for each variable, so that a categorical variable becomes [0 0 0 1 0.0], whereas a numerical value becomes [0 0 0 0 0 0.263].
Returns: number of training/test classes + 1 (for numerical values)
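The [one-hot encoding, normalized_value] layout can be illustrated as follows. This is only a sketch: the helpers `encode_categorical` and `encode_numeric`, the fixed width of n_classes + 1, and min-max scaling for the numeric slot are all assumptions, not the library's code.

```python
def encode_categorical(class_index, n_classes):
    """One-hot slots for the class, followed by an unused numeric slot."""
    vector = [0.0] * (n_classes + 1)
    vector[class_index] = 1.0
    return vector


def encode_numeric(value, lower, upper, n_classes):
    """All one-hot slots zero, followed by the min-max normalized value."""
    vector = [0.0] * (n_classes + 1)
    vector[-1] = (value - lower) / (upper - lower)  # assumes upper > lower
    return vector


n_classes = 4  # hypothetical number of classes

print(encode_categorical(3, n_classes))             # [0.0, 0.0, 0.0, 1.0, 0.0]
print(encode_numeric(25.0, 0.0, 100.0, n_classes))  # [0.0, 0.0, 0.0, 0.0, 0.25]
```

Reserving the last slot for numeric values keeps every variable the same width, which is why the class count is increased by 1 when numerical values are present.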
-
get_numerical_limits
(header='label')¶ Returns the numerical limits for the input header.
Returns the min and max values from the stored LabelEncoders, using header as the index.
Parameters: header – label associated with the data to extract min and max from
Returns: min and max values associated with the column header
-
to_one_hot
(data)¶ One-hot encoding.
Transforms the encoded data into the one-hot representation.
Parameters: data (DataFrame) – input dataframe
Return type: ndarray
Returns: one-hot encoded array
-
class
src.encoding.encoding_parser.
EncodingParser
(encoding, binary_target, task)¶ Bases:
object
Parses the encoded datasets into a format suitable for the Keras models (0-1 float range, one-hot encodable classes, etc.), plus minor utilities.
-
denormalize_predictions
(predictions)¶ Denormalizes the predictive_model predictions.
Denormalizes the predictions using the stored y min and max.
Parameters: predictions (ndarray) – predictive_model predictions
Return type: ndarray
Returns: denormalized predictions
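Assuming the targets were scaled to the 0-1 range with min-max normalization (the exact formula is not stated here), denormalization is the inverse mapping. In this minimal sketch, `normalize` and `denormalize` are hypothetical helpers operating on plain lists rather than ndarrays.

```python
def normalize(values, y_min, y_max):
    """Scale values to the 0-1 range (assumes y_max > y_min)."""
    return [(v - y_min) / (y_max - y_min) for v in values]


def denormalize(predictions, y_min, y_max):
    """Map 0-1 predictions back to the original target range."""
    return [p * (y_max - y_min) + y_min for p in predictions]


y_min, y_max = 10.0, 50.0     # hypothetical stored limits
targets = [10.0, 30.0, 50.0]  # hypothetical target values

scaled = normalize(targets, y_min, y_max)
restored = denormalize(scaled, y_min, y_max)

print(scaled)    # [0.0, 0.5, 1.0]
print(restored)  # [10.0, 30.0, 50.0]
```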
-
get_n_classes_x
()¶
-
parse_targets
(targets)¶ Parses the target dataset.
Encodes the target dataset based on the encoding given in the init method. Stores the min and max values or the number of classes, depending on the encoding.
Parameters: targets (DataFrame) – input dataset
Return type: ndarray
Returns: parsed input dataset
-
parse_testing_dataset
(test_data)¶ Parses the test dataset.
Encodes the test dataset based on the encoding given in the init method.
Parameters: test_data (DataFrame) – input dataset
Return type: ndarray
Returns: parsed input dataset
-
parse_training_dataset
(train_data)¶ Parses the training dataset.
Encodes the training dataset based on the encoding given in the init method.
Parameters: train_data (DataFrame) – input dataset
Return type: ndarray
Returns: parsed input dataset
-
src.encoding.models module¶
-
class
src.encoding.models.
DataEncodings
¶ Bases:
enum.Enum
An enumeration.
-
LABEL_ENCODER
= 'label_encoder'¶
-
ONE_HOT_ENCODER
= 'one_hot'¶
-
class
src.encoding.models.
Encoding
(id, data_encoding, value_encoding, add_elapsed_time, add_remaining_time, add_executed_events, add_resources_used, add_new_traces, features, prefix_length, padding, task_generation_type)¶ Bases:
src.common.models.CommonModel
-
exception
DoesNotExist
¶ Bases:
django.core.exceptions.ObjectDoesNotExist
-
exception
MultipleObjectsReturned
¶ Bases:
django.core.exceptions.MultipleObjectsReturned
-
add_elapsed_time
¶ A wrapper for a deferred-loading field. When the value is read from this object the first time, the query is executed.
-
add_executed_events
¶ A wrapper for a deferred-loading field. When the value is read from this object the first time, the query is executed.
-
add_new_traces
¶ A wrapper for a deferred-loading field. When the value is read from this object the first time, the query is executed.
-
add_remaining_time
¶ A wrapper for a deferred-loading field. When the value is read from this object the first time, the query is executed.
-
add_resources_used
¶ A wrapper for a deferred-loading field. When the value is read from this object the first time, the query is executed.
-
data_encoding
¶ A wrapper for a deferred-loading field. When the value is read from this object the first time, the query is executed.
-
features
¶ A placeholder class that provides a way to set the attribute on the model.
-
get_data_encoding_display
(*, field=<django.db.models.fields.CharField: data_encoding>)¶
-
get_task_generation_type_display
(*, field=<django.db.models.fields.CharField: task_generation_type>)¶
-
get_value_encoding_display
(*, field=<django.db.models.fields.CharField: value_encoding>)¶
-
id
¶ A wrapper for a deferred-loading field. When the value is read from this object the first time, the query is executed.
-
job_set
¶ Accessor to the related objects manager on the reverse side of a many-to-one relation.
In the example:
    class Child(Model):
        parent = ForeignKey(Parent, related_name='children')
Parent.children is a ReverseManyToOneDescriptor instance.
Most of the implementation is delegated to a dynamically defined manager class built by create_forward_many_to_many_manager() defined below.
-
labelledlog_set
¶ Accessor to the related objects manager on the reverse side of a many-to-one relation.
In the example:
    class Child(Model):
        parent = ForeignKey(Parent, related_name='children')
Parent.children is a ReverseManyToOneDescriptor instance.
Most of the implementation is delegated to a dynamically defined manager class built by create_forward_many_to_many_manager() defined below.
-
objects
= <django.db.models.manager.Manager object>¶
-
padding
¶ A wrapper for a deferred-loading field. When the value is read from this object the first time, the query is executed.
-
prefix_length
¶ A wrapper for a deferred-loading field. When the value is read from this object the first time, the query is executed.
-
task_generation_type
¶ A wrapper for a deferred-loading field. When the value is read from this object the first time, the query is executed.
-
to_dict
()¶ Return type: dict
-
value_encoding
¶ A wrapper for a deferred-loading field. When the value is read from this object the first time, the query is executed.
src.encoding.simple_index module¶
-
src.encoding.simple_index.
add_trace_row
(trace, encoding, labelling, event_index, column_len, attribute_classifier=None, executed_events=None, resources_used=None, new_traces=None)¶ Row in data frame
-
src.encoding.simple_index.
simple_index
(log, labelling, encoding)¶ Return type: DataFrame
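For orientation, a simple-index style encoding is commonly understood as one row per trace, with the event at each position stored in its own prefix_N column. The sketch below follows that reading and is an assumption, not the behaviour of `simple_index` itself: the helper `simple_index_row`, the `trace_id` column, and the pad value of 0 are hypothetical.

```python
def simple_index_row(trace_id, events, prefix_length, pad_value=0):
    """Build one row with columns prefix_1..prefix_<prefix_length>.

    Traces longer than prefix_length are truncated; shorter ones are
    padded with pad_value.
    """
    prefix = events[:prefix_length]
    padded = prefix + [pad_value] * (prefix_length - len(prefix))
    row = {"trace_id": trace_id}
    for position, event in enumerate(padded, start=1):
        row[f"prefix_{position}"] = event
    return row


row = simple_index_row("case_1", ["register", "check"], prefix_length=3)
print(row)
# {'trace_id': 'case_1', 'prefix_1': 'register', 'prefix_2': 'check', 'prefix_3': 0}
```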