MODERN ML METHODS FOR FAILURE
ANALYSIS AUTOMATION ON CLASSICAL ERP
SYSTEMS
MÉTODOS MODERNOS DE APRENDIZAJE AUTOMÁTICO
PARA LA AUTOMATIZACIÓN DEL ANÁLISIS DE FALLAS EN
SISTEMAS ERP CLÁSICOS
Mario Enrique Vallejo Venegas
University of Guadalajara, México
Ma. Del Rocio Maciel Arellano
University of Guadalajara, México
Victor Manuel Larios Rosillo
University of Guadalajara, México
Jose Antonio Orizaga Trejo
University of Guadalajara, México
Jesus Raul Beltran Ramirez
University of Guadalajara, México
DOI: https://doi.org/10.37811/cl_rcm.v8i5.14782
Modern ML methods for failure analysis automation on classical ERP
systems
Mario Enrique Vallejo Venegas1
mario_vallejo@hotmail.com
https://orcid.org/0000-0002-5607-766X
Harman International, SAP SCM & Analytics
University of Guadalajara, PhD in IT
Mexico
Ma. Del Rocio Maciel Arellano
ma.maciel@academicos.udg.mx
https://orcid.org/0000-0002-5548-2073
University of Guadalajara, PhD in IT.
Smart Cities Innovation Center.
Mexico
Victor Manuel Larios Rosillo
victor.m.lariosrosillo@ieee.org
https://orcid.org/0000-0002-2899-724X
University of Guadalajara, PhD in IT.
Smart Cities Innovation Center.
Mexico
Jose Antonio Orizaga Trejo
jose.orizaga@academicos.udg.mx
https://orcid.org/0000-0001-5649-5514
University of Guadalajara, PhD in IT.
Smart Cities Innovation Center.
Mexico
Jesus Raul Beltran Ramirez
jrbeltran@academicos.udg.mx
https://orcid.org/0000-0001-8645-9258
University of Guadalajara, PhD in IT.
Smart Cities Innovation Center.
Mexico
ABSTRACT
IT teams that work with ERP systems typically have little or no knowledge of artificial intelligence, and more specifically of Machine Learning (ML) and Natural Language Processing (NLP) models, because their working environment is mostly focused on supporting commercial ERP systems such as SAP, Oracle ERP, or Microsoft Dynamics. They therefore concentrate on functional aspects and, occasionally, on proprietary development environments such as the ABAP language for SAP systems, in order to build very specific customizations. The current research work aims to provide detailed insights into the state-of-the-art ML and NLP models that could be used in a classical ERP environment. It also pursues the objective of investigating the technical feasibility, and the ease or difficulty, of adding artificial intelligence to a classic ERP system that does not have it, with the intention of automating the analysis of errors and failures that, due to their volume, are difficult to manage by human operators. The aim is therefore to achieve significant savings in the time and IT human resources consumed by failure analysis. Another objective is to share with the reader several lessons learned by the researchers while reviewing the available literature and while experimenting and testing with several of the existing models available in the Python and C# languages and comparing the two technology platforms. This should provide valuable information to IT managers, project managers, developers, and testers who normally work with ERP systems rather than with AI and are therefore not very familiar with it.
Keywords: artificial intelligence, machine learning, automation, supervised learning, ERP
1 Principal author
Correspondence: mario_vallejo@hotmail.com
Métodos modernos de aprendizaje automático para la automatización del
análisis de fallas en sistemas ERP clásicos
RESUMEN
Normalmente, los equipos de TI que trabajan con sistemas ERP tienen poco o casi ningún conocimiento
de inteligencia artificial y, más concretamente, de modelos de Aprendizaje Automático (ML) y
Procesamiento del Lenguaje Natural (NLP) porque su entorno de trabajo se centra principalmente en
dar soporte a sistemas ERP normalmente comerciales como SAP, Oracle ERP, Microsoft Dynamics,
etc., por lo que se enfocan en los aspectos funcionales y, a veces, en algunos entornos de desarrollo
propietarios como el lenguaje ABAP para sistemas SAP, para realizar personalizaciones muy
específicas. El presente trabajo de investigación pretende dar una visión detallada del estado del arte de
los modelos ML y NLP que podrían ser utilizados en un entorno ERP clásico y además persigue el
objetivo de investigar la viabilidad técnica y la facilidad o dificultad de dotar de inteligencia artificial a
un sistema ERP clásico que no disponga de ella con la intención de automatizar el análisis de errores y
fallos en el sistema ERP que por su volumen es difícil de gestionar por operadores humanos. Se persigue
pues un ahorro significativo de tiempo y recursos humanos de IT consumidos en el análisis de fallos.
Otro objetivo es compartir con el lector varias lecciones aprendidas por los investigadores mientras
investigaban la literatura disponible y mientras experimentaban y probaban con varios de los modelos
existentes disponibles en los lenguajes Python y C# mientras comparaban diferentes plataformas
tecnológicas. Esto debería proporcionar información valiosa a los responsables de TI, jefes de proyecto,
desarrolladores y probadores que normalmente trabajan con sistemas ERP y no con IA, por lo que no
están tan familiarizados con ella.
Palabras clave: artificial intelligence, machine learning, automation, supervised learning, erp
Artículo recibido 08 septiembre 2024
Aceptado para publicación: 12 octubre 2024
INTRODUCTION
The goal of the current research work is to provide details of the state-of-the-art ML and NLP models that could be used in a classical ERP environment. It also aims to investigate the technical feasibility (the ease or difficulty) of providing AI to a classic ERP system that does not have it. The objective is to automate failure analysis in the ERP system, where a high number of failures is difficult to manage by human operators (IT support engineers). A second goal is to achieve significant savings in the time and IT human resources consumed by failure analysis. The following paragraphs provide a detailed discussion of ERP systems, AI, ML, DL, and NLP models, to give the reader some background and a knowledge foundation. Later, in the methodology section, all the experimentation steps carried out for this research are explained.
ERP Systems
The industrial revolution brought with it the emergence of large corporations in the United States and Europe. These large-scale organizations, many of them dedicated to manufacturing a diversity of products, needed to implement inventory and manufacturing controls, which in turn required adequate inventories of raw materials. Thus appeared the planning of material requirements and the need for information systems for inventory control, material purchasing, and production planning. These information systems evolved into MRP (materials requirements planning) systems around the 1960s and later into ERP (enterprise resource planning) systems. At about the same time, expert systems appeared as the first attempt to imitate human intelligence with computers for decision-making processes in large corporations. The integration between ERP and AI could be accomplished as suggested in this work (Figure 1).
Figure 1. An example of ERP and AI integration
Artificial intelligence
Artificial Intelligence (AI) has been a field of research for several decades, but in the twenty-first century it has experienced a special boom, largely due to lower hardware costs and increased computational power. In addition, it has been able to capitalize on the market: it has ceased to be a purely academic and research subject and has become a way of doing business and of employing thousands of software engineers and many other specialists. AI is intended to simulate human intelligence processes; it is not aimed at duplicating or replacing them. Many thought leaders in the AI space even think AI's goal should be to augment human capabilities.
What is intelligence? As per John McCarthy (McCarthy, 1970, What is AI? / Basic Questions), "Intelligence is the computational part of the ability to achieve goals in the world. Varying kinds and degrees of intelligence occur in people, many animals and some machines."
What is artificial intelligence? John McCarthy (McCarthy, 1970, What is AI? / Applications of AI) defines AI as follows: "It is the science and engineering of making intelligent machines, especially intelligent computer programs. It is related to the similar task of using computers to understand human intelligence, but AI does not have to confine itself to methods that are biologically observable."
Brief history of artificial intelligence. The generations of people who, in this third decade of the 21st century, work with information technologies, software development, data science, artificial intelligence, machine learning, and other disciplines related to data processing with computers may think that artificial intelligence was born in this century, but that is not so. In fact, the discipline and field of knowledge currently known as 'artificial intelligence' is considered to have been born in 1956, in a workshop held at Dartmouth College (Chow, 2021), in the city of Hanover, in the state of New Hampshire in the United States of America. The workshop brought together the most brilliant minds of the time from various disciplines such as cognitive science and computer science. It was held in the summer of 1956 and was called the 'Dartmouth Summer Research Project on Artificial Intelligence'. The organizers, among them Assistant Professor John McCarthy, thought that if they could get all the eminent students and professors interested in the subject together, to devote time to it and avoid distractions, they could make real progress. Even before the workshop took place, they were somewhat disappointed with the research papers submitted to the journal Annals of Mathematics Studies; it was thought that the contributors to the journal for some reason did not focus on the potential of computers to possess intelligence, and this prompted the workshop to be organized, bringing together a group of eminent researchers to clarify and develop ideas about thinking machines. Professor McCarthy approached the Rockefeller Foundation to request funding for a seminar at Dartmouth for 10 participants, and in the summer of 1955 he formalized the project with his friends and colleagues Marvin Minsky of Harvard University, Nathaniel Rochester of IBM Corporation, and Claude Shannon of Bell Telephone Laboratories, to lay the foundations for artificial intelligence. The key idea of the workshop was that any feature or aspect of learning and human intelligence could be described in a simple but very precise way, as if it were a set of instructions to be followed step by step, a procedure or an algorithm that a computer could then simulate with a program in some programming language.
In 1956, what many consider to be the first AI computer program was developed, intended to mimic humans in their problem-solving ability. This program was called the Logic Theorist, and its code was written by the programmers Allen Newell and Herbert A. Simon.
Later, in the 1960s and 1970s, "expert systems" were developed. To this day they use a series of rules and knowledge bases to solve specific problems in various fields, but ultimately they rely on deterministic logic and classical programming, based on typical concepts such as variables, iterative structures, and single and multiple conditionals, to translate business rules into a series of conditionals and result assignments in some procedural programming language.
The following decade, from the 1970s to the 1980s, is a stage known as "the AI winter" (Thorwirth, 2021), in which AI experienced a period of failure, reduced funding, and even loss of public interest. The previous decade had created great expectations for AI, possibly exaggerating its capabilities in order to achieve attractive economic profits from sales of AI software. In practice, however, there was little real progress in applying AI to real business applications, and this led to a loss of credibility and of interest in investing money and resources in AI software.
In 1980, despite the AI winter, a new idea emerged that rekindled interest in artificial neural networks: "back-propagation". Neural networks usually have multiple intermediate layers, called "hidden layers", and back-propagation works much like the "chain rule" used for differentiation in calculus: there may be a function that in turn calls another function, that is, nested functions, where the innermost function is calculated first and delivers its result to the outer function that called it, which then works with that result and generates the new final value. Back-propagation applies this idea in reverse, propagating the error from the output layer back through the hidden layers to adjust the weights.
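As a brief worked illustration of the nested-function idea, for a two-level composition the chain rule reads:

\[
  y = f\big(g(x)\big), \qquad \frac{dy}{dx} = f'\big(g(x)\big)\,g'(x)
\]

that is, the derivative of the outer function, evaluated at the inner result, is multiplied by the derivative of the inner function; back-propagation repeats this product layer by layer, from the output of the network back towards its input.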
Then, in the 1990s and 2000s, support vector machines, decision trees, and Bayesian networks became popular. This was due to the availability of large datasets and the increased computational power of hardware, driven by cheaper microprocessors and memory and by the advent of graphics processing units (GPUs) for more efficient image processing on computers. It was this same cheapening of electronics and hardware that, in the decade from 2000 to 2010, set the tone for the resurgence of deep learning and neural networks of all types (convolutional or CNN, recurrent or RNN, etc.), which made it possible to apply AI to image recognition and natural language processing.
From 2010 onwards, AI has come into more general use as it has been introduced into industry and everyday life, thanks to social media and mobile devices that are now accessible to everyone. In 2024, when this paper is written, use cases are common in situations like WhatsApp chatbots for natural language processing on all kinds of commercial and government Internet sites; computer vision, with applications like the facial recognition used to unlock cell phones; robotics, mostly used in industrial environments; and other sectors such as healthcare, finance, transportation, and entertainment. In the latter it is very common to see platforms such as Netflix and Spotify analyzing customer preferences and consumption habits and making suggestions based on these consumption patterns. Large Language Models (LLM) are the most recent invention, used in generative AI with applications like ChatGPT.
Machine learning techniques
Machine Learning (ML) is a sub-field of AI that specializes in applying statistical methods to make classifications or predictions. ML works with data sets, and over the decades several dozen different computer programs, called models, have been developed and tested. These models perform statistical calculations by applying methods that statisticians and mathematicians have developed since at least the middle of the 20th century. The logic or procedure used to perform these calculations is called an algorithm, and each model is based on a given set of parameters. When you want to work with a model, you feed it with data sets. Before a dataset can be used, it is necessary to analyze its columns of information and perform data cleaning or data preparation. For example, if there are columns containing dates, they must be put into a format that the model can read and interpret as a date. If a column has outliers, that is, values that lie too far from the arithmetic mean (for example, 2.5 standard deviations or more), it is necessary to decide what to do with them, because they can significantly affect the results of the calculations and the behavior of the model. In many cases the outliers are removed once their nature is understood; in other cases a different strategy is chosen, such as replacing them with the mean or the mode. Another very common scenario is missing values. This happens frequently, for example, in medicine, where data are inconsistent because each hospital or medical office, public or private, may have different processes and procedures for capturing and processing data on patients and their diseases, diagnoses, treatments, laboratory studies, follow-up information, and so on. For all these reasons, it is necessary to standardize datasets in order to curate them and have cleaner information to feed the ML models.
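A minimal sketch of this kind of preparation is shown below, using pandas; the column names, the toy values, and the 2.5-standard-deviation threshold are assumptions made only for illustration.

# Illustrative data-preparation sketch with hypothetical columns.
import numpy as np
import pandas as pd

runtimes = [120, 95, 110, 105, 98, 102, 99, 101, 50000]      # last value is an outlier
dates = ["2023-10-0%d" % d for d in range(1, 9)] + [None]    # one missing date
df = pd.DataFrame({"job_date": dates, "runtime_sec": runtimes})

# Put the date column into a format the model can interpret as a date.
df["job_date"] = pd.to_datetime(df["job_date"], errors="coerce")

# Flag values farther than 2.5 standard deviations from the mean.
mean, std = df["runtime_sec"].mean(), df["runtime_sec"].std()
outliers = (df["runtime_sec"] - mean).abs() > 2.5 * std

# One possible strategy: replace outliers (and missing values) with the median.
df.loc[outliers, "runtime_sec"] = np.nan
df["runtime_sec"] = df["runtime_sec"].fillna(df["runtime_sec"].median())

print(df)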
Once the data is clean, the next step is to feed it to the model, expecting it to perform either a classification task or a prediction (prognosis) task. In ML, the model is said to learn from its experience with the data, without being explicitly programmed (with specific code) to recognize every possible scenario. Because the model learns, different types of learning can be distinguished. Many authors state that there are at least three types of learning, so there is a consensus on this conceptualization (Figure 2) (Sarker, 2021) (Mahesh, 2018) (Edwards, 2020):
1. Supervised learning
2. Unsupervised learning
3. Reinforced learning
Figure 2. An overview of ML models: Supervised/Unsupervised/Reinforcement learning and general
use case
In the following sections, an explanation of the different learning types is provided, together with the main models in use at present.
Supervised learning. In this approach, an output is mapped from an input, and the function that achieves this is trained with a sample of input-output pairs. The output is a tag or label that corresponds to a group of input values, and the training data is a collection of such input-output pairs. The model is considered to have learned when prediction accuracy is maximized. Supervised learning is most commonly used for classification tasks, for example text classification, such as classifying a series of words to determine whether each one is a verb, noun, adjective, preposition, adverb, etc. IBM (IBM, 2023, What is supervised learning?) states that "Supervised learning, also known as supervised machine learning, is a subcategory of machine learning and artificial intelligence. It is defined by its use of labeled datasets to train algorithms to classify data or predict outcomes accurately. As input data is fed into the model, it adjusts its weights until the model has been fitted appropriately, which occurs as part of the cross validation process. Supervised learning helps organizations solve for a variety of real-world problems at scale, such as classifying spam in a separate folder from your inbox."
Linear regression is used to identify the relationship between a dependent variable (a number) and one or more independent variables (also numbers), and is normally used to make predictions about future outcomes (the result is also a number).
There are two types of regression functions: simple and multiple linear regression. The first is used when there is only one independent variable and one dependent variable; the second is used when there are many independent variables impacting the result (the dependent variable). Linear regression is used when the dependent variable is continuous.
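A minimal sketch of how such a model could be fitted with scikit-learn is shown below; the toy data are assumptions made only for illustration.

# Simple linear regression sketch with illustrative data.
import numpy as np
from sklearn.linear_model import LinearRegression

X = np.array([[1], [2], [3], [4], [5]])      # independent variable
y = np.array([2.1, 4.0, 6.2, 8.1, 9.9])      # dependent (continuous) variable

model = LinearRegression().fit(X, y)
print(model.coef_, model.intercept_)         # fitted slope and intercept
print(model.predict([[6]]))                  # prediction for a new value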
Logistic regression is used when the dependent variable is categorical, meaning that it has binary outputs such as true/false or yes/no; in other words, it is used when the dependent variable is discrete. Logistic regression is mainly used to solve binary classification problems, such as email spam identification or classifying news threads on the web as violent vs. non-violent.
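The sketch below shows a comparable scikit-learn example for a binary target; again, the data and the single feature are purely illustrative.

# Logistic regression sketch for a binary (discrete) target.
import numpy as np
from sklearn.linear_model import LogisticRegression

X = np.array([[1], [2], [3], [10], [11], [12]])   # e.g. number of suspicious words
y = np.array([0, 0, 0, 1, 1, 1])                  # 0 = not spam, 1 = spam

clf = LogisticRegression().fit(X, y)
print(clf.predict([[2], [11]]))                   # expected: [0 1]
print(clf.predict_proba([[6]]))                   # class probabilities near the boundary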
Naïve Bayes is a classification method that uses the class conditional independence assumption from Bayes' theorem, where every feature is assumed to be independent of the others: each feature (a.k.a. predictor) contributes to the probability of a given outcome on its own, independently of the rest. Hence the term "naïve", because in real-world problems the features are almost never truly independent; there is usually some degree of influence between them.
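Because the conclusions of this paper suggest combining Naïve Bayes with a bag of words for classifying failure logs, a minimal scikit-learn sketch of that combination is given below; the tiny labeled log lines are invented for illustration only.

# Naive Bayes over a bag of words (illustrative, not the paper's experiment).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

logs = [
    "job cancelled due to memory shortage",
    "job finished successfully",
    "ABAP runtime error, job cancelled",
    "step completed, job finished",
]
labels = ["failure", "ok", "failure", "ok"]        # assumed labels

vectorizer = CountVectorizer()                     # builds the bag of words
X = vectorizer.fit_transform(logs)

clf = MultinomialNB().fit(X, labels)
new_log = ["job cancelled after runtime error"]
print(clf.predict(vectorizer.transform(new_log)))  # likely: ['failure']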
K-nearest neighbors (KNN) is a method that classifies data points based on their proximity and association to other data points. It is a non-parametric algorithm that assumes that similar data points can be found near each other; it calculates the Euclidean (physical) distance between data points and assigns a category based on the most frequent category, or the average, of the nearest neighbors. It is a very popular model because it is easy to use and has a low computing cost on small datasets; however, as the size of the dataset grows, processing time increases and performance decreases, which is why data scientists tend not to use it for classification tasks on big datasets. Even so, KNN is frequently used for image recognition and suggestion engines.
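A minimal KNN sketch with scikit-learn follows; the two-dimensional points and the choice of k = 3 are arbitrary choices for illustration.

# K-nearest neighbors sketch with illustrative 2-D points.
from sklearn.neighbors import KNeighborsClassifier

X = [[1, 1], [1, 2], [2, 1], [8, 8], [8, 9], [9, 8]]   # two well-separated groups
y = [0, 0, 0, 1, 1, 1]

knn = KNeighborsClassifier(n_neighbors=3)               # Euclidean distance by default
knn.fit(X, y)
print(knn.predict([[2, 2], [9, 9]]))                    # expected: [0 1]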
Support vector machines (SVM) are used for classification and regression tasks. The model was developed by the Russian-born mathematician Vladimir Vapnik and colleagues in 1992 at AT&T Bell Laboratories. It is best suited to relatively small datasets, as it has long processing times on large ones. SVM is based on the idea of finding the hyperplane that best separates the features into different domains. This hyperplane is known as the decision boundary, separating the classes of data points (Figures 3 and 4) on either side of the plane. The points closest to the hyperplane are called the support vectors, and the distances of these vectors from the hyperplane are called the margins. The farther the support vector points are from the hyperplane, the higher the probability of correctly classifying the points in their respective regions or classes.
Figure 3. Iris flowers (Iris Setosa, Iris Versicolour, Iris Virginica)
Figure 4. Iris flower classification by length and width of sepal (Iris Setosa vs. Iris Versicolour vs. Iris
Virginica), an example of SVM classification
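Since Figures 3 and 4 illustrate the classic iris example, a minimal scikit-learn sketch of an SVM on that dataset is shown below; restricting the features to sepal length and width mirrors the figure, and the linear kernel and the 80/20 split are just possible choices.

# SVM sketch on the iris dataset (sepal length and width only).
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

iris = load_iris()
X = iris.data[:, :2]                      # sepal length and sepal width
y = iris.target                           # Setosa, Versicolour, Virginica

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
clf = SVC(kernel="linear").fit(X_train, y_train)
print(clf.score(X_test, y_test))          # accuracy on the held-out flowers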
Random forest is a method also used for classification and regression tasks. The forest is made of many uncorrelated decision trees whose outputs are merged to create more accurate predictions by reducing variance. It uses feature bagging and randomness when building the individual trees, so that the prediction made "by committee" is more accurate than that of any single tree. Random forest is consistently among the best-performing classification models in the ML toolbox.
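A minimal random forest sketch with scikit-learn follows, again on the iris data for continuity; 100 trees and 5-fold cross-validation are simply illustrative settings.

# Random forest sketch on the iris dataset.
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)
forest = RandomForestClassifier(n_estimators=100, random_state=0)

# Mean 5-fold cross-validation accuracy of the committee of trees.
print(cross_val_score(forest, X, y, cv=5).mean())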
Neural networks, as mentioned earlier, come in several types, such as "recurrent neural networks", "convolutional neural networks", "modular neural networks", etc., and all of these are ultimately "artificial neural networks" because they use mathematical models that are not biological in origin. Neural networks process data by imitating the interconnections of the human brain through layers of nodes inspired by biological neurons (Figure 5). Each node is made up of inputs, weights, a bias (or threshold), and an output. If the output value exceeds a given threshold, the node activates and passes data to the next layer in the network. Neural networks learn by adjusting their weights to minimize a cost (or loss) function in a process known as gradient descent, which can be thought of as a blindfolded person going down a hill: taking small steps, feeling only the slope, stepping forward while the slope is steep, and continuing until the slope ends and the bottom of the valley is reached, where the ground is flat. When the cost or loss function is near or equal to zero, the steepness of the slope is also near zero, meaning the model has completed its descent: it has reached a local minimum of the function, and the output of the model is very likely to be accurate, yielding the correct answer.
Figure 5. Model of a multilayer neural network or perceptron, with N number of hidden layers
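The one-variable gradient descent loop below makes the hill-descent analogy concrete; the quadratic cost function, the learning rate, and the starting point are arbitrary illustrative choices.

# Gradient descent sketch on a simple quadratic cost function.
def cost(w):
    return (w - 3.0) ** 2              # minimum at w = 3

def gradient(w):
    return 2.0 * (w - 3.0)             # derivative of the cost ("the slope")

w = 10.0                               # arbitrary starting point on the hill
learning_rate = 0.1                    # size of each small step
for step in range(100):
    w -= learning_rate * gradient(w)   # step downhill, against the slope

print(round(w, 4), round(cost(w), 6))  # w is close to 3, cost close to 0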
Unsupervised learning. This approach uses machine learning algorithms to analyze and cluster unlabeled data sets; that is, there is no group of humans pre-classifying and labeling the data in order to train a model.
Unsupervised learning models discover hidden patterns or clusters in the data by themselves, with no human intervention needed for the clustering or grouping task, because the model has the ability to discover similarities and differences between data points. This makes them a good option for exploratory data analysis (EDA), which is often a visual analysis carried out with the help of predefined libraries or packages such as those available in the Python language. These models can also reduce the number of features needed by means of a process called "dimensionality reduction", where each feature is considered a dimension and where the goal is to determine which features can be ignored by the model because they do not have a significant impact on the prediction or classification task. Having fewer features saves memory and computational cost and produces better predictions by reducing overfitting and improving generalization. It also helps to simplify visualization by focusing on important features during EDA. Principal component analysis (PCA) and singular value decomposition (SVD) are two common methods for dimensionality reduction.
IBM (IBM, 2023, What is unsupervised learning?) defines unsupervised learning as follows: "Unsupervised learning, also known as unsupervised machine learning, uses machine learning algorithms to analyze and cluster unlabeled datasets. These algorithms discover hidden patterns or data groupings without the need for human intervention. Its ability to discover similarities and differences in information make it the ideal solution for exploratory data analysis, cross-selling strategies, customer segmentation, and image recognition."
Unsupervised learning models are used for three main tasks:
1. Association
2. Clustering
3. Dimensionality reduction.
Below, a definition of each of these tasks is provided, and the most common algorithms and approaches to carry them out effectively are highlighted.
Association rules. IBM defines this term as follows: "An association rule is a rule-based procedure for finding relationships between variables in a data set. These procedures are used for shopping cart analysis, a situation that enables companies to better understand the connections between different products. Understanding customer consumption habits allows businesses to develop better cross-selling methods and suggestion engines", for example, the "Customers who bought this item also bought" list on Amazon web pages (What Is Unsupervised Learning? | IBM, 2023).
Techniques like Apriori, Eclat, and FP-Growth are among those most commonly used for generating association rules. As an example, assume "Justin Bieber radio" is set as the search string within the Spotify application; the resulting list may begin with his song "Next to you", and there is a significant probability that the next song shown on this channel is a Sean Kingston song such as "Eanie Meanie", based on the user's previous listening habits and those of other Spotify customers.
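A minimal association-rules sketch is shown below; it assumes the third-party mlxtend package (which provides an Apriori implementation) is installed, and the one-hot encoded shopping baskets are invented for illustration.

# Association-rules sketch using the mlxtend package (assumed to be available).
import pandas as pd
from mlxtend.frequent_patterns import apriori, association_rules

baskets = pd.DataFrame(
    [
        {"bread": True, "butter": True, "milk": False},
        {"bread": True, "butter": True, "milk": True},
        {"bread": False, "butter": False, "milk": True},
        {"bread": True, "butter": True, "milk": True},
    ]
)

frequent = apriori(baskets, min_support=0.5, use_colnames=True)       # frequent itemsets
rules = association_rules(frequent, metric="confidence", min_threshold=0.7)
print(rules[["antecedents", "consequents", "support", "confidence"]])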
Clustering. IBM defines this approach as follows: "Clustering is a data mining technique which groups unlabeled data based on their similarities or differences. Clustering algorithms are used to process raw, unclassified data objects into groups represented by structures or patterns in the information." (What Is Unsupervised Learning? | IBM, 2023).
It is also noted that there are several types of clustering: probabilistic, with methods like the Gaussian Mixture Model (GMM); hierarchical, a.k.a. 'hierarchical cluster analysis', which can in turn be agglomerative (a "bottom-up" approach) or divisive; and overlapping or exclusive clustering, where data points can either belong to several clusters or be assigned to exactly one. Further reading of the original sources is recommended for more details.
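A minimal clustering sketch with scikit-learn follows; the choice of two clusters and the toy points are arbitrary illustrative choices.

# Clustering sketch: k-means and a Gaussian mixture on the same toy points.
from sklearn.cluster import KMeans
from sklearn.mixture import GaussianMixture

X = [[1, 1], [1, 2], [2, 1], [8, 8], [8, 9], [9, 8]]    # two obvious groups

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(kmeans.labels_)                                   # exclusive cluster assignments

gmm = GaussianMixture(n_components=2, random_state=0).fit(X)
print(gmm.predict_proba(X).round(2))                    # probabilistic (soft) assignments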
Dimensionality reduction. IBM defines this concept as follows: "Dimensionality reduction is a technique used when the number of features, or dimensions, in a given dataset is too high. It reduces the number of data inputs to a manageable size while also preserving the integrity of the dataset as much as possible." (What Is Unsupervised Learning? | IBM, 2023).
It is understandable that more data generally yields more accurate results, but too many features also hurt the performance of ML models and can lead to overfitting. This last concept refers to a situation in which a given model produces correct results when run on its training data but not when executed on new data. ML models are first fitted on a training portion of the dataset, with a share of the records (commonly around twenty percent) held out for testing. Once fitting is successfully done, the model can be released to production to be used with new real-world data, with the expectation that the model's predictions or classifications remain as accurate as they were during the fitting phase, unless too many dimensions (features) were used during fitting and the model ended up in an overfitting condition, which is obviously undesirable. Two dimensionality reduction methods can be used, and an overview is given in the following sections.
Principal component analysis (PCA).
IBM defines this model as follows: "PCA is a type of dimensionality reduction algorithm which is used to reduce redundancies and to compress datasets through feature extraction. This method uses a linear transformation to create a new data representation, yielding a set of 'principal components.' The first principal component is the direction which maximizes the variance of the dataset. While the second principal component also finds the maximum variance in the data, it is completely uncorrelated to the first principal component, yielding a direction that is perpendicular, or orthogonal, to the first component." (What Is Unsupervised Learning? | IBM, 2023).
It can be inferred that the PCA method makes it possible to "condense" the information provided by
multiple variables into just a few components.
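A minimal PCA sketch with scikit-learn is shown below; using the iris data again and reducing its four features to two components are arbitrary choices made for illustration.

# PCA sketch: condense the four iris features into two principal components.
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X, _ = load_iris(return_X_y=True)
pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X)

print(X_reduced.shape)                    # (150, 2): two components per flower
print(pca.explained_variance_ratio_)      # share of variance kept by each component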
Singular value decomposition (SVD). IBM defines this model as follows: "SVD is another dimensionality reduction approach which factorizes a matrix, A, into three, low-rank matrices. SVD is denoted by the formula A = USVᵀ, where U and V are orthogonal matrices. S is a diagonal matrix, and S values are considered singular values of matrix A." (What Is Unsupervised Learning? | IBM, 2023). As with PCA, SVD is used to reduce noise and compress data, such as image files.
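The numpy sketch below shows the factorization itself on a small, invented matrix.

# SVD sketch: factorize a small matrix A into U, S and V^T with numpy.
import numpy as np

A = np.array([[3.0, 1.0],
              [1.0, 3.0],
              [1.0, 1.0]])

U, S, Vt = np.linalg.svd(A, full_matrices=False)
print(S)                                    # singular values of A
print(np.allclose(A, U @ np.diag(S) @ Vt))  # True: the factors reconstruct A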
Applications of unsupervised learning. Unsupervised learning allows faster identification of patterns in large datasets. Engineers or data scientists do not need to label the input data to teach the model what the corresponding output is. The model is still fitted on a portion of the available data, but before that a dimensionality reduction is usually performed, and what is fed to the model are numbers: there is no categorical data per se, and since these are mathematical models optimized to run numerical calculations in the shortest possible time, they can process large datasets in a reasonable time. The following use cases are identified for unsupervised learning.
News sections. Like the ones used by Google News to categorize articles. For example, the results of a
school shooting might be categorized under its "USA" news tag.
Computer vision. For example in object recognition, in facial recognition to authenticate users (Face
ID in iPhone to unlock the phone).
Medical Imaging. Used in image classification in radiology, oncology, and pathology to quickly and
accurately diagnose patients with neoplasia (tumors).
Anomaly detection. Used to discover outlier data points within a data set. These outliers are known as anomalies and can draw attention to a variety of root causes such as faulty equipment, human error, or security flaws.
Recommendation engines. Past purchasing behavior data is used to reveal trends and develop more effective cross-selling strategies. Online retailers use this to make relevant add-on recommendations to customers during checkout, in sections like "users who bought this article also looked at these others".
Reinforcement learning. IBM (IBM, 2023, What is reinforcement learning?) defines this approach as follows: "Reinforcement learning is a learning paradigm that learns to optimize sequential decisions, which are decisions that are taken recurrently across time steps, for example, daily stock replenishment decisions taken in inventory control.
At a high level, reinforcement learning mimics how we, as humans, learn. Humans have the ability to learn strategies that help us master complex tasks like swimming, gymnastics, or taking a test. Reinforcement learning broadly seeks inspiration from these human abilities to learn how to act. But more specifically to practical use cases of reinforcement learning, it seeks to acquire the best strategy for taking repeated sequential decisions across time in a dynamic system under uncertainty. It does so by interacting with a simulator of the stochastic dynamic system of interest, also called an environment, to learn such winning strategies. A strategy to take repeated sequential decisions across time in a dynamic system is also called a policy. Reinforcement learning tries to learn the winning policy, namely a winning recipe of how to take actions in different states of a dynamic system."
Reinforcement learning addresses sequential decision-making problems that are often subject to
uncertainty, for example: multi-tier, multi-supplier inventory management with lead times under demand
uncertainty; control issues such as autonomous manufacturing operations or production plan control;
and resource allocation issues in finance or operations.
In short, reinforcement learning interacts with an environment, that is, a simulator of the stochastic dynamic system of interest, in order to learn these winning strategies. A strategy for making repeated sequential decisions over time in a dynamic system is called a policy, and reinforcement learning tries to learn the winning policy: a recipe for how to act in the different states of the dynamic system.
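To make the notions of states, actions, rewards, environment, and policy concrete, a tabular Q-learning sketch on an invented five-state corridor is shown below; the environment, the reward, and the hyperparameters are assumptions chosen only for illustration and are unrelated to the ERP use case.

# Minimal Q-learning sketch on a toy 5-state corridor (illustrative only).
import numpy as np

n_states, n_actions = 5, 2           # actions: 0 = move left, 1 = move right
Q = np.zeros((n_states, n_actions))  # value of each action in each state
alpha, gamma, epsilon = 0.5, 0.9, 0.2
rng = np.random.default_rng(0)

for episode in range(200):
    state = 0
    while state != n_states - 1:                      # the goal is the rightmost state
        # epsilon-greedy policy: mostly exploit, sometimes explore
        action = rng.integers(n_actions) if rng.random() < epsilon else int(Q[state].argmax())
        next_state = min(state + 1, n_states - 1) if action == 1 else max(state - 1, 0)
        reward = 1.0 if next_state == n_states - 1 else 0.0
        # Q-learning update rule
        Q[state, action] += alpha * (reward + gamma * Q[next_state].max() - Q[state, action])
        state = next_state

print(Q.argmax(axis=1))   # learned policy: move right (1) in every non-terminal state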
Deep Learning Techniques
Deep learning (DL) is a subset of machine learning that has demonstrated significantly superior performance to some traditional machine learning approaches. DL uses a combination of multi-layered artificial neural networks and processing- and data-intensive training, inspired by our latest understanding of how the human brain behaves. The approach has become so effective that it has even begun to surpass human capabilities in many areas, such as image and speech recognition and natural language processing. DL models process large amounts of data and are generally unsupervised or semi-supervised.
DL uses concepts analogous to those of neural biology, such as the notion of the artificial neuron, analogous to the biological neuron, or the artificial neural network, analogous to mammalian neural networks. But the similarities end there. DL uses several types of mathematical models that are generally called "neural networks" and that have different specific names, such as "recurrent neural networks" or "convolutional neural networks", but all of these networks are also "artificial neural networks", because the mathematical models they use are artificial rather than biological or natural. This conclusion may seem obvious, but it is important to emphasize, because within single-layer ML there is a model explicitly named "artificial neural network" (ANN), and in the IT profession it is often overlooked that all computational neural networks (recurrent, convolutional, adversarial) used in multilayer DL are also artificial even if their name does not include that word.
DL learns features and tasks directly from data, which can be images, text, or sound. In medicine, for example, deep learning models are used for image interpretation in imaging or radiology, where they learn to classify input images into the appropriate categories.
Natural language processing techniques
According to Beysolow (Beysolow II, 2018), "natural language processing (NLP) is a subfield of computer science that is focused on allowing computers to understand language in a 'natural' way, as humans do. Typically, this would refer to tasks such as understanding the sentiment of text, speech recognition, and generating responses to questions". NLP is a part of the field of AI, and with the techniques for converting words into vectors (vectorization) it is also considered a part of ML focused on human-computer communication. NLP addresses the inherent problem that human communications are often ambiguous and imprecise, while computers require unambiguous and precise messages in order to communicate.
METHODOLOGY
Several NLP models that could be used in a classical ERP system were explored, in order to investigate the technical feasibility and the ease or difficulty of providing artificial intelligence to a classic ERP system.
Two software development platforms, Python and Microsoft ML.NET, were investigated. The corresponding IDEs were installed, and the various packages or extensions required to work with ML and NLP were configured. Several small programs were written in Python and C# to experiment with NLP principles. The goal was to compare the two platforms for ease of use and performance. Both ran on 64-bit Windows 11. Detailed instructions on how to download, install, and configure the development platforms for Python and ML.NET are not provided in this paper, because there is plenty of public documentation about them on the Internet; only references to the corresponding websites are provided in the bibliography, so the reader can check detailed installation instructions there.
On the ERP side, several ABAP programs were written to emulate a production environment in SAP ERP in which multiple job failures would occur. The job logs of the failures were downloaded to local files and fed as input to the Python and C# programs executing the NLP methods under test.
In the following paragraphs, the experimentation steps are described, from selecting the programming languages and referencing where to download and install the integrated development environments (IDEs), up to the ERP software development, integration, and execution of the components. A discussion of the software development in Python and C# is also provided.
Mathematical model of bag of words
According to Ghosh and Kumar (Ghosh, T., & Kumar, S., 2022), the mathematical model for bag of words is given by three statistical calculations: term frequency, inverse document frequency, and the product of both, the term frequency–inverse document frequency (TF-IDF). A brief introduction is given in the following lines.
Term frequency.
Term frequency (TF) is the relative frequency of term t within document d:
TF(i, j) = Freq(i, j) / L(j)
where:
Freq(i, j) = frequency of term i in document j
L(j) = total number of terms in document j
A different notation can be:
TF(t, d) = f(t, d) / L(d), that is, the raw count of term t in document d divided by the total number of terms in d.
Inverse document frequency.
IDF measures the rarity of a term across a collection of documents D. It is aimed at penalizing words that are common across all documents. It is calculated as follows:
IDF(t, D) = log( N / df(t) )
where N is the total number of documents in the corpus D and df(t) is the number of documents that contain term t.
Term frequency–inverse document frequency (TF-IDF).
The TF-IDF score for a term in a document is obtained by multiplying its TF and IDF scores:
TF-IDF(t, d, D) = TF(t, d) × IDF(t, D), or also:
TF-IDF(t, d, D) = ( f(t, d) / L(d) ) × log( N / df(t) )
The document frequency, denoted df(t) (or dfw), is the number of documents in which a word is seen. The inverse document frequency (idf) is computed by dividing the total number of documents in the corpus by the document frequency of each term and then applying logarithmic scaling to the result. One can add 1 to the document frequency of each term to prevent division by zero for terms that do not exist in the corpus.
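As an illustration of these formulas, the short Python sketch below computes TF-IDF by hand for a tiny corpus of three documents; the sample documents and the choice of adding 1 to the document frequency are assumptions made only for this example.

# Minimal TF-IDF sketch (illustrative only, not the paper's program).
import math
from collections import Counter

corpus = [
    "job zjob_dti_jlg_nlpgen1 cancelled",            # assumed sample documents
    "job zjob_dti_jlg_nlpgen2 finished",
    "job zjob_dti_jlg_nlpgen3 cancelled by system",
]
docs = [doc.lower().split() for doc in corpus]       # simple whitespace tokenization
N = len(docs)

# Document frequency df(t): number of documents containing term t.
df = Counter()
for tokens in docs:
    for term in set(tokens):
        df[term] += 1

def tf_idf(term, tokens):
    tf = tokens.count(term) / len(tokens)            # TF(t, d) = Freq(t, d) / L(d)
    idf = math.log(N / (1 + df[term]))               # +1 guards against division by zero
    return tf * idf

for tokens in docs:
    print({t: round(tf_idf(t, tokens), 3) for t in set(tokens)})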
Software installation
As for C# (pronounced "C sharp"), it is worth mentioning that the researchers selected this language because it is a general-purpose, multi-paradigm (object-oriented, structured, functional, supporting lambda expressions) programming language that is fast to code, fast to execute, and easy to learn. The researchers also selected Visual Studio 2022 as the IDE for C# because it has a comprehensive feature set: code editing, debugging, a rich marketplace of extensions and plug-ins, a friendly user interface, extensive documentation with code examples directly from Microsoft, and many other features beyond the scope of the current research.
Visual Studio 2022 Community edition (free to download) was installed (Audrel et al., 2020) (Anandmeg, n.d.), and the NuGet package Microsoft.ML 3.0.1 was also installed (Natke, n.d.), as shown in Figure 6.
Figure 6. The NuGet package Microsoft.ML 3.0.1 was installed after installing Visual Studio 2022 Community.
As for Python, the researchers selected this language because it is very popular for academic research and has many ready-to-use libraries (packages), such as NumPy for mathematical functions, pandas for managing datasets, scikit-learn for ML, spaCy for NLP, and Matplotlib and Seaborn for data visualizations of all kinds. Python is easy to read and write, and it has a very large support community.
The researchers selected PyCharm as the IDE because it has an intelligent code editor that uses different colors to differentiate reserved Python keywords from variables and function or class names, it offers code autocompletion, it has a comprehensive debugging tool, and it supports scientific Python libraries such as spaCy, designed specifically for NLP, the subject of the current research applied to ERP systems.
A standard Python interpreter was installed first (Python.org, n.d.), because an interpreter needs to be configured in the PyCharm IDE to execute Python code, and then PyCharm Professional Edition was installed as the IDE (JetBrains, 2021) on a Windows 11 computer (Figures 7 and 8).
Figure 7. Python interpreter version 3.12.1 was installed
Figure 8. PyCharm Professional Edition was installed as the IDE, importing the NLTK package.
ERP SOFTWARE CONSTRUCTION
A production environment in SAP ERP with multiple job failures was simulated in order to produce a job log that could be downloaded to a local text file and used as input for the NLP packages in both PyCharm and C#. The simulation was done on a private SAP development server, so details about the system ID, DNS names, and IP addresses can't be publicly shared. All the objects for this experimentation were created as local objects, meaning they are temporary objects and can't be transported to a QA or production SAP server.
For this purpose, several programs were created in SAP's ABAP language.
ABAP programs Z_DTI_JLG_NLPGEN1 through Z_DTI_JLG_NLPGEN5 (Figure 9) are aimed at simulating jobs for different tasks running in the background of the SAP system. These programs write random messages to the job log and then force the program to terminate.
Figure 9. ABAP programs Z_DTI_JLG_NLPGEN1, 2, … to Z_DTI_JLG_NLPGEN5.
Jobs named ZJOB_DTI_JLG_NLPGEN1 to ZJOB_DTI_JLG_NLPGEN5 were created and released (Figure 10) to run every five minutes, starting at an arbitrary date and time for our experiment. Transaction code SM36 is used to create new jobs, and transaction code SM37 is used to monitor existing jobs; it is also possible to repeat the scheduling of existing jobs. The reader should notice that all job and program names start with the letter 'Z'. They could also start with the letter 'Y': namespaces starting with either 'Z' or 'Y' are reserved for custom objects created in SAP. This also applies to other objects such as function modules, include programs, tables, views, data elements, and domains; even classes can be named following this convention. Since SAP is proprietary software, all standard objects have names that start with letters other than 'Z' and 'Y'. This is useful because, when new SAP software releases or upgrades are applied, the custom objects are not overwritten by the objects delivered in the upgrade.
Figure 10. Transaction code SM37 showing jobs ZJOB_DTI_JLG_NLPGEN1, 2, … to
ZJOB_DTI_JLG_NLPGEN5 released to run every 5 minutes.
The researchers also investigated the standard SAP tables where job logs are saved. For example, table TBTCO is the job status overview. It is necessary to test the condition TBTCO-STATUS = 'A' to read failed (aborted) jobs (Figure 11). To read the status of any failed jobs in an ABAP program, conditions like the following can be used:
SELECT jobname strtdate status sdluname
  FROM tbtco
  INTO TABLE lt_failed_jobs        "internal table declared elsewhere
  WHERE jobname  LIKE 'Z%'         "any custom job name
    AND strtdate = '20231004'      "or any start date
    AND status   = 'A'             "'A' = aborted (failed) jobs
    AND sdluname LIKE '%'.         "any scheduling user
Figure 11. Standard table TBTCO (job status overview) with job status 'A' for failed jobs.
The function module (FM) 'BP_JOBLOG_READ' is used to retrieve job logs. This FM call was placed in program Z_DTI_JLG_DWNLD (Figure 12). Table TBTCO is used as the driver table to determine which jobs failed on a specific date; it provides key search fields such as the failed job name and the job log (TemSe object name) field, a search key with values like 'JOBLGX01000200X43494' that is also used by the FM to find the failed job logs. Finally, the method 'FILE_SAVE_DIALOG' is used to save or download the job logs to a local file, in this case the output file 'ZDOWNLOAD_JOBLOG.TXT', to be loaded into the Python and C# NLP models (Figure 13). The output path in our experiment was arbitrarily defined as "E:\DATAMV\00.PhD\Python-NLP\PycharmProjects\Bag-of-Words2\ZDOWNLOAD_JOBLOG.TXT", but it could be any available path.
Figure 12. FM ‘BP_JOBLOG_READ’ is used to retrieve job logs.
Figure 13. Method 'FILE_SAVE_DIALOG' is used to save the output file 'ZDOWNLOAD_JOBLOG.TXT' to be loaded into the Python and C# NLP models.
Since this research is incipient and experimental, all prototype programs were created in the development environment, but the approach could easily be implemented in production to retrieve and download real job logs for real failed jobs. Normally, when a job fails it produces an ABAP dump, which can be displayed with SAP transaction code ST22 (Figure 14). Such information could also be retrieved for NLP, and not only the job log.
Figure 14. Transaction code ST22 to show ABAP short dumps as additional source for NLP.
Python Software Construction
An experimental Python program was written in PyCharm, importing the NLTK library, which supports the NLP models in Python (Figure 15). The Python program then reads the file 'ZDOWNLOAD_JOBLOG.TXT' from a local path, in this case "E:\DATAMV\00.PhD\Python-NLP\PycharmProjects\Bag-of-Words2\" (Figure 16). Finally, the input file is passed to the bag-of-words algorithm to produce a table of frequencies and obtain the bag of words (Figure 17).
Figure 15. Libraries imported in Python.
Figure 16. Input file ZDOWNLOAD_JOBLOG.TXT from SAP ERP read by Python.
Figure 17. Algorithm to create the bag of words in Python.
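Since Figures 15 to 17 reproduce screenshots, a minimal sketch of what such a program might look like is given below; the tokenization choice and the use of collections.Counter are assumptions made for illustration and do not reproduce the exact code used in the experiment.

# Illustrative bag-of-words sketch over the downloaded SAP job log (a sketch only).
from collections import Counter
from nltk.tokenize import wordpunct_tokenize

path = r"E:\DATAMV\00.PhD\Python-NLP\PycharmProjects\Bag-of-Words2\ZDOWNLOAD_JOBLOG.TXT"
with open(path, encoding="utf-8", errors="ignore") as f:
    text = f.read().lower()

tokens = wordpunct_tokenize(text)      # split the job log into word and punctuation tokens
bag_of_words = Counter(tokens)         # word -> frequency table

print(bag_of_words.most_common(10))    # the ten most frequent tokens in the log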
C# Software Construction
An experimental C# program was written in Visual Studio 2022, importing the Microsoft.ML library, which supports the NLP models in C# (Figure 18). The C# program then reads the file 'ZDOWNLOAD_JOBLOG.TXT' from a local path, in this case "E:\DATAMV\00.PhD\MicrosoftML\Projects\MicrosoftML-NLP\BagOfWordsApp2-new\BagOfWordsApp2\BagOfWordsApp2\bin\x64\Debug" (Figure 19). Finally, the input file is passed to the bag-of-words algorithm to produce a table of frequencies and obtain the bag of words (Figure 20).
Figure 18. Libraries imported in C# (Microsoft.ML supports the NLP methods).
Figure 19. Input file ZDOWNLOAD_JOBLOG.TXT from SAP ERP read by C#.
Figure 20. Algorithm to create the bag of words in C#.
Since the research is incipient, only two NLP models were investigated, one for tokenization and one for creating bags of words, and the SAP ERP job log text file was used as input for both NLP models in both languages, Python and C#, as explained before. Both platforms were tested, and the different technical issues were solved as they appeared until the expected results were produced.
RESULTS AND DISCUSSION
As a result, it was possible to simulate an SAP ERP production environment of a, so to speak, "chaotic" nature, with multiple failed background jobs (Figure 21), and an extractor program downloaded the job log of the failures into a plain text file (Figure 22). The text file with the downloaded job log is shown in Figure 23. This work was of medium complexity.
Figure 21. Transaction code SM37 showing failed jobs ZJOB_DTI_JLG_NLPGEN*.
Figure 22. Running program Z_DTI_JLG_DWNLD to download ‘induced’ failed jobs.
Figure 23. Example of contents of file ZDOWNLOAD_JOBLOG.TXT, the input for NLP.
PyCharm was also successfully installed and configured. The installation and configuration proved to be of low complexity, but building the model for the bag of words was a job of medium complexity. The file 'ZDOWNLOAD_JOBLOG.TXT' was loaded by the Python program (Figure 24). The output of the process showed the resulting bag of words, where, for example, the output string 'zjob_dti_jlg_nlpgen1': 1 denotes that the word 'zjob_dti_jlg_nlpgen1' was counted 1 time, while the word '030909' was counted 6 times (Figure 25). The idea is to compare this with the output of the NLP model written in C#.
Figure 24. Python read the file ‘ZDOWNLOAD_JOBLOG.TXT’ and printed it out.
Figure 25. Output showing the resulting bag of words produced by Python.
For comparison purposes, the Visual Studio 2022 IDE and the Microsoft.ML NuGet package were installed, and models were also built for tokenization and for the bag of words, which was a job of medium-high complexity. The installation and configuration again proved to be of low complexity, but, as with PyCharm, building the model for the bag of words was of medium complexity. The file 'ZDOWNLOAD_JOBLOG.TXT' was loaded by the C# program (Figure 26). The output of the process showed the resulting bag of words. The layout of the output is slightly different from the output produced with PyCharm, but the values are the same: the line labeled 'Ngrams:' shows all the words, while the line labeled 'Word counts:' shows the number of times a given word was found. For example, the output string 'zjob_dti_jlg_nlpgen1' (the second red box on the 'Ngrams:' row) denotes that this word was counted 1 time (on the 'Word counts:' line), and the word '030909' (on the 'Ngrams:' row) was counted 6 times on the 'Word counts:' line (Figure 27). The results were therefore the same in Python and in C#.
Figure 26. C# read the file ‘ZDOWNLOAD_JOBLOG.TXT’ and printed it out.
Figure 27. Output showing the resulting bag of words produced by C#.
The interconnection of the technological platforms was feasible and of medium difficulty, although we did not try to automate the download of data from the SAP ERP system or the loading of data into the NLP models, because we wanted to give priority to implementing the logic of the NLP models. However, automating the interface is a must in order to optimize the software and eliminate the need for manual data loads into the NLP models.
CONCLUSIONS
The first conclusion reached is that it is definitely feasible, and relatively easy, to provide artificial intelligence to classic ERP systems, even though they are traditional client-server systems that may seem very conventional because they are proven technologies well known in the IT industry. Some might even think that ERP systems are a thing of the past and that the new artificial intelligence technology cannot easily be applied to them, which is false.
The second conclusion is that it is more difficult to determine what type of artificial intelligence is most useful to apply to ERP systems; it depends on the problem to be solved. Trying to predict the consumption of products sold by a company, analyzing dozens or hundreds of purchase contracts, measuring customer satisfaction, or automating IT staff tasks are not the same need, and therefore there will not be a single type of artificial intelligence applicable to all problems.
The third conclusion is that there are programming languages that have specialized in artificial intelligence and are therefore easier to use. For example, it was easier to implement the NLP models in Python than in C#, because more lines of code have to be written in C# than in Python, Python being a higher-level programming language. However, C# is faster to run because it is compiled, and the Visual Studio 2022 IDE in particular is easy to use because it has code auto-completion (predictive text) and a wizard that suggests how to solve syntax errors; it is also very easy to get help on the statements used in the programs.
The fourth conclusion is that the work presented here is incipient, and further work is needed on more complex models in both NLP and ML. In the case of NLP, further research on the NER (Named Entity Recognition) model is suggested. Research on the applicability of Large Language Models (LLM) in ERP systems is also suggested; such models are the basis of more advanced artificial intelligences such as ChatGPT. It is also suggested, as future research work, to go deeper into exploring and experimenting with models like Naïve Bayes combined with bag of words for the classification of texts from ERP system failure logs. The idea would be to recognize relevant error messages, job or program names, and the days, months, or times when failures are produced, and to program the system to suggest a corrective action.
BIBLIOGRAPHIC REFERENCES
Anandmeg. (n.d.). Install Visual Studio and choose your preferred features. Microsoft Learn. https://learn.microsoft.com/en-us/visualstudio/install/install-visual-studio?view=vs-2022
Audrel, C., Wijaya, V., & Azwir, H. (2020). Information system development using Microsoft Visual Studio to speed up approved sample distribution process. Journal of Industrial Engineering, 5, 14-24. https://doi.org/10.33021/jie.v5i1.1268
Beysolow II, T. (2018). What is natural language processing? In T. Beysolow II (Ed.), Applied natural language processing with Python: Implementing machine learning and deep learning algorithms for natural language processing (pp. 1-12). Apress. https://doi.org/10.1007/978-1-4842-3733-5_1
Chow, R. (2021, September 30). Dartmouth Summer Research Project: The Birth of Artificial
Intelligence. History of Data Science. https://www.historyofdatascience.com/dartmouth-summer-
research-project-the-birth-of-artificial-intelligence/
Edwards, G. (2020, January 21). Machine Learning | An Introduction. Medium.
https://towardsdatascience.com/machine-learning-an-introduction-23b84d51e6d0
Ghosh, T., & Kumar, S. (2022). Chapter 11: Natural language processing. In Practical mathematics for AI and deep learning: A concise yet in-depth guide on fundamentals of computer vision, NLP, complex deep neural networks and machine learning (pp. 456-458). BPB Publications.
IBM. (2023, June 14). What is supervised learning? IBM. Retrieved June 14, 2023, from https://www.ibm.com/topics/supervised-learning
IBM. (2023, June 14). What is unsupervised learning? IBM. Retrieved June 14, 2023, from https://www.ibm.com/topics/unsupervised-learning
IBM. (2023, June 14). What is reinforcement learning? IBM. https://developer.ibm.com/learningpaths/get-started-automated-ai-for-decision-making-api/what-is-automated-ai-for-decision-making/
JetBrains. (2021, June 2). Download PyCharm: The python IDE for data science and web development
by jetbrains. https://www.jetbrains.com/pycharm/download/?section=windows
McCarthy, J. (1970, January 1). What is AI? / Applications of AI. Retrieved June 11, 2023, from http://jmc.stanford.edu/artificial-intelligence/what-is-ai/applications-of-ai.html
McCarthy, J. (1970, January 1). What is AI? / Basic questions. Retrieved June 11, 2023, from http://jmc.stanford.edu/artificial-intelligence/what-is-ai/index.html
Mahesh, B. (2018). Machine learning algorithms - A review. 9(1).
Natke. (n.d.). ML.NET documentation: Tutorials, API reference. Microsoft Learn. https://learn.microsoft.com/en-us/dotnet/machine-learning/
Python.org. (n.d.) Download python. https://www.python.org/downloads/
Sarker, I. H. (2021). Machine Learning: Algorithms, Real-World Applications and Research Directions.
SN Computer Science, 2(3), 160. https://doi.org/10.1007/s42979-021-00592-x
Thorwirth, Z. (2021, September 1). AI Winter: The Highs and Lows of Artificial Intelligence. History of
Data Science. https://www.historyofdatascience.com/ai-winter-the-highs-and-lows-of-artificial-
intelligence/