Process mining is a relatively recent research field at the intersection of business process management (BPM) and data science. The field aims to support the analysis of business processes based on event logs captured by process-aware information systems. It applies data mining techniques to those logs in order to discover an as-is process model, identify bottlenecks or opportunities for improvement, or assess the degree of conformance between a to-be model and the recorded data.
A plethora of techniques have been proposed for process discovery, i.e. the automated extraction of a semantically rich model that best describes the process according to various quality metrics. More recently, works have also investigated the use of recurrent neural networks (RNNs), a particular form of artificial neural network that can exploit sequence-based information, particularly in the area of predictive process monitoring. That is, such models are often trained on an existing event log to provide real-time predictions of the likely next step, the remaining case time, and so on.
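To make the next-step prediction setting concrete, the following minimal sketch shows one common way an event log could be turned into (prefix, next activity) training pairs for a sequence model. The toy log, the `<END>` marker and the function name `next_step_pairs` are illustrative assumptions, not part of any particular technique from the literature.

```python
from typing import Dict, List, Tuple

# Hypothetical toy event log: case identifier -> ordered activity labels.
event_log: Dict[str, List[str]] = {
    "case_1": ["register", "check", "approve", "pay"],
    "case_2": ["register", "check", "reject"],
}

END = "<END>"  # assumed marker signalling the end of a case


def next_step_pairs(log: Dict[str, List[str]]) -> List[Tuple[List[str], str]]:
    """Build (prefix, next activity) pairs from every trace in the log."""
    pairs: List[Tuple[List[str], str]] = []
    for trace in log.values():
        padded = trace + [END]
        # Each non-empty prefix is paired with the activity that follows it.
        for i in range(1, len(padded)):
            pairs.append((padded[:i], padded[i]))
    return pairs


pairs = next_step_pairs(event_log)
for prefix, target in pairs:
    print(prefix, "->", target)
```

A sequence model such as an RNN would then be fitted to map each prefix to its target; in practice the activity labels would first be encoded as integers or vectors.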
The use of such models for process discovery has been less explored, even though one can expect a well-fitting recurrent neural network to have abstracted over the semantics of the underlying process. This is similar to the domain of text, where such techniques have also been applied to language modeling, text classification and text generation. The main reason is that such neural networks are 'black boxes': they easily comprise millions of parameters, which are not easily summarised or abstracted into a holistic model. This work will investigate how the task of process discovery can be combined with these techniques, with three main goals. First, the development of a process discovery technique based on an RNN which has been trained to construct a "language model" of the process. Second, the incorporation of data elements on top of existing control flow elements, to assess whether the technique can be extended to cover control flow as well as data flow mining, an area which has proven challenging so far. Third, the development of interpretability techniques which can match aspects of the RNN model with elements in the discovered model.