Discretize Discretizes floating point time series data into finite number of states


Center for Quantitative Medicine


An increasing number of algorithms for biochemical network inference from experimental data require discrete data as input. For example, dynamic Bayesian network methods and methods that use the framework of finite dynamical systems, such as Boolean networks, all take discrete input. Experimental data, however, are typically continuous and represented by computer floating point numbers. The translation from continuous to discrete data is crucial in preserving the variable dependencies and thus has a significant impact on the performance of the network inference algorithms. We compare the performance of two such algorithms that use discrete data using several different discretization algorithms. One of the inference methods uses a dynamic Bayesian network framework, the other—a time‐and state‐discrete dynamical system framework. The discretization algorithms are quantile, interval discretization, and a new algorithm introduced in this article, SSD. SSD is especially designed for short time series data and is capable of determining the optimal number of discretization states. The experiments show that both inference methods perform better with SSD than with the other methods. In addition, SSD is demonstrated to preserve the dynamic features of the time series, as well as to be robust to noise in the experimental data.

Use now