Abstract:
This paper addresses the task of learning concept descriptions from streams of data. As new data arrive, the concept description must be updated regularly to incorporate them. A problem that can arise is that the target concept changes over time. When it does, old data become irrelevant to the current concept and have to be removed from the training set. This problem is known in machine learning as concept drift. We develop a mechanism that tracks changing concepts using an adaptive time window. The method uses a significance test to detect concept drift and then optimizes the size of the time window, aiming to maximize classification accuracy on recent data. The method is general and can be used with any learning algorithm. It is tested with three standard learning algorithms (kNN, ID3 and NBC) on three datasets. The experimental results provide evidence that the proposed forgetting mechanism can significantly improve predictive accuracy on changing concepts.
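The detect-then-resize loop described above can be illustrated with a minimal sketch. The abstract does not say which significance test or window-search procedure is used, so everything here is an assumption: a one-sided two-proportion z-test (z_crit=1.645, roughly the 5% level) compares the error on the newest batch with the error on the previous batch, and on detected drift the window is cut to whichever size from a fixed, hypothetical candidate list gives the highest accuracy on the newest batch. The base learner is a 1-nearest-neighbour classifier, standing in for kNN. All function names (knn_predict, drift_detected, best_window, process_stream, make_batch) and the synthetic stream are hypothetical illustrations, not the authors' implementation.

```python
import math
from collections import Counter


def knn_predict(train, x, k=1):
    """Label x by majority vote among its k nearest training examples (Euclidean distance)."""
    nearest = sorted(train, key=lambda ex: math.dist(ex[0], x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def accuracy(train, test):
    """Fraction of `test` examples that the nearest-neighbour model built on `train` labels correctly."""
    if not train or not test:
        return 0.0
    return sum(knn_predict(train, x) == y for x, y in test) / len(test)


def drift_detected(err_ref, n_ref, err_new, n_new, z_crit=1.645):
    """One-sided two-proportion z-test: is the new error rate significantly higher than the reference?"""
    pooled = (err_ref * n_ref + err_new * n_new) / (n_ref + n_new)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_ref + 1 / n_new))
    if se == 0:  # both error rates are 0 (or both 1): no variance to test against
        return err_new > err_ref
    return (err_new - err_ref) / se > z_crit


def best_window(window, batch, candidate_sizes):
    """Pick the suffix length of `window` whose model is most accurate on `batch`."""
    sizes = [s for s in candidate_sizes if 0 < s <= len(window)] or [len(window)]
    # max() keeps the first maximal candidate, so ties go to the earliest size listed.
    return max(sizes, key=lambda s: accuracy(window[-s:], batch))


def process_stream(batches, candidate_sizes=(5, 10, 20, 40, 80)):
    """Grow the time window batch by batch, shrinking it when drift is detected.

    Returns the window length after each batch has been added.
    """
    window, sizes_log = [], []
    err_ref, n_ref = None, 0
    for batch in batches:
        if window:
            # Error of the model trained on the current window, measured on the new batch.
            err_new = 1 - accuracy(window, batch)
            if err_ref is not None and drift_detected(err_ref, n_ref, err_new, len(batch)):
                # Drift: forget older examples, keeping the most useful recent suffix.
                window = window[-best_window(window, batch, candidate_sizes):]
            err_ref, n_ref = err_new, len(batch)  # this batch's error is the next reference
        window.extend(batch)
        sizes_log.append(len(window))
    return sizes_log


def make_batch(flipped, n=20):
    """Hypothetical 1-D stream: concept A labels x > 0.5 as 1; concept B inverts the rule.

    Concept-B points are offset by 0.01 so they never coincide with concept-A points.
    """
    shift = 0.01 if flipped else 0.0
    xs = [(i + 0.5) / n + shift for i in range(n)]
    return [((x,), int(x <= 0.5) if flipped else int(x > 0.5)) for x in xs]


if __name__ == "__main__":
    stream = [make_batch(False) for _ in range(4)] + [make_batch(True) for _ in range(4)]
    print(process_stream(stream))  # [20, 40, 60, 80, 25, 45, 65, 85]
```

In this toy run the window grows while concept A is stable, is cut from 80 to 5 old examples when the abrupt flip to concept B is detected, and then grows again on the new concept. Because the wrapper only calls knn_predict through accuracy, any other learner could be substituted without changing the drift-handling logic. A search over window sizes, or a reference error estimated over a longer stable period, would be equally plausible variants of this sketch.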