DOI:10.5593/SGEM2013/BC3/S12.018

EVALUATION OF THE IMPACT OF THE PRE-PROCESSING OF DATA ON THE EFFECTIVENESS AND ACCURACY OF SVM

M. Cisty, J. Bezak, Z. Bajtek
Monday 5 August 2013 by Libadmin2013

References: 13th SGEM GeoConference on Water Resources. Forest, Marine And Ocean Ecosystems, www.sgem.org, SGEM2013 Conference Proceedings, ISBN 978-619-7105-02-5 / ISSN 1314-2704, June 16-22, 2013, pp. 141-148

ABSTRACT

In this work the authors apply support vector machines (SVMs) to flow prediction. SVMs are gaining popularity because of several attractive features that give them a greater ability to generalize, which is the main goal in data-driven modeling. From a practical point of view, perhaps the most serious problem with SVMs is their high algorithmic complexity and the extensive memory requirements of the quadratic programming step (part of fitting an SVM model) for large-scale tasks. In this situation, pre-processing of the data, namely sampling methods, is useful for reducing software and hardware requirements. The pre-processing methods investigated were applied in the Hron River watershed in Slovakia to daily hydrological and meteorological data for various variables, which are the predictors in the SVM flow prediction models. A compromise exists between the size of the training dataset and the degree of accuracy: a point at which accuracy is not lowered while computation time is significantly decreased. The authors demonstrate in the paper how such a compromise can be found. A model trained on a dataset reduced to 50% of its original size achieved the same degree of accuracy as the model trained on all the data, but with an execution time four times shorter.
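
The relation between training-set size and cost described above can be illustrated with a minimal sketch. It is not the authors' sampling method: it assumes plain uniform random sampling, uses made-up placeholder records instead of the Hron River data, and takes the number of entries in a dense n x n kernel matrix as a rough proxy for the cost of the quadratic programming step. The names `subsample` and `kernel_matrix_entries` are hypothetical helpers introduced only for this illustration.

```python
import random


def subsample(rows, fraction, seed=0):
    """Return a uniform random subset of `rows`, keeping the original order.

    Assumption: uniform random sampling; the paper may use other schemes.
    """
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    k = max(1, round(fraction * len(rows)))
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    keep = sorted(rng.sample(range(len(rows)), k))
    return [rows[i] for i in keep]


def kernel_matrix_entries(n):
    """Entries in a dense n x n kernel matrix (assumed proxy for QP cost)."""
    return n * n


# Placeholder daily records: (day index, dummy predictor, dummy flow).
# These values are synthetic and carry no hydrological meaning.
records = [(day, float(day % 7), float(day % 11)) for day in range(3650)]

half = subsample(records, 0.5)
ratio = kernel_matrix_entries(len(records)) / kernel_matrix_entries(len(half))

print(len(records), len(half), ratio)  # prints: 3650 1825 4.0
```

Under this quadratic assumption, halving the training set cuts the kernel matrix to a quarter of its size, which is in line with the fourfold reduction in execution time reported in the abstract. Actual SVM training time depends on the solver and the data, so the ratio is only indicative.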

Keywords: sampling, data-driven models, flow predictions