More information on this topic:
Telephone: +49 371 27093-21
Whitepaper:
On the parallelization of the sparse grid approach for data mining. J. Garcke and M. Griebel, in S. Margenov, J. Wasniewski, and P. Yalamov, editors, Large-Scale Scientific Computations, Third International Conference, LSSC 2001, Sozopol, Bulgaria, volume 2179 of Lecture Notes in Computer Science, pages 22-32, 2001. Also as SFB 256 Preprint 721, Universität Bonn, 2001. (195 kB)
Data mining with sparse grids using simplicial basis functions. J. Garcke and M. Griebel, in F. Provost and R. Srikant, editors, Proceedings of the Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, USA, pages 87-96, 2001. Also as SFB 256 Preprint 713, Universität Bonn, 2001. (391 kB)
Data mining with sparse grids. J. Garcke, M. Griebel, and M. Thess, SIAM Journal of Scientific Computing, submitted, 2000. Also as SFB 256 Preprint 675, Institut für Angewandte Mathematik, Universität Bonn, 2000. (694 kB)
Sparse Grids Tutorial (308 kB)
Semi-supervised learning with sparse grids (422 kB)
Introduction
The sparse grid technique is one of the most ambitious methods for solving classification and regression problems. It is the first universal multivariate method that scales linearly with the number of data records and can therefore be used on very large volumes of data. The basic concept is to solve classification and regression problems via their operator equations, usually in the form of differential equations, by discretizing the sample space. This approach has proven effective for physical problems over decades, especially in the form of the finite element method, but it previously failed in data mining because its computational complexity increases exponentially with the number of dimensions (the "curse of dimensionality").
Sparse grids make the discretization of high dimensional spaces possible for the first time. They have been used since the 1990s, especially for solving integral and differential equations in high dimensions. Mathematically, sparse grid functions represent high dimensional wavelets over a hierarchy of anisotropic grids. Adapting the sparse grid technique to classification and regression made high-quality nonlinear classification and regression feasible on large volumes of data for the first time. It also represents a leap in quality compared to conventional methods such as Bayesian networks, neural networks or SVMs, in terms of the volume of data processed, the complexity of the problems that can be solved and, as a result, the quality of the models.
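To make the "curse of dimensionality" concrete, the following minimal sketch compares the number of interior grid points of a full grid with that of a regular sparse grid. It assumes the standard textbook construction without boundary points (mesh width 2^-level per direction, hierarchical subspaces with level sum at most level + dim - 1); the function names are illustrative only and are not taken from XELOPES or any other prudsys product.

```python
from math import comb


def full_grid_points(dim: int, level: int) -> int:
    """Interior points of a full grid with mesh width 2**-level in each direction."""
    return (2 ** level - 1) ** dim


def sparse_grid_points(dim: int, level: int) -> int:
    """Interior points of a regular sparse grid of the given level.

    The grid is the union of hierarchical subspaces W_l with l_i >= 1 and
    l_1 + ... + l_dim <= level + dim - 1. For a fixed excess k = sum(l_i - 1)
    there are comb(dim - 1 + k, dim - 1) such subspaces, each with 2**k points.
    """
    return sum(2 ** k * comb(dim - 1 + k, dim - 1) for k in range(level))


if __name__ == "__main__":
    level = 5
    for dim in (1, 2, 5, 10):
        print(dim, full_grid_points(dim, level), sparse_grid_points(dim, level))
```

In two dimensions at level 3 this gives 17 sparse grid points against 49 full grid points; the gap widens rapidly as the dimension grows, which is what makes discretization in high dimensions tractable.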
Advantages
The advantages of the sparse grid technique are:
■ Sparse grids make it possible, for the first time, to solve classification and regression problems on a virtually unlimited number of data records. The method scales linearly with the number of records and is asymptotically optimal in this respect.
■ The spectral representation of the regression function is an added advantage. The representation can be interpreted, and it can be analysed, compressed and smoothed using the proven methods of signal processing.
■ Finally, because sparse grids are a universal method, they can be applied to completely new operator formulations. While current methods are tailored to fixed operator equations (SVMs, for example, to regularization networks), sparse grids are a general approximation method for high dimensional integral and differential equations and can therefore be used on a wide range of operator equations. A priori knowledge can be integrated explicitly. This opens up completely new classes of problems for data mining.
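The linear scaling in the number of data records can be illustrated with a deliberately simplified sketch. It fits a regularized least-squares model with piecewise linear hat functions on a one-dimensional uniform grid over [0, 1]. This is an assumption-laden toy, not the prudsys implementation: it uses a full 1D grid rather than a sparse grid, and a plain identity regularizer instead of a differential operator. The function name and its parameters are hypothetical. The point it shows is structural: the data are read once, each record touches only the basis functions whose support contains it, and the linear system that is solved afterwards has a size that depends on the grid alone.

```python
def fit_hat_functions(xs, ys, num_cells=8, lam=1e-3):
    """Fit node values alpha of a piecewise linear function on [0, 1].

    Solves (B^T B / N + lam * I) alpha = B^T y / N, where B holds the hat
    function values at the N data points. The matrix is tridiagonal.
    """
    m = num_cells
    n = m + 1                       # number of grid nodes = number of unknowns
    diag = [0.0] * n                # main diagonal of B^T B
    off = [0.0] * (n - 1)           # entries (j, j + 1) of B^T B
    rhs = [0.0] * n                 # B^T y

    # Single pass over the data: cost proportional to the number of records.
    for x, y in zip(xs, ys):
        pos = min(max(x, 0.0), 1.0) * m
        j = min(int(pos), m - 1)    # left node of the cell containing x
        w_right = pos - j
        w_left = 1.0 - w_right
        diag[j] += w_left * w_left
        diag[j + 1] += w_right * w_right
        off[j] += w_left * w_right
        rhs[j] += w_left * y
        rhs[j + 1] += w_right * y

    count = len(xs)
    b = [d / count + lam for d in diag]
    c = [o / count for o in off]
    d = [r / count for r in rhs]

    # Thomas algorithm for the symmetric tridiagonal system; its cost depends
    # only on the number of grid nodes, not on the number of records.
    for i in range(1, n):
        factor = c[i - 1] / b[i - 1]
        b[i] -= factor * c[i - 1]
        d[i] -= factor * d[i - 1]
    alpha = [0.0] * n
    alpha[-1] = d[-1] / b[-1]
    for i in range(n - 2, -1, -1):
        alpha[i] = (d[i] - c[i] * alpha[i + 1]) / b[i]
    return alpha


if __name__ == "__main__":
    xs = [i / 200 for i in range(201)]
    ys = [2.0 * x for x in xs]
    print(fit_hat_functions(xs, ys, num_cells=8, lam=1e-6))
```

Doubling the number of records doubles the work in the assembly loop and leaves the solve step unchanged. In the sparse grid setting the same separation holds, with the grid size governed by the sparse grid level and dimension instead of a 1D cell count.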
"The transition to high dimensional discretization methods in general and to their predecessor – the sparse grid technique – in particular, represents a central revolution in data mining. Behind this is a simple, yet fundamental concept: By replacing infinite dimensional function spaces with finite dimensional ones, most practical challenges can be handled for the first time! Not unlike the transition from analytical to numerical solving of differential equations in the 50s or from analogue to digital calculation in the 60s, the transition to the sparse grid method means not only an increase in the speed of calculation but also represents a completely new quality of data mining", summarises Dr. Michael Thess, chairman of research at prudsys AG. prudsys AG has been developing sparse grid technology for data mining applications for the past ten years in cooperation with its university partners, the University of Bonn (Prof. Michael Griebel) and TU Berlin (Dr. Jochen Garcke).
Integration
Sparse grids are an integral part of the XELOPES library and are thus available in the prudsys classification product DISCOVERER as well as in the scoring component, PMML Applicator. prudsys has extended the PMML format with a proprietary sparse grid model. Sparse grid analyses can therefore be run in the prudsys DISCOVERER, and the resulting models can be exported as PMML and transferred to the PMML Applicator, the prudsys RECOMMENDATION ENGINE or the XELOPES library for operative scoring.