IMPACT OF PARAMETERS CHARACTERIZING CLUSTERING ON DATA ANALYSIS RESULTS

Authors

  • Pēteris Grabusts, Rezekne Academy of Technologies

DOI:

https://doi.org/10.17770/lner2012vol1.4.1828

Keywords:

clustering algorithms, metrics, k-means, cluster validity

Abstract

Clustering algorithms group objects described by a set of numerical properties so that objects within a group are more similar to one another than to objects in other groups. All clustering algorithms share common parameters whose choice determines the effectiveness of clustering. The most important of these parameters are the metric (the distance between cluster elements and the cluster centre), the number of clusters k, and the cluster validity criteria. The goal of the paper is to evaluate the validity of the choice of metric, to describe how the results change with the number of clusters on experimental data, and to evaluate the credibility of the clustering results. The input data is the table of ratings of Latvian state higher educational institutions for 2011; the goal of the experiment is to show how clustering methods make it possible to analyse these data in an alternative way.
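
The interplay of the three parameters named in the abstract (metric, number of clusters k, validity criterion) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the data are three synthetic 2-D blobs rather than the 2011 rating table, the validity criterion is the mean silhouette coefficient, the initial centres are chosen by a farthest-point rule, and the centre update uses the coordinate-wise mean for both metrics. The function names (`kmeans`, `silhouette`, `make_blobs`, `evaluate`) are hypothetical.

```python
import math
import random

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def kmeans(points, k, metric, iters=100):
    """Lloyd-style k-means with a pluggable metric; returns (centres, labels)."""
    # Assumed initialisation: start at points[0], then repeatedly add the
    # point farthest from all centres chosen so far.
    centres = [points[0]]
    while len(centres) < k:
        centres.append(max(points, key=lambda p: min(metric(p, c) for c in centres)))
    labels = None
    for _ in range(iters):
        # Assignment step: nearest centre under the chosen metric.
        new_labels = [min(range(k), key=lambda j: metric(p, centres[j])) for p in points]
        if new_labels == labels:
            break  # assignments stable, so the centres are stable too
        labels = new_labels
        # Update step: coordinate-wise mean of the members (a simplification;
        # the mean is only the natural centre for the Euclidean metric).
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:  # an empty cluster keeps its previous centre
                centres[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centres, labels

def silhouette(points, labels, metric):
    """Mean silhouette coefficient; singleton clusters score 0 by convention."""
    k = max(labels) + 1
    scores = []
    for i, p in enumerate(points):
        mean_dist = {}
        for j in range(k):
            others = [q for m, q in enumerate(points) if labels[m] == j and m != i]
            if others:
                mean_dist[j] = sum(metric(p, q) for q in others) / len(others)
        own = labels[i]
        if own not in mean_dist or len(mean_dist) < 2:
            scores.append(0.0)
            continue
        a = mean_dist[own]                                        # cohesion
        b = min(d for j, d in mean_dist.items() if j != own)      # separation
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)

def make_blobs(seed=1):
    """Synthetic stand-in data: three well-separated groups of 15 points."""
    rng = random.Random(seed)
    centres = [(0.0, 0.0), (10.0, 0.0), (5.0, 9.0)]
    return [(cx + rng.uniform(-1, 1), cy + rng.uniform(-1, 1))
            for cx, cy in centres for _ in range(15)]

def evaluate(points, ks=(2, 3, 4, 5)):
    """Silhouette score for every (metric name, k) combination."""
    results = {}
    for name, metric in (("euclidean", euclidean), ("manhattan", manhattan)):
        for k in ks:
            _, labels = kmeans(points, k, metric)
            results[(name, k)] = silhouette(points, labels, metric)
    return results

if __name__ == "__main__":
    for (name, k), score in evaluate(make_blobs()).items():
        print(f"{name:10s} k={k}  silhouette={score:.3f}")
```

Running the script prints one silhouette value per metric and per k, so the effect of each choice can be compared side by side. On real data such as a rating table, the features would normally be scaled first, since both metrics are sensitive to the units of each column.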

Author Biography

  • Pēteris Grabusts, Rezekne Academy of Technologies
    Dr. sc. ing., associate professor

Published

2012-06-23