Improving Efficiency of K-Means Algorithm for Large Datasets
Clustering is the process of grouping objects into classes based on their similarity. K-means is a widely studied partition-based clustering algorithm. It is reported to work efficiently on small datasets; on large datasets, however, its computation time is unsatisfactory. Researchers have proposed several modifications to address this issue. This paper proposes a novel way of handling large datasets by running K-means in a distributed manner. It exploits parallel processing by dividing the dataset into a number of baskets and applying K-means to each basket in parallel. The proposed BasketK-means delivers competitive clustering performance with considerably less computation time. Simulation results on various real and synthetic datasets demonstrate the effectiveness of the proposed approach.
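The basket idea in the abstract can be illustrated with a small sketch. The abstract does not say how the baskets are formed or how the per-basket results are combined, so the following is an assumption and not the authors' BasketK-means: the data is shuffled and split into equal baskets, plain Lloyd's K-means runs on each basket concurrently, and the pooled basket centroids are clustered once more to obtain the final K centroids. The names `kmeans`, `basket_kmeans`, and `n_baskets` are hypothetical.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(col) / n for col in zip(*points))


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's algorithm; returns a list of k centroids."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: sq_dist(p, centroids[i]))
            clusters[nearest].append(p)
        # An empty cluster keeps its previous centroid.
        updated = [mean(c) if c else centroids[i] for i, c in enumerate(clusters)]
        if updated == centroids:
            break
        centroids = updated
    return centroids


def basket_kmeans(points, k, n_baskets=4, seed=0):
    """Assumed scheme: split into baskets, cluster each, merge the centroids."""
    shuffled = points[:]
    random.Random(seed).shuffle(shuffled)
    # Strided slicing gives n_baskets baskets of near-equal size.
    baskets = [shuffled[i::n_baskets] for i in range(n_baskets)]
    with ThreadPoolExecutor(max_workers=n_baskets) as pool:
        per_basket = list(pool.map(lambda b: kmeans(b, k, seed=seed), baskets))
    # Assumed merge step: re-cluster the n_baskets * k basket centroids.
    pooled = [c for centroids in per_basket for c in centroids]
    return kmeans(pooled, k, seed=seed)
```

Each basket must hold at least `k` points. Threads keep the sketch short, but in CPython they will not speed up pure-Python arithmetic; a process pool with a top-level worker function, or a vectorised K-means, would be needed to see the kind of time savings the abstract reports.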
Year of publication: | 2016 |
---|---|
Authors: | Swapna, Ch. Swetha ; Kumar, V. Vijaya ; Murthy, J.V.R |
Published in: | International Journal of Rough Sets and Data Analysis (IJRSDA). - IGI Global, ISSN 2334-4601, ZDB-ID 2798043-1. - Vol. 3.2016, 2 (01.04.), p. 1-9 |
Publisher: | IGI Global |
Subject: | K-Means ; Large Datasets ; Parallel Clustering ; Performance Measures |
Online Resource
Similar items by subject
- Buchen, Teresa, (2013)
- Visualizing Historical Patterns in Large Educational Datasets
  Martins, Tiago, (2018)
- Application of Sequential Pattern Mining Algorithm in Commodity Management
  Wang, Xiaoli, (2018)