algorithm - KMeans clustering for more than 5 million vectors
I have hit a real problem: I need to do some KMeans clustering for 5 million vectors, each with approximately 32 columns. I tried out Mahout, but it requires Linux and I am on Windows, and I am restricted from using a Linux OS or any kind of emulator.
Can someone suggest a KMeans clustering algorithm that scales up to 5 million vectors and converges quickly?
I have tested a few, but they do not scale: they are very slow and take forever to complete.
Thanks
OK, so to do clustering for large-scale datasets, the only way to go about it is to use Mahout. It requires a Linux platform, so I had to use VirtualBox, put Ubuntu on it, and then use Mahout. Setting up Mahout is a lengthy process, but the two links I used are as follows.
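For anyone who wants to see what a batch-at-a-time KMeans update looks like before setting up Mahout, here is a minimal sketch in plain Python. It is not Mahout's implementation and not what I ran. It illustrates the mini-batch k-means idea, where each center is nudged toward the points assigned to it with a per-center learning rate of 1/count. The names `mini_batch_kmeans` and `make_batches` are made up for this example, the data is synthetic 2-D points rather than real 32-column vectors, and pure Python like this would be far too slow for 5 million rows.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point, centers):
    """Index of the center closest to point."""
    return min(range(len(centers)), key=lambda j: sq_dist(point, centers[j]))


def mini_batch_kmeans(batches, init_centers):
    """Update centers one batch at a time, so the full data set
    never has to be held in memory at once."""
    centers = [list(c) for c in init_centers]
    counts = [0] * len(centers)  # points seen so far per center
    for batch in batches:
        # Assign every point in the batch using the centers as they
        # stood at the start of the batch.
        assignments = [nearest(p, centers) for p in batch]
        for p, j in zip(batch, assignments):
            counts[j] += 1
            eta = 1.0 / counts[j]  # per-center learning rate
            centers[j] = [(1 - eta) * c + eta * x
                          for c, x in zip(centers[j], p)]
    return centers


def make_batches(n_batches, batch_size, rng):
    """Hypothetical data source: two Gaussian blobs in 2-D.
    A real version would read chunks of rows from disk instead."""
    for _ in range(n_batches):
        batch = []
        for _ in range(batch_size):
            cx, cy = rng.choice([(0.0, 0.0), (10.0, 10.0)])
            batch.append((rng.gauss(cx, 1.0), rng.gauss(cy, 1.0)))
        yield batch


rng = random.Random(0)
centers = mini_batch_kmeans(make_batches(50, 100, rng),
                            init_centers=[(1.0, 1.0), (9.0, 9.0)])
centers.sort()
print(centers)
```

Because `batches` can be any iterable, the same loop would work with a generator that streams rows from a file, which is the property that matters when the data does not fit in memory.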