How to plot PCA results and kmedoids results in Python?


Post by ericrystal » Thu Apr 11, 2013 9:04 am

Hello everybody,
I have two questions about plotting in Python: how to plot principal component analysis (PCA) results, and how to plot k-medoids results.

The first question is how to plot the PCA results:
I have searched the web a lot but have not found a good answer. There is a good answer for the 3D case (http://blog.nextgenetics.net/?e=42), but the 2D case must be somewhat different and I do not know how to modify the code correctly. Could anyone share some Python code for this, following the 3D example? When plotting the points, I would also like to show their names and draw arcs between the points labelled with their cosine similarity. My data is a similarity (distance) matrix between documents, and I just want a reasonably good plot made with something like matplotlib, as in the 3D example.
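
One possible starting point, since the input is a distance matrix rather than raw feature vectors, is classical multidimensional scaling (MDS). For Euclidean distances it gives the same coordinates as PCA on the underlying points, up to rotation and reflection. The sketch below is not taken from the linked 3D example. The names `classical_mds` and `plot_labeled_points` are made up for illustration, the toy rectangle stands in for the real document matrix, and a similarity matrix would first have to be converted to distances (for example `1 - similarity`, which is an assumption about how the similarities are scaled).

```python
import numpy as np


def classical_mds(distances, n_components=2):
    """Embed items in n_components dimensions from a square distance matrix."""
    d = np.asarray(distances, dtype=float)
    n = d.shape[0]
    # Double-center the squared distances to get a Gram (inner-product) matrix.
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (d ** 2) @ centering
    # eigh returns eigenvalues in ascending order; keep the largest ones.
    eigvals, eigvecs = np.linalg.eigh(gram)
    order = np.argsort(eigvals)[::-1][:n_components]
    # Negative eigenvalues (non-Euclidean distances) are clipped to zero.
    scale = np.sqrt(np.clip(eigvals[order], 0.0, None))
    return eigvecs[:, order] * scale


def plot_labeled_points(coords, labels, filename=None):
    """Scatter the 2D coordinates and write each label next to its point."""
    import matplotlib.pyplot as plt  # imported lazily; only needed for plotting

    fig, ax = plt.subplots()
    ax.scatter(coords[:, 0], coords[:, 1])
    for (x, y), label in zip(coords, labels):
        ax.annotate(label, (x, y), xytext=(4, 4), textcoords="offset points")
    ax.set_xlabel("Component 1")
    ax.set_ylabel("Component 2")
    if filename is None:
        plt.show()
    else:
        fig.savefig(filename)
    return ax


# Toy data: four corners of a 3 x 4 rectangle, so the distances are Euclidean.
points = np.array([[0.0, 0.0], [3.0, 0.0], [3.0, 4.0], [0.0, 4.0]])
toy_distances = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
coords = classical_mds(toy_distances)
# plot_labeled_points(coords, ["A", "B", "C", "D"])  # uncomment to draw the figure
```

With real data, `toy_distances` would be replaced by the document distance matrix and the label list by the document names. Arcs between points could then be drawn with `ax.plot` on pairs of rows of `coords`, but that part is left out here.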


The second question is how to plot the k-medoids results.
For example, I found this example on the web: http://www.dalkescientific.com/writings ... ering.html. It takes a matrix of distances between cities and processes it with the following code:
Code:
import textwrap

import numpy as np
from Bio import Cluster

names = [ "Bloemfontein", "Cape Town", "Durban", "East London",
      "George", "Johannesburg", "Kimberley", "Mmabatho", "Graskop",
      "Oudtshoorn", "Port Elizabeth", "Umtata"]
# Pairwise distances between the cities, in the same order as `names`.
distances = np.array([
[   0, 1009,  625,  559,  746,  389,  157,  409,  788,  716,  632,  508],
[1009,    0, 1532, 1034,  425, 1407,  963, 1307, 1798,  456,  738, 1214],
[ 625, 1532,    0,  650, 1219,  567,  792,  860,  760, 1275,  901,  424],
[ 559, 1034,  650,    0,  615,  963,  721,  994, 1250,  671,  297,  226],
[ 746,  425, 1219,  615,    0, 1142,  726, 1071, 1534,   57,  319,  795],
[ 389, 1407,  567,  963, 1142,    0,  482,  296,  395, 1115, 1059,  831],
[ 157,  963,  792,  721,  726,  482,    0,  345,  873,  671,  736,  670],
[ 409, 1307,  860,  994, 1071,  296,  345,    0,  689, 1015, 1080,  933],
[ 788, 1798,  760, 1250, 1534,  395,  873,  689,    0, 1505, 1450, 1024],
[ 716,  456, 1275,  671,   57, 1115,  671, 1015, 1505,    0,  376,  852],
[ 632,  738,  901,  297,  319, 1059,  736, 1080, 1450,  376,    0,  478],
[ 508, 1214,  424,  226,  795,  831,  670,  933, 1024,  852,  478,    0],
], dtype=float)

# kmedoids labels each city with the index of its cluster's centroid city.
clusterids, error, nfound = Cluster.kmedoids(distances, 2)
print("Cluster ids:", clusterids)
print("error:", error)
print("nfound:", nfound)

# Group the city names by the centroid they were assigned to.
cities_in_cluster = {}
for name, clusterid in zip(names, clusterids):
    cities_in_cluster.setdefault(clusterid, []).append(name)

# Showing off a nice module for when you have long text that
# should fold over multiple lines.
for centroid_id, city_names in cities_in_cluster.items():
    print("Cluster around", names[centroid_id])
    text = ", ".join(city_names)
    for line in textwrap.wrap(text, 70):
        print("  ", line)


So, can anyone tell me how to plot these results? In the figure I would like to show the points labelled with the city names in each cluster, with each cluster marked in a different color (for example, a red oval or circle around cluster 1 with blue points for its cities, a green oval or circle around cluster 2 with yellow points for its cities, and so on). I would also like to show the distance between the cluster centroids, if it can be calculated (and if so, how?), and, within each cluster, the distance from each city to the cluster centroid. I would be very thankful for your help!
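
On the two distance questions, a minimal sketch under one assumption: each entry of `clusterids` is the index of the item that serves as the centroid of that item's cluster, which is how the code above already uses it in `names[centroid_id]`. Both quantities can then be read straight out of the distance matrix. The helper name `centroid_distances` and the toy data are made up for illustration, and the toy labels are written by hand rather than produced by `kmedoids`.

```python
def centroid_distances(distances, clusterids):
    """Return (distance of each item to its centroid, distances between centroids)."""
    ids = [int(c) for c in clusterids]  # also accepts a NumPy integer array
    # Distance from item i to the centroid of its own cluster.
    to_centroid = [distances[i][c] for i, c in enumerate(ids)]
    # Distance between every pair of distinct centroids.
    centroids = sorted(set(ids))
    between = {
        (a, b): distances[a][b]
        for pos, a in enumerate(centroids)
        for b in centroids[pos + 1:]
    }
    return to_centroid, between


# Toy data: four points on a line at positions 0, 1, 10 and 12.
positions = [0, 1, 10, 12]
toy_distances = [[abs(p - q) for q in positions] for p in positions]
# Hand-written labels for illustration (not real kmedoids output):
# items 0 and 1 belong to centroid 0, items 2 and 3 to centroid 2.
toy_clusterids = [0, 0, 2, 2]

to_centroid, between = centroid_distances(toy_distances, toy_clusterids)
print(to_centroid)  # [0, 1, 0, 2]
print(between)      # {(0, 2): 10}
```

For the city data, calling `centroid_distances(distances, clusterids)` should give the same two results. For the coloring, matplotlib's `scatter` accepts a `c` argument, so passing the cluster ids there would give each cluster its own color once 2D coordinates for the cities are available.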
ericrystal
 