Discussion:
hierarchical cluster analysis
h***@yahoo.com
2004-09-30 17:57:17 UTC
I have some questions regarding hierarchical cluster analysis.

(1) What exactly is a linkage distance (when I have a dendrogram)? I
kind of have an idea that it is the coefficient of the distance at
which different objects form a cluster, and at which different
lower-level clusters form higher-level clusters. But I don't really
know how to explain it to people.

(2) How do I decide at which point I should cut the dendrogram tree
(how do I decide how many clusters to retain)? Is it just a
subjective process?

Thank you very much.



'`'`'`'`'`'`'`'`'`'`'`'`'`''`'`'`'`'`'`'`'`'`'`'`'`'`'`'`'`'`'`'`'`'`'`'`
sci.psychology.research is a moderated newsgroup.
Before submitting an article, please read the guidelines which are posted
here bimonthly or the charter on the web at http://psychcentral.com/spr/
Submissions are acknowledged automatically.
unknown
2004-10-01 14:08:24 UTC
Post by h***@yahoo.com
I have some questions regarding hierarchical cluster analysis.
(1) What exactly is a linkage distance (when I have a dendrogram)? I
kind of have an idea that it is the coefficient of the distance at
which different objects form a cluster, and different lower level
clusters form higher level clusters. But, I don't really know how
to explain to people.
A linkage distance is needed to define what the distance between two
clusters is. The distance between two elements is given by a simple
metric, but a metric alone says nothing about two sets of elements.
It's possible to define the notion of a multimetric (D. Wolpert
"Metrics for more than two points at once" arXiv:nlin.AO/0404032) in a
rigorous way, but linkage distances are simple heuristics that try to
do the same. In a dendrogram, the height at which two branches join is
the linkage distance between the two clusters being merged. There are
a few common definitions:

1. Average linkage: the average dissimilarity over all pairs of
elements in which the first element comes from the first cluster and
the second from the second cluster.
2. Single linkage: the dissimilarity of the closest pair of elements,
one from each cluster.
3. Complete linkage: the dissimilarity of the most different pair of
elements, one from each cluster.
4. Ward's method: Ward's minimum variance linkage merges the two
clusters whose union gives the smallest increase in the total sum of
squared deviations from the cluster means.
5. Weighted linkage: a variant of average linkage in which the two
clusters being merged are weighted equally, in order to remove the
influence of differing cluster sizes.
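
To make the definitions above concrete, here is a minimal sketch in
plain Python, assuming Euclidean distance between elements. The
function names and the two toy clusters are made up for illustration.
Weighted linkage is left out because its value depends on the order of
earlier merges, not only on the two clusters at hand.

```python
from itertools import product
from math import dist

def pair_distances(a, b):
    """Distances of all pairs with one point from cluster a and one from b."""
    return [dist(p, q) for p, q in product(a, b)]

def single_linkage(a, b):
    # Closest pair across the two clusters.
    return min(pair_distances(a, b))

def complete_linkage(a, b):
    # Most distant pair across the two clusters.
    return max(pair_distances(a, b))

def average_linkage(a, b):
    # Mean over all cross-cluster pairs.
    d = pair_distances(a, b)
    return sum(d) / len(d)

def centroid(c):
    return tuple(sum(x) / len(c) for x in zip(*c))

def ward_increase(a, b):
    """Increase in the total within-cluster sum of squares if a and b merge."""
    na, nb = len(a), len(b)
    return na * nb / (na + nb) * dist(centroid(a), centroid(b)) ** 2

# Two hypothetical clusters of 2-D points.
a = [(0.0, 0.0), (0.0, 1.0)]
b = [(3.0, 0.0), (4.0, 0.0)]

print(single_linkage(a, b))    # 3.0
print(complete_linkage(a, b))  # sqrt(17), about 4.12
print(average_linkage(a, b))   # about 3.57
print(ward_increase(a, b))     # 12.5
```

The same two clusters get four different "distances", which is why a
dendrogram should always be reported together with the linkage method
that produced it.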
Post by h***@yahoo.com
(2) How do I decide at which point I should cut the dendrogram tree
(how do I decide how many clusters to retain)? Is it just a
subjective process?
The guideline is subjective: the longer the branches you cut through,
the better the cut, because long branches mean the clusters stay
unchanged over a wide range of linkage distances. For example, this is
a better place to cut:

+------------|
|
+----------------|

than this one:

+---|
|
+-----|
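
The "cut the longest branches" guideline can be sketched as follows:
record the linkage distance of every merge, then cut inside the widest
gap between two consecutive merges. This is only an illustration,
assuming average linkage and Euclidean distance; the function names
and the toy data are hypothetical, and the naive loop is far too slow
for large data sets.

```python
from itertools import combinations
from math import dist

def average_linkage(a, b):
    return sum(dist(p, q) for p in a for q in b) / (len(a) * len(b))

def merge_heights(points):
    """Agglomerate with average linkage; return the linkage distance of each merge."""
    clusters = [[p] for p in points]
    heights = []
    while len(clusters) > 1:
        # Pick the pair of clusters with the smallest linkage distance.
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: average_linkage(clusters[ij[0]], clusters[ij[1]]))
        heights.append(average_linkage(clusters[i], clusters[j]))
        merged = clusters[i] + clusters[j]
        clusters = [c for m, c in enumerate(clusters) if m not in (i, j)] + [merged]
    return heights

def clusters_at_largest_gap(heights):
    """Number of clusters left when cutting inside the widest gap between merges.

    Needs at least two merges, i.e. at least three points.
    """
    n = len(heights) + 1
    gaps = [heights[m + 1] - heights[m] for m in range(len(heights) - 1)]
    m = max(range(len(gaps)), key=gaps.__getitem__)
    # Cutting between merge m and merge m + 1 keeps merges 0..m.
    return n - (m + 1)

# Hypothetical 1-D data with two obvious groups.
points = [(0.0,), (1.0,), (2.0,), (10.0,), (11.0,), (12.0,)]
heights = merge_heights(points)
print(heights)                           # [1.0, 1.0, 1.5, 1.5, 10.0]
print(clusters_at_largest_gap(heights))  # 2
```

Here the last merge happens at 10.0 while all earlier ones happen at
1.5 or below, so the long branches sit between 1.5 and 10.0 and the
cut leaves two clusters.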

You can do a rudimentary kind of test by creating several bootstrap
replications of the data, doing the splitting for each replication
independently, and checking whether the resulting cluster assignment
is identical across the replications. If it is not, the split is not
"significant". In practice it rarely is, except for a subset of
typical elements.
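
One possible way to code such a check is sketched below. It is an
assumption about the details, not a standard procedure: each
replication resamples the points with replacement, is clustered into k
groups with average linkage, and is scored by the share of point pairs
whose same-cluster or different-cluster status matches the clustering
of the full data. All names and the toy data are hypothetical.

```python
import random
from itertools import combinations
from math import dist

def average_linkage(a, b):
    return sum(dist(p, q) for p in a for q in b) / (len(a) * len(b))

def cluster_labels(points, k):
    """Average-linkage agglomeration down to k clusters; one label per point."""
    clusters = [[i] for i in range(len(points))]  # clusters hold point indices

    def linkage(ij):
        a, b = clusters[ij[0]], clusters[ij[1]]
        return average_linkage([points[i] for i in a], [points[i] for i in b])

    while len(clusters) > k:
        i, j = min(combinations(range(len(clusters)), 2), key=linkage)
        merged = clusters[i] + clusters[j]
        clusters = [c for m, c in enumerate(clusters) if m not in (i, j)] + [merged]
    labels = [0] * len(points)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels

def bootstrap_agreement(points, k, replications=20, seed=0):
    """Per replication, the share of resampled pairs that agree with the full-data clustering."""
    rng = random.Random(seed)
    reference = cluster_labels(points, k)
    scores = []
    for _ in range(replications):
        # Sample point indices with replacement (duplicates are kept).
        idx = rng.choices(range(len(points)), k=len(points))
        labels = cluster_labels([points[i] for i in idx], k)
        pairs = list(combinations(range(len(idx)), 2))
        agree = sum((labels[a] == labels[b]) == (reference[idx[a]] == reference[idx[b]])
                    for a, b in pairs)
        scores.append(agree / len(pairs))
    return scores

# Hypothetical data: two well-separated groups of ten points each.
points = [(0.1 * i, 0.0) for i in range(10)] + [(10 + 0.1 * i, 0.0) for i in range(10)]
scores = bootstrap_agreement(points, k=2)
print(min(scores), sum(scores) / len(scores))
```

Scores near 1 in every replication suggest a stable split; scores that
vary a lot suggest the chosen number of clusters is not well supported
by the data.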
--
mag. Aleks Jakulin
http://www.ailab.si/aleks/
Artificial Intelligence Laboratory,
Faculty of Computer and Information Science,
University of Ljubljana, Slovenia.


