SNAP Library 6.0, Developer Reference  2020-12-09 16:24:20
SNAP, a general purpose, high performance system for analysis and manipulation of large networks
TCluster Class Reference

#include <circles.h>

Collaboration diagram for TCluster:

Public Member Functions

 TCluster (PGraphAttributes GraphAttributes, TInt K, TFlt Lambda)
 
 ~TCluster ()
 
void Train (TInt OuterReps, TInt GradientReps, TInt MCMCReps)
 Train the model to predict K Clusters. More...
 
TVec< TIntSetGetCircles (void)
 

Public Attributes

TCRef CRef
 

Private Member Functions

TFlt LogLikelihood ()
 Compute the log-likelihood of Parameters and cluster assignments. More...
 
TIntSet MCMC (TInt k, TInt MCMCReps)
 Optimize the cluster assignments for the k'th cluster. More...
 
void Gradient ()
 Update partial derivatives of log-likelihood. More...
 

Private Attributes

TFltTheta
 
TFltDerivative
 
TVec< TIntSetCHat
 
PGraphAttributes GraphAttributes
 
TInt K
 
TFlt Lambda
 

Detailed Description

Definition at line 28 of file circles.h.

Constructor & Destructor Documentation

TCluster::TCluster ( PGraphAttributes  GraphAttributes,
TInt  K,
TFlt  Lambda 
)
inline
Parameters
GraphAttributesattributed graph object with attributes
Knumber of communities to detect
Lambdaregularization parameter

Definition at line 35 of file circles.h.

References CHat, Derivative, K, TGraphAttributes::NFeatures, and Theta.

35  :
36  GraphAttributes(GraphAttributes), K(K), Lambda(Lambda) {
37  Theta = new TFlt[K * GraphAttributes->NFeatures];
38  Derivative = new TFlt[K * GraphAttributes->NFeatures];
39  for (int k = 0; k < K; k++) {
40  for (int f = 0; f < GraphAttributes->NFeatures; f++) {
41  Theta[k * GraphAttributes->NFeatures + f] = 0;
42  Derivative[k * GraphAttributes->NFeatures + f] = 0;
43  }
44 
45  // Clusters are initially empty.
46  CHat.Add(TIntSet());
47  }
48  }
TFlt Lambda
Definition: circles.h:77
TVec< TIntSet > CHat
Definition: circles.h:73
Definition: dt.h:1386
TInt NFeatures
Definition: circles.h:16
TFlt * Derivative
Definition: circles.h:71
THashSet< TInt > TIntSet
Definition: shash.h:1382
TInt K
Definition: circles.h:76
TFlt * Theta
Definition: circles.h:70
PGraphAttributes GraphAttributes
Definition: circles.h:74
TCluster::~TCluster ( )
inline

Definition at line 49 of file circles.h.

References Derivative, and Theta.

49  {
50  delete[] Theta;
51  delete[] Derivative;
52  }
TFlt * Derivative
Definition: circles.h:71
TFlt * Theta
Definition: circles.h:70

Member Function Documentation

TVec<TIntSet> TCluster::GetCircles ( void  )
inline
Returns
the predicted circles

Definition at line 64 of file circles.h.

References CHat.

64  {
65  return CHat;
66  }
TVec< TIntSet > CHat
Definition: circles.h:73
void TCluster::Gradient ( void  )
private

Update partial derivatives of log-likelihood.

Definition at line 456 of file circles.h.

References CHat, Derivative, TGraphAttributes::EdgeFeatures, TGraphAttributes::G, THashKeyDatI< TKey, TDat >::GetDat(), GraphAttributes, Inner(), TUNGraph::IsEdge(), K, Lambda, TGraphAttributes::NFeatures, Theta, TPair< TVal1, TVal2 >::Val1, and TPair< TVal1, TVal2 >::Val2.

Referenced by Train().

456  {
457  for (int i = 0; i < K * GraphAttributes->NFeatures; i++) {
458  if (Theta[i] > 0) {
459  Derivative[i] = -Lambda * Theta[i];
460  } else {
461  Derivative[i] = Lambda * Theta[i];
462  }
463  }
464 
466  !it.IsEnd(); it++) {
467  TFlt InnerProduct = 0;
468  TIntPr Edge = it.GetKey();
469  TInt Src = Edge.Val1;
470  TInt Dst = Edge.Val2;
471  TBool Exists = GraphAttributes->G->IsEdge(Src, Dst);
472  for (int k = 0; k < K; k++) {
473  TFlt d = CHat[k].IsKey(Src) && CHat[k].IsKey(Dst) ? 1 : -1;
474  InnerProduct += d * Inner(it.GetDat(), Theta + k * GraphAttributes->NFeatures);
475  }
476  TFlt expinp = exp(InnerProduct);
477  TFlt q = expinp / (1 + expinp);
478  if (q != q) {
479  q = 1; // Test for nan in case of overflow.
480  }
481 
482  for (int k = 0; k < K; k++) {
483  TBool d_ = CHat[k].IsKey(Src) && CHat[k].IsKey(Dst);
484  TFlt d = d_ ? 1 : -1;
485  for (THashKeyDatI<TInt, TInt> itf = it.GetDat().BegI();
486  !itf.IsEnd(); itf++) {
487  TInt i = itf.GetKey();
488  TInt f = itf.GetDat();
489  if (Exists) {
490  Derivative[k * GraphAttributes->NFeatures + i] += d * f;
491  }
492  Derivative[k * GraphAttributes->NFeatures + i] += -d * f * q;
493  }
494  }
495  }
496 }
TFlt Lambda
Definition: circles.h:77
TVec< TIntSet > CHat
Definition: circles.h:73
Definition: dt.h:1386
const TDat & GetDat() const
Definition: hash.h:81
THash< TIntPr, TIntIntH > EdgeFeatures
Definition: circles.h:19
Definition: dt.h:1137
TInt NFeatures
Definition: circles.h:16
TFlt * Derivative
Definition: circles.h:71
PUNGraph G
Definition: circles.h:15
Definition: ds.h:32
TFlt Inner(TIntIntH &Feature, TFlt *Parameter)
Inner product for sparse features.
Definition: circles.h:349
TVal1 Val1
Definition: ds.h:34
TVal2 Val2
Definition: ds.h:35
TInt K
Definition: circles.h:76
bool IsEdge(const int &SrcNId, const int &DstNId) const
Tests whether an edge between node IDs SrcNId and DstNId exists in the graph.
Definition: graph.cpp:137
Definition: dt.h:974
TFlt * Theta
Definition: circles.h:70
PGraphAttributes GraphAttributes
Definition: circles.h:74

Here is the call graph for this function:

Here is the caller graph for this function:

TFlt TCluster::LogLikelihood ( void  )
private

Compute the log-likelihood of Parameters and cluster assignments.

Returns
the log-likelihood of the current model

Definition at line 499 of file circles.h.

References CHat, TGraphAttributes::EdgeFeatures, TGraphAttributes::G, GraphAttributes, Inner(), TUNGraph::IsEdge(), K, TGraphAttributes::NFeatures, Theta, TPair< TVal1, TVal2 >::Val1, and TPair< TVal1, TVal2 >::Val2.

Referenced by Train().

499  {
500  TFlt ll = 0;
502  !it.IsEnd(); it++) {
503  TFlt InnerProduct = 0;
504  TIntPr Edge = it.GetKey();
505  TInt Src = Edge.Val1;
506  TInt Dst = Edge.Val2;
507  TBool Exists = GraphAttributes->G->IsEdge(Src, Dst);
508 
509  for (int k = 0; k < K; k++) {
510  TFlt d = CHat[k].IsKey(Src) && CHat[k].IsKey(Dst) ? 1 : -1;
511  InnerProduct += d * Inner(it.GetDat(), Theta + k * GraphAttributes->NFeatures);
512  }
513  if (Exists) {
514  ll += InnerProduct;
515  }
516  TFlt ll_ = log(1 + exp(InnerProduct));
517  ll += -ll_;
518  }
519 
520  if (ll != ll) {
521  printf("ll isnan\n");
522  exit(1);
523  }
524  return ll;
525 }
TVec< TIntSet > CHat
Definition: circles.h:73
Definition: dt.h:1386
THash< TIntPr, TIntIntH > EdgeFeatures
Definition: circles.h:19
Definition: dt.h:1137
TInt NFeatures
Definition: circles.h:16
PUNGraph G
Definition: circles.h:15
Definition: ds.h:32
TFlt Inner(TIntIntH &Feature, TFlt *Parameter)
Inner product for sparse features.
Definition: circles.h:349
TVal1 Val1
Definition: ds.h:34
TVal2 Val2
Definition: ds.h:35
TInt K
Definition: circles.h:76
bool IsEdge(const int &SrcNId, const int &DstNId) const
Tests whether an edge between node IDs SrcNId and DstNId exists in the graph.
Definition: graph.cpp:137
Definition: dt.h:974
TFlt * Theta
Definition: circles.h:70
PGraphAttributes GraphAttributes
Definition: circles.h:74

Here is the call graph for this function:

Here is the caller graph for this function:

TIntSet TCluster::MCMC ( TInt  k,
TInt  MCMCReps 
)
private

Optimize the cluster assignments for the k'th cluster.

Parameters
kcommunity index on which to run MCMC
MCMCRepsnumber of iterations of MCMC
Returns
node ids for the updated community

Definition at line 358 of file circles.h.

References TVec< TVal, TSizeTy >::Add(), THash< TKey, TDat, THashFunc >::AddDat(), THashSet< TKey, THashFunc >::AddKey(), CHat, TGraphAttributes::EdgeFeatures, TGraphAttributes::G, THash< TKey, TDat, THashFunc >::GetDat(), TRnd::GetUniDev(), GraphAttributes, Inner(), TUNGraph::IsEdge(), K, TVec< TVal, TSizeTy >::Len(), TGraphAttributes::NFeatures, TGraphAttributes::NodeIDs, Theta, TPair< TVal1, TVal2 >::Val1, and TPair< TVal1, TVal2 >::Val2.

Referenced by Train().

358  {
359  TRnd t;
360 
361  THash<TInt, TFlt> CostNotIncludeHash;
362  THash<TInt, TFlt> CostIncludeHash;
363 
364  TVec<TInt> NewLabel;
365  int csize = 0;
366  for (int i = 0; i < GraphAttributes->NodeIDs.Len(); i++) {
367  if (CHat[k].IsKey(GraphAttributes->NodeIDs[i])) {
368  NewLabel.Add(0);
369  } else {
370  NewLabel.Add(1);
371  }
372  if (CHat[k].IsKey(GraphAttributes->NodeIDs[i])) {
373  csize++;
374  }
375  }
376  // Compute edge log-probabilities.
378  !it.IsEnd(); it++) {
379  TIntPr e = it.GetKey();
380  TInt kv = GraphAttributes->EdgeFeatures.GetKeyId(e);
381  TInt Src = e.Val1;
382  TInt Dst = e.Val2;
383 
384  TBool Exists = GraphAttributes->G->IsEdge(Src, Dst);
385  TFlt InnerProduct = Inner(it.GetDat(), Theta + k * GraphAttributes->NFeatures);
386  TFlt Other = 0;
387  for (int l = 0; l < K; l++) {
388  if (l == k) {
389  continue;
390  }
391  TFlt d = (CHat[l].IsKey(Src) && CHat[l].IsKey(Dst)) ? 1 : -1;
392  Other += d * Inner(it.GetDat(), Theta + l * GraphAttributes->NFeatures);
393  }
394 
395  TFlt CostNotInclude;
396  TFlt CostInclude;
397 
398  if (Exists) {
399  CostNotInclude = -Other + InnerProduct + log(1 + exp(Other - InnerProduct));
400  CostInclude = -Other - InnerProduct + log(1 + exp(Other + InnerProduct));
401  } else {
402  CostNotInclude = log(1 + exp(Other - InnerProduct));
403  CostInclude = log(1 + exp(Other + InnerProduct));
404  }
405 
406  CostNotIncludeHash.AddDat(kv) = -CostNotInclude;
407  CostIncludeHash.AddDat(kv) = -CostInclude;
408  }
409 
410  // Run MCMC using precomputed probabilities
411  TFlt InitialTemperature = 1.0; // Initial temperature
412  for (int r = 2; r < MCMCReps + 2; r++) {
413  TFlt Temperature = InitialTemperature / log((double) r);
414  for (int n = 0; n < GraphAttributes->NodeIDs.Len(); n++) {
415  TFlt l0 = 0;
416  TFlt l1 = 0;
417  for (int np = 0; np < GraphAttributes->NodeIDs.Len(); np++) {
418  if (n == np) {
419  continue;
420  }
422  if (ed.Val1 > ed.Val2) {
423  ed = TIntPr(ed.Val2, ed.Val1);
424  }
425  TInt kv = GraphAttributes->EdgeFeatures.GetKeyId(ed);
426  TFlt m0 = CostNotIncludeHash.GetDat(kv);
427  if (NewLabel[np] == 0) {
428  l0 += m0;
429  l1 += m0;
430  } else {
431  l0 += m0;
432  l1 += CostIncludeHash.GetDat(kv);
433  }
434  }
435  TFlt LogLikelihoodDiff = exp(l1 - l0);
436  TFlt AcceptProb = pow(LogLikelihoodDiff, 1.0 / Temperature);
437  if (t.GetUniDev() < AcceptProb) {
438  NewLabel[n] = 1;
439  } else {
440  NewLabel[n] = 0;
441  }
442  }
443  }
444 
445  TIntSet Result;
446  for (int i = 0; i < GraphAttributes->NodeIDs.Len(); i++) {
447  if (NewLabel[i]) {
448  Result.AddKey(GraphAttributes->NodeIDs[i]);
449  }
450  }
451 
452  return Result;
453 }
TPair< TInt, TInt > TIntPr
Definition: ds.h:83
TVec< TIntSet > CHat
Definition: circles.h:73
Definition: dt.h:11
TSizeTy Len() const
Returns the number of elements in the vector.
Definition: ds.h:575
const TDat & GetDat(const TKey &Key) const
Definition: hash.h:262
Definition: dt.h:1386
THash< TIntPr, TIntIntH > EdgeFeatures
Definition: circles.h:19
int AddKey(const TKey &Key)
Definition: shash.h:1254
Definition: dt.h:1137
TInt NFeatures
Definition: circles.h:16
PUNGraph G
Definition: circles.h:15
Definition: ds.h:32
TFlt Inner(TIntIntH &Feature, TFlt *Parameter)
Inner product for sparse features.
Definition: circles.h:349
double GetUniDev()
Definition: dt.h:30
TVal1 Val1
Definition: ds.h:34
TVal2 Val2
Definition: ds.h:35
TInt K
Definition: circles.h:76
bool IsEdge(const int &SrcNId, const int &DstNId) const
Tests whether an edge between node IDs SrcNId and DstNId exists in the graph.
Definition: graph.cpp:137
TSizeTy Add()
Adds a new element at the end of the vector, after its current last element.
Definition: ds.h:602
Definition: dt.h:974
TDat & AddDat(const TKey &Key)
Definition: hash.h:238
TVec< TInt > NodeIDs
Definition: circles.h:20
TFlt * Theta
Definition: circles.h:70
PGraphAttributes GraphAttributes
Definition: circles.h:74

Here is the call graph for this function:

Here is the caller graph for this function:

void TCluster::Train ( TInt  OuterReps,
TInt  GradientReps,
TInt  MCMCReps 
)

Train the model to predict K Clusters.

Parameters
OuterRepsnumber of coordinate ascent iterations
GradientRepsnumber of iterations of gradient ascent
MCMCRepsnumber of iterations of MCMC

Definition at line 292 of file circles.h.

References CHat, Derivative, TRnd::GetUniDevInt(), Gradient(), GraphAttributes, K, TVec< TVal, TSizeTy >::Len(), LogLikelihood(), MCMC(), TGraphAttributes::NFeatures, TGraphAttributes::NodeIDs, and Theta.

292  {
293  // Learning rate
294  TFlt Increment = 1.0 / (1.0 * GraphAttributes->NodeIDs.Len() * GraphAttributes->NodeIDs.Len());
295  TRnd t;
296 
297  for (int OuterRep = 0; OuterRep < OuterReps; OuterRep++) {
298  // If it's the first iteration or the solution is degenerate,
299  // randomly initialize the weights and Clusters
300  for (int k = 0; k < K; k++) {
301  if (OuterRep == 0 || CHat[k].Empty() || CHat[k].Len()
302  == GraphAttributes->NodeIDs.Len()) {
303  CHat[k].Clr();
304  for (int i = 0; i < GraphAttributes->NodeIDs.Len(); i++) {
305  if (t.GetUniDevInt(2) == 0) {
306  CHat[k].AddKey(GraphAttributes->NodeIDs[i]);
307  }
308  }
309  for (int i = 0; i < GraphAttributes->NFeatures; i++) {
310  Theta[k * GraphAttributes->NFeatures + i] = 0;
311  }
312  // Just set a single Feature to 1 as a random initialization.
314  Theta[k * GraphAttributes->NFeatures] = 1;
315  }
316  }
317 
318  for (int k = 0; k < K; k++) {
319  CHat[k] = MCMC(k, MCMCReps);
320  }
321  TFlt llPrevious = LogLikelihood();
322 
323  // Perform gradient ascent
324  TFlt ll = 0;
325  for (int gradientRep = 0; gradientRep < GradientReps; gradientRep++) {
326  Gradient();
327  for (int i = 0; i < K * GraphAttributes->NFeatures; i++) {
328  Theta[i] += Increment * Derivative[i];
329  }
330  printf(".");
331  fflush( stdout);
332  ll = LogLikelihood();
333 
334  // If we reduced the objective, undo the update and stop.
335  if (ll < llPrevious) {
336  for (int i = 0; i < K * GraphAttributes->NFeatures; i++) {
337  Theta[i] -= Increment * Derivative[i];
338  }
339  ll = llPrevious;
340  break;
341  }
342  llPrevious = ll;
343  }
344  printf("\nIteration %d, ll = %f\n", OuterRep + 1, (double) ll);
345  }
346 }
TIntSet MCMC(TInt k, TInt MCMCReps)
Optimize the cluster assignments for the k'th cluster.
Definition: circles.h:358
TVec< TIntSet > CHat
Definition: circles.h:73
Definition: dt.h:11
TSizeTy Len() const
Returns the number of elements in the vector.
Definition: ds.h:575
Definition: dt.h:1386
TInt NFeatures
Definition: circles.h:16
TFlt * Derivative
Definition: circles.h:71
void Gradient()
Update partial derivatives of log-likelihood.
Definition: circles.h:456
TInt K
Definition: circles.h:76
int GetUniDevInt(const int &Range=0)
Definition: dt.cpp:39
TFlt LogLikelihood()
Compute the log-likelihood of Parameters and cluster assignments.
Definition: circles.h:499
TVec< TInt > NodeIDs
Definition: circles.h:20
TFlt * Theta
Definition: circles.h:70
PGraphAttributes GraphAttributes
Definition: circles.h:74

Here is the call graph for this function:

Member Data Documentation

TVec<TIntSet> TCluster::CHat
private

Definition at line 73 of file circles.h.

Referenced by GetCircles(), Gradient(), LogLikelihood(), MCMC(), TCluster(), and Train().

TCRef TCluster::CRef

Definition at line 67 of file circles.h.

TFlt* TCluster::Derivative
private

Definition at line 71 of file circles.h.

Referenced by Gradient(), TCluster(), Train(), and ~TCluster().

PGraphAttributes TCluster::GraphAttributes
private

Definition at line 74 of file circles.h.

Referenced by Gradient(), LogLikelihood(), MCMC(), and Train().

TInt TCluster::K
private

Definition at line 76 of file circles.h.

Referenced by Gradient(), LogLikelihood(), MCMC(), TCluster(), and Train().

TFlt TCluster::Lambda
private

Definition at line 77 of file circles.h.

Referenced by Gradient().

TFlt* TCluster::Theta
private

Definition at line 70 of file circles.h.

Referenced by Gradient(), LogLikelihood(), MCMC(), TCluster(), Train(), and ~TCluster().


The documentation for this class was generated from the following file: