Cluster

From Easyrec Wiki

A cluster groups a number of items. Clusters can be created and populated manually in the new Cluster Manager section (pictured below) of the easyrec administration interface, which also offers the option of importing item sets from CSV.

[Image: easyrec Cluster Manager overview]

Clicking "start the cluster manager" opens the Cluster Manager in a new window. Clusters can be organized hierarchically.

Note: Even though clusters can have hierarchical relations, an item belongs only to the clusters it is explicitly assigned to. An item assigned to a child cluster does NOT automatically belong to any parent of that child cluster. However, an item can be assigned to an arbitrary number of clusters, so if you want items of child clusters to appear in queries to a parent cluster, you must also assign them to the parent cluster manually.
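The membership rule can be illustrated with a small in-memory model. This is a hypothetical Python sketch, not easyrec code: `parent_of`, `members`, `assign` and `assign_with_ancestors` are names invented for illustration. It shows that assigning an item to a child cluster leaves the parent untouched, and that an item only shows up in the parent once it is assigned there explicitly.

```python
from collections import defaultdict

# Hypothetical child -> parent map for the example hierarchy (not easyrec code).
parent_of = {"Animation_2011": "Animation", "Action_2011": "Action"}
members = defaultdict(set)  # cluster name -> set of item ids


def assign(cluster, item_id):
    """Assign an item to one cluster only; parent clusters are NOT updated."""
    members[cluster].add(item_id)


def assign_with_ancestors(cluster, item_id):
    """Assign an item to a cluster and, explicitly, to each of its ancestors."""
    while cluster is not None:
        members[cluster].add(item_id)
        cluster = parent_of.get(cluster)


assign("Animation_2011", "1021")
print("1021" in members["Animation"])  # False: membership is not inherited

assign_with_ancestors("Animation_2011", "1125")
print("1125" in members["Animation"])  # True: assigned to the parent explicitly
```

In easyrec itself the equivalent of `assign_with_ancestors` is simply assigning the item to each cluster yourself, for example with extra lines in a CSV import file.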


[Image: easyrec Cluster Manager window]

Cluster naming

Cluster names are used as ids and therefore have to be UNIQUE for the cluster system to work properly. Because clusters can be organized hierarchically, it might be tempting to reuse cluster names. For example, suppose you have two clusters for movies named "Animation" and "Action" and also want to cluster the movies by year of release. The correct way to do this is to create child clusters called "Animation_2011" and "Action_2011". Creating two child clusters both named simply "2011" would cause a naming conflict, and easyrec could not distinguish between the child clusters of "Animation" and "Action".

GOOD:  CLUSTERS           BAD:    CLUSTERS
      /        \                 /        \
  Animation  Action          Animation  Action
      |         |                |         |
Animation_2011 Action_2011      2011      2011

This is only a short explanation of the reasoning behind the cluster naming restriction. In practice, the easyrec admin tool automatically rejects conflicting names.
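Because names act as global ids, a uniqueness check has to look at all clusters, not just the siblings under one parent. The Python sketch below is a hypothetical model of that rule (the `add_cluster` function and the dict-based tree are assumptions, not the admin tool's implementation): it builds the GOOD hierarchy from the diagram and then shows why the BAD one fails.

```python
def add_cluster(tree, name, parent):
    """Add a cluster to a name -> parent map, enforcing globally unique names.

    Hypothetical model of the naming rule; not easyrec's own code.
    """
    if name in tree:
        raise ValueError(f"cluster name {name!r} already exists")
    if parent is not None and parent not in tree:
        raise KeyError(f"unknown parent cluster {parent!r}")
    tree[name] = parent


tree = {"CLUSTERS": None}  # root of the hierarchy
for name, parent in [("Animation", "CLUSTERS"), ("Action", "CLUSTERS"),
                     ("Animation_2011", "Animation"), ("Action_2011", "Action")]:
    add_cluster(tree, name, parent)  # GOOD: every name is unique

add_cluster(tree, "2011", "Animation")
try:
    add_cluster(tree, "2011", "Action")  # BAD: "2011" is already taken
except ValueError as err:
    print(err)  # cluster name '2011' already exists
```

The check deliberately ignores the parent when testing for duplicates: if uniqueness were only enforced per parent, two "2011" clusters could coexist, and a lookup by name alone could not tell them apart.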

Cluster fallback mechanism

When calling the API method itemsofcluster, you can pass the optional parameter 'usefallback'. If usefallback is set to 'true' and the given cluster contains fewer than 'numberOfResults' matching items, easyrec traverses the cluster hierarchy and adds items from sibling and parent clusters to the returned recommendation.
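The following Python sketch models the fallback behaviour on an in-memory hierarchy. It is an illustration under stated assumptions, not easyrec's algorithm: this page does not specify the traversal order, so the sketch assumes the cluster's own items come first, then items of sibling clusters, then items of the parent, repeating one level up until enough items are found. The function `items_of_cluster` and its parameter names are hypothetical; only `usefallback` and `numberOfResults` correspond to the API parameters named above.

```python
def items_of_cluster(cluster, number_of_results, parent_of, members, use_fallback=False):
    """Return up to number_of_results item ids for a cluster.

    With use_fallback, items of sibling clusters and then of the parent are
    added, walking up the hierarchy. This order is an assumption for
    illustration, not easyrec's implementation.
    """
    result, seen = [], set()

    def take(items):
        # Sorted only to make the sketch deterministic.
        for item in sorted(items):
            if len(result) >= number_of_results:
                return
            if item not in seen:
                seen.add(item)
                result.append(item)

    take(members.get(cluster, ()))
    current = cluster
    while use_fallback and current is not None and len(result) < number_of_results:
        parent = parent_of.get(current)
        siblings = sorted(c for c, p in parent_of.items() if p == parent and c != current)
        for sibling in siblings:
            take(members.get(sibling, ()))
        if parent is not None:
            take(members.get(parent, ()))
        current = parent
    return result


parent_of = {"CLUSTERS": None, "Animation": "CLUSTERS", "Action": "CLUSTERS",
             "Animation_2011": "Animation", "Action_2011": "Action"}
members = {"Animation_2011": {"a1"}, "Animation": {"a2", "a3"}, "Action": {"b1"}}

print(items_of_cluster("Animation_2011", 4, parent_of, members))
# ['a1']
print(items_of_cluster("Animation_2011", 4, parent_of, members, use_fallback=True))
# ['a1', 'a2', 'a3', 'b1']
```

Without fallback, the request for four items returns only the single item in "Animation_2011"; with fallback, the parent "Animation" and then its sibling "Action" fill the remaining slots. Note that "Action_2011" is never reached, since it is neither a sibling nor a parent of the requested cluster.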

Cluster CSV import

easyrec supports uploading item-to-cluster relations from a .csv (comma-separated values) file. CSV files can easily be created with any text editor or with Microsoft Excel. The content of the file must have the following format:

Cluster;itemId;itemType

The only valid separator is the semicolon ";", even though the file is called a comma-separated values file.
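If you generate the import file from a script rather than by hand, Python's standard `csv` module can write the semicolon-separated format directly. This is a minimal sketch; `write_cluster_csv` is a hypothetical helper, and the choice of `"\n"` line endings is an assumption (this page does not say whether CRLF line endings are accepted).

```python
import csv
import io


def write_cluster_csv(rows, stream):
    """Write (cluster, itemId, itemType) rows in the semicolon-separated import format."""
    writer = csv.writer(stream, delimiter=";", lineterminator="\n")
    writer.writerow(["Cluster", "itemId", "itemType"])  # header line, ignored on import
    for cluster, item_id, item_type in rows:
        writer.writerow([cluster, item_id, item_type])


buf = io.StringIO()
write_cluster_csv([("60ies", "1021", "ITEM"), ("60ies", "1121", "ITEM")], buf)
print(buf.getvalue())
# Cluster;itemId;itemType
# 60ies;1021;ITEM
# 60ies;1121;ITEM
```

To write to disk, pass a file opened with `open("clusters.csv", "w", newline="", encoding="utf-8")` instead of the `StringIO` buffer (the file name and encoding are illustrative). The `csv` writer quotes any field that itself contains a semicolon; since it is not documented here whether the importer understands quoted fields, it is safest to keep semicolons out of cluster names and item ids.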


Each line must contain exactly one cluster-to-item relation. The first line of the file is assumed to contain the column headings and is therefore ignored on import. Note that both the items and the clusters must already exist before the import; any line that references a missing item or cluster is ignored. You can create clusters in the Cluster Manager and use the import API to create items that do not exist yet. Below is an example of valid file content for the MovieLens data set. It adds the items (item type ITEM) of MovieLens movies released in the 1960s to a cluster named 60ies.

Cluster;ItemId;ItemType
60ies;1021;ITEM
60ies;1121;ITEM
60ies;1125;ITEM
60ies;1154;ITEM
60ies;1198;ITEM
60ies;1252;ITEM
60ies;131;ITEM
60ies;135;ITEM
60ies;1382;ITEM
60ies;139;ITEM
60ies;1411;ITEM
60ies;143;ITEM
60ies;1444;ITEM
60ies;1573;ITEM
60ies;1578;ITEM
60ies;1674;ITEM
60ies;177;ITEM
60ies;185;ITEM
60ies;197;ITEM
60ies;30;ITEM
60ies;417;ITEM
60ies;419;ITEM
60ies;427;ITEM
60ies;435;ITEM
60ies;443;ITEM
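Because lines that reference missing clusters or items are ignored during the import, it can help to check a file against your own records before uploading it. The Python sketch below mimics the rules stated above (header line skipped, exactly one relation per line, lines with unknown clusters or items ignored). The function `precheck`, the sets of known clusters and items, and matching items on the (itemId, itemType) pair are assumptions for illustration; the sketch does not query easyrec.

```python
import csv
import io


def precheck(stream, known_clusters, known_items):
    """Split CSV rows into those the import should accept and those it would ignore.

    known_clusters: set of cluster names; known_items: set of (item_id, item_type)
    pairs. Both are assumed to come from your own records (hypothetical inputs).
    """
    reader = csv.reader(stream, delimiter=";")
    next(reader, None)  # first line holds the column headings and is skipped
    accepted, ignored = [], []
    for row in reader:
        if not row:  # skip blank lines
            continue
        line_no = reader.line_num
        if len(row) != 3:
            ignored.append((line_no, row, "expected exactly 3 fields"))
            continue
        cluster, item_id, item_type = row
        if cluster not in known_clusters:
            ignored.append((line_no, row, "unknown cluster"))
        elif (item_id, item_type) not in known_items:
            ignored.append((line_no, row, "unknown item"))
        else:
            accepted.append((cluster, item_id, item_type))
    return accepted, ignored


data = "Cluster;ItemId;ItemType\n60ies;1021;ITEM\n60ies;9999;ITEM\n70ies;1121;ITEM\n"
accepted, ignored = precheck(io.StringIO(data), {"60ies"},
                             {("1021", "ITEM"), ("1121", "ITEM")})
print(accepted)  # [('60ies', '1021', 'ITEM')]
for line_no, row, reason in ignored:
    print(line_no, reason)
# 3 unknown item
# 4 unknown cluster
```

The sketch compares values exactly as they appear in the file; whether easyrec trims whitespace or ignores case when matching names is not stated on this page.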

Once the upload is done you will be presented with an upload report similar to the output pictured below.

[Image: easyrec CSV import report]