To use the code, first create a database (the tables are explained below) and fill it with your data.
Fingerprint works on clusters extracted from a data stream at consecutive timepoints.
The code does not include an algorithm for extracting the clusters.
In the experiments we used WEKA, which you can download from: http://www.cs.waikato.ac.nz/ml/weka/

===============================================================================
Regarding the database generation:
===============================================================================
Run dbScript.sql to create the database tables.
The database was implemented in the following environment:

	Microsoft SQL Server Management Studio      10.50.1617.0
	Microsoft Data Access Components (MDAC)     3.85.1132
	Microsoft MSXML                             2.6 3.0 4.0 5.0 6.0
	Microsoft Internet Explorer                 6.0.2900.5512
	Microsoft .NET Framework                    2.0.50727.3625
	Operating System                            5.1.2600

The script was generated with SQL Server (right-click the database: Tasks --> Generate Scripts).

Below is a short description of the tables:

*** tbl_datasets ***
Describes the datasets used for the experiments

*** tbl_clusterings ***
Describes a clustering over a dataset and for a given time period.

*** tbl_clusters ***
Describes the clusters of a clustering.

*** tbl_instances_KDD98cup ***
Contains the instances from the KDD cup 98 dataset.


*** tbl_KDD98cup_centroids ***
Contains the centroids for each cluster. The first column (id) is the clusterID.

*** tbl_experiments ***
Describes the settings of an experiment: which dataset is used, which monitoring thresholds apply, and so on.

*** tbl_runnings ***
Describes one run of an experiment between two clusterings.

*** tbl_transitions *** 
Transitions (aka cluster changes) in a running.

*** tbl_clustersInstances_asc ***
Records which instances belong to which cluster.

The tables are generic, except for tbl_instances_KDD98cup and tbl_KDD98cup_centroids, which are specific to the KDD Cup 98 dataset.

===============================================================================
Regarding the code:
===============================================================================
To test the online compression use: TestOnlineCompression
To test the offline compression use: TestOfflineCompression

In both cases you should change the running parameters, such as the experimentID, the number of timepoints you want to monitor,
and the centroid distance threshold.
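
As an illustration of what a centroid distance threshold can mean, here is a minimal sketch. It assumes that centroids are plain numeric vectors and that two centroids count as "close" when their Euclidean distance is at most the threshold. The class and method names are hypothetical and are not taken from the Fingerprint code, which may use a different distance or comparison.

```java
final class CentroidDistanceSketch {

    // Euclidean distance between two centroids of equal dimensionality.
    static double euclidean(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("dimension mismatch");
        }
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    // Hypothetical check (an assumption, not the project's API):
    // true when the two centroids lie within the threshold distance.
    static boolean withinThreshold(double[] a, double[] b, double threshold) {
        return euclidean(a, b) <= threshold;
    }

    public static void main(String[] args) {
        double[] c1 = {0.0, 0.0};
        double[] c2 = {3.0, 4.0};
        System.out.println(euclidean(c1, c2));            // prints 5.0
        System.out.println(withinThreshold(c1, c2, 5.0)); // prints true
        System.out.println(withinThreshold(c1, c2, 4.9)); // prints false
    }
}
```

A smaller threshold treats fewer centroid pairs as close; which value is appropriate depends on the scale of your attributes.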

Note that the database must be filled with your data first.

The results are stored on disk; the exact path is specified in the path variable.

Required: Panda.jar
===============================================================================
The Panda.jar library should be included.
PANDA is a framework for comparing different types of simple and complex patterns.
More on PANDA in the following publication:
"Ilaria Bartolini, Paolo Ciaccia, Irene Ntoutsi, Marco Patella, and Yannis Theodoridis. 2009. The Panda framework for comparing patterns. Data Knowl. Eng. 68, 2 (February 2009), 244-260. DOI=10.1016/j.datak.2008.10.004 http://dx.doi.org/10.1016/j.datak.2008.10.004"


Required: MONIC
===============================================================================
This work also uses MONIC for detecting transitions between two clusterings.
The original MONIC framework provides much more information (e.g. internal transitions), but here we use only the external transitions,
so the functionality of the class Monitor.java is sufficient.
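
To give an idea of what an external transition is, the sketch below classifies one cluster of the older clustering against the newer clustering using instance overlap. It is a simplified illustration in the spirit of MONIC, not the code of Monitor.java: it ignores instance aging, it does not detect absorption (which needs a comparison across all old clusters), and the names ExternalTransitionSketch, tau and tauSplit are assumptions made for this example.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class ExternalTransitionSketch {

    enum Transition { SURVIVES, SPLITS, DISAPPEARS }

    // Instances of the old cluster that also appear in the new cluster.
    static Set<Integer> common(Set<Integer> oldCluster, Set<Integer> newCluster) {
        Set<Integer> result = new HashSet<>(oldCluster);
        result.retainAll(newCluster);
        return result;
    }

    // Fraction of the old cluster's instances found in the new cluster.
    static double overlap(Set<Integer> oldCluster, Set<Integer> newCluster) {
        if (oldCluster.isEmpty()) {
            return 0.0;
        }
        return (double) common(oldCluster, newCluster).size() / oldCluster.size();
    }

    // tau: minimum overlap for a match; tauSplit: minimum overlap for a split part.
    static Transition classify(Set<Integer> oldCluster, List<Set<Integer>> newClustering,
                               double tau, double tauSplit) {
        Set<Integer> coveredByParts = new HashSet<>();
        int parts = 0;
        for (Set<Integer> candidate : newClustering) {
            double o = overlap(oldCluster, candidate);
            if (o >= tau) {
                return Transition.SURVIVES; // one new cluster holds enough of the old one
            }
            if (o >= tauSplit) {
                parts++;
                coveredByParts.addAll(common(oldCluster, candidate));
            }
        }
        double unionOverlap = oldCluster.isEmpty()
                ? 0.0
                : (double) coveredByParts.size() / oldCluster.size();
        if (parts >= 2 && unionOverlap >= tau) {
            return Transition.SPLITS; // several new clusters together cover the old one
        }
        return Transition.DISAPPEARS;
    }

    public static void main(String[] args) {
        Set<Integer> old = Set.of(1, 2, 3, 4);
        System.out.println(classify(old, List.of(Set.of(1, 2, 3, 9)), 0.6, 0.25));        // SURVIVES
        System.out.println(classify(old, List.of(Set.of(1, 2), Set.of(3, 4)), 0.6, 0.25)); // SPLITS
        System.out.println(classify(old, List.of(Set.of(1, 8), Set.of(9)), 0.6, 0.25));    // DISAPPEARS
    }
}
```

The instance IDs here are plain integers; in the database they would correspond to the instance-to-cluster assignments stored in tbl_clustersInstances_asc.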

More on MONIC in the following publications: 
"Myra Spiliopoulou, Irene Ntoutsi, Yannis Theodoridis, and Rene Schult. 2006. MONIC: modeling and monitoring cluster transitions. In Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining (KDD '06). ACM, New York, NY, USA, 706-711. DOI=10.1145/1150402.1150491 http://doi.acm.org/10.1145/1150402.1150491"
"Irene Ntoutsi, Myra Spiliopoulou, and Yannis Theodoridis. 2009. Tracing cluster transitions for different cluster types. Control and Cybernetics 38, 1 (2009), 239-260. Polish Academy of Sciences."



