Newest Updates
The newest version of the source code for this project, along with a runnable JAR file, can be found on GitHub:
Project Introduction
The Citation Prediction Project was originally a collaborative project between the University of Mary Washington and the Dahlgren Naval Surface Warfare Center. At the end of the semester, the two student researchers (Josiah Neuberger and William Etcho) released the code as an open-source project after obtaining appropriate permission from all involved.

Background Information

In the paper Quantifying Long-Term Scientific Impact, Wang, Song and Barabási (WSB) showed how a paper's citation history can be used to predict its future citation pattern and long-term scientific impact. They start by identifying three fundamental mechanisms that drive the citation history of an individual paper. First, preferential attachment reflects the fact that more visible, highly cited papers are more likely to be cited again. Second, aging accounts for the fact that new ideas are gradually integrated into subsequent work, so a paper attracts fewer citations as time goes on. Last, fitness captures a paper's importance relative to other papers; it is a measurable quantity they term “Relative Fitness”.

The project makes use of the WSB Triple discussed in the WSB paper, a vector of three per-paper parameters that arise from the mechanisms above:

  • λ - Relative Fitness
  • μ - Immediacy
  • σ - Longevity
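
As a rough illustration of how the triple is used, the sketch below evaluates the cumulative-citation formula from the WSB paper, c(t) = m * (exp(λ * Φ((ln t - μ) / σ)) - 1), where Φ is the standard normal CDF. It is not taken from this project's source: the class and method names are hypothetical, m = 30 is the constant used in the WSB paper (this project may use a different value), the parameter values in main are made up, and measuring t in years is an assumption (the units must match whatever was used when μ and σ were fitted).

```java
// Hypothetical sketch; not part of the project's source code.
class WsbModelSketch {
    // Global constant m from the WSB paper (average number of references per paper).
    static final double M = 30.0;

    // Standard normal CDF, using the Abramowitz-Stegun polynomial
    // approximation (formula 26.2.17), accurate to roughly 1e-7.
    static double normalCdf(double x) {
        if (x < 0) {
            return 1.0 - normalCdf(-x);
        }
        double t = 1.0 / (1.0 + 0.2316419 * x);
        double poly = t * (0.319381530 + t * (-0.356563782
                + t * (1.781477937 + t * (-1.821255978 + t * 1.330274429))));
        double density = Math.exp(-0.5 * x * x) / Math.sqrt(2.0 * Math.PI);
        return 1.0 - density * poly;
    }

    // Predicted cumulative citations t time units after publication.
    static double cumulativeCitations(double lambda, double mu, double sigma, double t) {
        if (t <= 0) {
            return 0.0; // no citations before or at publication
        }
        double z = (Math.log(t) - mu) / sigma;
        return M * (Math.exp(lambda * normalCdf(z)) - 1.0);
    }

    // Ultimate impact: the limit of cumulativeCitations as t grows without bound.
    static double ultimateImpact(double lambda) {
        return M * (Math.exp(lambda) - 1.0);
    }

    public static void main(String[] args) {
        // Illustrative parameter values only; not fitted to any real paper.
        double lambda = 2.0, mu = 1.5, sigma = 1.0;
        for (int year = 1; year <= 30; year += 5) {
            System.out.printf("t=%d  predicted total citations=%.1f%n",
                    year, cumulativeCitations(lambda, mu, sigma, year));
        }
        System.out.printf("ultimate impact=%.1f%n", ultimateImpact(lambda));
    }
}
```

In this form the ultimate impact depends only on λ, while μ and σ control how quickly the citation curve approaches that limit.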

This project is an engineered solution for finding the WSB Triple from a paper's citation history of at least 5 years. The software was prototyped in R and implemented in Java. It requires the paper's citation history to be placed in the 'papers' directory as a CSV file. The CSV file should hold each paper’s data in a single row. The first two columns provide identifying information: an integer id and a 4-digit year. The remaining columns hold the citation history of the paper, one year’s worth of citations per column. If the paper received no citations in a given year, then that column should contain 0, for example:

3040403,1950,3,4,0,10,0,0.....,0

This paper received 3 citations from time=0 (publishing) to time=1 year, 4 in the second year, 0 in the third, 10 in the fourth, and so on. The software gives the user the option to select a single paper to process or to process the whole file. It attempts to find three WSB solutions, using 5 years, 10 years, and all years of the citation history as training data for the algorithm. It also graphs each solution using a formula taken from the WSB paper for predicting future citations. The graph is saved under the directory 'saved_plots\<name_of_file_containing_paper_citation_data>\'.
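
The row layout described above can be read with a few lines of Java. The following is a minimal sketch, not the project's actual parser: the class CitationHistorySketch and its methods are hypothetical, the sample row uses only the values visible in the example above (the elided columns are left out), and capping a training window at the number of available years is an assumption.

```java
import java.util.Arrays;

// Hypothetical sketch; not part of the project's source code.
class CitationHistorySketch {
    final long id;       // integer id from the first column
    final int year;      // 4-digit year from the second column
    final int[] yearly;  // citations per year, starting at publication

    CitationHistorySketch(long id, int year, int[] yearly) {
        this.id = id;
        this.year = year;
        this.yearly = yearly;
    }

    // Parses one CSV row of the form: id,year,c1,c2,...,cn
    static CitationHistorySketch parseRow(String row) {
        String[] fields = row.trim().split(",");
        if (fields.length < 3) {
            throw new IllegalArgumentException(
                    "expected an id, a year, and at least one yearly count");
        }
        long id = Long.parseLong(fields[0].trim());
        int year = Integer.parseInt(fields[1].trim());
        int[] yearly = new int[fields.length - 2];
        for (int i = 2; i < fields.length; i++) {
            yearly[i - 2] = Integer.parseInt(fields[i].trim());
        }
        return new CitationHistorySketch(id, year, yearly);
    }

    // Cumulative citation counts over the first `years` years,
    // capped at the number of years actually present in the row.
    int[] cumulative(int years) {
        int n = Math.max(0, Math.min(years, yearly.length));
        int[] totals = new int[n];
        int sum = 0;
        for (int i = 0; i < n; i++) {
            sum += yearly[i];
            totals[i] = sum;
        }
        return totals;
    }

    public static void main(String[] args) {
        // Shortened row: only the values visible in the example above.
        CitationHistorySketch paper = parseRow("3040403,1950,3,4,0,10,0,0");
        // Training windows of 5 years, 10 years, and all available years.
        int[] windows = {5, 10, Integer.MAX_VALUE};
        for (int w : windows) {
            System.out.println(Arrays.toString(paper.cumulative(w)));
        }
        // First line printed: [3, 7, 7, 17, 17]
    }
}
```

Cumulative totals are used here because the WSB formula predicts the total number of citations accumulated up to a given time, not the count for a single year.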

Before using the software or its source code, you should read the additional background material not covered here. Please refer to the next section for links to these sources and others related to this project.

Links to Additional Reading