The Citation Prediction Project was originally a collaborative project between the University of Mary Washington
and the Dahlgren Naval Surface Warfare Center. At the end of the semester, the two student researchers
(Josiah Neuberger and William Etcho) released the code as an open-source project after obtaining permission from all involved.
In the paper "Quantifying Long-Term Scientific Impact", Wang, Song and Barabási (WSB) showed how the citation history
of a paper can be used to predict its future citation pattern and long-term scientific impact. They start by
identifying three fundamental mechanisms that drive the citation history of individual papers. First,
preferential attachment reflects the fact that more visible or highly cited papers are more likely to be cited again.
Second, aging accounts for the fact that newer publications absorb the ideas of the papers they cite, so an older
paper attracts fewer citations as time passes. Last, fitness captures a paper's importance relative to other papers
and is a measurable quantity they term “Relative Fitness”.
The project makes use of the WSB Triple discussed in the WSB paper, which is a vector of three model parameters:
- λ - Relative Fitness
- μ - Immediacy
- σ - Longevity
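As a rough illustration of how the three parameters combine, the WSB paper models the cumulative citations of a paper
at time t as c(t) = m * (exp(λ * Φ((ln t − μ) / σ)) − 1), where Φ is the standard normal cumulative distribution
function and m is a global constant. The Java sketch below evaluates that expression. The class and method names are
hypothetical and are not taken from this project's source. Φ is computed with a standard polynomial approximation of the
error function (accurate to roughly 1e-7), and both the value of m and the time unit of t are left to the caller, since
they must match whatever was used when the triple was fitted.

```java
// Hypothetical helper: evaluates the WSB cumulative-citation formula
// for a given (lambda, mu, sigma) triple. Not the project's own code.
final class WsbModel {
    private WsbModel() {}

    // Error function via the Abramowitz-Stegun 7.1.26 polynomial approximation.
    static double erf(double x) {
        double sign = x < 0 ? -1.0 : 1.0;
        double ax = Math.abs(x);
        double t = 1.0 / (1.0 + 0.3275911 * ax);
        double poly = t * (0.254829592
                + t * (-0.284496736
                + t * (1.421413741
                + t * (-1.453152027
                + t * 1.061405429))));
        return sign * (1.0 - poly * Math.exp(-ax * ax));
    }

    // Standard normal cumulative distribution function.
    static double normalCdf(double x) {
        return 0.5 * (1.0 + erf(x / Math.sqrt(2.0)));
    }

    // c(t) = m * (exp(lambda * Phi((ln t - mu) / sigma)) - 1).
    // Returns 0 for t <= 0, i.e. at or before publication.
    static double cumulativeCitations(double t, double lambda, double mu,
                                      double sigma, double m) {
        if (t <= 0.0) {
            return 0.0;
        }
        double z = (Math.log(t) - mu) / sigma;
        return m * (Math.exp(lambda * normalCdf(z)) - 1.0);
    }
}
```

In this form, a larger λ raises the total number of citations the paper eventually collects, μ shifts the point in
time at which citations accumulate fastest, and σ controls how slowly the citation rate decays.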
This project is an engineered solution for finding the WSB Triple from at least 5 years of a paper's citation history.
The software was prototyped in R and implemented in Java.
The software requires the paper's citation history to be placed in the 'papers' directory as a CSV file.
The CSV file should have each paper’s data in a single row. The first two columns provide identifying information:
an integer id and a four-digit publication year. The remaining columns are dedicated to the citation history of the
paper. Each one should contain one year’s worth of citations. If the paper received no citations in a given year,
then the file should contain 0 in that column, e.g.:
3040403,1950,3,4,0,10,0,0.....,0
This paper received 3 citations in its first year (from publication at time=0 to time=1 year), 4 in the second,
0 in the third, 10 in the fourth, and so on.
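The row layout described above can be read with a few lines of Java. The following is a minimal sketch, assuming plain
comma-separated integers with no header row and no quoted fields. The class name PaperHistory and its members are
hypothetical and do not come from this project's source.

```java
// Hypothetical holder for one CSV row: id, publication year, yearly citation counts.
final class PaperHistory {
    final long id;
    final int publicationYear;
    final int[] yearlyCitations;

    PaperHistory(long id, int publicationYear, int[] yearlyCitations) {
        this.id = id;
        this.publicationYear = publicationYear;
        this.yearlyCitations = yearlyCitations;
    }

    // Parses a row such as "3040403,1950,3,4,0,10,0,0".
    static PaperHistory parseRow(String row) {
        String[] fields = row.trim().split(",");
        if (fields.length < 3) {
            throw new IllegalArgumentException(
                    "Expected id, year and at least one citation count: " + row);
        }
        long id = Long.parseLong(fields[0].trim());
        int year = Integer.parseInt(fields[1].trim());
        int[] counts = new int[fields.length - 2];
        for (int i = 0; i < counts.length; i++) {
            counts[i] = Integer.parseInt(fields[i + 2].trim());
            if (counts[i] < 0) {
                throw new IllegalArgumentException("Negative citation count in: " + row);
            }
        }
        return new PaperHistory(id, year, counts);
    }

    // Running total of citations at the end of each year after publication.
    int[] cumulativeCitations() {
        int[] total = new int[yearlyCitations.length];
        int sum = 0;
        for (int i = 0; i < yearlyCitations.length; i++) {
            sum += yearlyCitations[i];
            total[i] = sum;
        }
        return total;
    }
}
```

The running total is included because the WSB formula is expressed in terms of cumulative citations, while the CSV
file stores per-year counts.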
The software lets you select a single paper to process, or you can process the whole file. The software will
attempt to find three WSB solutions, using 5 years, 10 years, and all years of the citation history as training data.
The software will also plot the solution using a formula from the WSB paper for predicting future citations. This graph
will be saved under the directory 'saved_plots\<name_of_file_containing_paper_citation_data>\'.
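The three training sets mentioned above (5 years, 10 years, all years) amount to taking prefixes of the yearly
citation counts. The sketch below shows one way to build them. It is not the project's implementation: the class
name TrainingWindows is hypothetical, and it assumes that a fixed window is skipped when the history is not longer
than that window, so that the full history is never duplicated.

```java
// Hypothetical helper: builds the 5-year, 10-year and full-history training sets.
final class TrainingWindows {
    static final int MIN_YEARS = 5;
    static final int[] FIXED_WINDOWS = {5, 10};

    private TrainingWindows() {}

    // Returns the prefixes of the history used for training, shortest first.
    // The last entry is always a copy of the complete history.
    static int[][] split(int[] yearlyCitations) {
        if (yearlyCitations.length < MIN_YEARS) {
            throw new IllegalArgumentException(
                    "At least " + MIN_YEARS + " years of citation history are required");
        }
        // Count the fixed windows that are strictly shorter than the history.
        int count = 1;
        for (int window : FIXED_WINDOWS) {
            if (window < yearlyCitations.length) {
                count++;
            }
        }
        int[][] windows = new int[count][];
        int next = 0;
        for (int window : FIXED_WINDOWS) {
            if (window < yearlyCitations.length) {
                windows[next++] = java.util.Arrays.copyOf(yearlyCitations, window);
            }
        }
        windows[next] = yearlyCitations.clone();
        return windows;
    }
}
```

With 12 years of data this yields windows of 5, 10 and 12 years. With 7 years it yields only the 5-year window and
the full 7-year history.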
Before you use the software or this source code, you should read the background material that is not covered here.
Please refer to the next section for links to these sources and others related to this project.