Citation Prediction Project

An open source (Apache v2.0) project originating from a collaborative research project between the University of Mary Washington (UMW) and the Dahlgren Naval Surface Warfare Center (DNSWC)**

Newest Updates

The newest version of the source code for this project, along with a runnable JAR file, can be found on GitHub:

Citation and Prediction Project Code/Software

Project Introduction

The Citation Prediction Project was originally a collaborative project between the University of Mary Washington and the Dahlgren Naval Surface Warfare Center. At the end of the semester, the code was released as an open source project by the two student researchers (Josiah Neuberger and William Etcho) after obtaining appropriate permission from all involved.

Background Information

In the paper Quantifying Long-Term Scientific Impact, Wang, Song and Barabási (WSB) showed how the citation history of a paper can be used to predict its future citation pattern and long-term scientific impact. They start by identifying three fundamental mechanisms that drive the citation history of individual papers. First, preferential attachment captures the fact that more visible or highly cited papers are more likely to be cited again. Second, aging takes into account that newer publications integrate the ideas of the papers they cite, so the original papers receive fewer citations as time goes on. Last, fitness captures a paper's importance relative to other papers and is a measurable quantity they term “Relative Fitness”.

The project makes use of the WSB Triple discussed in the WSB paper, a vector of three parameters that characterize a paper's citation history:

λ - Relative Fitness
μ - Immediacy
σ - Longevity
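
For reference, the WSB paper ties these three parameters to a paper's expected cumulative citation count. The formula below is written as we recall it from that paper, so check it against the original before relying on it. Here m is the average number of references per new paper (WSB use m = 30) and Φ is the standard normal cumulative distribution function:

```latex
c_i(t) = m\left[\exp\!\left(\lambda_i\,\Phi\!\left(\frac{\ln t - \mu_i}{\sigma_i}\right)\right) - 1\right],
\qquad
\Phi(x) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{x} e^{-y^{2}/2}\,dy
```

In this form, c_i(t) approaches m(e^{λ_i} - 1) as t grows, so λ alone sets the predicted total number of citations, while μ and σ shape how quickly that total is reached.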

This project is an engineered solution for finding the WSB Triple from a paper's citation history of at least 5 years. The software was prototyped in R and implemented in Java. It requires each paper's citation history to be placed in the 'papers' directory as a CSV file. The CSV file should hold each paper’s data in a single row. The first two columns provide identifying information: an integer id and a 4-digit year. The remaining columns are dedicated to the citation history of the paper, and each one should contain one year’s worth of citations. If the paper received no citations in a given year, then the file should contain 0 in that column, for example:

3040403,1950,3,4,0,10,0,0.....,0

This paper received 3 citations from time=0 (publishing) to time=1 year, 4 in the second year, 0 in the third, 10 in the fourth, and so on.
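
The row layout described above could be read with a few lines of Java. This is a minimal sketch and not the project's actual parser: the class name CitationRow and its fields are hypothetical, the second column is assumed to be the publication year, and the sample row is the example above cut off where it is elided.

```java
import java.util.Arrays;

/** Hypothetical holder for one CSV row; names are illustrative, not from the project source. */
final class CitationRow {
    final long paperId;          // first column: integer id
    final int year;              // second column: 4-digit year (assumed to be the publication year)
    final int[] yearlyCitations; // remaining columns: citations per year since publication

    CitationRow(long paperId, int year, int[] yearlyCitations) {
        this.paperId = paperId;
        this.year = year;
        this.yearlyCitations = yearlyCitations;
    }

    /** Parses a row of the form "id,year,c1,c2,...", where ck is the count for the k-th year. */
    static CitationRow parse(String line) {
        String[] fields = line.trim().split(",");
        if (fields.length < 3) {
            throw new IllegalArgumentException("Expected id, year and at least one yearly count: " + line);
        }
        long id = Long.parseLong(fields[0].trim());
        int year = Integer.parseInt(fields[1].trim());
        int[] counts = new int[fields.length - 2];
        for (int i = 2; i < fields.length; i++) {
            counts[i - 2] = Integer.parseInt(fields[i].trim());
        }
        return new CitationRow(id, year, counts);
    }

    public static void main(String[] args) {
        // Shortened version of the example row; the elided middle columns are left out.
        CitationRow row = parse("3040403,1950,3,4,0,10,0,0");
        System.out.println(row.paperId + " (" + row.year + "): " + Arrays.toString(row.yearlyCitations));
    }
}
```

Keeping the yearly counts in an int array indexed by years since publication matches the column order of the file, so index 0 is the first year after publishing.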

		
The software gives the user the option to process a single selected paper or the whole file. For each paper, it attempts to find three WSB solutions, using 5 years, 10 years, and all years of the citation history as training data for the algorithm. It also graphs each solution using a formula extracted from the WSB paper for predicting future citations. The graph is saved under the directory 'saved_plots\<name_of_file_containing_paper_citation_data>\'.
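
As a rough illustration of the training windows and the prediction curve, the sketch below slices a yearly history into 5-year, 10-year, and full windows and evaluates the WSB formula as we recall it from the paper: m * (exp(λ Φ((ln t - μ) / σ)) - 1). Everything here is an assumption made for illustration: the class and method names are hypothetical, m = 30 is the value WSB use, the normal CDF is a textbook polynomial approximation (Abramowitz and Stegun 26.2.17), and the history and triple in main are made up. It does not fit a triple and is not the project's own code.

```java
import java.util.Arrays;

/** Hypothetical helper; not taken from the project's source code. */
final class WsbPredictionSketch {
    /** m in the WSB formula: average number of references per new paper (assumed 30). */
    static final double M = 30.0;

    /** First `years` entries of the yearly history, or the whole history if it is shorter. */
    static int[] trainingWindow(int[] yearlyCitations, int years) {
        return Arrays.copyOf(yearlyCitations, Math.min(years, yearlyCitations.length));
    }

    /** Running total: element k is the number of citations received in the first k+1 years. */
    static int[] cumulative(int[] yearlyCitations) {
        int[] totals = new int[yearlyCitations.length];
        int sum = 0;
        for (int i = 0; i < yearlyCitations.length; i++) {
            sum += yearlyCitations[i];
            totals[i] = sum;
        }
        return totals;
    }

    /** Standard normal CDF via Abramowitz and Stegun 26.2.17 (absolute error below 1e-7). */
    static double normalCdf(double x) {
        if (x < 0) {
            return 1.0 - normalCdf(-x);
        }
        double t = 1.0 / (1.0 + 0.2316419 * x);
        double poly = t * (0.319381530 + t * (-0.356563782
                + t * (1.781477937 + t * (-1.821255978 + t * 1.330274429))));
        double density = Math.exp(-0.5 * x * x) / Math.sqrt(2.0 * Math.PI);
        return 1.0 - density * poly;
    }

    /** Predicted cumulative citations at time t > 0 for a given (lambda, mu, sigma). */
    static double predictedCitations(double lambda, double mu, double sigma, double t) {
        return M * (Math.exp(lambda * normalCdf((Math.log(t) - mu) / sigma)) - 1.0);
    }

    public static void main(String[] args) {
        int[] history = {3, 4, 0, 10, 0, 0, 2, 1, 0, 0, 1, 0}; // made-up yearly counts
        for (int years : new int[] {5, 10, history.length}) {
            int[] window = trainingWindow(history, years);
            System.out.println(years + " years: " + Arrays.toString(cumulative(window)));
        }
        // Made-up triple for illustration only; not a fitted result.
        System.out.println(predictedCitations(1.0, 1.0, 1.0, 20.0));
    }
}
```

The time t must be expressed in the same unit that was used when the triple was fitted; the sketch does not fix that unit.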
											

											

Before you use the software or this source code, you should read additional background material not covered here. Please refer to the next section for links to these sources and others related to this project.

Links to Additional Reading

Quantifying Long-Term Scientific Impact by WSB
WSB Supplemental Material
Engineering a Software Solution for finding a WSB Triple by Neuberger and Etcho

Acknowledgements

**This project would like to thank the University of Mary Washington and several people who were part of the original project and who made this open source code/project possible:

Dr. Jeff Solka, DNSWC
Dr. Allen Parks, DNSWC
Kristin Ash, DNSWC
Dr. Melody Denhere, UMW (Supervising Faculty)
William Etcho, UMW (student researcher, Mathematics)
Josiah Neuberger, UMW (student researcher, Computer Science)

This project would also like to acknowledge that this site was built on top of a great template by HTML5 Up. The template was also customized with some great texture patterns by Subtle Patterns.

HTML5 UP: Escape Velocity Template (license: CC3 Attribution)
Subtle Patterns

Site Owner

Josiah Neuberger
Software Engineer

Email

josiah@wikimylife.org

Website License: Creative Commons Attribution 3.0 Unported. © Copyright 2014 Josiah Neuberger