Capstone Projects

The final class in the Data Science Certificate program involves a "capstone project" in which students tackle a realistic problem with realistic (or real) data. Here are a few recent capstone projects:


Machine Learning to Detect Anomalies in DoD Contracts
MAJ Nick Lee

Lee diagram

Our study of the USASpending data demonstrates that identifying the important variables related to best spending practices can reveal trends and contracts that stand out as unusual. We show anomalies in multiple ways: (1) visual comparison of averages, (2) outlier identification of spending patterns, (3) secondary testing for values, (4) a more in-depth analysis of an awarding office, and (5) an application to view multiple results at once.

Lee chart
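
The project summary does not say which outlier rule was applied to spending patterns. As an illustration only, the sketch below uses the common interquartile-range (IQR) rule; the function name `iqr_outliers`, the multiplier `k=1.5`, and the dollar amounts are all hypothetical and are not taken from the USASpending study.

```python
from statistics import quantiles


def iqr_outliers(values, k=1.5):
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(values, n=4)  # quartile cut points
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [v for v in values if v < low or v > high]


# Hypothetical contract amounts (illustrative, not real spending data).
amounts = [120, 135, 128, 140, 131, 126, 138, 5000]
flagged = iqr_outliers(amounts)
print(flagged)
```

A rule like this only flags candidates; an analyst would still need to review each flagged contract.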


Anomaly Detection Approaches
MAJ Smith & Dr. John Alt

Initial anomaly detection methods identified ~1,000 anomalous records out of 1.45 million, reducing the records for an analyst to explore manually by over 99%.

Clustering--a technique that puts objects into groups with similar characteristics.

Clustering Methods: K-modes, Hierarchical Agglomerative Clustering (HAC), Partitioning Around Medoids (PAM), Random Forests (supervised and unsupervised), PAM with Dissimilarity Matrix (Daisy)
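
To make one of the listed methods concrete, here is a minimal K-modes sketch for categorical records. It is not the project's implementation: the seeding rule (first k distinct records), the use of distance to the assigned mode as an anomaly score, and the sample records are all assumptions made for illustration.

```python
from collections import Counter


def hamming(a, b):
    """Count the positions at which two categorical records differ."""
    return sum(x != y for x, y in zip(a, b))


def k_modes(records, k, iterations=10):
    """Cluster categorical tuples; return (modes, labels)."""
    # Assumption: seed with the first k distinct records for simplicity.
    modes = list(dict.fromkeys(records))[:k]
    labels = []
    for _ in range(iterations):
        # Assignment step: nearest mode by Hamming distance.
        labels = [
            min(range(len(modes)), key=lambda c: hamming(r, modes[c]))
            for r in records
        ]
        # Update step: most common value of each attribute per cluster.
        new_modes = []
        for c in range(len(modes)):
            members = [r for r, lab in zip(records, labels) if lab == c]
            if not members:
                new_modes.append(modes[c])
                continue
            new_modes.append(
                tuple(Counter(col).most_common(1)[0][0] for col in zip(*members))
            )
        if new_modes == modes:
            break
        modes = new_modes
    return modes, labels


# Hypothetical records: (service, contract type, size).
records = [
    ("Army", "fixed", "small"),
    ("Navy", "cost", "large"),
    ("Army", "fixed", "small"),
    ("Army", "fixed", "large"),
    ("Navy", "cost", "large"),
    ("Navy", "cost", "small"),
    ("Air Force", "time", "medium"),
]
modes, labels = k_modes(records, k=2)
# Illustrative anomaly score: distance from each record to its cluster mode.
scores = [hamming(r, modes[lab]) for r, lab in zip(records, labels)]
most_unusual = max(range(len(records)), key=scores.__getitem__)
print(labels, scores, records[most_unusual])
```

Records that sit far from every mode, like the last one here, are the kind an analyst might inspect first.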

Smith diagram

Neural Network Autoencoder--an unsupervised learning algorithm that applies backpropagation, setting the target values to be equal to the inputs.
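
The definition above can be sketched with a deliberately tiny linear autoencoder: two inputs, one hidden unit, and a training target equal to the input. Everything specific here (network size, starting weights, learning rate, toy data) is an assumption chosen to keep the example short; the project's actual network is not described in this summary.

```python
def reconstruct(x, w, v):
    """Encode x to one hidden value, then decode back to input size."""
    h = sum(wj * xj for wj, xj in zip(w, x))
    return [vi * h for vi in v], h


def train_autoencoder(data, w, v, lr=0.1, epochs=1000):
    """Batch gradient descent on mean squared reconstruction error."""
    n = len(data)
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_v = [0.0] * len(v)
        for x in data:
            x_hat, h = reconstruct(x, w, v)
            # The target is the input itself.
            err = [xh - xi for xh, xi in zip(x_hat, x)]
            # Backpropagation: decoder weights first, then the hidden unit.
            d_h = sum(2 * e * vi for e, vi in zip(err, v))
            for i, e in enumerate(err):
                grad_v[i] += 2 * e * h / n
            for j, xj in enumerate(x):
                grad_w[j] += d_h * xj / n
        w = [wj - lr * g for wj, g in zip(w, grad_w)]
        v = [vi - lr * g for vi, g in zip(v, grad_v)]
    return w, v


def reconstruction_error(x, w, v):
    """Squared error between x and its reconstruction."""
    x_hat, _ = reconstruct(x, w, v)
    return sum((xh - xi) ** 2 for xh, xi in zip(x_hat, x))


# Toy "normal" records lie on the line x1 == x2 (illustrative data).
normal = [(t, t) for t in (-1.0, -0.5, 0.5, 1.0)]
w, v = train_autoencoder(normal, w=[0.5, 0.1], v=[0.1, 0.5])

print(reconstruction_error((1.0, 1.0), w, v))   # fits the learned pattern
print(reconstruction_error((1.0, -1.0), w, v))  # does not fit the pattern
```

Records the network reconstructs poorly do not match the patterns it learned, which is why reconstruction error is commonly used as an anomaly score.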

Smith diagram


Demographic Prediction
LCDR Jeff Good & MAJ Gabe Samudio

USAREC recruiters regularly receive partial demographic information from potential recruits. Recruiters and associated professionals need a rapid method to predict unknown demographic and financial information to enhance decision making by reducing the uncertainty of missing information.

Good diagram


Moneyball for Maintainers
MAJ Chris Arnold & MAJ Dan Hudalla

Analytics have revolutionized many industries concerned with individual and team performance. Naval Air Systems Command (NAVAIR) seeks an analytic approach to improve maintenance by using maintenance data to gain insights about maintainer performance, end item reliability, and maintenance efficiency.

NAVAIR is interested in a “Moneyball for Maintenance” approach that allows leaders to clearly see and maximize the performance metrics of their maintainers for the good of the organization.

Arnold diagram


Analysis of Large Text Data Corpus
Major Joe Moeller, Captain Chris Teska, & Lieutenant Cang Pham

Develop a process to rapidly extract text data in order to classify documents and identify key topics, issues, or questions.

Currently, much time and effort are needed to analyze documents manually for inclusion in senior-level “Decision Packages”. Personnel must organize data into one of three categories: Challenges, Opportunities, or Issues. These categories are then aligned with the current priorities incorporated into AFC Task 5.

Text Summarization

  • Fully connected graph of sentences
  • Based on Cosine Similarity
  • PageRank Algorithm on graph
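
The three steps above can be sketched as follows. This is a minimal illustration, not the team's code: it assumes bag-of-words sentence vectors, whitespace tokenization, a damping factor of 0.85, and made-up sentences.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(c * c for c in a.values())) * math.sqrt(
        sum(c * c for c in b.values())
    )
    return dot / norm if norm else 0.0


def rank_sentences(sentences, damping=0.85, iterations=50):
    """Score sentences with PageRank over a similarity-weighted graph."""
    bags = [Counter(s.lower().split()) for s in sentences]
    n = len(bags)
    # Fully connected graph: edge weight is cosine similarity, no self-loops.
    weights = [
        [cosine(bags[i], bags[j]) if i != j else 0.0 for j in range(n)]
        for i in range(n)
    ]
    out_sum = [sum(row) for row in weights]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        scores = [
            (1 - damping) / n
            + damping
            * sum(
                scores[j] * weights[j][i] / out_sum[j]
                for j in range(n)
                if out_sum[j] > 0
            )
            for i in range(n)
        ]
    return scores


def summarize(sentences, top_k=2):
    """Return the top_k highest-scoring sentences in original order."""
    scores = rank_sentences(sentences)
    best = sorted(range(len(sentences)), key=lambda i: -scores[i])[:top_k]
    return [sentences[i] for i in sorted(best)]


# Hypothetical sentences for illustration.
sentences = [
    "maintenance data improves aircraft readiness",
    "aircraft readiness depends on maintenance data quality",
    "the weather was pleasant today",
    "maintenance crews track aircraft data",
]
scores = rank_sentences(sentences)
summary = summarize(sentences, top_k=2)
print(summary)
```

A sentence that shares no words with the others receives no incoming weight and ends up with the minimum score, so it is left out of the summary.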

Latent Dirichlet Allocation

  • Algorithmically identifies topics
  • Assesses document-topic fit
  • Allows association with multiple topics
  • Keywords associated with each topic
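
For readers unfamiliar with Latent Dirichlet Allocation, the sketch below fits a small model with collapsed Gibbs sampling, one common inference method. The source does not say which implementation or settings the team used, so the function name `lda_gibbs`, the hyperparameters `alpha` and `beta`, the fixed seed, and the toy documents are all assumptions.

```python
import random


def lda_gibbs(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA by collapsed Gibbs sampling; return (theta, keywords)."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    doc_topic = [[0] * num_topics for _ in docs]
    topic_word = [dict.fromkeys(vocab, 0) for _ in range(num_topics)]
    topic_total = [0] * num_topics
    assign = []
    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        zs = [rng.randrange(num_topics) for _ in doc]
        for w, z in zip(doc, zs):
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1
        assign.append(zs)
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                z = assign[d][i]
                # Remove the token, then resample its topic.
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta)
                    / (topic_total[k] + len(vocab) * beta)
                    for k in range(num_topics)
                ]
                z = rng.choices(range(num_topics), weights)[0]
                assign[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1
    # Document-topic fit: each row is a distribution over topics.
    theta = [
        [(c + alpha) / (len(doc) + num_topics * alpha) for c in counts]
        for counts, doc in zip(doc_topic, docs)
    ]
    # Keywords: the most frequent words assigned to each topic.
    keywords = [
        sorted(vocab, key=lambda w: -topic_word[k][w])[:3]
        for k in range(num_topics)
    ]
    return theta, keywords


# Hypothetical tokenized documents.
docs = [
    "budget funding cost budget contract".split(),
    "training soldiers readiness training exercise".split(),
    "contract cost funding budget".split(),
    "exercise readiness soldiers training".split(),
]
theta, keywords = lda_gibbs(docs, num_topics=2)
print(theta)
print(keywords)
```

Each row of `theta` spreads a document across all topics, which is what lets one document be associated with more than one topic.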

Moeller screenshot


  • Easy access to document keywords and summaries
  • Topic Modelling ranks each document by topic assignment
  • Keywords for each topic are identified and easily searched

Moeller screenshot


Special thanks to the following advisors on these projects: Dr. J. Alt, LTC B. Wade, MAJ K. Klingensmith, MAJ A. Smith, CPT Sean Clement (TRAC); Dr. C. Darken, Dr. R. Koyak (GSOIS).


Recent "Data Science" Thesis work:

Classifying vessels operating in the South China Sea by origin with the Automatic Identification System 

Cull, Kimberly M. (Monterey, California: Naval Postgraduate School, 2018-03)
...to identify patterns of movement in busy waterways followed by regression models (Bay, 2017), neural networks or random forests (Young, 2017) to predict vessel trajectory based on AIS dynamic data. Other research focuses on the data science required prior...
Many companies and organizations are becoming increasingly interested in data...

Multicommodity logistical support in an anti-access, area denial environment 

Krenz, Jonathan M. (Monterey, California: Naval Postgraduate School, 2018-03)
As countries around the world develop long range anti-ship ballistic missiles (ASBMs) the current method of replenishing warships at sea may no longer be viable. These long range ASBMs can be used to target combat logistic ...

Innovation increase: how technology can create open, decentralized, and trackable data sharing 

Hupka, Erica (Monterey, California: Naval Postgraduate School, 2018-03)
University research must be widely shared to increase innovation; however, regulated and sensitive information must be secured to prevent theft and malicious misuse. The ideal sharing environment will allow universities ...

Assessing the robustness of graph statistics for network analysis under incomplete information 

Chia, Xian Lin Penelope (Monterey, California: Naval Postgraduate School, 2018-03)
Due to the emergence of powerful global terrorist organizations such as Al Qaeda and ISIS over the last 15 years, social network analysis is increasingly leveraged by the Department of Defense to develop strategies to ...

A hierarchical multivariate Bayesian approach to ensemble model output statistics in atmospheric prediction 

Wendt, Robert D. T. (Monterey, California: Naval Postgraduate School, 2017-09)
...of machine-learning techniques in modern data science (that is, Bayesian parameter estimation with Markov chain Monte Carlo (MCMC) sampling methods) as a compelling new approach to statistical post-processing in atmospheric prediction...

Natural language processing of online propaganda as a means of passively monitoring an adversarial ideology 

Holm, Raven R. (Monterey, California: Naval Postgraduate School, 2017-03)
Online propaganda embodies a potent new form of warfare; one that extends the strategic reach of our adversaries and overwhelms analysts. Foreign organizations have effectively leveraged an online presence to influence ...

Software requirement specifications for a social-media threat assessment tool 

Barnett, Craig T. (Monterey, California: Naval Postgraduate School, 2017-12)
Police officers are often the targets of threats, both verbal and written. Twitter and Facebook allow the communications of these threats quickly, anonymously and in high volume. Law enforcement agencies become overwhelmed ...

Automated creation of labeled pointcloud datasets in support of machine learning-based perception 

Watson, Andrew K. (Monterey, California: Naval Postgraduate School, 2017-12)
Autonomous vehicles continue to struggle with understanding their environments and robotic perception remains an active area of research. Machine learning–based approaches to computer vision, particularly the increasing ...

Modeling and optimizing green microgrids at remote U.S. Navy islands 

Kobold, Kyle D. (Monterey, California: Naval Postgraduate School, 2017-12)
This thesis builds upon existing research involving energy microgrid solutions and applies the findings to isolated U.S. Navy locations, specifically, San Nicolas Island. This includes accurately building power system ...

Cybersecurity education for military officers 

Bardwell, Andrew; Buggy, Sean; Walls, Remuis (Monterey, California: Naval Postgraduate School, 2017-12)
Cyber threats are a growing concern for our military, creating a need for cybersecurity education. Current methods used to educate students about cyber, including annual Navy Knowledge Online training, are perceived to be ...

Promotion factors for enlisted infantry Marines 

Steinpfad, Micah A. (Monterey, California: Naval Postgraduate School, 2017-06)
The Marine Corps dedicates itself to ensuring quality retention and promotion. To accomplish this, we must analyze the effects of policy and the quality of Marines currently serving. This thesis considers data from 97,013 ...

Deep learning for media analysis in defense scenarios--an evaluation of an open-source framework for object detection in intelligence-related image sets 

Paul, Taylor H. (Monterey, California: Naval Postgraduate School, 2017-06)
The Department of Defense struggles to develop and maintain cutting-edge software through the Defense Acquisition System. The pace of improvements in machine learning algorithms and software suggests the organization will ...

Spectral LiDAR analysis and terrain classification in a semi-urban environment 

McIver, Charles A. (Monterey, California: Naval Postgraduate School, 2017-03)
Remote-sensing analysis is conducted for the Naval Postgraduate School campus, containing buildings, impervious surfaces (asphalt and concrete), natural ground, and vegetation. Data is from the Optech Titan, providing ...

Systematic assessment of the impact of user roles on network flow patterns 

Dean, Jeffrey S. (Monterey, California: Naval Postgraduate School, 2017-09)
Defining normal computer user behavior is critical to detecting potentially malicious activity. To facilitate this, some anomaly-detection systems group the profiles of users expected to behave similarly, setting thresholds ...

Artificial intelligence: the bumpy path through defense acquisition 

Ehn, Eric J. (Monterey, California: Naval Postgraduate School, 2017-12)
The use of artificial intelligence systems is ready to transition from basic science research and a blooming commercial industry to strategic implementation in the Defense Acquisition system. The purpose of this research ...

Implementing CompStat principles into critical infrastructure protection and improvement 

Molinari, Mark C. (Monterey, California: Naval Postgraduate School, 2016-12)
Roads and bridges, as aspects of transportation that are at the center of critical infrastructure (CI), are central to evacuation and to emergency response. New York City CI needs an accountability and communication model ...

A responsible de-identification of the Real Data Corpus: building a framework for PII management 

An, Johanna (Monterey, California: Naval Postgraduate School, 2016-09)
Contents excerpt: 2.2 Data Science and Big Data; 2.3 PII, PI, and Identifying Information; 2.4 Confidentiality, Integrity, and Availability; 2.5 Risk...

Navy community of practice for programmers and developers 

McFarlane, Cayanne V. (Monterey, California: Naval Postgraduate School, 2016-12)
The Navy must employ the talented programmers and developers required to build and maintain its software systems. The establishment of a Navy community of practice (CoP) for programmers and developers can significantly ...

Securing healthcare's quantified-self data: a comparative analysis versus personal financial account aggregators based on Porter's Five Forces Framework for competitive forces 

Chiang, Catherine H. (Monterey, California: Naval Postgraduate School, 2016-09)
...of multiple... Melanie Swan, “The Quantified Self: Fundamental Disruption in Big Data Science and Biological Discovery,” Big Data 1, no. 2 (June 1, 2013): 85–99, doi:10.1089/big.2012.0002; Mario Ballano Barcena, Candid Wueest...

The ethical imperative of reason: how anti-intellectualism, denialism, and apathy threaten national security 

Favre, Greggory J. (Monterey, California: Naval Postgraduate School, 2016-03)
This thesis explores the roots and manifestations of anti-intellectualism, denialism and apathy. Philosophical in its design, this research explores the following question: What are the potential effects of cultural ...

Development of a big data application architecture for Navy Manpower, Personnel, Training, and Education 

Caindoy, Khristian C.; Moazzami, Armin; Santos, Anthony M. (Monterey, California: Naval Postgraduate School, 2016-03)
Navy Manpower, Personnel, Training, and Education (MPTE) decision makers require improved access to the information obtained from the vast amounts of data contained in a number of disparate databases/data stores in order ...