Research Summaries

Big Data: A Low Cost Alternative to Data Warehousing

Fiscal Year 2015
Division Research & Sponsored Programs
Department Naval Research Program
Investigator(s) Das, Arijit
Sponsor NPS Naval Research Program (Navy)
Summary Any workplace that has IT (in our case, Navy experimentation/Trident Warrior) generates data, which can be structured (20%) or unstructured (80%). Most structured data are handled well by relational databases (coupled with data warehouse and BI tools), but analysis and storage of unstructured data are usually ignored for lack of cost-effective tools and techniques. This research seeks to answer how to handle this unstructured data, specific to the needs of the Marine Corps sponsor command. The research also addresses how to train command personnel to use these new methods of data analysis effectively without having to go through a full-fledged data science degree program. Two NPS students will leverage this research for their MS thesis degree requirements.
The effective handling of voluminous amounts of Internet data was tackled by Google using GFS (Google File System), which led to an open-source counterpart known as HDFS (Hadoop Distributed File System). The attraction of this architecture is that commodity hardware can be stitched together by open-source software (HDFS), which handles fault tolerance and redundancy. The challenge lies in using Java MapReduce software to write algorithms that process the unstructured data and glean a meaningful subset, which then moves to a NoSQL database where SQL queries can be used to further fine-tune the results. Finally, a much smaller set of the data is loaded into a relational database such as Oracle 11g (or 12c), where BI software tools can do the detailed analysis.
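The map and reduce steps mentioned above can be illustrated with a minimal in-memory sketch in plain Java. It does not use the Hadoop API and is not the project's actual code; the class name MapReduceSketch, its methods, and the sample records are all hypothetical and chosen only for illustration. It shows the general idea: a map step turns each unstructured text record into (word, 1) pairs, a shuffle step groups the pairs by word, and a reduce step sums the counts per word.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, in-memory illustration of the map/shuffle/reduce idea.
// A real Hadoop job would use the Hadoop MapReduce classes and run on HDFS.
final class MapReduceSketch {

    // Map step: split one free-text record into (word, 1) pairs.
    static List<Map.Entry<String, Integer>> map(String record) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String token : record.toLowerCase().split("[^a-z0-9]+")) {
            if (!token.isEmpty()) {
                pairs.add(new AbstractMap.SimpleEntry<>(token, 1));
            }
        }
        return pairs;
    }

    // Reduce step: sum all counts emitted for a single word.
    static int reduce(String key, List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    // Driver: run map on every record, group by key (shuffle), then reduce.
    static Map<String, Integer> run(List<String> records) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String record : records) {
            for (Map.Entry<String, Integer> pair : map(record)) {
                grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>())
                       .add(pair.getValue());
            }
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            result.put(e.getKey(), reduce(e.getKey(), e.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        // Invented sample records standing in for unstructured text data.
        List<String> records = Arrays.asList(
            "Radio link down at node 7",
            "Radio link restored",
            "Node 7 rebooted");
        System.out.println(run(records));
    }
}
```

In the pipeline described above, the reduced output (a much smaller data set than the raw text) is what would move on to the NoSQL and relational stages for querying and BI analysis.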
Keywords
Publications Publications, theses (not shown) and data repositories will be added to the portal record when information is available in FAIRS and brought back to the portal
Data Publications, theses (not shown) and data repositories will be added to the portal record when information is available in FAIRS and brought back to the portal