Research Summaries

Data Science and Analytics for Leveraging METOC Data

Fiscal Year 2015
Division Research & Sponsored Programs
Department Naval Research Program
Investigator(s) Das, Arijit
Sponsor NPS Naval Research Program (Navy)
Summary Study Question #1:
The purpose of this study-question is to determine how to represent and organize METOC data and metadata within NTC, i.e., to perform the Data Science work required to support METOC data. Specifically, that means determining how to organize and represent METOC data and metadata within Accumulo. There are four elements of this study-question: (1) Select a grammar that will be used to represent METOC data and metadata within NTC. (2) Using that grammar, define the metadata that will be created for the various METOC data sets. (3) Define the visibility tags for all the METOC data and metadata. (4) Define and implement the Domain Model Entities that are needed to support analytic development. A number of data frameworks could be used to implement the METOC Data Science approach, including NSA's GEM, Army's UCD, and SAVA's modified NuWave. Since the Navy is exploring which data framework to standardize around, we will not require the use of a specific data framework at this time. The NPS/FNMOC team will work with ONR and NTC stakeholders to reach an agreement on which data framework to use before proceeding on this study-question.
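As a rough illustration of what organizing METOC data, metadata, and visibility tags in an Accumulo-style layout might involve, the sketch below flattens one observation into key/value entries (row, column family, column qualifier, visibility, timestamp, value). Everything in it is an assumption made for illustration: the `Entry` class, the `data` and `meta` column families, the field names, and the labels `METOC` and `UNCLASS` are hypothetical and do not come from GEM, UCD, NuWave, or any agreed NTC schema. The `can_read` check is deliberately simplified to labels joined by `&`; real Accumulo visibility expressions also allow `|` and parentheses.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class Entry:
    """One key/value pair laid out like an Accumulo entry (hypothetical schema)."""
    row: str
    family: str
    qualifier: str
    visibility: str
    timestamp: int
    value: str


def observation_to_entries(obs: dict, visibility: str) -> List[Entry]:
    """Flatten one observation into entries.

    Measurements go in a 'data' column family and descriptive fields in a
    'meta' column family; both names are assumptions, not an agreed design.
    """
    row = f"{obs['station']}_{obs['time']}"
    entries = []
    for name, val in sorted(obs["measurements"].items()):
        entries.append(Entry(row, "data", name, visibility, obs["time"], str(val)))
    for name, val in sorted(obs["metadata"].items()):
        entries.append(Entry(row, "meta", name, visibility, obs["time"], str(val)))
    return entries


def can_read(visibility: str, authorizations: FrozenSet[str]) -> bool:
    """Simplified visibility check: handles only '&'-joined labels.

    An empty visibility string is treated as readable by everyone.
    """
    if not visibility:
        return True
    return all(label in authorizations for label in visibility.split("&"))


if __name__ == "__main__":
    obs = {
        "station": "STATION01",          # hypothetical station identifier
        "time": 1420070400,              # epoch seconds
        "measurements": {"sst_c": 14.2, "wind_kt": 12},
        "metadata": {"source": "buoy", "units_sst": "degC"},
    }
    for entry in observation_to_entries(obs, "METOC&UNCLASS"):
        print(entry)
```

Keeping metadata in its own column family alongside the data is only one possible layout; the grammar and framework chosen with the stakeholders would determine the actual structure.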
Study Question #2:
The purpose of this study-question is to develop and execute the ingest scripts required to ingest METOC data sets into NTC. The result will be a set of fully ingested METOC data sets that are ready to support the analytic development in study-question #3. METOC data ingest can be performed using either MapReduce jobs or STORM bolts. The decision as to which to use is up to the data ingesters. Typically, MapReduce will be used for batch processing of historical data sets and STORM will be used for stream processing of real-time data feeds. In the end state, we will want to continually populate NTC with the latest incoming METOC data. Given the scope and resources of this effort, continual ingest may prove to be too difficult or costly. In cases where continual ingest of the selected METOC data set is not feasible, it is acceptable for this effort to ingest the METOC data over a finite time span. However, it is important that the NPS/FNMOC team carefully select a time span of sufficient duration to provide adequate amounts of data to support the analytical development objectives.
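The finite-time-span fallback described above can be sketched in a few lines. This is a minimal illustration, not an ingest script: the record shape (a timestamp paired with a payload string), the function names, and the `min_records` threshold are all hypothetical, and what counts as "adequate" data would be set by the analytic objectives rather than by a simple count.

```python
from datetime import datetime, timezone
from typing import Iterable, Iterator, Tuple

# Hypothetical record shape: (observation time, raw payload)
Record = Tuple[datetime, str]


def within_span(records: Iterable[Record],
                start: datetime,
                end: datetime) -> Iterator[Record]:
    """Yield only the records with start <= timestamp < end."""
    for ts, payload in records:
        if start <= ts < end:
            yield ts, payload


def span_is_adequate(records: Iterable[Record],
                     start: datetime,
                     end: datetime,
                     min_records: int) -> bool:
    """Return True if the span holds at least min_records records.

    A plain record count is an assumed stand-in for 'adequate amounts of data'.
    """
    count = sum(1 for _ in within_span(records, start, end))
    return count >= min_records


if __name__ == "__main__":
    def day(d: int) -> datetime:
        return datetime(2015, 1, d, tzinfo=timezone.utc)

    sample = [(day(1), "a"), (day(5), "b"), (day(10), "c"), (day(20), "d")]
    print(list(within_span(sample, day(5), day(20))))
    print(span_is_adequate(sample, day(1), day(21), min_records=4))
```

The same filter could sit in front of either a batch job or a stream consumer, since it only depends on each record's timestamp.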
Study Question #3:
The purpose of this study-question is to use the METOC data sets ingested in #2 to develop a number of innovative and value-added analytics. Since the target analytic capabilities will not be fully defined until the end of #1, we cannot speak to the requirements for individual analytics in this study-question description. However, we can call out two important elements of this study-question. First, while the analytic capabilities to be developed should make good use of METOC data, the analytics need not be restricted to using only METOC data. In fact, there are many compelling analytic capabilities that will bring METOC data together with other types of data in ways that have not been possible before. A good example would be to combine real-time METOC data with the planned sensor use to provide much more accurate projections of how the atmosphere and/or ocean will impact sensor performance. Second, in some cases it will be valuable to combine an analytic capability with a user-facing component (e.g., widget or App) that enables an operator to interact with the analytic. For analytics that are intended to be used by people, this study-question would include the development of the widgets and/or Apps to work with the analytic, provided they are relatively easy and inexpensive to develop. We want to ensure that the bulk of the resources expended on this study-question go to analytic development, not widget and/or App development.
Keywords
Publications Publications, theses (not shown) and data repositories will be added to the portal record when information is available in FAIRS and brought back to the portal