Research Summaries

Undiscovered Secrets: Leveraging Lexical Link Analysis (LLA) to Discover New Knowledge Using Open Social Media Data Sources - Phase II

Fiscal Year 2016
Division Graduate School of Operational & Information Sciences
Department Information Sciences
Investigator(s) MacKinnon, Douglas J.
Sponsor Defense Intelligence Agency (DoD)
Summary Data sources for intelligence analysis for situation awareness include disparate real-time sensor and archival sources with multiple dimensions and very high rates and volumes. Measurement and Signature Intelligence (MASINT) is scientific and technical intelligence obtained from specific technical sensors. The data sources can be structured data in traditional forms (e.g., sensor measurements stored in relational databases, Excel, or XML files). They can also be unstructured data, including free text and Word, PDF, or PowerPoint documents. Cross-examining all of the data to maintain situation awareness in real time requires automated tools. There is a critical need to discover new sources of information in the public domain, e.g., on various social media platforms, and then link them with intelligence collected for MASINT/HUMINT applications. Lexical Link Analysis (LLA, see details in Appendix A) is a form of text mining in which word meanings represented in lexical terms (e.g., word pairs) are treated as if they are in a community of a word network. In military operations, where the term situational awareness was coined, awareness is defined as the cognitive interface between decision makers and a complex system, expressed in a range of terms or features, or a specific vocabulary or lexicon, that describe the attributes and surrounding environment of the system. LLA can provide innovative, automated awareness for analyzing text data and reveal previously unknown, data-driven thematic connections.
In Phase I of this project, we demonstrated LLA applied to publicly available social media data that might be relevant to MASINT applications. The goal was to determine who is working on various technologies and what they are working on, which would indicate the direction of the research and the organizations (state, private, and academic) that prepared the individual for the work and/or are supporting the work. We describe these as persona archetypes and their deviations for various social actors, e.g., organizations and individuals. Specifically, we studied how LLA can be applied to address issues such as who may be an authentic archetype or an imposter of specific personas of interest. For Phase II of the project, we propose to set up a test bed and use broader data collection methods and domains, including more public and private data sources, to address intelligence applications.
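The summary describes LLA as treating word pairs as members of a community in a word network. The following is a minimal sketch of that general idea, not the project's LLA implementation: it assumes word pairs are simply adjacent words, links two words when their pair recurs across documents, and uses connected components as a crude stand-in for communities. The function names (`word_pairs`, `build_network`, `themes`) and the `min_count` threshold are hypothetical, chosen only for illustration.

```python
from collections import Counter, defaultdict
import re


def word_pairs(text):
    """Return adjacent lowercase word pairs (bigrams) from one document."""
    words = re.findall(r"[a-z]+", text.lower())
    return list(zip(words, words[1:]))


def build_network(documents, min_count=2):
    """Link two words when their pair occurs at least min_count times overall.

    Assumption: raw pair frequency is the only link criterion; no stop-word
    filtering or weighting is applied in this sketch.
    """
    counts = Counter(pair for doc in documents for pair in word_pairs(doc))
    graph = defaultdict(set)
    for (a, b), n in counts.items():
        if n >= min_count and a != b:
            graph[a].add(b)
            graph[b].add(a)
    return graph


def themes(graph):
    """Group linked words into connected components (a stand-in for communities)."""
    seen, groups = set(), []
    for start in list(graph):
        if start in seen:
            continue
        stack, group = [start], set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            group.add(node)
            stack.extend(graph[node] - seen)
        groups.append(group)
    return groups
```

Calling `themes(build_network(docs))` on a list of short text strings returns sets of words that are tied together by recurring pairs. A real analysis would likely need stop-word handling and a proper community detection method, both of which are outside this sketch.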
Keywords Big Data; Cyber Security; Deep Learning; cyber situation awareness; logistics
Publications Publications, theses (not shown) and data repositories will be added to the portal record when information is available in FAIRS and brought back to the portal