
Using Text Categorization to Distinguish Predators from Victims

1/31/2008: 15:00—15:50 at Glasgow East 117

Nick Pendar, Applied Linguistics & Technology/Human Computer Interaction, Iowa State University, will speak in this CS Department Seminar.

The Internet has forever changed how almost everyone lives, works, and plays, and that includes criminals. Of particular concern to this work are on-line pedophiles who meet underage victims on-line, engage in sexually explicit chat with them, and eventually convince the children to meet them in person. There is now a great need for software applications that can flag suspicious on-line chats automatically.

This talk presents the results of a pilot study on using automatic text categorization techniques to identify on-line sexual predators. We acquired a two-million-word sample of on-line text chats between predators and pseudo-victims (volunteers posing as underage victims), and used standard text categorization techniques to identify the two sides. The question was: given that both sides of a conversation are on the same topic, is it possible to distinguish them automatically? A favorable result would mean that a conversation can potentially be flagged as suspicious if it appears to include a likely child and a likely sexual predator.

We report on our SVM and k-NN models. Our distance-weighted k-NN classifier reaches an F-measure of 0.943 on test data when distinguishing the predator and the victim sides of text chats between sexual predators and pseudo-victims. The less aggressive predators were more likely to be mistaken for victims. Given these promising results, the next step is to experiment with negative data to see whether it is possible to identify suspicious text chats from among those that do not involve sexual predators.
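For readers unfamiliar with the terms above, the following is a minimal sketch of a distance-weighted k-NN text classifier and an F-measure computation. It is not the speaker's implementation: the announcement does not state the features, distance metric, weighting scheme, or value of k used in the study, so the whitespace tokenizer, cosine distance, 1/d vote weights, and k=3 below are all assumptions. The training sentences and the "agent"/"customer" labels are hypothetical stand-in data for a two-role chat and have no connection to the study's corpus.

```python
import math
from collections import Counter, defaultdict

# Hypothetical stand-in data: two speaker roles in a support chat.
TRAIN = [
    ("please restart the router and try again", "agent"),
    ("i can reset your password for you", "agent"),
    ("let me check your account settings", "agent"),
    ("my internet is not working", "customer"),
    ("i forgot my password", "customer"),
    ("the app keeps crashing on my phone", "customer"),
]


def tf_vector(text):
    """Map lowercased whitespace tokens to raw term counts (assumed features)."""
    return Counter(text.lower().split())


def cosine_distance(a, b):
    """Return 1 - cosine similarity between two sparse count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def knn_predict(train, text, k=3, eps=1e-9):
    """Distance-weighted k-NN: each of the k nearest neighbours
    votes for its label with weight 1 / (distance + eps)."""
    query = tf_vector(text)
    neighbours = sorted(
        (cosine_distance(query, tf_vector(doc)), label) for doc, label in train
    )[:k]
    votes = defaultdict(float)
    for distance, label in neighbours:
        votes[label] += 1.0 / (distance + eps)
    return max(votes, key=votes.get)


def f_measure(gold, predicted, positive):
    """Balanced F-measure (harmonic mean of precision and recall)
    for the class named by `positive`."""
    pairs = list(zip(gold, predicted))
    tp = sum(g == positive and p == positive for g, p in pairs)
    fp = sum(g != positive and p == positive for g, p in pairs)
    fn = sum(g == positive and p != positive for g, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    print(knn_predict(TRAIN, "i can check the router settings"))
    print(knn_predict(TRAIN, "my phone is not working"))
```

Weighting votes by inverse distance lets a very close neighbour outweigh several distant ones, which is the usual motivation for the distance-weighted variant over a plain majority vote.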

Speaker Bio: Nick Pendar received his Ph.D. from the University of Toronto in 2005 and joined the Applied Linguistics & Technology and Human Computer Interaction programs at Iowa State University as an Assistant Professor in the same year. His specialty is computational linguistics and natural language processing. His research interests involve combining statistical natural language processing with symbolic computational linguistic techniques in applications such as text categorization, automated essay scoring, and computer-assisted language learning.

Please contact Prof. Squire with questions.
