Toan Ong, Michael Mannino, & Dawn Gregg
Electronic Commerce Research and Applications, Vol. 13, Issue 2, March-April 2014, pp. 69-78
This exploratory study investigates the linguistic characteristics of shill reviews and develops a tool for extracting product features from the text of product reviews. Shill reviews are increasingly used to manipulate the reputation of products sold on websites. To overcome limitations in identifying shill reviews, we collected shill reviews as primary data from students posing as shills. Using semi-automated natural language processing techniques, we compared shill reviews and normal reviews on informativeness, subjectivity, and readability. The results showed evidence of substantial differences between shill reviews and normal reviews in both subjectivity and readability. Informativeness appears to be a mixed separator of shill and normal reviews, so additional studies may be necessary. Overall, the study provides improved understanding of shill reviews and demonstrates a method to extract and classify features from product reviews, with the eventual goal of increasing the effectiveness of review filtering methods.
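The readability comparison in this abstract can be illustrated with a standard index. The abstract does not name the specific metric used, so the Flesch Reading Ease formula and the vowel-group syllable heuristic below are assumptions, a minimal sketch rather than the authors' method:

```python
import re

def count_syllables(word: str) -> int:
    """Rough syllable count: contiguous vowel groups, minus a silent final 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and count > 1:
        count -= 1
    return max(count, 1)

def flesch_reading_ease(text: str) -> float:
    """Flesch Reading Ease: higher scores indicate easier-to-read text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        return 0.0
    syllables = sum(count_syllables(w) for w in words)
    return (206.835
            - 1.015 * (len(words) / len(sentences))
            - 84.6 * (syllables / len(words)))
```

Scores from such an index, computed per review, could then be compared between the shill and normal groups.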
Yanjuan Yang and Michael V. Mannino
Decision Support Systems, Vol. 53, Issue 3, June 2012, pp. 543-553
Developing a data mining approach for a deception application can incur prohibitive data collection costs because both deceptive and truthful data must be collected. Artificially generated deception data can reduce these costs, but the impact of using such data is not well understood. To study the relationship between artificial and real deception, this paper presents an experimental comparison using a novel deception generation model. The deception and truth data were collected from financial aid applications, a document-centric area with limited resources for verification. The data collection provided a unique data set containing truth, natural deception, and boosted deception. To simulate deception, the Application Deception Model was developed to generate artificial deception under different deception scenarios. To study differences between artificial and real deception, an experiment was performed using deception level and data generation method as factors and directed distance and outlier score as outcome variables. Our results provided evidence of reasonable similarity between artificial and real deception, suggesting the possibility of using artificially generated deception to reduce the costs of obtaining training data.
Michael V Mannino, Elizabeth S Cooperman
Journal of Pension Economics & Finance, Vol. 10, Issue 3, pp. 457-483
This study uses a unique data set of retiree characteristics and salary histories for administrators, teachers, and non-professional employees of the Denver Public School Retirement System (DPSRS) to analyze surplus deferred compensation for DPSRS and four state K-12 defined benefit pension plans. We find sizable levels of surplus deferred compensation for each plan, with significant differences across plans, job classes, and age groups. Across plans, differences in cost of living allowances impact the expected present …
Hyo-Jeong Kim, Michael Mannino, and Robert J. Nieschwietz
International Journal of Accounting Information Systems, Vol. 10, Issue 4, pp. 214-228
Although various information technologies have been studied using the technology acceptance model (TAM), acceptance of specific technology features by professional groups such as internal auditors (IA) has received limited study. To address this gap, we extended the TAM for technology acceptance among IA professionals and tested the model using a sample of internal auditors provided by the Institute of Internal Auditors (IIA). System usage, perceived usefulness, and perceived ease of use were tested against technology features and complexity. Comparing TAM variables, we found that internal auditors accepted technology features in different ways: basic features such as database queries, ratio analysis, and audit sampling were more accepted, while advanced features such as digital analysis, regression/ANOVA, and classification were less accepted. As feature complexity increased, perceived ease of use decreased, which in turn reduced system usage. Path analysis of the TAM variables indicated that path magnitudes changed significantly with technology features and complexity: perceived usefulness had more influence on feature acceptance when basic features were used, and perceived ease of use had more impact when advanced features were used.
Michael Mannino, Yanjuan Yang, and Young Ryu
Decision Support Systems, Vol. 46, Issue 3, pp. 743-751
We present an empirical comparison of classification algorithms when training data contains attribute noise levels not representative of field data. To study algorithm sensitivity, we develop an innovative experimental design using noise situation, algorithm, noise level, and training set size as factors. Our results contradict conventional wisdom, indicating that investments to achieve representative noise levels may not be worthwhile. In general, over-representative training noise should be avoided, while under-representative training noise is less of a concern. However, interactions among algorithm, noise level, and training set size indicate that these general results may not apply to particular practice situations.
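The noise-level factor in this experimental design can be illustrated with a simple attribute-noise injector. The function name and data layout below are hypothetical, a minimal sketch of replacing attribute values at a given rate while leaving class labels intact:

```python
import random

def inject_attribute_noise(rows, noise_level, attr_values, seed=0):
    """Corrupt each attribute value with probability `noise_level` by
    swapping in a different legal value for that attribute; class
    labels are never touched (attribute noise only).

    rows        -- list of (feature_list, class_label) pairs
    attr_values -- per-attribute lists of legal values
    """
    rng = random.Random(seed)
    noisy = []
    for features, label in rows:
        corrupted = []
        for i, value in enumerate(features):
            if rng.random() < noise_level:
                alternatives = [v for v in attr_values[i] if v != value]
                corrupted.append(rng.choice(alternatives) if alternatives else value)
            else:
                corrupted.append(value)
        noisy.append((corrupted, label))
    return noisy
```

Training a classifier on data corrupted at one noise level and testing it on data corrupted at another reproduces the "non-representative noise" situation the study examines.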
C. Nguyen, M. Mannino, K. Gardner, and K. Cios
Journal of Bioinformatics and Computational Biology, Vol. 6, Issue 1, pp. 203-222
We introduce a new hybrid algorithm, ClusFCM, which combines clustering and fuzzy cognitive maps for prediction of protein function. ClusFCM takes advantage of protein homologies and protein interaction networks to improve the low-recall predictions associated with existing prediction methods. ClusFCM exploits the fact that proteins of known function tend to cluster together, and deduces functions not only through direct interactions with proteins of known function but also from other proteins in the network. We use ClusFCM to annotate protein functions for Saccharomyces cerevisiae (yeast), Caenorhabditis elegans (worm), and Drosophila melanogaster (fly) using protein-protein interaction data from the General Repository for Interaction Datasets (GRID) database and functional labels from Gene Ontology (GO) terms. The algorithm's performance is compared with four state-of-the-art methods for function prediction (Majority, χ² statistics, Markov random field, and Functional Flow) using measures of Matthews correlation coefficient, harmonic mean, and receiver operating characteristic (ROC) curves. The results indicate that ClusFCM predicts protein functions with high recall while not lowering precision.
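Two of the evaluation measures named in this abstract, the Matthews correlation coefficient and the harmonic mean of precision and recall (F1), can be computed directly from binary confusion counts; a minimal sketch with illustrative function names:

```python
import math

def matthews_corrcoef(tp, fp, tn, fn):
    """Matthews correlation coefficient from binary confusion counts.

    Ranges from -1 (total disagreement) through 0 (chance) to +1
    (perfect prediction); defined as 0 when any marginal is empty.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom

def f1_from_counts(tp, fp, fn):
    """Harmonic mean of precision and recall (F1) from confusion counts."""
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0
```

Both measures reward high recall only when precision is not sacrificed, which is why they suit the comparison described above.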
Injun Choi, Jisoo Jung, Michael Mannino, and Chulsoon Park
Data & Knowledge Engineering, Vol. 66, Issue 2, pp. 243-263
BPTrigger is a process-oriented trigger model that provides economy of specification and efficient execution for complex business constraints. An essential part of trigger execution is the detection and resolution of cycles. This paper presents an approach to determine the terminability of a cycle introduced by a BPTrigger in a business process and to decide whether the cycle is allowable in terms of compensability. The foundation of the approach is a set of conditions for cycle termination derived from classifying business processes by resource usage and activity types by compensation status. The paper formally presents cycle analysis procedures using the notion of a cycle analysis graph. Further, a procedure is proposed that checks the terminability of multiple cycles using a composite cycle analysis graph constructed from the cycle analysis graphs of the associated cycles. The paper proves the correctness of the analysis and presents a validation example. The results address limitations of the well-formed sphere concept, which has been used to ensure atomicity of workflow transactions.
Michael Mannino, Sa Neung Hong, and In Jun Choi
Decision Support Systems, Vol. 44, Issue 4, pp. 883-898
We evaluate an efficiency model for data warehouse operations using data from USA-based and non-USA-based (mostly Korean) organizations. The analysis indicates wide dispersion in operational efficiency, industry and region differences, large differences in labor budgets between efficient and inefficient firms, few organizations efficient in both refresh processing and query production, and difficulty in providing some variables. Follow-up interviews provide insights about the value of efficiency comparisons of information technology organizations and suggestions to improve the model. Using this analysis, we propose a framework containing data warehouse characteristics and firm characteristics to explain IT operational efficiency at the subfirm level.
Michael V. Mannino and Zhiping Walter
Decision Support Systems, Vol. 42, Issue 1, pp. 121-143
In a field study to explore influences on data warehouse refresh policies, we interviewed data warehouse administrators from 13 organizations about data warehouse details and organizational background. The dominant refresh strategy reported was daily refresh during nonbusiness hours, with some deviations due to operational decision making and data source availability. As a result of the study, we developed a framework consisting of short-term and long-term influences on refresh policies, along with traditional information system success variables influenced by refresh policies. The framework suggests the need for research on process design, data timeliness valuation, and optimal refresh policy design.