Tag Archives: Tang

Mann–Whitney test with adjustments to pretreatment variables for missing values and observational study

Song Xi Chen, Jing Qin, and Cheng Yong Tang
Journal of the Royal Statistical Society: Series B, Jan. 2013, Vol. 75, Issue 1, pp. 81-102

The conventional Wilcoxon or Mann–Whitney test can be invalid for comparing treatment effects in the presence of missing values or in observational studies. This is because the missingness of the outcomes or the participation in the treatments may depend on certain pretreatment variables. We propose an approach to adjust the Mann–Whitney test by correcting the potential bias via consistently estimating the conditional distributions of the outcomes given the pretreatment variables. We also propose semiparametric extensions of the adjusted Mann–Whitney test which lead to dimension reduction for high-dimensional covariates. A novel bootstrap procedure is devised to approximate the null distribution of the test statistics for practical implementations. Results from simulation studies and from an analysis of data from an observational study in economics are presented to demonstrate the performance of the proposed approach.
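To make the bias issue concrete, here is a minimal sketch of the Mann–Whitney statistic as an estimate of P(X < Y), with optional per-observation weights. The weighting shown is plain inverse-probability weighting, swapped in for illustration only; it is not the conditional-distribution adjustment or the bootstrap described in the abstract. The function name `mann_whitney` and the weight arguments `wx` and `wy` are hypothetical, and the observation probabilities behind the weights are assumed known.

```python
def mann_whitney(x, y, wx=None, wy=None):
    """Weighted estimate of P(X < Y) + 0.5 * P(X = Y).

    x, y   : observed outcomes in the two groups.
    wx, wy : optional positive weights (assumption: 1 / probability that the
             outcome was observed). With no weights this reduces to the usual
             Mann-Whitney statistic divided by len(x) * len(y).
    """
    wx = [1.0] * len(x) if wx is None else wx
    wy = [1.0] * len(y) if wy is None else wy
    num = 0.0
    den = 0.0
    for xi, wi in zip(x, wx):
        for yj, vj in zip(y, wy):
            if xi < yj:
                score = 1.0
            elif xi == yj:
                score = 0.5  # ties count as half
            else:
                score = 0.0
            num += wi * vj * score
            den += wi * vj
    return num / den
```

Calling `mann_whitney(x, y)` gives the unadjusted statistic. Passing weights up-weights observations that were unlikely to be observed, which is one simple way to see why ignoring covariate-dependent missingness can distort the comparison.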

Sparse Matrix Graphical Models

Chenlei Leng and Cheng Yong Tang
Journal of the American Statistical Association, 107, 1187-1200.

Matrix-variate observations are frequently encountered in many contemporary statistical problems due to a rising need to organize and analyze data with structured information. In this article, we propose a novel sparse matrix graphical model for these types of statistical problems. By penalizing, respectively, two precision matrices corresponding to the rows and columns, our method yields a sparse matrix graphical model that synthetically characterizes the underlying conditional independence structure. Our model is more parsimonious and is practically more interpretable than the conventional sparse vector-variate graphical models. Asymptotic analysis shows that our penalized likelihood estimates enjoy better convergence rates than those of the vector-variate graphical model. The finite sample performance of the proposed method is illustrated via extensive simulation studies and analyses of several real datasets.
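The parsimony claim can be illustrated with a rough parameter count. The sketch below assumes a p x q matrix observation whose dependence is described by one p x p row precision matrix and one q x q column precision matrix, and compares that with a single unstructured pq x pq precision matrix for the vectorized data. The function names are hypothetical, and the count ignores sparsity and the scale constraint needed to make the two-matrix factorization identifiable.

```python
def free_params_matrix(p, q):
    """Free entries in a symmetric p x p plus a symmetric q x q precision matrix."""
    return p * (p + 1) // 2 + q * (q + 1) // 2


def free_params_vector(p, q):
    """Free entries in one symmetric (p*q) x (p*q) precision matrix."""
    d = p * q
    return d * (d + 1) // 2


# Hypothetical dimensions, chosen only for illustration.
p, q = 10, 20
matrix_count = free_params_matrix(p, q)
vector_count = free_params_vector(p, q)
print(matrix_count, vector_count)
```

For these illustrative dimensions the row-and-column parameterization has a few hundred free entries, while the vectorized one has about twenty thousand, before any sparsity penalty is applied.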

An efficient empirical likelihood approach for estimating equations with missing data

Cheng Yong Tang and Yongsong Qin
Biometrika (2012), pp. 1–7

We explore the use of estimating equations for efficient statistical inference in the case of missing data. We propose a semiparametric efficient empirical likelihood approach, and show that the empirical likelihood ratio statistic and its profile counterpart asymptotically follow central chi-square distributions when evaluated at the true parameter. The theoretical properties and practical performance of our approach are demonstrated through numerical simulations and data analysis.
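For readers unfamiliar with empirical likelihood, the following is a minimal sketch of the ratio statistic in the simplest setting: a scalar mean with complete data and the estimating function g(x, mu) = x - mu. It does not implement the missing-data approach of the paper. The function name `el_ratio_mean` and the bisection solver are illustrative choices.

```python
import math


def el_ratio_mean(x, mu, tol=1e-12, max_iter=200):
    """Return -2 log empirical likelihood ratio for the hypothesis E[X] = mu."""
    z = [xi - mu for xi in x]
    if min(z) >= 0 or max(z) <= 0:
        raise ValueError("mu must lie strictly inside the range of the data")

    # The Lagrange multiplier must keep every 1 + lam * z_i positive.
    lo = -1.0 / max(z)
    hi = -1.0 / min(z)
    eps = 1e-10 * (hi - lo)
    lo += eps
    hi -= eps

    def score(lam):
        # Strictly decreasing in lam; its root is the multiplier we need.
        return sum(zi / (1.0 + lam * zi) for zi in z)

    lam = 0.5 * (lo + hi)
    for _ in range(max_iter):
        val = score(lam)
        if abs(val) < tol:
            break
        if val > 0:
            lo = lam  # root lies to the right
        else:
            hi = lam  # root lies to the left
        lam = 0.5 * (lo + hi)

    return 2.0 * sum(math.log(1.0 + lam * zi) for zi in z)
```

In this simple setting the statistic is compared with a chi-square distribution with one degree of freedom, for example rejecting at the 5% level when it exceeds about 3.84.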

Tuning Parameter Selection in High‐Dimensional Penalized Likelihood

Yingying Fan and Cheng Yong Tang
Journal of the Royal Statistical Society, In Press

Determining how to appropriately select the tuning parameter is essential in penalized likelihood methods for high-dimensional data analysis. We examine this problem in the setting of penalized likelihood methods for generalized linear models, where the dimensionality of covariates p is allowed to increase exponentially with the sample size n. We propose to select the tuning parameter by optimizing the generalized information criterion (GIC) with an appropriate model complexity penalty. To ensure that we consistently identify the true model, a range for the model complexity penalty is identified in GIC. We find that this model complexity penalty should diverge at the rate of some power of log p depending on the tail probability behavior of the response variables. This reveals that using the AIC or BIC to select the tuning parameter may not be adequate for consistently identifying the true model. Based on our theoretical study, we propose a uniform choice of the model complexity penalty and show that the proposed approach consistently identifies the true model among candidate models with asymptotic probability one. We justify the performance of the proposed procedure by numerical simulations and a gene expression data analysis.
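The selection rule described above can be sketched as follows, assuming the criterion takes the form (deviance + a_n * model size) / n, where a_n is the model complexity penalty. The candidate fits below are made-up numbers, not results from the paper, and the third penalty, log(log n) * log p, is used here only as one example of a penalty that grows with log p. The names `gic` and `select_by_gic` are hypothetical.

```python
import math


def gic(deviance, model_size, n, a_n):
    """Generalized information criterion (assumed form) for one fitted model."""
    return (deviance + a_n * model_size) / n


def select_by_gic(candidates, n, a_n):
    """Return the tuning parameter whose fitted model minimizes the criterion.

    candidates: list of (tuning_parameter, deviance, model_size) tuples.
    """
    best = min(candidates, key=lambda c: gic(c[1], c[2], n, a_n))
    return best[0]


# Hypothetical setting: n observations, p covariates, three fits on a path.
n, p = 100, 1000
candidates = [(0.5, 150.0, 3), (0.2, 120.0, 8), (0.05, 100.0, 30)]

penalties = {
    "AIC": 2.0,
    "BIC": math.log(n),
    "log(log n) * log p": math.log(math.log(n)) * math.log(p),
}
for name, a_n in penalties.items():
    print(name, select_by_gic(candidates, n, a_n))
```

With these illustrative numbers, the heavier penalty that depends on log p selects a sparser model than the AIC-type and BIC-type penalties do, which mirrors the point that fixed or log n penalties may be too light when p is much larger than n.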