We propose WSAUC, a unified and robust framework for weakly supervised AUC optimization. The framework covers multiple scenarios, including AUC optimization with noisy labels, positive-unlabeled AUC optimization, multi-instance AUC optimization, and semi-supervised AUC optimization with or without noise. It achieves robust AUC optimization through a novel variant of AUC, namely rpAUC. Theoretical and empirical results validate the effectiveness of the framework.
ICDM
Beyond Lexical Consistency: Preserving Semantic Consistency for Program Translation
Yali Du, Yi-Fan Ma,
Zheng Xie, and Ming Li
In The 23rd IEEE International Conference on Data Mining, 2023.
Program translation aims to convert input programs from one programming language to another. Automatic program translation is a prized target of software engineering research, as it leverages the reusability of projects and improves the efficiency of development. Recently, thanks to the rapid development of deep learning model architectures and the availability of large-scale parallel corpora of programs, the performance of program translation has greatly improved. However, existing program translation models are still far from satisfactory in terms of the quality of the translated programs. In this paper, we argue that a major limitation of current approaches is the lack of consideration of semantic consistency: beyond lexical consistency, semantic consistency is also critical for the task. To make program translation models more semantically aware, we propose a general framework named Preserving Semantic Consistency for Program Translation (PSCPT), which enforces semantic consistency via a regularization term in the training objective and can be easily applied to all encoder-decoder methods with various neural networks (e.g., LSTM, Transformer) as the backbone. We conduct extensive experiments on 7 general-purpose programming languages. Experimental results show that with CodeBERT as the backbone, our approach outperforms not only state-of-the-art open-source models but also commercial closed large language models (e.g., text-davinci-002, text-davinci-003) on the program translation task. Our replication package (including code, data, etc.) is publicly available at https://github.com/duyali2000/PSCPT .
AAAI
Cooperative and Adversarial Learning: Co-Enhancing Discriminability and Transferability in Domain Adaptation
Hui Sun,
Zheng Xie, Xin-Ye Li, and Ming Li
In The 37th AAAI Conference on Artificial Intelligence, 2023.
We propose the CALE framework to unify and enhance the two main objectives of domain adaptation: discriminability and transferability. To achieve this, CALE swaps the cooperative examples of the two objectives, enabling the learning of discriminability and transferability to mutually benefit each other. Additionally, adversarial examples are utilized to enhance the robustness of the two objectives themselves. The framework can be applied to improve current domain adaptation approaches and has been shown to outperform existing state-of-the-art methods.
AAAI
Semi-Supervised Learning with Support Isolation by Small-Paced Self-Training
Zheng Xie, Hui Sun, and Ming Li
In The 37th AAAI Conference on Artificial Intelligence, 2023.
In this paper, we address a special scenario of semi-supervised learning in which labels are missing because of a preceding filtering mechanism, i.e., an instance can enter a subsequent process in which its label is revealed if and only if it passes the filtering mechanism. The rejected instances are prohibited from entering the subsequent labeling process for economic or ethical reasons, making the supports of the labeled and unlabeled distributions isolated from each other. In this case, classical semi-supervised learning approaches are prone to fail. We propose a Small-Paced Self-Training framework, which iteratively discovers labeled and unlabeled instance subspaces with bounded Wasserstein distance. We theoretically prove that such a framework can achieve provably low error on the pseudo labels during learning, and we validate the approach through experiments.
2018
IJCAI
Cutting the Software Building Efforts in Continuous Integration by Semi-Supervised Online AUC Optimization
Zheng Xie, and Ming Li
In The 27th International Joint Conference on Artificial Intelligence, 2018.
In this paper, we propose a semi-supervised online AUC optimization algorithm, namely SOLA. The algorithm is suitable for tasks that involve streaming data, label scarcity, and class imbalance. We apply it to build outcome prediction in software continuous integration, where it achieves superior performance.
AAAI
Semi-Supervised AUC Optimization without Guessing Labels of Unlabeled Data
Zheng Xie, and Ming Li
In The 32nd AAAI Conference on Artificial Intelligence, 2018.
We prove theoretical properties of AUC optimization under semi-supervised and positive-unlabeled learning scenarios, and propose a simple yet effective algorithm for semi-supervised and positive-unlabeled AUC optimization. Our algorithm outperforms more elaborate approaches on both tasks.
2017
JOS
Cost-Sensitive Margin Distribution Optimization for Software Bug Localization
The software bug localization problem suffers from data imbalance and the heterogeneous structure of code and natural language. To tackle this problem, we propose a cost-sensitive margin distribution optimization method to enhance classification under imbalanced scenarios, and design a network architecture for processing programming and natural languages. Experimental results validate the effectiveness of our method.
CCML
Cost-Sensitive Margin Distribution Optimization for Software Bug Localization
The software bug localization problem suffers from data imbalance and the heterogeneous structure of code and natural language. To tackle this problem, we propose a cost-sensitive margin distribution optimization method to enhance classification under imbalanced scenarios, and design a network architecture for processing programming and natural languages. Experimental results validate the effectiveness of our method.
ICMC
Music Style Analysis among Haydn, Mozart and Beethoven: an Unsupervised Machine Learning Approach
Ru Wen,
Zheng Xie, Kai Chen, Ruoxuan Guo, Kuan Xu, Wenmin Huang, Jiyuan Tian, and Jiang Wu
In The 43rd International Computer Music Conference, 2017.
We propose an unsupervised music analysis method: a feature extraction technique that captures consecutive note pitch patterns, combined with clustering methods for mining music styles. We apply our method to a newly built corpus of Haydn, Mozart, and Beethoven. The discovered patterns fit the Implication-Realization theory, which confirms the validity of our approach.