PAKDD19 Workshop on
“Weakly Supervised Learning: Progress and Future” WeL’19


Machine learning has achieved great success in various tasks, particularly in supervised learning tasks such as classification and regression. Typically, predictive models are learned from a training data set that contains a large number of training examples, each corresponding to an event/object. A training example consists of two parts: a feature vector (or instance) describing the event/object, and a label indicating the ground-truth output. In classification, the label indicates the class to which the training example belongs; in regression, the label is a real-valued response corresponding to the example. Most successful techniques, such as deep learning, require ground-truth labels for a large training data set. In many tasks, however, strong supervision information is difficult to obtain because of the high cost of the data-labeling process. Thus, it is highly desirable for machine learning techniques to be able to work with weakly supervised data.


The aim of the workshop is to highlight current research on weakly supervised learning techniques for different types of weak supervision, and on their applications to real problems. The workshop will also encourage discussion of the major challenges for the future of weakly supervised learning, and give researchers in related fields, such as optimization and statistical learning, an opportunity to get feedback from other communities.

The workshop will highlight the growing area of weakly supervised learning. Techniques in this area have been studied extensively for different application areas over the past decade, and knowledge discovery and data mining has been one of the fastest-growing application areas for them. As a major data mining conference, PAKDD 2019 will allow data mining researchers to see some of the high-quality work presented at the workshop and the potential future benefits of these techniques. In turn, presenters at the workshop will have an opportunity to receive critical feedback on their work from other communities. By bringing these communities together, the workshop will help increase the number of conference attendees and diversify future research.

Invited Speakers

Ming Li: Prof. Ming Li is currently a professor with the National Key Laboratory of Novel Software Technology, Nanjing University. His major research interests include machine learning and data mining, especially software mining. He has published over 30 papers in refereed international journals, including IEEE Trans. Knowledge and Data Engineering, Automated Software Engineering Journal, and Software: Practice & Experience, and at top conferences including IJCAI and ICML. He has served as a senior PC member of premier artificial intelligence conferences such as IJCAI and AAAI, and as a PC member of other premier conferences such as KDD, NIPS, ACMMM, and ICDM, and he has been the chair of the 1st–6th International Workshop on Software Mining. He has served as an associate editor (junior) for Frontiers of Computer Science and as an editorial board member for the International Journal of Data Warehousing and Mining. He is an executive board member of the ACM SIGKDD China Chapter. He has received various awards, including the Excellent Youth Award from NSFC, the New Century Excellent Talents program of the Education Ministry of China, the CCF Distinguished Doctoral Dissertation Award, and the Microsoft Fellowship Award.

Quanming Yao: Dr. Quanming Yao is currently a leading researcher at 4Paradigm, where he manages the company's machine learning research group. He obtained his Ph.D. degree from the Department of Computer Science and Engineering at the Hong Kong University of Science and Technology (HKUST) in 2018 and his bachelor's degree from Huazhong University of Science and Technology (HUST) in 2013. His honors include the Qiming Star (HUST, 2012), the Tse Cheuk Ng Tai Research Excellence Prize (CSE, HKUST, 2014-2015), the Google Fellowship (machine learning, 2016), and the Ph.D. Research Excellence Award (School of Engineering, HKUST, 2018-2019). He has published 23 papers in top-tier journals and conferences, including ICML, NeurIPS, JMLR, TPAMI, KDD, ICDE, CVPR, IJCAI, and AAAI. He was an outstanding reviewer for Neurocomputing in 2017, has served on the program committees of many prestigious conferences, including ICML, NeurIPS, CVPR, AAAI, and IJCAI, and was a committee member of the AutoML competitions at NeurIPS 2018 and IJCNN 2019.


Date: 13:30–15:40, April 14, 2019 (Sunday)
Location: Room 7304, The Conference Venue (The Parisian Macao)

13:30-13:35 Opening Remarks by Dr. Yu-Feng Li and Sheng-Jun Huang
13:35-14:15 Keynote Speech by Prof. Ming Li
Title: Learning To Locate Software Bugs
Abstract: Software systems are becoming larger and more complex, which poses a big challenge to software quality assurance because it is almost infeasible to conduct extensive code inspection or testing for every software module. Thus, software systems are usually shipped with bugs. Locating buggy software modules effectively may help improve the quality of the software system as well as the productivity of developers. Many models and approaches have been proposed for locating software bugs; however, some of them may not fully capture the properties of software data. In this talk, we will discuss several attempts to address the problem of locating software bugs from a machine learning perspective, where the properties of software data are carefully considered.
14:15-14:40 Oral Presentations
The Most Related Knowledge First: A Progressive Domain Adaptation Method
by Yunyun Wang, Dan Zhao, Yun Li, Kejia Chen and Hui Xue
Adversarial Active Learning in the Presence of Weak and Malicious Oracles
by Yan Zhou, Murat Kantarcioglu and Bowei Xi
14:40-15:20 Keynote Speech by Dr. Quanming Yao
Title: Robust Learning from Noisy Labels
Abstract: Due to the vast learning capacity of deep models, big and high-quality data is almost a must for excellent predictive performance. However, such data is difficult and expensive to acquire, and the collected labels are usually noisy. In this talk, we will present our Co-teaching approach, which builds on stochastic optimization and the memorization effects of deep networks, to learn robustly from such noisy labels. Specifically, we train two networks simultaneously and let them teach each other on every mini-batch. First, each network feeds forward all data and selects the samples that possibly have clean labels; second, the two networks communicate to each other which data in this mini-batch should be used for training; finally, each network back-propagates the data selected by its peer network and updates itself. Empirical results on noisy versions of MNIST, CIFAR-10, and CIFAR-100 demonstrate that Co-teaching is much more robust to noisy labels than state-of-the-art methods.
15:20-15:40 Oral Presentations
Learning a Semantic Space for Modeling Images, Tags and Feelings in Cross-Media Search
by Sadaqat Ur Rehman, Yongfeng Huang and Shanshan Tu
Weakly Supervised Learning by a Confusion Matrix of Contexts
by William Wu
15:40 Concluding Remarks
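The three Co-teaching steps in Dr. Quanming Yao's keynote abstract can be illustrated with a small sketch. It is not the speaker's implementation: two logistic-regression models stand in for the deep networks, the data is a synthetic binary task with 20% flipped labels, and "possibly clean" samples are taken to be the small-loss samples in each mini-batch. The keep-ratio schedule `R(T) = 1 - min(T * tau / T_k, tau)` and the assumed-known noise rate `tau` are assumptions, not details given in the abstract; the names `LogReg`, `keep_ratio`, and `co_teaching` are hypothetical.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-np.clip(z, -30, 30)))


class LogReg:
    """Hypothetical stand-in for a deep network: binary logistic regression."""

    def __init__(self, dim, rng):
        self.w = rng.normal(scale=0.01, size=dim)
        self.b = 0.0

    def predict_proba(self, X):
        return sigmoid(X @ self.w + self.b)

    def per_sample_loss(self, X, y):
        # Cross-entropy for each sample individually (needed for small-loss selection).
        p = np.clip(self.predict_proba(X), 1e-12, 1 - 1e-12)
        return -(y * np.log(p) + (1 - y) * np.log(1 - p))

    def sgd_step(self, X, y, lr):
        if len(y) == 0:
            return
        grad = self.predict_proba(X) - y  # d(loss)/d(logit) for logistic loss
        self.w -= lr * X.T @ grad / len(y)
        self.b -= lr * grad.mean()


def keep_ratio(epoch, tau, t_k):
    # Assumed schedule: keep all samples at first, then drop a fraction tau.
    return 1.0 - min(epoch * tau / t_k, tau)


def co_teaching(X, y, tau=0.2, t_k=10, epochs=30, batch=64, lr=0.5, seed=0):
    rng = np.random.default_rng(seed)
    f, g = LogReg(X.shape[1], rng), LogReg(X.shape[1], rng)
    for epoch in range(epochs):
        r = keep_ratio(epoch, tau, t_k)
        order = rng.permutation(len(y))
        for start in range(0, len(y), batch):
            idx = order[start:start + batch]
            Xb, yb = X[idx], y[idx]
            n_keep = max(1, int(r * len(idx)))
            # Step 1: each model selects its small-loss (possibly clean) samples.
            keep_f = np.argsort(f.per_sample_loss(Xb, yb))[:n_keep]
            keep_g = np.argsort(g.per_sample_loss(Xb, yb))[:n_keep]
            # Steps 2-3: each model updates on the samples chosen by its peer.
            f.sgd_step(Xb[keep_g], yb[keep_g], lr)
            g.sgd_step(Xb[keep_f], yb[keep_f], lr)
    return f, g


# Usage: synthetic data with 20% of the labels flipped at random.
rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 2))
y_clean = (X[:, 0] + X[:, 1] > 0).astype(float)
flip = rng.random(1000) < 0.2
y_noisy = np.where(flip, 1 - y_clean, y_clean)
f, g = co_teaching(X, y_noisy)
acc = ((f.predict_proba(X) > 0.5) == y_clean).mean()
print(f"accuracy against clean labels: {acc:.3f}")
```

In this sketch both selections are made before either model is updated, so each model learns only from samples its peer judged clean; this cross-update is what distinguishes Co-teaching from having each model filter its own mini-batch.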


The workshop calls for high-quality research papers outlining current research, literature surveys, theoretical and empirical studies, and other relevant work, including but not limited to the following areas:

  • Supervision is incomplete
    • Active learning
    • Semi-supervised learning (including transductive learning)
    • Transfer learning (including domain adaptation)
    • Few-shot learning (including one-shot and zero-shot learning)
  • Supervision is inexact
    • Partial label learning
    • Multi-instance learning
    • Multi-instance multi-label learning
  • Supervision is inaccurate
    • Label distribution learning
    • Crowdsourcing
    • Noisy-label learning
  • Weak supervision in dynamic environments
    • New class label
    • New feature set
    • New objective
    • Data distribution change
  • Applications
    • Computer Vision
    • Natural Language Processing
    • Video/Audio Classification
  • Novel Problems and Settings
    • Self-paced learning
    • Many others
  • Learning Theory
    • Generalization
    • Consistency
    • Convergence
    • Complexity

Organization Committee


School of Artificial Intelligence, Nanjing University