
About Me

I am currently an assistant professor in the School of Intelligence Science and Technology at Nanjing University (Suzhou Campus) and a member of the LAMDA Group.

Before that, I obtained my Ph.D. degree from the Department of Computer Science and Technology at Nanjing University in September 2022, where I was supervised by Prof. Zhi-Hua Zhou in the LAMDA Group.



Research Interests

My research interests mainly include topics in Machine Learning and Data Mining, especially the Intelligence-Inspired Computing Algorithm and Theory, involving the topics of neural computation, learning theory, and time series analysis.

My Curriculum Vitae: Link

Openings: Read this page for prospective students.




Correspondence

Emails:
  • zhangsq{at}lamda{dot}nju{dot}edu{dot}cn or zhangsqhndn{at}gmail{dot}com for research-related matters (paper, code, review, etc.)
  • zhangsq{at}nju.edu.cn for teaching-related matters (teaching, admission, hiring, etc.)
Institutions:
  • National Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210023, China
  • School of Intelligence Science and Technology, Nanjing University, Suzhou 215163, China
Address:
  • Nanjing University Suzhou Campus, No. 1520 Taihu Avenue, Huqiu District, Suzhou 215163, China
  • 江苏省苏州市虎丘区太湖大道1520号南京大学苏州校区(东校区), 邮编: 215163

Research Interests

My research interests concentrate on the Intelligence-Inspired Computing Algorithm and Theory, mainly involving the topics of machine learning, neural computation, learning theory, time series analysis, and logical inference.



Topic 1: Machine Learning & Neural Computation

Our main studies on this topic are summarized in the figure below.

[Figure: overview of the studies on this topic, panels (A) to (D)]

Based on the ideas above, I have made some efforts as follows:
  • (A) Biological systems colocate operations with the physical substrate on which they are processed. The fundamental computational unit of artificial neural networks is the neuron, corresponding to the cell in biological (nervous) systems. An artificial neuron receives signals from connected neurons, processes them, and sends a signal to other connected neurons. Neurons and edges typically have weights that are adjusted as learning proceeds; a weight increases or decreases the strength of the signal at a connection. Typically, neurons are aggregated into layers, corresponding to neural circuits, and different layers may perform different transformations on their inputs. Neural operations are the adaptations by which a biological system interacts with its environment; representative operations include Hebbian rules. Correspondingly, learning algorithms help a neural network handle a task better by considering sample observations. They adjust the weights (and optional thresholds) of the network to improve the accuracy of the result, a procedure usually implemented by minimizing the observed errors.

  • (B) A high-level overview of how conventional von Neumann processing isolates the various layers, and how in-memory computing aims to converge these, is depicted in Figure (B). Modern computing, based on the von Neumann architecture, optimizes for generality such that learning algorithms are treated somewhat independently of the hardware they are processed on. We focus on high-performance computing relative to neuromorphic computing.
    • (B1) Lightweight Computations of Deep Neural Networks. The acceleration of neural network learning relies on the discretization of four types of variables, that is, input, weight, neural state, and output. We present early empirical evidence of how artificial neural networks can be discretized to facilitate learning convergence and how this reduces the burden of mixed-signal processing in memristive accelerators. We also aim to overcome several challenges that face the development of memristive accelerators while reducing the adverse impact of limited-precision computation.

    • (B2) Neuromorphic Computing. Low-power biocompatible memristors may enable the construction of artificial neurons that function at voltages of biological action potentials and could be used to directly process bio-sensing signals, for neuromorphic computing and/or direct communication with biological neurons. Processing artificial neural network learning relies heavily on frequent data movement between the processor and memory, and emerging memory technologies that can be directly integrated with advanced CMOS processes offer a promising way to reduce the cost of regular memory access. Most neuromorphic designs and neural network accelerators address this by distributing memory arrays across processing units, which represents a form of near-memory processing. Similarly, in-memory processing physically unites memory and computation within the same substrate and is thought to be analogous to how the brain can both store and operate on information within synapses.

  • (C) Paradigm of neural network learning and types of artificial neuron models. Neural network learning comprises the neuron model, the network architecture, and the learning algorithm. Though neural networks have been studied for more than half a century, and various learning algorithms and network architectures have been developed, the modeling of neurons has received relatively little attention. The most famous and commonly used formulation of a neuron is the MP neuron model [McCulloch and Pitts, 1943], which formulates the neuron as executing an activation function on the weighted aggregation of signals received from other neurons, compared with a threshold. The MP model is very successful even though the cell behavior it formulates is quite simple. Actual nerve cells are much more complicated, and thus exploring other bio-plausible formulations with neuronal plasticity is a fundamental and significant problem.
    • (C1) Comprehensive Investigations on Spiking Neural Networks. The spiking neuron model, the computational unit of spiking neural networks (SNNs), takes into account the time of spike firing rather than simply relying on the accumulated signal strength in conventional artificial neural networks, thus offering the potential of temporal and sparse computing. Here, we provide a theoretical framework for investigating the intrinsic structure of spiking neuron models from the perspective of dynamical systems, which exposes the effects of intrinsic structure on approximation power, computational efficiency, and generalization.

    • (C2) Exploring Time-varying Neuron Models. Recently, we proposed a novel bio-plausible neuron model, the Flexible Transmitter (FT) model. The FT model is inspired by the one-way communication neurotransmitter mechanism in nervous systems and mimics long-term synaptic plasticity. In contrast to the MP neuron model (at the macroscopic scale) and the spiking neuron model (at the microscopic scale), the FT model builds upon the mesoscopic scale and takes the form of a two-variable, two-valued function, thus including the commonly used MP neuron model as a special case. Besides, the FT model employs an exclusive variable that leads to a local recurrent system, thus having the potential to handle spatio-temporal data. We empirically show its effectiveness in handling spatio-temporal data and present a theoretical understanding of the advantages of the FT model.

    • (C3) Low-bit Quantization of Deep Neural Networks, Including Large Language Models. The expanding scale of deep neural networks usually requires more computational power and larger memory, thus raising a hardware barrier for developers and for efficient manufacturing. Recently, we have focused on cost-effective computing for deep neural networks by resorting to low-bit quantization. The proposed method eliminates numerous multiplication operations, reducing computational consumption relative to full-precision formats. The quantized model maintains highly sparse weight matrices and activation rates, thus significantly reducing memory size and computational complexity. Experiments on deep learning models and large language models demonstrate the effectiveness of our work, which reduces inference complexity to at most 8.1‰ and memory size to at most 7.8‰ while maintaining competitive accuracy.

  • (D) Neural selection and competition during memory formation. Neural selection promotes the generation of pseudo circuits from primary circuits, leading to pseudo-groundings; the pseudo-groundings are revised via logical abduction based on minimizing the inconsistency with the Biological System (knowledge base); the abduced circuit competes with the pseudo circuit, and the outcome is used to update the selection strategy in the next iteration. Note that the selection and competition here can be extended to AI tasks such as selective integration and incomplete-information games (ethnic competition, chess and card games, etc.). During this procedure, machine learning draws powerful predictive computing from supervised instances, whereas logical reasoning provides credible support for machine learning. This paradigm works efficiently for tasks of Science Discovery and Gambling.
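
As a rough illustration of the computational units named in items (C), (C1), and (C3) above, the following Python sketch contrasts an MP-style threshold neuron, a discrete-time leaky integrate-and-fire neuron, and a ternary weight quantizer. It is a minimal sketch under stated assumptions, not the models or methods of the papers listed on this page: the function names, the leak and reset parameters, and the cutoff-based ternary rule are all hypothetical choices made here for illustration.

```python
from typing import List, Sequence


def mp_neuron(inputs: Sequence[float], weights: Sequence[float],
              threshold: float) -> int:
    """MP-style neuron: step activation on the weighted sum vs. a threshold."""
    aggregated = sum(w * x for w, x in zip(weights, inputs))
    return 1 if aggregated >= threshold else 0


def lif_spike_times(currents: Sequence[float], threshold: float = 1.0,
                    leak: float = 0.9, reset: float = 0.0) -> List[int]:
    """Discrete-time leaky integrate-and-fire neuron (illustrative stand-in
    for a spiking neuron model; parameters are assumptions).

    The membrane potential decays by `leak` each step, integrates the input
    current, and emits a spike (then resets) once it reaches `threshold`.
    The output is the list of spike times, not an accumulated strength.
    """
    potential = reset
    spikes: List[int] = []
    for t, current in enumerate(currents):
        potential = leak * potential + current
        if potential >= threshold:
            spikes.append(t)
            potential = reset
    return spikes


def ternary_quantize(weights: Sequence[float], cutoff: float) -> List[int]:
    """Map each weight to -1, 0, or +1; weights within `cutoff` of zero
    become 0, which makes the quantized vector sparse (hypothetical rule)."""
    return [0 if abs(w) <= cutoff else (1 if w > 0 else -1) for w in weights]


if __name__ == "__main__":
    print(mp_neuron([1, 0, 1], [0.5, 0.5, 0.5], threshold=1.0))
    print(lif_spike_times([0.6, 0.6, 0.0, 0.6, 0.6]))
    print(ternary_quantize([0.8, -0.05, -0.7, 0.02], cutoff=0.1))
```

With weights restricted to -1, 0, and +1, a weighted sum needs only additions and subtractions, which is one simple way to see how low-bit quantization can remove multiplications and induce sparsity.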


Topic 2: Deep Learning Theory

Recent years have witnessed increasing interest in, and success of, deep neural networks. Many algorithms and techniques have been developed; however, a theoretical understanding of many aspects of deep neural networks is far from clear. A theoretical characterization of deep neural networks should answer questions about approximation power, optimization dynamics, and generalization, especially in overparameterized architectures. I am now focusing on the theoretical understanding of deep neural networks in terms of approximation, optimization, and generalization. In particular, I care about the following issues:
  • about Representation Learning. It would be interesting to theoretically study universal approximation, approximation complexity, and computational efficiency. This topic is concerned with the resources required to use a certain class of neural networks to solve a specific problem, including the number of parameters, floating-point operations, running time, etc. In general, complexity is expressed as a function n → f(n), since the amount of resources required to run an algorithm generally varies with the size of the input; here n is the size of the input and f(n) is either the worst-case complexity (the maximum amount of resources needed over all inputs of size n) or the average-case complexity (the average amount of resources over all inputs of size n). This investigation might be a key to understanding the mysteries behind the success of deep neural networks, especially over-parameterized deep learning and large-scale models.

  • about Uncertainty Estimation. Recently, we have been concerned with the uncertainty estimation of generative models and deep neural networks led by stochastic configurations. This investigation involves two steps, uncertainty quantification and uncertainty reduction, for example using the variance of a deep neural kernel for uncertainty estimation.

  • about Generalization. We also focus on the generalization of machine learning and neural network learning. Generalization is the most fundamental issue in machine learning, which refers to a model's ability to apply what it has learned from the training data to unseen data. A model that generalizes well performs accurately on new, previously unseen examples, which is crucial for its practical applicability.
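
The worst-case and average-case complexity described under Representation Learning, and the notion of generalization in the last item, can be written compactly as follows. The notation is introduced here for illustration only and is not taken from the papers listed on this page: c(x) is the resource cost on input x, X_n is the set of inputs of size n (assumed finite and uniformly weighted for the average), h is a learned model, ℓ is a loss function, D is the data distribution, and (x_i, y_i), i = 1, ..., m, are the training samples.

```latex
% Worst-case and average-case complexity over inputs of size n
f_{\mathrm{worst}}(n) = \max_{x \in \mathcal{X}_n} c(x),
\qquad
f_{\mathrm{avg}}(n) = \frac{1}{|\mathcal{X}_n|} \sum_{x \in \mathcal{X}_n} c(x)

% Generalization gap: expected loss on unseen data minus average training loss
\mathrm{gap}(h) = \mathbb{E}_{(x,y) \sim \mathcal{D}}\bigl[\ell(h(x), y)\bigr]
  - \frac{1}{m} \sum_{i=1}^{m} \ell(h(x_i), y_i)
```

A model generalizes well, in this sense, when the gap is small, so that low training error carries over to previously unseen examples.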


Topic 3: Time Series Analysis

My focus is on how to impute and forecast the discretized, unstructured, and non-stationary sequence data. Especially, I care about the following issues:
  • about Forecasting Algorithm. I am interested in time series forecasting, including Accurate Forecasting, Quantitative Analysis, Uncertainty Estimation, etc.

  • about Forecasting Theory. I also make some efforts on the forecasting theory, like the Predictable PAC Learning theory.

Publications

Remark: * indicates equal contribution; # denotes "I am the corresponding author".


Journals
  • Shao-Qun Zhang, Jia-Yi Chen, Jin-Hui Wu, Gao Zhang, Huan Xiong, Bin Gu, and Zhi-Hua Zhou. On the Intrinsic Structures of Spiking Neural Networks. Journal of Machine Learning Research (JMLR), in press. 2024.
  • Jin-Hui Wu, Shao-Qun Zhang, Yuan Jiang, and Zhi-Hua Zhou. Theoretical Exploration of Flexible Transmitter Model. IEEE Transactions on Neural Networks and Learning Systems (TNNLS), 35(3): 3674-3688. 2024. [paper] [bib]
  • Shao-Qun Zhang, Fei Wang, and Feng-Lei Fan. Neural Network Gaussian Processes by Increasing Depth. IEEE Transactions on Neural Networks and Learning Systems (TNNLS), 35(2): 2881-2886. 2024. [paper] [code] [bib]
  • Gao Zhang and Shao-Qun Zhang#. Lax Extensions of Conical I-Semifilter Monads. Axioms, 12(11):1034. 2023. [paper] [bib]
  • Gao Zhang and Shao-Qun Zhang#. On Discrete Presheaf Monads. Axioms, 12(6):610. 2023. [paper] [bib]
  • 张绍群, 张钊钰, 姜远, 周志华. 基于误差截尾假设的时间序列预测可学习性理论与算法. 计算机学报, 45(11):2279-2289. 2022. [paper] [bib]
    CORR: Shao-Qun Zhang, Zhao-Yu Zhang, Yuan Jiang, and Zhi-Hua Zhou. Time Series Theory and Algorithm of Predictable Learnability Based on Error Truncation Assumption [in Chinese]. Chinese Journal of Computers, 45(11):2279-2289. 2022.
  • Shao-Qun Zhang, Wei Gao, and Zhi-Hua Zhou. Towards Understanding Theoretical Advantages of Complex-Reaction Networks. Neural Networks (Neural Netw.), 151:80-93. 2022. [paper] [bib]
  • Shao-Qun Zhang, Zhao-Yu Zhang, and Zhi-Hua Zhou. Bifurcation Spiking Neural Network. Journal of Machine Learning Research (JMLR), 22(253):1-21. 2021. [paper] [poster] [code] [bib]
  • Shao-Qun Zhang and Zhi-Hua Zhou. Flexible Transmitter Network. Neural Computation (Neural Comput.), 33(11): 2951–2970. 2021. [paper] [poster] [code] [bib]
  • 张绍群. 基于紧集子覆盖的流形学习算法. 计算机科学, 44(Z6), 88-91. 2017. [paper] [bib]
    CORR: Shao-Qun Zhang. Manifold learning algorithm based on compact set sub-coverage [in Chinese]. Chinese Journal of Computer Science, 44(Z6):88-91. 2017.


Conferences
  • Xiao-Dong Bi, Shao-Qun Zhang#, and Yuan Jiang. MEPSI: An MDL-based Ensemble Pruning Approach with Structural Information. In: Proceedings of the 38th AAAI Conference on Artificial Intelligence (AAAI'24), pp. 11078-11086. 2024. [paper] [poster] [code] [bib]
  • Jin-Hui Wu, Shao-Qun Zhang, Yuan Jiang, and Zhi-Hua Zhou. Complex-valued Neurons Can Learn More but Slower than Real-valued Neurons via Gradient Descent. In: Advances in Neural Information Processing Systems 36 (NeurIPS'23), pp. 23714-23747. 2023. [paper] [poster] [bib]
  • Qin-Cheng Zheng, Shen-Huan Lyu, Shao-Qun Zhang, Yuan Jiang, and Zhi-Hua Zhou. On the Consistency Rate of Decision Tree Learning Algorithms. In: Proceedings of the 26th International Conference on Artificial Intelligence and Statistics (AISTATS'23), pp. 7824-7848. 2023. [paper] [poster] [code] [bib]
  • Shao-Qun Zhang and Zhi-Hua Zhou. Theoretically Provable Spiking Neural Networks. In: Advances in Neural Information Processing Systems 35 (NeurIPS'22), pp. 19345-19356. 2022. [paper] [poster] [code] [bib]
  • Zhao-Yu Zhang, Shao-Qun Zhang, Yuan Jiang, and Zhi-Hua Zhou. LIFE: Learning Individual Features for Multivariate Time Series Prediction with Missing Values. In: Proceedings of the 21st International Conference on Data Mining (ICDM'21), pp. 1511-1516. 2021. [paper] [code] [bib]
  • Shao-Qun Zhang and Zhi-Hua Zhou. Harmonic recurrent process for time series forecasting. In: Proceedings of the 24th European Conference on Artificial Intelligence (ECAI'20), pp. 1714-1721. 2020. [paper] [code] [bib]


Manuscripts
  • Qi-Jie Li, Qian Sun, and Shao-Qun Zhang#. Horizon-wise Learning Paradigm Promotes Gene Splicing Identification. 2024. [arXiv:2406.11900] [bib]
  • Shao-Qun Zhang, Zong-Yi Chen*, Yong-Ming Tian*, and Xun Lu*. A Unified Kernel for Neural Network Learning. 2024. [arXiv:2403.17467] [bib]
  • Shao-Qun Zhang and Zhi-Hua Zhou. ARISE: ApeRIodic SEmi-parametric Process for Efficient Markets without Periodogram and Gaussianity Assumptions. 2021. [arXiv:2111.06222] [bib]
  • Gao Zhang, Jin-Hui Wu, and Shao-Qun Zhang#. On the Approximation and Complexity of Deep Neural Networks to Invariant Functions. 2022. [arXiv:2210.15279] [paper] [bib]
  • Shao-Qun Zhang, Jin-Hui Wu, Gao Zhang, Huan Xiong, Bin Gu, and Zhi-Hua Zhou. On the Generalization of Spiking Neural Networks via Minimum Description Length and Structural Stability. 2023. [arXiv:2207.04876] [paper] [bib]

Teaching



Current


Previous
  • Probability and Statistics for Undergraduate Students at Nanjing University (NJU); Fall, 2023.
  • Time Series Analysis for Undergraduate Students at Sichuan University (SCU); Spring, 2017.
  • Linear Algebra for Undergraduate Students at Sichuan University (SCU); Fall, 2016 and Fall, 2017.
  • Calculus for Undergraduate Students at Sichuan University (SCU); Spring, 2015 - Spring, 2018, per semester.


Teaching Assistants

  • Game Theory with Assoc. Prof. Wei Gao; for Graduate and Undergraduate Students at Nanjing University (NJU); Spring, 2020.
  • Mathematical Analysis with Assoc. Prof. Hong-Jun Fan; for Undergraduate Students at Nanjing University (NJU); Fall, 2019.

Activities

My Talks
  • Gave a talk on "On the Expressivity of Spiking Neural Networks via Description Languages" at The Chinese University of Hong Kong (CUHK) in August 2023. [poster]
  • Gave a talk on "Long-term Time Series Forecasting" at Chaspark in April 2023.
  • Gave a spotlight talk, "Bifurcation Spiking Neural Network", at the Jiangsu Artificial Intelligence Academic Conference (江苏省人工智能学术会议2022) in November 2022. [poster] [slides]
  • Gave a talk, "Introduction: Theoretical Understanding of Deep Neural Networks", at the Tianyuan Mathematical Center in Southwest China in March 2022. [poster] [slides] [video]
  • Gave a talk, "Investigation of long-term memory without Periodogram and Gaussianity", at School of Harbin Institute of Technology Institute for Artificial Intelligence (HIT) in November 2021. [poster] [slides]

 

Academic Service

 

 

 

Area Chair
  • NeurIPS 2024

SPC members
  • AAAI 2021
  • IJCAI 2021

PC members
  • AAAI (2019-2024)
  • AISTATS (2022-2023)
  • ECAI (2020-2024, biennial)
  • ICLR (2021-2024)
  • ICML (2019-2024)
  • IJCAI (2019-2024)
  • NeurIPS (2019-2024)
  • PAKDD (2022)
  • UAI (2022-2024)
Journal reviewers
  • Artificial Intelligence (AIJ)
  • Chinese Journal of Electronics (CJE)
  • Fundamental Research
  • Nature
  • Machine Learning (MLJ)
  • Scientific Reports
  • IEEE Transactions on Emerging Topics in Computational Intelligence (TETCI)
  • ACM Transactions on Knowledge Discovery from Data (TKDD)
  • IEEE Transactions on Neural Networks and Learning Systems (TNNLS)
  • IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI)
  • World Journal of Surgical Oncology (WJSO)