Topic 1: Machine Learning & Neural Computation
Our main studies on this topic are summarized in the figure below.
Based on the ideas above, my efforts are as follows:
- (A) A biological system colocates its operations with the physical substrate on which they are processed. The fundamental computational unit of artificial neural networks is the neuron, corresponding to the cell in biological (nervous) systems. An artificial neuron receives signals from connected neurons, processes them, and generates a signal to other connected neurons. Connections (edges) typically carry a weight that adjusts as learning proceeds; the weight increases or decreases the strength of the signal at a connection. Typically, neurons are aggregated into layers, corresponding to neural circuits, and different layers may perform different transformations on their inputs. Neural operations are the biological system's adaptation for interacting with the environment; a representative operation is the Hebbian rule. Correspondingly, learning algorithms help a neural network handle a task better by considering sample observations: they adjust the weights (and optionally the thresholds) of the network to improve the accuracy of the result, usually by minimizing the observed errors.
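The weighted aggregation and the Hebbian rule mentioned above can be pictured with a minimal sketch (the classical rule Δw = η·x·y applied to a single artificial neuron; function names here are illustrative, not from any specific work):

```python
import numpy as np

def neuron_output(w, x):
    """Weighted aggregation of input signals (no activation, for simplicity)."""
    return float(np.dot(w, x))

def hebbian_update(w, x, eta=0.1):
    """Basic Hebbian rule: w_i <- w_i + eta * x_i * y.
    A connection strengthens when pre- and post-synaptic signals co-occur."""
    y = neuron_output(w, x)
    return w + eta * x * y

w = np.array([0.5, 0.2, 0.3])
x = np.array([1.0, 0.0, 1.0])   # the second input is silent
for _ in range(3):
    w = hebbian_update(w, x)    # weights on active inputs grow; silent ones stay
```

Note that weights on co-active connections grow while the weight on the silent input is unchanged, which is exactly the "increase or decrease the strength of the signal at a connection" described above.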
- (B) A high-level overview of how conventional von Neumann processing isolates the various layers, and how in-memory computing aims to converge them, is depicted in Figure (B). Modern computing, based on the von Neumann architecture, optimizes for generality, so learning algorithms are treated somewhat independently of the hardware on which they are processed. We focus on high-performance computing in relation to neuromorphic computing.
- (B1) Lightweight Computation for Deep Neural Networks. Accelerating neural network learning relies on the discretization of four types of variables: input, weight, neural state, and output. We present early empirical evidence of how artificial neural networks can be discretized to facilitate learning convergence and how this reduces the burden of mixed-signal processing in memristive accelerators. We also aim to overcome several challenges facing the development of memristive accelerators while reducing the adverse impact of limited-precision computation.
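As one simple instance of the variable discretization discussed above, values can be mapped to a small number of evenly spaced levels (a generic sketch, not the specific scheme used in our accelerator work):

```python
import numpy as np

def uniform_quantize(x, bits=4):
    """Map real values in [-1, 1] onto 2**bits evenly spaced levels.
    Inputs, weights, neural states, or outputs could all be discretized
    this way before being mapped onto limited-precision hardware."""
    levels = 2 ** bits - 1
    x = np.clip(x, -1.0, 1.0)
    # scale to [0, levels], round to the nearest level, scale back
    return np.round((x + 1.0) / 2.0 * levels) / levels * 2.0 - 1.0

w = np.array([-0.93, -0.2, 0.01, 0.47, 0.99])
wq = uniform_quantize(w, bits=2)   # only 4 distinct levels survive
```

Fewer levels mean fewer distinct analog states a memristive cell must represent, which is where the reduced mixed-signal burden comes from.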
- (B2) Neuromorphic Computing. Low-power biocompatible memristors may enable the construction of artificial neurons that function at the voltages of biological action potentials and could be used to directly process bio-sensing signals, for neuromorphic computing and/or direct communication with biological neurons. Artificial neural network learning relies heavily on frequent data movement between the processor and memory, and emerging memory technologies that can be directly integrated with advanced CMOS processes offer a promising way to reduce the cost of regular memory access. Most neuromorphic designs and neural network accelerators address this by distributing memory arrays across processing units, which represents a form of near-memory processing. Similarly, in-memory processing physically unites memory and computation within the same substrate and is thought to be analogous to how the brain can both store and operate on information within synapses.
- (C) Paradigm of neural network learning and types of artificial neuron models. Neural network learning comprises the neuron model, the network architecture, and the learning algorithm. Though neural networks have been studied for more than half a century and various learning algorithms and network architectures have been developed, the modeling of neurons has received relatively little attention. The most famous and commonly used formulation of a neuron is the MP neuron model [McCulloch and Pitts, 1943], which formulates the neuron as executing an activation function on the weighted aggregation of signals received from other neurons, compared against a threshold. The MP model is very successful even though the formulated cell behavior is quite simple. Actual nerve cells are much more complicated, and thus exploring other biologically plausible formulations with neuronal plasticity is a fundamental and significant problem.
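The MP formulation above, with a hard threshold as the activation, is compact enough to write out directly:

```python
import numpy as np

def mp_neuron(x, w, theta):
    """McCulloch-Pitts neuron: fire (1) iff the weighted aggregation of
    inputs reaches the threshold theta, otherwise stay silent (0)."""
    return int(np.dot(w, x) >= theta)

# With suitable weights and threshold the MP neuron realizes simple logic
# gates, e.g. AND over two binary inputs:
AND = lambda a, b: mp_neuron(np.array([a, b]), np.array([1.0, 1.0]), theta=2.0)
```

The simplicity is the point: everything about the cell is collapsed into one weighted sum and one threshold, which is precisely what the richer formulations below try to move beyond.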
- (C1) Comprehensive Investigations on Spiking Neural Networks. The spiking neuron model, the computational unit of spiking neural networks (SNNs), takes into account the timing of spike firing rather than simply relying on the accumulated signal strength as in conventional artificial neural networks, thus offering the potential of temporal and sparse computing. Here, we provide a theoretical framework for investigating the intrinsic structure of spiking neuron models from the perspective of dynamical systems, which exposes the effects of intrinsic structure on approximation power, computational efficiency, and generalization.
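To make the dynamical-systems view concrete, here is the standard leaky integrate-and-fire (LIF) neuron, used purely as an illustration of a spiking model whose output is a set of spike times rather than an accumulated signal strength (parameter values are arbitrary):

```python
def lif_simulate(input_current, tau=10.0, v_th=1.0, v_reset=0.0, dt=1.0):
    """Leaky integrate-and-fire neuron: the membrane potential v decays
    with time constant tau, integrates the input current, and emits a
    spike whenever v crosses the threshold v_th.
    Dynamics: dv/dt = (-v + I(t)) / tau  (Euler discretization, step dt)."""
    v, spikes = 0.0, []
    for t, i_t in enumerate(input_current):
        v += dt * (-v + i_t) / tau
        if v >= v_th:
            spikes.append(t)   # the information is in the spike *timing*
            v = v_reset        # hard reset after firing
    return spikes

spike_times = lif_simulate([1.5] * 50)   # constant drive -> periodic spiking
```

Choices such as the leak term, the reset rule, and the threshold are exactly the kind of intrinsic structure whose effect on approximation power and efficiency the framework above analyzes.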
- (C2) Exploring Time-varying Neuron Models. Recently, we proposed a novel biologically plausible neuron model, the Flexible Transmitter (FT) model. The FT model is inspired by the one-way communication neurotransmitter mechanism in nervous systems and mimics long-term synaptic plasticity. In contrast to the MP neuron model (at the macroscopic scale) and the spiking neuron model (at the microscopic scale), the FT model builds upon the mesoscopic scale and takes the form of a two-variable, two-valued function, thus taking the commonly used MP neuron model as its special case. Moreover, the FT model employs an exclusive variable that leads to a local recurrent system, thus having the potential to handle spatio-temporal data. We empirically show its effectiveness in handling spatio-temporal data and present theoretical understandings of the advantages of the FT model.
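To picture what a two-variable, two-valued neuron with a local recurrent variable looks like, here is a hypothetical caricature (explicitly NOT the exact FT formulation; the update equations are invented for illustration only):

```python
import numpy as np

def two_variable_step(x_t, u_prev, w, v):
    """Hypothetical two-variable neuron step, for illustration only:
    the neuron maps the pair (w . x_t, u_{t-1}) to an output pair
    (s_t, u_t), where s_t is the transmitted signal and u_t is an
    exclusive internal 'transmitter' variable fed back at the next step."""
    a = np.dot(w, x_t)
    s_t = np.tanh(a + v * u_prev)   # external output signal
    u_t = np.tanh(a - v * s_t)      # internal state -> local recurrence
    return s_t, u_t

# With v = 0 the recurrence vanishes and the step degenerates to an
# MP-style neuron s_t = tanh(w . x_t), matching the special-case remark.
```

The key structural point survives even in this toy: feeding `u_t` back gives each single neuron its own memory, which is why such models are candidates for spatio-temporal data.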
- (C3) Low-bit Quantization of Deep Neural Networks, even Large Language Models. The expanding scale of deep neural networks usually requires higher computational power and larger memory, thus raising a hardware threshold for developers and efficient manufacturing. Recently, we have focused on the cost-effective computing of deep neural networks and resort to low-bit quantization. The proposed method eliminates numerous multiplication operations, reducing computational cost relative to full-precision formats. The quantized model maintains highly sparse weight matrices and activation rates, thus significantly reducing memory size and computational complexity. Experiments conducted on deep learning models and large language models demonstrate the effectiveness of our work, which reduces inference complexity to at most 8.1‰ and memory size to 7.8‰ while maintaining competitive accuracy.
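A ternary quantization sketch shows how multiplications can be eliminated while inducing sparsity (a generic textbook-style scheme, not the specific method proposed in our work; the threshold is arbitrary):

```python
import numpy as np

def ternarize(w, threshold=0.05):
    """Quantize weights to {-1, 0, +1}: small weights become exact zeros
    (sparsity), and the surviving +/-1 weights need no multiplications."""
    q = np.zeros_like(w)
    q[w > threshold] = 1.0
    q[w < -threshold] = -1.0
    return q

def ternary_dot(q, x):
    """Dot product with ternary weights using only additions/subtractions."""
    return x[q == 1.0].sum() - x[q == -1.0].sum()

w = np.array([0.8, -0.02, -0.6, 0.01, 0.3])
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
q = ternarize(w)   # -> [ 1.,  0., -1.,  0.,  1.]
```

The zeros are skipped entirely and the remaining terms are pure add/subtract, which is the mechanism behind both the sparsity and the multiplication-free inference claimed above.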
- (D) Neural selection and competition during memory formation. Neural selection promotes the generation of pseudo circuits from primary circuits, leading to pseudo-groundings; the pseudo-groundings are revised via logical abduction by minimizing their inconsistency with the biological system (knowledge base); the abduced circuit then competes with the pseudo circuit, and the outcome is used to update the selection strategy in the next iteration. Note that the selection and competition here can be extended to AI tasks such as selective integration and incomplete-information games (ethnic competition, chess and card games, etc.). During this procedure, machine learning contributes powerful predictive computing from supervised instances, whereas logical reasoning provides credible support for machine learning. This paradigm works efficiently for tasks in scientific discovery and gambling.
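The revision step of the loop above can be sketched with a toy logical abduction: pseudo-labels are minimally changed until they become consistent with a knowledge base (the knowledge base and labels here are invented for illustration):

```python
from itertools import combinations

def abduce(pseudo_labels, consistent):
    """Revise binary pseudo-labels with the fewest possible flips until
    they satisfy the knowledge base (the `consistent` predicate)."""
    n = len(pseudo_labels)
    for k in range(n + 1):                        # try flipping 0, 1, 2, ... labels
        for idx in combinations(range(n), k):
            candidate = list(pseudo_labels)
            for i in idx:
                candidate[i] = 1 - candidate[i]   # flip one binary label
            if consistent(candidate):
                return candidate                  # minimal consistent revision
    return list(pseudo_labels)

# Toy knowledge base: the number of active (1) labels must be even.
kb = lambda labels: sum(labels) % 2 == 0
revised = abduce([1, 0, 1, 1], kb)
```

In the full paradigm the revised groundings would then be fed back to retrain the learner, closing the selection/competition iteration.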
Topic 2: Deep Learning Theory
Recent years have witnessed increasing interest in, and success of, deep neural networks. Many algorithms and techniques have been developed; however, a theoretical understanding of many aspects of deep neural networks is far from clear. A theoretical characterization of deep neural networks should answer questions about approximation power, optimization dynamics, and generalization, especially for overparameterized architectures. Currently, I am focusing on the theoretical understanding of deep neural networks in terms of approximation, optimization, and generalization. In particular, I care about the following issues:
- about Representation Learning. It would be interesting to theoretically study universal approximation, approximation complexity, and computational efficiency. This topic concerns the resources required to use a certain class of neural networks to solve a specific problem, including the number of parameters, floating-point operations, running time, etc. In general, the complexity is expressed as a function n → f(n), since the number of resources required to run an algorithm generally varies with the size of the input, where n is the size of the input and f(n) is either the worst-case complexity (the maximum number of resources needed over all inputs of size n) or the average-case complexity (the average number of resources over all inputs of size n). This investigation might be a key to understanding the mysteries behind the success of deep neural networks, especially over-parameterized deep learning and large-scale models.
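The worst-case versus average-case distinction above is the standard one from algorithm analysis; linear search makes it concrete (counting comparisons as the resource):

```python
def linear_search_cost(xs, target):
    """Return the number of comparisons linear search spends on `xs`."""
    for i, x in enumerate(xs):
        if x == target:
            return i + 1
    return len(xs)

n = 100
# Worst case over inputs of size n (target absent): f(n) = n comparisons.
worst = linear_search_cost(list(range(n)), -1)
# Average case (target equally likely in each position): (1+2+...+n)/n = (n+1)/2.
avg = sum(linear_search_cost(list(range(n)), t) for t in range(n)) / n
```

For neural networks the resource is parameters or floating-point operations rather than comparisons, but the same n → f(n) framing applies.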
- about Uncertainty Estimation. Recently, we have been concerned with the uncertainty estimation of generative models and of deep neural networks driven by stochastic configurations. This investigation involves two steps, uncertainty quantification and uncertainty reduction, such as using the variance of a deep neural kernel for uncertainty estimation.
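One generic way to obtain a variance-based uncertainty estimate is from the spread of an ensemble of stochastic predictors (a minimal sketch of the general idea, not the deep-neural-kernel estimator mentioned above; the toy ensemble is invented):

```python
import numpy as np

rng = np.random.default_rng(0)

def ensemble_uncertainty(models, x):
    """Predictive mean and variance across ensemble members: the mean is
    the point prediction, the variance quantifies the uncertainty."""
    preds = np.array([m(x) for m in models])
    return preds.mean(), preds.var()

# Toy ensemble: noisy copies of one underlying predictor sin(x); each
# member draws its own bias once at construction (a stochastic configuration).
models = [lambda x, b=rng.normal(0, 0.1): np.sin(x) + b for _ in range(50)]
mean, var = ensemble_uncertainty(models, 1.0)
```

Uncertainty reduction then amounts to shrinking this variance, e.g. by adding members or data, while quantification is simply reading it off.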
- about Generalization. We also focus on the generalization of machine learning and neural network learning. Generalization, the most fundamental issue in machine learning, refers to a model's ability to apply what it has learned from the training data to unseen data. A model that generalizes well performs accurately on new, previously unseen examples, which is crucial for its practical applicability.
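The ability described above is commonly formalized as the generalization gap, the difference between a model's expected risk on the underlying distribution and its empirical risk on the training sample:

```latex
% Generalization gap of a hypothesis h under distribution D,
% given a training sample S = {(x_i, y_i)}_{i=1}^{n} and loss l:
\mathrm{gap}(h) \;=\;
\underbrace{\mathbb{E}_{(x,y)\sim \mathcal{D}}\bigl[\ell(h(x), y)\bigr]}_{\text{expected risk } R(h)}
\;-\;
\underbrace{\frac{1}{n}\sum_{i=1}^{n} \ell\bigl(h(x_i), y_i\bigr)}_{\text{empirical risk } \hat{R}_S(h)}
```

Bounding this gap, especially for overparameterized networks where classical capacity measures are loose, is the core technical question.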
Topic 3: Time Series Analysis
My focus is on how to impute and forecast discretized, unstructured, and non-stationary sequence data. In particular, I care about the following issues:
- about Forecasting Algorithm. I am interested in time series forecasting, including Accurate Forecasting, Quantitative Analysis, Uncertainty Estimation, etc.
- about Forecasting Theory. I also make efforts on forecasting theory, such as the Predictable PAC learning theory.
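As a minimal, self-contained instance of the forecasting problems above, here is a least-squares AR(1) forecaster (an illustrative sketch, not one of the algorithms referred to):

```python
import numpy as np

def fit_ar1(series):
    """Least-squares fit of an AR(1) model x_t = a * x_{t-1} + b."""
    x_prev, x_next = series[:-1], series[1:]
    a, b = np.polyfit(x_prev, x_next, 1)   # slope, intercept
    return a, b

def forecast(series, a, b, steps):
    """Iterate the fitted recurrence forward to forecast future values."""
    out, x = [], series[-1]
    for _ in range(steps):
        x = a * x + b
        out.append(x)
    return out

series = np.array([1.0, 1.5, 1.75, 1.875, 1.9375])  # generated by x_t = 0.5*x_{t-1} + 1
a, b = fit_ar1(series)
```

Non-stationarity and missing values break exactly this kind of fixed-recurrence model, which is what makes the imputation and forecasting problems above nontrivial.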