Sitemap
A list of all the posts and pages found on the site. For the robots out there, an XML version is available for digesting as well.
Pages
Posts
Future Blog Post
Published:
This post will show up by default. To disable scheduling of future posts, edit config.yml and set future: false, as sketched below.
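For reference, a minimal sketch of the setting in question, assuming the stock Jekyll _config.yml used by this template:

```yaml
# _config.yml (assumed standard Jekyll configuration file)
future: false   # posts dated in the future are not rendered
```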
Blog Post number 4
Published:
This is a sample blog post. Lorem ipsum I can’t remember the rest of lorem ipsum and don’t have an internet connection right now. Testing testing testing this blog post. Blog posts are cool.
Blog Post number 3
Published:
This is a sample blog post. Lorem ipsum I can’t remember the rest of lorem ipsum and don’t have an internet connection right now. Testing testing testing this blog post. Blog posts are cool.
Blog Post number 2
Published:
This is a sample blog post. Lorem ipsum I can’t remember the rest of lorem ipsum and don’t have an internet connection right now. Testing testing testing this blog post. Blog posts are cool.
Blog Post number 1
Published:
This is a sample blog post. Lorem ipsum I can’t remember the rest of lorem ipsum and don’t have an internet connection right now. Testing testing testing this blog post. Blog posts are cool.
portfolio
publications
A general ghost-removal method for moving object segmentation
Published in Chinese Invention Patent (CN), 2017
This invention belongs to the field of moving object segmentation in digital image processing. It is a general method, compatible with a variety of moving object detection algorithms, for eliminating the ghosts produced in video streams. The method first applies a moving object segmentation algorithm to the video stream to separate foreground from background. The foreground mask is then pre-processed with median filtering and dilation, and convolved with a Scatter operator to dynamically break up connected moving-object foreground regions. The background model is subsequently updated using an update method based on spatial similarity, and finally the next frame is read in for cyclic processing. Beneficial technical effect: the method can quickly eliminate ghosts while maintaining a high detection rate. (A rough sketch of this pipeline follows this entry.)
Recommended citation: CN107085836A
Download Paper | Download Slides
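For illustration, a minimal sketch of the pipeline as described in the abstract above. The Scatter kernel, the similarity measure, and all thresholds are assumptions made for the sketch, not values from the patent:

```python
# Sketch of the ghost-removal pipeline; frame, bg_model, and fg_mask are
# single-channel arrays of the same shape (fg_mask uint8, bg_model float32).
import cv2
import numpy as np

def remove_ghosts(frame, bg_model, fg_mask, alpha=0.05, sim_thresh=0.9):
    # 1) Pre-process the foreground mask: median filtering, then dilation.
    mask = cv2.medianBlur(fg_mask, 5)
    mask = cv2.dilate(mask, np.ones((3, 3), np.uint8))

    # 2) Convolve the mask with a "Scatter" operator to break up connected
    #    moving-object foreground regions (kernel shape is a guess).
    scatter = np.array([[0, 1, 0], [1, -4, 1], [0, 1, 0]], np.float32)
    scattered = cv2.filter2D(mask.astype(np.float32), -1, scatter)

    # 3) Update the background model where foreground pixels are spatially
    #    similar to the background, i.e. likely ghosts rather than objects.
    sim = 1.0 - np.abs(frame.astype(np.float32) - bg_model) / 255.0
    ghost = (mask > 0) & (sim > sim_thresh) & (scattered != 0)
    bg_model[ghost] = (1 - alpha) * bg_model[ghost] + alpha * frame[ghost]
    return bg_model, mask   # caller loops: read next frame and repeat
```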
SparseNN: An energy-efficient neural network accelerator exploiting input and output sparsity
Published in 2018 Design, Automation & Test in Europe Conference & Exhibition (DATE), 2018
Contemporary Deep Neural Networks (DNNs) contain millions of synaptic connections across tens to hundreds of layers. Their large computation and memory requirements pose a challenge to hardware design. In this work, we leverage the intrinsic activation sparsity of DNNs to substantially reduce execution cycles and energy consumption. An end-to-end training algorithm is proposed to develop a lightweight run-time predictor for output activation sparsity on the fly. In our experiments, the computation overhead of the prediction phase is less than 5% of the original feedforward phase, with negligible accuracy loss. Furthermore, an energy-efficient hardware architecture, SparseNN, is proposed to exploit both input and output sparsity. SparseNN is a scalable architecture with distributed memories and processing elements connected through a dedicated on-chip network. Compared with state-of-the-art accelerators that exploit only input sparsity, SparseNN achieves a 10%-70% improvement in throughput and a power reduction of around 50%. (A toy sketch of the output-sparsity prediction follows this entry.)
Recommended citation: J. Zhu, J. Jiang, X. Chen and C. Tsui, "SparseNN: An energy-efficient neural network accelerator exploiting input and output sparsity," 2018 Design, Automation & Test in Europe Conference & Exhibition (DATE), Dresden, Germany, 2018, pp. 241-244, doi: 10.23919/DATE.2018.8342010.
Download Paper | Download Slides
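A toy sketch of the prediction idea described above: a cheap low-rank proxy (here U @ V, an assumption standing in for the paper's trained predictor) guesses which outputs survive ReLU, so full dot products run only for predicted-live outputs and nonzero inputs:

```python
import numpy as np

def predicted_sparse_layer(x, W, U, V):
    # Input sparsity: only the nonzero input activations participate.
    nz_in = np.flatnonzero(x)

    # Output sparsity: a low-rank proxy W ~= U @ V predicts each
    # pre-activation's sign at a fraction of the full layer's cost.
    score = U @ (V[:, nz_in] @ x[nz_in])
    live = score > 0                      # outputs predicted to survive ReLU

    # Full computation only for predicted-live outputs and nonzero inputs.
    y = np.zeros(W.shape[0])
    y[live] = W[np.ix_(live, nz_in)] @ x[nz_in]
    return np.maximum(y, 0.0)
```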
Tight Compression: Compressing CNN Through Fine-Grained Pruning and Weight Permutation for Efficient Implementation
Published in IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (TCAD)/57th ACM/IEEE Design Automation Conference (DAC), 2020
The unstructured sparsity left after pruning poses a challenge to the efficient implementation of deep learning models on existing regular architectures such as systolic arrays. Coarse-grained structured pruning, on the other hand, tends to incur higher accuracy loss than unstructured pruning when the pruned models are the same size. In this work, we propose a compression method based on unstructured pruning and a novel weight permutation scheme. Through permutation, the sparse weight matrix is further compressed to a small and dense format that makes full use of the hardware resources. Compared to state-of-the-art works, the matrix compression rate is effectively improved from 5.88x to 10.28x. As a result, throughput and energy efficiency are improved by 2.12x and 1.57x, respectively. (A toy illustration of the prune-permute-pack layout follows this entry.)
Recommended citation: X. Chen, J. Zhu, J. Jiang and C. -Y. Tsui, "Tight Compression: Compressing CNN Through Fine-Grained Pruning and Weight Permutation for Efficient Implementation," in IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 42, no. 2, pp. 644-657, Feb. 2023, doi: 10.1109/TCAD.2022.3178047.
Download Paper | Download Slides
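A toy stand-in for the flow described above: unstructured magnitude pruning, a crude sort-based row permutation (the paper's actual permutation scheme is more elaborate), and ELL-style packing of the survivors into a small dense matrix plus column indices:

```python
import numpy as np

def prune_permute_pack(W, sparsity=0.8):
    # 1) Unstructured pruning: zero out the smallest-magnitude weights.
    thresh = np.quantile(np.abs(W), sparsity)
    Wp = np.where(np.abs(W) >= thresh, W, 0.0)

    # 2) Permute rows so rows with similar nonzero counts sit together
    #    (a crude proxy for the paper's permutation scheme).
    perm = np.argsort((Wp != 0).sum(axis=1))
    Wp = Wp[perm]

    # 3) Pack each row's nonzeros into a dense (rows, k) matrix, where k
    #    is the largest per-row nonzero count after pruning.
    k = int((Wp != 0).sum(axis=1).max())
    vals = np.zeros((Wp.shape[0], k))
    cols = np.zeros((Wp.shape[0], k), dtype=np.int64)
    for r, row in enumerate(Wp):
        nz = np.flatnonzero(row)
        vals[r, :len(nz)], cols[r, :len(nz)] = row[nz], nz
    return vals, cols, perm
```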
Accelerating Large Kernel Convolutions with Nested Winograd Transformation
Published in Book Chapter of 2023 IFIP/IEEE 31st International Conference on Very Large Scale Integration (VLSI-SoC), 2023
Recent literature has shown that convolutional neural networks (CNNs) with large kernels outperform vision transformers (ViTs) and CNNs with stacked small kernels in many computer vision tasks, such as object detection and image restoration. The Winograd transformation helps reduce the number of repetitive multiplications in convolution and is widely supported by many commercial AI processors. Researchers have proposed accelerating large kernel convolutions by linearly decomposing them into many small kernel convolutions and then sequentially accelerating each small kernel convolution with the Winograd algorithm. This work proposes a nested Winograd algorithm that iteratively decomposes a large kernel convolution into small kernel convolutions, and proves it to be more effective than the linear-decomposition Winograd algorithm. Experiments show that, compared to linear decomposition, the proposed algorithm reduces the total number of multiplications by 1.4 to 10.5 times for computing 4×4 to 31×31 convolutions. (A toy multiplication-count comparison follows this entry.)
Recommended citation: J. Jiang, X. Chen and C. -Y. Tsui, "Accelerating Large Kernel Convolutions with Nested Winograd Transformation," 2023 IFIP/IEEE 31st International Conference on Very Large Scale Integration (VLSI-SoC), Dubai, United Arab Emirates, 2023, pp. 1-6, doi: 10.1109/VLSI-SoC57769.2023.10321932.
Download Paper | Download Slides
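For context, toy multiplication counters: 1-D Winograd F(m, r) needs m + r - 1 multiplications for m outputs instead of m·r, and the nested count below composes two tiles, which is my reading of the abstract rather than the paper's exact construction:

```python
def direct_muls(m, r):
    return m * r            # naive: each of m outputs needs r multiplications

def winograd_muls(m, r):
    return m + r - 1        # standard 1-D Winograd tile F(m, r)

def nested_muls(m1, r1, m2, r2):
    # Outer tile F(m1, r1) whose element-wise products are themselves small
    # convolutions computed with an inner tile F(m2, r2).
    return winograd_muls(m1, r1) * winograd_muls(m2, r2)

print(direct_muls(4, 9))        # 36: direct 9-tap convolution, 4 outputs
print(winograd_muls(4, 9))      # 12: one flat F(4, 9) tile (numerically fragile)
print(nested_muls(2, 3, 2, 3))  # 16: 9-tap kernel nested as 3x3 of 3-tap tiles
```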
Data-Pattern-Based Predictive On-Chip Power Meter in DNN Accelerator
Published in IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (TCAD), 2024
Advanced power management techniques, such as voltage drop mitigation and fast power management, can greatly enhance energy efficiency in contemporary hardware design. Implementing these techniques, however, requires accurate, fine-grained power modeling and timely responses for effective coordination with the power management unit. Existing performance-counter-based and RTL-based on-chip power meters have difficulty providing sufficient response time for fast power and voltage management scenarios. In this paper, we propose PROPHET, a data-pattern-based power modeling method for multiply-accumulate-based (MACC) DNN accelerators. The proposed power model extracts pre-defined data patterns during memory access, and a pre-trained model then predicts the dynamic power of the DNN accelerator, giving the power management unit sufficient time to respond. In the experiments, we evaluate the predictive power model on four DNN accelerators with different dataflows and data types. In power model training and verification, the proposed data-pattern-based power model achieves 2-cycle temporal resolution with R² > 0.9 and NMAE < 7%, with area and power overheads below 4.5%. (A conceptual sketch of the pattern-to-power idea follows this entry.)
Recommended citation: J. Peng et al., "Data-Pattern-Based Predictive On-Chip Power Meter in DNN Accelerator," in IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, doi: 10.1109/TCAD.2024.3412978.
Download Paper | Download Slides
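A conceptual sketch of the pattern-to-power idea in the abstract: cheap features are extracted from the operand stream as it is fetched, and a small pre-trained model maps them to predicted dynamic power ahead of execution. The feature set and the linear model are assumptions, not the paper's:

```python
import numpy as np

def extract_features(weights, activations):
    # Data patterns observable during memory access, before the MACs run.
    return np.array([
        np.count_nonzero(weights) / weights.size,          # weight density
        np.count_nonzero(activations) / activations.size,  # activation density
        np.abs(activations).mean() / 128.0,                # operand magnitude
    ])

def predict_power(weights, activations, coef, intercept):
    # (coef, intercept) would come from regressing these features against
    # gate-level power traces during the offline training phase.
    return float(extract_features(weights, activations) @ coef + intercept)
```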
PRINCE Cipher and eFUSE-based Embedded Encryption System for Optical Nerve Stimulator
Published in IEEE Sensors Journal/2022 IEEE International Symposium on Circuits and Systems (ISCAS), 2024
The use of miniaturized wireless biomedical devices is expanding rapidly. Despite their limited size, power, processing, and storage capabilities, these devices must maintain a minimum level of security to protect against constantly evolving threats. This work presents a 16-bit on-chip embedded encryption system for secure data communication and authentication in an implanted optical nerve stimulator, utilizing eFUSE, the PRINCE cipher, hash functions, and an error detection code. The foundry-provided eFUSE silicon intellectual property (IP) is modified to accommodate different operating modes and to mitigate the supply voltage drop during sensing that causes premature resetting. We also propose an area-constrained, resource-sharing-based ASIC implementation of the PRINCE cipher architecture that leaves the arithmetic results of the encryption-decryption process unchanged. These modifications reduce the area and power consumption of the PRINCE cipher by 2.7 and 5.1 times, respectively, while maintaining substantial cipher strength. The proposed cipher performs on par with existing ASIC implementations of the PRINCE cipher in throughput and area-based form factor. The entire encryption block and the optical nerve stimulator were fabricated in a 0.18 μm BCDlite process, and measurement results showed correct encrypted bi-directional data transmission and stimulation when powered by wirelessly transferred power. The developed encrypted implant performed well in comparison to wirelessly powered neural stimulators (implants) and secure wirelessly powered IoT architectures. (A structural sketch of the cipher's key whitening follows this entry.)
Recommended citation: S. Sarkar, J. Jiang, C. -Y. Tsui and W. -H. Ki, "PRINCE Cipher and eFUSE-based Embedded Encryption System for Optical Nerve Stimulator," in IEEE Sensors Journal, doi: 10.1109/JSEN.2024.3439614.
Download Paper | Download Slides
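A structural sketch of the PRINCE cipher's FX-style key whitening, the part of the published design stated here with confidence; the core below is a dummy stand-in for the real 12-round core (S-boxes, linear layer, round constants), so this illustrates structure only and is not a usable cipher:

```python
MASK64 = (1 << 64) - 1

def rotr64(x, n):
    # 64-bit rotate right.
    return ((x >> n) | (x << (64 - n))) & MASK64

def prince_encrypt(block, k0, k1, core):
    # Whitening key k0' := (k0 >>> 1) XOR (k0 >> 63), per the PRINCE paper.
    k0p = rotr64(k0, 1) ^ (k0 >> 63)
    return k0p ^ core(block ^ k0, k1)

# Dummy core for demonstration only; NOT the real PRINCE core.
dummy_core = lambda x, k1: x ^ k1
ct = prince_encrypt(0x0123456789ABCDEF, 0xFEDCBA9876543210, 0x1111, dummy_core)
```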