About Me
Industry
I’m a senior digital design engineer with 5+ years of experience at the Huawei Hong Kong Research Center (HKRC), specializing in designing the micro-architecture and delivering production-level Verilog code for the AI-GPU cores used in key products such as Huawei Ascend. My experience includes:
- Designing a complete compute pipeline inside the AI-GPU core, from receiving instructions from the issue queue to retiring them.
- Developing mixed-data-type ALUs/FPUs for matrix cores operating at GHz speeds.
- Delivering an application-specific load/store unit (LSU).
- Mentoring students (from BSc to PhD level).
I’m also a research scientist in our group, responsible for proposing ideas to improve the PPA (power, performance, and area) of next-generation AI-GPUs. As a result, I have picked up some handy side skills: C++ programming for CA (cycle-accurate) and functional modeling of architectures and circuits; Python programming for building optimization algorithms and training neural networks; verification skills such as building UVM and Hector (formal) verification environments; and BES skills such as synthesizing circuits (DC), running power simulations (PTPX), and a little Innovus.
Academic
I graduated with an MPhil (research master’s) from the VLSI Lab, which belongs to the Integrated Circuit Design Center (ICDC) in the Department of Electronic and Computer Engineering (ECE) at the Hong Kong University of Science and Technology (HKUST). My research at that time focused on designing the micro-architecture of a CNN-based AI accelerator and implementing it on FPGA and as an ASIC. I focused especially on improving the fast algorithm to reduce the number of multiplications required for compute-intensive CNN computations.
I’m now a PhD student in the Reconfigurable Computing System Lab (RCSL), in the same ICDC, the same ECE, and the same HKUST :> Currently I focus more on the architectural optimization and modeling of the AI-GPU, especially power management. We see very frequently that power is what limits the potential of modern GPUs, and the situation gets worse with advanced technology nodes (<5nm) and 2.5D/3D IC packaging. We are trying different ways to mitigate the impact of frequency drops in maximum-power and power-throttling scenarios.