Description
This project investigates the use of FPGA-based acceleration for artificial intelligence workloads using a PYNQ-enabled Zynq system-on-chip platform. The work focuses on designing and deploying a custom hardware accelerator for low-precision matrix multiplication, a core computational kernel underlying many modern AI models, including convolutional neural networks and transformers. Following a hardware–software co-design approach, the accelerator is implemented in programmable logic and integrated with the processing system via AXI interfaces and direct memory access (DMA). High-level synthesis is employed to explore architectural optimizations such as loop unrolling, pipelining, and on-chip buffering, enabling efficient data reuse and increased throughput. Performance is evaluated by comparing FPGA-accelerated execution against CPU-based baselines, with particular emphasis on the impact of data movement, memory bandwidth, and quantization on overall system efficiency. The project demonstrates how FPGA architectures can be leveraged to tailor computation and dataflow to AI workloads, providing insight into the tradeoffs between performance, resource utilization, and precision in edge-oriented AI acceleration.
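
As a rough illustration of the techniques the abstract names, the sketch below shows what such a low-precision matrix-multiply kernel might look like in Vitis HLS C++: AXI-Stream interfaces for DMA transfers, on-chip operand buffers for data reuse, a pipelined compute loop, and a fully unrolled inner multiply-accumulate loop. The function name `matmul_int8`, the tile size `N`, the int8/int32 precisions, and the interface choices are illustrative assumptions, not details taken from the project.

```cpp
// Hypothetical sketch of an HLS int8 matrix-multiply core of the kind the
// abstract describes; names, sizes, and interfaces are assumptions.
#include <ap_axi_sdata.h>
#include <ap_int.h>
#include <hls_stream.h>

// 32-bit AXI-Stream beat with TLAST, as produced/consumed by an AXI DMA.
typedef ap_axis<32, 0, 0, 0> axis_word_t;

const int N = 64;  // assumed tile size, sized so both operand tiles fit in BRAM

void matmul_int8(hls::stream<axis_word_t> &a_in,
                 hls::stream<axis_word_t> &b_in,
                 hls::stream<axis_word_t> &c_out) {
#pragma HLS INTERFACE axis port=a_in
#pragma HLS INTERFACE axis port=b_in
#pragma HLS INTERFACE axis port=c_out
#pragma HLS INTERFACE s_axilite port=return

    ap_int<8> A[N][N];
    ap_int<8> B[N][N];
    // Partition so the unrolled inner loop can read a full row of A and a
    // full column of B in a single cycle.
#pragma HLS ARRAY_PARTITION variable=A complete dim=2
#pragma HLS ARRAY_PARTITION variable=B complete dim=1

    // Stage both int8 operand tiles into on-chip buffers: each element is
    // then reused N times from BRAM instead of being re-fetched over AXI.
load_a:
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) {
#pragma HLS PIPELINE II=1
            A[i][j] = a_in.read().data;  // one int8 per 32-bit beat, for simplicity
        }
load_b:
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) {
#pragma HLS PIPELINE II=1
            B[i][j] = b_in.read().data;
        }

    // Pipelined compute: one output element per cycle once the pipeline
    // fills. The unrolled loop instantiates N parallel int8 multipliers
    // accumulating into a 32-bit register to avoid overflow.
compute:
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) {
#pragma HLS PIPELINE II=1
            ap_int<32> acc = 0;
        mac:
            for (int k = 0; k < N; k++) {
#pragma HLS UNROLL
                acc += A[i][k] * B[k][j];
            }
            axis_word_t w;
            w.data = acc;
            w.keep = -1;
            w.strb = -1;
            w.last = (i == N - 1) && (j == N - 1);  // TLAST ends the DMA transfer
            c_out.write(w);
        }
}
```

On the processing-system side, a PYNQ host would typically load the bitstream as an `Overlay`, place the quantized matrices in contiguous buffers obtained from `pynq.allocate`, and move them through `dma.sendchannel.transfer` / `dma.recvchannel.transfer`; those transfers are exactly where the data-movement and memory-bandwidth costs the abstract emphasizes become visible.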
| Academic or Professional Status | Undergraduate Student |
|---|---|