Pedro Gimenes

Electronics Engineer

  • LinkedIn


I am a recent Electrical and Electronic Engineering (EEE) graduate of Imperial College London, with interests in Digital Hardware Design and Deep Learning.

During my 3rd year at Imperial, I undertook a 6-month internship in RTL Design in the GPU Design Center at Apple. I also have prior experience as a Part-Time Undergraduate in the GPU group at Arm.

In my 4th year, I undertook my Final Year Project titled "Bit-Level Manipulated Graph Neural Networks", which involved implementing an FPGA accelerator for Graph Neural Networks. You can find details in the Projects section.

In my free time, I enjoy learning languages and climbing.


Work Experience

My industry experience in Hardware Engineering began as a Part-Time Undergraduate (PTUG) in the GPU Group at Arm. I worked full-time in the Debug Infrastructure team for 3 months, followed by a further 6 months working part-time. During this time, I was responsible for extending software libraries used internally for hardware debug purposes. This required me to gain a comprehensive understanding of the Mali GPU architecture. I also became accustomed to Python development in an Agile environment.


Between April and September 2022, I continued to pursue my interest in Hardware Engineering through a 6-month Industrial Placement as an RTL Design Intern in the GPU Group at Apple. My role involved analysing architectural specifications for new GPU features and designing microarchitectures that deliver those features under performance, power and area constraints. I also implemented these microarchitectures at the Register Transfer Level (RTL) using Hardware Description Languages (HDLs).

Degree Overview

The MEng Electrical and Electronic Engineering course at Imperial College offers students a broad introduction to various technical fields, and thorough training in the fundamental competencies required by modern-day engineers. This is a 4-year integrated Master’s course accredited by the Institution of Engineering and Technology (IET).

The first two years follow a fixed programme, covering the fundamentals that every electrical engineer is expected to master through a mixture of lectures and project work. In the 3rd and 4th years, students choose from a range of advanced modules, enabling specialisation in their areas of interest. The 3rd year ends with a 6-month placement (see Work Experience), giving students an opportunity to gain industry-relevant skills. The course culminates in the Final Year Individual Project (see Projects), in which students display the full range of skills cultivated over the course.

Despite its focus on STEM, Imperial College enables students to pursue Humanities subjects through the Horizons programme. I have taken this opportunity to learn Italian, which I have completed to B1 standard.

Relevant Modules

Year 1 (2019-2020)

  • C++ Programming for Engineers

  • Digital Electronics and Computer Architecture

Year 2 (2020-2021)

  • Linear Algebra

  • Circuits and Systems

  • Power Electronics

  • Control Systems

Year 3 (2021-2022)

  • Deep Learning

  • Semiconductor Devices

  • Biomedical Electronics

  • Control Engineering

Year 4 (2022-2023)

  • Full-Custom Integrated Circuit Design

  • Hardware and Software Verification

  • Advanced Optimization

  • Computer Vision and Pattern Recognition

  • Signal Processing and Machine Learning for Finance


Bit-Level Manipulated Graph Neural Networks


Neural networks have been widely deployed to achieve state-of-the-art performance in classification and regression tasks within various domains. Inference is typically performed on GPU devices, which are readily available and offer large performance improvements over general-purpose CPUs due to their deeply parallelized architecture.


As cutting-edge models become increasingly complex, GPUs have shown performance limitations due to expensive data copy and synchronization mechanisms. In particular, extreme low-latency applications, such as those in high-energy physics or autonomous vehicles, need custom hardware to achieve sub-microsecond predictions. FPGA devices are well suited to these requirements thanks to their reconfigurable logic fabric, and have been shown to achieve up to 10x latency and throughput improvements over GPU counterparts, with orders of magnitude lower power consumption. This reconfigurability also enables finer-grained optimizations in network implementation, such as layer-specific quantization schemes. Using lower-precision numerical formats in this way has been shown to reduce memory requirements, computational overhead and power consumption on reconfigurable devices, at a low cost to model accuracy.
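As a rough illustration of layer-specific quantization, the sketch below applies symmetric uniform quantization with a different bit width per layer and measures the resulting rounding error. The function names, layer names, bit widths and weight values are all hypothetical and chosen only for illustration; this is not the scheme used in the project.

```python
def quantize(values, bits):
    """Map floats to signed integers of the given bit width (symmetric, uniform)."""
    qmax = 2 ** (bits - 1) - 1                # e.g. 127 for 8 bits, 7 for 4 bits
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    q = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return q, scale


def dequantize(q, scale):
    """Recover approximate float values from the quantized integers."""
    return [x * scale for x in q]


# Hypothetical per-layer precision: the same weights stored at 8 and 4 bits.
layer_bits = {"layer1": 8, "layer2": 4}
weights = [0.5, -1.0, 0.25]

max_error = {}
for name, bits in layer_bits.items():
    q, scale = quantize(weights, bits)
    approx = dequantize(q, scale)
    max_error[name] = max(abs(a - b) for a, b in zip(weights, approx))
    print(name, bits, q, max_error[name])
```

Halving the bit width halves the storage per weight, while the worst-case rounding error stays within half a quantization step. That is the trade-off a per-layer scheme can tune.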


Graph Neural Networks (GNNs) have recently attracted great attention due to their classification performance on non-Euclidean data. FPGA acceleration proves particularly beneficial for GNNs given their irregular memory access patterns, which result from the sparse structure of graphs. These unique compute requirements have been addressed by several FPGA and ASIC accelerators, such as HyGCN and GenGNN.
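To make the irregular access pattern concrete, here is a minimal sketch of the neighbour-aggregation step found in many GNN layers, with the graph stored in compressed sparse row (CSR) form. The toy graph, feature values and the function name `aggregate_sum` are assumptions made for illustration and are not taken from any of the accelerators mentioned.

```python
def aggregate_sum(row_ptr, col_idx, features):
    """Sum the feature vectors of each node's neighbours (graph in CSR form)."""
    num_nodes = len(row_ptr) - 1
    dim = len(features[0])
    out = [[0.0] * dim for _ in range(num_nodes)]
    for node in range(num_nodes):
        # The neighbours of `node` live in col_idx[row_ptr[node]:row_ptr[node + 1]].
        for k in range(row_ptr[node], row_ptr[node + 1]):
            neighbour = col_idx[k]            # data-dependent index: irregular access
            for d in range(dim):
                out[node][d] += features[neighbour][d]
    return out


# Toy 4-node graph with node degrees 2, 1, 0 and 3.
row_ptr = [0, 2, 3, 3, 6]
col_idx = [1, 3, 2, 0, 1, 2]
features = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0], [3.0, 1.0]]

aggregated = aggregate_sum(row_ptr, col_idx, features)
degrees = [row_ptr[i + 1] - row_ptr[i] for i in range(len(row_ptr) - 1)]
print(aggregated)
print(degrees)
```

The feature rows read in the inner loop depend on the contents of `col_idx`, so they cannot be predicted from the loop counter alone, and the amount of work per node varies with its degree.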


Despite the relative success of hardware approaches to accelerating GNN inference on FPGA devices, previous works have been limited to small graphs with up to 20k nodes, such as Cora, Citeseer and Pubmed. Since the computational overhead of GNN inference grows with graph size, current accelerators are not well prepared to handle medium to large-scale graphs, particularly in real-time applications.


This work introduces AGILE (Accelerated Graph Inference Logic Engine), an FPGA accelerator aimed at enabling real-time GNN inference for large graphs by exploring a range of hardware optimisations. A new asynchronous programming model is formulated, which reduces pipeline gaps by addressing the non-uniform distribution of node degrees. Inspired by GNN quantisation analysis from Taylor et al., a multi-precision node dataflow is proposed, improving throughput and device resource usage. Finally, an on-chip Prefetcher unit is implemented to cut down memory access latency. Evaluation on Planetoid graphs shows up to 2.8x speed-up against GPU counterparts.


Get in Touch!

You can send a message using the adjacent form, or contact me directly using the links below.

  • LinkedIn

