Gabriel Marques Domingues - A Micro-architecture that supports the Fano–Elias encoding and a hardware accelerator for approximate membership queries

Systems Department Seminar - EE Systems Seminar

11 February 2024, 15:30
Electrical Engineering - Kitot Building, Hall 011

Electrical Engineering Systems Seminar

 

Speaker: Gabriel Marques Domingues

M.Sc. student under the supervision of Prof. Guy Even and Prof. Boaz Patt-Shamir

 

Sunday, 11th February 2024, at 15:30

Room 011, Kitot Building, Faculty of Engineering

 

A Micro-architecture that supports the Fano–Elias encoding and a hardware accelerator for approximate membership queries

 

Abstract

We present the first hardware design that supports operations over the Fano–Elias encoding (FE-encoding). Our design is a combinational circuit (i.e., it completes in a single clock cycle) that supports insertions, deletions, and queries. FE-encoding allows one to store f binary strings, each of length l + log(m), using a string that is m + f + fl bits long (rather than f(l + log(m)) bits). The asymptotic gate count of the circuit is Θ((m + f)log(m) + fl). The asymptotic delay is Θ(log(m) + log(f) + log(l)). We implemented our design on an FPGA with four combinations of parameters in which the FE-encoding fits in 512 or 1024 bits.
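The bit count m + f + fl can be illustrated with a small software sketch. The abstract does not specify the bit layout used in the hardware, so the following assumes the textbook variant: each string is split into a quotient in [0, m) and an l-bit remainder, the quotients are stored as a unary header (one "1" per element, one "0" closing each of the m buckets), and the remainders follow in sorted order. The names fe_encode and fe_decode are hypothetical, and l >= 1 is assumed.

```python
def fe_encode(pairs, m, l):
    """Encode (quotient, remainder) pairs as a bit string of length m + f + f*l.

    Assumed layout (not taken from the talk): a unary header with one '1'
    per element and one '0' closing each of the m buckets, followed by the
    l-bit remainders in sorted order.
    """
    items = sorted(pairs)
    counts = [0] * m
    for q, r in items:
        assert 0 <= q < m and 0 <= r < 2 ** l
        counts[q] += 1
    header = "".join("1" * c + "0" for c in counts)            # m + f bits
    body = "".join(format(r, "0{}b".format(l)) for _, r in items)  # f * l bits
    return header + body


def fe_decode(bits, m, l):
    """Recover the sorted list of (quotient, remainder) pairs."""
    f = (len(bits) - m) // (1 + l)      # since len(bits) = m + f + f*l
    header, body = bits[:m + f], bits[m + f:]
    out, q, i = [], 0, 0
    for b in header:
        if b == "1":                    # next element belongs to bucket q
            out.append((q, int(body[i * l:(i + 1) * l], 2)))
            i += 1
        else:                           # bucket q is closed
            q += 1
    return out


pairs = [(2, 5), (0, 1), (2, 3), (3, 7)]
code = fe_encode(pairs, m=4, l=3)
decoded = fe_decode(code, m=4, l=3)
```

For illustrative values that are not from the talk, m = f = 1024 and l = 8, the encoding takes 1024 + 1024 + 8192 = 10,240 bits, compared with 1024 · (8 + 10) = 18,432 bits for storing each string separately.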

We present the first hardware design for a dynamic filter that maintains a set subject to insertions, deletions, and approximate membership queries. The design contains four main blocks: two memory banks that store FE-encodings and two combinational circuits for FE-encoding. Additional logic deals with double buffering and forwarding.

We implemented the dynamic filter on an FPGA with the following parameters: (1) Elements in the dataset are 32-bit strings. (2) The supported dataset can contain up to n_max = 45 · 2^14 = 737,280 elements. (3) The latency is 2-4 clock cycles. (4) Fixed (i.e., constant and stable) throughput. A new operation can be issued every clock cycle. (5) We prove that the probability of a false-positive error is bounded by 0.385 · 10^-2. (6) We prove that the expected number of insertion failures is less than 1 for every 75 million insertions.
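The figures in items (2), (5), and (6) can be checked with a few lines of arithmetic. This is only a sketch: the constant and function names are hypothetical, and scaling the bounds linearly with the number of operations is an assumption about how they are meant to be read.

```python
# Parameters as stated in the abstract.
N_MAX = 45 * 2 ** 14                     # maximum dataset cardinality
FP_BOUND = 0.385e-2                      # bound on false-positive probability
INSERTIONS_PER_FAILURE = 75_000_000      # fewer than 1 expected failure per this many


def expected_false_positives_bound(negative_queries):
    """Upper bound on expected false positives among queries for absent elements."""
    return negative_queries * FP_BOUND


def expected_failures_bound(insertions):
    """Upper bound on expected insertion failures (assumes linear scaling)."""
    return insertions / INSERTIONS_PER_FAILURE
```

Under these assumptions, one million queries for absent elements would yield at most about 3,850 false positives in expectation, and filling the filter once to n_max would yield fewer than 0.01 expected insertion failures.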

Synthesis of our filter on a Xilinx Alveo U250 FPGA achieves a clock rate of 100 MHz (the critical path is due to the memory access). We measure a fixed throughput of 97.7 million operations per second (the loss of 2.3% in the throughput is due to instabilities in the bandwidth of the AXI4-Lite I/O channel). A unique feature of our filter implementation is that the throughput is stable and constant for all benchmarks and loads. Namely, the mix of operations does not influence the throughput, and the throughput does not depend on the number of elements in the dataset (as long as the cardinality of the dataset is bounded by n_max). Previous dynamic filter implementations in software (on x86 or GPUs) do not exhibit stable and constant throughput.
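The timing figures follow directly from the clock rate. As a rough sketch with hypothetical variable names, and assuming the 2-4 cycle latency is counted at the same 100 MHz clock:

```python
CLOCK_HZ = 100_000_000          # 100 MHz clock rate
MEASURED_OPS = 97_700_000       # measured operations per second

cycle_ns = 1e9 / CLOCK_HZ                       # duration of one clock cycle
latency_ns = (2 * cycle_ns, 4 * cycle_ns)       # 2-4 cycles of latency
utilization = MEASURED_OPS / CLOCK_HZ           # fraction of one operation per cycle
```

One operation per cycle at 100 MHz gives a peak of 100 million operations per second, so the measured 97.7 million corresponds to the stated 2.3% loss, and the latency of 2-4 cycles corresponds to 20-40 ns.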

 

Participation in the seminar grants attendance credit, based on registering your full name and ID number on the attendance sheet that will be passed around the hall during the seminar.

 
