As Moore's Law continues to drive up the number of transistors inside integrated circuits (ICs), alongside the scaling down of transistor dimensions, more complex digital designs are being realized. Continuous technology evolution has led to increasingly complex architectures and designs, and verifying and validating these designs has become an increasingly difficult task. Additionally, many designs contain safety-critical features and must comply with specific safety-critical standards, making IC reliability a fundamental concern in the design and manufacturing process. Various hardware verification techniques exist, such as formal verification, simulation, fault injection, debugging, fault mitigation, and online test. They can be used based on the requirements of each design and at different stages of the design flow. There is a trade-off between speed, design complexity, and required fault coverage, and this trade-off normally determines the verification methodology and plan that will be followed for each design. Verification, and especially debugging, is essential to ensure the functional correctness of the entire system. Pre-silicon verification denotes the activities performed before the silicon chip is available; these include testing devices in a virtual environment with simulation and formal verification tools. Historically, the most widely accepted verification method has been simulation, due to its flexibility and ease of use. However, as processor frequency scaling levels off, simulation-based techniques are unable to keep up with today's growing design complexity. Additionally, many bugs escape pre-silicon verification and can only be discovered after the first silicon is produced. In post-silicon debug, designers gain early access to bugs at high speed, since tests are executed directly on the silicon.
Tests occur on actual devices running at speed in real-world system boards. Hence, the industry has shifted towards post-silicon validation, using prototyping and FPGA emulation. A Field-Programmable Gate Array (FPGA) is a programmable digital electronic chip. In FPGA design, the developer defines the chip's function by implementing a digital circuit on the FPGA resources. The FPGA can be reprogrammed (reconfigured) by changing the configuration data that define its functionality; these data can be modified according to the user's needs. Partial Reconfiguration (PR) makes it possible to reconfigure part of the configuration memory during run-time, via the Internal Configuration Access Port (ICAP). Researchers at the HES group have created a technique, called parameterized reconfiguration, to create and implement a parameterized design. A design can become parameterized if some of its input values change less frequently than the rest. Instead of implementing these inputs (parameters) as regular inputs, they are implemented as constants, and the design is optimized for these constants. For every change in the parameter values, the design is re-optimized (specialized) at run-time, and the specialized design is implemented through reconfiguration. The bitstreams of the parameterized design are expressed as Boolean functions of the parameters: for every infrequent change in parameters, a specialized FPGA configuration is generated by evaluating the corresponding Boolean functions, and the FPGA is reconfigured with the specialized configuration. In this dissertation I propose innovative techniques and tools to provide integrated reliability and verification in FPGAs that contain custom components. Furthermore, I investigate methods that increase reliability and internal observability during debugging, in both commercial and academic FPGAs. First, I propose a method to efficiently introduce debugging into any given design.
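The parameterized-configuration idea described above can be illustrated with a minimal sketch: each tunable configuration bit is a Boolean function of the parameters, and a specialized bitstream is produced by evaluating those functions whenever a parameter changes. All names below are hypothetical illustrations, not the actual PConf tool-flow API.

```python
# Sketch of parameterized-configuration specialization (illustrative only).
# A bitstream is modeled as a mapping from bit position to bit value; the
# parameterized bits are Boolean functions evaluated at specialization time.

def specialize(static_bits, parameterized_bits, params):
    """Evaluate each parameterized bit's Boolean function for the current
    parameter values and merge the results with the static bitstream."""
    bits = dict(static_bits)
    for position, boolean_fn in parameterized_bits.items():
        bits[position] = boolean_fn(params)
    return bits

# Example: one configuration bit selects between AND and OR behavior
# depending on an infrequently changing parameter "mode".
static = {0: 1, 1: 0}
parameterized = {2: lambda p: int(p["mode"] == "AND")}

config_and = specialize(static, parameterized, {"mode": "AND"})
config_or = specialize(static, parameterized, {"mode": "OR"})
```

In a real flow the evaluated bits would be written to the configuration memory through the ICAP; here they simply end up in the specialized dictionary.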
By leveraging the Parameterized Configurations (PConf) technique, I add a virtual overlay that integrates debugging functionality into a design. Since the overlay is virtual, its impact on area is minimal, while virtual connections between signals and tracing memories are guaranteed. I then rank internal signals based on classification criteria, to find optimal signal sets that can trace bugs faster. After that, I investigate how an on-silicon debugging infrastructure can be introduced into an FPGA with minimal area overhead. I start by studying the role of FPGA debugging structures on Virtual Coarse-Grained Reconfigurable Arrays (VCGRAs). I use two different techniques, based on the target FPGA architecture, and create the Superimposed Debugging Architecture (SDA), which is integrated in the VCGRA. In the second part of the thesis, I focus on design methodologies that increase the reliability of Commercial Off-The-Shelf (COTS) FPGAs. First, I propose two configuration memory scrubbing techniques based on a fine-grained form of reconfiguration used for PConf, called microreconfiguration. Afterwards, I propose a custom FPGA structure (a reconfiguration controller) to increase reliability and reduce the main overhead of soft-error mitigation, namely reconfiguration time. This structure not only reduces the scrubbing time, but also has a fault-tolerant version that aims to increase the overall tolerance of a COTS FPGA against radiation effects. Additionally, I create a fault-tolerant scheme with multiple levels of fault mitigation for future high-reliability applications in radiation environments. I leverage the multi-layer VCGRA architecture and its natural resilience against radiation effects: first, I apply a spatial redundancy method (Triple Modular Redundancy, TMR) together with the two custom scrubbing mechanisms; then, I extend the PConf and VCGRA tool-flows into a more complete CAD flow.
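The scrubbing principle mentioned above can be sketched in a few lines: periodically read back each configuration frame, compare it against a golden reference, and rewrite any frame corrupted by a soft error. The frame model and access functions below are hypothetical simplifications, not the thesis's microreconfiguration mechanism.

```python
# Illustrative sketch of readback scrubbing over a modeled configuration
# memory. A real scrubber would read and write frames through a
# reconfiguration port (e.g. the ICAP); here plain callables stand in.

def scrub(read_frame, write_frame, golden_frames):
    """One scrubbing pass; repairs corrupted frames and returns their indices."""
    repaired = []
    for index, golden in enumerate(golden_frames):
        if read_frame(index) != golden:
            write_frame(index, list(golden))  # rewrite the corrupted frame
            repaired.append(index)
    return repaired

# Example: an in-memory "configuration memory" where frame 1 holds an upset.
memory = [[1, 0], [0, 0], [1, 1]]
golden = [[1, 0], [0, 1], [1, 1]]

fixed = scrub(lambda i: memory[i],
              lambda i, frame: memory.__setitem__(i, frame),
              golden)
```

After the pass, `fixed` lists the repaired frame indices and the memory again matches the golden reference; the thesis's contribution lies in making such passes fast and fine-grained, which this sketch does not capture.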
This work provides fast scrubbing with fewer FPGA resources, and it aims at building an integrated fault mitigation scheme that enhances the reliability of COTS FPGAs. I verify the feasibility of these methods with a fault injection campaign. Finally, I focus on fault injection itself. I have designed two variants of fault injection that can be used depending on the underlying FPGA architecture; based on the level of parameterization used, I classify them as partially parameterized and fully parameterized fault injection. The fault injection scheme is introduced in gate-level designs, which gives us the advantage of being able to apply the techniques to a cluster of applications that can be modeled in a similar way. The proposed method can be used either for injecting permanent faults (for post-silicon testing) or for injecting soft errors (for safety-critical systems), and it drastically reduces the area overhead of the fault-injected design.
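The core operation behind any such fault injection campaign, flipping a chosen or random bit in a modeled configuration frame and comparing the result against a golden copy, can be sketched as follows. The names are illustrative assumptions, not the thesis's injection tools, and the permanent/transient distinction is only noted in comments.

```python
# Hedged sketch of single-bit fault injection into a modeled configuration
# frame. A permanent fault keeps the flipped bit for the rest of the run;
# a soft error would later be repaired (e.g. by a scrubber).
import random

def inject_fault(frame, bit_index=None, rng=random):
    """Return a copy of `frame` with one bit flipped, plus the flipped index."""
    faulty = list(frame)
    if bit_index is None:
        bit_index = rng.randrange(len(faulty))  # random upset location
    faulty[bit_index] ^= 1
    return faulty, bit_index

golden = [0, 1, 1, 0]
faulty, idx = inject_fault(golden, bit_index=2)
# Comparing `faulty` against `golden` reveals the injected upset at `idx`.
```

A campaign would repeat this over many bit positions and workloads, observing whether each injected fault is detected, masked, or propagates to the outputs.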