Efficient and Side-Channel-Secure Block Cipher Implementation with Custom Instructions on FPGA

Suvarna Mane, Mostafa Taha, Patrick Schaumont
ECE Department, Virginia Tech, Blacksburg, USA
email@example.com, firstname.lastname@example.org, email@example.com

Abstract—The security threat of side-channel analysis (SCA) attacks has created a need for SCA countermeasures. While many countermeasures have been proposed, a key challenge remains: to design a countermeasure that is effective, that is easy to integrate into existing cryptographic implementations, and that has low overhead in area and performance. We present our solution in the context of an embedded design flow for FPGA. We integrate an SCA-resistant custom instruction set on a soft-core CPU. The SCA resistance is based on dual-rail precharge logic. A balanced-interleaved data format, combined with a novel memory organization, ensures that we can support both logic operations and lookup tables. The resulting countermeasure applies to a broad class of block ciphers. We demonstrate our results on an Altera Cyclone-II FPGA with a Nios-II/s processor for a 128-bit Advanced Encryption Standard (AES) T-box implementation. We show an SCA improvement of more than 400X for a system-wide electromagnetic attack that covers both the FPGA and off-chip memory (SSRAM). This comes at an overhead of 2.7X in performance and 1.15X in area. Using comparisons with related work, we demonstrate that this represents an excellent tradeoff between SCA resistance, (software and hardware) design complexity, performance, and circuit area cost.

Index Terms—Side-Channel Analysis; Custom Instructions; Soft-core CPU; FPGA; Cryptography.

I. INTRODUCTION

Since their introduction over a decade ago, side-channel analysis (SCA) techniques have been successfully used to extract secret keys from cryptographic implementations by exploiting side-channel information such as execution time, power consumption, or electromagnetic emissions. The field of SCA has been intensively researched for attacks and countermeasures. Several recent results highlight the risk of SCA to commercial, deployed systems with a trustworthiness requirement. These include the use of SCA to extract keys from Virtex-II FPGA and Virtex-4/5 FPGA bitstream encryption, from the Mifare DESFire contactless card, from the Keeloq keyless entry system, and from the Atmel CryptoMemory nonvolatile memory. It is reasonable to assume that most of these devices were not designed with SCA in mind. Indeed, if SCA attacks are considered part of the threat model of a design, one can introduce suitable SCA countermeasures to hamper these attacks. Modern smart cards, for example, have built-in countermeasures against SCA attacks, as well as against active (fault-based) attacks.

Figure 1. SCA-resistant design by (a) C source code transformation, (b) dedicated circuit styles, and (c) customized CPU. (Each panel rates performance, circuit area, SCA resistance, and C complexity.)

Nevertheless, the design and implementation of a side-channel countermeasure is a complex and error-prone process: the literature shows a long string of attacks against countermeasures. Our work is motivated by the need for an easy-to-use countermeasure, applicable to a wide range of designs and usable within a standard FPGA design flow. We consider protection of a general class of block ciphers that use logic operations and lookup tables. This includes AES, DES, and many others.
We propose our methodology in the context of embedded designs with a CPU, and we develop side-channel resistance for cryptographic software executing on the processor. The design of side-channel countermeasures is complex because side-channel leakage is a byproduct of the implementation of a cryptographic algorithm. Predicting the amount of side-channel leakage from, say, cryptographic software in C is difficult. Our objective is to systematically remove side-channel leakage while keeping a reasonable cost in circuit area and performance degradation.

Figure 1 illustrates three approaches applicable to the context of embedded processors. The first, Figure 1a, transforms the crypto-software into an implementation without exploitable leakage, for example by using masking. This countermeasure is usually algorithm-specific, and requires in-depth understanding of cryptographic operations. Moreover, masking becomes very complex under advanced SCA techniques. The second approach, Figure 1b, is to implement the CPU in an SCA-resistant circuit style. Past research has shown that these techniques are very expensive in hardware, costing 3 to 15 times the original circuit area, and are thus not applicable to a complete CPU. Our approach, Figure 1c, is to use a customized CPU, with a custom instruction set and an optimized memory organization. This design configuration is supported by soft-core CPUs in mainstream FPGA families. Our work is not the first to suggest a customized CPU for side-channel resistant implementations; previous proposals have included masking-based and hiding-based designs. However, we will demonstrate that our solution offers several advantages over these proposals.

The remainder of this paper is organized as follows. In the next section we discuss a few preliminaries, including a brief review of Dual-rail Precharge Logic (used by our countermeasure), and a review of the building blocks of modern block ciphers.
After that, we present the components of our solution, and show how we can efficiently map SCA-resistant block ciphers into FPGA. In Section IV, we implement a full block cipher (AES) with our technique, and we analyze circuit cost, performance, and SCA resistance. Section V elaborates on these results, and discusses the limitations of our approach as well as related work. Finally, we conclude the paper.

II. PRELIMINARIES

In this section, we briefly review two important preliminaries of our proposed solution: Dual-rail Precharge Logic, and the overall structure of modern block ciphers.

A. Principle of Dual-rail Precharge Logic (DPL)

The cause of side-channel leakage is data-dependent processing. In CMOS logic, such processing gives data-dependent signal transitions, which in turn result in data-dependent power consumption or radiation. The idea of Dual-rail Precharge Logic (DPL) is to eliminate side-channel leakage at the level of the implementation. DPL can be achieved as follows. First, every data bit in the circuit is stored and processed in complementary form. For example, for every logic operation a and b, there is a matching complementary operation not(a) or not(b) which is simultaneously executed. Second, every complementary data pair (a, not(a)) is precharged to (0,0) before every evaluation. When combined, these two properties result in constant power consumption: every evaluation has exactly one active 0 → 1 transition, either on the true net or on the complementary net.

DPL has been applied in many different forms since it was first proposed, including ASIC, FPGA, and software. Authors have also identified sources of residual leakage, including early evaluation and imbalance between complementary pairs. However, DPL has demonstrated substantial reduction of side-channel leakage in prototypes. For this reason, we have selected it for our countermeasure.

B. Modern Block Ciphers

In this paper, we focus our efforts on protecting a broad class of symmetric-key algorithms known as block ciphers.

Table I. BLOCK CIPHER OPERATIONS (ENCRYPTION W/O KEY SCHEDULE)

Cipher   | Structure | SBOX (in x out) | Operations
AES      | SPN       | 8x32            | XOR
Blowfish | Feistel   | 8x32            | XOR, ADD32
Camellia | Feistel   | 8x8             | XOR
CAST-256 | SPN       | 8x32            | XOR
Clefia   | Feistel   | 8x8             | XOR
DES      | Feistel   | 6x4             | XOR
GOST     | Feistel   | 4x4             | XOR, ADD32, ROT
KASUMI   | Feistel   | 7x7+9x9         | XOR, ROT
PRESENT  | SPN       | 4x4             | XOR
Serpent  | SPN       | 4x4             | XOR, ROT

Figure 2. Balanced-interleaved data format. (An unbalanced full word b31..b0 is split into a balanced-interleaved lower half-word, covering b15..b0, and a balanced-interleaved upper half-word, covering b31..b16; each bit is stored next to its complement.)

Block ciphers encrypt a block of plaintext into ciphertext through successive round transformations. As illustrated in Table I, and earlier observed by Kaps, the majority of modern block ciphers are constructed from a limited set of operations, including substitutions with lookup tables (SBOXes), and operations such as XOR, modular addition, rotation, and shift. Furthermore, round transformations have a common structure, and use either a substitution-permutation network (SPN) or a Feistel network. Of course, within this framework, there are important differences among block ciphers as well, such as the number and size of lookup tables used, and the detailed configuration of the operations.

In the following sections, we implement DPL countermeasures for software implementations of block ciphers on soft-cores. We develop specific techniques to handle lookup tables and logic operations. Our benchmarks and experiments have focused on AES-128 executing on a Nios-II core configured in a Cyclone-II FPGA, but our results apply to other block ciphers as well.

III. OUR SOLUTION

We implement a side-channel resistant block cipher by creating DPL versions of both the lookup tables and the logic operations in hardware.
These modules are efficiently integrated into the soft-core processor using the custom-instruction interface. This way, SCA-resistant block ciphers can be executed as a sequence of custom instructions. Non-crypto software, on the other hand, is written using the regular instruction set without a performance hit. The custom-instruction hardware for lookup tables is built from on-chip RAM macros. Research has demonstrated that such dedicated structures increase side-channel resistance, and we further improve this technique. We next discuss the three components of our solution: the organization of data, the memory organization for lookup tables, and the system integration of SCA-resistant block cipher hardware into software.

Figure 3. 32-bit balanced-interleaved T-box organization. (A 16-bit balanced address drives two 256x32 RAMs, TBOX_H and TBOX_L, whose addresses run in complementary order, 0x00, 0x01, ... versus 0xFF, 0xFE, ...; together they return the balanced data (H, L).)

A. SCA-resistant Data Organization

We need a data format that is compatible with the requirements of DPL and uses the word-level organization of an embedded system. Figure 2 shows our data arrangement. Each 32-bit word is split into two balanced half-words, and each bit from the original word is interleaved with an associated complementary bit. We call this representation a balanced-interleaved (BI) format.

The logical and physical proximity of complementary bits improves symmetry between the bits (e.g. similar electrical loads), and this in turn improves SCA resistance. Indeed, at the logical level, adjacent bits will share adjacent storage locations. In embedded architectures, storage organization may use a word length that differs from the processor word length; a 32-bit memory may be organized, for example, as two half-word banks. Keeping complementary bits adjacent ensures that they will share the same physical storage bank. Furthermore, at the physical level, adjacent bits will have closely related routing patterns on the FPGA and PCB, improving symmetry.
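To make the balanced-interleaved format concrete, the following C sketch models the conversion in software. It is an illustration only, not the hardware described in this paper: the function names (conv_bil, conv_bih, conv_inv, popcount32, bi_self_check) are hypothetical, and the bit order (original bit i at even position 2i, its complement at odd position 2i+1) is an assumption, chosen to agree with the statement in Section III-C that CONV_INV extracts the even bits.

```c
#include <assert.h>
#include <stdint.h>

/* Encode a 16-bit half-word into a 32-bit balanced-interleaved (BI) word.
 * ASSUMPTION: true bit i goes to even position 2i, its complement to 2i+1. */
static uint32_t bi_encode16(uint16_t half)
{
    uint32_t out = 0;
    for (int i = 0; i < 16; i++) {
        uint32_t bit = (half >> i) & 1u;
        out |= bit << (2 * i);              /* true bit       */
        out |= (bit ^ 1u) << (2 * i + 1);   /* complement bit */
    }
    return out;
}

/* Software models of the conversions (hypothetical names). */
uint32_t conv_bil(uint32_t a) { return bi_encode16((uint16_t)(a & 0xFFFFu)); }
uint32_t conv_bih(uint32_t a) { return bi_encode16((uint16_t)(a >> 16)); }

/* Back to direct form: keep only the even (true) bits. */
uint32_t conv_inv(uint32_t bi)
{
    uint32_t out = 0;
    for (int i = 0; i < 16; i++)
        out |= ((bi >> (2 * i)) & 1u) << i;
    return out;
}

/* Number of set bits in a word. */
int popcount32(uint32_t x)
{
    int n = 0;
    while (x) {
        n += (int)(x & 1u);
        x >>= 1;
    }
    return n;
}

/* Round-trip and constant-weight check on a few sample words. */
void bi_self_check(void)
{
    const uint32_t samples[] = { 0x00000000u, 0xFFFFFFFFu,
                                 0x12345678u, 0xDEADBEEFu };
    for (unsigned k = 0; k < 4; k++) {
        uint32_t lo = conv_bil(samples[k]);
        uint32_t hi = conv_bih(samples[k]);
        assert(((conv_inv(hi) << 16) | conv_inv(lo)) == samples[k]);
        assert(popcount32(lo) == 16 && popcount32(hi) == 16);
    }
}
```

Under this encoding every BI word has exactly 16 set bits, whatever the data. After a precharge to zero, writing any BI word therefore raises the same number of bits, which is the DPL property of Section II-A expressed at word level.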
A consequence of using a balanced-interleaved format is that each 32-bit operation from the original, unprotected block cipher requires expansion into two balanced operations, each processing a balanced half-word.

B. Memory Organization for Lookup Tables

Because lookup tables are so common in block ciphers, we use a dedicated approach to implement side-channel resistant lookup tables using the RAM macros of the FPGA fabric. We use the AES T-box implementation as a case study. The T-box is a lookup table with 8 input bits and 32 output bits. The T-box is defined by grouping several steps of the AES round transformation; for the purpose of explaining our method, we treat the T-box simply as an 8x32 lookup table. The complete AES algorithm requires five different T-box tables.

The secure T-box design shown in Figure 3 uses a balanced-interleaved data organization. An 8x32 T-box thus needs two 8x32 balanced-interleaved tables, each storing a half-word of the original T-box with its complementary bits. Each balanced-interleaved table is stored in a separate RAM macro. In order to achieve balancing in the address decoding logic, we follow the storage order suggested in prior work, namely that complementary RAM macros require complementary addresses. The difference with our design, however, is that the complementary RAMs do not store complementary data: the data within each RAM is already balanced.

Table II. SCA-RESISTANT INSTRUCTION SET FOR AES

Instruction | Return value
CONV_INV(a) | direct form of the balanced-interleaved word a
CONV_BIL(a) | balanced-interleaved form of the lower half-word of a
CONV_BIH(a) | balanced-interleaved form of the upper half-word of a
B_XOR(a,b)  | balanced-interleaved xor(a, b)
B_TBx_L(a)  | balanced-interleaved lookup-table (lower)
B_TBx_H(a)  | balanced-interleaved lookup-table (upper)

Each AES T-box has its own B_TBx_H(a) and B_TBx_L(a); x = 0, 1, 2, 3, 4.

In summary, our proposed memory organization for lookup tables achieves side-channel resistance by combining three elements. First, the use of RAM cells reduces side-channel leakage because of the increased logic density they offer. Second, we use balanced-interleaved addressing for the overall lookup table. Third, we use balanced-interleaved data storage for the lookup table content.

C. System Integration

An important, but often overlooked, aspect of side-channel countermeasures is the system integration. On an embedded processor, SCA-resistant encryption is just one of the many tasks handled by software. We have integrated our countermeasures as custom instructions into a soft-core processor. A custom-instruction interface offers the ability to introduce custom-hardware modules in the execution stage of a RISC pipeline.

Table II shows the side-channel resistant instruction set for AES. These instructions are implemented in custom hardware using DPL. CONV_INV(a) extracts the even bits from a word, and thus converts balanced-interleaved format into direct form. CONV_BIL(a) and CONV_BIH(a) generate balanced-interleaved form from the lower and upper half-word, respectively, of the input argument a. The round function for a T-box based AES only requires a balanced XOR, which can be supported through a single custom instruction B_XOR(a,b). Move, shift and rotate operations are compatible with balanced-interleaved arguments, so no custom instruction is needed for those. The AES T-box implementation uses five different T-box tables. There is a B_TBx_L(a) and a B_TBx_H(a) to access the lower and upper half, respectively, of each T-box table.

These instructions are specific to the AES block cipher; a different block cipher would need different lookup tables. However, it is perfectly feasible to make the lookup tables fully reconfigurable, so that they can be programmed with the S-box content required for a specific block cipher. Implementing the lookup tables in the processor is an important difference with earlier work by Chen et al., and we will show how this brings a considerable performance gain.
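The behavior of B_XOR and of the balanced T-box lookups can be modeled in plain C as follows. This is a hedged sketch rather than the custom hardware itself: all names (bi_encode16, bi_decode16, b_xor, bi_tbox, bi_tbox_fill, b_tb_h, b_tb_l) are hypothetical, the bit order (true bits at even positions) is an assumption, and the choice that the lower-half RAM is the one addressed with the complemented index is only one reading of Figure 3. For simplicity the lookup index is passed as a plain 8-bit value instead of a 16-bit balanced address.

```c
#include <assert.h>
#include <stdint.h>

#define EVEN_MASK 0x55555555u /* ASSUMED positions of the true bits       */
#define ODD_MASK  0xAAAAAAAAu /* ASSUMED positions of the complement bits */

/* Spread 16 bits over the even positions, then add the complements. */
uint32_t bi_encode16(uint32_t half)
{
    uint32_t t = 0;
    for (int i = 0; i < 16; i++)
        t |= ((half >> i) & 1u) << (2 * i);
    return t | ((~t << 1) & ODD_MASK);
}

/* Recover the 16 true bits from a BI word. */
uint32_t bi_decode16(uint32_t bi)
{
    uint32_t out = 0;
    for (int i = 0; i < 16; i++)
        out |= ((bi >> (2 * i)) & 1u) << i;
    return out;
}

/* Model of B_XOR: XOR the true bits, then regenerate the complements.
 * A plain a ^ b would leave each pair equal instead of complementary. */
uint32_t b_xor(uint32_t a, uint32_t b)
{
    uint32_t t = (a ^ b) & EVEN_MASK;
    return t | ((~t << 1) & ODD_MASK);
}

/* One T-box as two RAMs holding balanced data. */
typedef struct {
    uint32_t hi[256]; /* BI upper half-words, stored at address x  */
    uint32_t lo[256]; /* BI lower half-words, stored at address ~x */
} bi_tbox;

/* Fill both RAMs from an ordinary 8x32 table. */
void bi_tbox_fill(bi_tbox *t, const uint32_t table[256])
{
    assert(t && table);
    for (unsigned x = 0; x < 256; x++) {
        t->hi[x] = bi_encode16(table[x] >> 16);
        t->lo[(~x) & 0xFFu] = bi_encode16(table[x] & 0xFFFFu);
    }
}

/* Models of B_TBx_H and B_TBx_L for one table. */
uint32_t b_tb_h(const bi_tbox *t, uint8_t a) { return t->hi[a]; }
uint32_t b_tb_l(const bi_tbox *t, uint8_t a) { return t->lo[(uint8_t)~a]; }
```

The model shows why XOR needs its own instruction while move, shift and rotate do not: XORing two BI words bit by bit cancels the complement relation, so the complement bits have to be rebuilt from the true bits.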
The AES T-box algorithm can be written in C by making use of custom instructions embedded as inline assembly macros. The precharge operation can be supported from C as well, as illustrated in the snippet below. Note the use of volatile to prevent the removal of precharge by an optimizing compiler.

    volatile int t1, t2, t3;
    t1 = 0;               // precharge
    t1 = B_TB0_L(in);     // T-box0 lower word
    t2 = 0;               // precharge
    t2 = B_TB1_L(in);     // T-box1 lower word
    t3 = 0;               // precharge
    t3 = B_XOR(t1, t2);   // XOR

A strong feature of this approach is that it is fully compatible with the existing memory hierarchy of an embedded system. Variables can be stored into RAM in balanced-interleaved form, and they will maintain their low side-channel leakage provided that precharge is properly implemented. Thus, our approach is independent of the number of processor registers; it will not run out of foreground storage (in contrast to some earlier work). Our design makes sure that all sensitive data variables are always in balanced-interleaved format outside the custom hardware boundary. Thus, transfers of these data variables to other storage elements such as cache or external memories do not, in principle, result in side-channel leakage. However, storing balanced-interleaved data in background memory may still cause side-channel leakage due to asymmetry in the physical layout of the background memory. We will analyze this in the next section of the paper.

Figure 4. Single T-box experiment. (Blocks: 8-bit secret key, 8-bit plaintext, XOR, AES T-box, 32-bit output.)

IV. EXPERIMENTAL SETUP AND RESULTS

To demonstrate that our solution improves the resistance against SCA, this section presents experimental results based on real attacks.

A. Experimental Setup

The designs are implemented on an Altera DE2-70 evaluation board, which has a Cyclone-II EP2C70F896C6 FPGA device and a NiosII soft-core. Our system incorporates a 32-bit NiosII/s (50 MHz, pipelined) processor, an off-chip memory (SDRAM or SSRAM) and communication peripherals (UART, GPIOs). We use Quartus-II with SOPC Builder to integrate the desired peripherals into the system.

Electromagnetic emissions are captured with an ETS-LINDGREN EM probe (Model 7405-903) and are sampled on an Agilent DSO5032 oscilloscope (300 MHz bandwidth, 2 GSa/s sampling rate). A Correlation Power Attack (CPA) using the Hamming weight model on the T-box output is used to analyze the acquired EM traces. The oscilloscope is configured to average 32 consecutive traces so as to reduce the noise in the acquired traces.

Our experiment is divided into two parts: a proof-of-concept experiment (single T-box attack) and a real-world implementation (128-bit AES T-box attack). The following subsections give the details.

B. Single T-box Experiment

In this experiment, we attack a single T-box operation to evaluate the security gain due to the specialized memory organization and the balanced-interleaved data format. As illustrated in Figure 4, this test design incorporates the essential components of a block cipher (AES), i.e. a logical XOR and a T-box lookup. The SCA-resistant XOR and lookup table operations are implemented in custom hardware and are accessed through custom instructions. We vary the number of balanced bits in a BI dataword from 0 (unsecure) to 16 (fully secure) and perform an SCA attack on the output of the lookup table to evaluate its resistance against SCA attacks. For each of these experimental steps, we reconfigure the XOR operation and change the format of the T-box table contents to have the required number of balancing bits.

Figure 5. Security improvement: single T-box test. (Maximum correlation value versus number of balanced bits; successful and unsuccessful attacks are marked separately.)

Figure 5 shows the results, where the maximum correlation value for the correct key guess is plotted against the number of balanced bits present in a dataword.
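The CPA step used in these experiments can be sketched as follows for a single time sample. This is a minimal illustration under stated assumptions, not the analysis code used for the measurements: the names (hamming_weight32, pearson_r2, cpa_best_guess) are hypothetical, the targeted intermediate is assumed to be the T-box output for the index plaintext XOR key guess, and guesses are ranked by the squared correlation r^2, which orders them the same way as |r|.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hamming weight of a 32-bit word (the leakage model). */
int hamming_weight32(uint32_t x)
{
    int n = 0;
    while (x) {
        n += (int)(x & 1u);
        x >>= 1;
    }
    return n;
}

/* Squared Pearson correlation between x and y; 0 if either is constant. */
double pearson_r2(const double *x, const double *y, size_t n)
{
    assert(n > 1);
    double sx = 0, sy = 0, sxx = 0, syy = 0, sxy = 0;
    for (size_t i = 0; i < n; i++) {
        sx += x[i];
        sy += y[i];
        sxx += x[i] * x[i];
        syy += y[i] * y[i];
        sxy += x[i] * y[i];
    }
    double cov = (double)n * sxy - sx * sy;
    double vx = (double)n * sxx - sx * sx;
    double vy = (double)n * syy - sy * sy;
    if (vx <= 0.0 || vy <= 0.0)
        return 0.0;
    return (cov * cov) / (vx * vy);
}

/* Try all 256 key-byte guesses against one sample per trace.
 * Returns the guess with the largest r^2, or -1 if allocation fails.
 * ASSUMPTION: the leaking value is table[plaintext ^ key]. */
int cpa_best_guess(const uint8_t *plaintext, const double *samples, size_t n,
                   const uint32_t table[256], double *best_r2)
{
    double *hyp = malloc(n * sizeof *hyp);
    if (!hyp)
        return -1;
    int best = 0;
    double best_val = -1.0;
    for (int k = 0; k < 256; k++) {
        for (size_t i = 0; i < n; i++)
            hyp[i] = hamming_weight32(table[plaintext[i] ^ k]);
        double r2 = pearson_r2(hyp, samples, n);
        if (r2 > best_val) {
            best_val = r2;
            best = k;
        }
    }
    free(hyp);
    if (best_r2)
        *best_r2 = best_val;
    return best;
}
```

In a full attack this search is repeated for every time sample of the averaged traces, and the attack counts as successful when the correct key guess stands out from the other 255 guesses. With balanced-interleaved data the Hamming weight of every stored word is constant, which is what drives the correlation of the correct guess down.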
The correlation is calculated for 2000 traces, at its best attack point. It can be seen that the correlation of the correct key guess decreases as the number of balanced bits increases. For the completely secure case, the correlation value drops to 0.11 at 2000 traces. We could not successfully attack the fully balanced design even with 170000 averaged traces. This shows that our countermeasure achieves a significant security improvement.

C. 128-bit AES T-box Prototype

In the second part of our experiment, we implement an SCA-resistant AES T-box (128-bit) prototype to evaluate its efficiency in terms of security, performance and cost. We use the same platform as in the single T-box experiment, with two different configurations of off-chip memory (SDRAM and SSRAM). The T-box lookup tables are implemented in on-chip RAM macros, and off-chip memory is used for program execution and stack. The software uses custom instructions for secure operations and includes a hardcoded secret key in balanced-interleaved format. All intermediate variables are precharged to 0 before they are used for the next operation. We attack the first round of AES and conduct a CPA to evaluate its SCA resistance.

Table IV. AES IMPLEMENTATION: AREA AND PERFORMANCE

Configuration    | Area, unsecure (LEs, M9K) | Area, secure (LEs, M9K) | Cycle count, unsecure | Cycle count, secure
NiosII/s + SDRAM | 3452, 143                 | 3889, 161               | 13839                 | 36977
NiosII/s + SSRAM | 2814, 31                  | 3252, 49                | 7375                  | 19980

Area of a system with CPU, memory controller and custom hardware.

Figure 6. AES T-box implementation: NiosII/s + SDRAM. (Number of revealed key bytes versus number of traces, for the unsecured and secured designs.)

We perform SCA attacks on the unsecure and secure implementations for both the SDRAM and SSRAM configurations. We have used several different secret keys, and Table III lists the average security gain for this set of keys.
An unsecure AES implementation on NiosII/s with SDRAM off-chip memory reveals 12 key bytes at around 1600 traces, whereas the secure implementation needs 40000 traces to reveal 12 key bytes. This results in an overall security gain of 25X at a 75% success rate (12 of 16 key bytes). Figure 6 plots the number of key bytes revealed as a function of the number of traces for the SDRAM configuration. In the SSRAM configuration, the unsecure implementation reaches a 75% success rate at an average of 633 traces, whereas we could not attack the secure implementation with 300000 traces. Figure 7 shows the correlation trace of the correct key byte for 300000 traces. This results in a security gain of at least 474 (300000/633), which differs significantly from that of the SDRAM configuration. We investigate the possible reasons for this difference in the next section.

Table III. AES IMPLEMENTATION: SECURITY

Configuration    | MTD (# revealed keys), unsecure | MTD (# revealed keys), secure | Security gain
NiosII/s + SDRAM | 1600 (12)                       | 40000 (12)                    | 25
NiosII/s + SSRAM | 633 (12)                        | 300000 (0)                    | >474

Figure 7. Attack results on secure implementation: NiosII/s + SSRAM. (Correlation versus time samples.) The trace of the correct key guess (here, first key byte) is plotted in black, while all other key guesses are in yellow (gray). The buried trace means an unsuccessful attack.

V. ANALYSIS AND COMPARISON

In this section, we analyze the possible causes of residual side-channel leakage in the SDRAM system and compare our work with related published secure implementations.

A. Impact of PCB Layout

The location of peripheral chips on a PCB has a significant impact on the security of the overall system. Figure 8 depicts the layout of the DE2-70 board. We can see that the SSRAM has a more symmetric location with respect to the Cyclone-II FPGA than the SDRAM. The 32-bit SDRAM is configured as two 16-bit memory banks, whereas the SSRAM is a single 32-bit memory chip. With this layout, the SDRAM does not always offer adjacent data-pin locations for a complementary bit-pair.
This creates an imbalance between direct and complementary bit lines irrespective of their balanced format. On the other hand, the SSRAM has a more symmetric data-pin pattern, which routes complementary bit lines together and thus reduces the residual side-channel leakage. Note that storing the secret data in off-chip memories makes the system susceptible to a simple probing attack on its chip pins. In our work, we assume that such attacks do not happen.

Figure 8. Impact of PCB layout on residual leakage. (SSRAM: symmetric pin positions, no leakage. SDRAM: asymmetric pin positions, leakage.)

B. Related Implementations

This security improvement comes at the cost of performance and area overhead. A secure implementation occupies extra logic for the customized hardware and needs to split every 32-bit sensitive dataword into two 32-bit balanced words. Additionally, all variables need to be precharged before they can be reused. The additional instructions cause a performance degradation. Our secure implementation is 2.7 times slower than the unsecure implementation and takes 15% more area. Table IV lists these results.

In this section, we compare our solution with other secure implementations. As shown in Table V, these implementations target different technologies, different countermeasures and different cryptographic algorithms. As the attack methods are not standardized, it is not straightforward to compare them on the same scale. Therefore, the table should be consulted while at the same time referring to the original publications by Barthe et al., Chen et al., Ambrose et al., and Regazzoni et al.

Table V. RELATED WORK: COMPARISON

Work       | Technology, base processor  | Implementation          | Security gain / Area overhead / Performance degradation
           | Spartan-3, MicroBlaze       | Masking, DES            | 2X / 1.34X (1) / –
           | Virtex-4, Leon3             | Masking, AES            | 3.5X / – / 2X
           | Spartan-3E, Leon3           | Hiding (VSC), AES       | 20X / 3.3X / 6.5X
           | ASIC 180nm, OpenRISC1000    | Hiding (MCML), PRESENT  | – / 2.65X / –
Our Design | CycloneII, NiosII/s         | Hiding, AES T-box       | >474X / 1.15X (2) / 2.7X

(1) This number represents area overhead in terms of slice count.
(2) Area of a system with only processor and SSRAM memory controller.

Compared to this earlier work, our design is very systematic, which makes the design phase simpler than in the above-mentioned implementations. We believe that, for FPGA implementations, it exceeds the above-mentioned solutions in terms of the trade-off between security, performance, area and design flexibility.

VI. CONCLUSION

Security against side-channel attacks is an important concern given the increased use of embedded systems in security applications. This paper reports an efficient and secure embedded system design on FPGA using an industrial design flow. We use a novel memory organization technique and an interleaved data format in combination with a hiding countermeasure. Although we have demonstrated our results on an Altera FPGA for an AES T-box implementation, the methodology is portable to other FPGA platforms and to the majority of block ciphers. We discuss how the location of peripheral off-chip components on the PCB plays an important role in the overall security evaluation. Our experimental results establish the feasibility of the proposed methodology for implementing an embedded system that achieves the desired security at reasonable cost.

VII. ACKNOWLEDGMENTS

This research was supported in part by National Science Foundation Grant no. 1115839.

REFERENCES

[1] J.A. Ambrose, S. Parameswaran, A. Ignjatovic. MUTE AES: A Multiprocessor Architecture to prevent Power Analysis based Side Channel Attack of the AES Algorithm. ICCAD 2008, pp. 678-684.
[2] J. Balasch, B. Gierlichs, R. Verdult, L. Batina, and I. Verbauwhede. Power Analysis of Atmel CryptoMemory - Recovering Keys from Secure EEPROMs. In Topics in Cryptology - CT-RSA 2012, The Cryptographers' Track at the RSA Conference, Lecture Notes in Computer Science 7178, O. Dunkelman (ed.), Springer-Verlag, pp. 19-34, 2012.
[3] L. Barthe, P. Benoit, L. Torres. Investigation of a Masking Countermeasure against Side-Channel Attacks for RISC-based Processor Architectures. FPL 2010: 139-144.
[4] S. Chari, C. S. Jutla, J. R. Rao, P. Rohatgi. Towards Sound Approaches to Counteract Power-Analysis Attacks. CRYPTO 1999: 398-412.
[5] Z. Chen, A. Sinha, P. Schaumont. Implementing Virtual Secure Circuit Using a Custom-Instruction Approach. CASES 2010: 57-66.
[6] Z. Chen, P. Schaumont. Virtual Secure Circuit: Porting Dual-Rail Precharge Techniques into Software on Multicore. IACR ePrint Archive 2010/270 (2010).
[7] J.S. Coron, E. Prouff, M. Rivain. Side Channel Cryptanalysis of a Higher Order Masking Scheme. CHES 2007: 28-44.
[8] T. Eisenbarth, T. Kasper, A. Moradi, C. Paar, M. Salmasizadeh, M. T. Manzuri-Shalmani. On the Power of Power Analysis in the Real World: A Complete Break of the KeeLoq Code Hopping Scheme. CRYPTO 2008: 203-220.
[9] S. Guilley, L. Sauvage, P. Hoogvorst, R. Pacalet, G. Bertoni, and S. Chaudhuri. Security Evaluation of WDDL and SecLib Countermeasures against Power Attacks. IEEE Transactions on Computers (2008) 57 (11): 1482-1497.
[10] S. Guilley, L. Sauvage, F. Flament, V. Vong, P. Hoogvorst, R. Pacalet. Evaluation of Power Constant Dual-Rail Logics Countermeasures against DPA with Design Time Security Metrics. IEEE Transactions on Computers (2010) 59 (9): 1250-1263.
[11] P. Hoogvorst, G. Duc, J.L. Danger. Software Implementation of Dual-Rail Representation. COSADE 2011.
[12] J. Kaps, G. Gaubatz, B. Sunar. Cryptography on a Speck of Dust. IEEE Computer Magazine, 40(2):38-44, 2007.
[13] T. Kasper, D. Oswald, C. Paar. Side-Channel Analysis of Cryptographic RFIDs with Analog Demodulation. RFIDSec 2011: 61-77.
[14] S. Mangard, E. Oswald, T. Popp. Differential Power Analysis. In Power Analysis Attacks: Revealing the Secrets of Smart Cards, Springer, 2007.
[15] A. Moradi, A. Barenghi, T. Kasper, C. Paar. On the Vulnerability of FPGA Bitstream Encryption against Power Analysis Attacks: Extracting Keys from Xilinx Virtex-II FPGAs. ACM Conference on Computer and Communications Security 2011: 111-124.
[16] A. Moradi, M. Kasper, C. Paar. Black-Box Side-Channel Attacks Highlight the Importance of Countermeasures - An Analysis of the Xilinx Virtex-4 and Virtex-5 Bitstream Encryption Mechanism. CT-RSA 2012: 1-18.
[17] A. Moradi, A. Poschmann. Lightweight Cryptography and DPA Countermeasures: A Survey. Financial Cryptography Workshops 2010: 68-79.
[18] F. Regazzoni, A. Cevrero, F.X. Standaert, S. Badel, T. Kluter, P. Brisk, Y. Leblebici, P. Ienne. A Design Flow and Evaluation Framework for DPA-Resistant Instruction Set Extensions. CHES 2009: 205-219.
[19] S. Shah, R. Velegalati, J.P. Kaps, D. Hwang. Investigation of DPA Resistance of Block RAMs in Cryptographic Implementation on FPGAs. Reconfig 2010: 274-279.
[20] S. Tillich, M. Kirschbaum, A. Szekely. SCA-Resistant Embedded Processors - The Next Generation. ACSAC 2010.
[21] S. Tillich, M. Kirschbaum, A. Szekely. Implementation and Evaluation of an SCA-Resistant Embedded Processor. CARDIS 2011: 151-165.
[22] K. Tiri, I. Verbauwhede. A Digital Design Flow for Secure Integrated Circuits. IEEE TCAD (2006) 25 (7): 1197-1208.