List of x86 cryptographic instructions

These are instructions that have been added to the x86 instruction set to assist efficient calculation of cryptographic primitives, such as AES encryption, SHA hash calculation and random number generation.

Intel AES instructions

6 new instructions.

Instruction Encoding Description
AESENC xmm1,xmm2/m128 66 0F 38 DC /r Perform one round of an AES encryption flow
AESENCLAST xmm1,xmm2/m128 66 0F 38 DD /r Perform the last round of an AES encryption flow
AESDEC xmm1,xmm2/m128 66 0F 38 DE /r Perform one round of an AES decryption flow
AESDECLAST xmm1,xmm2/m128 66 0F 38 DF /r Perform the last round of an AES decryption flow
AESKEYGENASSIST xmm1,xmm2/m128,imm8 66 0F 3A DF /r ib Assist in AES round key generation
AESIMC xmm1,xmm2/m128 66 0F 38 DB /r Assist in AES Inverse Mix Columns
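
As a rough illustration of how the round instructions in the table fit together, the sketch below lists the instruction sequence for encrypting or decrypting one block. It assumes the standard AES round counts (10, 12 and 14 for 128-, 192- and 256-bit keys) and the usual pattern of an initial whitening XOR followed by the round instructions; the function names are hypothetical and the output is only a list of mnemonics, not executable machine code.

```python
# Standard AES round counts per key size (an assumption taken from the AES
# specification, not from the table above).
AES_ROUNDS = {128: 10, 192: 12, 256: 14}


def encryption_sequence(key_bits: int) -> list:
    """Return the mnemonics used to encrypt one 16-byte block."""
    rounds = AES_ROUNDS[key_bits]
    # Round key 0 is XORed in first, then rounds-1 full rounds, then the
    # final round, which has no MixColumns step.
    return ["PXOR"] + ["AESENC"] * (rounds - 1) + ["AESENCLAST"]


def decryption_sequence(key_bits: int) -> list:
    """Return the mnemonics used to decrypt one 16-byte block.

    The middle round keys are assumed to have been passed through AESIMC
    beforehand, since AESDEC works on the equivalent inverse cipher.
    """
    rounds = AES_ROUNDS[key_bits]
    return ["PXOR"] + ["AESDEC"] * (rounds - 1) + ["AESDECLAST"]


if __name__ == "__main__":
    print(encryption_sequence(128))
```

The sequence has one entry per round key, so an AES-128 block takes 11 steps and an AES-256 block takes 15.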

CLMUL instructions

Instruction Opcode Description
PCLMULQDQ xmm1,xmm2/m128,imm8 66 0F 3A 44 /r ib Perform a carry-less multiplication of two 64-bit polynomials over the finite field GF(2^k).
PCLMULLQLQDQ xmm1,xmm2/m128 66 0F 3A 44 /r 00 Multiply the low halves of the two 128-bit operands.
PCLMULHQLQDQ xmm1,xmm2/m128 66 0F 3A 44 /r 01 Multiply the high half of the destination register by the low half of the source operand.
PCLMULLQHQDQ xmm1,xmm2/m128 66 0F 3A 44 /r 10 Multiply the low half of the destination register by the high half of the source operand.
PCLMULHQHQDQ xmm1,xmm2/m128 66 0F 3A 44 /r 11 Multiply the high halves of the two 128-bit operands.
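
The half-selection behaviour in the table can be modelled in software. The following is a minimal sketch, not a description of the hardware: 128-bit registers are represented as Python integers, and the names `clmul64` and `pclmulqdq` are chosen here for illustration. It assumes that bit 0 of the immediate selects the half of the destination and bit 4 selects the half of the source, which is consistent with the four pseudo-op immediates listed above.

```python
MASK64 = (1 << 64) - 1


def clmul64(a: int, b: int) -> int:
    """Carry-less (XOR-based) multiplication of two 64-bit values.

    Each set bit of b contributes a shifted copy of a, and the copies are
    combined with XOR instead of addition, so no carries propagate.
    """
    a &= MASK64
    b &= MASK64
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result


def pclmulqdq(dst: int, src: int, imm8: int) -> int:
    """Software model of PCLMULQDQ on 128-bit integers."""
    # Bit 0 of imm8 picks the high (1) or low (0) quadword of the destination.
    x = (dst >> 64) & MASK64 if imm8 & 0x01 else dst & MASK64
    # Bit 4 of imm8 picks the high (1) or low (0) quadword of the source.
    y = (src >> 64) & MASK64 if imm8 & 0x10 else src & MASK64
    return clmul64(x, y)


if __name__ == "__main__":
    # (x + 1) * (x + 1) = x^2 + 1 over GF(2), i.e. 0b11 * 0b11 = 0b101.
    print(bin(clmul64(0b11, 0b11)))
```

In this model, `pclmulqdq(dst, src, 0x01)` corresponds to the PCLMULHQLQDQ row and `pclmulqdq(dst, src, 0x10)` to the PCLMULLQHQDQ row.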

RDRAND and RDSEED

Instruction Encoding Description Added in
RDRAND r16, RDRAND r32 NFx 0F C7 /6 Return a random number that has been generated with a CSPRNG (Cryptographically Secure Pseudo-Random Number Generator) compliant with NIST SP 800-90A.[a] Ivy Bridge, Excavator, Puma, ZhangJiang, Knights Landing, Gracemont
RDRAND r64 NFx REX.W 0F C7 /6
RDSEED r16, RDSEED r32 NFx 0F C7 /7 Return a random number that has been generated with a HRNG/TRNG (Hardware/"True" Random Number Generator) compliant with NIST SP 800-90B and C.[a] Broadwell, ZhangJiang, Knights Landing, Zen 1, Gracemont
RDSEED r64 NFx REX.W 0F C7 /7
  1. ^ a b The RDRAND and RDSEED instructions may fail to return a random number if the CPU's random number generators cannot keep up with the rate at which the instructions are issued. If this happens, software may retry the instruction, although the number of retries should be limited in order to ensure forward progress.[1] The instructions set EFLAGS.CF to 1 if a random number was successfully obtained and to 0 otherwise. On failure, the instruction's destination register is also set to 0.
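
The bounded-retry pattern described in the footnote can be sketched as follows. This is an illustrative model only: the instruction is represented by a callable returning a `(carry_flag, value)` pair, and `rdrand_with_retry`, `make_flaky_step` and the default limit of 10 attempts are assumptions made for this example rather than part of any real API.

```python
from typing import Callable, Optional, Tuple

# A "step" stands in for one execution of RDRAND or RDSEED: it returns the
# carry flag (True on success) and the value left in the destination register.
Step = Callable[[], Tuple[bool, int]]


def rdrand_with_retry(step: Step, max_attempts: int = 10) -> Optional[int]:
    """Call step() until it succeeds, giving up after max_attempts tries."""
    for _ in range(max_attempts):
        carry_flag, value = step()
        if carry_flag:
            return value
    # Bounded retries guarantee forward progress; the caller decides how to
    # handle the failure.
    return None


def make_flaky_step(failures: int, value: int) -> Step:
    """Hypothetical stand-in that fails a fixed number of times first."""
    remaining = [failures]

    def step() -> Tuple[bool, int]:
        if remaining[0] > 0:
            remaining[0] -= 1
            return (False, 0)  # CF=0 and destination zeroed on failure
        return (True, value)

    return step


if __name__ == "__main__":
    print(rdrand_with_retry(make_flaky_step(2, 0x1234)))
```

Returning `None` rather than the zeroed destination keeps a failed read distinguishable from a legitimately random value of 0.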

Intel SHA instructions

7 new instructions.

Instruction Encoding Description
SHA1RNDS4 xmm1,xmm2/m128,imm8 NP 0F 3A CC /r ib Perform Four Rounds of SHA1 Operation
SHA1NEXTE xmm1,xmm2/m128 NP 0F 38 C8 /r Calculate SHA1 State Variable E after Four Rounds
SHA1MSG1 xmm1,xmm2/m128 NP 0F 38 C9 /r Perform an Intermediate Calculation for the Next Four SHA1 Message Dwords
SHA1MSG2 xmm1,xmm2/m128 NP 0F 38 CA /r Perform a Final Calculation for the Next Four SHA1 Message Dwords
SHA256RNDS2 xmm1,xmm2/m128,<XMM0> NP 0F 38 CB /r Perform Two Rounds of SHA256 Operation
SHA256MSG1 xmm1,xmm2/m128 NP 0F 38 CC /r Perform an Intermediate Calculation for the Next Four SHA256 Message Dwords
SHA256MSG2 xmm1,xmm2/m128 NP 0F 38 CD /r Perform a Final Calculation for the Next Four SHA256 Message Dwords

Intel AES Key Locker instructions

These instructions, available in Tiger Lake and later Intel processors, are designed to enable encryption/decryption with an AES key without having access to any unencrypted copies of the key during the actual encryption/decryption process.

Instruction Encoding Description Notes
LOADIWKEY xmm1,xmm2 F3 0F 38 DC /r Load internal wrapping key ("IWKey") from xmm1, xmm2 and XMM0. The two explicit operands (which must be register operands) specify a 256-bit encryption key. The implicit operand in XMM0 specifies a 128-bit integrity key. EAX contains flags controlling operation of instruction.

After being loaded, the IWKey cannot be directly read from software, but is used for the key wrapping done by ENCODEKEY128/256 and checked by the Key Locker encode/decode instructions.

LOADIWKEY is privileged and can run in Ring 0 only.

ENCODEKEY128 r32,r32 F3 0F 38 FA /r Wrap a 128-bit AES key from XMM0 into a 384-bit key handle and output handle in XMM0-2. Source operand specifies handle restrictions to build into the handle.

Destination operand is initialized with information about the source and attributes of the key.

These instructions may also modify XMM4-6 (zeroed out in existing implementations, but this should not be relied on).

ENCODEKEY256 r32,r32 F3 0F 38 FB /r Wrap a 256-bit AES key from XMM1:XMM0 into a 512-bit key handle and output handle in XMM0-3.
AESENC128KL xmm,m384 F3 0F 38 DC /r Encrypt xmm using 128-bit AES key indicated by handle at m384 and store result in xmm. All of the Key Locker encode/decode instructions will check whether the handle is valid for the current IWKey and encode/decode data only if the handle is valid.

These instructions will set the ZF flag to indicate whether the provided handle was valid (ZF=0) or not (ZF=1).

AESDEC128KL xmm,m384 F3 0F 38 DD /r Decrypt xmm using 128-bit AES key indicated by handle at m384 and store result in xmm.
AESENC256KL xmm,m512 F3 0F 38 DE /r Encrypt xmm using 256-bit AES key indicated by handle at m512 and store result in xmm.
AESDEC256KL xmm,m512 F3 0F 38 DF /r Decrypt xmm using 256-bit AES key indicated by handle at m512 and store result in xmm.
AESENCWIDE128KL m384 F3 0F 38 D8 /0 Encrypt XMM0-7 using 128-bit AES key indicated by handle at m384 and store each resultant block back to its corresponding register.
AESDECWIDE128KL m384 F3 0F 38 D8 /1 Decrypt XMM0-7 using 128-bit AES key indicated by handle at m384 and store each resultant block back to its corresponding register.
AESENCWIDE256KL m512 F3 0F 38 D8 /2 Encrypt XMM0-7 using 256-bit AES key indicated by handle at m512 and store each resultant block back to its corresponding register.
AESDECWIDE256KL m512 F3 0F 38 D8 /3 Decrypt XMM0-7 using 256-bit AES key indicated by handle at m512 and store each resultant block back to its corresponding register.

VIA PadLock instructions

The VIA/Zhaoxin PadLock instructions are instructions designed to apply cryptographic primitives in bulk, similar to the 8086 repeated string instructions. As such, unless otherwise specified, they take, as applicable, pointers to source data in ES:rSI and destination data in ES:rDI, and a data-size or count in rCX. Like the old string instructions, they are all designed to be interruptible.

Padlock subset Instruction Encoding Description Added in
RNG
Random Number Generation.
XSTORE NFx 0F A7 C0 Store random bytes to ES:[rDI], and increment ES:rDI accordingly. XSTORE will store currently-available bytes, which may be from 0 to 8 bytes. REP XSTORE will write the number of random bytes specified by rCX, waiting for the random number generator when needed. EDX specifies a "quality factor". Nehemiah (stepping 3)
REP XSTORE F3 0F A7 C0
ACE
Advanced Cryptography Engine.
REP XCRYPTECB F3 0F A7 C8 Encrypt/Decrypt data, using the AES cipher in various block modes (ECB, CBC, CFB, OFB and CTR, respectively). rCX contains the number of 16-byte blocks to encrypt/decrypt, rBX contains a pointer to an encryption key, rAX a pointer to an initialization vector for block modes that need it, and rDX a pointer to a control word.[a] Nehemiah (stepping 8)
REP XCRYPTCBC F3 0F A7 D0
REP XCRYPTCFB F3 0F A7 E0
REP XCRYPTOFB F3 0F A7 E8
ACE2[b]
REP XCRYPTCTR F3 0F A7 D8 VIA C7 "Esther"[2]
PHE
Hash Engine.
REP XSHA1 F3 0F A6 C8 Compute a cryptographic hash (using the SHA-1 and SHA-256 functions, respectively). ES:rSI points to data to compute a hash for, ES:rDI points to a message digest and rCX specifies the number of bytes. rAX should be set to 0 at the start of a calculation.[c] Esther
REP XSHA256 F3 0F A6 D0
PMM
Montgomery Multiplier.
REP MONTMUL F3 0F A6 C0 Perform Montgomery Multiplication. Takes an operand width in ECX (given as a number of bits, which must be in the range 256..32768 and divisible by 128) and a pointer to a data structure in ES:ESI.[d] Esther
GMI[4][5]
Chinese national cryptographic algorithms. (Zhaoxin only.)
CCS_HASH F3 0F A6 E8 Compute SM3 hash, similar to the REP XSHA* instructions. The rBX register is used to specify hash function (20h for SM3 being the only documented value). ZhangJiang
CCS_ENCRYPT F3 0F A7 F0 Encrypt/Decrypt data, using the SM4 cipher in various block modes. rCX contains the number of 16-byte blocks to encrypt/decrypt, rBX contains a pointer to an encryption key, rDX a pointer to an initialization vector for block modes that need it, and rAX contains a control word.[e]
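
The REP MONTMUL row constrains the operand width, and its footnote describes a six-element data structure whose first entry is a negated modular inverse. The sketch below shows one way such a descriptor could be prepared. It assumes that "negated modular inverse" means -m^-1 mod 2^32 (the usual Montgomery constant), that the structure is stored little-endian, and that the pointer values are plain 32-bit addresses supplied by the caller; all function names are hypothetical.

```python
import struct

WORD_MODULUS = 1 << 32


def check_operand_width(bits: int) -> None:
    """Validate the ECX operand width: 256..32768 bits, a multiple of 128."""
    if not 256 <= bits <= 32768 or bits % 128 != 0:
        raise ValueError("operand width must be in 256..32768 and divisible by 128")


def negated_modular_inverse(modulus: int) -> int:
    """Return -m0^-1 mod 2^32, where m0 is the bottom 32 bits of the modulus."""
    m0 = modulus & 0xFFFFFFFF
    if m0 % 2 == 0:
        raise ValueError("an even modulus has no inverse modulo 2**32")
    # pow() with a negative exponent requires Python 3.8 or later.
    return (-pow(m0, -1, WORD_MODULUS)) % WORD_MODULUS


def pack_montmul_descriptor(modulus: int, a_ptr: int, b_ptr: int,
                            result_ptr: int, modulus_ptr: int,
                            scratch_ptr: int) -> bytes:
    """Pack the six 32-bit elements in the order given by the footnote."""
    return struct.pack("<6I", negated_modular_inverse(modulus),
                       a_ptr, b_ptr, result_ptr, modulus_ptr, scratch_ptr)


if __name__ == "__main__":
    check_operand_width(1024)
    # The addresses below are placeholders, not real buffers.
    desc = pack_montmul_descriptor(3, 0x1000, 0x2000, 0x3000, 0x4000, 0x5000)
    print(desc.hex())
```

A quick sanity check on the first element is that `modulus * inverse + 1` is divisible by 2^32.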

Footnotes

  1. ^ The control word for REP XCRYPT* is a 128-bit data structure with the following layout:
    Bits Usage
    3:0 AES round count
    4 Digest mode enable (ACE2 only)
    5 1=allow data that is not 16-byte aligned (ACE2 only)
    6 Cipher: 0=AES, 1=undefined
    7 Key schedule: 0=compute (128bit key only), 1=load from memory
    8 0=normal, 1=intermediate-result
    9 0=encrypt, 1=decrypt
    11:10 Key size: 00=128bit, 01=192bit, 10=256bit, 11=reserved
    127:12 Reserved, set to 0
  2. ^ ACE2 also adds extra features to the other REP XCRYPT instructions: a digest mode for the CBC and CFB instructions, and the ability to use input/output data that are not 16-byte aligned for the non-ECB instructions.
  3. ^ On VIA Nano and later processors, setting rAX to an all-1s value for the REP XSHA* instructions will enable an alternate operation mode, where rCX specifies the number of 64-byte blocks, and where the standard FIPS-180-2 length extension procedure at the end of the hash calculation is omitted. This makes for a variant more suitable for data streaming than the original EAX=0 variant.[3] This functionality also exists for CCS_HASH.
     
  4. ^ The data structure passed to REP MONTMUL contains six 32-bit elements: the first is a negated modular inverse of the bottom 32 bits of the modulus, and the remaining five are pointers to various memory buffers:
    Offset Data item
    0 Negated modular inverse
    4 Pointer to first multiplicand
    8 Pointer to second multiplicand
    12 Pointer to result buffer
    16 Pointer to modulus
    20 Pointer to 32-byte scratchpad
  5. ^ The CCS_ENCRYPT control word in rAX has the following format:
    Bits Usage
    0 0=Encrypt, 1=Decrypt
    5:1 Must be 10000b for SM4.
    6 ECB block mode
    7 CBC block mode
    8 CFB block mode
    9 OFB block mode
    10 CTR block mode
    11 Digest enable

    Remaining bits in rAX must be set to all-0s.

    Of bits 10:6 in rAX (block mode selection), exactly one bit must be set, or else behavior is undefined.
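
The two control-word layouts in the footnotes above can be assembled with simple bit operations. The sketch below is illustrative only: the function names are hypothetical, the AES round counts (10, 12, 14) are the standard values and are assumed to be what the round-count field expects, and the ACE2-only bits 4 and 5 of the REP XCRYPT* word are left at 0.

```python
AES_ROUNDS = {128: 10, 192: 12, 256: 14}
AES_KEY_SIZE_CODE = {128: 0b00, 192: 0b01, 256: 0b10}
CCS_MODE_BIT = {"ECB": 6, "CBC": 7, "CFB": 8, "OFB": 9, "CTR": 10}


def xcrypt_control_word(key_bits: int, decrypt: bool = False,
                        key_schedule_from_memory: bool = False,
                        intermediate: bool = False) -> bytes:
    """Build the 128-bit REP XCRYPT* control word (little-endian bytes)."""
    if key_bits not in AES_ROUNDS:
        raise ValueError("key size must be 128, 192 or 256 bits")
    if key_bits != 128 and not key_schedule_from_memory:
        # Bit 7 = 0 (compute the key schedule) is listed for 128-bit keys only.
        raise ValueError("192/256-bit keys need a key schedule loaded from memory")
    word = AES_ROUNDS[key_bits]                 # bits 3:0, round count
    word |= int(key_schedule_from_memory) << 7  # bit 7, key schedule source
    word |= int(intermediate) << 8              # bit 8, intermediate result
    word |= int(decrypt) << 9                   # bit 9, direction
    word |= AES_KEY_SIZE_CODE[key_bits] << 10   # bits 11:10, key size
    return word.to_bytes(16, "little")          # bits 127:12 stay 0


def ccs_encrypt_control_word(mode: str, decrypt: bool = False,
                             digest: bool = False) -> int:
    """Build the CCS_ENCRYPT control word value for rAX (SM4)."""
    word = int(decrypt)                # bit 0, direction
    word |= 0b10000 << 1               # bits 5:1, fixed value for SM4
    word |= 1 << CCS_MODE_BIT[mode]    # exactly one of bits 10:6
    word |= int(digest) << 11          # bit 11, digest enable
    return word


if __name__ == "__main__":
    print(xcrypt_control_word(128).hex())
    print(hex(ccs_encrypt_control_word("CBC", decrypt=True)))
```

Taking the block mode as a single dictionary key means exactly one of bits 10:6 is set by construction, which matches the requirement stated for CCS_ENCRYPT.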

References

  1. ^ Intel, Digital Random Number Generator (DRNG) Software Implementation Guide rev 2.1, Oct 17, 2018, sections 5.2 and 5.3. Archived on Nov 19, 2021.
  2. ^ Michal Ludvig, VIA PadLock—Wicked Fast Encryption, Linux Journal, Apr 6, 2005. Archived on Jun 20, 2005.
  3. ^ Stack Overflow, Streaming SHA calculation using VIA's Padlock Hashing Engine?, Aug 11, 2014. Archived on Jun 14, 2019.
    The PadLock SDK (v3.1) referenced in the Stack Overflow answer can be downloaded from the Crypto++ wiki (accessed on Aug 11, 2023) or the Wayback Machine.
  4. ^ Zhaoxin, Core Technology | Instructions for the use of accelerated instructions for national encryption algorithm based on Zhaoxin processor (in Chinese). Archived on Jan 5, 2022.
  5. ^ Zhaoxin, GMI User Manual v1.0 (in Chinese). Archived on Feb 28, 2022.