(Pseudo)Random Data
PA193 – Secure coding
Petr Švenda
Zdeněk Říha
Faculty of Informatics, Masaryk University, Brno, CZ
Need for “random” data
• Games
• Simulations, …
• Crypto
– Symmetric keys
– Asymmetric keys
– Padding/salt
– Initialization vectors
– Challenges (for challenge – response protocols)
– …
“Random” data
• Sometimes (games, simulations) we only need data
with some statistical properties
– Evenly distributed numbers (from an interval)
– Long and complete cycle
• Large number of different values
• All values can be generated
• In crypto we also need unpredictability
– Even if you have seen all the “random” data generated
so far, you have no way to predict the data that will be
generated next
“Random” data generators
• Insecure random number generators
– noncryptographic pseudo-random number generators
– Often leak information about their internal state with each output
• Cryptographic pseudo-random number generators (PRNGs)
– Deterministically generate pseudorandom data from a seed
• “True” random data generators
– Entropy harvesters
– gather entropy from other sources and present it directly
What (pseudo)random data to use?
• Avoid using noncryptographic random number
generators
• For many purposes the right way is to get the seed
from the true random number generator and then
use it in the pseudorandom number generator
(PRNG)
– PRNGs are deterministic: with the same seed they produce
the same pseudorandom sequence
• There are situations where a PRNG is not enough
– E.g. one-time pad
Noncryptographic generators
• Standard rand()/srand(), random()/srandom()
functions
– libc
• “Mersenne Twister”
• Linear feedback shift registers
• Anything else not labeled as a cryptographic PRNG…
• Not to be used for most purposes
Noncryptographic generators
Source: http://xkcd.com/
Source: Writing Secure Code, 2nd edition
PRNG
• Cryptographic pseudo-random number generators
are still predictable if you somehow know their
internal state.
• Assuming the generator was seeded with sufficient
entropy and assuming the cryptographic algorithms
have the security properties they are expected to
have, cryptographic generators do not quickly
reveal significant amounts of their internal state.
• Protect the seed of the PRNG!
• Entropy of the seed matters!
Entropy of the seed
How much entropy do we need to seed a cryptographic generator securely?
Give as much entropy as the random number generator can accept. The entropy you get
sets the maximum security level of your data protected with that entropy, directly or
indirectly.
For example, if a 256-bit AES key is obtained with a PRNG seeded with 56 bits of entropy, then any
data encrypted with the 256-bit AES key will be no more secure than data encrypted with a 56-bit
DES key.
Source: Secure Programming Cookbook
Entropy estimates
• Entropy
– Shannon entropy: $H = -\sum_i p_i \log_2 p_i$
– Min-entropy: $H_\infty = -\log_2 \max_i p_i$
• Difficulty of measurement/estimates
– For example, the digits of π appear to be a completely
random sequence that should pass any statistical test for
randomness. Yet they are also completely predictable.
Entropy estimates
• After figuring out how much entropy is in a piece of data (e.g.
expected entropy is 160 bits), it is wise to divide the estimate
by a factor of 4 to 8 to be conservative.
• Because entropy is easy to overestimate, you should
generally cryptographically postprocess any entropy
collected (a process known as whitening) before using it.
– E.g. use hash functions (SHA2)
• Most PRNGs take a fixed-size seed, and you want to
maximize the entropy in that seed. However, collected
entropy is usually distributed sparsely through a large
amount of data, so it has to be condensed into the seed.
– E.g. use hash functions (SHA2)
Tips on collecting entropy
• Make sure that any data coming from an entropy-producing source is postprocessed
with cryptography to remove any lingering statistical bias and to help ensure that your
data has at least as many bits of entropy input as bits you want to output.
• Make sure you use enough entropy to seed any pseudo-random number generator
securely. Try not to use fewer than 128 bits.
• When choosing a pseudo-random number generator, make sure to pick one that
explicitly advertises that it is cryptographically strong. If you do not see the word
“cryptographic” anywhere in association with the algorithm, it is probably not good for
security purposes, only for statistical purposes.
• When selecting a PRNG, prefer solutions with a refereed proof of security bounds.
Counter mode, in particular, comes with such a proof, saying that if you use a block
cipher with 128-bit keys and 128-bit blocks seeded with 128 bits of pure entropy, and
if the cipher is a pseudo-random permutation, the generator should lose a bit of entropy
after $2^{64}$ blocks of output.
• Use postprocessed entropy for seeding pseudo-random number generators or, if
available, for picking highly important cryptographic keys. For everything else, use
pseudo-randomness, as it is much, much faster.
Source: Secure Programming Cookbook
Unix Infrastructure
• Special files – reading files provides
(pseudo)random data
– /dev/random
• Output is backed by collected entropy
• Provides random data
• Can block the caller until enough entropy is available (blocking)
– /dev/urandom
• Based on a cryptographic pseudorandom generator
• Amount of entropy not guaranteed
• Always returns quickly (non-blocking)
Unix Infrastructure
• Available on most modern Unix-like OS
– Including Linux, *BSD, etc.
• Each OS implements the functionality
independently
– Quality of the implementation can vary from OS to OS
• Usually no need to worry
• The core of the system is the seed of the PRNG
– The entropy of the seed may be low during/just after
booting (in particular on diskless stations, virtual hardware, etc.)
– The seed is often saved at shutdown
Unix infrastructure
• Operation on files
– To get random data, open the file and read from it
• use read(2)
• it returns the number of bytes actually read
• short reads are possible; if a signal interrupts the call before any
data is read, it returns -1 with errno set to EINTR
• It is also possible to write to /dev/random.
– This allows any user to mix random data into the pool.
– Non-random data is harmless, because only a privileged user
can issue the ioctl needed to increase the entropy estimate.
• Linux
– The current amount of entropy and the size of the Linux kernel
entropy pool are available in /proc/sys/kernel/random/.
Example: Linux
Example: FreeBSD
• FreeBSD implements a 256-bit variant of the Yarrow algorithm, intended to provide a
cryptographically secure pseudorandom stream; this replaced a previous Linux-style
random device. Unlike the Linux /dev/random, the FreeBSD /dev/random device never
blocks. Its behavior is similar to the Linux /dev/urandom, and /dev/urandom on
FreeBSD is linked to /dev/random.
• Yarrow is based on the assumptions that modern PRNGs are very secure if their
internal state is unknown to an attacker, and that they are better understood than the
estimation of entropy. Whilst entropy-pool-based methods are completely secure if
implemented correctly, if they overestimate their entropy they may become less secure
than well-seeded PRNGs. In some cases an attacker may have a considerable amount
of control over the entropy; for example, a diskless server may get almost all of it from
the network, rendering it potentially vulnerable to man-in-the-middle attacks. Yarrow
places a lot of emphasis on avoiding any pool compromise and on recovering from it as
quickly as possible. It is regularly reseeded; on a system with a small amount of network
and disk activity, this is done after a fraction of a second.
Source: Wikipedia
MS Windows – (pseudo)random data
• Function CryptGenRandom()
– Part of MS CryptoAPI
• First use CryptAcquireContext()
• and then CryptGenRandom()
– Based on a PRNG
• Internally, CryptGenRandom() uses
RtlGenRandom()
– RtlGenRandom() can be called directly
– Does not require loading CryptoAPI
MSDN: CryptGenRandom()
Source: MSDN
MSDN: RtlGenRandom()
Source: MSDN
CryptGenRandom() vs. RtlGenRandom()
"Historically, we always told developers not to use functions such as rand to
generate keys, nonces and passwords, rather they should use functions like
CryptGenRandom, which creates cryptographically secure random numbers. The
problem with CryptGenRandom is you need to pull in CryptoAPI
(CryptAcquireContext and such) which is fine if you're using other crypto functions.
On a default Windows XP and later install, CryptGenRandom calls into a function
named ADVAPI32!RtlGenRandom, which does not require you load all the CryptAPI
stuff. In fact, the new Whidbey CRT function, rand_s calls RtlGenRandom".
Source: http://blogs.msdn.com/b/michael_howard/archive/2005/01/14/353379.aspx
CryptGenRandom() documentation
With Microsoft CSPs, CryptGenRandom() uses the same random number generator used by other
security components. This allows numerous processes to contribute to a system-wide seed. CryptoAPI
stores an intermediate random seed with every user. To form the seed for the random number
generator, a calling application supplies bits it might have—for instance, mouse or keyboard timing
input—that are then combined with both the stored seed and various system data and user data such as
the process ID and thread ID, the system clock, the system time, the system counter, memory status,
free disk clusters, the hashed user environment block. This result is used to seed the pseudorandom
number generator (PRNG). In Windows Vista with Service Pack 1 (SP1) and later, an implementation of
the AES counter-mode based PRNG specified in NIST Special Publication 800-90 is used. In
Windows Vista, Windows Storage Server 2003, and Windows XP, the PRNG specified in Federal
Information Processing Standard (FIPS) 186-2 is used. If an application has access to a good random
source, it can fill the pbBuffer buffer with some random data before calling CryptGenRandom(). The
CSP then uses this data to further randomize its internal seed. It is acceptable to omit the step of
initializing the pbBuffer buffer before calling CryptGenRandom().
Source: MSDN
Design of the old Windows PRNG (up to Vista)
Source: Writing Secure Code, 2nd edition
The entropy in Windows comes from …
Source: Writing Secure Code, 2nd edition
Random data in OpenSSL
• OpenSSL exports its own API for manipulating random numbers. It
has its own cryptographic PRNG, which must be securely seeded.
• To use the OpenSSL randomness API, you must include
openssl/rand.h in your code and link against the OpenSSL crypto
library.
• void RAND_seed(const void *buf, int num);
• void RAND_add(const void *buf, int num, double entropy);
• int RAND_load_file(const char *filename, long max_bytes);
– Pure entropy expected
• int RAND_write_file(const char *filename);
– To save the state of the PRNG (seed file)
• int RAND_bytes(unsigned char *buf, int num);
HW random number generators
• Require specific devices
– More or less common
– Price
• LavaRnd (Lava Lamp)
• Random.org
• Special devices
• Crypto devices
– Smartcard
– HSM
– SSL cards
PRNG Standards
• FIPS 186-2 (later replaced by FIPS 186-3 and 186-4)
• NIST SP 800-90A
– Recommendation for Random Number Generation Using
Deterministic Random Bit Generators
– Hash_DRBG
– HMAC_DRBG
– CTR_DRBG
– Dual EC DRBG (problematic)
• Fortuna
• ANSI X9.17-1985, Appendix C
• ANSI X9.31-1998, Appendix A.2.4
• ANSI X9.62-2005, Annex D
(P)RNG Standards
• NIST SP 800-90B
– Recommendation for the Entropy Sources Used for
Random Bit Generation
• NIST SP 800-90C
– Recommendation for Random Bit Generator (RBG)
Constructions
ANSI X9.17
• ANSI X9.17 standard
– It takes as input a TDEA (with 2 DES keys) key bundle k and
(the initial value of) a 64-bit random seed s. Each time a random
number is required it:
• Obtains the current date/time D to the maximum resolution
possible.
• Computes a temporary value $t = \mathrm{TDEA}_k(D)$
• Computes the random value $x = \mathrm{TDEA}_k(s \oplus t)$
• Updates the seed $s = \mathrm{TDEA}_k(x \oplus t)$
– Obviously, the technique is easily generalized to any block
cipher
• AES has been suggested…
ANSI X9.17
[Figure: ANSI X9.31 generator; inputs: internal state, timestamp; output: pseudorandom data]
ANSI X9.31
• Security of X9.31 is not considered sufficient
• Bad recovery after internal state compromise
• The only entropy added later are the timestamps
• The entropy of timestamps is problematic
• Too dependent on the entropy of the initial values of
– The seed
– The symmetric encryption keys (3DES or AES)
Fortuna
• Designed by Bruce Schneier and Niels Ferguson
• Successor of the Yarrow algorithm
• Designed to recover quickly from internal state
compromise
• Entropy is added frequently
• Fortuna is state of the art
Fortuna
• It is composed of:
– Generator: produces pseudo-random data.
• Based on any good block cipher (e.g. AES, Serpent, Twofish). The cipher
runs in counter mode, encrypting successive values of an
incrementing counter. The key is changed periodically (a single request
yields at most 1 MB of data, and the key is changed after every request).
– Entropy accumulator: collects genuinely random data and reseeds the
generator.
• The entropy accumulator is designed to be resistant against injection
attacks thanks to the use of 32 pools of entropy (at the nth reseeding of
the generator, pool k is used only if $2^k$ divides n).
– Seed file: stores state
NIST SP 800-90A
• NIST Special Publication 800-90A
– Recommendation for Random Number Generation Using
Deterministic Random Bit Generators
• Mechanisms based on hash functions
– Hash_DRBG
– HMAC_DRBG
• Mechanisms based on block ciphers
– CTR_DRBG
• Mechanisms Based on Number Theoretic Problems
– Dual Elliptic Curve Deterministic RBG (Dual_EC_DRBG)
ECC NIST random number generator
(Dual_EC_DRBG)
• Problematic
• Even more problematic after Snowden
The Guardian and The New York Times have reported that the National Security Agency (NSA)
inserted a CSPRNG into NIST SP 800-90 that had a backdoor which allows the NSA to readily decrypt
material that was encrypted with the aid of Dual_EC_DRBG. Both papers report that, as independent
security experts long suspected, the NSA has been introducing weaknesses into CSPRNG standard
800-90; this being confirmed for the first time by one of the top secret documents leaked to the
Guardian by Edward Snowden. The NSA worked covertly to get its own version of the NIST draft
security standard approved for worldwide use in 2006. The leaked document states that "eventually,
NSA became the sole editor." In spite of the known potential for a backdoor and other known significant
deficiencies with Dual_EC_DRBG, several companies such as RSA Security continued using
Dual_EC_DRBG until the backdoor was confirmed in 2013.
Source: http://en.wikipedia.org/wiki/Cryptographically_secure_pseudorandom_number_generator#NSA_backdoor_in_the_Dual_EC_DRBG_PRNG
Testing randomness
• Testing whether the generated sequence of bits
“looks random”, i.e. has certain statistical
properties
– E.g. the number of 0s versus the number of 1s in the
sequence of bits.
• Two important test suites
– NIST
– Diehard
NIST tests
• NIST Special Publication 800-22rev1a
– “A Statistical Test Suite for the Validation of Random
Number Generators and Pseudo Random Number
Generators for Cryptographic Applications”
– Revised in April 2010
– Textual description of the tests (including the underlying
mathematics/statistics)
– Software implementation
• STS-2.1.2
Source: NIST Special Publication 800-22rev1a
NIST tests
• The 15 tests are:
– The Frequency (Monobit) Test,
– Frequency Test within a Block,
– The Runs Test,
– Tests for the Longest-Run-of-Ones in a Block,
– The Binary Matrix Rank Test,
– The Discrete Fourier Transform (Spectral) Test,
– The Non-overlapping Template Matching Test,
– The Overlapping Template Matching Test,
– Maurer's "Universal Statistical" Test,
– The Linear Complexity Test,
– The Serial Test,
– The Approximate Entropy Test,
– The Cumulative Sums (Cusums) Test,
– The Random Excursions Test, and
– The Random Excursions Variant Test.
Source: NIST Special Publication 800-22rev1a
NIST test – examples of tests
Source: NIST Special Publication 800-22rev1a
Diehard tests
• Set of statistical tests to verify the quality of random
number generators.
• Developed by George Marsaglia.
• Description of the tests and an implementation
• Alternative GPL implementation “Dieharder”
– Also contains an implementation of the NIST STS tests
Diehard tests
• Birthday spacings: Choose random points on a large interval. The spacings between the points should be asymptotically
exponentially distributed. The name is based on the birthday paradox.
• Overlapping permutations: Analyze sequences of five consecutive random numbers. The 120 possible orderings should occur
with statistically equal probability.
• Ranks of matrices: Select some number of bits from some number of random numbers to form a matrix over {0,1}, then
determine the rank of the matrix. Count the ranks.
• Monkey tests: Treat sequences of some number of bits as "words". Count the overlapping words in a stream. The number of
"words" that don't appear should follow a known distribution. The name is based on the infinite monkey theorem.
• Count the 1s: Count the 1 bits in each of either successive or chosen bytes. Convert the counts to "letters", and count the
occurrences of five-letter "words".
• Parking lot test: Randomly place unit circles in a 100 x 100 square. If the circle overlaps an existing one, try again. After 12,000
tries, the number of successfully "parked" circles should follow a certain normal distribution.
• Minimum distance test: Randomly place 8,000 points in a 10,000 x 10,000 square, then find the minimum distance between the
pairs. The square of this distance should be exponentially distributed with a certain mean.
• Random spheres test: Randomly choose 4,000 points in a cube of edge 1,000. Center a sphere on each point, whose radius is
the minimum distance to another point. The smallest sphere's volume should be exponentially distributed with a certain mean.
• The squeeze test: Multiply $2^{31}$ by random floats on [0,1) until you reach 1. Repeat this 100,000 times. The number of floats
needed to reach 1 should follow a certain distribution.
• Overlapping sums test: Generate a long sequence of random floats on [0,1). Add sequences of 100 consecutive floats. The
sums should be normally distributed with characteristic mean and sigma.
• Runs test: Generate a long sequence of random floats on [0,1). Count ascending and descending runs. The counts should
follow a certain distribution.
• The craps test: Play 200,000 games of craps, counting the wins and the number of throws per game. Each count should follow
a certain distribution.
Using passwords to derive cryptographic keys
• Entropy of the password
– Length
– Character set
• Do not use the password directly as key
• Cryptographically process the password
– E.g. hash it
• Derivation should be slow (e.g. 1 second)
– To slow down brute force attacks
PKCS#5
• PBKDF1 (Password-Based Key Derivation
Function 1)
– Up to 160 bits
– Old, superseded by PBKDF2
• PBKDF2 (Password-Based Key Derivation
Function 2)
PBKDF2
• $DK = \mathrm{PBKDF2}(PRF, Password, Salt, c, dkLen)$
– PRF is a pseudorandom function with output length hLen
– c is the number of iterations
– dkLen is the length of the derived key
• $DK = T_1 \| T_2 \| \dots \| T_{\lceil dkLen/hLen \rceil}$ (the last block truncated so that DK is dkLen long)
– $T_i = F(Password, Salt, c, i)$
• $F(Password, Salt, c, i) = U_1 \oplus U_2 \oplus \dots \oplus U_c$
– $U_1 = PRF(Password, Salt \| \mathrm{INT\_32\_BE}(i))$
– $U_2 = PRF(Password, U_1)$
– …
Debian random number generator flaw
On May 13th, 2008 the Debian project announced that Luciano Bello found an interesting vulnerability in the
OpenSSL package they were distributing. The bug in question was caused by the removal of the following line of
code from md_rand.c
MD_Update(&m,buf,j);
[ .. ]
MD_Update(&m,buf,j); /* purify complains */
These lines were removed because they caused the Valgrind and Purify tools to produce warnings about the use of
uninitialized data in any code that was linked to OpenSSL.
Removing this code has the side effect of crippling the seeding process for the OpenSSL PRNG. Instead of mixing in
random data for the initial seed, the only "random" value that was used was the current process ID. On the Linux
platform, the default maximum process ID is 32,768, resulting in a very small number of seed values being used for
all PRNG operations.
Source: https://www.schneier.com/blog/archives/2008/05/random_number_b.html
Debian flaw – impact
This is a Debian-specific vulnerability which does not affect other operating systems which are not
based on Debian. However, other systems can be indirectly affected if weak keys are imported into
them.
It is strongly recommended that all cryptographic key material which has been generated by
OpenSSL versions starting with 0.9.8c-1 on Debian systems is recreated from scratch. Furthermore,
all DSA keys ever used on affected Debian systems for signing or authentication purposes should
be considered compromised; the Digital Signature Algorithm relies on a secret random value used
during signature generation.
The first vulnerable version, 0.9.8c-1, was uploaded to the unstable distribution on 2006-09-17, and
has since that date propagated to the testing and current stable (etch) distributions. The old stable
distribution (sarge) is not affected.
Affected keys include SSH keys, OpenVPN keys, DNSSEC keys, and key material for use in X.509
certificates and session keys used in SSL/TLS connections. Keys generated with GnuPG or
GNUTLS are not affected, though.
Source: http://www.debian.org/security/2008/dsa-1571
Paper: Lousy Random Numbers Cause Insecure
Public Keys
In this paper we complement previous studies by concentrating on computational and randomness properties of
actual public keys, issues that are usually taken for granted. Compared to the collection of certificates considered in
[12], where shared RSA moduli are "not very frequent", we found a much higher fraction of duplicates. More
worrisome is that among the 4.7 million distinct 1024-bit RSA moduli that we had originally collected, more than
12500 have a single prime factor in common. That this happens may be crypto-folklore, but it was new to us, and it
does not seem to be a disappearing trend: in our current collection of 7.1 million 1024-bit RSA moduli, almost 27000
are vulnerable and 2048-bit RSA moduli are affected as well. When exploited, it could affect the expectation of security
that the public key infrastructure is intended to achieve.
We checked the computational properties of millions of public keys that we collected on the web. The majority does
not seem to suffer from obvious weaknesses and can be expected to provide the expected level of security. We
found that on the order of 0.003% of public keys is incorrect, which does not seem to be unacceptable. We were
surprised, however, by the extent to which public keys are shared among unrelated parties. For ElGamal and DSA
sharing is rare, but for RSA the frequency of sharing may be a cause for concern. What surprised us most is that
many thousands of 1024-bit RSA moduli, including thousands that are contained in still valid X.509 certificates,
offer no security at all. This may indicate that proper seeding of random number generators is still a problematic
issue....
Source: https://www.schneier.com/blog/archives/2012/02/lousy_random_nu.html
Netscape <2: SSL random number weakness
Source: http://www.hit.bme.hu/~buttyan/courses/Revkomarom/prng.pdf
Netscape <2: SSL random number weakness
• Access to the machine with browser
– pid, ppid is known
– time can be guessed to within ±1 second
– microseconds unknown: about 20 bits ($10^6 \approx 2^{20}$)
• No access to machine with browser
– Entropy of the seed increases to max. 47 bits
• Contrast with 128-bit session key
Source: http://www.hit.bme.hu/~buttyan/courses/Revkomarom/prng.pdf
Code red worm: IP list generator
On July 12, 2001, a worm began to exploit the aforementioned buffer-overflow vulnerability in Microsoft's IIS
webservers. Upon infecting a machine, the worm checks to see if the date (as kept by the system clock) is
between the first and the nineteenth of the month. If so, the worm generates a random list of IP addresses and
probes each machine on the list in an attempt to infect as many computers as possible. However, this first
version of the worm uses a static seed in its random number generator and thus generates identical lists of IP
addresses on each infected machine. The first version of the worm spread slowly, because each infected
machine began to spread the worm by probing machines that were either infected or impregnable. The worm is
programmed to stop infecting other machines on the 20th of every month. In its next attack phase, the worm
launches a Denial-of-Service attack against www1.whitehouse.gov from the 20th-28th of each month.
Code-Red version 2 uses a random seed, so each infected computer tries to infect a different list of randomly
generated IP addresses. This seemingly minor change had a major impact: more than 359,000 machines
were infected with Code-Red version 2 in just fourteen hours.
Source: http://www.caida.org/research/security/code-red/
CVE-2014-9293
• When no authentication key is set in the
configuration file, ntpd(8) would generate a random
key that uses a non-linear additive feedback
random number generator seeded with very few
bits of entropy. [CVE-2014-9293]
• The ntp-keygen(8) utility is also affected by a similar
issue. [CVE-2014-9294]
Texas hold’em Poker application
• Based its random number generation on the standard
Borland random number generator
• “Reliable Software Technologies” developed a tool
that could predict the rest of the shuffled deck once five cards were known.
Source: http://www.ibm.com/developerworks/library/s-playing/
Dice-o-matic