<html>
<head>
<meta http-equiv="content-type" content="text/html; charset=utf-8">
</head>
<body text="#000000" bgcolor="#FFFFFF">
<div class="moz-text-html" lang="x-unicode">
<div dir="ltr">
<div>The following is an in-depth analysis of the topic by Tim
McLean (<a class="moz-txt-link-freetext" href="https://www.chosenplaintext.ca/">https://www.chosenplaintext.ca/</a>). Posting his thoughts
with the permission.<br>
<br>
***<br>
<br>
Hmmmmm, this is an interesting question!</div>
<div>It seems like it would depend significantly on how postgres
implements the index. I think it would definitely make the
attack harder.<br>
</div>
<div><br>
</div>
<div>It looks like 'hash' indexes in postgres are implemented
using linear hashing: <a
href="https://github.com/postgres/postgres/tree/master/src/backend/access/hash">https://github.com/postgres/postgres/tree/master/src/backend/access/hash</a></div>
<div>Roughly, a hash function <i>H</i> is applied to a key <i>k</i>
(in our case, k = the session id), and the last <i>b</i> bits
of <i>H</i>(<i>k</i>) are used to determine the memory
address of one of 2^<i>b</i> buckets.</div>
<div>Inside the bucket, a binary search is used to find the
entry matching <i>H</i>(<i>k</i>).<br>
</div>
<div>(There's also some fancy stuff going on to handle adding
more buckets as the index grows in size, so <i>b</i> will
change over time, but that's not too relevant here)</div>
<div><br>
</div>
<div>So it seems like timing attacks could discover a fair bit
of information about <i>H</i>(<i>k</i>). Cache timing attacks
could figure out which bucket was accessed, and maybe where in
the bucket the binary search ended up at. Also, the binary
search might involve comparisons that are not timing safe.<br>
</div>
<div><br>
</div>
<div>Now, learning <i>H</i>(<i>k</i>) might not be useful to
the attacker, since hash functions can be irreversible. So,
what is <i>H</i>?</div>
<div><br>
</div>
<div><i>H</i> seems to be a custom hash function design: <a
href="https://github.com/postgres/postgres/blob/73b7f48f78d27b1baf1a6541cbaae0fe6bd6186d/src/backend/access/hash/hashfunc.c#L332">https://github.com/postgres/postgres/blob/73b7f48f78d27b1baf1a6541cbaae0fe6bd6186d/src/backend/access/hash/hashfunc.c#L332</a></div>
<div>Since it's not a cryptographic hash, the normal security
guarantees do not apply. Learning <i>H</i>(<i>k</i>) might
reveal useful info about <i>k</i>! So that may be a problem.<br>
</div>
<div><br>
</div>
<div>(Aside: this all makes me wonder if postgres is vulnerable
to a denial of service attack where an attacker purposely
supplies values that all hash to the same bucket index,
forcing worst-case performance?)</div>
<div><br>
</div>
<div>Another thought that I have: if <i>H</i> is weak enough,
then once we have learned <i>H</i>(<i>k</i>), we may be able
to generate preimages (other values that have the same hash)
that look like session ids.</div>
<div>So we can generate <i>k</i>_0, <i>k</i>_1, ..., <i>k</i>_F
such that <i>H</i>(<i>k</i>_0) = <i>H</i>(<i>k</i>_1) = ...
= <i>H</i>(<i>k</i>_F) = <i>H</i>(<i>k</i>). In other words,
we find a bunch of session id strings that hash to the right
value.</div>
<div>Maybe we can make it so that <i>k</i>_0 starts with '0', <i>k</i>_1
starts with '1', ..., <i>k</i>_F starts with 'F'.<br>
</div>
<div>My understanding is that, if we try a session id that has
the right hash value, postgres will access the right bucket,
find the entry with the real session id, and then do a string
comparison between the real session id with the session id
that was provided. This comparison probably isn't timing safe!</div>
<div><br>
</div>
<div>I don't know if this attack would work though, since it
really depends on how strong <i>H</i> is.<br>
</div>
<div><br>
</div>
<div>***</div>
<div><br>
</div>
<div>How about a different solution? Let's assume the session
ids are generated using a cryptographically secure random
number generator and have 128 bits of entropy (e.g. 16 bytes
encoded as 32 hex chars).</div>
<div><br>
</div>
<div>If we hash the session id with a cryptographic hash (say,
SHA-256), it would be impossible to figure out the session id
from looking at the hash (it would take 2^127 guesses on
average, and the world doesn't have enough electricity to do
this many calculations -- no need for a password hash here).<br>
</div>
<div><br>
</div>
<div>So what if, instead of storing the session id in the
database, we store `sha256(session_id)` instead? When we get a
session id from a client, we hash the session id and do a
database query with the hash.</div>
<div><br>
</div>
<div>The database might do a string comparison that leaks
information about the hash -- that's fine. Even if the
attacker gets the full hash, they still can't figure out the
session id. The only code at risk for timing attacks now is
the piece that receives the session id from the client and
hashes it with SHA-256. It's unusual to see timing leaks in
SHA-256 though, so that's unlikely to be a problem.<br>
</div>
<div><br>
</div>
<div>The important piece here is that the server must do the
hashing. If we do the session id hashing on the client side,
then we're back to the original problem again :)<br>
</div>
<div><br>
</div>
<div>Tim<br>
</div>
</div>
<div class="gmail_extra"><br>
<div class="gmail_quote">On Wed, Jun 6, 2018 at 3:00 PM, Mikalai
Birukou <span dir="ltr"><<a href="mailto:mb@3nsoft.com"
target="_blank">mb@3nsoft.com</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">Hi Tim,<br>
<br>
Following timing attacks thoughts and a recent KWLUG talk
about Postgresql db, the following idea came to mind. What
do you think?<br>
<br>
Cheers.<br>
<br>
-------- Forwarded Message --------<br>
<br>
Context (as a follow up to Hadi's presentation on the 4-th
of June):<br>
<br>
Sometimes we store secret session ids in db, and we use
these for<br>
authentication. Usually there is query that get respective
record,<br>
searching a table for a given by user session id.<br>
Usual `WHERE` clause uses the most fast comparison, which
run timing is<br>
dependent on input values. This can be used as a base for an
attack with<br>
session id guessing via timing.<br>
<br>
Question:<br>
<br>
Hadi, and others, will hash index work as a quick fix here?<br>
<br>
1) Should ensure that a query, abusable by an attack, always
uses index.<br>
<br>
2) Index on a secret (e.g. session id) is a hash index. I
assume it<br>
means that provided by a user (attacker) value is hashed,
and fast<br>
comparison will occur on hashes.<br>
Hashes that are close to each other, like `a...`, `aD...`,
`aDf...`,<br>
which is a guessing sequence will be coming from values that
are not<br>
very close to each other. Unless hash has a simple unhash
function,<br>
there seem to be no way to arrange guessing sequence on an
attacking side.<br>
<br>
If both 1 and 2 are achievable, then a simple switch to hash
index can<br>
be a mitigation.<br>
Whereas, the other way, as heard on one of ProtonMail's tech
talks, is<br>
for developer to split a secret token into two part, with
only first<br>
part being used in `WHERE` clause, while biz app does
constant time<br>
comparison of the second part.<br>
<br>
______________________________<wbr>_________________<br>
kwlug-disc mailing list<br>
<a href="mailto:kwlug-disc@kwlug.org" target="_blank">kwlug-disc@kwlug.org</a><br>
<a
href="http://kwlug.org/mailman/listinfo/kwlug-disc_kwlug.org"
rel="noreferrer" target="_blank">http://kwlug.org/mailman/listi<wbr>nfo/kwlug-disc_kwlug.org</a><br>
<br>
</blockquote>
</div>
<br>
</div>
</div>
</body>
</html>