<html>

  <head>


    <meta http-equiv="content-type" content="text/html; charset=utf-8">

  </head>

  <body text="#000000" bgcolor="#FFFFFF">

    <div class="moz-text-html" lang="x-unicode">

      <div dir="ltr">

        <div>The following is an in-depth analysis of the topic by Tim

          McLean (<a class="moz-txt-link-freetext" href="https://www.chosenplaintext.ca/">https://www.chosenplaintext.ca/</a>). Posting his thoughts

          with the permission.<br>

          <br>

          ***<br>

          <br>

          Hmmmmm, this is an interesting question!</div>

        <div>It seems like it would depend significantly on how postgres

          implements the index. I think it would definitely make the

          attack harder.<br>

        </div>

        <div><br>

        </div>

        <div>It looks like 'hash' indexes in postgres are implemented

          using linear hashing: <a

href="https://github.com/postgres/postgres/tree/master/src/backend/access/hash">https://github.com/postgres/postgres/tree/master/src/backend/access/hash</a></div>

        <div>Roughly, a hash function <i>H</i> is applied to a key <i>k</i>

          (in our case, k = the session id), and the last <i>b</i> bits

          of <i>H</i>(<i>k</i>) are used to determine the memory

          address of one of 2^<i>b</i> buckets.</div>

        <div>Inside the bucket, a binary search is used to find the

          entry matching <i>H</i>(<i>k</i>).<br>

        </div>

        <div>(There's also some fancy stuff going on to handle adding

          more buckets as the index grows in size, so <i>b</i> will

          change over time, but that's not too relevant here)</div>

        <div><br>

        </div>

        <div>So it seems like timing attacks could discover a fair bit

          of information about <i>H</i>(<i>k</i>). Cache timing attacks

          could figure out which bucket was accessed, and maybe where in

          the bucket the binary search ended up at. Also, the binary

          search might involve comparisons that are not timing safe.<br>

        </div>

        <div><br>

        </div>

        <div>Now, learning <i>H</i>(<i>k</i>) might not be useful to

          the attacker, since hash functions can be irreversible. So,

          what is <i>H</i>?</div>

        <div><br>

        </div>

        <div><i>H</i> seems to be a custom hash function design: <a

href="https://github.com/postgres/postgres/blob/73b7f48f78d27b1baf1a6541cbaae0fe6bd6186d/src/backend/access/hash/hashfunc.c#L332">https://github.com/postgres/postgres/blob/73b7f48f78d27b1baf1a6541cbaae0fe6bd6186d/src/backend/access/hash/hashfunc.c#L332</a></div>

        <div>Since it's not a cryptographic hash, the normal security

          guarantees do not apply. Learning <i>H</i>(<i>k</i>) might

          reveal useful info about <i>k</i>! So that may be a problem.<br>

        </div>

        <div><br>

        </div>

        <div>(Aside: this all makes me wonder if postgres is vulnerable

          to a denial of service attack where an attacker purposely

          supplies values that all hash to the same bucket index,

          forcing worst-case performance?)</div>

        <div><br>

        </div>

        <div>Another thought that I have: if <i>H</i> is weak enough,

          then once we have learned <i>H</i>(<i>k</i>), we may be able

          to generate preimages (other values that have the same hash)

          that look like session ids.</div>

        <div>So we can generate <i>k</i>_0, <i>k</i>_1, ..., <i>k</i>_F

          such that <i>H</i>(<i>k</i>_0) = <i>H</i>(<i>k</i>_1) = ...

          = <i>H</i>(<i>k</i>_F) = <i>H</i>(<i>k</i>). In other words,

          we find a bunch of session id strings that hash to the right

          value.</div>

        <div>Maybe we can make it so that <i>k</i>_0 starts with '0', <i>k</i>_1

          starts with '1', ..., <i>k</i>_F starts with 'F'.<br>

        </div>

        <div>My understanding is that, if we try a session id that has

          the right hash value, postgres will access the right bucket,

          find the entry with the real session id, and then do a string

          comparison between the real session id with the session id

          that was provided. This comparison probably isn't timing safe!</div>

        <div><br>

        </div>

        <div>I don't know if this attack would work though, since it

          really depends on how strong <i>H</i> is.<br>

        </div>

        <div><br>

        </div>

        <div>***</div>

        <div><br>

        </div>

        <div>How about a different solution? Let's assume the session

          ids are generated using a cryptographically secure random

          number generator and have 128 bits of entropy (e.g. 16 bytes

          encoded as 32 hex chars).</div>

        <div><br>

        </div>

        <div>If we hash the session id with a cryptographic hash (say,

          SHA-256), it would be impossible to figure out the session id

          from looking at the hash (it would take 2^127 guesses on

          average, and the world doesn't have enough electricity to do

          this many calculations -- no need for a password hash here).<br>

        </div>

        <div><br>

        </div>

        <div>So what if, instead of storing the session id in the

          database, we store `sha256(session_id)` instead? When we get a

          session id from a client, we hash the session id and do a

          database query with the hash.</div>

        <div><br>

        </div>

        <div>The database might do a string comparison that leaks

          information about the hash -- that's fine. Even if the

          attacker gets the full hash, they still can't figure out the

          session id. The only code at risk for timing attacks now is

          the piece that receives the session id from the client and

          hashes it with SHA-256. It's unusual to see timing leaks in

          SHA-256 though, so that's unlikely to be a problem.<br>

        </div>

        <div><br>

        </div>

        <div>The important piece here is that the server must do the

          hashing. If we do the session id hashing on the client side,

          then we're back to the original problem again :)<br>

        </div>

        <div><br>

        </div>

        <div>Tim<br>

        </div>

      </div>

      <div class="gmail_extra"><br>

        <div class="gmail_quote">On Wed, Jun 6, 2018 at 3:00 PM, Mikalai

          Birukou <span dir="ltr"><<a href="mailto:mb@3nsoft.com"

              target="_blank">mb@3nsoft.com</a>></span> wrote:<br>

          <blockquote class="gmail_quote" style="margin:0 0 0

            .8ex;border-left:1px #ccc solid;padding-left:1ex">Hi Tim,<br>

            <br>

            Following timing attacks thoughts and a recent KWLUG talk

            about Postgresql db, the following idea came to mind. What

            do you think?<br>

            <br>

            Cheers.<br>

            <br>

            -------- Forwarded Message --------<br>

            <br>

            Context (as a follow up to Hadi's presentation on the 4-th

            of June):<br>

            <br>

            Sometimes we store secret session ids in db, and we use

            these for<br>

            authentication. Usually there is query that get respective

            record,<br>

            searching a table for a given by user session id.<br>

            Usual `WHERE` clause uses the most fast comparison, which

            run timing is<br>

            dependent on input values. This can be used as a base for an

            attack with<br>

            session id guessing via timing.<br>

            <br>

            Question:<br>

            <br>

            Hadi, and others, will hash index work as a quick fix here?<br>

            <br>

            1) Should ensure that a query, abusable by an attack, always

            uses index.<br>

            <br>

            2) Index on a secret (e.g. session id) is a hash index. I

            assume it<br>

            means that provided by a user (attacker) value is hashed,

            and fast<br>

            comparison will occur on hashes.<br>

            Hashes that are close to each other, like `a...`, `aD...`,

            `aDf...`,<br>

            which is a guessing sequence will be coming from values that

            are not<br>

            very close to each other. Unless hash has a simple unhash

            function,<br>

            there seem to be no way to arrange guessing sequence on an

            attacking side.<br>

            <br>

            If both 1 and 2 are achievable, then a simple switch to hash

            index can<br>

            be a mitigation.<br>

            Whereas, the other way, as heard on one of ProtonMail's tech

            talks, is<br>

            for developer to split a secret token into two part, with

            only first<br>

            part being used in `WHERE` clause, while biz app does

            constant time<br>

            comparison of the second part.<br>

            <br>

            ______________________________<wbr>_________________<br>

            kwlug-disc mailing list<br>

            <a href="mailto:kwlug-disc@kwlug.org" target="_blank">kwlug-disc@kwlug.org</a><br>

            <a

              href="http://kwlug.org/mailman/listinfo/kwlug-disc_kwlug.org"

              rel="noreferrer" target="_blank">http://kwlug.org/mailman/listi<wbr>nfo/kwlug-disc_kwlug.org</a><br>

            <br>

          </blockquote>

        </div>

        <br>

      </div>

    </div>

  </body>

</html>