<html>

  <head>

    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">

  </head>

  <body>

    <br>

    <blockquote type="cite"

      cite="mid:5fd66107.1c69fb81.9ecf5.ba03@mx.google.com">


      <div dir="auto">Beefing up out-of-order execution

        is definitely the main reason why the M1 SoC performs well - but

        this sort of execution can only be efficient if the instruction

        size remains constant, as is the case with RISC-only

        architectures like ARM. This is where SGI MIPS was headed

        before they died.</div>

    </blockquote>

    <p>A pipelined design essentially partitions the work so that

      speed-of-light-sensitive logic stays localized in regions,

      which lets the silicon as a whole sprawl.<br>

    </p>

    <blockquote type="cite"

      cite="mid:5fd66107.1c69fb81.9ecf5.ba03@mx.google.com">

      <div dir="auto">The main takeaway here IMO is that now that Apple

        has demonstrated that a phone SoC can be beefed up to perform

        general-purpose computing well, we'll start seeing more of this

        hit the market in the workstation space.  And when those fast

        SoC systems start running Linux, developers will flock to them

        and that will accelerate the adoption of ARM in the

        cloud/datacenter.  Yes, Amazon has their nice Graviton platform,

        but without developers running ARM on their workstations,

        adoption of ARM in the cloud/datacenter is not going to gain a

        lot of traction.</div>

      <div dir="auto"><br>

      </div>

      <div dir="auto">If Apple allowed Linux to run natively on their M1

        SoC, it would actually be a game-changer in this space. But that

        would require they release their SoC documentation to the open

        source community, as well as digitally sign Linux boot

        components in their secure enclave (neither of which is likely

        because Apple is as closed as Oracle's wallet ;-)</div>

      <div dir="auto"><br>

      </div>

      <div dir="auto">What I'm most interested in seeing in the coming

        years is what Nvidia is planning for ARM (no matter what they

        say, they definitely have a plan in mind if they bought ARM).</div>

    </blockquote>

    <p>There is this new "make your own RISC-V silicon" shop,

      <a class="moz-txt-link-freetext" href="https://www.sifive.com/risc-v-core-ip">https://www.sifive.com/risc-v-core-ip</a>. They also seem to have a whole

      RISC-V board for system developers.<br>

    </p>

    <p>Maybe this approach will open the floodgates of choice, goodness,

      rainbows, etc. :)<br>

    </p>

    <blockquote type="cite"

      cite="mid:5fd66107.1c69fb81.9ecf5.ba03@mx.google.com">

      <blockquote type="cite">

        <div>

          <div dir="auto">Found a nice blog post explaining why M1 is

            fast. </div>

          <div><a

href="https://debugger.medium.com/why-is-apples-m1-chip-so-fast-3262b158cba2"

              moz-do-not-send="true">https://debugger.medium.com/why-is-apples-m1-chip-so-fast-3262b158cba2</a></div>

        </div>

      </blockquote>

      <p>I knew it! I felt it all my life! It takes an inordinate

        amount of time to prepare a place for painting, more than the

        painting itself takes. ... Eight preppers (decoders) of micro-ops

        in the M1 versus four in Intel/AMD.<br>

      </p>

      <p>I still have a feeling that co-locating memory also helps the

        preppers' results, on top of the benefit of RISC's fixed

        instruction length.</p>
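      <p>A toy sketch of why fixed-length instructions make life easier
        for parallel decoders (written in Rust; the function names and
        the four-instruction window are made up for illustration, and
        this is not how any real decoder is built): with a fixed width,
        each start offset depends only on the instruction's index, while
        with variable widths it depends on the lengths of everything
        before it.</p>
<pre>
```rust
// Toy model, not a real ISA: find where each of 4 instructions starts.

// Fixed width: the start of slot i depends only on i, so four
// decoders could each compute their own start offset independently.
fn fixed_starts(width: usize) -> [usize; 4] {
    let mut starts = [0usize; 4];
    for i in 0..4 {
        starts[i] = i * width; // no dependency on earlier instructions
    }
    starts
}

// Variable width: the start of slot i depends on the lengths of all
// earlier instructions, so this scan is inherently sequential.
fn variable_starts(lengths: [usize; 4]) -> [usize; 4] {
    let mut starts = [0usize; 4];
    let mut offset = 0;
    for i in 0..4 {
        starts[i] = offset;
        offset += lengths[i]; // this length must be known before the next start
    }
    starts
}
```
</pre>
      <p>Under these assumptions, fixed_starts(4) gives offsets 0, 4, 8,
        12 with no dependency between slots, while
        variable_starts([1, 3, 2, 5]) gives 0, 1, 4, 6 only after
        walking the lengths in order.</p>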

      <p>It also explains the talk of AMD going with ARM. RISC-y business

        :)<br>

      </p>

      <blockquote type="cite">

        <div>

          <div class="gmail_quote">

            <blockquote style="margin:0 0 0 .8ex;border-left:1px #ccc

              solid;padding-left:1ex" class="gmail_quote">

              <div>

                <div>Rust provides both Atomic Reference Counting

                  (called Arc) and non-atomic Reference Counting (called

                  Rc). You choose the one that makes sense. Hopefully

                  the type system complains if you use Rc in a context

                  where atomicity is required, but I don't use Rust. C++

                  provides only atomic refcounting in the standard

                  library; for the other kind you roll your own (which I

                  have done).<br>
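                  <div>A minimal sketch of that choice, assuming nothing
                    beyond the Rust standard library (the helper names
                    count_with_arc and count_with_rc are hypothetical):
                    Arc clones can be moved into threads, while Rc is
                    not Send, so the compiler rejects moving an Rc clone
                    into thread::spawn.</div>
<pre>
```rust
use std::cell::Cell;
use std::rc::Rc;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;

// Hypothetical helper: bump one shared counter from several threads.
// Arc updates its reference count atomically, so clones may cross threads.
fn count_with_arc(threads: usize, per_thread: usize) -> usize {
    let total = Arc::new(AtomicUsize::new(0));
    let mut handles = Vec::new();
    for _ in 0..threads {
        let total = total.clone(); // atomic refcount increment
        handles.push(thread::spawn(move || {
            for _ in 0..per_thread {
                total.fetch_add(1, Ordering::Relaxed);
            }
        }));
    }
    for h in handles {
        h.join().unwrap(); // join makes each thread's updates visible here
    }
    total.load(Ordering::Relaxed)
}

// Single-threaded counterpart: Rc keeps a plain, non-atomic refcount.
fn count_with_rc(owners: usize, per_owner: usize) -> usize {
    let total = Rc::new(Cell::new(0usize));
    let mut clones = Vec::new();
    for _ in 0..owners {
        clones.push(total.clone()); // non-atomic refcount increment
    }
    for c in clones.iter() {
        for _ in 0..per_owner {
            c.set(c.get() + 1);
        }
    }
    // Moving one of these clones into thread::spawn would be a
    // compile error, because Rc is not Send.
    total.get()
}
```
</pre>
                  <div>With these definitions, count_with_arc(4, 1000)
                    returns 4000 and count_with_rc(3, 10) returns 30.</div>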

                  <blockquote id="m_816261540811853656qt" type="cite">

                    <p>&lt;moving into discussing silicon and near

                      it&gt;<br>

                    </p>

                    <blockquote type="cite">

                      <div>Another trick is that Apple's dev languages

                        and frameworks (Swift and Objective-C) use

                        reference counting, which requires atomic

                        increments and decrements. On Intel, these

                        operations are five times slower than non-atomic

                        operations; on Apple Silicon they run at the

                        same speed. This is something I wish the other

                        CPU vendors would get right, because refcounting

                        has some technical advantages over tracing GC,

                        and I use it in software I write. C++ and Rust,

                        both "performance" languages, provide

                        refcounting but not tracing GC.<br>

                      </div>

                      <blockquote id="m_816261540811853656qt-qt"

                        type="cite">

                        <div>Regarding the M1: my understanding is that

                          placing the RAM inside the processor

                          package/silicon is the trick that makes it run

                          fast. Is there anything else?<br>

                        </div>

                        <div> <br>

                        </div>

                        <blockquote type="cite">

                          <div dir="ltr"> The Apple M1 looks decent, but

                            since Apple no longer lets you run Linux on

                            their hardware, I have no desire to ever buy

                            one.<br>

                          </div>

                        </blockquote>

                      </blockquote>

                    </blockquote>

                    <div>Does Rust's standard refcounting, or the

                      implementation of such pointers, need to use atomic

                      increments/decrements? Can't it use something

                      non-atomic, given more detailed knowledge of

                      ownership? Just wondering.<br>

                    </div>

                  </blockquote>

                </div>

              </div>

            </blockquote>

          </div>

        </div>

      </blockquote>

    </blockquote>

    <br>

  </body>

</html>