[kwlug-disc] SSDs and TRIM

Chris Irwin chris at chrisirwin.ca
Thu Apr 29 23:23:02 EDT 2010


On Thu, 2010-04-29 at 22:18 -0400, Lori Paniak wrote:
> On Thu, 2010-04-29 at 19:01 -0400, John Van Ostrand wrote:
> > ----- "Lori Paniak" <ldpaniak at fourpisolutions.com> wrote:
> > > I'd say get a disk with good idle GC and let each disk take care of
> > > itself.  Don't bother worrying about the OS trying to operate TRIM
> > > through LVM/RAID/USB/???
> > 
> > How does it know which sectors are garbage?
> > 
> 
> I'm certain you know more about the details of this topic than I do but,
> from the document I quoted in the post:
> 
> 
> > When it is time to consolidate blocks to free up space, the SSD must copy all of the data it considers valid to a new
> > block before it can erase the current block. Without trim, the SSD doesn’t know a page is invalid unless the LBA
> > associated with it has been rewritten.
> > Trim is simply the function of the operating system telling the drive that a page is no longer valid.
> > 
> 
> I take this to mean that garbage collection autonomous of the OS (trim)
> will only occur on blocks for which the associated addressing info has
> been re-written.  I guess the disk knows about that and the OS can be
> depended upon to (occasionally) rewrite addressing info?  Trim allows
> the SSD GC to operate precisely on finer disk (filesystem?) structures
> for a more efficient and complete recovery of unused resources.
> 
> What exactly is the relationship between a "page" and a "block"?

A block is a collection of pages, apparently.

AnandTech has a really good article about SSDs. It mostly covers
older disks, and newer firmware (adding GC, for example ;) can
drastically change the performance of these units.

http://www.anandtech.com/show/2738/8

So based on that article and the awesome OCZ paper you linked, the GC
builds on the wear-leveling mechanism. You write "logical" disk address
1 five times, but to spread wear the SSD maps each write to a different
physical page, and keeps a table mapping the logical address the OS
sees to the current physical page. So your OS saw five overwrites, but
the SSD actually did five new writes. The GC sees that the first four
pages are no longer mapped to any logical address.
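To make that mapping concrete, here is a minimal Python sketch of a toy
flash translation layer. The class name SimpleFTL, the append-only
allocator, and the "free"/"valid"/"stale" page states are illustrative
assumptions, not how any particular drive's firmware works.

```python
class SimpleFTL:
    """Toy flash translation layer (hypothetical, for illustration only).

    Every write lands on a fresh physical page; the previous physical copy
    of that logical address is marked stale rather than overwritten.
    """

    def __init__(self, num_pages):
        self.l2p = {}                      # logical address -> current physical page
        self.state = ["free"] * num_pages  # per physical page: "free", "valid" or "stale"
        self.next_free = 0                 # naive append-only allocator, no GC here

    def write(self, lba):
        if self.next_free >= len(self.state):
            raise RuntimeError("out of free pages; a real drive would run GC")
        phys = self.next_free
        self.next_free += 1
        old = self.l2p.get(lba)
        if old is not None:
            self.state[old] = "stale"      # old copy is now garbage for the GC
        self.l2p[lba] = phys
        self.state[phys] = "valid"
        return phys


ftl = SimpleFTL(num_pages=8)
for _ in range(5):
    ftl.write(1)          # the OS "overwrites" logical address 1 five times
print(ftl.l2p)            # {1: 4}
print(ftl.state[:5])      # ['stale', 'stale', 'stale', 'stale', 'valid']
```

The OS only ever sees logical address 1; the four stale physical pages
are what the GC later gets to reclaim.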

From the AnandTech article, the basic problem is this: you can write
single pages, but you can't overwrite a page without erasing it first,
and you can only erase whole blocks (multiple pages). So to recover
those first four pages, you have to read the last (still valid) page,
write it elsewhere, then erase the whole first block.
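That read/relocate/erase step can be sketched like this. It is a
hypothetical model: a block is a Python list of page slots, each either
None (erased), ("valid", lba) or ("stale", lba), and the five-page block
is chosen only to match the five-write example; real blocks hold many
more pages.

```python
def collect_block(blocks, victim, target):
    """Move the valid pages of blocks[victim] into erased slots of blocks[target],
    then erase the victim block as a whole.

    Returns the number of pages copied, i.e. the extra writes GC had to do.
    """
    free_slots = [i for i, page in enumerate(blocks[target]) if page is None]
    survivors = [page for page in blocks[victim]
                 if page is not None and page[0] == "valid"]
    if len(survivors) > len(free_slots):
        raise RuntimeError("target block does not have enough erased pages")
    for slot, page in zip(free_slots, survivors):
        blocks[target][slot] = page                 # read + rewrite elsewhere
    blocks[victim] = [None] * len(blocks[victim])   # erase: whole block at once
    return len(survivors)


blocks = [
    [("stale", 1)] * 4 + [("valid", 1)],  # four stale copies plus the live one
    [None] * 5,                           # an already-erased block
]
copied = collect_block(blocks, victim=0, target=1)
print(copied)     # 1
print(blocks[0])  # [None, None, None, None, None]
print(blocks[1])  # [('valid', 1), None, None, None, None]
```

Recovering four pages cost one extra page write plus a block erase; the
more valid pages a victim block holds, the more copying GC has to do.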

Apparently the original SSDs didn't have GC, only wear leveling. So
that read/write/erase cycle didn't happen until a pending write needed
to be mapped there, which only happened after some time, once the
leveler logic wrapped around to non-empty pages. Thus performance
suddenly dies. GC just does this work earlier, so it doesn't hold up
pending writes.
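A crude way to see why performance "suddenly dies" without GC is to
count how many host writes have to wait for an inline read/copy/erase.
This is a hypothetical model with made-up parameters: it assumes there
is always stale data available to reclaim, and that background GC (when
enabled) frees one page during idle time before each write.

```python
def stalled_writes(num_writes, erased_pages, idle_gc):
    """Count host writes that must wait for the drive to reclaim space first.

    Hypothetical model: each host write consumes one pre-erased page. When none
    is left, the write stalls while the drive does read/copy/erase on the spot.
    With idle_gc=True the drive reclaims one page in the background before
    each write, so that work never lands on the write path.
    """
    stalls = 0
    for _ in range(num_writes):
        if idle_gc:
            erased_pages += 1   # reclaimed while idle, off the write path
        if erased_pages == 0:
            stalls += 1         # this write waits for an inline reclaim...
            erased_pages += 1   # ...which produces one erased page for it
        erased_pages -= 1       # the write itself uses one erased page
    return stalls


print(stalled_writes(10, erased_pages=4, idle_gc=False))  # 6
print(stalled_writes(10, erased_pages=4, idle_gc=True))   # 0
```

The first four writes are fast either way; after the pre-erased pool
runs out, every write in the no-GC case pays for the cleanup itself.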

The issue is that GC depends heavily on having free space to shuffle
things around with. OCZ actually recommends partitioning less than 100%
of their drives. They said 50% is ideal; I'd imagine 75% is somewhat
realistic. The less space you use, the better the GC performs.
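A back-of-the-envelope bound shows why spare space matters so much.
Suppose a fraction u of all flash pages hold live data and the GC always
picks the block with the fewest valid pages (a greedy policy, assumed
here; real firmware policies vary). That block holds at most the
average, u * P valid pages out of P, so freeing it costs at most u * P
copies for at least (1 - u) * P erased pages, i.e. at most u / (1 - u)
copies per freed page. The sketch ignores any hidden spare area the
drive already reserves.

```python
def max_copies_per_freed_page(utilization):
    """Upper bound on GC copy overhead for a greedy victim choice.

    `utilization` is the fraction of all flash pages holding live data. The
    emptiest block has at most the average number of valid pages, so freeing
    it costs at most utilization / (1 - utilization) copies per page freed.
    """
    if not 0 <= utilization < 1:
        raise ValueError("utilization must be in [0, 1)")
    return utilization / (1 - utilization)


for u in (0.50, 0.75, 0.95):
    print(f"{u:.0%} used: at most {max_copies_per_freed_page(u):.0f} copies per freed page")
```

So at 50% the worst case is about one copy per freed page, at 75% about
three, and at 95% about nineteen. Partitioning only part of the drive
caps u, which is why the OCZ advice helps.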

TRIM is a complement to that, as far as I understand. GC handles
rewritten/overwritten data, while TRIM handles deleted data. Whereas
the GC has to wait for an overwrite, a filesystem with TRIM support
passes the drive the list of pages belonging to a file when the file
is deleted. The alternative to trimming on every delete is to
periodically (cron, etc.) pass the drive a big list of "free"
filesystem pages. Either way, the GC can preemptively erase unused
pages, even if they are never overwritten.
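The two approaches (trim on delete vs. a periodic batch) can be
sketched with a toy model of what the drive believes is live. The names
TrimmableFTL and delete_file, and the whole structure, are hypothetical
and only illustrate the idea; they are not a real filesystem or drive
interface.

```python
class TrimmableFTL:
    """Toy model of what the drive believes is live (hypothetical)."""

    def __init__(self):
        self.l2p = {}        # logical address -> physical page
        self.stale = set()   # physical pages the GC is free to reclaim
        self.next_phys = 0

    def write(self, lba):
        if lba in self.l2p:
            self.stale.add(self.l2p[lba])  # overwrite: old copy becomes garbage
        self.l2p[lba] = self.next_phys
        self.next_phys += 1

    def trim(self, lbas):
        """Host reports that these logical addresses no longer hold useful data."""
        for lba in lbas:
            phys = self.l2p.pop(lba, None)
            if phys is not None:
                self.stale.add(phys)


def delete_file(fs_free, ftl, file_lbas, online_trim):
    """Hypothetical filesystem delete: always updates the filesystem's own free
    list, but only tells the drive right away when online TRIM is on."""
    fs_free.update(file_lbas)
    if online_trim:
        ftl.trim(file_lbas)


ftl, fs_free = TrimmableFTL(), set()
for lba in range(4):
    ftl.write(lba)
delete_file(fs_free, ftl, [0, 1], online_trim=False)
print(len(ftl.stale))      # 0: the drive still thinks LBAs 0 and 1 hold data
ftl.trim(sorted(fs_free))  # the periodic (cron-style) batch of free pages
print(len(ftl.stale))      # 2
```

In this model, re-sending addresses that were already trimmed is
harmless, so the periodic batch can simply pass the whole free list.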

The paper you linked to suggests partitioning less than 100% of the
drive to give the wear-leveller and GC a bit more headroom. With less
available space, you force your filesystem to overwrite the same
addresses more often, which lets the disk GC do its job. Even if you
fill the filesystem, the GC can still run efficiently. With TRIM
support, though, free filesystem pages = free SSD pages, so you could
allocate 100% of your disk to a filesystem and your performance would
still be good.

So here is a scary thought: you fill your disk, then delete 50% of the
data. With TRIM support, that 50% is immediately marked as garbage for
the GC to erase; it consolidates, erases, and you are back to decent
write performance in a short period of time. Without TRIM, the GC will
*never* know that space is unused (short of the filesystem happening to
rewrite those same addresses), and you will *always* be degraded until
you wipe the SSD and start over.
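The fill-then-delete scenario reduces to a simple count of what the
drive thinks is still in use. This is a hypothetical model that assumes
the filesystem wrote every logical page exactly once before deleting
half of them.

```python
def live_fraction_seen_by_drive(capacity, delete_fraction, trim):
    """Fraction of logical pages the drive still treats as holding data.

    Hypothetical model: the filesystem writes every logical page once (a full
    disk), then deletes `delete_fraction` of them. With TRIM the drive drops
    those mappings; without it, nothing tells the drive they are free.
    """
    live = set(range(capacity))                    # full disk: every LBA mapped
    deleted = range(int(capacity * delete_fraction))
    if trim:
        live.difference_update(deleted)            # TRIM: drive forgets these LBAs
    return len(live) / capacity


print(live_fraction_seen_by_drive(1000, 0.5, trim=True))   # 0.5
print(live_fraction_seen_by_drive(1000, 0.5, trim=False))  # 1.0
```

From the drive's point of view the untrimmed disk is still 100% full,
so the GC has nothing to work with beyond whatever spare area the
manufacturer reserves, if any.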

LVM actually sounds like a lot of trouble for an SSD now. If you ever
delete or shrink an LV, the "recovered" space would still be marked as
in-use on the drive. The GC would never touch those pages, and they
would be left out of the wear-leveling pool. Sub-optimal.

-- 
Chris Irwin
e:  chris at chrisirwin.ca
w: http://chrisirwin.ca