[kwlug-disc] low tech traffic splitting

Khalid Baheyeldin kb at 2bits.com
Wed Feb 17 17:29:50 EST 2010


The way I understood Glenn's original question is that he was looking for
more capacity in order to survive a traffic spike.

My response basically said that, from past Slashdotting experience, a lowly
512MB Xen VPS can easily survive a Slashdotting if it is properly configured.

You need 2 things:

1. A fat pipe. This means the regular 128/256/512 kbps upload bandwidth is
not going to cut it. If your server is in a good datacenter, then you can
easily get 10 Mbps or more, and that would be sufficient.

2. Caching, so that pages are served straight from the web server, without
the overhead of generating them dynamically on every request.
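
To put rough numbers on the first point, here is a back-of-the-envelope
sketch in Python. The 100 KB average page weight is an assumed figure chosen
only for illustration, not a measurement from any real site, and the function
name is made up for this example.

```python
# Rough capacity estimate: how many pages per second fit through a pipe.
# ASSUMED_PAGE_KB is a hypothetical average page weight, for illustration only.

def pages_per_second(link_kbps: float, page_kilobytes: float) -> float:
    """Return how many pages of the given size a link can send per second."""
    link_kilobytes_per_second = link_kbps / 8  # 8 bits per byte
    return link_kilobytes_per_second / page_kilobytes


ASSUMED_PAGE_KB = 100

for label, kbps in [("512 kbps upload", 512), ("10 Mbps datacenter", 10_000)]:
    rate = pages_per_second(kbps, ASSUMED_PAGE_KB)
    print(f"{label}: about {rate:.2f} pages/second")
```

Under that assumption the 512 kbps link cannot even push one full page per
second, while the 10 Mbps link manages roughly a dozen. Real pages vary a lot
in size, so treat this as an order-of-magnitude check only.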

But it all depends on the application. If it does not support caching
internally, then something like Squid or Varnish can help greatly, provided
the app sends proper HTTP cache headers. No further modification to the
application is necessary.
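
As an illustration of what "proper HTTP cache headers" can look like, here is
a minimal Python sketch. The helper name, the 5 minute lifetime, and the tiny
WSGI app are all assumptions made for this example. The headers themselves
(Cache-Control with max-age, and Expires) are standard HTTP, and a caching
proxy in front of the app can use them to decide how long to keep a page.

```python
from email.utils import formatdate
import time


def cache_headers(max_age_seconds: int, now: float) -> dict[str, str]:
    """Build headers telling caches they may reuse the response for a while."""
    return {
        # "public" allows shared caches (proxies) to store the response.
        "Cache-Control": f"public, max-age={max_age_seconds}",
        # Expires carries the same deadline as an absolute HTTP date.
        "Expires": formatdate(now + max_age_seconds, usegmt=True),
    }


def app(environ, start_response):
    """A tiny WSGI app whose responses may be cached for 5 minutes (assumed)."""
    body = b"Hello, cached world\n"
    headers = {"Content-Type": "text/plain", "Content-Length": str(len(body))}
    headers.update(cache_headers(300, time.time()))
    start_response("200 OK", list(headers.items()))
    return [body]
```

Pages that are personalized per user should not be marked "public" like this;
the sketch only covers content that is the same for every anonymous visitor.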

So before you go multi-server, and over-engineer it to oblivion, just try
the simple things first. Often there is low-hanging fruit that can be had
easily for considerable gain. After that, things get harder and harder for
less and less gain (diminishing returns).

People go for multiple servers for fault tolerance (one goes down, you still
serve pages to visitors), or for very large scale (when one server can't
keep up), or both. A mere Slashdotting does not on its own warrant the
complexity.

We have a client who passed 2.5 million page views a day, 64 million per
month, on a single server.
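
For a sense of scale, the daily figure can be turned into an average request
rate. This is only the arithmetic on the number quoted above; it says nothing
about peaks, which will be higher than the average.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def average_per_second(page_views_per_day: float) -> float:
    """Convert a daily page view count into an average per-second rate."""
    return page_views_per_day / SECONDS_PER_DAY


rate = average_per_second(2_500_000)
print(f"about {rate:.1f} page views per second on average")
```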

One way to use 2 servers is to put MySQL on one and PHP/Apache/application
on the other. This can be beneficial in certain situations.

Splitting out functionality is another option. If each function is on its
own subdomain, then you can split them out to multiple servers, provided
there is no restriction (e.g. a single database table shared across sites for
single sign-on and such, an ugly hack that I dislike, but people do it).

Another way is setting up one server with Squid or Varnish as a caching
proxy, so that traffic from spikes is served really fast from that machine,
and only dynamic requests make it through the first machine to the second
machine that has the PHP/MySQL stuff on it.
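
The idea behind the caching proxy can be sketched in a few lines of Python.
This is not how Squid or Varnish are implemented; TinyCache, the 60 second
lifetime, and the fake backend are all made up for illustration. It only shows
why a spike of identical requests barely touches the second machine.

```python
import time
from typing import Callable


class TinyCache:
    """Serve repeated requests from memory; call the backend only on a miss."""

    def __init__(self, backend: Callable[[str], str], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self._backend = backend
        self._ttl = ttl_seconds
        self._clock = clock
        self._store: dict[str, tuple[float, str]] = {}  # path -> (expiry, body)

    def get(self, path: str) -> str:
        now = self._clock()
        entry = self._store.get(path)
        if entry is not None and entry[0] > now:
            return entry[1]  # still fresh: the backend is never contacted
        body = self._backend(path)  # miss or expired: ask the slow machine
        self._store[path] = (now + self._ttl, body)
        return body


backend_calls = []


def slow_backend(path: str) -> str:
    """Stand-in for the PHP/MySQL machine; records every call it receives."""
    backend_calls.append(path)
    return f"rendered page for {path}"


cache = TinyCache(slow_backend, ttl_seconds=60)
for _ in range(1000):  # a burst of 1000 requests for the same page
    cache.get("/")
print(f"backend was called {len(backend_calls)} time(s) for 1000 requests")
```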

Yet another option is using a Content Delivery Network. What they do is
cache your content in a geographically distributed manner, so users hit the
cache nearest to them, and not your server. This means you outsource the
complexity to someone else and do not worry about it yourself. You still
serve the site from a small server. If you do that, then make sure you
measure visitors via something other than Apache logs, for example Google
Analytics, since requests answered by the CDN never reach your server.


The big name here is Akamai, but they are really expensive. Other, less
expensive ones to explore are:

http://www.pantherexpress.net/solutions/
http://www.limelightnetworks.com/infrastructure-services-content-delivery/

On Wed, Feb 17, 2010 at 4:28 PM, unsolicited <unsolicited at swiz.ca> wrote:

> Correct me if I'm wrong / it's my impression and I'd be glad to hear
> differently:
>
> Aside from both Khalid's and John's messages (which are good stuff) ...
>
> Khalid seems to be essentially saying, that simplest / single point of
> serving / K.I.S.S. is best. (For maintaining your sanity.) As John pointed
> out, see Khalid's signature. But (as per Khalid), you can build up the
> robustness of your infrastructure, e.g. ensure adequate pipe sizes (probably
> already in place), try to prefer static over dynamic pages, and so on and so
> forth.
>
> And John points out some ways to address things when the above measures are
> inadequate.
>
>
> It also seems to me, I'm guessing, you could split out your load if you
> can, in the sense of the web servicing you are offering. So, for example, if
> you have 'internal' clients / applications, as well as these static web
> pages, on the same server, and they don't really depend upon each other ...
>
> If you create a second server, and put one or the other of these areas on
> it, then when one gets nailed (and perhaps addressed via Khalid's measures),
> the rest of your servicing won't be impacted. (Assuming adequate pipe size,
> etc.) i.e. Split out functionality.
>
> So, news, announcements, blogs, etc., on one server. Apps. on another.
> Perhaps with a failover from the former to the latter, for uptime. The last
> (perhaps?) via the round robin dns John mentions. And at time of failure,
> you can make the call that either you'll get the non-app server up in
> sufficient time that you can let the app server bear the load for the
> moment, or that you're not able to achieve that and you take the app-server
> out of the round robin. Leaving your static pages unavailable for the
> moment, unfortunately, but at least continuing to serve your clients.
>
> Even in that situation you might hope that google caching will make things
> less debilitating than they otherwise might be.
>
> I'm guessing it would be useful for you to form a plan as to how you might
> handle things, beforehand. [As you are doing now, of course.] Such as, do
> you have that overloaded / redirect page in place "I'm sorry, our servers
> are currently overloaded and you have been redirected to this message. We
> are currently making best effort to restore full functionality. Please check
> back with us in a little while." that you can set in motion when you have
> to. Better to figure out that contingency beforehand to effect when or if
> you have to, than suddenly be there and have to think through things then,
> in panic mode. e.g. If the load turns out to be too much for the app server,
> and you're beating on the non-app server as quickly as you can, you can
> enable this redirect so that you don't just disappear into the void of
> unavailable / unresolved. a la wikiservices.openoffice.org, currently.
>
> For whatever this is worth.
>
>
> john at netdirect.ca wrote, On 02/17/2010 2:22 PM:
>
>  kwlug-disc-bounces at kwlug.org wrote on 02/17/2010 02:06:23 PM:
>>
>>> If we're trying to preplan for getting slashdotted on our webservers,
>>> what's the best low tech way to handle that?  I've got two identical
>>> webservers I can light up, but don't know how to share traffic, sync, etc.
>>>  I suspect some type of clustering is the answer in linux these days, but I
>>> don't even know what that really is :).
>>>
>>
>> It depends on your web application and back-end infrastructure. You need
>> to answer this question: can I have both web servers fully utilised by
>> clients without any problems. If each web server has its own database on
>> which users leave data, can you consolidate the databases? Do users get
>> meaningful cookies? Can you share the cookie database?
>>
>> Typically, for servers that are static, a round-robin DNS will be the
>> simplest. Just add several A records for the same web site name. Put a
>> suitably long Time-To-Live on the records so that clients can finish their
>> business with the same server and you'll likely be okay. If one of the
>> servers goes down then half your clients will get errors until the TTL runs
>> out.
>>
>> For servers or applications that require state tracking (think SSL,
>> cookies, shopping cart, etc.) then an application level proxy or load
>> balancer is required. This is free software that tracks each session and
>> makes sure that the user is directed to the same server. This is used in
>> combination with RR DNS. Load balancers monitor back-end servers so if one
>> goes down requests are forwarded to a surviving server. Users on failed
>> servers would have their session disappear, similar to a time out. For
>> redundancy two load balancing servers are used and are clustered so that if
>> one goes down, the other will take its load.
>>
>> Finally some back-end sharing of state is needed to ensure that if users
>> are forced to another server their state is maintained, e.g. their shopping
>> cart remains, etc.
>>
>> We have done work with Red Hat Cluster Server which bundles all the free
>> tools for managing load balancing and failover of load balancers. It also
>> handles application failover.
>>



-- 
Khalid M. Baheyeldin
2bits.com, Inc.
http://2bits.com
Drupal optimization, development, customization and consulting.
Simplicity is prerequisite for reliability. --  Edsger W.Dijkstra
Simplicity is the ultimate sophistication. --   Leonardo da Vinci