Revisiting Faster PHP Sessions
"Simplicity is prerequisite for reliability." - Edsger W. Dijkstra.
As our experience grows, we learn from past mistakes and discover what's truly important in reliable systems.
When designing systems, simplicity is an often-heard mantra, but it isn't applied nearly as often as it is spoken of. I'm guilty of this too. I think it's mainly because engineers love to, well, engineer : ) and will naturally try to outsmart problems by throwing more tech at them.
Article vs Article
In the light of this, I revisit my 2008 article Enhance PHP session management. The article explains how you can use a central memcache server to store sessions for performance & scalability purposes.
Having a shared something when you can avoid it is asking for problems, and I was just throwing unneeded tech at this: network protocols, pecl modules, configuration. All vulnerable to bugs, maintenance, performance penalties and outage.
Using my 2007 article Create turbocharged storage using tmpfs, we can defeat some of this over-engineering and take a simpler approach to speeding up sessions in PHP.
We'll store them decentralized in memory by mounting RAM onto the existing /var/lib/php5 session directories throughout your application servers, which I will call nodes from now on.
Make Session Dir Live in RAM
Add this to your /etc/fstab:
# Make PHP Sessions live in RAM
tmpfs /var/lib/php5 tmpfs size=300M,atime 0 0
This will make sure the 300MB RAM device will be available on your next reboot as well.
300MB is a lot. You can decrease it later on by changing the /etc/fstab entry and executing mount -o remount /var/lib/php5
Activate & Migrate Existing Sessions
Then execute:
# Create a temporary place for current sessions
mkdir -p /tmp/phpsessions/
# Move current sessions to it
mv /var/lib/php5/* /tmp/phpsessions/
# Activate our ramdisk
mount -a
# Move the current sessions back
mv /tmp/phpsessions/* /var/lib/php5/
# Remove the temporary placeholder
rmdir /tmp/phpsessions
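Because a failed mount degrades silently to ordinary disk storage, it can be worth confirming afterwards that the directory really is backed by tmpfs. A minimal sketch, assuming a Linux system that lists its mounts in /proc/mounts; the is_tmpfs helper name is hypothetical and not part of any existing tool:

```shell
# Hypothetical helper: succeed only if <dir> is currently a tmpfs mountpoint.
# usage: is_tmpfs <dir> [mounts-file]   (mounts-file defaults to /proc/mounts)
is_tmpfs() {
    # Each line reads: device mountpoint fstype options dump pass.
    # Later entries shadow earlier ones, so remember the last match.
    awk -v dir="$1" '
        $2 == dir { fs = $3 }
        END { exit (fs != "tmpfs") }
    ' "${2:-/proc/mounts}"
}
```

For example, `is_tmpfs /var/lib/php5 && echo "sessions are in RAM"` prints the message only when the ramdisk is active.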
Advantages
What's nice about saving sessions in a tmpfs device compared with saving in memcache is:
- you can migrate to this solution without logging people out :)
- nothing needs to be installed
- instead of throwing errors, it degrades gracefully as disk storage if implementation fails
- you can restart/flush/upgrade any existing memcache instances without people losing sessions
- it uses the default /var/lib/php5 directory, so no .ini changes, and PHP's garbage collector will still purge old sessions
- it takes away a bottleneck & single point of failure in your architecture
- it's just a mountpoint, so existing monitoring tools will automatically trigger alerts when you need to allocate more space
- no locking issues with ajax calls (though I believe fixed in memcached-3.0.4beta)
- no protocol overhead
- less tech, so less prone to errors & bugs, easier upgrade process
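To illustrate the monitoring point from the list above: since the session store is an ordinary mountpoint, plain df already reports how full it is. A rough sketch of such a check follows; the function names and the threshold convention are assumptions for illustration, not part of PHP or of any monitoring suite:

```shell
# Hypothetical helpers, not part of PHP or any monitoring suite.

# Print the used percentage (digits only) of the filesystem holding <dir>.
usage_pct() {
    df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# check_session_space <dir> <threshold-percent>
# Returns 0 when usage is below the threshold, 1 when it is at or above it,
# and 2 when the usage could not be determined.
check_session_space() {
    pct=$(usage_pct "$1")
    [ -n "$pct" ] || { echo "UNKNOWN: cannot read usage of $1"; return 2; }
    if [ "$pct" -ge "$2" ]; then
        echo "WARNING: $1 is ${pct}% full"
        return 1
    fi
    echo "OK: $1 is ${pct}% full"
}
```

Something like `check_session_space /var/lib/php5 80` could then be run from cron or wrapped by whatever alerting is already in place.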
Decentralizing
Now this doesn't work in clusters without Sticky Sessions.
But you've got to ask yourself: in huge clusters, do you really want Shared Sessions? The bigger the cluster, the more vulnerable you'll become, as it really only adds a bottleneck & single point of failure to your architecture.
With decent load balancers like EC2's ELB, Pound or HAProxy, it becomes child's play to implement Sticky Sessions so that people keep ending up on the node that has their session.
When you're designing to tolerate failure, this architecture is much more robust than depending on anything shared.
Yes, some people will be logged out when you shut down a node (vs all when your session store goes down).
To counter this, you could:
- drain a node's connections before you take it down for planned maintenance, so that nobody is affected
- rsync sessions between nodes if it's crucial that all sessions survive an outage.
This could even be automated so that nodes cover for each other. Whether it's worth the investment depends on your application. Are your nodes likely to go down completely? How many customers will get logged out? What kind of data is lost?
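The rsync idea could look roughly like the sketch below. It is an illustration under assumptions, not a tested replication setup: the sync_sessions name, the SESSION_DIR and RSYNC variables and the peer host names are all hypothetical, and it presumes passwordless SSH access between nodes.

```shell
# Hypothetical helper: push this node's session files to each peer node.
# SESSION_DIR defaults to the directory used in this article; RSYNC can be
# overridden (for example with "echo") to preview the commands.
sync_sessions() {
    for peer in "$@"; do
        # -a keeps ownership, permissions and mtimes; --update skips
        # files that are newer on the receiving side.
        ${RSYNC:-rsync} -a --update \
            "${SESSION_DIR:-/var/lib/php5}/" \
            "$peer:${SESSION_DIR:-/var/lib/php5}/"
    done
}
```

Running `RSYNC=echo sync_sessions web2 web3` prints the two rsync invocations without copying anything, which is a cheap way to check the arguments before scheduling it.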
Even if your session store is clustered and uses persistent storage like Redis or MySQL (not the right tool for the job, people): network outage, maintenance and misconfiguration can hurt you badly, logging out all customers or worse, throwing errors throughout your platform.
Problems will be bigger and harder to solve.
Whereas if the RAM mountpoint fails, /var/lib/php5 just degrades gracefully to normal disk-based storage, making sessions slower on that one node, but at least you'll still be serving customers.
I welcome your thoughts on this!
tags: php, performance, PECL, disk IO, memcache, devshm, tmpfs, linux, RAM
category: Howto - Webserver
#11. Maciej Lisiewski on 31 December 2011
1. The obvious: when the machine crashes all sessions are lost. While sometimes that is acceptable it's a definite no-go for ecommerce sites that share session id with cart id (common practice to reduce number of cookies).
2. Limited (compared to other storage solutions) capacity. With small sessions you'll end up with 1-4 KB session files. That means the 300MB you are using allows 75-300k concurrent sessions. That seems a lot until you remember that webcrawlers from Google, Yahoo, Bing, etc. don't use cookies, so each request equals a new session. With a large website, a number of requests in the millions per day is not uncommon. Boom, suddenly no new session can be created, 502.
3. As with any file storage you lose all the benefits of storing sessions in a relational database - the most basic being the ability to modify the session of a specific user (log him out, for example) - this way you can actually store permissions in the session and avoid querying permissions per request.
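The capacity estimate in point 2 is plain division and can be redone for other sizes. A small sketch; the max_sessions helper is hypothetical, it counts 1 MB as 1024 KB (so the figures land slightly above the rounded 75-300k), and it ignores any per-file overhead of the filesystem:

```shell
# Hypothetical helper: rough upper bound on the number of session files.
# usage: max_sessions <size-in-MB> <average-session-size-in-KB>
max_sessions() {
    echo $(( $1 * 1024 / $2 ))
}

max_sessions 300 4   # prints 76800
max_sessions 300 1   # prints 307200
```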
I went for a hybrid solution:
- sessions are stored both in database and key-store (APC/memcached/whatever I plug in depending on site size)
- database is checked only if key-store returns no match - if there is a match in db it's copied to key-store.
- database is written to only if session value is modified or if time since last persistent (db) store is longer than predefined interval
- most popular/aggressive crawlers are matched by user agent and no session is created for them
Result is fast. Not as fast as tmpfs, but close enough and it supports sharding out of the box (add modulo of request id to select memcached and/or db shard and you're done) without the need to sync files. No fancy clusters - just add new box if you need it and change divisor in config.
Garbage collection is dead simple - cron job that deletes entries from db that had no persistent store for predefined interval - key-store expires on its own.
You can kill any memcached instance - new one will load all the sessions from db (there will be a load spike, but each will be read just once).
It degrades gracefully if at least db works. If it doesn't you're fucked anyway ;-)
#10. webhoster on 30 December 2011
Still might need to cluster a central data layer.
#9. Janak on 01 December 2011
#8. bird on 30 November 2011
#7. Mike on 01 November 2011
If I have 3 servers, and I use sticky sessions and I do have a flash flood, I would get:
- webserver1 - 10% load
- webserver2 - 150% load
- webserver3 - 40% load
instead of using all three in an even way.
Maybe sticky would be nice for 10-20+ servers in a cluster, numbers that would let the statistics work and assure an almost even distribution.
But having an imbalance in a small cluster is like not having the cluster at all. We do LB for performance _and_ failover, not only for failover.
#6. David on 25 August 2011
I implemented this as we had a lot of issues with session_start taking up to 3 seconds, just because of opening files, and our page load times decreased drastically!
Since this was an absolutely fantastic result, we decided to keep our cached objects (which until now were cached in a database) in RAM as well... translations are now saved as a serialized array in RAM, with a copy still kept in the database.
Page loads for some of our larger pages went from 7 seconds to <0.3 seconds.
So yes - our profiling shows that this is a GREAT result. - we just put in 4 GIGS of additional RAM, and off we go!
Thanks for a great post! This absolutely saves our day, and helps us maintain good structured code, as we didn't want to start compromising by taking shortcuts in our code "just for performance reasons", as we see happen so many times, unfortunately.
#5. Steve Clay on 11 May 2011
#4. Ondra on 09 May 2011
http://www.keboola.com/blog/php-sessions-with-memcached-and-a-database-session-in-the-cloud-done-right/
#3. Coret on 07 May 2011
Also, try to specify a good mode, so your webserver can read/write session files (something like mode=777 in your fstab entry, or less for more security).
First impression: the i/o wait on my server is significantly down!
Question: I use "3;/var/lib/php5" as session.save_path, so there's a directory structure of 3 levels deep. When my server is rebooted now, this directory structure has to be regenerated... Should I just abandon this approach and just use 1 directory with 235K files?
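On the question of regenerating the nested directories after a reboot: PHP's source distribution includes a mod_files.sh script for this purpose, and the idea can be sketched in a few lines of shell run at boot once the tmpfs is mounted. This is an illustration under assumptions: the make_session_tree name is hypothetical, and the character set below assumes hexadecimal session IDs (4 bits per character); other session hash settings use more characters.

```shell
# make_session_tree <base-dir> <depth>
# Creates <depth> levels of one-character subdirectories under <base-dir>.
# Positional parameters are used directly so the recursion also works in
# shells that lack "local".
make_session_tree() {
    [ "$2" -gt 0 ] || return 0
    for c in 0 1 2 3 4 5 6 7 8 9 a b c d e f; do
        mkdir -p "$1/$c"
        make_session_tree "$1/$c" $(( $2 - 1 ))
    done
}
```

For the save_path in the question this would be `make_session_tree /var/lib/php5 3`; ownership and mode of the created directories still have to match what the web server expects.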
#2. Ben Wong on 06 May 2011
I'd be surprised if there is a huge performance improvement over regular disk based sessions. Unless you're on something like Amazon's EBS, which isn't a real disk anyways.
A technique that I'm doing is using signed data in cookies as a session storage technique.
Pros:
1. no server side state
2. no disk access to check
3. No locking
Cons:
1. bigger cookies (but still < 100 bytes)
2. no server side invalidation of a session w/out some server side state. We use memcache for this, but only as a simple "does this key exist?" check: if it does, the cookie is valid.
For the application I'm working on we'll be seeing hundreds of req/second and keeping less state makes scaling out a lot easier.
Also, each user is sending requests concurrently, so locking *all* requests until the current one finishes really slows down performance. Instead, to prevent race conditions on data we let the database do atomic operations and always code knowing that data could have changed between the time you read it and the time you write it back to the db. Compare/Swap patterns are great here.
For security, so users can't forge their own cookies we:
a) sign them w/ sha1
b) expire them with a timestamp (int, embedded into data)
When the cookie comes in, we validate the data (json serialized), the signature and that the cookie has not expired.
If everything is OK we process the request and issue a *new* cookie.
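The cookie scheme described above (sign the data, embed an expiry timestamp, validate both on every request) can be sketched as follows. The assumptions are significant: the comment only says "sha1", so HMAC-SHA1 via the openssl command line is swapped in here as one plausible reading; the data|expires|signature layout, the helper names and the way the secret is passed are all hypothetical; and the real application serializes JSON and is presumably not written in shell.

```shell
# Hypothetical helpers; layout and names are assumptions for illustration.

# sign <payload> <secret>: print the hex HMAC-SHA1 of the payload.
sign() {
    printf '%s' "$1" | openssl dgst -sha1 -hmac "$2" | awk '{ print $NF }'
}

# make_cookie <data> <expires-epoch> <secret>: print "data|expires|signature".
make_cookie() {
    payload="$1|$2"
    printf '%s|%s\n' "$payload" "$(sign "$payload" "$3")"
}

# verify_cookie <cookie> <now-epoch> <secret>: succeed only if the cookie
# carries a matching signature and has not expired yet.
verify_cookie() {
    cookie=$1
    sig=${cookie##*|}        # text after the last '|'
    payload=${cookie%|*}     # everything before it
    expires=${payload##*|}   # timestamp embedded in the signed data
    [ "$(sign "$payload" "$3")" = "$sig" ] || return 1   # forged or altered
    [ "$2" -lt "$expires" ] || return 1                  # expired
}
```

A cookie made with `make_cookie 'uid=42' 2000 s3cret` verifies at time 1000 and is rejected at time 3000 or when any part of it is changed. A production version would likely want a constant-time comparison and a stronger hash than this sketch uses.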
#1. Frank de Graaf on 05 May 2011