Notes: Are You “Cashing In” on Caching?
Caching is a way of saving data after it has been originally computed or loaded. Alleviates load from server(s). Makes them faster and “healthier”. Also improves the users’ experience.
On the server level, we can cache templates. ColdFusion has a trusted cache of compiled CFML. We can also cache data objects (queries, structs, arrays, CFCs, etc.) and rendered HTML content, either entire pages or HTML regions.
What is good to cache? Things that are global to all users, like lists of states, navigation, etc. Even if not for all users, things that are used repeatedly in the application. Things that aren’t subject to much change. The more variants of something (based on the number of inputs), the less it will benefit from caching.
So how do we find these things? Examine ColdFusion logs, look for long-running pages. Ugh. Better: Go through debugging output. Look for poor execution times. But usually we look at it in a dev environment where load isn’t heavy. Can also use getTickCount(), <cflog>, <cftrace>. Best: SeeFusion, FusionReactor, and the CF8 Admin now give us more insight into the performance of the server and, more specifically, the threads or tasks that might be causing the heavy load.
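A minimal sketch of the getTickCount() plus <cflog> approach, assuming a hypothetical log file name ("slowpages") and an arbitrary 500 ms threshold; neither comes from the session:

```cfml
<!--- Record the start time in milliseconds --->
<cfset startTick = getTickCount()>

<!--- ... the code block you suspect is slow goes here ... --->

<!--- Elapsed time for the block above, in milliseconds --->
<cfset elapsedMs = getTickCount() - startTick>

<!--- Only log requests over the (assumed) 500 ms threshold.
      "slowpages" is a hypothetical log file name. --->
<cfif elapsedMs GT 500>
    <cflog file="slowpages" type="warning"
           text="#cgi.script_name# took #elapsedMs# ms">
</cfif>
```

Unlike debugging output, this can be left running in production, where the load is realistic.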
Ask yourself:
1. How often do changes take place? The more the item is in flux, the less it is a caching candidate.
2. Is up-to-the-second data critical? If so, it is not a good caching candidate.
3. How many cache variants are possible based on the variable input arguments?
4. What is the average count/size/length of each cache variant? Memory footprint might become an issue.
What is a “Cache Variant”? A single set of data derived from the application code and based on a single combination of values for its arguments. So if we have SQL with userid=#userID# and active=#active#, these are two distinct inputs. If you have 1,000 users, and 2 active states, you have 1,000 x 2 = 2,000 variants for the query cache. Obviously, the number of inputs drastically increases the variant pool. But ask yourself: How many likely variants exist? There likely are many permutations that you know are very unlikely to happen. Consider this when thinking about the likely footprint of your caching. Give attention to where you cache as well. Perhaps memory, disk, database.
Measuring effects of caching. A 500ms page executed 60 times per minute is effectively taking 30s/1m. 30s x 60m = 1,800s per hour. If caching that page takes it down to 20ms, this will be 1.2s/1m x 60m = 72s per hour. You can see how drastic the effect of caching is on overall system performance.
Trusted Cache. Only have this turned on in the production environment. Can be problematic for a dev server since files are changing constantly. Will recompile or depend on compiled cache based on date/time stamps of the files. This could cause problems when templates are deployed via FTP or source control. For instance, if you deploy an older version to roll back some code, it may not recompile the newly deployed code! It will use the cache! So be cognizant of this and clear the cache if that occurs.
Restarting entire cache: Restart Application Server. Ugh. But in CFMX7, can “Clear Template Cache Now” in ColdFusion Administrator. Brian Szoszorek has an article for clearing just specific caches.
“Save Class Files” option. In the CF Admin “Caching” settings. When enabled, generates and saves a *.class file for each CFML template executed, in the WEB-INF/cfclasses/ directory. Can save a small bit of load when the server is restarted. Server-wide setting. This option often doesn’t actually give any decent speed gain, because it uses file I/O.
Query Caching. Adding cachedWithin or cachedAfter to <cfquery> tag. Note that CF Admin has a “Maximum number of cached queries” setting. Should be set high for a server-wide multi-application environment to be useful. Query caching is driven by: Query name, SQL statement, datasource, username/password, input arguments to SQL statement. All of these must be the same to use the cache. Even the tabbing/spacing of the SQL statement will cause it to not reference the cache if different!
Good caching examples: CachedWithin is good if something isn’t changing for a short period of time. Especially great when no inputs that would cause variants. Show top 10 news articles, cache for 15 minutes. CachedAfter is good when running a query on data that won’t change after a certain time. Show records for month of 6/2007. You could do cachedAfter with a date of 6/30/2007.
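The two examples above might look roughly like this. The datasource, table, and column names are hypothetical, and the date literals in the second query are an assumption that will vary by database:

```cfml
<!--- Hypothetical datasource, table, and column names throughout --->

<!--- Top 10 news articles, served from the query cache for 15 minutes.
      No input arguments, so there is only one cache variant. --->
<cfquery name="qTopNews" datasource="myDSN" maxrows="10"
         cachedWithin="#createTimeSpan(0, 0, 15, 0)#">
    SELECT article_id, title, published_at
    FROM news_articles
    ORDER BY published_at DESC
</cfquery>

<!--- Records for June 2007, using the cachedAfter date of 6/30/2007
      from the notes above --->
<cfquery name="qJuneOrders" datasource="myDSN"
         cachedAfter="#createDate(2007, 6, 30)#">
    SELECT order_id, order_total
    FROM orders
    WHERE order_date >= '2007-06-01' AND order_date < '2007-07-01'
</cfquery>
```

Remember that any change to the name, SQL text (including whitespace), or datasource produces a separate cache entry.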
Pros of query caching: Built-in. Debugging output will show that it was cached. But many cons: No control of where queries “go” when being cached. In one big shared resource pool. Max number of cached queries is hard to set reasonably. Difficult to clear particular items unless you clear the whole cache. Cannot cache queries with <cfqueryparam> (until CF8). Cannot cache with <cfstoredproc>; however, you can invoke stored procs inside of <cfquery> with the EXEC command. No way to track or view the whole cache. Service/instance specific, not application specific.
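A short sketch of two points above: the EXEC workaround for stored procedures, and clearing the whole query cache. The procedure and datasource names are hypothetical, and EXEC syntax assumes a database such as SQL Server:

```cfml
<!--- Hypothetical stored procedure and datasource names.
      Calling the proc through <cfquery> lets cachedWithin apply. --->
<cfquery name="qTopArticles" datasource="myDSN"
         cachedWithin="#createTimeSpan(0, 0, 15, 0)#">
    EXEC getTopArticles
</cfquery>

<!--- Clears ALL cached queries for the server instance,
      not just the one above --->
<cfobjectcache action="clear">
```

The second tag illustrates the "all or nothing" con: it is a blunt way to flush one stale query.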
Alternatives to data caching. Take queries and put them in Server, Application, or Session scope. But you will have to programmatically start managing that. This obviously can be done with not just queries but also structs, arrays, objects, simple vars.
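A minimal sketch of managing a query in the Application scope yourself. It assumes an application name is already defined (via <cfapplication> or Application.cfc); the key name stateCache, the datasource, the table, and the 60-minute lifetime are all hypothetical:

```cfml
<!--- Assumed cache lifetime in minutes --->
<cfset cacheMinutes = 60>

<!--- Reload when the cache is missing or older than cacheMinutes --->
<cfif NOT structKeyExists(application, "stateCache")
      OR dateDiff("n", application.stateCache.loadedAt, now()) GTE cacheMinutes>
    <cflock scope="application" type="exclusive" timeout="10">
        <!--- Re-check inside the lock: another request may have
              refreshed the cache while this one was waiting --->
        <cfif NOT structKeyExists(application, "stateCache")
              OR dateDiff("n", application.stateCache.loadedAt, now()) GTE cacheMinutes>
            <cfquery name="qFreshStates" datasource="myDSN">
                SELECT state_code, state_name
                FROM states
                ORDER BY state_name
            </cfquery>
            <!--- Build the entry fully, then assign it in one step --->
            <cfset newCache = structNew()>
            <cfset newCache.data = qFreshStates>
            <cfset newCache.loadedAt = now()>
            <cfset application.stateCache = newCache>
        </cfif>
    </cflock>
</cfif>

<!--- Everything downstream reads from the cached copy --->
<cfset qStates = application.stateCache.data>
```

To clear just this item, call structDelete(application, "stateCache"); that per-item control is what the built-in query cache lacks.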
Content Caching. You can use the <cfcache> tag. Allows you to cache entire contents of a page. Similar to cachedWithin on <cfquery>, you give it a time span (the timespan attribute). Stores the generated HTML to disk. Decision to use cache is based on the URL requested.
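A minimal sketch of page-level caching as the tag worked in the CFMX 7 / CF8 era; the attribute that sets the lifetime is timespan, and the 15-minute value is just an assumption:

```cfml
<!--- Place at the top of the page to be cached.
      The generated HTML is reused for 15 minutes per requested URL. --->
<cfcache action="cache" timespan="#createTimeSpan(0, 0, 15, 0)#">

<!--- ... the rest of the page renders as usual ... --->
```

From a separate admin or maintenance template, <cfcache action="flush"> discards the cached pages.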
Alternatives for content caching. You can use <cfsavecontent> then put it in a shared scope (server, application, session, etc). Various custom solutions (cf_superCache, cf_accelerate, cf_cacheOmatic, cf_turboCache, etc).
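A minimal sketch of the <cfsavecontent> approach for a single HTML region. The key name navHtml and the markup are hypothetical, and locking and expiry are left out for brevity:

```cfml
<!--- Render the region only if it is not already cached --->
<cfif NOT structKeyExists(application, "navHtml")>
    <cfsavecontent variable="renderedNav">
        <ul>
            <li><a href="/">Home</a></li>
            <li><a href="/news/">News</a></li>
        </ul>
    </cfsavecontent>
    <!--- Store the rendered HTML string in the shared scope --->
    <cfset application.navHtml = renderedNav>
</cfif>

<!--- Every request outputs the cached string --->
<cfoutput>#application.navHtml#</cfoutput>
```

Unlike <cfcache>, this caches only the expensive region, so the rest of the page can stay dynamic.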
