A cache is like short-term memory. It is typically faster than the original data source because it lives in memory, and accessing data from memory is faster than reading from disk.
When discussing system design, we need to clarify the following questions. Many of the same rules we discussed while talking about algorithm design apply here as well!
- What use cases does the system need to satisfy?
- What constraints does the system need to satisfy?
- According to step 1 & step 2, what are the bottlenecks you need to deal with?
- Abstract Design!
- Follow-up — cache invalidation
1. What use cases does the system need to satisfy?
Let's say we are going to design a search engine. A cache can be placed between the application and the database, so that data is read from the cache instead of the main datastore. The cache keeps the most recently (or most frequently) used data in memory to cut down latency and unnecessary load on the database. Here the data might be queries sent by users and the document index. Suppose there is a preprocessing module before we store data; it may extract keywords from the queries. (Of course, changing the design of the preprocessing module may cause cache misses.)
2. What constraints does the system need to satisfy?
- What’s the tradeoff between availability and consistency?
According to the CAP theorem:
Availability means every request receives a (non-error) response, without the guarantee that it contains the most recent write.
Consistency means every read receives the most recent write or an error.
Since partition tolerance (the P in CAP) is something we cannot give up in a distributed system, one has to choose between consistency and availability. A search engine usually favors availability.
To keep availability high, a master-slave structure is designed for this. It contains backup components to which the system automatically switches (fails over, for fault tolerance) when an active component stops working. The failover should appear seamless to users and should not interrupt service.
One fault-tolerance strategy can be (a minimal sketch follows this list):
— for writes: as soon as W replicas have acknowledged the update, return a confirmation to the user (no need for every replica to be updated).
— for reads: as soon as R replicas return the same data, return that result to the user (no need for every replica to respond).
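Here is a minimal, illustrative sketch of this quorum idea in Python. The class name QuorumStore, the in-memory replica dicts, and the N/W/R parameters are placeholders of my own, not any real library's API; real replicas would be separate machines reached over the network.

```python
# Quorum reads/writes over N replicas: ack writes after W replicas,
# accept reads once R replicas agree. In-memory dicts stand in for hosts.
import random

class QuorumStore:
    def __init__(self, n=3, w=2, r=2):
        self.replicas = [dict() for _ in range(n)]
        self.w, self.r = w, r

    def write(self, key, value):
        acked = 0
        for replica in self.replicas:      # in reality: parallel network calls
            replica[key] = value
            acked += 1
            if acked >= self.w:            # confirm as soon as W replicas ack;
                return True                # remaining replicas catch up later
        return acked >= self.w

    def read(self, key):
        votes = {}
        for replica in random.sample(self.replicas, len(self.replicas)):
            value = replica.get(key)
            votes[value] = votes.get(value, 0) + 1
            if votes[value] >= self.r:     # return once R replicas agree
                return value
        return None                        # no quorum reached

store = QuorumStore()
store.write("query:cat pictures", "cached results")
print(store.read("query:cat pictures"))
```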
- How much data will be put in the cache?
- How long should the cached data last (TTL)?
- What is the expected QPS? (This may include: how many requests per day? How many of them are new requests? What is the average size of a request and of the result from the database? These questions help us estimate the expected QPS and the number of machines. I will write another article on that.)
- Eviction Policies?
What if our cache is full? A cache eviction strategy determines which element to evict (or keep) when the cache is full. The following are some common cache eviction policies.
First In First Out (FIFO): The cache evicts the first block accessed first without any regard to how often or how many times it was accessed before.
Last In First Out (LIFO): The cache evicts the block that was added most recently first, without any regard to how often or how many times it was accessed before.
Least Recently Used (LRU): Discards the least recently used items first. (A minimal sketch appears below, after this list of questions.)
Least Frequently Used (LFU): Counts how often an item is needed. Those that are used least often are discarded first.
Random Replacement: Randomly choose an element and discard it.
- Is latency important? (Actually, one of the reasons we use a cache is to cut down latency.)
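Here is a minimal LRU cache sketch in Python, built on collections.OrderedDict; the class and method names are illustrative rather than taken from a specific library.

```python
# LRU cache: every access moves the key to the "most recent" end;
# when over capacity, the key at the "least recent" end is evicted.
from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.items = OrderedDict()

    def get(self, key):
        if key not in self.items:
            return None                       # cache miss
        self.items.move_to_end(key)           # mark as most recently used
        return self.items[key]

    def put(self, key, value):
        if key in self.items:
            self.items.move_to_end(key)
        self.items[key] = value
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)    # evict the least recently used
```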
3. According to step 1 & step 2, what are the bottlenecks you need to deal with?
Most likely this design will have one or more bottlenecks given the constraints of the problem. Perhaps handling user requests is the heaviest part, so you need a load balancer with several machines behind it to handle those requests. Or maybe the data is so huge that you need to distribute your database across multiple machines. Or the database may be too slow and you need an in-memory cache to speed it up.
4. Abstract Design!
Let's say our bottleneck is the huge amount of data and we need an in-memory cache to cut down latency. The abstract design might be as follows.
Here, we have a cache client and a consistent hashing ring. The cache client is responsible for shard selection and routes each piece of data to a specific host. To learn more about what a consistent hashing ring is, I found great material here. It helps scale servers and data without affecting the whole system, and data can be replicated without duplicating a whole node. To scale this system, you can simply add nodes to the ring, or every node on the ring can be a master which distributes data to its slaves as Fig. 1 shows. For the hash code, you can simply use Python's hashlib. (A minimal sketch follows.)
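Here is a minimal consistent-hashing-ring sketch built on hashlib, as mentioned above. The node names, the number of virtual nodes, and the get_node method are illustrative assumptions, not a particular library's API.

```python
# Consistent hashing ring: each node is hashed onto the ring many times
# (virtual nodes); a key is routed to the first node clockwise from its hash.
import bisect
import hashlib

class ConsistentHashRing:
    def __init__(self, nodes, vnodes=100):
        self.ring = []                       # sorted (hash, node) points
        for node in nodes:
            for i in range(vnodes):
                bisect.insort(self.ring, (self._hash(f"{node}#{i}"), node))

    @staticmethod
    def _hash(key: str) -> int:
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def get_node(self, key: str) -> str:
        """Walk clockwise to the first node at or after the key's hash."""
        idx = bisect.bisect(self.ring, (self._hash(key), ""))
        if idx == len(self.ring):            # wrap around the ring
            idx = 0
        return self.ring[idx][1]

ring = ConsistentHashRing(["cache-1", "cache-2", "cache-3"])
print(ring.get_node("some search query"))    # -> one of the cache hosts
```

Adding a node only remaps the keys that fall between it and its neighbors on the ring, which is why scaling does not disturb the whole system.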
A few follow-up issues
1. Cache invalidation: policies for reading/writing data from/to the cache
- Read data from cache
Step1. fetch the data from the cache
Step2. if the data is in the cache, return it
Step3. on a cache miss, fetch the data from the database
Step4. persist the data from the database in the cache (and return it)
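A minimal sketch of this read path, assuming placeholder cache and db clients that expose get/set:

```python
# Read path: try the cache first, fall back to the database on a miss,
# then populate the cache so the next read is fast.
def read(key, cache, db):
    value = cache.get(key)          # Step 1: fetch from the cache
    if value is not None:
        return value                # Step 2: cache hit, return it
    value = db.get(key)             # Step 3: cache miss, go to the database
    if value is not None:
        cache.set(key, value)       # Step 4: persist it in the cache
    return value
```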
- Write policies
1. Write Through Cache
Step1. data is written to the cache
Step2. data is written to the database
In this policy, the cache and the database are always in sync.
Pros: consistency
Cons: higher write latency
- useful for applications that re-read data frequently
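A minimal write-through sketch, again assuming placeholder cache and db clients:

```python
# Write-through: the write is acknowledged only after both the cache and
# the database have been updated, so the two stay in sync.
def write_through(key, value, cache, db):
    cache.set(key, value)   # Step 1: write to the cache
    db.set(key, value)      # Step 2: write to the database before acknowledging
    return True             # both stores are in sync when we return
```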
2. Write Back Cache
Step1. data is written to the cache and marked dirty (to be written to the database later)
Step2. the dirty data is synced to the database asynchronously, later
Pros: low write latency
Cons: lag between the cache and the database
Neither write latency nor read latency suffers under this policy, but the downside is the lag between the cache and the database: if the cache suddenly goes down before the dirty data is synced, that data is lost, which leads to data inconsistency.
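A minimal write-back sketch, assuming the same placeholder clients plus a dirty-key set that a background job would flush:

```python
# Write-back: writes hit only the cache and are marked dirty; a background
# flush later syncs them to the database, so writes are fast but can be lost.
dirty = set()

def write_back(key, value, cache):
    cache.set(key, value)   # Step 1: write to the cache and mark it dirty
    dirty.add(key)
    return True             # acknowledge without touching the database

def flush(cache, db):
    """Step 2: periodically sync dirty entries to the database."""
    for key in list(dirty):
        db.set(key, cache.get(key))   # assumes the entry is still cached
        dirty.discard(key)
```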
3. Write Around Cache
Step1. data is written directly to the database
Step2. the cache queries the database for missing entries (on read misses)
Pros: consistency
Cons: higher read latency for recently written data
The database is updated without writing to the cache; this is useful for applications that don't frequently re-read the most recently written data.
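A minimal write-around sketch, with the same placeholder clients; the cache is only populated on read misses:

```python
# Write-around: writes bypass the cache entirely; the cache fills itself
# lazily when a later read misses and falls back to the database.
def write_around(key, value, db):
    db.set(key, value)              # Step 1: write only to the database

def read_on_miss(key, cache, db):
    value = cache.get(key)
    if value is None:               # Step 2: cache queries the database on a miss
        value = db.get(key)
        cache.set(key, value)
    return value
```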