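This article describes the issues to consider when deploying Memcached, and the distribution algorithms Memcached uses.
Whether your system is newly launched or has been in production for a long time, Memcached is easy to configure. Before you configure it, however, consider the following issues: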
1. memcached is only a caching mechanism. Do not use it to store information that you cannot afford to lose.
Anything stored in memcached should be data that can be reloaded from a different location.
2. There is no security built into the memcached protocol. At a minimum, make sure that the servers
running memcached are accessible only from inside your network, and that the network ports they use
are blocked from outside access (using a firewall or similar). If the information stored on the memcached
servers is at all sensitive, encrypt it before storing it in memcached.
3. memcached does not provide any sort of failover, because there is no communication between
different memcached instances. If an instance fails, your application must be capable of removing it from
the list, reloading the data and then writing the data to another memcached instance.
4. Latency between the clients and the memcached servers can be a problem if you are using different physical
machines for these tasks. If you find that the latency is a problem, move the memcached instances onto
the client machines.
5. Key length is determined by the memcached server. The default maximum key size is 250 bytes.
6. Try to use at least two memcached instances, especially for multiple clients, to avoid having a single
point of failure. Ideally, create as many memcached nodes as possible. When adding and removing
memcached instances from a pool, the hashing and distribution of key/value pairs may be affected.
7. Use namespaces. The memcached cache is a very simple, massive key/value storage system, and as such there is no
way of compartmentalizing data automatically into different sections. For example, if you are storing
information by the unique ID returned from a MySQL database, then storing the data from two different
tables could run into issues because the same ID might be valid in both tables.
Some interfaces provide an automated mechanism for creating namespaces when storing information
into the cache. In practice, these namespaces are merely a prefix before a given ID that is applied
every time a value is stored in or retrieved from the cache.
You can implement the same basic principle by using keys that describe the object and the unique
identifier within the key that you supply when the object is stored. For example, when storing user data,
prefix the ID of the user with user: or user-.
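As a minimal sketch of the namespace convention in item 7, combined with the 250-byte default key limit from item 5, the Python helper below builds keys such as user:42. The function name make_key, the MAX_KEY_BYTES constant and the colon separator are illustrative assumptions, not part of any memcached client API.

```python
# Hypothetical helper illustrating the namespace convention; not a client API.
MAX_KEY_BYTES = 250  # default maximum key size noted in item 5


def make_key(namespace: str, object_id: object) -> str:
    """Prefix an ID with its namespace, e.g. make_key("user", 42) -> "user:42"."""
    key = f"{namespace}:{object_id}"
    # The limit applies to bytes, so measure the UTF-8 encoding, not characters.
    if len(key.encode("utf-8")) > MAX_KEY_BYTES:
        raise ValueError(f"key longer than {MAX_KEY_BYTES} bytes: {key[:40]}...")
    return key


if __name__ == "__main__":
    # The same MySQL ID from two tables yields two distinct cache keys.
    print(make_key("user", 42))     # user:42
    print(make_key("product", 42))  # product:42
```

memcached treats the key as an opaque string, so the separator only matters to your application; it just has to be applied consistently on every store and retrieve.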
Memcached distribution algorithms:
The memcached client interface supports a number of different distribution algorithms that are used in
multi-server configurations to determine which host should be used when setting or getting data from
a given memcached instance. When you get or set a value, a hash is constructed from the supplied
key and then used to select a host from the list of configured servers. Because the hashing mechanism
uses the supplied key as the basis for the hash, the same server is selected during both set and get
operations.
You can think of this process as follows. Given an array of servers (a, b, and c), the client uses a
hashing algorithm that returns an integer based on the key being stored or retrieved. The resulting
value is then used to select a server from the list of servers configured in the client. Most standard
client hashing within memcache clients uses a simple modulus calculation on the value against the
number of configured memcached servers. You can summarize the process in pseudocode as:
@memcservers = ['a.memc','b.memc','c.memc'];
$value = hash($key);
$chosen = $value % length(@memcservers);
Substituting example values, where hash('myid') returns 7009:
@memcservers = ['a.memc','b.memc','c.memc'];
$value = hash('myid');
$chosen = 7009 % 3;
In the above example, the client hashing algorithm chooses the server at index 1 (7009 % 3 = 1)
and stores or retrieves the key and value on that server.
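The same modulus selection can be written as a small runnable Python sketch. It assumes an MD5-based hash (the first four bytes of the digest read as an integer) as a stand-in for whatever hash a real client uses. Python's built-in hash() is deliberately avoided because string hashing is randomized per process, which would break the requirement that every client computes the same value for a key. The names md5_hash and choose_server are hypothetical.

```python
import hashlib
from typing import Callable, Sequence


def md5_hash(key: str) -> int:
    """Assumed stand-in hash: first 4 bytes of the MD5 digest as an unsigned int."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:4], "big")


def choose_server(key: str, servers: Sequence[str],
                  hash_fn: Callable[[str], int] = md5_hash) -> str:
    """Simple modulus selection: servers[hash(key) % number_of_servers]."""
    if not servers:
        raise ValueError("no memcached servers configured")
    return servers[hash_fn(key) % len(servers)]


if __name__ == "__main__":
    memcservers = ["a.memc", "b.memc", "c.memc"]
    # Reproduce the worked example by forcing hash('myid') to be 7009.
    print(choose_server("myid", memcservers, hash_fn=lambda _key: 7009))  # b.memc
    # With the default hash, the same key always maps to the same server.
    print(choose_server("myid", memcservers))
```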
Using this method provides a number of advantages:
- The hashing and selection of the server to contact is handled entirely within the client. This
eliminates the need to perform network communication to determine the right machine to contact.
- Because the determination of the memcached server occurs entirely within the client, the server can
be selected automatically regardless of the operation being executed (set, get, increment, etc.).
- Because the determination is handled within the client, the hashing algorithm returns the same value
for a given key; values are not affected or reset by differences in the server environment.
- Selection is very fast. The hashing algorithm on the key value is quick and the resulting selection of
the server is from a simple array of available machines.
- Using client-side hashing simplifies the distribution of data over each memcached server. Natural
distribution of the values returned by the hashing algorithm means that keys are automatically spread
over the available servers.
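To see the natural distribution described in the last point, the sketch below counts how many of 30,000 sample keys each server receives. It reuses the assumed MD5-based modulus selection; the key format user:N and the helper names are hypothetical.

```python
import hashlib
from collections import Counter
from typing import Iterable, Sequence

SERVERS = ["a.memc", "b.memc", "c.memc"]


def choose_server(key: str, servers: Sequence[str]) -> str:
    # Assumed hash: first 4 bytes of MD5, reduced modulo the number of servers.
    h = int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:4], "big")
    return servers[h % len(servers)]


def distribution(keys: Iterable[str], servers: Sequence[str]) -> Counter:
    """Count how many keys the client-side hash assigns to each server."""
    return Counter(choose_server(k, servers) for k in keys)


if __name__ == "__main__":
    counts = distribution((f"user:{i}" for i in range(30000)), SERVERS)
    for server in SERVERS:
        print(server, counts[server])  # roughly a third of the keys each
```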
Provided that the list of servers configured within the client remains the same, hashing the same key
always returns the same value, and therefore selects the same server.
However, if different clients or interfaces do not use the same hashing mechanism, then the same data may be
recorded on different servers by different interfaces, both wasting space on your memcached servers and leading to
potential differences in the information.
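The following sketch shows why clients must agree on the hash. Two hypothetical clients share the same server list, but client A uses an MD5-based hash and client B uses CRC32 (zlib.crc32); both hash choices are assumptions for illustration. For many keys they pick different servers, so each client would store and look up its own copy.

```python
import hashlib
import zlib
from typing import Callable, Sequence

SERVERS = ["a.memc", "b.memc", "c.memc"]


def md5_hash(key: str) -> int:
    # Client A's assumed hash: first 4 bytes of the MD5 digest.
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:4], "big")


def crc32_hash(key: str) -> int:
    # Client B's assumed hash: CRC32 of the key bytes.
    return zlib.crc32(key.encode("utf-8"))


def choose_server(key: str, servers: Sequence[str],
                  hash_fn: Callable[[str], int]) -> str:
    return servers[hash_fn(key) % len(servers)]


if __name__ == "__main__":
    for key in ("user:1", "user:2", "user:3", "user:4", "user:5"):
        a = choose_server(key, SERVERS, md5_hash)    # client A
        b = choose_server(key, SERVERS, crc32_hash)  # client B
        print(f"{key}: A -> {a}, B -> {b}", "" if a == b else "(mismatch)")
```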
The problem with client-side selection of the server is that the list of the servers (including their
sequential order) must remain consistent on each client using the memcached servers, and the servers
must be available. Problems arise if you try to perform an operation on a key when:
- A new memcached instance has been added to the list of available instances
- A memcached instance has been removed from the list of available instances
- The order of the memcached instances has changed
In each case the hashing algorithm is applied to the given key against a different list of servers, so the
calculation may choose a different server from the list.
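A short Python sketch can quantify how much each of these three changes disturbs the mapping under the simple modulus scheme, again assuming an MD5-based hash and hypothetical user:N keys. Going from three servers to four, a key keeps its server only when its hash modulo 12 is 0, 1 or 2, so roughly 75% of keys move. Removing c.memc moves roughly two thirds of the keys, and this particular reordering moves every key even though no server was added or removed.

```python
import hashlib
from typing import Sequence


def choose_server(key: str, servers: Sequence[str]) -> str:
    # Assumed hash: first 4 bytes of MD5, reduced modulo the number of servers.
    h = int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:4], "big")
    return servers[h % len(servers)]


def moved_fraction(keys: Sequence[str], old: Sequence[str], new: Sequence[str]) -> float:
    """Fraction of keys whose selected server differs between two server lists."""
    moved = sum(choose_server(k, old) != choose_server(k, new) for k in keys)
    return moved / len(keys)


if __name__ == "__main__":
    keys = [f"user:{i}" for i in range(10000)]
    old = ["a.memc", "b.memc", "c.memc"]
    scenarios = {
        "added new.memc": ["a.memc", "b.memc", "c.memc", "new.memc"],
        "removed c.memc": ["a.memc", "b.memc"],
        "reordered": ["c.memc", "a.memc", "b.memc"],
    }
    for name, new in scenarios.items():
        print(f"{name}: {moved_fraction(keys, old, new):.0%} of keys move")
```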
If a new memcached instance, new.memc, is added to the list of servers, then a GET operation using the
same key, myid, can result in a cache miss. The key still hashes to the same value, but that value is now
mapped onto a different list of servers, so the selected entry can point to new.memc instead of the server
where the data was originally stored (c.memc in this scenario). The result is a cache miss, even though the
key exists within the cache on another memcached instance.
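Once the application reloads the value and stores it again, servers c.memc and new.memc both contain the
information for key myid, and the values stored against the key on each server may differ. A more significant
problem is a much higher number of cache misses: adding a new server changes the distribution of keys, so the
cached data must be rebuilt on the memcached instances, causing an increase in database reads.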
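The cache-miss scenario can be simulated with in-memory dictionaries standing in for the memcached instances and the database. The CacheCluster class and its methods are hypothetical and exist only for this illustration; the hash is the same assumed MD5-based modulus scheme. A key cached through the original three-server list misses once new.memc is added, triggers an extra database read, and ends up stored on two servers.

```python
import hashlib
from typing import Dict, List, Sequence


def choose_server(key: str, servers: Sequence[str]) -> str:
    # Assumed hash: first 4 bytes of MD5, reduced modulo the number of servers.
    h = int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:4], "big")
    return servers[h % len(servers)]


class CacheCluster:
    """Hypothetical simulation: each memcached server is a dict; misses read a 'database' dict."""

    def __init__(self, database: Dict[str, str]) -> None:
        self.database = database
        self.stores: Dict[str, Dict[str, str]] = {}
        self.db_reads = 0

    def get(self, key: str, servers: Sequence[str]) -> str:
        store = self.stores.setdefault(choose_server(key, servers), {})
        if key not in store:
            # Cache miss: reload from the database and write to the selected server.
            self.db_reads += 1
            store[key] = self.database[key]
        return store[key]

    def holders(self, key: str) -> List[str]:
        """Servers that currently hold a copy of the key."""
        return sorted(name for name, store in self.stores.items() if key in store)


if __name__ == "__main__":
    old = ["a.memc", "b.memc", "c.memc"]
    new = old + ["new.memc"]
    cluster = CacheCluster({f"user:{i}": f"row {i}" for i in range(100)})
    # Pick a key whose selected server changes when new.memc is added.
    key = next(k for k in cluster.database if choose_server(k, old) != choose_server(k, new))
    cluster.get(key, old)  # miss: loads from the database
    cluster.get(key, old)  # hit
    cluster.get(key, new)  # miss again: the key now maps to another server
    print(cluster.db_reads, cluster.holders(key))  # 2 reads, two servers hold the key
```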
The same effect can occur if you actively manage the list of servers configured in your clients, adding