<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
 
 <title>Aaron Morton - All About ZooKeeper</title>
 <link href="http://thelastpickle.com//topics/ZooKeeper/index.xml" rel="self"/>
 <link href="http://thelastpickle.com/"/>
 <updated>2012-01-12T13:49:14-08:00</updated>
 <id>http://thelastpickle.com/topics/ZooKeeper</id>
 <author>
   <name>Aaron Morton</name>
   <email>aaron@thelastpickle.com</email>
 </author>

 
 <entry>
   <title>ZooKeeper Reading 12-01-2012</title>
   <link href="http://thelastpickle.com/2012/01/12/ZooKeeper-Reading"/>
   <updated>2012-01-12T00:00:00-08:00</updated>
   <id>http://thelastpickle.com/2012/01/12/ZooKeeper-Reading</id>
   <content type="html">&lt;h2 id='overview'&gt;&lt;a href='http://zookeeper.apache.org/doc/current/zookeeperOver.html'&gt;Overview&lt;/a&gt;&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;There is a leader in the cluster.&lt;/li&gt;

&lt;li&gt;Hierarchical model of &lt;code&gt;znodes&lt;/code&gt; which act like directories and files. Uses &amp;#8220;/&amp;#8221; as the path separator.&lt;/li&gt;

&lt;li&gt;All data is held in memory, which gives high throughput and low latency.&lt;/li&gt;

&lt;li&gt;&amp;#8220;The ZooKeeper implementation puts a premium on high performance, highly available, strictly ordered access.&amp;#8221;&lt;/li&gt;

&lt;li&gt;Transaction logs and snapshots on disk.&lt;/li&gt;

&lt;li&gt;Clients hold a TCP connection for duplex messaging.&lt;/li&gt;

&lt;li&gt;All updates have a globally ordered TxID.&lt;/li&gt;

&lt;li&gt;Works best for read-heavy workloads, think 10:1 read-to-write ratios.&lt;/li&gt;

&lt;li&gt;&lt;code&gt;znodes&lt;/code&gt; may have data and children.&lt;/li&gt;

&lt;li&gt;&lt;code&gt;znodes&lt;/code&gt; have a version number for their local (?) state.&lt;/li&gt;

&lt;li&gt;Reads and writes on a &lt;code&gt;znode&lt;/code&gt; are atomic with respect to the version of data.&lt;/li&gt;

&lt;li&gt;&lt;code&gt;znodes&lt;/code&gt; can be protected by an ACL.&lt;/li&gt;

&lt;li&gt;Ephemeral &lt;code&gt;znodes&lt;/code&gt; are deleted when the session that created them ends.&lt;/li&gt;

&lt;li&gt;Clients can set a watch on a &lt;code&gt;znode&lt;/code&gt; that is triggered when it changes. Is this for the local data, or for the local data and children?&lt;/li&gt;

&lt;li&gt;Guarantees:
&lt;ul&gt;
&lt;li&gt;Sequential Consistency - Updates from a client will be applied in the order that they were sent.&lt;/li&gt;
&lt;li&gt;Atomicity - Updates either succeed or fail. No partial results.&lt;/li&gt;
&lt;li&gt;Single System Image - A client will see the same view of the service regardless of the server that it connects to.&lt;/li&gt;
&lt;li&gt;Reliability - Once an update has been applied, it will persist from that time forward until a client overwrites the update.&lt;/li&gt;
&lt;li&gt;Timeliness - The client&amp;#8217;s view of the system is guaranteed to be up-to-date within a certain time bound.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;

&lt;li&gt;Updates are written to the write-ahead log (WAL) before they are applied to the in-memory DB.&lt;/li&gt;

&lt;li&gt;Reads are serviced by the local DB on the node, writes are serviced by an agreement protocol. Guess this is why it&amp;#8217;s tuned for reads.&lt;/li&gt;

&lt;li&gt;Writes go to the leader and are then distributed to the other (follower) nodes. Local DBs should never diverge.&lt;/li&gt;

&lt;li&gt;&lt;a href='http://zookeeper.apache.org/doc/current/zookeeperOver.html#Performance'&gt;Performance&lt;/a&gt;: 3 servers should give between 20k and 80k requests/sec, depending on the read/write mix.&lt;/li&gt;

&lt;li&gt;&lt;a href='http://zookeeper.apache.org/doc/current/zookeeperOver.html#Reliability'&gt;Reliability&lt;/a&gt;: less than 200ms to elect a new leader; failure of a follower reduces throughput.&lt;/li&gt;

&lt;li&gt;What&amp;#8217;s the recovery model for a follower that is down for a while? Does this affect performance? &lt;strong&gt;Answer&lt;/strong&gt; (from the internals): if too many &lt;code&gt;Proposals&lt;/code&gt; are missing, a snapshot is sent.&lt;/li&gt;
&lt;/ul&gt;
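
&lt;p&gt;The version and parent-path rules in the notes above can be made concrete with a toy model. This is a minimal sketch in plain Python, not ZooKeeper code: &lt;code&gt;ToyZnodeStore&lt;/code&gt; and its methods are hypothetical names made up for illustration, and it assumes a write only succeeds when the caller passes the version it last read.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;
```python
class ToyZnodeStore(object):
    """Hypothetical in-memory stand-in for a znode tree (illustration only)."""

    def __init__(self):
        # Maps path to (data, version); the root znode always exists.
        self.nodes = {"/": ("", 0)}

    def create(self, path, data):
        # The parent path must already exist before a child can be created.
        parent = path.rsplit("/", 1)[0] or "/"
        if parent not in self.nodes:
            raise KeyError("no node: " + parent)
        if path in self.nodes:
            raise KeyError("node exists: " + path)
        self.nodes[path] = (data, 0)
        return path

    def get(self, path):
        # Returns (data, version) as one consistent pair.
        return self.nodes[path]

    def set(self, path, data, expected_version):
        # Reject the write if the znode changed since the caller read it.
        current_version = self.nodes[path][1]
        if expected_version != current_version:
            raise ValueError("bad version")
        self.nodes[path] = (data, current_version + 1)
        return current_version + 1


store = ToyZnodeStore()
store.create("/zk_test", "foo")
data, version = store.get("/zk_test")
new_version = store.set("/zk_test", "bar", version)
```
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;A second writer that still holds the old version would get an error from &lt;code&gt;set&lt;/code&gt; and would have to re-read the &lt;code&gt;znode&lt;/code&gt; before trying again.&lt;/p&gt;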

&lt;h2 id='zookeeper_internals'&gt;&lt;a href='http://zookeeper.apache.org/doc/current/zookeeperInternals.html'&gt;ZooKeeper Internals&lt;/a&gt;&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&amp;#8220;At the heart of ZooKeeper is an &lt;a href='http://img135.imageshack.us/img135/5011/atomics.gif'&gt;atomic&lt;/a&gt; messaging system that keeps all of the servers in sync.&amp;#8221;&lt;/li&gt;

&lt;li&gt;Guarantees:
&lt;ul&gt;
&lt;li&gt;Reliable Delivery&lt;/li&gt;
&lt;li&gt;Total Order&lt;/li&gt;
&lt;li&gt;Causal Order&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;

&lt;li&gt;The messaging layer is built around FIFO channels between nodes, and relies on the properties of TCP for this. Specifically:
&lt;ul&gt;
&lt;li&gt;Ordered delivery&lt;/li&gt;
&lt;li&gt;No message after close&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;

&lt;li&gt;The protocol is composed of:
&lt;ul&gt;
&lt;li&gt;Packet: a sequence of bytes sent through a FIFO channel.&lt;/li&gt;
&lt;li&gt;Proposal: a unit of agreement. Proposals are agreed upon by exchanging packets with a quorum of ZooKeeper servers. Most proposals contain messages, however the NEW_LEADER proposal is an example of a proposal that does not correspond to a message.&lt;/li&gt;
&lt;li&gt;Message: a sequence of bytes to be atomically broadcast to all ZooKeeper servers. A message is put into a proposal and agreed upon before it is delivered.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;

&lt;li&gt;QUORUM is (n/2) + 1 servers by default, using integer division.&lt;/li&gt;

&lt;li&gt;QUORUM can be majority quorums, weights, or a hierarchy of groups.&lt;/li&gt;

&lt;li&gt;Proposals are stamped with the &lt;code&gt;zxid&lt;/code&gt; and sent to all servers. A server ACKs once the proposal is on persistent storage. Messages in the proposal are then delivered.&lt;/li&gt;

&lt;li&gt;&lt;code&gt;zxid&lt;/code&gt; has two parts: the epoch and a counter. Implemented as a 64 bit int, the high 32 bits are the epoch and the low 32 bits are the counter.&lt;/li&gt;

&lt;li&gt;&amp;#8220;The epoch number represents a change in leadership. Each time a new leader comes into power it will have its own epoch number.&amp;#8221;&lt;/li&gt;

&lt;li&gt;Messaging consists of two phases, Leader Activation and Active Messaging.&lt;/li&gt;

&lt;li&gt;Leader Activation may appear to have worked but later fail when checking the invariant that a QUORUM of followers follow the same leader. During the election this invariant only needs to hold with high probability.&lt;/li&gt;

&lt;li&gt;In Active Messaging:
&lt;ul&gt;
&lt;li&gt;Leader sends &lt;code&gt;PROPOSE&lt;/code&gt; to all followers for a new proposal.&lt;/li&gt;
&lt;li&gt;Followers commit to non-volatile storage and then &lt;code&gt;ACK&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Leader sends &lt;code&gt;COMMIT&lt;/code&gt; to all followers once a &lt;code&gt;QUORUM&lt;/code&gt; have &lt;code&gt;ACK&lt;/code&gt;&amp;#8217;d.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
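
&lt;p&gt;The &lt;code&gt;zxid&lt;/code&gt; layout and the default quorum size in the notes above are simple arithmetic. A minimal sketch follows; the function names are hypothetical, it assumes plain majority quorums rather than weights or hierarchies, and it assumes epochs only ever increase.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;
```python
COUNTER_RANGE = 2 ** 32  # the counter occupies the low 32 bits of a zxid


def make_zxid(epoch, counter):
    """Pack an epoch (high 32 bits) and a counter (low 32 bits)."""
    return epoch * COUNTER_RANGE + counter


def split_zxid(zxid):
    """Return the (epoch, counter) pair stored in a zxid."""
    return divmod(zxid, COUNTER_RANGE)


def majority_quorum(n):
    """Smallest number of servers that is a majority of n servers."""
    return n // 2 + 1


# A new leader gets a new epoch, so its zxids sort after all earlier ones.
old_leader_zxid = make_zxid(1, 500)
new_leader_zxid = make_zxid(2, 0)
ordered = sorted([new_leader_zxid, old_leader_zxid])
```
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;With 5 servers a majority quorum is 3, so the cluster keeps working with 2 servers down. With 4 servers a quorum is also 3, so it only tolerates 1 failure.&lt;/p&gt;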

&lt;h2 id='getting_started'&gt;&lt;a href='http://zookeeper.apache.org/doc/current/zookeeperStarted.html'&gt;Getting Started&lt;/a&gt;&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Grab the latest distro and start a single node with &lt;code&gt;bin/zkServer.sh start-foreground conf/zoo_sample.cfg&lt;/code&gt;&lt;/li&gt;

&lt;li&gt;Fire up the command line interface with &lt;code&gt;bin/zkCli.sh -server 127.0.0.1:2181&lt;/code&gt; and work through the examples in the doc.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id='zope_zookeeper_client_for_python'&gt;&lt;a href='http://pypi.python.org/pypi/zc.zk/0.5.2'&gt;Zope ZooKeeper client for Python&lt;/a&gt;&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Requires &lt;a href='http://pypi.python.org/pypi/zc-zookeeper-static/3.3.4.0'&gt;zc-zookeeper-static&lt;/a&gt;&lt;/li&gt;

&lt;li&gt;&lt;code&gt;zc-zookeeper-static&lt;/code&gt; is a wrapper around the C libs and it&amp;#8217;s pretty low level, e.g. you get an int handle and pass that into methods, not OO. &lt;code&gt;zc.zk&lt;/code&gt; adds an OO wrapper and some other stuff I cannot work out.&lt;/li&gt;

&lt;li&gt;A lot of methods on the &lt;code&gt;zc.zk.ZooKeeper&lt;/code&gt; object are passed through to the &lt;code&gt;zc-zookeeper-static&lt;/code&gt; package and do not have any docs. Check the docs on &lt;code&gt;zookeeper&lt;/code&gt; for the function help. For example &lt;code&gt;zc.zk.ZooKeeper.get&lt;/code&gt; has no docs and a crap &lt;code&gt;(*arg, **kwargs)&lt;/code&gt; param list, so look at &lt;code&gt;zookeeper.get&lt;/code&gt; instead.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id='get_a_connection'&gt;Get a connection&lt;/h3&gt;

&lt;pre&gt;&lt;code&gt;import zc.zk
zk = zc.zk.ZooKeeper(&amp;#39;127.0.0.1:2181&amp;#39;)&lt;/code&gt;&lt;/pre&gt;

&lt;h3 id='get_the_children_of_a_'&gt;Get the children of a &lt;code&gt;znode&lt;/code&gt;&lt;/h3&gt;

&lt;pre&gt;&lt;code&gt;In [6]: zk.get_children(&amp;quot;/&amp;quot;)
Out[6]: [&amp;#39;consumers&amp;#39;, &amp;#39;brokers&amp;#39;, &amp;#39;zookeeper&amp;#39;, &amp;#39;zk_test&amp;#39;]
# some stuff from kafka there.&lt;/code&gt;&lt;/pre&gt;

&lt;h3 id='get_the_properties_of_a_'&gt;Get the properties of a &lt;code&gt;znode&lt;/code&gt;&lt;/h3&gt;

&lt;pre&gt;&lt;code&gt;# Get a zc.zk.Properties for the path
# **NOTE:** This is heavy weight, for single reads use get_properties()
In [55]: p = zk.properties(&amp;quot;/zookeeper&amp;quot;)

In [56]: p.data
Out[56]: {}

In [58]: p.values()
Out[58]: []

# simple get_properties()
In [64]: zk.get_properties(&amp;quot;/zk_test&amp;quot;)
Out[64]: {&amp;#39;string_value&amp;#39;: &amp;#39;foo&amp;#39;}

# zc.zk assumes node data is JSON; a plain string shows up under string_value
In [49]: zk.set(&amp;quot;/zk_test&amp;quot;, &amp;quot;foo&amp;quot;)
Out[49]: 0

In [51]: p = zk.properties(&amp;quot;/zk_test&amp;quot;)

In [53]: p.values()
Out[53]: [&amp;#39;foo&amp;#39;]

In [54]: p.data
Out[54]: {&amp;#39;string_value&amp;#39;: &amp;#39;foo&amp;#39;}&lt;/code&gt;&lt;/pre&gt;

&lt;h3 id='tree_operations'&gt;Tree operations&lt;/h3&gt;

&lt;pre&gt;&lt;code&gt;In [59]: zk.print_tree(&amp;quot;/zookeeper&amp;quot;)
/zookeeper
  /quota&lt;/code&gt;&lt;/pre&gt;

&lt;h3 id='_operations'&gt;&lt;code&gt;znode&lt;/code&gt; operations&lt;/h3&gt;

&lt;pre&gt;&lt;code&gt;# create an ephemeral node

# must have an ACL; this is an open one
# (assumes `import zookeeper` was run earlier in the session)
In [78]: acl = [{&amp;quot;perms&amp;quot; : zookeeper.PERM_ALL, &amp;quot;scheme&amp;quot; : &amp;quot;world&amp;quot;, &amp;quot;id&amp;quot; : &amp;quot;anyone&amp;quot;}]

# Parent path must exist
In [85]: zk.create( &amp;quot;/fake/ephemeral&amp;quot;, &amp;quot;some data&amp;quot;, acl, zookeeper.EPHEMERAL)
...
NoNodeException: no node

In [84]: zk.create( &amp;quot;/zk_test/ephemeral&amp;quot;, &amp;quot;some data&amp;quot;, acl, zookeeper.EPHEMERAL)
Out[84]: &amp;#39;/zk_test/ephemeral&amp;#39;

# node now listed (locally) on the connection 
In [86]: zk.ephemeral
Out[86]: 
{&amp;#39;/zk_test/ephemeral&amp;#39;: {&amp;#39;acl&amp;#39;: [{&amp;#39;id&amp;#39;: &amp;#39;anyone&amp;#39;,
                                 &amp;#39;perms&amp;#39;: 31,
                                 &amp;#39;scheme&amp;#39;: &amp;#39;world&amp;#39;}],
                        &amp;#39;data&amp;#39;: &amp;#39;some data&amp;#39;,
                        &amp;#39;flags&amp;#39;: 1}}
                        
# View from the cluster
In [88]: zk.get_properties(&amp;quot;/zk_test/ephemeral&amp;quot;)
Out[88]: {&amp;#39;string_value&amp;#39;: &amp;#39;some data&amp;#39;}

In [89]: p = zk.properties(&amp;quot;/zk_test/ephemeral&amp;quot;)
In [91]: p.meta_data
Out[91]: 
{&amp;#39;aversion&amp;#39;: 0,
 &amp;#39;ctime&amp;#39;: 1326337991257L,
 &amp;#39;cversion&amp;#39;: 0,
 &amp;#39;czxid&amp;#39;: 1950L,
 &amp;#39;dataLength&amp;#39;: 9,
 &amp;#39;ephemeralOwner&amp;#39;: 86922380708675587L,
 &amp;#39;mtime&amp;#39;: 1326337991257L,
 &amp;#39;mzxid&amp;#39;: 1950L,
 &amp;#39;numChildren&amp;#39;: 0,
 &amp;#39;pzxid&amp;#39;: 1950L,
 &amp;#39;version&amp;#39;: 0}&lt;/code&gt;&lt;/pre&gt;
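
&lt;p&gt;The &lt;code&gt;ctime&lt;/code&gt; and &lt;code&gt;mtime&lt;/code&gt; values in the meta data are milliseconds since the Unix epoch. A small sketch, using only the Python 3 standard library, that turns the &lt;code&gt;ctime&lt;/code&gt; printed above into a UTC timestamp. The helper name is made up for illustration and is not part of &lt;code&gt;zc.zk&lt;/code&gt;.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;
```python
from datetime import datetime, timedelta, timezone


def stat_time_to_utc(millis):
    """Convert a znode stat time (milliseconds since the Unix epoch) to UTC."""
    unix_epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
    # Integer milliseconds keep the arithmetic exact, unlike float seconds.
    return unix_epoch + timedelta(milliseconds=millis)


created = stat_time_to_utc(1326337991257)  # ctime from the output above
```
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;For the node above this lands on 12 January 2012 in UTC, which lines up with the date of these notes.&lt;/p&gt;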

&lt;h3 id='watch_for_changes'&gt;Watch for changes&lt;/h3&gt;

&lt;pre&gt;&lt;code&gt;In [8]: children = zk.children(&amp;quot;/zk_test&amp;quot;)

In [9]: def my_callback(node):
   ...:   print &amp;quot;Called with node: &amp;quot;, str(node)
   ...: 

In [11]: children(my_callback)
Called with node:  zc.zk.Children(0, /zk_test)
Out[11]: zc.zk.Children(0, /zk_test)

In [14]: acl = [{&amp;quot;perms&amp;quot; : zookeeper.PERM_ALL, &amp;quot;scheme&amp;quot; : &amp;quot;world&amp;quot;, &amp;quot;id&amp;quot; : &amp;quot;anyone&amp;quot;}]

In [15]: zk.create( &amp;quot;/zk_test/ephemeral&amp;quot;, &amp;quot;some data&amp;quot;, acl, zookeeper.EPHEMERAL)
Out[15]: &amp;#39;/zk_test/ephemeral&amp;#39;
Called with node:  zc.zk.Children(0, /zk_test)    &lt;/code&gt;&lt;/pre&gt;</content>
 </entry>
 
 
</feed>
