<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Outbrain Techblog</title>
	<atom:link href="http://techblog.outbrain.com/feed/" rel="self" type="application/rss+xml" />
	<link>http://techblog.outbrain.com</link>
	<description>Just another WordPress site</description>
	<lastBuildDate>Sat, 03 Nov 2012 19:52:40 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.2.1</generator>
		<item>
		<title>Hurricane Sandy &#8211; Outbrain Service updates</title>
		<link>http://techblog.outbrain.com/2012/10/hurricane-sandy-outbrain-service-updates/</link>
		<comments>http://techblog.outbrain.com/2012/10/hurricane-sandy-outbrain-service-updates/#comments</comments>
		<pubDate>Sun, 28 Oct 2012 21:52:08 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://techblog.outbrain.com/?p=192</guid>
		<description><![CDATA[Hi all, as Hurricane Sandy is about to hit the US East Coast, and as Outbrain&#8217;s main datacenter is located in downtown Manhattan, we are taking measures to keep service interruption to a minimum for our partners and customers. Outbrain normally serves from 3 data centers, and in case of NY data center loss, [...]]]></description>
			<content:encoded><![CDATA[<p>Hi all, as Hurricane Sandy is about to hit the US East Coast, and as Outbrain&#8217;s main datacenter is located in downtown Manhattan, we are taking measures to keep service interruption to a minimum for our partners and customers. Outbrain normally serves from 3 data centers, and in case of NY data center loss, we will supply the service from one of the other data centers. Below, on this page, we will post updates on any service interruption, along with ETAs for fixes. We assume all will go well and we will not have to post updates, but&#8230; just in case <img src='http://techblog.outbrain.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
<p><strong>[UPDATE - Nov 3rd 3:45pm EST] - </strong>At this time, utility power is back at all our datacenters and at the HQ office. It is now time to restore the service from NY and get the office back to work. This will take some time, but systems will gradually be brought back up over the next week or so. There should be no effect on users, publishers or clients.</p>
<p>Our HQ will also reopen gradually, depending on the availability of public transportation.</p>
<p>We are now closing this status post. If you see any issues, please report them to am@outbrain.com or your rep.</p>
<p>I hope the storm of the century will be the last one for the next century (at least).</p>
<p><strong>[UPDATE - Nov 1st  9:30am EST]</strong> - Our HQ, located on 13<sup>th</sup> between 5<sup>th</sup> and 6<sup>th</sup> in downtown New York City is still without power and therefore closed. Thankfully, our NY-based team is safe and in dry locations, and will continue to try and work as best they can. We highly appreciate the concern and best wishes we received from our partners and clients across the globe; thank you!</p>
<p>We are doing our best to continue providing the best-in-class service we hope you’ve come to expect from us. As an update, our datacenter in NY is still without power and we expect it to be down for a few more days. We will continue to serve from our other datacenters, located in Chicago and Los Angeles. To reiterate, our service did not go down, and we are still serving across our clients’ sites. As of this morning, we have recovered and updated all our reporting capabilities, so we should be back to 100%.</p>
<p>If you are experiencing any difficulties or seeing anything unusual, please reach out to your respective contacts. We’ll also continue to operate in emergency mode until Monday; you can reach us 24/7 at <a href="mailto:am-emergency-support@outbrain.com" target="_blank">am-emergency-support@outbrain.com</a> (am = Account Management).</p>
<p><strong>[UPDATE - Oct 31st 6:46am EST] </strong>- Serving still holds strong from our LA and Chicago data centers and we are not aware of any disruption to our service. We are working hard to recover our dashboard reporting capabilities, but it will probably take a couple more days before we’re able to get back to normal mode. Sorry for any inconvenience this causes. Send us a note at <a href="mailto:am-emergency-support@outbrain.com" target="_blank">am-emergency-support@outbrain.com</a> if you have any requests, and one of us from around the world will respond as soon as possible.</p>
<p><strong>[UPDATE - 6:51pm EST] </strong> - Again, not much to update: all is stable in both the LA and Chicago datacenters. It&#8217;s the end of the day here in Israel and we are trying to get some rest. Our teammates in the US are keeping an eye on the system and will alert us if anything goes wrong. Good night.</p>
<p><strong>[UPDATE - 3:35am EST] </strong>- Actually, not much to update about the service. All is pretty much stable; we are safely serving from LA and Chicago. Most back-end services are running in the LA datacenter, and our tech teams in Israel and NY are monitoring and handling issues as they arise. Our datacenter vendors in NY are working with the FDNY to pump the water out of the flooded generator room, so it will take a while to recover this datacenter <img src='http://techblog.outbrain.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
<p><strong>[UPDATE - 10:50am EST] - </strong>The client dashboard is back up.</p>
<p><strong>[UPDATE - 10am EST] &#8211; </strong>The client dashboard on our site is intermittently down; we are handling the issues and will update soon.</p>
<p><strong>[UPDATE - 5am EST]</strong> Our NY data center went down. Our service is fully operational and we are serving through our Chicago and LA data centers. If you’re accessing your Outbrain dashboard, you may experience some delays in data freshness. We are working to resolve this issue and will continue to post updates.</p>
<p><strong>[UPDATE - 2am EST]</strong> &#8211; Our NY data center went completely offline. We are fully serving from our Chicago and LA data centers. External reports on our site are still down, but we are working to fail over all services to the LA datacenter. We will follow up with updates.</p>
<p><strong>[UPDATE - 12:50am EST]</strong> &#8211; Power just went off completely in our NY datacenter and the provider has evacuated the facility. We are taking measures to move all functionality to other datacenters.</p>
<p><strong>[UPDATE - 9pm EST]</strong> &#8211; Commercial power went down at our NY datacenter. The provider failed over to generator power and we continue to serve smoothly from this datacenter. We continue to monitor the service closely and are ready to take action if needed.</p>
]]></content:encoded>
			<wfw:commentRss>http://techblog.outbrain.com/2012/10/hurricane-sandy-outbrain-service-updates/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>How to &#8220;Outbrain&#8221; Selenium Tests with Ext framework</title>
		<link>http://techblog.outbrain.com/2011/12/how-to-outbrain-selenium-tests-with-ext-framework/</link>
		<comments>http://techblog.outbrain.com/2011/12/how-to-outbrain-selenium-tests-with-ext-framework/#comments</comments>
		<pubDate>Thu, 22 Dec 2011 18:15:11 +0000</pubDate>
		<dc:creator>asaf</dc:creator>
				<category><![CDATA[Dev Methods]]></category>
		<category><![CDATA[Research]]></category>
		<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://techblog.outbrain.com/?p=167</guid>
		<description><![CDATA[Link to GitHub. Many of our internal applications were developed using the Ext JS framework. Ext JS is a very powerful JavaScript framework and one of the most popular open-source JavaScript user interface frameworks. However, when it comes to automated testing with Selenium, the real challenge begins. It is very difficult to write automated tests [...]]]></description>
			<content:encoded><![CDATA[<div>
<p><img class="alignnone" title="Extjs" src="http://t3.gstatic.com/images?q=tbn:ANd9GcTHAz7s8dj10o_au1V19lSz0wD44ZkFc65xWKOdQTsaMXqVnUU9" alt="" width="123" height="56" /> <img class="alignnone" src="http://t0.gstatic.com/images?q=tbn:ANd9GcQvMGLkx_Uaoo7daCtYbP_iydrdKx0wljIaA6VVDsKV1tMKdY9K" alt="" width="81" height="81" /><img class="alignnone" title="seleniumhq" src="http://seleniumhq.org/images/selenium-logo.png" alt="" width="64" height="64" /></p>
<p><a title="selenuimExtend source code" href="https://github.com/asaflevy/SelenuimExtend" target="_blank">link to github</a>.</p>
</div>
<p>Many of our internal applications were developed using the <a title="Extjs" href="http://www.sencha.com/">Ext JS</a> framework.</p>
<p>Ext JS is a very powerful JavaScript framework and one of the most popular open-source JavaScript user interface frameworks. However, when it comes to automated testing with Selenium, the real challenge begins.</p>
<p>It is very difficult to write automated tests for Ext applications with Selenium, because Ext generates many &lt;div&gt; and &lt;span&gt; tags with automatically generated IDs (something like &#8220;ext-comp-11xx&#8221;) that are not predictable enough to hard-code in a test. Accessing these tags through <a title="Selenium" href="http://seleniumhq.org/">Selenium</a> is the big challenge we are trying to solve: we wanted a way to look up these generated IDs automatically at runtime.<br />
<span style="text-decoration: underline;"><strong><em>How do we approach this?</em></strong></span></p>
<p>Ext has a component manager in which all of the developers&#8217; components are registered. We can &#8220;ask&#8221; the component manager for a component&#8217;s ID by sending it a descriptor of that component. Put simply, we (the Selenium server) tell the component manager: &#8220;I need the ID of the currently visible window, which, by the way, is titled &#8216;campaign editor&#8217;.&#8221;</p>
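<p>The lookup rule can be illustrated without a browser. The sketch below is a hypothetical, in-memory stand-in for Ext's component manager. The <code>Component</code> class, its fields and <code>findId</code> are our own illustrative names, not the SelenuimExtend API. Given a label and an xtype, it returns the generated ID of the first visible match.</p>

```java
import java.util.List;
import java.util.Optional;

// Hypothetical in-memory model of Ext's component registry (not the SelenuimExtend API).
class ComponentRegistrySketch {

    static final class Component {
        final String id;       // auto-generated, e.g. "ext-comp-1102"
        final String xtype;    // e.g. "window", "button"
        final String label;    // window title or button text
        final boolean visible;

        Component(String id, String xtype, String label, boolean visible) {
            this.id = id;
            this.xtype = xtype;
            this.label = label;
            this.visible = visible;
        }
    }

    // Returns the ID of the first visible component with the given label and xtype, if any.
    static Optional<String> findId(List<Component> registry, String label, String xtype) {
        return registry.stream()
                .filter(c -> c.visible)
                .filter(c -> c.xtype.equals(xtype))
                .filter(c -> c.label.equals(label))
                .map(c -> c.id)
                .findFirst();
    }

    public static void main(String[] args) {
        List<Component> registry = List.of(
                new Component("ext-comp-1101", "window", "campaign editor", false), // hidden copy
                new Component("ext-comp-1102", "window", "campaign editor", true),
                new Component("ext-comp-1103", "button", "Add Campaign", true));
        System.out.println(findId(registry, "campaign editor", "window").orElse("not found")); // ext-comp-1102
    }
}
```

<p>In the real setup the registry lives in the browser, so the query presumably runs there and only the resulting ID comes back to be used as an ordinary Selenium locator. The sketch models only the matching rule.</p>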
<p>This will look something like:</p>
<p><code><span style="color: #888888;">ComponentLocatorFactory <em>extjsCmpLoc</em> = <strong>new</strong> ComponentLocatorFactory(<em>selenuim</em>);</span></code></p>
<p><code><span style="color: #888888;">Window testWin = <strong>new</strong> Window(<em>extjsCmpLoc</em>.createLocator("campaign editor", Xtype.<em>WINDOW</em>));</span></code></p>
<p>Then we can use Ext window methods such as close: <code>testWin.close();</code></p>
<p><em><strong>Another example:</strong></em></p>
<p><code><span style="color: #888888;">ComponentLocatorFactory <em>extjsCmpLoc</em> = <strong>new</strong> ComponentLocatorFactory(<em>selenuim</em>);</span></code></p>
<p><code><span style="color: #888888;">Button newButton = <strong>new</strong> Button(<em>extjsCmpLoc</em>.createLocator("Add Campaign", ExtjsUtils.Xtype.<em>BUTTON</em>));</span></code></p>
<p><code><span style="color: #888888;">newButton.click();</span></code></p>
<p><a href="http://techblog.outbrain.com/wp-content/uploads/2011/12/New-button.png"><img class="alignnone size-full wp-image-171" title="New button" src="http://techblog.outbrain.com/wp-content/uploads/2011/12/New-button.png" alt="" width="467" height="95" /></a></p>
<p>&nbsp;</p>
<p>You can ask for all of the visible components by type, by label or both:</p>
<p>&nbsp;</p>
<p><a href="http://techblog.outbrain.com/wp-content/uploads/2011/12/DatesField.png"><img class="alignnone size-full wp-image-172" title="DatesField" src="http://techblog.outbrain.com/wp-content/uploads/2011/12/DatesField.png" alt="" width="682" height="162" /></a></p>
<p><code><span style="color: #888888;">TextField flyfromdate = <strong>new</strong> TextField(<em>extjsCmpLoc</em>.createLocator(ExtjsUtils.Xtype.<em>DATEFIELD</em>, <span style="text-decoration: underline;"><strong><span style="color: #ff9900; text-decoration: underline;">0</span></strong></span>));</span></code></p>
<p><code><span style="color: #888888;">flyfromdate.setValue("10/12/2011");</span></code></p>
<p><code><span style="color: #888888;">TextField flytodate = <strong>new</strong> TextField(<em>extjsCmpLoc</em>.createLocator(ExtjsUtils.Xtype.<em>DATEFIELD</em>, <span style="text-decoration: underline;"><strong><span style="color: #ff9900; text-decoration: underline;">1</span></strong></span>));</span></code></p>
<p><code><span style="color: #888888;">flytodate.setValue("10/31/2011");</span></code></p>
<p>&nbsp;</p>
<p><strong><span style="text-decoration: underline;">Here&#8217;s a simple diagram of our solution:</span></strong></p>
<p><a href="http://techblog.outbrain.com/wp-content/uploads/2011/12/selnuimExtjs1.jpg"><img class="size-full wp-image-169" title="selnuimExtjs" src="http://techblog.outbrain.com/wp-content/uploads/2011/12/selnuimExtjs1-e1324321888424.jpg" alt="" width="545" height="289" /></a></p>
<p>&nbsp;</p>
<p><a title="ExtExtend" href="https://github.com/simbal/SelenuimExtend" target="_blank">link to project in git-hub</a> : https://github.com/simbal/SelenuimExtend</p>
<p style="text-align: left;">This solution is Open Source. In the meantime, if you have any questions, feel free to contact me directly. Asaf at outbrain dot com.</p>
<p>&nbsp;</p>
<p>Asaf Levy</p>
<p>Asaf@outbrain.com</p>
]]></content:encoded>
			<wfw:commentRss>http://techblog.outbrain.com/2011/12/how-to-outbrain-selenium-tests-with-ext-framework/feed/</wfw:commentRss>
		<slash:comments>19</slash:comments>
		</item>
		<item>
		<title>Slides &#8211; Cassandra for Sysadmins</title>
		<link>http://techblog.outbrain.com/2011/08/slides-cassandra-for-sysadmins/</link>
		<comments>http://techblog.outbrain.com/2011/08/slides-cassandra-for-sysadmins/#comments</comments>
		<pubDate>Wed, 10 Aug 2011 16:57:59 +0000</pubDate>
		<dc:creator>Nathan Milford</dc:creator>
				<category><![CDATA[IT/Ops]]></category>
		<category><![CDATA[apache]]></category>
		<category><![CDATA[cassandra]]></category>

		<guid isPermaLink="false">http://techblog.outbrain.com/?p=146</guid>
		<description><![CDATA[At Outbrain, we like things that are awesome. Cassandra is awesome. Ergo, we like Cassandra. We&#8217;ve had it in production for a few years now. I won&#8217;t delve into why the developers like it, but as a Sysadmin on-call in the evenings, I can tell you straight out I&#8217;m glad it has my back. We [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://techblog.outbrain.com/wp-content/uploads/2011/08/cassandra_logo.png"><img class="aligncenter size-medium wp-image-154" title="cassandra_logo" src="http://techblog.outbrain.com/wp-content/uploads/2011/08/cassandra_logo-300x60.png" alt="" width="300" height="60" /></a></p>
<p>At Outbrain, we like things that are awesome.</p>
<p><a href="http://cassandra.apache.org/" target="_blank">Cassandra</a> is awesome.</p>
<p>Ergo, we like Cassandra.</p>
<p>We&#8217;ve had it in production for a few years now.</p>
<p>I won&#8217;t delve into why the developers like it, but as a Sysadmin on-call in the evenings, I can tell you straight out I&#8217;m glad it has my back.</p>
<p>We have MySQL deployed pretty heavily, and it is fantastic at what it does. However, MySQL carries more administrative overhead than many of the newer alternative data stores, especially when you need it to work in a large, geographically distributed environment.</p>
<p>If you can model your data in Cassandra, are educated about the trade-offs, and have an undying wish not to have to worry too deeply about managing replication and sharding, it is a no-brainer.</p>
<p>I gave a presentation on Cassandra (with <a href="https://twitter.com//tjake" target="_blank">Jake Luciani</a> from <a href="http://www.datastax.com/" target="_blank">Datastax</a>) to the NYC chapter of the League of Professional System Administrators (<a href="https://lopsa.org/" target="_blank">LOPSA</a>), from an admin&#8217;s point of view.</p>
<p>We sysadmins fear change, because it is our butt on the line if there is an outage. With executives anxiously pacing behind us and revenue flushing down the drain, we&#8217;re the last line of defense when there is an issue, and we&#8217;re the ones who get torn away from our families in the evenings to handle an outage.</p>
<p>So, yeah&#8230; we&#8217;re a conservative lot <img src='http://techblog.outbrain.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
<p>That being said, change and progress can be good, especially when it frees you up.  Cassandra is resilient, fault-graceful and elastic. Once you understand how so, you&#8217;ll be slightly less surly.  Your developers might not even recognize you!</p>
<p>These slides are for the Sys Admin, noble fellow, to assuage his fears and get him started with Cassandra.</p>
<p>&nbsp;</p>
<center>
<div style="width:425px" id="__ss_8810365"> <strong style="display:block;margin:12px 0 4px"><a href="http://www.slideshare.net/nmilford/cassandra-for-sysadmins" title="Cassandra for Sysadmins" target="_blank">Cassandra for Sysadmins</a></strong> <object id="__sse8810365" width="425" height="355"><param name="movie" value="http://static.slidesharecdn.com/swf/ssplayer2.swf?doc=cassandraforsysadmins-110809110459-phpapp01&#038;stripped_title=cassandra-for-sysadmins&#038;userName=nmilford" /><param name="allowFullScreen" value="true"/><param name="allowScriptAccess" value="always"/><embed name="__sse8810365" src="http://static.slidesharecdn.com/swf/ssplayer2.swf?doc=cassandraforsysadmins-110809110459-phpapp01&#038;stripped_title=cassandra-for-sysadmins&#038;userName=nmilford" type="application/x-shockwave-flash" allowscriptaccess="always" allowfullscreen="true" width="425" height="355"></embed></object>
<div style="padding:5px 0 12px"> View more <a href="http://www.slideshare.net/" target="_blank">presentations</a> from <a href="http://www.slideshare.net/nmilford" target="_blank">Nathan Milford</a> </div>
</div>
</center>
]]></content:encoded>
			<wfw:commentRss>http://techblog.outbrain.com/2011/08/slides-cassandra-for-sysadmins/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Visualizing Our Deployment Pipeline</title>
		<link>http://techblog.outbrain.com/2011/08/visualizing-our-deployment-pipeline/</link>
		<comments>http://techblog.outbrain.com/2011/08/visualizing-our-deployment-pipeline/#comments</comments>
		<pubDate>Mon, 01 Aug 2011 08:28:57 +0000</pubDate>
		<dc:creator>Ran Tavory</dc:creator>
				<category><![CDATA[Dev Methods]]></category>
		<category><![CDATA[IT/Ops]]></category>

		<guid isPermaLink="false">http://techblog.outbrain.com/?p=135</guid>
		<description><![CDATA[(This is a cross post from Ran&#8217;s blog) When large numbers start piling up, in order to make sense of them,  they need to be visualized. I still work as a consultant at Outbrain about one day a week, and most of the time I&#8217;m in charge of the deployment system last described here. The challenges that [...]]]></description>
			<content:encoded><![CDATA[<div><em>(This is a cross post from <a href="http://prettyprint.me/2011/07/31/visualizing-our-deployment-pipeline/" target="_blank">Ran&#8217;s blog)</a></em></div>
<div><em><br />
</em></div>
<div>When large numbers start piling up, they need to be visualized in order to make sense of them.</div>
<div>I still work as a consultant at Outbrain about one day a week, and most of that time I&#8217;m in charge of the deployment system <a href="http://prettyprint.me/2011/01/24/continuous-deployment-at-outbrain/">last described here</a>. The challenges we encounter while developing the system are good challenges, and every day we have too many deployments to follow easily, so I decided to visualize them.</div>
<div>On an average day we have a dozen or two deployments (to production, not including test clusters), so I figured, why not use my <a href="http://code.google.com/apis/chart/">google-visualization</a>-fu and draw some nice graphs? Here are the results, followed by explanations.</div>
<div>Before I begin, just to put things in context: Outbrain has been practicing <a href="http://www.startuplessonslearned.com/search/label/continuous%20deployment">Continuous Deployment</a> for a while (6 months or so), and although a few systems helped us get there, one of the main pillars was a relatively new tool written by the fine folks at LinkedIn (in particular <a href="http://www.linkedin.com/in/pujante">Yan</a>; thanks, Yan!). So I just want to give a fair shout-out to them and thank Yan for the nice tool, the API and the ongoing awesome support. If you&#8217;re looking for a deployment tool, do give <a href="https://github.com/linkedin/glu/wiki">glu</a> a try; it&#8217;s pretty awesome! Without glu and its API, neither these graphs nor the rest of the system would have seen the light of day.</div>
<p>&nbsp;</p>
<p><strong>The Annotated Timeline</strong><br />
This graph may seem intimidating at first, but don&#8217;t be scared; let&#8217;s dive right into it&#8230; BTW, you can click on the image to enlarge it.</p>
<div><a href="http://prettyprint.me/wp-content/uploads/2011/07/versions-1.png"><img class="alignnone size-medium wp-image-430" title="versions-1" src="http://prettyprint.me/wp-content/uploads/2011/07/versions-1-300x152.png" alt="" width="300" height="152" /></a></div>
<div>First, let&#8217;s zoom in on the right-hand side of the graph. This graph uses Google&#8217;s <a href="http://code.google.com/apis/chart/interactive/docs/gallery/annotatedtimeline.html">annotated timeline graph</a>, which is really cool for showing how things change over time and correlating them with events. That is what I do here: the events are the deployments, the x axis is time and the y axis is the version of the deployed module.</div>
<div><a href="http://prettyprint.me/wp-content/uploads/2011/07/rhs-1.png"><img class="alignnone size-full wp-image-431" title="rhs-1" src="http://prettyprint.me/wp-content/uploads/2011/07/rhs-1.png" alt="" width="310" height="114" /></a></div>
<div>On the right-hand side you see a list of deployment events. For example, the one at the top has &#8220;ERROR www @tom&#8230;&#8221; and the next one is &#8220;BehavioralEngine @yatirb&#8230;&#8221; etc. This list can be filtered: if you type the name of one of the developers, such as @tom or @yatirb, you see only the deployments made by him (of course all deployments are made by devs, not by ops; hey, we&#8217;re <a href="http://en.wikipedia.org/wiki/DevOps">devops</a>y, remember?).</div>
<div>If you type only www into the filter box, you see all the deployments of the www component, which, unsurprisingly, is just our website.</div>
<div>If you type ERROR you see all deployments that had errors (and yes, this happens too, not a big deal).</div>
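<p>As used here, the filter box behaves like a substring match over each event's label. The minimal sketch below rests on that assumption; the chart's actual matching rules may differ. The label strings mimic the entries above and are otherwise made up.</p>

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical sketch of the timeline's filter box: keep events whose label contains the query.
class DeploymentFilterSketch {

    static List<String> filter(List<String> eventLabels, String query) {
        return eventLabels.stream()
                .filter(label -> label.contains(query))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Labels follow the "[ERROR] module @developer" shape seen in the chart; values are made up.
        List<String> events = List.of(
                "ERROR www @tom",
                "BehavioralEngine @yatirb",
                "www @yatirb");
        System.out.println(filter(events, "www"));     // [ERROR www @tom, www @yatirb]
        System.out.println(filter(events, "@yatirb")); // [BehavioralEngine @yatirb, www @yatirb]
        System.out.println(filter(events, "ERROR"));   // [ERROR www @tom]
    }
}
```
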
<div>The first nice thing about this graph is that as you filter, the elements that are filtered out disappear from the graph. For example, let&#8217;s see only the deployments to www (click on the image to enlarge):</div>
<div><a href="http://prettyprint.me/wp-content/uploads/2011/07/www-1.png"><img class="alignnone size-medium wp-image-432" title="www-1" src="http://prettyprint.me/wp-content/uploads/2011/07/www-1-300x150.png" alt="" width="300" height="150" /></a></div>
<div>Notice that not only has the right-hand list shrunk to show only deployments to www, but the graph on the left now shows only the matching markers as well. The other lines are still there, but only the markers for the www line are on the graph right now.</div>
<div>Now let&#8217;s have a look at the graph. One of the coolest things is that you can zoom in to a specific timespan using the controls at the lower part of the graph. (click to enlarge)</div>
<div><a href="http://prettyprint.me/wp-content/uploads/2011/07/zoom.png"><img class="alignnone size-medium wp-image-418" title="zoom" src="http://prettyprint.me/wp-content/uploads/2011/07/zoom-300x219.png" alt="" width="300" height="219" /></a></div>
<p>In this graph the x axis shows the time (date and time of day) and the y axis shows the svn revision number. Each colored line represents a single module (so we have one line for www and one line for the BehavioralEngine etc).</p>
<p>Usually, each line (representing a module) shows a monotonically increasing value over time, running from the bottom-left corner toward the top-right corner. However, in the relatively rare cases where a developer wants to deploy an older version of his module, you clearly see it as the line suddenly drops a bit instead of climbing. This is really nice, and it helps find unusual events.</p>
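<p>Those drops can also be found programmatically with a single pass over the deployment history that remembers the last revision seen for each module. The sketch below uses a hypothetical <code>Deployment</code> type, assumes deployments are supplied in chronological order, and uses illustrative revision numbers.</p>

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: flag deployments whose revision is lower than the module's previous one.
class RollbackDetectorSketch {

    static final class Deployment {
        final String module;
        final long revision; // svn revision number

        Deployment(String module, long revision) {
            this.module = module;
            this.revision = revision;
        }
    }

    // Expects chronological order; returns one "module: previous -> revision" entry per drop.
    static List<String> findRollbacks(List<Deployment> deployments) {
        Map<String, Long> lastRevision = new HashMap<>();
        List<String> rollbacks = new ArrayList<>();
        for (Deployment d : deployments) {
            Long previous = lastRevision.get(d.module);
            if (previous != null && d.revision < previous) {
                rollbacks.add(d.module + ": " + previous + " -> " + d.revision);
            }
            lastRevision.put(d.module, d.revision);
        }
        return rollbacks;
    }

    public static void main(String[] args) {
        List<Deployment> history = List.of(
                new Deployment("www", 41268),
                new Deployment("BehavioralEngine", 41300),
                new Deployment("www", 41463),
                new Deployment("www", 41410)); // an older revision redeployed
        System.out.println(findRollbacks(history)); // [www: 41463 -> 41410]
    }
}
```
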
<p>&nbsp;</p>
<p><strong>The Histogram</strong><br />
In the next graph you see an overview of deployments per day. (click to enlarge)</p>
<p><a href="http://prettyprint.me/wp-content/uploads/2011/07/histo.png"><img class="alignnone size-medium wp-image-420" title="histo" src="http://prettyprint.me/wp-content/uploads/2011/07/histo-300x225.png" alt="" width="300" height="225" /></a></p>
<p>This is more of a holistic view of how things went over the last couple of days. It shows how many deployments took place each day (production clusters only) and colors the successful ones green and the failed ones red.</p>
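<p>The numbers behind such a stacked column chart come from grouping deployments by day and by outcome. The minimal sketch below uses a hypothetical <code>Deployment</code> type reduced to a date and a success flag. The dates are made up.</p>

```java
import java.time.LocalDate;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: count successful and failed production deployments per day.
class DeploymentHistogramSketch {

    static final class Deployment {
        final LocalDate day;
        final boolean succeeded;

        Deployment(LocalDate day, boolean succeeded) {
            this.day = day;
            this.succeeded = succeeded;
        }
    }

    // Maps each day (sorted) to {successes, failures}: the green and red parts of one column.
    static Map<LocalDate, int[]> countPerDay(List<Deployment> deployments) {
        Map<LocalDate, int[]> counts = new TreeMap<>();
        for (Deployment d : deployments) {
            int[] bucket = counts.computeIfAbsent(d.day, k -> new int[2]);
            bucket[d.succeeded ? 0 : 1]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        LocalDate day1 = LocalDate.of(2011, 7, 25);
        LocalDate day2 = day1.plusDays(1);
        List<Deployment> deployments = List.of(
                new Deployment(day1, true), new Deployment(day1, true), new Deployment(day1, false),
                new Deployment(day2, true));
        countPerDay(deployments).forEach((day, c) ->
                System.out.println(day + " ok=" + c[0] + " failed=" + c[1]));
    }
}
```
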
<p>This graph is like an executive summary that can tell one of two stories. If there are too many reds (or any reds at all), someone needs to take that seriously and figure out what needs to be fixed (usually that someone is me&#8230;). If the bars aren&#8217;t high enough, someone needs to kick the developers&#8217; butts and get them deploying something already&#8230;</p>
<p>Like many other graphs from Google&#8217;s library (this one&#8217;s a Stacked <a href="http://code.google.com/apis/chart/interactive/docs/gallery/columnchart.html">Column Chart</a>, BTW), it shows nice tooltips when you hover over a column, with its x value (the date) and y value (the number of successful/failed deployments).</p>
<p><a href="http://prettyprint.me/wp-content/uploads/2011/07/hover.png"><img class="alignnone size-full wp-image-422" title="hover" src="http://prettyprint.me/wp-content/uploads/2011/07/hover.png" alt="" width="183" height="208" /></a><br />
<strong> </strong></p>
<p>&nbsp;</p>
<p><strong>Versions DNA Mapping</strong><br />
The following graph shows the current variety of versions that we have in our production systems for each and every module. One of our developers dubbed it a DNA mapping because of how it looks, but that&#8217;s as far as the similarity goes&#8230;</p>
<p><a href="http://prettyprint.me/wp-content/uploads/2011/07/versions.png"><img class="alignnone size-medium wp-image-424" title="versions" src="http://prettyprint.me/wp-content/uploads/2011/07/versions-300x147.png" alt="" width="300" height="147" /></a></p>
<p>The x axis lists the different modules that we have (names were intentionally left out, but you can imagine www and the other modules there). The y axis shows the svn revisions of each module currently in production. The data comes from glu&#8217;s live model, as reported by glu&#8217;s agents to ZooKeeper.</p>
<p>Let&#8217;s zoom in a bit:</p>
<p><a href="http://prettyprint.me/wp-content/uploads/2011/07/versions-zoom.png"><img class="alignnone size-full wp-image-425" title="versions zoom" src="http://prettyprint.me/wp-content/uploads/2011/07/versions-zoom.png" alt="" width="227" height="428" /></a></p>
<p>This diagram tells us that the module www has versions ranging from 41268 up to 41463 in production. This is normal, because we don&#8217;t necessarily deploy everything to all servers at once. However, the graph makes it easy to find hosts that have been left behind for too long. For example, if one of the modules hasn&#8217;t been deployed in a while, you&#8217;d see it falling behind, low on the graph. Similarly, if a module has a large spread of versions in production, chances are you want to close that gap pretty soon. The following graph illustrates both cases:</p>
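<p>The view reduces to the lowest and highest revision each module currently has across its hosts. The sketch below computes that range and flags both situations described above: a wide spread within one module, and a module whose newest revision trails the newest revision seen anywhere. The cross-module comparison relies on svn revision numbers being repository-wide. The host-to-revision map, the module names other than www and the thresholds are hypothetical; in the real system this data comes from glu's live model.</p>

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: per-module revision range across hosts, plus spread and lag checks.
class VersionSpreadSketch {

    // Returns {min, max} of the revisions currently deployed on a module's hosts.
    static long[] range(Collection<Long> hostRevisions) {
        return new long[] {Collections.min(hostRevisions), Collections.max(hostRevisions)};
    }

    // Modules whose max - min revision gap is larger than maxSpread.
    static List<String> tooWide(Map<String, Map<String, Long>> revisionsByModule, long maxSpread) {
        List<String> flagged = new ArrayList<>();
        for (Map.Entry<String, Map<String, Long>> e : revisionsByModule.entrySet()) {
            long[] r = range(e.getValue().values());
            if (r[1] - r[0] > maxSpread) {
                flagged.add(e.getKey());
            }
        }
        Collections.sort(flagged);
        return flagged;
    }

    // Modules whose newest revision trails the newest revision anywhere by more than maxLag.
    static List<String> fallingBehind(Map<String, Map<String, Long>> revisionsByModule, long maxLag) {
        long head = revisionsByModule.values().stream()
                .flatMap(hosts -> hosts.values().stream())
                .mapToLong(Long::longValue)
                .max()
                .orElse(0L);
        List<String> flagged = new ArrayList<>();
        for (Map.Entry<String, Map<String, Long>> e : revisionsByModule.entrySet()) {
            if (head - range(e.getValue().values())[1] > maxLag) {
                flagged.add(e.getKey());
            }
        }
        Collections.sort(flagged);
        return flagged;
    }

    public static void main(String[] args) {
        Map<String, Map<String, Long>> live = Map.of(
                "www", Map.of("host1", 41268L, "host2", 41463L),
                "moduleB", Map.of("host3", 41400L, "host4", 41410L),
                "moduleC", Map.of("host5", 39000L));
        long[] www = range(live.get("www").values());
        System.out.println("www: " + www[0] + ".." + www[1]);          // www: 41268..41463
        System.out.println("spread > 100: " + tooWide(live, 100));      // [www]
        System.out.println("lag > 1000: " + fallingBehind(live, 1000)); // [moduleC]
    }
}
```
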
<p><a href="http://prettyprint.me/wp-content/uploads/2011/07/behind.png"><img class="alignnone size-medium wp-image-426" title="behind" src="http://prettyprint.me/wp-content/uploads/2011/07/behind-300x106.png" alt="" width="300" height="106" /></a></p>
<p>To implement this graph I used a crippled version of the <a href="http://code.google.com/apis/chart/interactive/docs/gallery/candlestickchart.html">Candle Stick Chart</a>, which is normally used for showing stock values; it&#8217;s not ideal for this use case but it&#8217;s the closest I could find.</p>
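<p>Here is one plausible way to encode those ranges. This is our guess, not necessarily how the original chart was built. Google's candlestick chart takes rows made of a label followed by low, open, close and high values. Setting open equal to low and close equal to high therefore gives a box spanning each module's revision range. The sketch prints such rows as JavaScript array literals and assumes module names contain no quote characters.</p>

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical sketch: turn per-module {min, max} revisions into candlestick rows
// of the form [label, low, open, close, high], with open = low and close = high.
class CandlestickRowsSketch {

    static String toRows(Map<String, long[]> rangeByModule) {
        return new TreeMap<>(rangeByModule).entrySet().stream()
                .map(e -> {
                    long min = e.getValue()[0];
                    long max = e.getValue()[1];
                    return "['" + e.getKey() + "', " + min + ", " + min + ", " + max + ", " + max + "]";
                })
                .collect(Collectors.joining(",\n", "[\n", "\n]"));
    }

    public static void main(String[] args) {
        // Prints a JavaScript array literal with one row per module, sorted by name.
        System.out.println(toRows(Map.of("www", new long[] {41268, 41463})));
    }
}
```
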
<p>That&#8217;s all for now; three charts are enough. There is other news regarding our evolving deployment system, but it&#8217;s not as visual. If you have any questions or suggestions for other types of graphs that could be useful, don&#8217;t be shy: comment or tweet (@<a href="http://twitter.com/#!/rantav" target="_blank">rantav</a>).</p>
]]></content:encoded>
			<wfw:commentRss>http://techblog.outbrain.com/2011/08/visualizing-our-deployment-pipeline/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Leader Election with Zookeeper</title>
		<link>http://techblog.outbrain.com/2011/07/leader-election-with-zookeeper/</link>
		<comments>http://techblog.outbrain.com/2011/07/leader-election-with-zookeeper/#comments</comments>
		<pubDate>Sun, 10 Jul 2011 08:19:36 +0000</pubDate>
		<dc:creator>Erez Mazor</dc:creator>
				<category><![CDATA[Dev Methods]]></category>
		<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[leader election]]></category>
		<category><![CDATA[spring]]></category>
		<category><![CDATA[zookeeper]]></category>

		<guid isPermaLink="false">http://techblog.outbrain.com/?p=106</guid>
		<description><![CDATA[Recently we had to implement an active-passive redundancy of a singleton service in our production environment where the general rule is always have &#8220;more than one of anything&#8221;. The main motivation is to alleviate the need to manually monitor and manage these services, whose presence is crucial to the overall health of the site. This [...]]]></description>
			<content:encoded><![CDATA[<p><img src="http://techblog.outbrain.com/wp-content/uploads/2011/07/Zoo.jpg" alt="Zoo" align="right" width="250" /></p>
<p>Recently we had to implement active-passive redundancy for a singleton service in our production environment, where the general rule is to always have &#8220;more than one of anything&#8221;. The main motivation is to reduce the need to manually monitor and manage these services, whose presence is crucial to the overall health of the site.</p>
<p>This means that we sometimes have a service installed on several machines for redundancy, but only one of them is active at any given moment. If the active service goes down for some reason, another instance rises to take over its work. This is called <a href="http://en.wikipedia.org/wiki/Leader_election" title="leader election" target="_blank">leader election</a>. One of the most prominent open-source tools that facilitate leader election is <a href="http://zookeeper.apache.org/" title="Apache Zookeeper" target="_blank">Zookeeper</a>. So what is Zookeeper?</p>
<p>Originally developed at <a href="http://research.yahoo.com/project/1849" title="Yahoo reasearch" target="_blank">Yahoo! Research</a>, Zookeeper is a service providing reliable distributed coordination. It is highly concurrent, very fast and best suited to read-heavy access patterns. Reads can be served by any node of a Zookeeper cluster, while writes are quorum-based. To reach a quorum, Zookeeper uses an <a href="http://en.wikipedia.org/wiki/Atomic_broadcast" title="Atomic Broadcast" target="_blank">atomic broadcast protocol</a>. So how does it work?</p>
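<p>Quorum-based means that a write is committed once a strict majority of the ensemble has accepted it. Below is a small sketch of the majority arithmetic. It is our own illustration, not Zookeeper code.</p>

```java
// Majority-quorum arithmetic for a Zookeeper-style ensemble of n servers.
class QuorumSketch {

    // Smallest number of servers that forms a strict majority.
    static int quorumSize(int servers) {
        return servers / 2 + 1;
    }

    // How many servers can fail while a majority is still available.
    static int toleratedFailures(int servers) {
        return servers - quorumSize(servers);
    }

    public static void main(String[] args) {
        for (int n = 3; n <= 6; n++) {
            System.out.println(n + " servers: quorum " + quorumSize(n)
                    + ", survives " + toleratedFailures(n) + " failure(s)");
        }
    }
}
```

<p>Going from three servers to four raises the quorum from 2 to 3 without tolerating any additional failure. This is why ensembles are usually sized with an odd number of servers.</p>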
<p><span id="more-106"></span></p>
<h2>Connectivity, State and Sessions</h2>
<p>Zookeeper maintains an active connection with each of its clients using a heartbeat mechanism, and keeps a <a href="http://zookeeper.apache.org/doc/current/zookeeperProgrammers.html#ch_zkSessions" title="Zookeeper sessions" target="_blank">session</a> for every connected client. When a client stays disconnected for longer than a specified timeout, its session expires. This means that Zookeeper has a pretty good picture of all the animals in its zoo.</p>
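<p>The session bookkeeping described above can be pictured with a toy model. The sketch below is a hypothetical, single-process class, not how the Zookeeper server is actually implemented: every heartbeat refreshes a session&#8217;s timestamp, and a session that stays silent for longer than its timeout is expired.</p>

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Toy model of session tracking (hypothetical class, not Zookeeper's own
// implementation): heartbeats refresh a session, silence expires it.
class SessionTracker {
    private final long sessionTimeoutMillis;
    // session id -> time of the last heartbeat, in milliseconds
    private final Map<Long, Long> lastHeartbeat = new HashMap<>();

    SessionTracker(long sessionTimeoutMillis) {
        this.sessionTimeoutMillis = sessionTimeoutMillis;
    }

    // Called when a client connects or sends a heartbeat.
    void touch(long sessionId, long nowMillis) {
        lastHeartbeat.put(sessionId, nowMillis);
    }

    // A session is alive if it was heard from within the timeout.
    boolean isAlive(long sessionId, long nowMillis) {
        Long last = lastHeartbeat.get(sessionId);
        return last != null && nowMillis - last <= sessionTimeoutMillis;
    }

    // Removes and returns every session that has been silent for too long.
    // In Zookeeper, this is the moment a session's ephemeral nodes vanish.
    List<Long> expire(long nowMillis) {
        List<Long> expired = new ArrayList<>();
        Iterator<Map.Entry<Long, Long>> it = lastHeartbeat.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<Long, Long> entry = it.next();
            if (nowMillis - entry.getValue() > sessionTimeoutMillis) {
                expired.add(entry.getKey());
                it.remove();
            }
        }
        return expired;
    }
}
```

<p>For example, with a 10-second timeout, a session last heard from at t=0 is expired at t=12s, while one that sent a heartbeat at t=8s is still alive.</p>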
<h2>Data Model</h2>
<p>The Zookeeper <a href="http://zookeeper.apache.org/doc/r3.2.2/zookeeperProgrammers.html#ch_zkDataModel" title="Zookeeper data model" target="_blank">data model</a> consists of a hierarchy of nodes, called <strong>ZNodes</strong>. ZNodes can hold a relatively small amount of data (efficiency is key here; by default a ZNode is limited to 1 MB), and they are versioned and timestamped. A few node properties make ZNodes particularly useful for different use cases: each node is either persistent or ephemeral, and can additionally be sequential. These flags determine the naming of the node and its behavior with respect to the client session.</p>
<ul>
<li>The <strong>persistent</strong> node is basically a managed data bin; it stays until it is explicitly deleted</li>
<li>The <strong>ephemeral</strong> node exists for the lifetime of the client session that created it and is removed automatically when that session ends</li>
<li>The <strong>sequential</strong> node, when created, gets a unique, increasing number (sequence) appended to its name</li>
</ul>
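<p>To illustrate the two flags, here is a toy in-memory namespace in Java. It is a hypothetical sketch, not the Zookeeper API: sequential names get a 10-digit, zero-padded counter kept per parent (the format Zookeeper uses), and closing a session removes the ephemeral nodes it owns. One simplification to keep in mind: real Zookeeper sequence numbers are unique and increasing, but not necessarily contiguous.</p>

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy in-memory namespace (hypothetical class, not the Zookeeper API) that
// mimics the sequential and ephemeral flags.
class ToyNamespace {
    // path -> owning session id, or null for a persistent node
    private final Map<String, Long> nodes = new TreeMap<>();
    // one counter per parent path, like Zookeeper's per-parent sequence
    private final Map<String, Integer> counters = new HashMap<>();

    // Sequential: append a 10-digit, zero-padded counter to the requested name.
    String createSequential(String parent, String prefix, Long ephemeralOwner) {
        int sequence = counters.merge(parent, 1, Integer::sum) - 1;
        String path = parent + "/" + prefix + String.format("%010d", sequence);
        nodes.put(path, ephemeralOwner);
        return path;
    }

    // Ephemeral: nodes owned by a session disappear when that session ends.
    void closeSession(long sessionId) {
        nodes.values().removeIf(owner -> owner != null && owner == sessionId);
    }

    List<String> paths() {
        return new ArrayList<>(nodes.keySet());
    }
}
```

<p>Creating two ephemeral-sequential nodes under <code>/election</code> yields <code>n_0000000000</code> and <code>n_0000000001</code>; closing the first session leaves only the second, while a persistent node elsewhere is unaffected.</p>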
<p>The latter two are the building blocks for a variety of distributed coordination tasks such as locks, queues, barriers, transactions, elections and other synchronization-related tasks.</p>
<p>Here&#8217;s what an election path looks like in the SolrCloud admin console:</p>
<p><img src="http://techblog.outbrain.com/wp-content/uploads/2011/07/Zookeeper-Election-Path-SolrCloudAdmin.jpg" alt="Service Election Path in Solr Clould Zookeeper Admin page" align="center" width="250" /></p>
<h2>Events</h2>
<p>Zookeeper allows its clients to watch for events in its node hierarchy. This way clients are notified of changes in the distributed state of affairs and can act accordingly. Watches are one-time triggers: once a watch fires, the client has to set it again to receive further notifications. The client is also responsible for handling session expiration, which means that its ephemeral nodes have to be re-created after an expiration.</p>
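<p>One-time watches are easy to misuse, so here is a toy model in Java of the behavior just described (hypothetical classes, not the Zookeeper client API). Firing an event consumes every registered watcher; a client that wants to keep observing sets its watch again inside the callback.</p>

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Toy model of one-time watches (hypothetical classes, not the Zookeeper
// API): firing an event consumes every registered watcher.
class OneShotWatches {
    private List<Consumer<String>> watchers = new ArrayList<>();

    void watch(Consumer<String> watcher) {
        watchers.add(watcher);
    }

    void fire(String event) {
        List<Consumer<String>> triggered = watchers;
        watchers = new ArrayList<>(); // each watch fires at most once
        for (Consumer<String> watcher : triggered) {
            watcher.accept(event);
        }
    }
}

// A client that keeps observing by setting its watch again on every callback.
class ReArmingClient implements Consumer<String> {
    private final OneShotWatches source;
    final List<String> seen = new ArrayList<>();

    ReArmingClient(OneShotWatches source) {
        this.source = source;
    }

    void start() {
        source.watch(this);
    }

    @Override
    public void accept(String event) {
        seen.add(event);
        source.watch(this); // re-register, otherwise later changes go unnoticed
    }
}
```

<p>With the real API there is a window between a notification and the new watch, so a client typically re-reads the node when it sets the watch again rather than relying on the event alone.</p>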
<h2>Client Implementation</h2>
<p>Zookeeper requires a lot of boilerplate code, mostly around connectivity, and most of the time you end up doing the same things over and over. Luckily <a href="http://www.datameer.com" title="Stefan Groschupf's Blog" target="_blank">Stefan Groschupf</a> and <a href="http://phunt1.wordpress.com/" title="Patrick Hunt's Blog" target="_blank">Patrick Hunt</a> wrote a client abstraction called <a href="https://github.com/sgroschupf/zkclient" title="ZkClient on GitHub" target="_blank">ZkClient</a>. I published a Maven artifact for it on <a href="https://oss.sonatype.org/index.html#nexus-search;quick~zkclient" title="ZkClient on OSS Maven Repository" target="_blank">Sonatype OSS</a> so it&#8217;s available to our build system. The library also provides persistent event notification in the form of listeners, so clients don&#8217;t have to re-register a watch after every event.</p>
<p>The next step was to cook up a Spring factory bean for <strong>ZkClient</strong> and a template-style class that acts as an abstraction layer over Zookeeper operations. This ties in nicely with the Spring container, which we use extensively:</p>
<div id="gist1073533" class="gist">
      <div class="gist-file">
        <div class="gist-data gist-syntax">



  <div class="file-data">
    <table cellpadding="0" cellspacing="0" class="lines highlight">
      <tr>
        <td class="line-data">
          <pre class="line-pre"><div class="line" id="file-zkclient-xml-LC1">    <span class="nt">&lt;bean</span> <span class="na">id=</span><span class="s">&quot;zkClient&quot;</span> <span class="na">class=</span><span class="s">&quot;org.projectx.zookeeper.ZkClientFactoryBean&quot;</span><span class="nt">&gt;</span></div><div class="line" id="file-zkclient-xml-LC2">        <span class="nt">&lt;property</span> <span class="na">name=</span><span class="s">&quot;ensemble&quot;</span> <span class="na">value=</span><span class="s">&quot;localhost:2181,localhost:2182,localhost:2183&quot;</span> <span class="nt">/&gt;</span></div><div class="line" id="file-zkclient-xml-LC3">        <span class="nt">&lt;property</span> <span class="na">name=</span><span class="s">&quot;connectionTimeout&quot;</span> <span class="na">value=</span><span class="s">&quot;2000&quot;</span> <span class="nt">/&gt;</span></div><div class="line" id="file-zkclient-xml-LC4">        <span class="nt">&lt;property</span> <span class="na">name=</span><span class="s">&quot;sessionTimeout&quot;</span> <span class="na">value=</span><span class="s">&quot;10000&quot;</span> <span class="nt">/&gt;</span></div><div class="line" id="file-zkclient-xml-LC5">        <span class="nt">&lt;property</span> <span class="na">name=</span><span class="s">&quot;stateListeners&quot;</span><span class="nt">&gt;</span></div><div class="line" id="file-zkclient-xml-LC6">            <span class="nt">&lt;list&gt;</span></div><div class="line" id="file-zkclient-xml-LC7">                <span class="nt">&lt;ref</span> <span class="na">local=</span><span class="s">&quot;zkStatsCollector&quot;</span> <span class="nt">/&gt;</span></div><div class="line" id="file-zkclient-xml-LC8">            <span class="nt">&lt;/list&gt;</span></div><div class="line" id="file-zkclient-xml-LC9">        <span class="nt">&lt;/property&gt;</span></div><div class="line" id="file-zkclient-xml-LC10">    <span class="nt">&lt;/bean&gt;</span></div><div 
class="line" id="file-zkclient-xml-LC11">    </div><div class="line" id="file-zkclient-xml-LC12">    <span class="nt">&lt;bean</span> <span class="na">id=</span><span class="s">&quot;zkStatsCollector&quot;</span> <span class="na">class=</span><span class="s">&quot;org.projectx.zookeeper.ZookeeperClientStatsCollector&quot;</span> <span class="nt">/&gt;</span></div><div class="line" id="file-zkclient-xml-LC13">&nbsp;</div><div class="line" id="file-zkclient-xml-LC14">    <span class="nt">&lt;bean</span> <span class="na">id=</span><span class="s">&quot;zkTemplate&quot;</span> <span class="na">class=</span><span class="s">&quot;org.projectx.zookeeper.ZookeeperTemplate&quot;</span><span class="nt">&gt;</span></div><div class="line" id="file-zkclient-xml-LC15">        <span class="nt">&lt;constructor-arg</span> <span class="na">ref=</span><span class="s">&quot;zkClient&quot;</span> <span class="nt">/&gt;</span></div><div class="line" id="file-zkclient-xml-LC16">    <span class="nt">&lt;/bean&gt;</span></div></pre>
        </td>
      </tr>
    </table>
  </div>

        </div>

        <div class="gist-meta">
          <a href="https://gist.github.com/erezmazor/1073533/raw/f1b83f5c5bda450091c7c53c0eb05dabd54bbf2a/ZkClient.xml" style="float:right">view raw</a>
          <a href="https://gist.github.com/erezmazor/1073533#file-zkclient-xml" style="float:right; margin-right:10px; color:#666;">ZkClient.xml</a>
          <a href="https://gist.github.com/erezmazor/1073533">This Gist</a> brought to you by <a href="http://github.com">GitHub</a>.
        </div>
      </div>
</div>

<p>The <strong>ZookeeperClientStatsCollector</strong> is a state listener implementation that collects statistics about session connects and disconnects and exports them to JMX as an MBean.</p>
<p>Now that we have a working data access layer we can start with the good stuff.</p>
<h2>Leader Election</h2>
<p>The <a href="http://zookeeper.apache.org/doc/current/recipes.html#sc_leaderElection" title="Leader Election - Zookeeper Documentation" target="_blank">Zookeeper documentation</a> describes in general terms how leader election should be performed. The general idea is that all participants in the election create an ephemeral-sequential node under the same election path. The node with the smallest sequence number is the leader. Each &#8220;follower&#8221; watches only the node with the next lower sequence number, which prevents a herd effect (every participant being notified at once) when the leader goes away. In effect, this creates a linked list of nodes. When the node a follower is watching disappears, the follower goes to election again: it either finds a new, smaller node to watch or becomes the leader if it now holds the lowest sequence number.</p>
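<p>The decision each participant makes can be written as a pure function over the children of the election path. This is a hypothetical helper, assuming node names end in Zookeeper&#8217;s 10-digit sequence suffix (for example <code>n_0000000003</code>); listing the children and setting the watch are left out.</p>

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Decision step of the election recipe, as a pure function over the child
// names of the election path (hypothetical helper; assumes each name ends
// with a 10-digit sequence suffix such as "n_0000000003").
final class ElectionStep {
    private ElectionStep() {}

    static long sequenceOf(String nodeName) {
        return Long.parseLong(nodeName.substring(nodeName.length() - 10));
    }

    // Returns null when self holds the lowest sequence number (the leader),
    // otherwise the node with the next lower sequence number: the only node
    // self needs to watch.
    static String nodeToWatch(List<String> children, String self) {
        List<String> sorted = new ArrayList<>(children);
        sorted.sort(Comparator.comparingLong(ElectionStep::sequenceOf));
        int index = sorted.indexOf(self);
        if (index == -1) {
            throw new IllegalStateException(self + " is not taking part in the election");
        }
        return index == 0 ? null : sorted.get(index - 1);
    }
}
```

<p>Sorting by the numeric suffix rather than by the full name keeps the order correct even if participants use different name prefixes.</p>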
<p>The following image describes a scenario with 3 clients participating in the election process:</p>
<p><img src="http://techblog.outbrain.com/wp-content/uploads/2011/07/Zookeeper-Leader-Election.jpg" alt="Leader Election with Zookeeper" align="center" /></p>
<p>Each client participating in this process has to:</p>
<ol>
<li>Create an ephemeral-sequential node under the election path</li>
<li>Find its leader (the node with the next lower sequence number) and follow (watch) it</li>
<li>When that node is removed, go to election again and either find a new node to follow or become the leader if no lower node is left</li>
<li>Upon session expiration, check the election state and rejoin the election if needed, since its ephemeral node is gone</li>
</ol>
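<p>The four steps can be simulated in a single process to show why watching only the predecessor avoids the herd effect. The class below is hypothetical and involves no Zookeeper at all: <code>leave</code> stands for a shutdown or session expiration, and step 4 amounts to calling <code>join</code> again, which yields a new sequence number.</p>

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Single-process simulation of the election steps (hypothetical class, no
// Zookeeper involved). Each participant owns one node, identified by its
// sequence number, and watches only its predecessor.
class ElectionSimulation {
    private final TreeMap<Integer, String> nodes = new TreeMap<>(); // sequence -> client
    private final Map<Integer, Integer> watcherOf = new HashMap<>(); // watched -> watcher
    private int nextSequence = 0;
    final List<String> notified = new ArrayList<>();

    // Step 1: create the node; step 2: find the predecessor and watch it.
    int join(String client) {
        int sequence = nextSequence++;
        nodes.put(sequence, client);
        followPredecessor(sequence);
        return sequence;
    }

    // The leader holds the lowest sequence number.
    String leader() {
        return nodes.isEmpty() ? null : nodes.firstEntry().getValue();
    }

    // The node disappears (shutdown or session expiration). Only the single
    // participant watching it is notified, so there is no herd effect.
    void leave(int sequence) {
        nodes.remove(sequence);
        watcherOf.values().remove(sequence); // drop the leaver's own watch
        Integer watcher = watcherOf.remove(sequence);
        if (watcher != null) {
            notified.add(nodes.get(watcher));
            followPredecessor(watcher); // step 3: go to election again
        }
    }

    private void followPredecessor(int sequence) {
        Integer predecessor = nodes.lowerKey(sequence);
        if (predecessor != null) {
            watcherOf.put(predecessor, sequence);
        }
        // No predecessor: this participant is now the leader.
    }
}
```

<p>With participants A, B and C, removing A notifies only B, which becomes the leader; C keeps watching B and is not disturbed.</p>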
<p>One thing to consider here is the nature of the work being done by the leader. Make sure its state can be preserved if its leadership is revoked. Leadership can be lost for any number of reasons, including planned restarts for maintenance and releases. It can also be brought about by network partitioning.</p>
<p>Designing services for graceful recovery is a requirement of distributed systems in general, not just of leader election.</p>
<p>Spring helps here because interception can be used to suppress method invocations of various services based on leadership status. Below is an example of interception-based leadership control:</p>
<div id="gist1073533" class="gist">
      <div class="gist-file">
        <div class="gist-data gist-syntax">



  <div class="file-data">
    <table cellpadding="0" cellspacing="0" class="lines highlight">
      <tr>
        <td class="line-data">
          <pre class="line-pre"><div class="line" id="file-leaderelectioninterceptor-xml-LC1"><span class="nt">&lt;bean</span> <span class="na">id=</span><span class="s">&quot;leaderElectionProxyTemplate&quot;</span> <span class="na">class=</span><span class="s">&quot;org.springframework.aop.framework.ProxyFactoryBean&quot;</span></div><div class="line" id="file-leaderelectioninterceptor-xml-LC2">    <span class="na">abstract=</span><span class="s">&quot;true&quot;</span><span class="nt">&gt;</span></div><div class="line" id="file-leaderelectioninterceptor-xml-LC3">    <span class="nt">&lt;property</span> <span class="na">name=</span><span class="s">&quot;interceptorNames&quot;</span><span class="nt">&gt;</span></div><div class="line" id="file-leaderelectioninterceptor-xml-LC4">        <span class="nt">&lt;list&gt;</span></div><div class="line" id="file-leaderelectioninterceptor-xml-LC5">            <span class="nt">&lt;value&gt;</span>leaderElectionTarget<span class="nt">&lt;/value&gt;</span></div><div class="line" id="file-leaderelectioninterceptor-xml-LC6">        <span class="nt">&lt;/list&gt;</span></div><div class="line" id="file-leaderelectioninterceptor-xml-LC7">    <span class="nt">&lt;/property&gt;</span></div><div class="line" id="file-leaderelectioninterceptor-xml-LC8"><span class="nt">&lt;/bean&gt;</span></div><div class="line" id="file-leaderelectioninterceptor-xml-LC9"> </div><div class="line" id="file-leaderelectioninterceptor-xml-LC10"><span class="nt">&lt;bean</span> <span class="na">id=</span><span class="s">&quot;leaderElectionTarget&quot;</span>  <span class="na">class=</span><span class="s">&quot;org.projectx.zookeeper.election.LeaderElectionTargetInterceptor&quot;</span> <span class="nt">/&gt;</span></div><div class="line" id="file-leaderelectioninterceptor-xml-LC11"> </div><div class="line" id="file-leaderelectioninterceptor-xml-LC12"><span class="nt">&lt;bean</span> <span class="na">id=</span><span class="s">&quot;myService&quot;</span> 
<span class="na">parent=</span><span class="s">&quot;leaderElectionProxyTemplate&quot;</span><span class="nt">&gt;</span></div><div class="line" id="file-leaderelectioninterceptor-xml-LC13">    <span class="nt">&lt;property</span> <span class="na">name=</span><span class="s">&quot;target&quot;</span><span class="nt">&gt;</span></div><div class="line" id="file-leaderelectioninterceptor-xml-LC14">        <span class="nt">&lt;bean</span> <span class="na">class=</span><span class="s">&quot;org.projectx.service.MyService&quot;</span> <span class="nt">/&gt;</span></div><div class="line" id="file-leaderelectioninterceptor-xml-LC15">    <span class="nt">&lt;/property&gt;</span></div><div class="line" id="file-leaderelectioninterceptor-xml-LC16"><span class="nt">&lt;/bean&gt;</span></div></pre>
        </td>
      </tr>
    </table>
  </div>

        </div>

        <div class="gist-meta">
          <a href="https://gist.github.com/erezmazor/1073533/raw/307a321a008a5a3bb8a8d6cdeb87b2fe15124a72/LeaderElectionInterceptor.xml" style="float:right">view raw</a>
          <a href="https://gist.github.com/erezmazor/1073533#file-leaderelectioninterceptor-xml" style="float:right; margin-right:10px; color:#666;">LeaderElectionInterceptor.xml</a>
          <a href="https://gist.github.com/erezmazor/1073533">This Gist</a> brought to you by <a href="http://github.com">GitHub</a>.
        </div>
      </div>
</div>

<p>The service <strong>myService</strong> is the one controlled by leader election: all its methods are invoked or suppressed depending on leadership status.</p>
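<p>The source of <strong>LeaderElectionTargetInterceptor</strong> is not shown in this post. As a self-contained illustration of the same idea, the sketch below uses a JDK dynamic proxy instead of Spring AOP; <code>Worker</code>, <code>LeadershipGate</code> and the <code>AtomicBoolean</code> leadership flag are assumptions made for the example, and in a real setup the flag would be flipped by the election logic.</p>

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.concurrent.atomic.AtomicBoolean;

// Plain-JDK sketch of interception-based leadership control. It uses
// java.lang.reflect.Proxy rather than Spring AOP; Worker and LeadershipGate
// are hypothetical names, not classes from the post's repository.
interface Worker {
    void doWork();
}

final class LeadershipGate {
    private LeadershipGate() {}

    // Returns a proxy that forwards calls to target only while isLeader is
    // true. Suppressed calls return null, so this sketch only suits methods
    // returning void or an object type (a primitive return would fail).
    static <T> T guard(T target, Class<T> iface, AtomicBoolean isLeader) {
        InvocationHandler handler = (proxy, method, args) -> {
            boolean objectMethod = method.getDeclaringClass() == Object.class;
            if (objectMethod || isLeader.get()) {
                return invoke(method, target, args);
            }
            return null; // not the leader: suppress the call
        };
        Object instance = Proxy.newProxyInstance(
                iface.getClassLoader(), new Class<?>[] {iface}, handler);
        return iface.cast(instance);
    }

    // Calls the real method, rethrowing the target's own exception.
    private static Object invoke(Method method, Object target, Object[] args) throws Throwable {
        try {
            return method.invoke(target, args);
        } catch (InvocationTargetException e) {
            throw e.getCause();
        }
    }
}
```

<p>Note that the proxy only decides whether a new call goes through; it cannot stop a call that started while the service was still the leader.</p>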
<p>Another implementation of the election target wraps a Quartz scheduler instance:</p>
<div id="gist1073533" class="gist">
      <div class="gist-file">
        <div class="gist-data gist-syntax">



  <div class="file-data">
    <table cellpadding="0" cellspacing="0" class="lines highlight">
      <tr>
        <td class="line-data">
          <pre class="line-pre"><div class="line" id="file-leaderelectionquartzscheduler-xml-LC1"><span class="nt">&lt;bean</span> <span class="na">id=</span><span class="s">&quot;leaderElectionTarget&quot;</span></div><div class="line" id="file-leaderelectionquartzscheduler-xml-LC2">    <span class="na">class=</span><span class="s">&quot;org.projectx.zookeeper.election.quartz.SchedulerElectionTarget&quot;</span><span class="nt">&gt;</span></div><div class="line" id="file-leaderelectionquartzscheduler-xml-LC3">    <span class="nt">&lt;constructor-arg</span> <span class="na">ref=</span><span class="s">&quot;myScheduler&quot;</span> <span class="nt">/&gt;</span></div><div class="line" id="file-leaderelectionquartzscheduler-xml-LC4"><span class="nt">&lt;/bean&gt;</span></div></pre>
        </td>
      </tr>
    </table>
  </div>

        </div>

        <div class="gist-meta">
          <a href="https://gist.github.com/erezmazor/1073533/raw/15e4e8c2bf463fbb4645927b1ab2efd6154a1736/LeaderElectionQuartzScheduler.xml" style="float:right">view raw</a>
          <a href="https://gist.github.com/erezmazor/1073533#file-leaderelectionquartzscheduler-xml" style="float:right; margin-right:10px; color:#666;">LeaderElectionQuartzScheduler.xml</a>
          <a href="https://gist.github.com/erezmazor/1073533">This Gist</a> brought to you by <a href="http://github.com">GitHub</a>.
        </div>
      </div>
</div>

<p>This implementation puts the Quartz scheduler in standby mode when leadership is revoked and resumes it when leadership is granted. Note that standby mode does not stop tasks that are already running; they are allowed to complete naturally, so in partitioning scenarios you may have a scheduled task running on two services at once. This means that whatever is scheduled has to be aware that another service may be doing the same work. This problem can be solved with a Zookeeper barrier implementation; more on that in another post.</p>
<p>But there&#8217;s more you can do with Zookeeper than leader election.</p>
<p>If you wish to run your data center the democratic way, where important decisions are made in coordination with other stakeholders, Zookeeper certainly helps. </p>
<p>The Spring-based leader election code is on <a href="https://github.com/erezmazor/projectx/tree/master/org.projectx.zookeeper" title="Zookeeper Leader Election with Spring on GitHub" target="_blank">GitHub</a>, and the source shown in this post can be found on <a href="https://gist.github.com/1073533" title="Zookeeper Post Snippets on Gist" target="_blank">Gist</a>.</p>
<p>Zookeeper <a href="http://zookeeper.apache.org/doc/current/" title="Zookeeper documentation" target="_blank">documentation</a> and <a href="http://wiki.apache.org/hadoop/ZooKeeper" target="_blank" title="Zookeeper Wiki">wiki</a>.</p>
<p>Happy Zookeeping!</p>
]]></content:encoded>
			<wfw:commentRss>http://techblog.outbrain.com/2011/07/leader-election-with-zookeeper/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Feature Flags Made Easy</title>
		<link>http://techblog.outbrain.com/2011/07/feature-flags-made-easy/</link>
		<comments>http://techblog.outbrain.com/2011/07/feature-flags-made-easy/#comments</comments>
		<pubDate>Tue, 05 Jul 2011 08:20:48 +0000</pubDate>
		<dc:creator>Eran Harel</dc:creator>
				<category><![CDATA[Dev Methods]]></category>
		<category><![CDATA[design]]></category>
		<category><![CDATA[Feature-Flags]]></category>
		<category><![CDATA[Polymorphism]]></category>
		<category><![CDATA[spring]]></category>

		<guid isPermaLink="false">http://techblog.outbrain.com/?p=83</guid>
		<description><![CDATA[I recently participated in the ILTechTalk week. Most of the talks discussed issues like Scalability, Software Quality, Company Culture, and Continuous Deployment (CD). Since the talks were hosted at Outbrain, we got many direct questions about our concrete implementations. Some of the questions and statements claimed that Feature Flags complicate your code. What bothered most [...]]]></description>
			<content:encoded><![CDATA[<p>I recently participated in the <a href="http://www.iltt.org.il/">ILTechTalk</a> week. Most of the talks discussed issues like Scalability, Software Quality, Company Culture, and Continuous Deployment (CD). Since the talks were hosted at <a href="http://www.outbrain.com">Outbrain</a>, we got many direct questions about our concrete implementations. Some of the questions and statements claimed that Feature Flags complicate your code. What bothered most participants was that committing code directly to trunk requires addition of feature flags in some cases and that it may make their code base more complex.</p>
<p>While in some cases, feature flags may make the code slightly more complicated, it shouldn&#8217;t be so in most cases. The main idea I&#8217;m presenting here is that conditional logic can be easily replaced with polymorphic code. In fact, conditional logic can <strong>always</strong> be replaced by polymorphism.</p>
<p>Enough with the abstract talk&#8230;<br />
<!-- more --><br />
Suppose we have an application that contains some imaginary feature, and we want to introduce a feature flag. Below is a code snippet that developers normally come up with:</p>
<div id="gist1060044" class="gist">
      <div class="gist-file">
        <div class="gist-data gist-syntax">



  <div class="file-data">
    <table cellpadding="0" cellspacing="0" class="lines highlight">
      <tr>
        <td class="line-data">
          <pre class="line-pre"><div class="line" id="file-ifelseapplication-java-LC1">  <span class="kd">public</span> <span class="kt">void</span> <span class="nf">runApplication</span><span class="o">()</span> <span class="o">{</span></div><div class="line" id="file-ifelseapplication-java-LC2">&nbsp;</div><div class="line" id="file-ifelseapplication-java-LC3">    <span class="c1">// ...</span></div><div class="line" id="file-ifelseapplication-java-LC4">    <span class="k">if</span> <span class="o">(</span><span class="n">useNewImplementation</span><span class="o">)</span> <span class="o">{</span></div><div class="line" id="file-ifelseapplication-java-LC5">      <span class="n">executeNewImaginaryFeatureImplementation</span><span class="o">();</span></div><div class="line" id="file-ifelseapplication-java-LC6">    <span class="o">}</span> <span class="k">else</span> <span class="o">{</span></div><div class="line" id="file-ifelseapplication-java-LC7">      <span class="n">executeOldImaginaryFeatureImplementation</span><span class="o">();</span></div><div class="line" id="file-ifelseapplication-java-LC8">    <span class="o">}</span></div><div class="line" id="file-ifelseapplication-java-LC9">    <span class="c1">// ...</span></div><div class="line" id="file-ifelseapplication-java-LC10">  <span class="o">}</span></div></pre>
        </td>
      </tr>
    </table>
  </div>

        </div>

        <div class="gist-meta">
          <a href="https://gist.github.com/eranharel/1060044/raw/c050a49aba1e3253f9e000bea997a781d9d78c0e/IfElseApplication.java" style="float:right">view raw</a>
          <a href="https://gist.github.com/eranharel/1060044#file-ifelseapplication-java" style="float:right; margin-right:10px; color:#666;">IfElseApplication.java</a>
          <a href="https://gist.github.com/eranharel/1060044">This Gist</a> brought to you by <a href="http://github.com">GitHub</a>.
        </div>
      </div>
</div>

<p>While this is a legitimate implementation in some cases, it does complicate your code base by increasing the <a href="http://en.wikipedia.org/wiki/Cyclomatic_complexity">cyclomatic complexity</a> of your code. In some cases, the check for whether the feature is active may recur in many places in the code, so this approach can quickly turn into a maintenance nightmare.</p>
<p>Luckily, implementing a feature flag using polymorphism is pretty easy. First, let&#8217;s define an interface for the imaginary feature and two implementations (old and new):</p>
<div id="gist1060044" class="gist">
      <div class="gist-file">
        <div class="gist-data gist-syntax">



  <div class="file-data">
    <table cellpadding="0" cellspacing="0" class="lines highlight">
      <tr>
        <td class="line-data">
          <pre class="line-pre"><div class="line" id="file-imaginaryfeature-java-LC1"><span class="kd">public</span> <span class="kd">interface</span> <span class="nc">ImaginaryFeature</span> <span class="o">{</span></div><div class="line" id="file-imaginaryfeature-java-LC2">  <span class="kd">public</span> <span class="kt">void</span> <span class="nf">executeFeature</span><span class="o">();</span></div><div class="line" id="file-imaginaryfeature-java-LC3"><span class="o">}</span></div><div class="line" id="file-imaginaryfeature-java-LC4">&nbsp;</div><div class="line" id="file-imaginaryfeature-java-LC5"><span class="kd">class</span> <span class="nc">OldImaginaryFeature</span> <span class="kd">implements</span> <span class="n">ImaginaryFeature</span> <span class="o">{</span></div><div class="line" id="file-imaginaryfeature-java-LC6">  <span class="nd">@Override</span></div><div class="line" id="file-imaginaryfeature-java-LC7">  <span class="kd">public</span> <span class="kt">void</span> <span class="nf">executeFeature</span><span class="o">()</span> <span class="o">{</span></div><div class="line" id="file-imaginaryfeature-java-LC8">    <span class="n">System</span><span class="o">.</span><span class="na">out</span><span class="o">.</span><span class="na">println</span><span class="o">(</span><span class="s">&quot;old feature implementation&quot;</span><span class="o">);</span></div><div class="line" id="file-imaginaryfeature-java-LC9">  <span class="o">}</span></div><div class="line" id="file-imaginaryfeature-java-LC10"><span class="o">}</span></div><div class="line" id="file-imaginaryfeature-java-LC11">&nbsp;</div><div class="line" id="file-imaginaryfeature-java-LC12"><span class="kd">class</span> <span class="nc">NewImaginaryFeature</span> <span class="kd">implements</span> <span class="n">ImaginaryFeature</span> <span class="o">{</span></div><div class="line" id="file-imaginaryfeature-java-LC13">  <span class="nd">@Override</span></div><div class="line" 
id="file-imaginaryfeature-java-LC14">  <span class="kd">public</span> <span class="kt">void</span> <span class="nf">executeFeature</span><span class="o">()</span> <span class="o">{</span></div><div class="line" id="file-imaginaryfeature-java-LC15">    <span class="n">System</span><span class="o">.</span><span class="na">out</span><span class="o">.</span><span class="na">println</span><span class="o">(</span><span class="s">&quot;new feature implementation&quot;</span><span class="o">);</span></div><div class="line" id="file-imaginaryfeature-java-LC16">  <span class="o">}</span></div><div class="line" id="file-imaginaryfeature-java-LC17"><span class="o">}</span></div></pre>
        </td>
      </tr>
    </table>
  </div>

        </div>

        <div class="gist-meta">
          <a href="https://gist.github.com/eranharel/1060044/raw/585af7fe3ace5ecc7f4ba8a738d5632b72514242/ImaginaryFeature.java" style="float:right">view raw</a>
          <a href="https://gist.github.com/eranharel/1060044#file-imaginaryfeature-java" style="float:right; margin-right:10px; color:#666;">ImaginaryFeature.java</a>
        </div>
      </div>
</div>

<p>Now, let&#8217;s use the feature in our application, selecting the implementation at runtime:</p>
<div id="gist1060044" class="gist">
      <div class="gist-file">
        <div class="gist-data gist-syntax">



  <div class="file-data">
    <table cellpadding="0" cellspacing="0" class="lines highlight">
      <tr>
        <td class="line-data">
          <pre class="line-pre"><div class="line" id="file-polymorphicapplication-java-LC1"><span class="kd">public</span> <span class="kd">class</span> <span class="nc">PolymorphicApplication</span> <span class="o">{</span></div><div class="line" id="file-polymorphicapplication-java-LC2">  </div><div class="line" id="file-polymorphicapplication-java-LC3">  <span class="kd">private</span> <span class="kd">final</span> <span class="n">ImaginaryFeature</span> <span class="n">imaginaryFeature</span><span class="o">;</span></div><div class="line" id="file-polymorphicapplication-java-LC4">  </div><div class="line" id="file-polymorphicapplication-java-LC5">  <span class="kd">public</span> <span class="nf">PolymorphicApplication</span><span class="o">()</span> <span class="o">{</span></div><div class="line" id="file-polymorphicapplication-java-LC6">    <span class="k">this</span><span class="o">.</span><span class="na">imaginaryFeature</span> <span class="o">=</span> <span class="n">createImaginaryFeature</span><span class="o">();</span></div><div class="line" id="file-polymorphicapplication-java-LC7">  <span class="o">}</span></div><div class="line" id="file-polymorphicapplication-java-LC8">&nbsp;</div><div class="line" id="file-polymorphicapplication-java-LC9">  <span class="kd">private</span> <span class="n">ImaginaryFeature</span> <span class="nf">createImaginaryFeature</span><span class="o">()</span> <span class="o">{</span></div><div class="line" id="file-polymorphicapplication-java-LC10">    <span class="kd">final</span> <span class="n">String</span> <span class="n">featureClass</span> <span class="o">=</span> <span class="n">System</span><span class="o">.</span><span class="na">getProperty</span><span class="o">(</span><span class="s">&quot;PolymorphicApplication.imaginaryFeature.class&quot;</span><span class="o">);</span></div><div class="line" id="file-polymorphicapplication-java-LC11">    <span class="k">try</span> <span class="o">{</span></div><div 
class="line" id="file-polymorphicapplication-java-LC12">      <span class="k">return</span> <span class="o">(</span><span class="n">ImaginaryFeature</span><span class="o">)</span> <span class="n">Class</span><span class="o">.</span><span class="na">forName</span><span class="o">(</span><span class="n">featureClass</span><span class="o">).</span><span class="na">newInstance</span><span class="o">();</span></div><div class="line" id="file-polymorphicapplication-java-LC13">    <span class="o">}</span> <span class="k">catch</span> <span class="o">(</span><span class="kd">final</span> <span class="n">Exception</span> <span class="n">e</span><span class="o">)</span> <span class="o">{</span></div><div class="line" id="file-polymorphicapplication-java-LC14">      <span class="k">throw</span> <span class="k">new</span> <span class="nf">IllegalStateException</span><span class="o">(</span><span class="s">&quot;Failed to create ImaginaryFeature of class &quot;</span> <span class="o">+</span> <span class="n">featureClass</span><span class="o">,</span> <span class="n">e</span><span class="o">);</span></div><div class="line" id="file-polymorphicapplication-java-LC15">    <span class="o">}</span></div><div class="line" id="file-polymorphicapplication-java-LC16">  <span class="o">}</span></div><div class="line" id="file-polymorphicapplication-java-LC17">&nbsp;</div><div class="line" id="file-polymorphicapplication-java-LC18">  <span class="kd">public</span> <span class="kt">void</span> <span class="nf">runApplication</span><span class="o">()</span> <span class="o">{</span></div><div class="line" id="file-polymorphicapplication-java-LC19">&nbsp;</div><div class="line" id="file-polymorphicapplication-java-LC20">    <span class="c1">// ...</span></div><div class="line" id="file-polymorphicapplication-java-LC21">    <span class="n">imaginaryFeature</span><span class="o">.</span><span class="na">executeFeature</span><span class="o">();</span></div><div class="line" 
id="file-polymorphicapplication-java-LC22">    <span class="c1">// ...</span></div><div class="line" id="file-polymorphicapplication-java-LC23">  <span class="o">}</span></div><div class="line" id="file-polymorphicapplication-java-LC24"><span class="o">}</span></div></pre>
        </td>
      </tr>
    </table>
  </div>

        </div>

        <div class="gist-meta">
          <a href="https://gist.github.com/eranharel/1060044/raw/78411f44f0932d0d445926a3f30e5acd6da164a7/PolymorphicApplication.java" style="float:right">view raw</a>
          <a href="https://gist.github.com/eranharel/1060044#file-polymorphicapplication-java" style="float:right; margin-right:10px; color:#666;">PolymorphicApplication.java</a>
        </div>
      </div>
</div>

<p>Here, we initialized the imaginary feature member by reflection, using a class name specified as a system property. The <em>createImaginaryFeature()</em> method above would usually be extracted into a factory, but it is kept inline here for brevity. We&#8217;re still not done, though. Most readers would probably say that introducing a factory and reflection makes the code less readable and less maintainable. I have to agree. On top of that, if the concrete implementations need dependencies of their own, the code gets even more complicated. Luckily, I have a secret weapon at my disposal: <a href="http://en.wikipedia.org/wiki/Inversion_of_control">IoC</a> (Inversion of Control, also known as DI, Dependency Injection). When using an IoC container such as <a href="http://www.springsource.org/">Spring</a> or <a href="http://code.google.com/p/google-guice/">Guice</a>, your code can be made extremely flexible, and implementing feature flags becomes a walk in the park.</p>
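<p>The factory mentioned above could look roughly like the following sketch. It is not code from the original gist: <em>ImaginaryFeatureFactory</em> and its fallback to a default class name are hypothetical. It uses <em>getDeclaredConstructor().newInstance()</em> instead of <em>Class.newInstance()</em>, which is deprecated on newer JDKs. The feature types are repeated so the example compiles on its own.</p>

```java
// Same contract and implementations as ImaginaryFeature.java above.
interface ImaginaryFeature {
  void executeFeature();
}

class OldImaginaryFeature implements ImaginaryFeature {
  @Override
  public void executeFeature() {
    System.out.println("old feature implementation");
  }
}

class NewImaginaryFeature implements ImaginaryFeature {
  @Override
  public void executeFeature() {
    System.out.println("new feature implementation");
  }
}

// Hypothetical factory extracted from PolymorphicApplication.createImaginaryFeature().
final class ImaginaryFeatureFactory {
  static final String PROPERTY = "PolymorphicApplication.imaginaryFeature.class";

  private ImaginaryFeatureFactory() {}

  // Instantiates the class named by the system property, or defaultClassName when
  // the property is unset (this fallback is an assumption; the original throws).
  static ImaginaryFeature create(String defaultClassName) {
    final String className = System.getProperty(PROPERTY, defaultClassName);
    try {
      return Class.forName(className)
          .asSubclass(ImaginaryFeature.class) // fails fast on unrelated classes
          .getDeclaredConstructor()
          .newInstance();
    } catch (ReflectiveOperationException | ClassCastException e) {
      throw new IllegalStateException("Failed to create ImaginaryFeature of class " + className, e);
    }
  }

  public static void main(String[] args) {
    // Override with -DPolymorphicApplication.imaginaryFeature.class=<fully qualified name>
    create(OldImaginaryFeature.class.getName()).executeFeature();
  }
}
```

<p>The <em>asSubclass</em> call turns a misconfigured class name into an <em>IllegalStateException</em> at creation time, rather than a <em>ClassCastException</em> somewhere later in the application.</p>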
<p>Below is a rewrite of <em>PolymorphicApplication</em> that receives its feature through constructor injection, followed by the Spring application context that wires it:</p>
<div id="gist1060044" class="gist">
      <div class="gist-file">
        <div class="gist-data gist-syntax">



  <div class="file-data">
    <table cellpadding="0" cellspacing="0" class="lines highlight">
      <tr>
        <td class="line-data">
          <pre class="line-pre"><div class="line" id="file-springpolymorphicapplication-java-LC1"><span class="kd">public</span> <span class="kd">class</span> <span class="nc">SpringPolymorphicApplication</span> <span class="o">{</span></div><div class="line" id="file-springpolymorphicapplication-java-LC2">&nbsp;</div><div class="line" id="file-springpolymorphicapplication-java-LC3">  <span class="kd">private</span> <span class="kd">final</span> <span class="n">ImaginaryFeature</span> <span class="n">imaginaryFeature</span><span class="o">;</span></div><div class="line" id="file-springpolymorphicapplication-java-LC4">&nbsp;</div><div class="line" id="file-springpolymorphicapplication-java-LC5">  <span class="kd">public</span> <span class="nf">SpringPolymorphicApplication</span><span class="o">(</span><span class="kd">final</span> <span class="n">ImaginaryFeature</span> <span class="n">imaginaryFeature</span><span class="o">)</span> <span class="o">{</span></div><div class="line" id="file-springpolymorphicapplication-java-LC6">    <span class="k">this</span><span class="o">.</span><span class="na">imaginaryFeature</span> <span class="o">=</span> <span class="n">imaginaryFeature</span><span class="o">;</span></div><div class="line" id="file-springpolymorphicapplication-java-LC7">  <span class="o">}</span></div><div class="line" id="file-springpolymorphicapplication-java-LC8">&nbsp;</div><div class="line" id="file-springpolymorphicapplication-java-LC9">  <span class="kd">public</span> <span class="kt">void</span> <span class="nf">runApplication</span><span class="o">()</span> <span class="o">{</span></div><div class="line" id="file-springpolymorphicapplication-java-LC10">&nbsp;</div><div class="line" id="file-springpolymorphicapplication-java-LC11">    <span class="c1">// ...</span></div><div class="line" id="file-springpolymorphicapplication-java-LC12">    <span class="n">imaginaryFeature</span><span class="o">.</span><span class="na">executeFeature</span><span 
class="o">();</span></div><div class="line" id="file-springpolymorphicapplication-java-LC13">    <span class="c1">// ...</span></div><div class="line" id="file-springpolymorphicapplication-java-LC14">  <span class="o">}</span></div><div class="line" id="file-springpolymorphicapplication-java-LC15"><span class="o">}</span></div></pre>
        </td>
      </tr>
    </table>
  </div>

        </div>

        <div class="gist-meta">
          <a href="https://gist.github.com/eranharel/1060044/raw/04a68fe0ef40e02cd4521aac371561d5cbf425d0/SpringPolymorphicApplication.java" style="float:right">view raw</a>
          <a href="https://gist.github.com/eranharel/1060044#file-springpolymorphicapplication-java" style="float:right; margin-right:10px; color:#666;">SpringPolymorphicApplication.java</a>
        </div>
      </div>
</div>

<div id="gist1060044" class="gist">
      <div class="gist-file">
        <div class="gist-data gist-syntax">



  <div class="file-data">
    <table cellpadding="0" cellspacing="0" class="lines highlight">
      <tr>
        <td class="line-data">
          <pre class="line-pre"><div class="line" id="file-applicationcontext-xml-LC1"><span class="cp">&lt;?xml version=&quot;1.0&quot; encoding=&quot;UTF-8&quot;?&gt;</span></div><div class="line" id="file-applicationcontext-xml-LC2"><span class="nt">&lt;beans</span> <span class="na">xmlns=</span><span class="s">&quot;http://www.springframework.org/schema/beans&quot;</span></div><div class="line" id="file-applicationcontext-xml-LC3">  <span class="na">xmlns:xsi=</span><span class="s">&quot;http://www.w3.org/2001/XMLSchema-instance&quot;</span></div><div class="line" id="file-applicationcontext-xml-LC4">  <span class="na">xsi:schemaLocation=</span><span class="s">&quot;http://www.springframework.org/schema/beans http://www.springframework.org/schema/beans/spring-beans-3.0.xsd&quot;</span><span class="nt">&gt;</span></div><div class="line" id="file-applicationcontext-xml-LC5">&nbsp;</div><div class="line" id="file-applicationcontext-xml-LC6">  <span class="nt">&lt;bean</span> <span class="na">class=</span><span class="s">&quot;org.springframework.beans.factory.config.PropertyPlaceholderConfigurer&quot;</span><span class="nt">&gt;</span></div><div class="line" id="file-applicationcontext-xml-LC7">    <span class="nt">&lt;property</span> <span class="na">name=</span><span class="s">&quot;systemPropertiesModeName&quot;</span> <span class="na">value=</span><span class="s">&quot;SYSTEM_PROPERTIES_MODE_OVERRIDE&quot;</span><span class="nt">/&gt;</span></div><div class="line" id="file-applicationcontext-xml-LC8">    <span class="nt">&lt;property</span> <span class="na">name=</span><span class="s">&quot;properties&quot;</span><span class="nt">&gt;</span></div><div class="line" id="file-applicationcontext-xml-LC9">      <span class="nt">&lt;map&gt;</span></div><div class="line" id="file-applicationcontext-xml-LC10">        <span class="nt">&lt;entry</span> <span class="na">key=</span><span class="s">&quot;imaginaryFeature.implementation.bean&quot;</span> <span 
class="na">value=</span><span class="s">&quot;oldImaginaryFeature&quot;</span><span class="nt">/&gt;</span></div><div class="line" id="file-applicationcontext-xml-LC11">      <span class="nt">&lt;/map&gt;</span></div><div class="line" id="file-applicationcontext-xml-LC12">    <span class="nt">&lt;/property&gt;</span>  </div><div class="line" id="file-applicationcontext-xml-LC13">  <span class="nt">&lt;/bean&gt;</span></div><div class="line" id="file-applicationcontext-xml-LC14">&nbsp;</div><div class="line" id="file-applicationcontext-xml-LC15">  <span class="nt">&lt;bean</span> <span class="na">id=</span><span class="s">&quot;application&quot;</span> <span class="na">class=</span><span class="s">&quot;com.eranharel.SpringPolymorphicApplication&quot;</span><span class="nt">&gt;</span></div><div class="line" id="file-applicationcontext-xml-LC16">    <span class="nt">&lt;constructor-arg</span> <span class="na">ref=</span><span class="s">&quot;imaginaryFeature&quot;</span><span class="nt">/&gt;</span></div><div class="line" id="file-applicationcontext-xml-LC17">  <span class="nt">&lt;/bean&gt;</span></div><div class="line" id="file-applicationcontext-xml-LC18">  </div><div class="line" id="file-applicationcontext-xml-LC19">  <span class="nt">&lt;alias</span> <span class="na">name=</span><span class="s">&quot;${imaginaryFeature.implementation.bean}&quot;</span> <span class="na">alias=</span><span class="s">&quot;imaginaryFeature&quot;</span><span class="nt">/&gt;</span></div><div class="line" id="file-applicationcontext-xml-LC20">  </div><div class="line" id="file-applicationcontext-xml-LC21">  <span class="nt">&lt;bean</span> <span class="na">id=</span><span class="s">&quot;newImaginaryFeature&quot;</span> <span class="na">class=</span><span class="s">&quot;com.eranharel.NewImaginaryFeature&quot;</span> <span class="na">lazy-init=</span><span class="s">&quot;true&quot;</span><span class="nt">/&gt;</span></div><div class="line" id="file-applicationcontext-xml-LC22">  
</div><div class="line" id="file-applicationcontext-xml-LC23">  <span class="nt">&lt;bean</span> <span class="na">id=</span><span class="s">&quot;oldImaginaryFeature&quot;</span> <span class="na">class=</span><span class="s">&quot;com.eranharel.OldImaginaryFeature&quot;</span> <span class="na">lazy-init=</span><span class="s">&quot;true&quot;</span><span class="nt">/&gt;</span></div><div class="line" id="file-applicationcontext-xml-LC24"><span class="nt">&lt;/beans&gt;</span></div></pre>
        </td>
      </tr>
    </table>
  </div>

        </div>

        <div class="gist-meta">
          <a href="https://gist.github.com/eranharel/1060044/raw/d37c62234575171d20f339257ddae3e9a2637f4d/ApplicationContext.xml" style="float:right">view raw</a>
          <a href="https://gist.github.com/eranharel/1060044#file-applicationcontext-xml" style="float:right; margin-right:10px; color:#666;">ApplicationContext.xml</a>
        </div>
      </div>
</div>

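<p>The Spring configuration above defines an application bean and two imaginary feature implementations. By default, the application is initialized with <em>oldImaginaryFeature</em>, because the <em>PropertyPlaceholderConfigurer</em> resolves the <em>imaginaryFeature</em> alias to that bean. This can be overridden by passing <em>-DimaginaryFeature.implementation.bean=newImaginaryFeature</em> on the JVM command line; with <em>SYSTEM_PROPERTIES_MODE_OVERRIDE</em>, the system property takes precedence over the default defined in the configuration. Since both implementations are declared with <em>lazy-init="true"</em>, Spring initializes only the one the alias points to, and each implementation may have dependencies of its own.</p>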
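<p>To make the lazy, property-driven selection concrete without Spring on the classpath, here is a rough container-free analogue. It is only a sketch, not how Spring works internally. The hypothetical <em>FeatureRegistry</em> uses <em>Supplier</em>s in place of lazy-init beans and reads the same <em>imaginaryFeature.implementation.bean</em> system property in place of the placeholder-resolved alias, so only the selected implementation is ever constructed.</p>

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

interface ImaginaryFeature {
  void executeFeature();
}

class OldImaginaryFeature implements ImaginaryFeature {
  @Override
  public void executeFeature() {
    System.out.println("old feature implementation");
  }
}

class NewImaginaryFeature implements ImaginaryFeature {
  @Override
  public void executeFeature() {
    System.out.println("new feature implementation");
  }
}

// Hypothetical registry: each Supplier stands in for a lazy-init bean definition.
final class FeatureRegistry {
  static final String PROPERTY = "imaginaryFeature.implementation.bean";
  static final String DEFAULT_BEAN = "oldImaginaryFeature"; // default taken from the XML above

  private final Map<String, Supplier<? extends ImaginaryFeature>> beans = new HashMap<>();

  FeatureRegistry register(String name, Supplier<? extends ImaginaryFeature> supplier) {
    beans.put(name, supplier);
    return this;
  }

  // Resolves the "alias": a -D system property overrides the default bean name.
  // Only the chosen Supplier is invoked. Unlike a Spring singleton, every call
  // creates a new instance, so resolve once at startup.
  ImaginaryFeature resolve() {
    String name = System.getProperty(PROPERTY, DEFAULT_BEAN);
    Supplier<? extends ImaginaryFeature> supplier = beans.get(name);
    if (supplier == null) {
      throw new IllegalStateException("No ImaginaryFeature registered as " + name);
    }
    return supplier.get();
  }

  public static void main(String[] args) {
    new FeatureRegistry()
        .register("oldImaginaryFeature", OldImaginaryFeature::new)
        .register("newImaginaryFeature", NewImaginaryFeature::new)
        .resolve()
        .executeFeature();
  }
}
```

<p>What the IoC container adds on top of this is everything the sketch leaves out: singleton caching, injecting each implementation&#8217;s own dependencies, and keeping the selection in configuration rather than in code.</p>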
<p>The bottom line: with a bit of extra preparation and the right design decisions, feature flags shouldn&#8217;t be a burden on your code base. By extra preparation, I mean extracting interfaces for your domain objects, using an IoC container, etc., which is something we should be doing in most cases anyway.</p>
<p>&nbsp;</p>
<p><em>Eran Harel is a Senior Software Developer at Outbrain.</em></p>
<p>&nbsp;</p>
]]></content:encoded>
			<wfw:commentRss>http://techblog.outbrain.com/2011/07/feature-flags-made-easy/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Monitoring a Wild Beast</title>
		<link>http://techblog.outbrain.com/2011/05/monitoring-a-wild-beast/</link>
		<comments>http://techblog.outbrain.com/2011/05/monitoring-a-wild-beast/#comments</comments>
		<pubDate>Mon, 30 May 2011 14:45:59 +0000</pubDate>
		<dc:creator>Ori Lahav</dc:creator>
				<category><![CDATA[IT/Ops]]></category>

		<guid isPermaLink="false">http://techblog.outbrain.com/?p=57</guid>
		<description><![CDATA[by Marco Supino and Ori Lahav&#160; Yeah, I know: monitoring is a “must-have” tool for every web application and operation. If clients or partners depend on your system, you don’t want to hurt their business (or yours), so you need to react to issues in time. At Outbrain, we acknowledge that we [...]]]></description>
			<content:encoded><![CDATA[<div><p><em>by Marco Supino and Ori Lahav</em></p>
<p>Yeah, I know: monitoring is a “must-have” tool for every web application and operation. If clients or partners depend on your system, you don’t want to hurt their business (or yours), so you need to react to issues in time. At Outbrain, we acknowledge that we run on a technical system, and technical systems are bound to fail. What you need is to catch the failure early enough, understand its cause, react, and fix it. In DevOps terminology, these are measured as TTD (time to detect) and TTR (time to recover). To accomplish that, you need a good system that will tell the story and wake you up if something is wrong, long before it affects the business.</p>
<p>This is the main reason we invested heavily in a highly capable monitoring system. We practice Continuous Deployment, and a superb monitoring system is an integral part of the Immune System that lets us react fast to flaws in the continuous stream of system changes.</p>
<p><span id="more-57"></span></p>
<p><strong>Note:</strong> Most of the components we used to build it are open-source projects that we assembled for our use.</p>
<p>A monitoring system is usually worthless if no one is looking at it. The common practice is to have a NOC (network operations center) staffed 24/7. At Outbrain, we took another approach: instead of seating junior engineers in front of a screen around the clock and alerting senior engineers when something happens, we built a system that is smart enough to alert the senior engineers on shift directly and tell them the story. We assume the alert can reach them wherever they are (at the playground with the kids, at the supermarket, etc.) and that they can react to it if needed.</p>
<p>The only things an Ops engineer on shift needs next to him are his smartphone and his laptop. Most of the time, to understand the issue, he needs nothing more than the Gmail mailbox on his smartphone, which looks like this:<br />
<a href="http://techblog.outbrain.com/wp-content/uploads/2011/05/device.png"><img class="size-medium wp-image-58 aligncenter" style="border: 1px solid black;" title="device" src="http://techblog.outbrain.com/wp-content/uploads/2011/05/device-180x300.png" alt="" width="180" height="300" /></a></p>
<p>&nbsp;</p>
<p>Gmail labels are used to color-code alerts by type, making the best use of the limited screen space on the Android smartphone.</p>
<p>Nagios alerts are also sent via Jabber, to handle situations where mail is down. Each Nagios alert includes the details of the alert and, if it was triggered by a trend graph crossing a threshold, the graph itself, which shows the trend and tells the story.</p>
<p><a href="http://techblog.outbrain.com/wp-content/uploads/2011/05/Screen-shot-2011-05-30-at-12.17.16-PM1.png"><img class="size-full wp-image-62 aligncenter" style="border: 1px solid black;" title="Screen shot 2011-05-30 at 12.17.16 PM" src="http://techblog.outbrain.com/wp-content/uploads/2011/05/Screen-shot-2011-05-30-at-12.17.16-PM1.png" alt="" width="640" height="395" /></a></p>
<p>&nbsp;</p>
<p>And&#8230; yes, we use Nagios, one of the oldest and most robust monitoring systems. Its biggest advantage is that it doesn’t know how to do ANYTHING on its own. It simply says “script for me anything you want me to monitor”, and in doing so gives you full freedom to monitor anything you wish. We have done great things with Nagios, from monitoring the most basic machine vitals, through each and every JMX metric our services expose, to the bandwidth between our data centers.</p>
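<p>As an illustration of that “script it yourself” model: a Nagios check is just a program that prints one status line and exits with 0 (OK), 1 (WARNING), 2 (CRITICAL) or 3 (UNKNOWN). The text after a <em>|</em> in that line is treated as performance data for graphing. Below is a hypothetical Java sketch of a threshold check. It is not one of our actual checks. Where the value comes from (a JMX attribute, a log, a counter) is left out, and the value is passed on the command line instead. Nagios only cares about the output line and the exit code, not the plugin&#8217;s language.</p>

```java
import java.util.Locale;

// Hypothetical Nagios-style threshold check: one output line, exit code = status.
final class ThresholdCheck {
  enum Status {
    OK(0), WARNING(1), CRITICAL(2), UNKNOWN(3);

    final int exitCode;

    Status(int exitCode) {
      this.exitCode = exitCode;
    }
  }

  // Assumes higher values are worse (latency, queue length, error rate...).
  static Status evaluate(double value, double warn, double crit) {
    if (Double.isNaN(value)) {
      return Status.UNKNOWN;
    }
    if (value >= crit) {
      return Status.CRITICAL;
    }
    return value >= warn ? Status.WARNING : Status.OK;
  }

  // "LABEL STATUS - label=value | label=value;warn;crit" (perfdata after the pipe).
  static String format(String label, double value, double warn, double crit, Status status) {
    return String.format(Locale.ROOT, "%s %s - %s=%.2f | %s=%.2f;%.2f;%.2f",
        label.toUpperCase(Locale.ROOT), status, label, value, label, value, warn, crit);
  }

  // Usage: java ThresholdCheck <label> <value> <warn> <crit>
  public static void main(String[] args) {
    Status status;
    String output;
    try {
      double value = Double.parseDouble(args[1]);
      double warn = Double.parseDouble(args[2]);
      double crit = Double.parseDouble(args[3]);
      status = evaluate(value, warn, crit);
      output = format(args[0], value, warn, crit, status);
    } catch (RuntimeException e) { // missing or non-numeric arguments
      status = Status.UNKNOWN;
      output = "CHECK UNKNOWN - usage: ThresholdCheck <label> <value> <warn> <crit>";
    }
    System.out.println(output);
    System.exit(status.exitCode);
  }
}
```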
<p>One thing Nagios does not do well enough is provide an aggregated view of the system state. For that, we set up a “Nagios dashboard” application that puts the overall state in front of the engineer looking at it.</p>
<p>Our dashboard is based on <a href="https://github.com/saz/Naglite3">NagLite3</a>, with some local modifications to handle certain situations differently and to support our naming scheme.<br />
<a href="http://techblog.outbrain.com/wp-content/uploads/2011/05/Screen-shot-2011-05-30-at-1.44.22-PM.png"><img class="size-full wp-image-63 aligncenter" style="border: 1px solid black;" title="Screen shot 2011-05-30 at 1.44.22 PM" src="http://techblog.outbrain.com/wp-content/uploads/2011/05/Screen-shot-2011-05-30-at-1.44.22-PM.png" alt="" width="640" height="332" /></a></p>
</div>
<div></div>
<div>A dashboard is fine, but sometimes the engineer is not in front of a computer and has only a smartphone. How about chatting with the monitoring system and asking it “wassup?”</div>
<div>
We created a custom Jabber agent based on the Net::Jabber::Bot Perl module, which uses the Nagios::Status module to gather information from the running Nagios instance.<br />
The commands we support include “wassup”, “ack”, “resched” and “dis/ena” (disable/enable).<br />
Alerts sent from Nagios have a unique ID added to the subject, for example:</div>
<div>
<strong>Subject: 	mysql17.ladc1/Mysql Health Slave Check CRITICAL [ID#:367768]</strong></div>
<div>The event ID can be used in the Jabber interface to acknowledge or reschedule that particular event.<br />
The great part of using Jabber is that it&#8217;s available on our (Android-based) smartphones, so it&#8217;s easy to communicate with Nagios from anywhere Internet access is available.</div>
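<p>The agent itself is written in Perl, but the two parsing steps it relies on are easy to sketch. The Java below is a hypothetical illustration, not the agent&#8217;s code. It extracts the event ID from an alert subject in the <em>[ID#:367768]</em> format shown above. It also parses chat messages under an assumed <em>&lt;command&gt; &lt;eventId&gt;</em> syntax, where “dis” and “ena” are treated as separate commands and only “wassup” is allowed without an ID.</p>

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Optional;
import java.util.OptionalLong;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class AlertCommands {
  // The unique event ID appended to alert subjects, e.g. "[ID#:367768]".
  private static final Pattern EVENT_ID = Pattern.compile("\\[ID#:(\\d{1,18})\\]");

  // Command names from the post; the argument syntax is an assumption.
  private static final Set<String> COMMANDS =
      new HashSet<>(Arrays.asList("wassup", "ack", "resched", "dis", "ena"));

  static final class Command {
    final String name;
    final OptionalLong eventId;

    Command(String name, OptionalLong eventId) {
      this.name = name;
      this.eventId = eventId;
    }
  }

  static OptionalLong eventIdFromSubject(String subject) {
    Matcher m = EVENT_ID.matcher(subject);
    return m.find() ? OptionalLong.of(Long.parseLong(m.group(1))) : OptionalLong.empty();
  }

  // "wassup" takes no argument; every other command targets a single event ID.
  static Optional<Command> parse(String message) {
    String[] parts = message.trim().toLowerCase(Locale.ROOT).split("\\s+");
    String name = parts[0];
    if (!COMMANDS.contains(name)) {
      return Optional.empty();
    }
    if (name.equals("wassup")) {
      return Optional.of(new Command(name, OptionalLong.empty()));
    }
    if (parts.length != 2 || !parts[1].matches("\\d{1,18}")) {
      return Optional.empty();
    }
    return Optional.of(new Command(name, OptionalLong.of(Long.parseLong(parts[1]))));
  }

  public static void main(String[] args) {
    String subject = "mysql17.ladc1/Mysql Health Slave Check CRITICAL [ID#:367768]";
    long id = eventIdFromSubject(subject).getAsLong();
    parse("ack " + id).ifPresent(c -> System.out.println(c.name + " -> event " + c.eventId.getAsLong()));
  }
}
```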
<div>
<p>&nbsp;</p>
<p>As I said above, working with Continuous Deployment raises many challenges around faults: investigating them, identifying the root cause and fixing it. At Outbrain, every engineer can change the production environment at any time. Deployments are usually very small and fast, and the new functionality they introduce is well isolated and described in the SVN commit message. The regular process is for the developer to commit the code with the proper message and the flags inside it. TeamCity then builds it and runs all the tests. If the tests pass, a GLU feeder catches the SVN commit hook and starts the deployment process.</p>
<p>As part of the deployment, it calls two APIs. The first is Yammer, where the deployment is logged.</p>
<p><img class="aligncenter" style="border: 1px solid black;" src="https://lh4.googleusercontent.com/zjA-4SsEyy3rCYVjvW7ZR_QEuRwad2K-i78mdVLUc6kUJM5XFe2me3kRZ9NZgkKyPIrttKD_DV7ReaNymTMC4P5vjfgsrgfMVsRJkLxR7Mh229WBGQ" alt="" width="678" height="88" /></p>
<p>The second is Nagios, where the deployment is registered; from then on, it is shown as a vertical line on each of the Nagios graphs. So if a problem becomes visible in the Nagios graphs, we can attribute it to a deployment that happened close to it in time.<br />
The graphs expose two pieces of data: the revision number, which can be searched in Yammer, and the name of the engineer responsible for it, so we know whom to ask. <strong>Note: </strong>this functionality was inspired by Etsy engineering&#8217;s <a href="http://codeascraft.etsy.com/2010/12/08/track-every-release/">Track Every Release</a>.<br />
<img class="aligncenter" style="border: 1px solid black;" src="https://lh5.googleusercontent.com/03pHgTOOrxIt2Ef_FTWi5PYhKuo4HwwBdlM7kifdeElf_qmn4DP5XVuRf95v5yN0CZN8rbUnd-n9QG80q48q9uIA3_n83wljPyi_hLke946ipUem-A" alt="" width="637" height="219" /></p>
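<p>To make &#8220;attribute it to the deployment in proximity&#8221; concrete, here is a minimal Python sketch of the two lookups a graph needs: which deployments fall inside the plotted window (the vertical lines), and which recent deployment is the likeliest suspect for an anomaly. The <code>Deployment</code> record mirrors the two pieces of data on each line (revision and engineer); the 15-minute proximity window and all names are assumptions for illustration, not how NagiosGraph stores these marks.</p>

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Deployment:
    timestamp: int   # Unix time the deployment finished
    revision: int    # SVN revision, searchable in Yammer
    engineer: str    # who to ask about it


def deployments_in_window(deployments, start, end):
    """Deployments to draw as vertical lines on a graph covering [start, end]."""
    return [d for d in deployments if start <= d.timestamp <= end]


def suspect_deployment(deployments, anomaly_ts, max_gap=15 * 60):
    """Most recent deployment at or before the anomaly, if it is within max_gap seconds."""
    earlier = [d for d in deployments if 0 <= anomaly_ts - d.timestamp <= max_gap]
    return max(earlier, key=lambda d: d.timestamp, default=None)
```

<p>Returning <code>None</code> when nothing deployed recently is as useful as a match: it tells you to look elsewhere for the cause.</p>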
<p>&nbsp;</p>
<p>GLU deployments also disable and re-enable Nagios notifications for the host currently being deployed to, using <a href="http://exchange.nagios.org/directory/Addons/Passive-Checks/NRDP--2D-Nagios-Remote-Data-Processor/details">NRDP</a> (again, heavily modified to match our requirements). This avoids false alerts while nodes are restarted during a version roll-out.</p>
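<p>One way to guarantee that notifications come back on after a roll-out, even a failed one, is to wrap the deployment step in a context manager. This Python sketch assumes a hypothetical <code>submit</code> callable that delivers one Nagios external command (over NRDP or by writing to the command file); DISABLE_HOST_NOTIFICATIONS, DISABLE_HOST_SVC_NOTIFICATIONS and their ENABLE counterparts are standard Nagios external commands, but how our modified NRDP sends them is not shown here.</p>

```python
import time
from contextlib import contextmanager


@contextmanager
def notifications_suppressed(host, submit, now=time.time):
    """Mute Nagios notifications for `host` (and its services) during a deployment.

    `submit` is a hypothetical callable that delivers one external command line,
    e.g. over NRDP or by writing to the Nagios command file.
    """
    submit(f"[{int(now())}] DISABLE_HOST_NOTIFICATIONS;{host}")
    submit(f"[{int(now())}] DISABLE_HOST_SVC_NOTIFICATIONS;{host}")
    try:
        yield
    finally:
        # Re-enable even if the deployment fails, so no host is left muted.
        submit(f"[{int(now())}] ENABLE_HOST_SVC_NOTIFICATIONS;{host}")
        submit(f"[{int(now())}] ENABLE_HOST_NOTIFICATIONS;{host}")
```

<p>Usage would look like <code>with notifications_suppressed(host, submit): restart_service()</code>, where <code>restart_service</code> stands for whatever the deployment step does.</p>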
<p>Another thing that sometimes helps to analyze issues, or at least to distinguish between a true and a false alarm, is to see what the values were at the same time last week (WoW, Week over Week).<br />
In some of our Nagios graphs (where relevant), a red line shows how the metric behaved at the same time last week.</p>
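<p>The Week over Week comparison boils down to looking up the sample exactly 7 &#215; 24 &#215; 3600 seconds earlier. A minimal Python sketch, assuming samples are stored in a dict keyed by timestamps aligned to a 5-minute step (an assumption; NagiosGraph itself reads these values from RRD files):</p>

```python
WEEK = 7 * 24 * 3600  # seconds in one week


def week_over_week(series, ts, step=300):
    """Return (current, last_week, ratio) for a metric sampled every `step` seconds.

    `series` maps step-aligned timestamps to values. The ratio is None when
    either value is missing or last week's value is zero.
    """
    aligned = ts - ts % step
    current = series.get(aligned)
    previous = series.get(aligned - WEEK)
    ratio = current / previous if current is not None and previous else None
    return current, previous, ratio
```

<p>A ratio close to 1 at an odd-looking hour usually means &#8220;this is just what Tuesdays look like&#8221;, which is exactly the false-alarm filter described above.</p>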
<p><img class="aligncenter" style="border: 1px solid black;" src="https://lh6.googleusercontent.com/eFBtjhBuIl6LoMP33H5tu_TQcTVdxRYjnzeQFWeByyqOgv1mQWkxefhpLscE4saFr6-pbQjvPUq4qOObxz6OSfIFSA8RfPPWCdwebR2P7nBSqOi6yQ" alt="" width="631" height="220" /></p>
<p>&nbsp;</p>
<p>Scaling Nagios:<br />
Our current Nagios implementation handles around 8.5k services and 500 hosts (physical/logical), and can grow much larger.</p>
<p>We run it across 3 DCs in the US, in distributed mode. A Nagios node in each DC checks the services in its own “domain” (a DNS domain, in our case) and sends the results to a central Nagios, which is responsible for sending alerts and creating graphs. If the “remote” Nagios nodes can’t reach the central Nagios, they enable notifications and start sending alerts on their own until the central node comes back. Some cross-DC checks are employed, but we keep them to sanity checks rather than regular checks.<br />
If a “remote” Nagios is not sending its results on time, the central Nagios starts polling that node&#8217;s services itself, using Nagios’s “freshness checking” option.<br />
<a href="http://nagios.sourceforge.net/docs/3_0/images/distributed.png">This</a> image from the Nagios site shows the general architecture, but because of limits in the NSCA daemon we used <a href="http://code.google.com/p/nrd/">NRD</a> instead, which has some great features and works much better than NSCA.</p>
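<p>The two failover rules above can be sketched as a pair of small decisions. This Python sketch is illustrative only: the real behavior comes from Nagios's own freshness checking and our node configuration, and the function names, the per-service timestamp dict and the <code>grace</code> period are assumptions.</p>

```python
def stale_services(last_result, now, threshold):
    """Services whose latest passive result is older than `threshold` seconds.

    This mirrors the idea behind Nagios freshness checking: the central node
    actively checks any service a remote node has stopped reporting on time.
    """
    return sorted(svc for svc, ts in last_result.items() if now - ts > threshold)


def remote_should_alert(last_contact_with_central, now, grace):
    """A remote node starts alerting on its own once the central node has been
    unreachable for more than `grace` seconds (an illustrative policy)."""
    return now - last_contact_with_central > grace
```

<p>The two rules cover opposite failures: freshness handles a silent remote node, while the remote takeover handles a silent central node.</p>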
<p>&nbsp;</p>
<p>Some more tricks to make Nagios faster:<br />
1. Keep Nagios&#8217;s entire /etc directory on a RAM disk (take measures to be able to restore it if the machine crashes).<br />
2. Keep the Nagios status file (status.dat) on a RAM disk.<br />
NOTE: tmpfs can be swapped out, so we chose the Linux RAM disk and increased its size to 128 MB to hold what we need.<br />
3. Use <a href="http://nagios.sourceforge.net/docs/3_0/embeddedperl.html">Nagios Embedded Perl</a> wherever possible, and work to make it possible elsewhere; the more, the better (assuming Perl is your favorite Nagios plugin language).<br />
4. Use multiple Nagios instances, on different machines, as close to the monitored services as possible.<br />
** With these in place, the average service check latency is &lt;0.5 sec on all our Nagios nodes.</p>
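<p>The latency figure can be checked straight from status.dat, which Nagios 3 writes as blocks such as <code>servicestatus {</code> containing <code>key=value</code> lines, including <code>check_latency</code>. A minimal Python sketch that averages it over service blocks only; treat the exact field layout as an assumption to verify against your own status.dat.</p>

```python
def average_check_latency(status_text):
    """Average `check_latency` over all servicestatus blocks in a Nagios status.dat."""
    latencies = []
    in_service = False
    for raw in status_text.splitlines():
        line = raw.strip()
        if line == "servicestatus {":
            in_service = True
        elif line == "}":
            in_service = False
        elif in_service and line.startswith("check_latency="):
            latencies.append(float(line.split("=", 1)[1]))
    # Host blocks are skipped, so host check latency does not skew the result.
    return sum(latencies) / len(latencies) if latencies else 0.0
```

<p>Since status.dat lives on the RAM disk, reading it this often is cheap.</p>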
<p>Graphs are based on NagiosGraph, again with many custom modifications, such as the Week over Week line and the deployment markers.<br />
NagiosGraph is based on RRD and several Perl CGIs that run and display it; we update ~8.5k graphs in every 5-minute cycle.<br />
RRD files can also be used by other applications. For example, we use them to build a network weather map, based on <a href="http://www.network-weathermap.com/">PHP WeatherMap</a>, that creates a viewable image of the links between our DCs, showing the network load and latency between them, with graphs and WoW info.</p>
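<p>Updating ~8.5k graphs every 5 minutes works out to roughly 8,500 / 300 &#8776; 28 RRD updates per second. What keeps that manageable is RRD's round-robin design: each archive has a fixed number of slots, and new samples overwrite the oldest, so files never grow. The toy Python class below only illustrates that idea; it is not the rrdtool file format or API.</p>

```python
class RoundRobinArchive:
    """Fixed-size, RRD-style store: each step-aligned timestamp maps to one slot,
    and a new sample overwrites whatever older sample held that slot."""

    def __init__(self, rows, step=300):
        self.rows = rows
        self.step = step
        self.slots = [None] * rows  # each slot holds (aligned_ts, value) or None

    def _index(self, aligned_ts):
        return (aligned_ts // self.step) % self.rows

    def update(self, ts, value):
        aligned = ts - ts % self.step
        self.slots[self._index(aligned)] = (aligned, value)

    def fetch(self, ts):
        """Value for the step containing ts, or None if it was never stored or overwritten."""
        aligned = ts - ts % self.step
        entry = self.slots[self._index(aligned)]
        return entry[1] if entry and entry[0] == aligned else None
```

<p>Storing the aligned timestamp next to each value lets <code>fetch</code> tell a current sample from a stale one left in the same slot.</p>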
<p><img class="aligncenter" style="border: 1px solid black;" src="https://lh3.googleusercontent.com/lh8-27eEJzAZ-QkZQ1-HfpOpByqqXprJPtm0CKyAFMvB6C--36cye80fwgEjIcS0EBnOlnTplbXqBlcjbVs1mF2CPY-QoBut5R25NuJ2pufuEsz6RQ" alt="" width="560" height="343" /></p>
<p>Summary:<br />
These are just a few examples of the monitoring improvements we have added to the system to reach a level of comfort where we catch problems before our customers and partners do, and can react fast to solve them.<br />
If you have questions, suggestions or comments, please do not hesitate to write a comment below.</p>
<p>Acknowledgements/Applications used:<br />
<a href="http://www.nagios.org/">Nagios</a> &#8211; The Industry Standard In IT Infrastructure Monitoring<br />
<a href="http://nagiosgraph.sourceforge.net/">NagiosGraph</a> &#8211; Data collection and graphing for Nagios<br />
Nagios Notifications &#8211; based on <a href="http://nagios.frank4dd.com/howto/nagios-flexible-notifications.htm">Frank4dd.com</a> scripts, with changes.<br />
<a href="http://code.google.com/p/nrd/">NRD</a> &#8211; NSCA Replcament.<br />
<a href="http://exchange.nagios.org/directory/Addons/Passive-Checks/NRDP--2D-Nagios-Remote-Data-Processor/details">NRDP</a> &#8211; Nagios-Remote-Data-Processor<br />
<a href="http://www.network-weathermap.com/">PHP WeatherMap</a> &#8211; Network WeatherMap<br />
<a href="https://github.com/saz/Naglite3">NagLite3</a> &#8211; Nagios status monitor for a NOC or operations room.<br />
<a href="https://www.yammer.com/login">Yammer</a> &#8211; The Enterprise Social Network<br />
<a href="http://en.wikipedia.org/wiki/Extensible_Messaging_and_Presence_Protocol">Jabber</a><br />
<a href="https://github.com/linkedin/glu">GLU</a> &#8211; Deployment and monitoring automation platform &#8211; by LinkedIn<br />
and many more great OSS projects&#8230;</p>
</div>
]]></content:encoded>
			<wfw:commentRss>http://techblog.outbrain.com/2011/05/monitoring-a-wild-beast/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Under the Hood of Our Algorithmic Engine &#8211; How We Serve Content Recommendations</title>
		<link>http://techblog.outbrain.com/2011/04/under-the-hood-of-our-algorithmic-engine-how-we-serve-content-recommendations/</link>
		<comments>http://techblog.outbrain.com/2011/04/under-the-hood-of-our-algorithmic-engine-how-we-serve-content-recommendations/#comments</comments>
		<pubDate>Sun, 17 Apr 2011 08:59:13 +0000</pubDate>
		<dc:creator>Shlomy Boshy</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://techblog.outbrain.com/?p=24</guid>
		<description><![CDATA[Let me tell you a little about how we actually give content recommendations here at Outbrain. This will be only a short introduction. We might elaborate on some of the below issues in future posts. Our main goal is to serve good content recommendations to readers on the Internet. The typical situation is a user reading [...]]]></description>
			<content:encoded><![CDATA[<div style="margin: 5pt 1em 1em 5pt; float: left;"><a href="http://techblog.outbrain.com/wp-content/uploads/2011/04/IMG_00222.jpg"><img class="alignnone size-medium wp-image-42" title="Outbrain Algorithms Team" src="http://techblog.outbrain.com/wp-content/uploads/2011/04/IMG_00222-300x225.jpg" alt="Outbrain Algorithms Team" width="300" height="225" /></a></div>
<p>Let me tell you a little about how we actually give content recommendations here at Outbrain. This is only a short introduction; we might elaborate on some of the issues below in future posts.</p>
<p>Our main goal is to serve good content recommendations to readers on the Internet. The typical situation is a user reading a content page, and we want to recommend content for further reading that makes a &#8220;good&#8221; recommendation.</p>
<p><span id="more-24"></span></p>
<p>What is a good recommendation? We believe a good recommendation is one that is interesting to the user and is both timely and relevant. The user should not only want to click the recommendation title or image we put on the recommendations widget, but also like the page he/she sees after the click, and even want to continue exploring more content on the recommended site. All recommendations should give the user a good experience, so he/she becomes familiar with our widget and knows that it gives good recommendations. With this goal in mind, we do not use any &#8220;click traps&#8221;: we want a long-term relationship with the user, not a single click.</p>
<p>Note that the target here is to serve content that is interesting to the user, not content that is relevant to what the user is already reading. Relevancy becomes just one of the methods to get interesting recommendations, not the target.</p>
<p>In order to serve good recommendations, we run a large set of algorithms in parallel to get a set of candidate recommendations. Then we use machine learning techniques to decide which of them to serve. That is, we try to learn what a user or a group of users (the simplest group is the readers of a specific site, but groups can be more complex) likes to read, and serve it more often.</p>
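<p>A very simplified Python illustration of the &#8220;run many algorithms, then learn which to serve&#8221; step: candidates from several algorithms are ranked by the smoothed click-through rate that their algorithm has earned with this site's readers. This is a swapped-in, deliberately basic technique (CTR with a prior), not Outbrain's actual learning method; the prior values and names are assumptions.</p>

```python
def rank_candidates(candidates, stats, prior_ctr=0.01, prior_weight=100):
    """Order candidate recommendations by their algorithm's smoothed click-through rate.

    candidates: list of (doc_id, algorithm) pairs produced by algorithms run in parallel.
    stats: {algorithm: (clicks, impressions)} observed for this site or user group.
    The prior keeps a new algorithm from being judged on a handful of impressions;
    both prior values are illustrative assumptions.
    """
    def smoothed_ctr(algorithm):
        clicks, impressions = stats.get(algorithm, (0, 0))
        return (clicks + prior_ctr * prior_weight) / (impressions + prior_weight)

    seen = set()
    ranked = []
    # sorted() is stable, so equally scored candidates keep their original order.
    for doc_id, algorithm in sorted(candidates, key=lambda c: smoothed_ctr(c[1]), reverse=True):
        if doc_id not in seen:  # the same document may be proposed by several algorithms
            seen.add(doc_id)
            ranked.append(doc_id)
    return ranked
```

<p>Because the statistics are kept per site (or per user group), the same candidates can be ordered differently on different sites, matching the observation that different algorithms win in different places.</p>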
<p>&nbsp;</p>
<p>We can divide the algorithmic methods we use into <strong>contextual</strong> algorithms, <strong>behavioral</strong> algorithms and <strong>personal</strong> algorithms:</p>
<p><strong>Contextual algorithms</strong> analyze the content the user is reading now and find related content, which can be interesting to the user. For the search we use the <a href="http://lucene.apache.org/solr/">Solr</a> search engine with some enhancements we made here, and we can also classify content into categories and match on categories instead of searching.</p>
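<p>Solr does the real work here, but the core idea of contextual matching can be shown with a plain bag-of-words cosine similarity. This Python sketch is a swapped-in, much cruder technique than Solr's scoring, and the tokenizer and names are assumptions made for illustration.</p>

```python
import math
import re
from collections import Counter


def term_vector(text):
    """Lower-cased word counts; a crude stand-in for a search engine's analyzer."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def most_similar(current_text, corpus, k=3):
    """Return up to k document IDs from corpus {doc_id: text}, most similar first."""
    query = term_vector(current_text)
    scores = {doc_id: cosine(query, term_vector(text)) for doc_id, text in corpus.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]
```

<p>Category matching, the alternative mentioned above, replaces the word vectors with category labels but keeps the same &#8220;score and sort&#8221; shape.</p>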
<p>&nbsp;</p>
<p><strong>Behavioral algorithms</strong> learn statistical behaviors of groups of users. The simplest algorithms can return the most visited documents on a site, the most highly rated documents, the ones with the most social sharing events, and so on. More complex algorithms can apply collaborative filtering methods to find other content that people who &#8220;liked&#8221; this content also liked.</p>
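<p>The &#8220;people who liked this also liked&#8221; idea can be sketched as item-to-item co-occurrence counting, one of the simplest collaborative filtering approaches. This Python sketch is illustrative, not our production algorithm; it assumes each user's reading history is available as a set of document IDs.</p>

```python
from collections import Counter, defaultdict
from itertools import permutations


def build_cooccurrence(histories):
    """Count how often each pair of documents was read by the same user.

    histories: iterable of sets of document IDs, one set per user.
    """
    co = defaultdict(Counter)
    for docs in histories:
        for a, b in permutations(docs, 2):
            co[a][b] += 1
    return co


def also_liked(co, doc_id, k=3):
    """Documents most often co-read with doc_id; ties are broken by document ID."""
    neighbours = co.get(doc_id, Counter())
    ranked = sorted(neighbours.items(), key=lambda item: (-item[1], item[0]))
    return [other for other, _ in ranked[:k]]
```

<p>Counting pairs like this is the kind of work that fits the offline processing described below, with only the top results cached for serving.</p>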
<p>From our experience, behavioral algorithms perform differently from contextual algorithms, and the best performance comes from giving a few recommendations of each type. On different sites, different algorithms give different results.<br />
<strong>Personal algorithms</strong> are a specific subtype of behavioral algorithms that learn the properties and history of a user, or a group of users, and give recommendations that will be interesting specifically to them. Note that personalization brings with it both privacy issues and scalability issues. For example, we need to decide where to save the user data for fast access. We currently use cookies for this purpose (which are limited in size) but are considering server-side storage in the future. Users can opt out of having data about their actions saved.</p>
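<p>&#8220;A few recommendations from each type&#8221; can be as simple as filling the widget's slots round-robin from each algorithm type's ranked list, skipping duplicates. The round-robin policy and the slot count in this Python sketch are assumptions for illustration; the post does not say how the mix is actually chosen.</p>

```python
from itertools import zip_longest


def interleave(*ranked_lists, slots=4):
    """Fill `slots` positions by taking one item from each algorithm type in turn,
    skipping duplicates and exhausted lists."""
    result = []
    for row in zip_longest(*ranked_lists):
        for item in row:
            if item is not None and item not in result:
                result.append(item)
                if len(result) == slots:
                    return result
    return result
```

<p>A learned mix would weight the types per site instead of alternating evenly, but the deduplication step stays the same.</p>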
<p>&nbsp;</p>
<p>Scalability is an issue in recommendation serving. We serve recommendations in about 30 milliseconds on average. To achieve this fast serving time, we do most of the processing offline and save the results in the memory caching tool <a href="http://memcached.org/">Memcached</a>. We use key-value databases (like <a href="http://cassandra.apache.org/">Cassandra</a>) on top of traditional relational databases (<a href="http://www.mysql.com/">MySQL</a>) to get good response times when fetching prefetched offline answers to queries (for example, the data needed to calculate recommendations for documents).</p>
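<p>The serving path described above follows the cache-aside pattern: try the in-memory cache first, fall back to the key-value store on a miss, and warm the cache with the answer. In this Python sketch, plain dicts stand in for Memcached and Cassandra; it shows the pattern only and is not our client code.</p>

```python
class RecommendationStore:
    """Cache-aside lookup of offline-precomputed recommendations.

    `cache` and `kv_store` are any dict-like objects; here they stand in for
    Memcached and Cassandra respectively.
    """

    def __init__(self, cache, kv_store):
        self.cache = cache
        self.kv_store = kv_store
        self.misses = 0

    def get(self, doc_id):
        value = self.cache.get(doc_id)
        if value is None:
            self.misses += 1
            value = self.kv_store.get(doc_id)  # precomputed offline
            if value is not None:
                self.cache[doc_id] = value     # warm the cache for the next request
        return value
```

<p>Keeping the heavy computation offline is what makes a ~30 ms budget realistic: the request path is at most two lookups.</p>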
<p>Time relevancy is a big issue: how do you decide whether a document is still relevant? Some documents are always good (&#8220;evergreen,&#8221; as we call them), but many age very fast. An article on an upcoming sports event ages once the event happens. Stock market status reports become irrelevant very fast. We have some behavioral methods that try to detect when users like these recommendations less over time, and then we stop serving them. Still, some titles make people click over and over even when the content is no longer relevant. Identifying relevancy is an interesting challenge.<br />
We are totally measurable. We use <a href="http://hive.apache.org/">Hive/Hadoop</a> to create statistics about various aspects of the system. For example, we know how well each algorithm performed in any environment (e.g. data center) for any source, every hour, so we can always monitor the logical performance of our algorithms and make intelligent decisions. We use the historical data for research. We even have learning algorithms that analyze current algorithmic performance and serve the best-performing recommendations for a page or site more often on that page or site.</p>
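<p>For the time-relevancy problem, one simple behavioral signal is a document's hourly click-through rate compared with its own peak: evergreen content holds steady, while a stale stock report fades. This Python sketch uses that rule with made-up thresholds; it is an assumption for illustration, not one of our production methods.</p>

```python
def still_relevant(hourly_ctr, min_fraction=0.3, recent_hours=3):
    """Decide whether to keep serving a document based on its CTR history.

    hourly_ctr: click-through rates per hour, oldest first. The document is
    dropped once its recent average falls below `min_fraction` of its peak
    hourly CTR. Both thresholds are illustrative assumptions.
    """
    if len(hourly_ctr) < recent_hours:
        return True  # not enough data yet to judge
    peak = max(hourly_ctr)
    recent = sum(hourly_ctr[-recent_hours:]) / recent_hours
    return recent >= min_fraction * peak
```

<p>As noted above, CTR alone cannot catch a title that keeps attracting clicks after its content has gone stale, so a rule like this is only one input.</p>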
<p>&nbsp;</p>
<p>Development is done mostly in Java. We develop in a truly Agile way, and fast. We use continuous deployment and have a staging environment to which we can deploy new algorithms and ideas very quickly. This means we can see how a new algorithm performs on real production data (a small fraction of it) very shortly after it was developed. We can run A/B tests on algorithm properties and decide which value works best for each parameter.</p>
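<p>A/B tests on algorithm parameters need each user to land in the same variant on every request. A common way to do that is to hash the user ID together with the experiment name; this Python sketch shows the technique, with the names and the choice of SHA-256 as assumptions rather than a description of our setup.</p>

```python
import hashlib


def ab_bucket(user_id, experiment, variants):
    """Deterministically assign a user to one variant of an experiment.

    Hashing (experiment, user_id) keeps each user in the same bucket across
    requests and gives independent splits for different experiments.
    """
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).digest()
    return variants[int.from_bytes(digest[:8], "big") % len(variants)]
```

<p>Including the experiment name in the hash matters: otherwise the same users would always end up in the &#8220;A&#8221; group of every test.</p>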
<p>&nbsp;</p>
<p><em>Shlomy Boshy is Outbrain&#8217;s Algorithms Team Leader</em></p>
]]></content:encoded>
			<wfw:commentRss>http://techblog.outbrain.com/2011/04/under-the-hood-of-our-algorithmic-engine-how-we-serve-content-recommendations/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Slideshow &#8211; Continuous Deployment in Outbrain</title>
		<link>http://techblog.outbrain.com/2011/04/slideshow-continuous-deployment-in-outbrain/</link>
		<comments>http://techblog.outbrain.com/2011/04/slideshow-continuous-deployment-in-outbrain/#comments</comments>
		<pubDate>Thu, 14 Apr 2011 20:27:52 +0000</pubDate>
		<dc:creator>Ori Lahav</dc:creator>
				<category><![CDATA[Dev Methods]]></category>

		<guid isPermaLink="false">http://techblog.outbrain.com/?p=19</guid>
		<description><![CDATA[Itai, our head of R&#38;D, gave a presentation this week about Continuous Deployment and how we actually do it. Here it is: Itai Hochman &#8211; Continuous Deployment in Outbrain &#8211; AgileIL11&#160; View more presentations from AgileSparks Enjoy!!!]]></description>
			<content:encoded><![CDATA[<p>Itai, our head of R&amp;D gave a presentation this week about Continuous Deployment and how we actually do it.<br />
Here is is:</p>
<div id="__ss_7598761" style="width: 425px;"><strong style="display: block; margin: 12px 0 4px;"><a title="Itai Hochman - Continuous Deployment in Outbrain - AgileIL11" href="http://www.slideshare.net/AgileSparks/itai-hochman-continues-deployment-in-outbrain-agile2011">Itai Hochman &#8211; Continuous Deployment in Outbrain &#8211; AgileIL11</a></strong>
</div>
<p>Enjoy!!!</p>
]]></content:encoded>
			<wfw:commentRss>http://techblog.outbrain.com/2011/04/slideshow-continuous-deployment-in-outbrain/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>LEGO Bricks &#8211; Our Data Center Architecture</title>
		<link>http://techblog.outbrain.com/2011/04/lego-bricks-our-data-center-architechture/</link>
		<comments>http://techblog.outbrain.com/2011/04/lego-bricks-our-data-center-architechture/#comments</comments>
		<pubDate>Tue, 12 Apr 2011 20:59:55 +0000</pubDate>
		<dc:creator>Ori Lahav</dc:creator>
				<category><![CDATA[IT/Ops]]></category>

		<guid isPermaLink="false">http://techblog.outbrain.com/?p=10</guid>
		<description><![CDATA[Some of you might ask, &#8220;why is he telling us about datacenter architecture? Don&#8217;t Cloud Services solve this already?&#8221; and some of you that already know me and what my opinions are on the subject will not be surprised. Yes, I&#8217;m not a fan of the Cloud Services and that is another discussion, however, there are some [...]]]></description>
			<content:encoded><![CDATA[<div style="margin: 5pt 1em 1em 5pt; float: left;"><img class="alignright" src="http://blog.milford.io/wp-content/uploads/2010/07/SIP-CLOSEUP-sm.jpg" alt="" width="300" /></div>
<p>Some of you might ask, &#8220;Why is he telling us about datacenter architecture? Don&#8217;t Cloud Services solve this already?&#8221; Those of you who already know me and my opinions on the subject will not be surprised. Yes, I&#8217;m not a fan of Cloud Services, and that is <a href="https://groups.google.com/group/iltechtalks/browse_thread/thread/1473339f9ac00d3d" target="_blank">another discussion</a>. However, Cloud Services do have some advantages, and giving them up by establishing our own datacenter felt somehow wrong to us.</p>
<p>Here are 2 of them:</p>
<p>1. Grow As You Go &#8211; When you build a datacenter, you take on commitments for space (racks or cages) and high-end network gear, investments you have to pay for in advance, before you really need them. This is not an issue for a Cloud-based setup, because as you grow you simply spin up more instances.</p>
<p>2. Disaster Recovery Headroom &#8211; With a datacenter-based setup, in order to properly handle disaster recovery you need to double your setup so you can always move all your traffic to the other datacenter in case of disaster, which means doubling the hardware you buy. In the Cloud, this is also a non-issue.</p>
<p>These 2 arguments are very much correct; however, even taking them into consideration, our setup is much more cost-efficient than any Cloud offering. The logic behind it is what I want to share here.</p>
<p>Traditionally, when a company&#8217;s business grows, a single rack or two is no longer sufficient and you need to reserve adjacent rack space in a co-located datacenter. This makes your recurring expenses grow, since you pay for reserved space that you don&#8217;t really use. It&#8217;s a big waste of your $$$. Once we had set up more than one location for our service, we found that it is much cheaper to build multiple small datacenters, each with a small space footprint, than to commit to a large space that we would not use most of the time. A block of 4 adjacent racks is much easier to find in most co-location facilities. More than that, our <a href="http://atlanticmetro.net/" target="_blank">co-location provider</a> agreed to give us 2 active racks with right of first refusal on the adjacent 2 racks, so we pay only for the racks we actually use.</p>
<p>This architecture also simplified our network gear requirements. Because each &#8220;LEGO Brick&#8221; is small, it needs to handle only a portion of the traffic, not all of it. That does not require high-end network gear: very cheap Linux machines are sufficient for handling most of the network roles, including load balancing.</p>
<p>We took the same approach when choosing the switching gear inside each LEGO Brick, where we decided to use Brocade stackable switching technology. In general, this means you can put a switch in each cabinet and wire all of that cabinet&#8217;s machines to it. When you add another cabinet, you simply connect its switch to the chain, and the whole stack looks and acts like a single switch. You can grow such a stack up to 8 switches. At Outbrain we try to eliminate single points of failure, so we have 2 stacks and each machine is connected to both of them. Again, the stacking technology lets us avoid paying for network gear before we actually need it.</p>
<p>But what about Disaster Recovery (DR) headroom? (We decided to run from more than one location for disaster recovery as soon as we started generating revenue for our partners.) As I said, this is a valid argument. When we had 2 datacenters, 50% of our computing power was dedicated to DR and not used in normal times. This was not ideal and we needed to improve it. Here, too, the LEGO Bricks helped. This week we opened our 3rd datacenter, in Chicago. The math is simple: by adding another location, our headroom dropped to only 33%, which is a lot of $$$ savings when your business grows. When we add the 4th, it will drop to 25%, and so on.</p>
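<p>The 50% / 33% / 25% figures follow from a simple sizing rule, assuming equally sized datacenters, evenly spread traffic, and a requirement to survive the loss of any one datacenter: each of the n datacenters must be able to carry load / (n - 1), so total capacity is n &#215; load / (n - 1) and the idle share is 1 / n. A small Python sketch of that arithmetic (the function names are ours, for illustration):</p>

```python
def dr_headroom(num_datacenters):
    """Fraction of total capacity idle in normal operation when the setup must
    survive losing any one of `num_datacenters` equally sized datacenters."""
    if num_datacenters < 2:
        raise ValueError("need at least 2 datacenters to survive losing one")
    # Each DC carries load / (n - 1); total is n * load / (n - 1); idle share is 1 / n.
    return 1 / num_datacenters


def per_dc_capacity(total_load, num_datacenters):
    """Capacity each DC needs (same units as total_load) to absorb a single-DC failure."""
    return total_load / (num_datacenters - 1)
```

<p>For example, serving a load of 1,000 units from 3 datacenters needs 500 units in each, 1,500 in total, of which 500 (one third) sits idle in normal times.</p>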
<p>I guess now you understand the logic and we can mention some fun info about the DC implementation itself:</p>
<ul>
<li>Datacenters communicate via a dedicated link, powered by our co-location vendor.</li>
<li>We use a <a href="http://www.cotendo.com/" target="_blank">Global DNS service</a> to balance traffic between the datacenters.</li>
<li>In our newer datacenters, power billing is pay-per-use, with no flat fees, which again lets us avoid paying for power we don&#8217;t use. It also motivates us to power off unneeded hardware and save power costs while saving the planet <img src='http://techblog.outbrain.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </li>
<li>Power is 208V, which is more efficient than the regular 110V.</li>
<li>All servers are connected to a KVM to enable remote access to BIOS config if needed &#8212; much easier to manage from Israel and in general.</li>
<li>We have a lot of <a href="http://www.dell.com/us/en/enterprise/servers/poweredge-c6100/pd.aspx?refid=poweredge-c6100&amp;cs=555&amp;s=biz" target="_blank">Dell C6100s</a> in our datacenters so each node there is also connected to an IPMI network in order to remotely restart each node without rebooting all 4 nodes in that chassis.</li>
<li>You can read more about assembling these C6100s in <a href="http://blog.milford.io/2010/07/some-notes-on-dells-c6100-multi-node-server-chassis/" target="_blank">Nathan&#8217;s detailed post</a>.</li>
</ul>
<p>I guess your question now is &#8220;what does it take to manage this in terms of labor?&#8221; The answer is&#8230; not much.</p>
<p>&nbsp;</p>
<p>The Outbrain Operations team is a group of 4 Ops engineers. They spend little time on the physical infrastructure; like other ops teams, most of their time goes to the regular tasks of configuring infrastructure software (all of it open source, such as MySQL, Cassandra, Hadoop, Hive, ActiveMQ, etc.), monitoring, and code and system deployment (we heavily use <a href="http://www.opscode.com/chef/" target="_blank">Chef</a>).</p>
<p>In general, Operations&#8217; role in the company is to keep the serving fast, reliable and (very important) cost-efficient.  This is the main reason why we invest time, knowledge and innovation in architecting our datacenters wisely.</p>
<p>I guess one of the next posts will be about our new Chicago datacenter and the concept of the &#8220;Dataless Datacenter.&#8221;</p>
<p>Ori</p>
<p>&nbsp;</p>
]]></content:encoded>
			<wfw:commentRss>http://techblog.outbrain.com/2011/04/lego-bricks-our-data-center-architechture/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
