<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Mark W. Shead &#187; Programming</title>
	<atom:link href="http://blog.markwshead.com/category/technology/programming/feed/" rel="self" type="application/rss+xml" />
	<link>http://blog.markwshead.com</link>
	<description>Mark's thoughts on being Mark Shead and other random subjects</description>
	<lastBuildDate>Thu, 29 Jul 2010 03:22:33 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0</generator>
		<item>
		<title>Good and Bad Programming Languages</title>
		<link>http://blog.markwshead.com/513/good-and-bad-programming-languages/</link>
		<comments>http://blog.markwshead.com/513/good-and-bad-programming-languages/#comments</comments>
		<pubDate>Wed, 28 Jul 2010 22:33:34 +0000</pubDate>
		<dc:creator>Mark Shead</dc:creator>
				<category><![CDATA[Programming]]></category>

		<guid isPermaLink="false">http://blog.markwshead.com/513/good-and-bad-programming-languages/</guid>
		<description><![CDATA[Often when someone says a particular programming language is bad, they are referring more to the common practice associated with that language than the language itself. Many times they are really complaining about their own poor programming habits more than the specific language. Sometimes these habits are shared by the entire culture built around a [...]]]></description>
			<content:encoded><![CDATA[<p></p><p>Often when someone says a particular programming language is bad, they are referring more to the common practice associated with that language than the language itself. Many times they are really complaining about their own poor programming habits more than the specific language. Sometimes these habits are shared by the entire culture built around a particular language.</p>
<p>Perl is a good example of this. People complain about how difficult it is to read and then proceed to write awful unreadable code. Perl can be very readable, but its terseness makes it easy for people to write huge lines of code that do 10 or 11 different things. You can do the same thing in Java, but most people try to avoid a single line that is 500 characters long because it is a pain to scroll back and forth sideways to read the code.</p>
<p>Sometimes the lack of a particular restraint can inspire horrible code.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.markwshead.com/513/good-and-bad-programming-languages/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Who Writes Wikipedia Algorithm</title>
		<link>http://blog.markwshead.com/356/who-write-wikipedia-algorithm/</link>
		<comments>http://blog.markwshead.com/356/who-write-wikipedia-algorithm/#comments</comments>
		<pubDate>Wed, 21 Jul 2010 17:50:16 +0000</pubDate>
		<dc:creator>Mark Shead</dc:creator>
				<category><![CDATA[Programming]]></category>
		<category><![CDATA[mediawiki. wikipedia]]></category>

		<guid isPermaLink="false">http://blog.markwshead.com/?p=356</guid>
		<description><![CDATA[Based on the number of edits, Wikipedia appears to be written by a small number of people.  Aaron Swartz did some testing and came to the conclusion most content in the final revision comes from people who don&#8217;t even have a login. I have been looking for a way to determine the percentage each author [...]]]></description>
			<content:encoded><![CDATA[<p><a class="post_image_link" href="http://blog.markwshead.com/356/who-write-wikipedia-algorithm/" title="Permanent link to Who Writes Wikipedia Algorithm"><img class="post_image alignnone" src="http://blog.markwshead.com/wp-content/uploads/2010/07/wikipedia.png" width="401" height="169" alt="Post image for Who Writes Wikipedia Algorithm" /></a>
</p><p>Based on the number of <strong>edits</strong>, Wikipedia appears to be written by a small number of people.  Aaron Swartz did some <a href="http://www.aaronsw.com/weblog/whowriteswikipedia">testing</a> and came to the conclusion most <strong>content</strong> in the final revision comes from people who don&#8217;t even have a login.</p>
<p>I have been looking for a way to determine the percentage each author contributed to a particular page in Wikipedia or MediaWiki, something I assumed would be trivial, but it turns out to be much more complicated. For performance reasons, MediaWiki (the software that powers Wikipedia) saves the entire text of a new revision&#8211;not just what has changed&#8211;so every edit results in a completely new copy of the page being saved.  This means that seeing who contributed what requires going back through each edit and comparing it to see what was included in the final version.</p>
<p>I started looking at Aaron&#8217;s method to see if it might be useful.  What he did is briefly described <a href="http://www.aaronsw.com/weblog/whowriteswikipedia_fn2">here</a>. To the best of my understanding this is the basic process he used:</p>
<blockquote><p>Find the longest matching string between the first revision and the final revision from the Wikipedia dumpfile.  Mark that string in the final revision as having come from the author of the first revision.  Continue this with the first revision and next largest matching string until there are no more matching strings that haven&#8217;t been marked.  Move to the second revision and repeat the process.  When you finish, you should have a version where every character is marked based on where it came from.</p></blockquote>
<p>So let&#8217;s look at an example:</p>
<blockquote><p>Bob Revision 1: The fox jumped the hound.<br />
Joe Revision 2: The quick brown fox jumped over the lazy hound.</p></blockquote>
<p>So our final version is:</p>
<blockquote><p>The quick brown fox jumped over the lazy hound.</p></blockquote>
<p>Now let&#8217;s find the longest matching string between the final version and Bob&#8217;s initial edit.  We&#8217;ll mark Bob&#8217;s contributions in red:</p>
<blockquote><p>The<span style="color: #ff0000;"> fox jumped</span> the hound.<br />
The quick brown <span style="color: #ff0000;">fox jumped </span>over the lazy hound.</p></blockquote>
<p>OK, so that is the longest string. Now let&#8217;s find the second longest:</p>
<blockquote><p>The <span style="color: #ff0000;">fox  jumped</span> the <span style="color: #ff0000;">ho<span style="color: #ff0000;">und</span></span><span style="color: #ff0000;">.</span><br />
The quick brown <span style="color: #ff0000;">fox jumped</span> over the lazy <span style="color: #ff0000;">hou<span style="color: #ff0000;">nd</span></span><span style="color: #ff0000;">.</span></p></blockquote>
<p>Repeat:</p>
<blockquote><p>The<span style="color: #ff0000;"> fox   jumped</span> <span style="color: #ff0000;">the</span> <span style="color: #ff0000;">ho<span style="color: #ff0000;">und</span></span><span style="color: #ff0000;">.</span><br />
The quick brown <span style="color: #ff0000;">fox jumped</span> over <span style="color: #ff0000;">the</span> lazy <span style="color: #ff0000;">ho<span style="color: #ff0000;">und</span></span><span style="color: #ff0000;">.</span></p></blockquote>
<p>And once more:</p>
<blockquote><p><span style="color: #ff0000;">The fox    jumped the hound.</span><br />
<span style="color: #ff0000;">The</span> quick brown <span style="color: #ff0000;">fox jumped</span> over <span style="color: #ff0000;">the</span> lazy <span style="color: #ff0000;">hound.</span></p></blockquote>
<p><span style="color: #ff0000;"><span style="color: #000000;">Going through the same process marking Joe&#8217;s revision in blue produces:</span></span></p>
<blockquote><p><span style="color: #ff0000;">The</span> <span style="color: #0000ff;">quick brown</span> <span style="color: #ff0000;">fox jumped</span> <span style="color: #0000ff;">over</span> <span style="color: #ff0000;">the</span> <span style="color: #0000ff;">lazy</span> <span style="color: #ff0000;">hound.</span></p></blockquote>
<p><span style="color: #ff0000;"><span style="color: #000000;">So we can easily see that Joe contributed 18 non-space characters and Bob contributed 21 non-space characters that made it into the final revision.  This is just fine if you are simply adding information.  It gets trickier when you are removing words, because parts of words that have been removed will match words that were added later. </span></span></p>
<p><span style="color: #ff0000;"><span style="color: #000000;">Consider this scenario:</span></span></p>
<blockquote><p>Bob Revision 1: The icky quaint rowdy fox jumped over the hound.<br />
Joe Revision 2: The quick brown fox jumped over the lazy hound.</p></blockquote>
<p>Now if we do the same process we get:</p>
<blockquote><p><span style="color: #ff0000;">The</span> <span style="color: #ff0000;">qu</span><span style="color: #ff0000;">ick</span> <span style="color: #0000ff;">b</span><span style="color: #ff0000;">rown fox jumped over the</span> <span style="color: #0000ff;">laz</span><span style="color: #ff0000;">y hound.</span></p></blockquote>
<p><span style="color: #ff0000;"><span style="color: #000000;">Even though the words &#8220;icky&#8221;, &#8220;quaint&#8221; and &#8220;rowdy&#8221; added by Bob have been removed in the final version, they are still matching parts of words.  ICK of &#8220;icky&#8221; is matching the last part of quICK.  The QU of &#8220;quaint&#8221; is matching the first part of QUick, etc.</span></span></p>
<p><span style="color: #ff0000;"><span style="color: #000000;">This approach is strongly biased toward the person who started the article or contributed early on&#8211;particularly if they added a lot of text that was later removed.  What would happen if the person who originally edited the article also pasted in a copy of the alphabet several hundred times at the bottom of their text?  It would match everything added later regardless of who added it. Now people probably aren&#8217;t doing that, but if they add a bunch of text that eventually gets removed, their removed text will still match a lot of text in subsequent revisions.</span></span></p>
<p><span style="color: #ff0000;"><span style="color: #000000;">Still, this approach isn&#8217;t unreasonable and probably gives fair results as long as someone isn&#8217;t specifically trying to game the system.  Aaron&#8217;s method of recursively looking for the longest string is important if you need to see who did what.  If you just need to know how many characters each person contributed (and you are fine with the level of accuracy discussed above), there is a much more efficient approach.</span></span></p>
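<p>The longest-match procedure described above can be sketched in Java as follows. This is only an illustration of my reading of the process, not Aaron&#8217;s actual code. The names <code>GreedyAttribution</code>, <code>attribute</code> and <code>nonSpaceCredit</code> are made up for this example, and two details are assumptions: matched characters are marked as used in both the revision and the final text, and matches as short as a single character are allowed.</p>

```java
import java.util.Arrays;

class GreedyAttribution {

    // For each character of finalText, returns the index of the revision
    // credited with it, or -1 if no revision matched it.
    static int[] attribute(String[] revisions, String finalText) {
        int n = finalText.length();
        int[] owner = new int[n];
        Arrays.fill(owner, -1);
        for (int r = 0; r < revisions.length; r++) {
            String rev = revisions[r];
            boolean[] usedRev = new boolean[rev.length()];
            while (true) {
                // Longest common substring over positions not yet marked on either side.
                int bestLen = 0, bestRevEnd = 0, bestFinalEnd = 0;
                int[] prev = new int[n + 1];
                for (int i = 1; i <= rev.length(); i++) {
                    int[] cur = new int[n + 1];
                    for (int j = 1; j <= n; j++) {
                        boolean free = !usedRev[i - 1] && owner[j - 1] == -1;
                        if (free && rev.charAt(i - 1) == finalText.charAt(j - 1)) {
                            cur[j] = prev[j - 1] + 1;
                            if (cur[j] > bestLen) {
                                bestLen = cur[j];
                                bestRevEnd = i;
                                bestFinalEnd = j;
                            }
                        }
                    }
                    prev = cur;
                }
                if (bestLen == 0) {
                    break; // nothing left to match for this revision
                }
                // Mark the match as used in the revision and credit it in the final text.
                for (int k = 1; k <= bestLen; k++) {
                    usedRev[bestRevEnd - k] = true;
                    owner[bestFinalEnd - k] = r;
                }
            }
        }
        return owner;
    }

    // Counts the non-space characters of finalText credited to one revision.
    static int nonSpaceCredit(int[] owner, String finalText, int revision) {
        int count = 0;
        for (int j = 0; j < owner.length; j++) {
            if (owner[j] == revision && finalText.charAt(j) != ' ') {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        String bob = "The fox jumped the hound.";
        String joe = "The quick brown fox jumped over the lazy hound.";
        int[] owner = attribute(new String[] {bob, joe}, joe);
        System.out.println("Bob: " + nonSpaceCredit(owner, joe, 0)); // 21
        System.out.println("Joe: " + nonSpaceCredit(owner, joe, 1)); // 18
    }
}
```

<p>Each pass over a revision compares every character of the revision with every character of the final text, so this gets slow quickly on long pages with many revisions.</p>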
<h3><span style="color: #ff0000;"><span style="color: #000000;">More efficient method<br />
</span></span></h3>
<p><span style="color: #ff0000;"><span style="color: #000000;">The trick is to realize that this method is going to attribute any character in the final revision to the earliest revision to introduce that character.  So if revision 1 has three a&#8217;s, three a&#8217;s of the final revision will be credited to the author of the first revision&#8211;regardless of where they occur. Because of this, there is no advantage of recursively matching the longest string unless you are trying to produce an annotated version showing who wrote what block of text. Even then  you run into the problem shown above where subsequent edits don&#8217;t get full credit. </span></span></p>
<p><span style="color: #ff0000;"><span style="color: #000000;">In other words, we can get the same results simply by counting the number of times each letter appears in each revision starting with the earliest revision.  That revision then receives credit for the occurrence of those letters in the final version as long as those letters haven&#8217;t been already credited to another revision.</span></span></p>
<p><span style="color: #ff0000;"><span style="color: #000000;">Here is a simple example:</span></span></p>
<blockquote><p><span style="color: #ff0000;"><span style="color: #000000;">Revision 1: AB BA BAB<br />
Revision 2: AB BAB BAB BABA</span></span></p>
<p><span style="color: #ff0000;"><span style="color: #000000;">Revision 1: A = 3 B = 4<br />
Revision 2: A = 5 B = 7</span></span></p></blockquote>
<p><span style="color: #ff0000;"><span style="color: #000000;">So the first version gets credit for:<br />
A = 3 B = 4 total: 7 characters or 7/12ths of the final version<br />
</span></span></p>
<p><span style="color: #ff0000;"><span style="color: #000000;">Revision 2 gets credit for:<br />
A = 2 B = 3 total: 5 characters or 5/12ths of the final version<br />
</span></span></p>
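<p>The counting approach can be sketched in a few lines of Java. This is a rough illustration under my own assumptions rather than a finished tool: the class and method names (<code>CharCountAttribution</code>, <code>credit</code>) are invented for the example, whitespace is ignored, and the last entry in the array of revisions is assumed to be the final version.</p>

```java
class CharCountAttribution {

    // Returns how many non-whitespace characters of finalText are credited
    // to each revision, with earlier revisions claiming characters first.
    static int[] credit(String[] revisions, String finalText) {
        // How many copies of each character in the final text are still unclaimed.
        int[] remaining = new int[Character.MAX_VALUE + 1];
        for (char c : finalText.toCharArray()) {
            if (!Character.isWhitespace(c)) {
                remaining[c]++;
            }
        }
        int[] credit = new int[revisions.length];
        for (int r = 0; r < revisions.length; r++) {
            int[] counts = new int[Character.MAX_VALUE + 1];
            for (char c : revisions[r].toCharArray()) {
                if (!Character.isWhitespace(c)) {
                    counts[c]++;
                }
            }
            // A revision can claim at most as many copies as it contains
            // and at most as many as are still unclaimed.
            for (int c = 0; c < counts.length; c++) {
                int taken = Math.min(counts[c], remaining[c]);
                credit[r] += taken;
                remaining[c] -= taken;
            }
        }
        return credit;
    }

    public static void main(String[] args) {
        String rev1 = "AB BA BAB";
        String rev2 = "AB BAB BAB BABA";
        int[] result = credit(new String[] {rev1, rev2}, rev2);
        System.out.println("Revision 1: " + result[0]); // 7
        System.out.println("Revision 2: " + result[1]); // 5
    }
}
```

<p>Each revision is scanned once, so the cost grows with the total length of the revisions instead of with the product of their lengths.</p>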
<h3>More accurate methods</h3>
<p>As shown, this approach gives an extraordinary amount of weight to early contributors&#8211;particularly if they are verbose&#8211;regardless of how much of their actual content makes its way into the final version. Its greatest strength is that it handles cases where text is moved from one position to another.  It gets this strength at the expense of giving &#8220;false credit&#8221; to people who contribute in earlier revisions.</p>
<h4>Levenshtein Distance</h4>
<p>Another possibility would be to use the Levenshtein Distance.  This basically counts the number of changes necessary to convert one string into another. I did some testing using Levenshtein Distance doing the following:</p>
<blockquote><p>Starting at the oldest revision, calculate the Levenshtein Distance from the final revision.  This number indicates how far the revision is from the final: the smaller the distance, the more of the revision&#8217;s characters appear in the final.  Moving to the next oldest revision, calculate the Levenshtein distance, but discount any credit already given to previous revisions.</p></blockquote>
<p>This handles inserts, where words are inserted between existing words, but it doesn&#8217;t handle situations where words, sentences or paragraphs have their positions substituted.  If you have ABC, the revision that changes it to CBA gets credit for the contents of C and A even though all it did was move text around.</p>
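<p>For reference, here is a minimal sketch of the distance calculation itself, using the standard dynamic programming approach. It only computes the distance between two strings and does not attempt the crediting scheme described above. The class name <code>Levenshtein</code> and the method name <code>distance</code> are just labels chosen for this example.</p>

```java
class Levenshtein {

    // Minimum number of single-character insertions, deletions and
    // substitutions needed to turn a into b.
    static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // turning "" into the first j characters of b takes j insertions
        }
        for (int i = 1; i <= a.length(); i++) {
            int[] cur = new int[b.length() + 1];
            cur[0] = i; // turning the first i characters of a into "" takes i deletions
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                int insertOrDelete = Math.min(cur[j - 1] + 1, prev[j] + 1);
                cur[j] = Math.min(insertOrDelete, prev[j - 1] + cost);
            }
            prev = cur;
        }
        return prev[b.length()];
    }

    public static void main(String[] args) {
        String bob = "The fox jumped the hound.";
        String joe = "The quick brown fox jumped over the lazy hound.";
        System.out.println(distance(bob, joe));     // 22
        System.out.println(distance("ABC", "CBA")); // 2
    }
}
```

<p>In the fox example the distance is 22, which is exactly the 18 non-space characters and 4 spaces that Joe inserted. The ABC to CBA case shows the weakness: nothing new was written, yet the distance is 2.</p>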
<h4>History Flow and the sentence-based method</h4>
<p>To get a more accurate picture you have to use a slightly longer unit than characters.  The two simplest ways would be to use words or sentences instead of individual characters.   IBM has done some analysis using a tool called <a href="http://www.research.ibm.com/visual/projects/history_flow/index.htm">History Flow</a> that uses the sentence as the fundamental unit. As they point out, a revision that adds a comma will get credit for the entire sentence that contains the comma.</p>
<h4>Line based approach</h4>
<p>Jeff Atwood uses the &#8220;<a href="http://www.codinghorror.com/blog/2009/02/mixing-oil-and-water-authorship-in-a-wiki-world.html">line</a>&#8221; as his fundamental unit. This is a very reasonable approach if you are working with code or something where you are likely to have a lot of new lines.  However, for long paragraphs it is a bit problematic.  Either they get treated as a single line and you have the same issue with the comma as IBM&#8217;s method applied to an entire paragraph or you break the paragraph into lines at specific points and adding in content can reorder the entire paragraph making it all appear new.</p>
<h4>Word based approach</h4>
<p>A good balance might be to use individual words as the fundamental unit being compared.  This drastically reduces the &#8220;false credit&#8221; problem associated with character based matching while minimizing the &#8220;comma problem&#8221;. There is still going to be a bit of &#8220;false credit&#8221;, especially for common words. If someone writes 500 words to start an article and all of their original text is later deleted and new text added, they are going to get credit for a number of words like &#8220;the&#8221;, &#8220;and&#8221;, etc.  If what they wrote was on topic, they will get credit for even more because they are likely to have used a lot of keywords that will be in the final revision.  Simply pasting in the dictionary a few times would give them significant credit for text that will not appear in the final revision. Still, it represents a very reasonable approach, particularly if people aren&#8217;t trying to game the system.</p>
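<p>A word based version of the counting idea might look something like the sketch below. The names <code>WordAttribution</code>, <code>wordCounts</code> and <code>credit</code> are hypothetical, and the tokenizing is deliberately naive: words are split on whitespace, case matters, and punctuation stays attached to the word, all of which are assumptions a real tool would want to revisit.</p>

```java
import java.util.HashMap;
import java.util.Map;

class WordAttribution {

    // Counts how often each whitespace-separated word occurs in the text.
    static Map<String, Integer> wordCounts(String text) {
        Map<String, Integer> counts = new HashMap<>();
        for (String word : text.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                counts.merge(word, 1, Integer::sum);
            }
        }
        return counts;
    }

    // Returns how many words of finalText are credited to each revision,
    // with earlier revisions claiming words first.
    static int[] credit(String[] revisions, String finalText) {
        Map<String, Integer> remaining = wordCounts(finalText);
        int[] credit = new int[revisions.length];
        for (int r = 0; r < revisions.length; r++) {
            for (Map.Entry<String, Integer> e : wordCounts(revisions[r]).entrySet()) {
                int unclaimed = remaining.getOrDefault(e.getKey(), 0);
                int taken = Math.min(e.getValue(), unclaimed);
                if (taken > 0) {
                    credit[r] += taken;
                    remaining.put(e.getKey(), unclaimed - taken);
                }
            }
        }
        return credit;
    }

    public static void main(String[] args) {
        String bob = "The icky quaint rowdy fox jumped over the hound.";
        String joe = "The quick brown fox jumped over the lazy hound.";
        int[] result = credit(new String[] {bob, joe}, joe);
        System.out.println("Bob: " + result[0]); // 6
        System.out.println("Joe: " + result[1]); // 3
    }
}
```

<p>In the &#8220;icky quaint rowdy&#8221; scenario Joe now gets credit for all three words he actually added (quick, brown and lazy), because Bob&#8217;s deleted words no longer match fragments of them.</p>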
<h3>Spam and methods</h3>
<p>In this type of analysis of a Wikipedia or a different MediaWiki, one crucial thing to consider is spam. Character based and word based analysis is going to be heavily skewed by spam entries&#8211;even if they are immediately reverted.  Sentence based approaches are probably going to be more accurate if revisions contain spam while word based methods are likely to be more accurate in closed systems where spam isn&#8217;t an issue.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.markwshead.com/356/who-write-wikipedia-algorithm/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Selenium No Display Specified</title>
		<link>http://blog.markwshead.com/392/selenium-no-display-specified/</link>
		<comments>http://blog.markwshead.com/392/selenium-no-display-specified/#comments</comments>
		<pubDate>Wed, 02 Jun 2010 17:57:45 +0000</pubDate>
		<dc:creator>Mark Shead</dc:creator>
				<category><![CDATA[Programming]]></category>
		<category><![CDATA[selenium. linux]]></category>

		<guid isPermaLink="false">http://blog.markwshead.com/?p=392</guid>
		<description><![CDATA[I was using Selenium to automate some tasks beyond testing and needed to set it up to run with a cron job. A shell script calls the appropriate Maven command, but I kept getting the error: Error: no display specified The fix was to add this to the script before calling Maven: export DISPLAY=:0 Evidently [...]]]></description>
			<content:encoded><![CDATA[<p><a class="post_image_link" href="http://blog.markwshead.com/392/selenium-no-display-specified/" title="Permanent link to Selenium No Display Specified"><img class="post_image alignright remove_bottom_margin" src="http://blog.markwshead.com/wp-content/uploads/2010/06/Selenium-Logo.png" width="200" height="181" alt="Post image for Selenium No Display Specified" /></a>
</p><p>I was using Selenium to automate some tasks beyond testing and needed to set it up to run with a cron job.  A shell script calls the appropriate Maven command, but I kept getting the error:<br />
<code>Error: no display specified</code></p>
<p>The fix was to add this to the script before calling Maven:<br />
<code>export DISPLAY=:0</code></p>
<p>Evidently when Selenium is started from cron, it doesn&#8217;t know what display to use.  This code tells it to use display 0 and it runs normally.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.markwshead.com/392/selenium-no-display-specified/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Changing User Agent in Rome</title>
		<link>http://blog.markwshead.com/104/changing-user-agent-in-rome/</link>
		<comments>http://blog.markwshead.com/104/changing-user-agent-in-rome/#comments</comments>
		<pubDate>Wed, 19 Oct 2005 16:06:28 +0000</pubDate>
		<dc:creator>Mark Shead</dc:creator>
				<category><![CDATA[Java]]></category>
		<category><![CDATA[Programming]]></category>

		<guid isPermaLink="false">http://blog.markwshead.com/archives/2005/10/19/changing-user-agent-in-rome.html</guid>
		<description><![CDATA[Changing the user agent in Rome FeedFetcher doesn't quite work like you'd expect.  Here is how to get it to work so you can change it so you aren't stuck with "Java/1.5.0_04".  Some sites actually block "Java/1.5.0_04" so this is something worth changing if you want to pull in RSS feeds.]]></description>
			<content:encoded><![CDATA[<p></p><p>If you are trying to use Rome and Rome Feed Fetcher, you may want to change the user agent.  However, the following will not change the default user agent:</p>
<pre><code>
// Does NOT change the user agent: the fetcher is created without a cache.
FeedFetcher feedFetcher = new HttpURLFeedFetcher();
feedFetcher.setUserAgent("User Agent 007");
URL feedURL = new URL(rssUrl); // rssUrl: the feed address as a String
SyndFeed feed = feedFetcher.retrieveFeed(feedURL);
List entries = feed.getEntries();
</code></pre>
<p>To change the user agent you must use the InfoCache as shown:</p>
<pre><code>
// With a FeedFetcherCache passed to the constructor, the user agent set below is used.
FeedFetcherCache feedInfoCache = HashMapFeedInfoCache.getInstance();
FeedFetcher feedFetcher = new HttpURLFeedFetcher(feedInfoCache);
feedFetcher.setUserAgent("User Agent 007");
URL feedURL = new URL(rssUrl); // rssUrl: the feed address as a String
SyndFeed feed = feedFetcher.retrieveFeed(feedURL);
List entries = feed.getEntries();
</code></pre>
<p>Otherwise the user agent is set to &#8220;Java/1.5.0_04&#8221;.  This is odd because the default client for Rome is &#8220;Rome Client (http://tinyurl.com/64t5n) Ver: 0.7&#8221;.  It seems that setting the user agent without a HashMapFeedInfoCache does change it, but somehow it ends up as &#8220;Java/1.5.0_04&#8221; instead of whatever you set it to.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.markwshead.com/104/changing-user-agent-in-rome/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Inventing in software</title>
		<link>http://blog.markwshead.com/103/inventing-in-software/</link>
		<comments>http://blog.markwshead.com/103/inventing-in-software/#comments</comments>
		<pubDate>Sun, 09 Oct 2005 17:23:15 +0000</pubDate>
		<dc:creator>Mark Shead</dc:creator>
				<category><![CDATA[General]]></category>
		<category><![CDATA[Programming]]></category>

		<guid isPermaLink="false">http://blog.markwshead.com/archives/2005/10/09/inventing-in-software.html</guid>
		<description><![CDATA[To invent, you need a good imagination and a pile of junk. &#8211; Thomas Edison This is what is so fascinating about programming. Your &#8220;pile of junk&#8221; consists of digital assets instead of physical material, so the raw materials are not limited by the normal laws of supply and demand. In software, you are limited [...]]]></description>
			<content:encoded><![CDATA[<p></p><blockquote><p>To invent, you need a good imagination and a pile of junk.</p>
<p>		&#8211; Thomas Edison
</p></blockquote>
<p>This is what is so fascinating about programming.  Your &#8220;pile of junk&#8221; consists of digital assets instead of physical material, so the raw materials are not limited by the normal laws of supply and demand.  In software, you are limited only by your imagination.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.markwshead.com/103/inventing-in-software/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Storing your Maven Repository in CVS/Subversion</title>
		<link>http://blog.markwshead.com/100/storing-your-maven-repository-in-cvssubversion/</link>
		<comments>http://blog.markwshead.com/100/storing-your-maven-repository-in-cvssubversion/#comments</comments>
		<pubDate>Tue, 20 Sep 2005 01:24:11 +0000</pubDate>
		<dc:creator>Mark Shead</dc:creator>
				<category><![CDATA[Programming]]></category>

		<guid isPermaLink="false">http://blog.markwshead.com/archives/2005/09/19/storing-your-maven-repository-in-cvssubversion.html</guid>
		<description><![CDATA[Brett Porter has hacked together a tool that will let you use a CVS or Subversion repository as your maven repository. Brett Porter &#8211; Storing your Maven Repository in CVS/Subversion It&#8217;s pretty rough, but is a working prototype that makes Maven 1.1/2.0 downloads a checkout/update, and deploy is an add/commit. I see this would be [...]]]></description>
			<content:encoded><![CDATA[<p></p><p>Brett Porter has hacked together a tool that will let you use a CVS or Subversion repository as your maven repository.</p>
<blockquote><p><a href="http://blogs.codehaus.org/people/brett/archives/001066_storing_your_maven_repository_in_cvssubversion.html">Brett Porter &#8211; Storing your Maven Repository in CVS/Subversion</a><br />
It&#8217;s pretty rough, but is a working prototype that makes Maven 1.1/2.0 downloads a checkout/update, and deploy is an add/commit. I see this would be useful for snapshot repositories, where you could use one filename instead of transforming the version, so getting the latest would literally be an svn update.</p></blockquote>
<p>If you are using Subversion with Apache, it is pretty easy to achieve most of this.  The problem that I&#8217;m faced with is the fact that Maven can&#8217;t handle repositories that use SSL and a login.</p>
<p>Currently, I&#8217;m using a separate server to host our Maven repository because the Subversion server is using SSL.  I hope that Maven will eventually come up with a way to work around this, but right now it looks like most of their efforts are being spent on Maven 2.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.markwshead.com/100/storing-your-maven-repository-in-cvssubversion/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Ignoring Build Problems</title>
		<link>http://blog.markwshead.com/99/ignoring-build-problems/</link>
		<comments>http://blog.markwshead.com/99/ignoring-build-problems/#comments</comments>
		<pubDate>Thu, 15 Sep 2005 21:56:59 +0000</pubDate>
		<dc:creator>Mark Shead</dc:creator>
				<category><![CDATA[Programming]]></category>

		<guid isPermaLink="false">http://blog.markwshead.com/archives/2005/09/15/ignoring-build-problems.html</guid>
		<description><![CDATA[I ran across this blog post that is probably typical of many people who are managing software projects. Musings of a Software Development Manager » Blog Archive » CruiseControl Warnings I get about 48 email messages from Cruisecontrol each day for one of our projects. This is not something I’m proud of since this situation [...]]]></description>
			<content:encoded><![CDATA[<p></p><p>I ran across this blog post that is probably typical of many people who are managing software projects.</p>
<blockquote><p><a href="http://edgibbs.com/2005/08/17/cruisecontrol-warnings/">Musings of a Software Development Manager » Blog Archive » CruiseControl Warnings</a><br />
I get about 48 email messages from Cruisecontrol each day for one of our projects. This is not something I’m proud of since this situation has existed for at least 4 weeks now, we’ve had a broken build. The problem stems from some nasty functional tests that no one wants to investigate and we’ve sort of let our process slip.</p></blockquote>
<p>There is a simple solution to this.  Turn off the tests that are failing.  People&#8217;s first reaction to this is &#8220;Oh no, we can&#8217;t turn off the tests!   They indicate that something is wrong.  Eventually we&#8217;ll have time to fix it.&#8221;</p>
<p>If you are actually going to fix it go ahead, but if something has been broken for more than a week, chances are no one is going to fix it any time soon.  You should turn it off so it starts building without errors again.</p>
<p>Why is this better?  If your team gets 10 emails each day saying that something is broken, they are going to ignore it.  No one is really responsible for all of the problems, so no individual really works on fixing it.  However, if the build is working correctly and someone checks in code that breaks a unit test and everyone gets an email, that person is probably going to try to fix it because it shows that he is responsible for the problem.</p>
<p>Think of it another way.  Let&#8217;s say I have 3 smoke alarms, 1 gas alarm, 1 CO2 alarm, and a flooded basement alarm in my house and they all sound pretty much the same.  Now let&#8217;s say that the flooded basement alarm goes off and I decide that it isn&#8217;t important enough to fix the cause&#8230;. So I just let the alarm go off.  How likely do you think I am to notice if another alarm goes off once I get used to ignoring the first alarm?</p>
<p>If I&#8217;m not going to fix the problem, the best thing I can do is disable the flooded basement alarm until I have a chance to fix it.  After a week of ignoring the alarm and nothing bad happening, I&#8217;m not suddenly going to notice it and decide I should do something about it.</p>
<p>One of the first things I did when I started at my current job was go through and rename every test that failed our automatic build process as &#8220;pending&#8221;.  By the time the tests would run, I had disabled about 2/3 of them.  Since they were failing we ignored them anyway, so marking them as pending didn&#8217;t change anything.  Before they were turned off, it would have been impossible to notice if one of the tests that were previously working broke because of a change.</p>
<p>Over time we&#8217;ve turned most of the pending tests back on one at a time as we&#8217;ve had more time to fix the code or fix the test.</p>
<p>When your tests fail, it should be unusual.  I set up our builds to break if any test fails.  I&#8217;ve got a lava lamp above my cubicle and everyone in the company knows what it means.  If something breaks, people start asking the developers about it until it gets fixed.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.markwshead.com/99/ignoring-build-problems/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Why Subversion Rocks</title>
		<link>http://blog.markwshead.com/96/why-subversion-rocks/</link>
		<comments>http://blog.markwshead.com/96/why-subversion-rocks/#comments</comments>
		<pubDate>Mon, 25 Jul 2005 02:53:02 +0000</pubDate>
		<dc:creator>Mark Shead</dc:creator>
				<category><![CDATA[Programming]]></category>

		<guid isPermaLink="false">http://blog.markwshead.com/archives/2005/07/24/why-subversion-rocks.html</guid>
		<description><![CDATA[This guy says that using Subversion and Cruisecontrol cut their costs by 92%. It would be interesting to see how he calculated this. Regardless, it is amazing how many quality tools are available for free now. Why Subversion Rocks In a recent study I performed on my development groups process improvement over the past 5 [...]]]></description>
			<content:encoded><![CDATA[<p></p><p>This guy says that using Subversion and Cruisecontrol cut their costs by 92%.  It would be interesting to see how he calculated this.  Regardless, it is amazing how many quality tools are available for free now.</p>
<blockquote><p><a href="http://www.bieberlabs.com/wordpress/archives/2005/07/24/zdot-podcast-why-subversion-rocks">Why Subversion Rocks</a><br />
In a recent study I performed on my development groups process improvement over the past 5 years, we found that we had cut the cost of managing our build and release process by approximately 92% by incorporating Subversion and related tools like CruiseControl, ViewCVS, and other custom software (and major process changes that accompany them) to integrate and automate our release management processes. </p></blockquote>
]]></content:encoded>
			<wfw:commentRss>http://blog.markwshead.com/96/why-subversion-rocks/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Dangers of Design Patterns</title>
		<link>http://blog.markwshead.com/95/dangers-of-design-patterns/</link>
		<comments>http://blog.markwshead.com/95/dangers-of-design-patterns/#comments</comments>
		<pubDate>Sun, 24 Jul 2005 03:13:40 +0000</pubDate>
		<dc:creator>Mark Shead</dc:creator>
				<category><![CDATA[General]]></category>
		<category><![CDATA[Programming]]></category>

		<guid isPermaLink="false">http://blog.markwshead.com/archives/2005/07/23/dangers-of-design-patterns.html</guid>
		<description><![CDATA[Here is an interesting discussion about the dangers of design patterns. Design patterns can be a great way to solve problems, but many times they become an excuse to over-architect solutions. Design patterns should reduce complexity. If they introduce more complexity, you are doing something wrong. The myth of design patterns is that they will [...]]]></description>
			<content:encoded><![CDATA[<p></p><p><a href="http://www.parand.com/say/index.php/2005/07/18/i-hate-patterns/">Here</a> is an interesting discussion about the dangers of design patterns.  Design patterns can be a great way to solve problems, but many times they become an excuse to over-architect solutions. Design patterns should reduce complexity.  If they introduce more complexity, you are doing something wrong.</p>
<p>The myth of design patterns is that they will make it easy to solve problems that you haven&#8217;t thought of yet.  This sounds great in theory, but it is easy to end up &#8220;gold plating&#8221; the software by building in flexibility that will never be used.</p>
<p>The real power of design patterns is in solving existing problems.  Often this means they will be introduced when you are refactoring code.  When the code starts becoming complex and a design pattern makes things simpler, then it might be a good time to use one.</p>
<p>I&#8217;ve inherited a project that makes extensive use of design patterns.  In some places the patterns make the code simpler.  For example, the chain of responsibility pattern is used in several places, so once you understand how the authentication piece works, you understand the principle behind several other key components.</p>
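<p>To illustrate why one chain makes the others easy to read, here is a small sketch of the pattern in Python.  It is not the code from the project I inherited; the handler classes, the token, and the password are all made up for the example.  Each handler either deals with the request or passes it to its successor.</p>

```python
class Handler:
    """One link in the chain; passes the request on if it cannot handle it."""

    def __init__(self, successor=None):
        self.successor = successor

    def handle(self, request):
        result = self.process(request)
        if result is not None:
            return result
        if self.successor is not None:
            return self.successor.handle(request)
        # Nobody in the chain accepted the request.
        return "rejected"

    def process(self, request):
        raise NotImplementedError


class TokenAuth(Handler):
    def process(self, request):
        # Hypothetical rule: a request carrying the right token is accepted.
        if request.get("token") == "secret-token":
            return "authenticated by token"
        return None


class PasswordAuth(Handler):
    def process(self, request):
        # Hypothetical rule: fall back to a password check.
        if request.get("password") == "hunter2":
            return "authenticated by password"
        return None


# Token check first, then password, then rejection.
chain = TokenAuth(PasswordAuth())
```

<p>Once the <code>handle</code> method is understood, every other component built the same way reads the same: find the links, read each <code>process</code> method, and you know the behavior.</p>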
<p>Another pattern that shows up a lot is the factory pattern.  While this is a great way to decouple code, it is only worthwhile if the loose coupling actually buys you something.  I ran into a piece of code the other day where a factory created an object and I needed to know what it was.  I had to look through 5 other Java classes and two configuration files to figure it out.  In this case, using a pattern made things much more complicated and there weren&#8217;t any significant benefits.</p>
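<p>Here is a rough sketch of the trade-off in Python, with hypothetical mailer classes and a dictionary standing in for the configuration files.  The caller never names the concrete class, which is the whole point of the factory, and also the reason a reader has to go hunting to find out what was actually created.</p>

```python
class SmtpMailer:
    def send(self, to, body):
        return "smtp -> " + to


class LogOnlyMailer:
    def send(self, to, body):
        return "log -> " + to


# Hypothetical registry standing in for the configuration files.
MAILERS = {"smtp": SmtpMailer, "log": LogOnlyMailer}


def make_mailer(config):
    """Factory: pick the concrete class from configuration, not from code."""
    return MAILERS[config["mailer"]]()


# The call site gives no hint about which class it gets back.
mailer = make_mailer({"mailer": "log"})
```

<p>If the project really swaps implementations, say a logging mailer in tests and SMTP in production, the indirection pays for itself.  If only one implementation ever exists, constructing the class directly would be simpler and easier to trace.</p>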
<p>Patterns need to be used to solve real problems.  If you try to use them to solve every potential future problem, they become a liability.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.markwshead.com/95/dangers-of-design-patterns/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Nightmare Programming Project</title>
		<link>http://blog.markwshead.com/93/nightmare-programming-project/</link>
		<comments>http://blog.markwshead.com/93/nightmare-programming-project/#comments</comments>
		<pubDate>Tue, 19 Jul 2005 06:47:39 +0000</pubDate>
		<dc:creator>Mark Shead</dc:creator>
				<category><![CDATA[General]]></category>
		<category><![CDATA[Programming]]></category>

		<guid isPermaLink="false">http://blog.markwshead.com/archives/2005/07/19/nightmare-programming-project.html</guid>
		<description><![CDATA[Here is a scary scenario for a programmer. Unfortunately it is probably not too uncommon. Imagine you are starting a new job. Before accepting the position you ask all the right questions, but on your first day you discover the following: The lead programmer left two months ago. (You already knew this.) The only form [...]]]></description>
			<content:encoded><![CDATA[<p></p><p>Here is a scary scenario for a programmer.  Unfortunately it is probably not too uncommon.  Imagine you are starting a new job.  Before accepting the position you ask all the right questions, but on your first day you discover the following:</p>
<ol>
<li>The lead programmer left two months ago. (You already knew this.)  The only form of documentation is a few comments and Javadocs. (You didn&#8217;t know this.)</li>
<li>While the project has its own CVS server, the backups were done without shutting down CVS and they had never been tested.  The server crashed and all the backups were corrupt.  There is a copy of the source code that someone had on their local hard drive, but there is no history information and no copy of the code from the last stable release.</li>
<li>30% of the unit tests fail.  Another 30% have errors.  Some of the tests are just wrong; others rely on having resources configured in a specific way.  There are also some tests that aren&#8217;t actually testing code, but instead do things like populating the databases.</li>
<li>The core framework used in the project is based on a six-year-old proprietary binary library that doesn&#8217;t have any documentation.</li>
<li>The bug tracking data was lost in the server crash.  Since the code you have isn&#8217;t from a stable release, it has a bunch of little bugs, and none of them are documented.</li>
</ol>
<p>Sound like a nightmare?  Definitely.  But it is also an opportunity.  Here are the steps I would take under the above circumstances.  I&#8217;d enjoy hearing what others have to say as well.</p>
<ol>
<li><strong>Implement Version Control.</strong>  Personally I prefer Subversion, but regardless of what you use, getting it up and running should be your first priority.  And just having it functional isn&#8217;t adequate.  It needs to be running and automatically backing up, and you must test doing a restore from the backup data.  If you haven&#8217;t tested your backup, you are begging for disaster to strike. (More on <a href="http://blog.markwshead.com/archives/2005/07/19/thinking-about-version-control.html">version control</a>.)</li>
<li><strong>Set Up an Issue Tracker.</strong>  You need a place to keep track of the problems you find.  You don&#8217;t want to be reminded of a nasty bug only when a customer calls to tell you about it.  The issue tracker should be used for more than just bugs.  It should be your central repository for everything the developers are working on.  That means it should hold features, tasks, ideas for improvements, etc.  Set milestones and assign issues to those milestones.  If you are tracking estimated time for each issue, you&#8217;ll be able to set realistic schedules.</li>
<li><strong>Document your setup. </strong> You need to know how to set up the build and run environment from scratch.  I would suggest starting with a clean machine and setting everything up.  If your project builds fine on the old developer&#8217;s machine, but not on any new machines you set up, you have a problem.  You may find yourself spending several days hunting down all the dependencies and configuration settings.</li>
<li><strong>Fix the Unit Tests.</strong>  If tests are failing, the developers will ignore them.  All of your active tests have to pass every time you run them.  If this means you go from 400 tests down to 50, that is fine.  Tests that don&#8217;t pass can be prefixed with &#8220;pending&#8221; so they won&#8217;t run, and you can re-implement them over time.  If a test fails, you need to consider the build to be broken.</li>
<li><strong>Set Up Automatic Builds.</strong>  This doesn&#8217;t need to be anything fancy, but if code compiles and tests fine on your machine and fails on others, you need to know about it right away.  Ideally the code should recompile whenever there are changes in the version control system.  You should also make the build fail if the tests fail.  When a build fails, you can have it send email, flash the lights, or turn on the overhead sprinklers, but you should not ignore it.</li>
<li><strong>Document the code.</strong>  Digging through someone else&#8217;s code and trying to make sense of it can be extremely painful.  But you really can&#8217;t start writing your own code until you understand how the old code works.  Even if you want to replace parts of the old code, you will need to understand what the code you plan to replace is actually doing.  Make sure you record your findings, so they can benefit others.  You may get called away to rescue another troubled project, and you want to make sure your successors don&#8217;t have to start over.</li>
</ol>
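<p>Testing a restore, as the first step above recommends, can be as simple as restoring the backup into a scratch directory and comparing it with the original.  The following Python sketch assumes both trees are plain directories on disk; the function names are my own, and a real repository backup would need its own restore step before this comparison.</p>

```python
import hashlib
from pathlib import Path


def tree_digest(root):
    """Map each file's relative path to the SHA-256 of its contents."""
    root = Path(root)
    digests = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            relative = path.relative_to(root).as_posix()
            digests[relative] = hashlib.sha256(path.read_bytes()).hexdigest()
    return digests


def restore_matches(original_dir, restored_dir):
    """Return True when the restored tree is byte-for-byte identical."""
    return tree_digest(original_dir) == tree_digest(restored_dir)
```

<p>Running a check like this on a schedule, and treating a mismatch the same way as a broken build, would catch a corrupt backup before the day you actually need it.</p>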
]]></content:encoded>
			<wfw:commentRss>http://blog.markwshead.com/93/nightmare-programming-project/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
