<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: Hadoop Analysis of Apache Logs Using Flume-NG, Hive and Pig</title>
	<atom:link href="http://cuddletech.com/blog/?feed=rss2&#038;p=795" rel="self" type="application/rss+xml" />
	<link>http://cuddletech.com/blog/?p=795&#038;utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=hadoop-analysis-of-apache-logs-using-flume-ng-hive-and-pig</link>
	<description>The Blog of Ben Rockwood</description>
	<lastBuildDate>Sat, 18 May 2013 03:46:39 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.1</generator>
	<item>
		<title>By: Kristoffer</title>
		<link>http://cuddletech.com/blog/?p=795#comment-28113</link>
		<dc:creator>Kristoffer</dc:creator>
		<pubDate>Thu, 10 Jan 2013 16:28:52 +0000</pubDate>
		<guid isPermaLink="false">http://cuddletech.com/blog/?p=795#comment-28113</guid>
		<description>I would recommend taking a look at ELSA as well:
http://enterprise-log-search-and-archive.googlecode.com/
An example of using ELSA to parse Bro IDS log files: http://blog.bro-ids.org/2012/01/monster-logs.html</description>
		<content:encoded><![CDATA[<p>I would recommend taking a look at ELSA as well:<br />
<a href="http://enterprise-log-search-and-archive.googlecode.com/" rel="nofollow">http://enterprise-log-search-and-archive.googlecode.com/</a><br />
An example of using ELSA to parse Bro IDS log files: <a href="http://blog.bro-ids.org/2012/01/monster-logs.html" rel="nofollow">http://blog.bro-ids.org/2012/01/monster-logs.html</a></p>
]]></content:encoded>
	</item>
	<item>
		<title>By: benr</title>
		<link>http://cuddletech.com/blog/?p=795#comment-28057</link>
		<dc:creator>benr</dc:creator>
		<pubDate>Sat, 29 Dec 2012 05:13:08 +0000</pubDate>
		<guid isPermaLink="false">http://cuddletech.com/blog/?p=795#comment-28057</guid>
		<description>There is a lot of good info out there, such as this Cloudera post (http://blog.cloudera.com/blog/2009/02/the-small-files-problem/).  I&#039;m not making a judgement call here as to whether lots of files are good or bad, but by default Flume creates a lot of them, so it is something to think about.

As for the realistic nature of Logstash/etc. versus Hadoop... in the real world, most shops should likely be sending data into an Elasticsearch-like solution (Splunk, Graylog2, Logstash, etc.) for a period of 2-4 weeks and then archiving the logs out to Hadoop for data warehousing.</description>
		<content:encoded><![CDATA[<p>There is a lot of good info out there, such as this Cloudera post (<a href="http://blog.cloudera.com/blog/2009/02/the-small-files-problem/" rel="nofollow">http://blog.cloudera.com/blog/2009/02/the-small-files-problem/</a>).  I&#8217;m not making a judgement call here as to whether lots of files are good or bad, but by default Flume creates a lot of them, so it is something to think about.</p>
<p>As for the realistic nature of Logstash/etc. versus Hadoop&#8230; in the real world, most shops should likely be sending data into an Elasticsearch-like solution (Splunk, Graylog2, Logstash, etc.) for a period of 2-4 weeks and then archiving the logs out to Hadoop for data warehousing.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: benr</title>
		<link>http://cuddletech.com/blog/?p=795#comment-28056</link>
		<dc:creator>benr</dc:creator>
		<pubDate>Sat, 29 Dec 2012 05:05:28 +0000</pubDate>
		<guid isPermaLink="false">http://cuddletech.com/blog/?p=795#comment-28056</guid>
		<description>I&#039;d add to that SplunkStorm and SumoLogic, which I think are the leading options in the space.  I really should do a blog post on SaaS Logging solutions.

Remember that Splunk is free for less than 500 MB per day, which is enough for many small shops.  And Splunk is an incredibly kool company and very supportive of SAs, so despite the big bucks they ask (and get) for their software, I have nothin&#039; but love for Splunk.</description>
		<content:encoded><![CDATA[<p>I&#8217;d add to that SplunkStorm and SumoLogic, which I think are the leading options in the space.  I really should do a blog post on SaaS Logging solutions.</p>
<p>Remember that Splunk is free for less than 500 MB per day, which is enough for many small shops.  And Splunk is an incredibly kool company and very supportive of SAs, so despite the big bucks they ask (and get) for their software, I have nothin&#8217; but love for Splunk.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: bryan w. berry</title>
		<link>http://cuddletech.com/blog/?p=795#comment-28046</link>
		<dc:creator>bryan w. berry</dc:creator>
		<pubDate>Fri, 28 Dec 2012 06:42:04 +0000</pubDate>
		<guid isPermaLink="false">http://cuddletech.com/blog/?p=795#comment-28046</guid>
		<description>Thanks for the great article, benr.

&quot;you end up with a LOT of HDFS files, which may or may not be what you want. Setting any value to 0 disables that type of rolling.&quot;

If I understand correctly, the default block size for HDFS is 64 MB. Lots of small files would cause incredible waste. When would you want lots of small files?

Re: Logstash vs. Flume/Hadoop/Hive/Pig, I could see the Hadoop solution being useful over Logstash if you have non-sysadmins who are familiar w/ Pig/Hive and want to use those skills to interact w/ the webserver logs.</description>
		<content:encoded><![CDATA[<p>Thanks for the great article, benr.</p>
<p>&#8220;you end up with a LOT of HDFS files, which may or may not be what you want. Setting any value to 0 disables that type of rolling.&#8221;</p>
<p>If I understand correctly, the default block size for HDFS is 64 MB. Lots of small files would cause incredible waste. When would you want lots of small files?</p>
<p>Re: Logstash vs. Flume/Hadoop/Hive/Pig, I could see the Hadoop solution being useful over Logstash if you have non-sysadmins who are familiar w/ Pig/Hive and want to use those skills to interact w/ the webserver logs.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: gazarsgo</title>
		<link>http://cuddletech.com/blog/?p=795#comment-28044</link>
		<dc:creator>gazarsgo</dc:creator>
		<pubDate>Fri, 28 Dec 2012 04:27:30 +0000</pubDate>
		<guid isPermaLink="false">http://cuddletech.com/blog/?p=795#comment-28044</guid>
		<description>Both papertrailapp.com and loggly.com are agentless alternatives to Splunk -- I assume I can&#039;t afford Splunk since they don&#039;t advertise their pricing.</description>
		<content:encoded><![CDATA[<p>Both papertrailapp.com and loggly.com are agentless alternatives to Splunk &#8212; I assume I can&#8217;t afford Splunk since they don&#8217;t advertise their pricing.</p>
]]></content:encoded>
	</item>
</channel>
</rss>
